Categories
#rstats Data analysis ruminations Software Work

Same Developer, New Stack

I’ve been fortunate to work with and on open-source software this year. That has been the case for most of a decade: I began using R in 2014. I hit a few milestones this summer that got me thinking about my OSS journey.

I became a committer on the Apache Superset project. I’ve written previously about deploying Superset at work as the City of Ann Arbor’s data visualization platform. The codebase (Python and JavaScript) was totally new to me but I’ve been active in the community and helped update documentation.

Those contributions were sufficient to get me voted in as a committer on the project. It’s a nice recognition and vote of confidence but more importantly gives me tools to have a greater impact. And I’m taking baby steps toward learning Superset’s backend. Yesterday I made my first contribution to the codebase, fixing a small bug just in time for the next major release.

Superset has great momentum and a pleasant and involved (and growing!) community. It’s a great piece of software to use daily and I look forward to being a part of the project for the foreseeable future.

I used pyjanitor for the first time today. I had known of pyjanitor’s existence for years but only from afar. It started off as a Python port of my janitor R package, then grew to encompass other functionality. My janitor is written for beginners, and that came full circle today as I, a true Python beginner, used pyjanitor to wrangle some data. That was satisfying, though I’m such a Python rookie that I struggled to import the dang package.

Categories
Data analysis Local reporting Software Work

Making the Switch to Apache Superset

This is the story of how the City of Ann Arbor adopted Apache Superset as its business intelligence (BI) platform. Superset has been a superior product for both creators and consumers of our data dashboards and saves us 94% in costs compared to our prior solution.

Background

As the City of Ann Arbor’s data analyst, I spend a lot of time building charts and dashboards in our business intelligence / data visualization platform. When I started the job in 2021, we were halfway through a contract and I used that existing software as I completed my initial data reporting projects.

After using it for a year, I was feeling its pain points. Building dashboards was a cumbersome and finicky process and my customers wanted more flexible and aesthetically-pleasing results. I began searching for something better.

Being a government entity makes software procurement tricky – we can’t just shop and buy. Our prior BI platform was obtained via a long Request for Proposals (RFP) process. This time I wanted to try out products to make sure they would perform as expected. Will it work with our data warehouse? Can we embed charts in our public-facing webpages?

The desire to try before buying led me to consider open-source options as well as products that we already had access to through existing contracts (i.e., Microsoft Power BI).

Categories
Software

AntennaPod: the open-source podcast app

I still like the idea of spotlighting open-source products that deliver a superior experience while operating under a model that benefits users and society. Last month I wrote about gathio, the event planning site. You can find my musings about FOSS (free, open-source software) in that post. This one will be shorter.

The obvious choice for today would be to write about Mastodon, the decentralized open-source alternative to Twitter. I’m active on the server for Washtenaw County and I support the project on Patreon. However, a good look at the project and its features would take more time than I can muster at present.

But I got this post idea from Masto. Someone asked for recommendations for a podcast app. And as I recommended the lovely AntennaPod to yet another person, I realized I could plug it here too.

I’ve been using AntennaPod for almost a decade, since its early days. It was decent even as it was getting built out, but in the past few years it has stabilized as feature-complete and rock solid.

AntennaPod has all the features I could want in a podcast player. It’s easy to use. And it doesn’t track what I listen to or serve me ads. Period.

It’s free to use. If you try to contribute to support the project, you’ll see a slew of non-monetary options. Should you manage to find the small link to donate money, you’ll be deterred by a popup suggesting you oughtn’t:

Classy <3

So I’ll continue contributing my time and money to other open-source projects while being grateful to the folks who keep AntennaPod humming. I highly recommend it as the app to enjoy podcasts without being surveilled and/or advertised to. It’s available only for Android, not iOS.

Categories
How-to

Replace Evite and Facebook with gathio

Tl;dr – check out gath.io for making chill, inclusive, not-creepy event pages. Unlike Evite, it won’t track you or serve you Bitcoin ads.


It amazes me how a free, open-source program can outperform its proprietary, commercialized equivalents. An obvious one is R, the statistical programming language. It blows away competitors like SPSS. R is a huge project, but some great open-source projects can surpass commercial competition while remaining a single person’s side project.

It touches my heart that people build great things together, transparently, and then make them freely available. I’ve long meant to write posts where I shout out a piece of free, open-source software (FOSS) that has improved my life materially or spiritually. I was finally spurred to write when I got an Evite yesterday, for a 7-year-old’s birthday party. I opened the link on my phone and saw:

barf

Evite has always had annoying ads and links, but this took it to the next level. I buy as little as possible from Amazon. Amazon’s bad enough. But Bitcoin?? It’s a Ponzi scheme that lures in unsuspecting saps (see the Citations Needed episode on manipulative Bitcoin/crypto/NFT advertising) and sows remarkable environmental destruction. Happy birthday, kid, here’s 0.0005 Bitcoin. Good luck spending it. (Web3 Is Going Great has you covered for crypto realism and schadenfreude).

These ads put me over the edge, but I’ve disliked Evite for years. In particular, it’s creepy that the organizer can track who has opened and viewed the invite.

And then there’s Facebook events. Because I’m not on Facebook, I sometimes forget how many events are organized there. Until someone sends me one I want to attend and I’m unable to view the info or RSVP. Argh!

Why must we engage with platforms that track us, shove ads in our faces, and sell our data in order to organize a dang birthday party or seed-swap?? Well, someone else felt the same way and did something about it. Enter: gathio!

Categories
#rstats Data analysis ruminations Work

Reflections on five years of the janitor R package

One thing led to another. In early 2016, I was participating in discussions on the #rstats Twitter hashtag, a community for users of the R programming language. There, Andrew Martin and I met and realized we were both R users working in K-12 education. That chance interaction led to me attending a meeting of education data users that April in NYC.

Going through security at LaGuardia for my return flight, I chatted with Chris Haid about data science and R. Chris affirmed that I’d earned the right to call myself a “data scientist.” He also suggested that writing an R package wasn’t anything especially difficult.

My plane home that night was hours late. Fired up and with unexpected free time on my hands, I took a few little helper functions I’d written for data cleaning in R and made my initial commits in assembling them into my first software package, janitor, following Hilary Parker’s how-to guide.

That October, the janitor package was accepted to CRAN, the official public repository of R packages. I celebrated and set a goal of someday attaining 10,000 downloads.

Yesterday janitor logged its one millionth download, wildly exceeding my expectations. I thought I’d take this occasion to crunch some usage numbers and write some reflections. This post is sort of a baby book for the project, almost five years in.

By The Numbers

This chart shows daily downloads since the package’s first CRAN release. The upper line (red) is weekdays, the lower line (green) is weekends. Each vertical line represents a new version published on CRAN.

From the very beginning I was excited to have users, but this chart makes that exciting early usage seem minuscule. janitor’s most substantive updates were published in March 2018, April 2019, and April 2020, with it feeling more done each time, but most user adoption has occurred more recently than that. I guess I didn’t have to worry so much about breaking changes.

Another way to look at the growth is year-over-year downloads:

Year                   Downloads   Ratio vs. Prior Year
2016-17                   13,284
2017-18                   47,304   3.56x
2018-19                  161,411   3.41x
2019-20                  397,390   2.46x
2020-21 (~5 months)      383,595   6
Download counts are from the RStudio mirror, which does not represent all R user activity. That said, it’s the only available count and the standard measure of usage.
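
The ratios in the table can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, not the code used to build the table: the counts are typed in by hand from the table above, and the names `downloads` and `yoy_ratios` are made up for illustration. The partial 2020-21 period is left out of the ratio calculation because five months is not comparable to a full year.

```python
# Download counts copied by hand from the table above (RStudio mirror).
downloads = [
    ("2016-17", 13_284),
    ("2017-18", 47_304),
    ("2018-19", 161_411),
    ("2019-20", 397_390),
    ("2020-21 (~5 months)", 383_595),  # partial year
]


def yoy_ratios(rows):
    """Return (period, ratio) pairs comparing each period to the one before it."""
    return [
        (period, round(count / prev_count, 2))
        for (_, prev_count), (period, count) in zip(rows, rows[1:])
    ]


# Compare full years only; the partial year is not like-for-like.
full_years = downloads[:-1]
ratios = yoy_ratios(full_years)

# All periods, including the partial one, count toward the running total.
total = sum(count for _, count in downloads)

for period, ratio in ratios:
    print(f"{period}: {ratio:.2f}x")
print(f"Total downloads: {total:,}")
```

The three ratios match the table (3.56x, 3.41x, 2.46x), and the total of 1,002,984 is consistent with the package having just passed one million downloads.
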
Categories
#rstats Making Work

That feeling when your first user opens an issue

You know how new businesses frame the first dollar they earn?

I wrote an R package that interfaces with the SurveyMonkey API. I worked hard on it, on and off the clock, and it has a few subtle features of which I’m quite proud. It’s paying off, as my colleagues at TNTP have been using it to fetch and analyze their survey results.

The company and I open-sourced the project, deciding that if we have already invested the work, others might as well benefit. And maybe some indirect benefits will accrue to the company as a result. I made the package repository public, advertised it in a few places, then waited. Like a new store opening its doors and waiting for that first customer.

They showed up on Friday! With the project’s first GitHub star and a bug report that was good enough for me to quickly patch the problem. Others may have already been quietly using the package, but this was the first confirmed proof of use. It’s a great feeling as an open-source developer wondering, “I built it: will they come?”

Consider this blog post to be me framing that dollar.