Making the Switch to Apache Superset

This is the story of how the City of Ann Arbor adopted Apache Superset as its business intelligence (BI) platform. Superset has been a superior product for both creators and consumers of our data dashboards and saves us 94% in costs compared to our prior solution.

Background

As the City of Ann Arbor’s data analyst, I spend a lot of time building charts and dashboards in our business intelligence / data visualization platform. When I started the job in 2021, we were halfway through a contract and I used that existing software as I completed my initial data reporting projects.

After using it for a year, I was feeling its pain points. Building dashboards was a cumbersome and finicky process, and my customers wanted more flexible and aesthetically pleasing results. I began searching for something better.

Being a government entity makes software procurement tricky – we can’t just shop and buy. Our prior BI platform was obtained via a long Request for Proposals (RFP) process. This time I wanted to try out products to make sure they would perform as expected. Will it work with our data warehouse? Can we embed charts in our public-facing webpages?

The desire to try before buying led me to consider open-source options as well as products that we already had access to through existing contracts (i.e., Microsoft Power BI).

Product Comparison

We tried out three platforms: Metabase, Superset, and Power BI. I installed an instance of each, getting a sense of what it would take to maintain them, then fed them data and built copies of existing production dashboards.

It wasn’t close. Superset met our needs better than Metabase, which in turn was better than Power BI. I’d summarize the comparison as:

  • Superset: most powerful, best-looking, and easiest to use. The downside is the steep learning curve to install and maintain it.
  • Metabase: easy to use and configure but lacked features we required. It looks like some of those (tabbed dashboards, single-select filters) have since been added.
  • Power BI: surprisingly poor experience for both dashboard authors and readers. Clunky UI, limited features. E.g., we couldn’t make a tabbed dashboard, with filters, that included charts from different datasets. Has lots of data transformation features that the others don’t, but I want that to happen outside of the BI platform.

Superset has two other key advantages I failed to fully appreciate at first:

  • It’s highly polished, with great default design choices. Maybe this shouldn’t affect how much a user trusts or wants to use a dashboard, but it does. Our end users often remark on how good Superset looks.
  • As an Apache Foundation project, it’s democratically governed and fully open-source. This leads to greater speed and quality of development, as anyone can file bug reports and contribute new features back to the main source code.

Preset, a company founded by the creator of Superset, sells hosted Superset-as-a-Service and is the main driver of Superset’s development. But major contributions also come from outside of Preset. For example, the features to download a data table in Excel format and to tag dashboards recently came from other Superset-using organizations.

Metabase is also open-source, but it lacks the same governance model and some of its best features are behind a paywall. For instance, we rely on Superset’s single sign-on (SSO) integration with Azure Active Directory. Metabase offers this feature, but only for paid license tiers.

Implementation

After determining that Superset was our best option, I started learning how to host and configure it. That ended up being the best professional development I’ve had in years (thanks to Zach Steindler for the coaching and encouragement!). And learning new tech – in particular Docker, which feels like magic – unlocked the ability to run other modern software that uses a similar stack.

After we got up and running, the last remaining task was to rebuild in Superset all of our existing dashboards and reports. This was tedious, though we sure got familiar with Superset during that process. One saving grace was that we’d kept most of our data processing outside of our prior BI platform. So at least our data was already tidy and in the same warehouse, ready for charting.

Impact

Switching to Superset has made me happier. When I’m wearing my data analyst hat, it’s a pleasure to create in. When I’m wearing my DevOps hat and maintaining Superset, I learn new technical skills. And Superset has a welcoming community of users who help each other and improve the product.

Early in my tenure in this role I was musing to a data scientist friend that I thought I could replace our BI platform with an open-source alternative. He said, “yeah, but do you want to be responsible for maintaining that?” It turns out, I do!

It’s also been a win for our internal customers (city staff) and the public, who benefit from a high-quality data reporting tool. Here are some of the public-facing dashboards I’ve built in Superset:

(These are designed for a PC or tablet and may look scrunched on a mobile phone. Superset currently lacks a high-quality mobile experience.)

Lastly, it’s saved the city a lot of money. Our prior BI platform charged around $45k/year for 50 licenses. And it was self-hosted, so add on the material and labor costs for that. Superset costs $0 for unlimited licenses (which is good, as we’ve already exceeded 50 users within the city). Our only cost is the cloud computing environment Superset runs in, on the order of a few thousand dollars annually (call it $3k). Assuming a similar cost for on-premises self-hosting of the prior platform, going from $48k/year ($45k in licenses plus $3k in hosting) to $3k/year is about a 94% savings!
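For anyone who wants to rerun this estimate with their own numbers, here is a minimal sketch of the arithmetic. The $3k hosting figure is my round-number stand-in for "a few thousand dollars," and the function and variable names are just illustrative.

```python
def percent_savings(old_annual_cost: float, new_annual_cost: float) -> float:
    """Return the percentage saved when moving from the old to the new annual cost."""
    if old_annual_cost <= 0:
        raise ValueError("old_annual_cost must be positive")
    return (old_annual_cost - new_annual_cost) / old_annual_cost * 100


# Figures from the paragraph above, in dollars per year.
LICENSE_COST = 45_000  # prior platform: 50 licenses
HOSTING_COST = 3_000   # assumption: "a few thousand dollars", taken as the same for both setups

old_total = LICENSE_COST + HOSTING_COST  # prior platform: licenses plus self-hosting
new_total = HOSTING_COST                 # Superset: hosting only, $0 in licenses

savings = percent_savings(old_total, new_total)
print(f"${old_total:,} -> ${new_total:,} per year: {savings:.2f}% savings")
```

With these inputs the result is 93.75%, which rounds to the 94% quoted above. If your hosting costs differ between the old and new setups, pass the two totals in separately.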

That said, Superset would be worth it at the same price as our old product. And for those considering Superset whose teams lack the experience or time to self-host — it’s not easy — there’s Preset, which offers a turnkey Superset experience at a quarter of the cost we were paying for our prior product.

Looking Ahead

I think there’s a good chance that Superset/Preset becomes the dominant product in the BI platform marketplace. I haven’t used Looker or Tableau, the top-tier and most-expensive solutions, but given how good Superset is already and how fast it improves, Superset poses a serious threat to their dominance.

I’d particularly like to see other cities and government agencies adopt Superset, given that self-hosting provides a way to demo its features and maintenance requirements while complying with procurement rules and tight budgets. If that’s you and you could use advice or encouragement, drop me a line!

4 replies on “Making the Switch to Apache Superset”

Thanks for this! I have a couple questions:
1) how well does this work with 3rd party data vendors? I know AAPD recently created a data dashboard, but I don’t think it was with this(?). I think part of the complication there was that the data management is largely run through a 3rd party private company the city contracts with, CLEMIS

2) is there a feature wherein the data could be downloaded, for example as a csv or html? Maybe that’s already the case and I just couldn’t figure out how?

Hi Kevin!

1) We wrangle lots of data out of 3rd party platforms and into our data warehouse, at which point we can report on it with Superset. E.g., we extract air quality data for the above-linked dashboard from the AQMesh (vendor) API, process it, and then report on it. FYI to other readers, Superset is a thin layer over the data warehouse – I do almost all of the data manipulation outside of Superset and then it sends SQL queries to our data warehouse.
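To make the "thin layer" idea concrete, here is a rough sketch of the pattern, not our actual pipeline. An in-memory SQLite database stands in for the data warehouse, and the sample records, table name, and column names are all made up for illustration. The cleanup happens before loading, and the only thing the BI layer contributes is a SQL query.

```python
import sqlite3

# Hypothetical raw records, standing in for what a vendor API might return.
raw_readings = [
    {"sensor": "A", "day": "2023-06-01", "pm25": "12.0"},
    {"sensor": "A", "day": "2023-06-01", "pm25": "18.0"},
    {"sensor": "B", "day": "2023-06-01", "pm25": None},  # missing reading
    {"sensor": "B", "day": "2023-06-01", "pm25": "7.5"},
]


def clean(records):
    """Drop records with missing readings and convert the rest to floats."""
    return [
        (r["sensor"], r["day"], float(r["pm25"]))
        for r in records
        if r["pm25"] is not None
    ]


# Step 1 (outside the BI tool): process the data and load it into the warehouse.
warehouse = sqlite3.connect(":memory:")  # stand-in for the real data warehouse
warehouse.execute("CREATE TABLE air_quality (sensor TEXT, day TEXT, pm25 REAL)")
warehouse.executemany("INSERT INTO air_quality VALUES (?, ?, ?)", clean(raw_readings))
warehouse.commit()

# Step 2 (the BI tool's job): send a SQL query to the warehouse and chart the result.
CHART_QUERY = """
    SELECT sensor, day, AVG(pm25) AS avg_pm25
    FROM air_quality
    GROUP BY sensor, day
    ORDER BY sensor
"""
chart_rows = warehouse.execute(CHART_QUERY).fetchall()
print(chart_rows)
```

Because the table is already tidy by the time it is queried, a chart needs nothing more than a simple aggregate like the one in `CHART_QUERY`.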

You’re correct re: the AAPD dashboard, it uses a different platform. My understanding is that a vendor has built a reporting solution specifically tailored to the way that data is stored in CLEMIS and offers that as an efficient way for the many member agencies of CLEMIS to report on their data.

2) Yes, the data underlying a chart can be downloaded as a .csv. I put instructions in the top panel of the rain gauges dashboard (https://analytics.a2gov.org/superset/dashboard/rain-gauges/?standalone=2), can you try following that and see if it works for you?

A2 Analytics has also started adding some of these datasets to the city’s Data Catalog (https://www.a2gov.org/services/data/Pages/default.aspx) – the two bottom-most entries are dumps of the raw data underlying the air quality and rain gauge dashboards.

Thanks! The three dots worked on my desktop. I couldn’t see the download option on my phone when I tried earlier (but maybe I missed it?)
