Yesterday, Arsh Khandelwal and I talked at LinkedIn HQ about Orb's technical investment in its alerting features at the scale of 1M+ events/sec. Alerting is an incredibly important feature: it lets Orb's customers give *their* customers timely notifications when they hit a spend cap or usage limit. Your customers don't like surprise overages, and you don't want to swallow spillover infra costs for excess use.
What makes implementing real-time alerting for billing hard? Why isn't this a solved problem a la Datadog?
A preview of what's tricky:
- Flexibility: Orb is the only billing system that lets you configure your billing metrics with SQL. This makes computing incremental query results significantly harder; traditional stream processing approaches don't work out of the box. Approximations aren't good enough... and remember that the number of groups explodes quickly here, since each customer in each timezone has a different timeframe you're evaluating.
- Business complexity: usually, your customers want to get alerted on accrued spend across all metrics they're subscribed to. You'll need to factor in a combination of credit burndown for some metrics, rollovers, minimums, tiered pricing, etc. This is a lot of domain data to load in a perf-critical path. Billing is no longer a single price × quantity (p × q) calculation.
- Varying requirements: you might want to alert on a subset of self-serve, high-risk customers with a much stricter SLO than your trusted enterprise accounts. Being able to fast-lane some customers is critical.
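
To make the "not a single p × q" point concrete, here is a minimal sketch of a spend-cap check that combines tiered pricing, a credit balance, and a minimum. It is illustrative only and not Orb's implementation: every name (`Tier`, `tiered_cost`, `accrued_spend`, `should_alert`), the sample prices, and the order in which credits and minimums are applied are assumptions made for the example.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List, Optional


@dataclass
class Tier:
    # Hypothetical graduated tier: units up to `up_to` are billed at `unit_price`.
    up_to: Optional[Decimal]  # None means the tier has no upper bound
    unit_price: Decimal


def tiered_cost(quantity: Decimal, tiers: List[Tier]) -> Decimal:
    """Cost of `quantity` units, where each tier prices only the units inside it."""
    cost = Decimal("0")
    lower = Decimal("0")
    for tier in tiers:
        upper = quantity if tier.up_to is None else min(quantity, tier.up_to)
        if upper > lower:
            cost += (upper - lower) * tier.unit_price
        if tier.up_to is None or quantity <= tier.up_to:
            break
        lower = tier.up_to
    return cost


def accrued_spend(
    usage: Dict[str, Decimal],
    pricing: Dict[str, List[Tier]],
    credits: Decimal = Decimal("0"),
    minimum: Decimal = Decimal("0"),
) -> Decimal:
    """Sum spend across all metrics, burn down credits, then apply the minimum.

    Assumption: credits are applied before the minimum. A real plan may order
    these differently or scope credits to specific metrics.
    """
    gross = sum((tiered_cost(q, pricing[m]) for m, q in usage.items()), Decimal("0"))
    after_credits = max(gross - credits, Decimal("0"))
    return max(after_credits, minimum)


def should_alert(spend: Decimal, cap: Decimal) -> bool:
    """True once accrued spend has reached the configured cap."""
    return spend >= cap


# Made-up plan: two metrics, one of them tiered.
pricing = {
    "api_calls": [Tier(Decimal("1000"), Decimal("0.01")), Tier(None, Decimal("0.005"))],
    "storage_gb": [Tier(None, Decimal("0.10"))],
}
usage = {"api_calls": Decimal("3000"), "storage_gb": Decimal("50")}

spend = accrued_spend(usage, pricing, credits=Decimal("5"), minimum=Decimal("10"))
print(spend, should_alert(spend, cap=Decimal("20")))
```

Even in this toy version, the alert decision depends on tier tables, credit balances, and minimums for every metric on the plan, which is the domain data that has to be loaded on the hot path.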