Ankur Rawal

CTO @ Zenduty, Reliability Advocate Everywhere Else.

Helping fast-moving orgs around the world minimise business-impacting downtime.
Love talking about observability and reliability at incremental scale and novel use cases for modern tech innovations.
Outside of work, you can find me on road trips, discovering new cuisines and photographing wildlife.

How Thanos Almost Snapped $100,000 from our Infra Budget
Deepak Kumar, Shubham Srivastava, Vishwa Krishnakumar, Ankur Rawal

In a galaxy not so far away, where data is as vast as the cosmos, our team was troubled with observability data chaos.
Seeking clarity, we turned to Thanos and Fluent Bit, fabled titans against our metric storage and logging issues.
Thanos gave us a highly available Prometheus setup with virtually unlimited historical data storage. Prometheus ascended to new heights, flawlessly scaling horizontally, while Thanos Compactor's downsampling promised faster queries over older data.
Fluent Bit made collecting, filtering, and outputting logs across multiple sources and destinations effortless.

But little did we know that even the most powerful tools, when not wielded correctly, could be double-edged Infinity Stones.

Join us for a thrilling tale of blunders as we recount our missteps in configuring these tools, the easily missed caveats in data downsampling and log storage, and how the pursuit of seamless data handling almost cost us over $100,000.