Mariana Ramos Franco
Mariana is a Software Engineer with more than 12 years of experience developing web applications and highly scalable distributed systems. She currently works on the Amazon Managed Service for Prometheus team in the beautiful city of Vancouver, Canada. Before that, Mariana worked on other AWS services such as Amazon Route 53 and Amazon RDS. She also spent 5 years at the IBM Software Lab in Brazil. She holds an MS in Computer Engineering from the University of São Paulo.
In this talk, we will show you how we speed up Cortex deployments at scale using zone-aware Kubernetes controllers.
Kubernetes allows pods to be spread across different zones through topology constraints, but these constraints are not taken into consideration during rolling updates or by pod disruption budgets. For instance, it's recommended to replicate Cortex's ingesters across different zones for high availability, so that the system continues to work in the event of a zone outage. However, the lack of support for zone-aware deployments forces Cortex operators to allow just a single container to be updated at a time, which causes long deployments and limits the speed at which nodes can be upgraded.
To bypass these limitations, the Amazon Managed Service for Prometheus team released a couple of k8s controllers for zone-aware rollouts and disruptions. They can be used by any highly available, quorum-based distributed application, such as Cortex, to improve the velocity of deployments in a safe way.
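As a rough illustration of the idea, the sketch below shows one way a zone-aware disruption check could work. It is not the code of the released controllers: the `Pod` type, the `can_disrupt` function, and the pod and zone names are all hypothetical, and it assumes data is replicated once per zone so that a quorum survives as long as all unavailable pods sit in a single zone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    """Hypothetical, simplified view of a pod: its name, zone, and readiness."""
    name: str
    zone: str
    ready: bool


def can_disrupt(pods, candidate):
    """Return True if taking `candidate` down keeps all unavailable pods in one zone.

    Assumption: replicas are spread one per zone, so the application keeps
    its quorum while every unavailable pod belongs to the same zone.
    """
    disrupted_zones = {p.zone for p in pods if not p.ready}
    # Safe when nothing is down, or when everything down is in the candidate's zone.
    return disrupted_zones <= {candidate.zone}


pods = [
    Pod("ingester-a-0", "zone-a", ready=False),  # already being updated
    Pod("ingester-a-1", "zone-a", ready=True),
    Pod("ingester-b-0", "zone-b", ready=True),
    Pod("ingester-c-0", "zone-c", ready=True),
]

same_zone_ok = can_disrupt(pods, pods[1])   # another pod in zone-a
other_zone_ok = can_disrupt(pods, pods[2])  # a pod in zone-b
```

Under this rule, several pods in the same zone can be updated at once, while a pod in a second zone has to wait until the first zone is healthy again.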