Cloud Native Rejekts EU (Valencia) 2022
An intro to Rejekts
At the beginning of 2019, Chris Nuland and his team were tasked with migrating a large Mesosphere DC/OS cluster with hundreds of running containers to Kubernetes for a Fortune 500 healthcare company. The team needed to finish it within 7 months to allow the sunsetting of DC/OS before the cluster’s end of life. They also needed to containerize a couple hundred applications and deploy them into the newly built cluster during the migration. Fortunately, they met the deadline.
Now, they’re sharing the technical and onboarding challenges they faced during the migration, how they scaled each migration path, and how the process would look different today given many of the migration tooling advantages found in the Konveyor Community open source tool suite. Attendees will learn both tools and tips they can use for their own migration projects.
Getting availability metrics is easy: probe your service or calculate the ratio of failed/successful requests. These approaches are fine, but don't necessarily reflect the user experience. However, user experience is exactly what we want to represent with our metrics. Inspired by Google's meaningful availability paper, Microsoft and SAP collaboratively implemented and open-sourced the connectivity-monitor project to do just that. We expose meaningful availability metrics for the managed Kubernetes API server endpoints. Leveraging the power of eBPF, we capture the relevant network traffic, parse the SNI of the TLS handshake to identify which Kubernetes cluster is being connected to, and assess the encrypted TCP connection to determine if it succeeded or failed. The connectivity exporter "annotates" time and exposes failed/successful seconds as counter metrics for Prometheus to scrape without losing the 1s granularity. All of this with minimal overhead, thanks to eBPF!
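The idea of "annotating" time can be sketched in a few lines. This is only an illustration under stated assumptions, not the connectivity-monitor implementation: the function name `annotate_seconds`, the event format, and the rule that a single failed connection marks the whole second as failed are all hypothetical.

```python
from collections import defaultdict


def annotate_seconds(events):
    """Classify each one-second bucket as successful or failed.

    events: iterable of (unix_timestamp, succeeded) pairs, one per
    observed connection attempt.
    Assumed rule (not taken from the project): a second with at least
    one failed connection counts as failed; seconds without any
    traffic are not annotated at all.
    """
    saw_failure = defaultdict(bool)  # second -> True if any attempt failed
    for timestamp, succeeded in events:
        second = int(timestamp)  # keep the 1s granularity
        saw_failure[second] = saw_failure[second] or not succeeded

    failed = sum(1 for flag in saw_failure.values() if flag)
    return {
        "successful_seconds": len(saw_failure) - failed,
        "failed_seconds": failed,
    }


if __name__ == "__main__":
    sample = [(100.1, True), (100.9, False), (101.2, True), (103.5, True)]
    print(annotate_seconds(sample))
```

Both values only ever grow as more traffic is observed, which is what makes them a natural fit for counter metrics.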
There are over 20,500 open source projects on GitHub that are tagged by topic as focused on Kubernetes. More than 92,000 repositories mention the word Kubernetes in their repository's "About" paragraph, signaling some level of integration. How does one of these projects validate that integration at the feature level, or at the application level? What about simply creating releases for consumption?
This session will outline mistakes, successes, and incurred technical debt while implementing a CI/CD process geared towards design verification and a release strategy. This case study will cover the pros and cons and give an in-depth analysis of the "why" behind the decisions made in this project. This serves as a reference for things to consider (and to avoid) while creating Kubernetes-based projects.
Topics covered will include:
- obstacles and limitations on CI/CD design
- extreme code (actually process) reuse
- reproducibility, verification, and automation
- resource and scheduling impacts
Have you ever encountered missing or incorrect security policies on your k8s cluster?
Maybe you found yourself in k8s resource chaos, where you don't know which resource was created by whom?
Maybe you forgot to set some key attributes on your k8s cluster.
With the deprecation of PodSecurityPolicy, OPA Gatekeeper has become one of the most popular alternatives as a policy controller.
Until recently, it enabled us to validate incoming resources, audit the existing policy violations, and reject nonconformant ones based on user-defined policies present as CRDs. This is great but still left the burden of updating the faulty resources manually. With the new mutation feature, updating nonconformant resources can be automated with customizable mutation policies like "Setting security context of a specific container in a Pod in a namespace to be non-privileged" etc.
In this talk, Harshita will share her experiments with OPA Gatekeeper Mutation policies and lessons learned in developing a k8s native solution to completely automate and simplify policy enforcement across a cluster stack using OPA Gatekeeper.
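To make the quoted example policy concrete, the effect of such a mutation can be imitated on a plain Pod manifest. This sketch is not Gatekeeper's implementation and does not use its CRD syntax; the function name `mutate_pod` and the sample namespace and container names are hypothetical.

```python
import copy


def mutate_pod(pod, namespace, container_name):
    """Return a copy of `pod` with one container forced to non-privileged.

    Mirrors the example policy from the abstract: only Pods in
    `namespace` are touched, and only the container called
    `container_name`. Everything else is left as it was.
    """
    if pod.get("metadata", {}).get("namespace") != namespace:
        return pod  # out of scope for this policy

    mutated = copy.deepcopy(pod)  # never modify the incoming object
    for container in mutated.get("spec", {}).get("containers", []):
        if container.get("name") == container_name:
            container.setdefault("securityContext", {})["privileged"] = False
    return mutated


if __name__ == "__main__":
    pod = {
        "metadata": {"name": "web", "namespace": "prod"},
        "spec": {"containers": [
            {"name": "app", "securityContext": {"privileged": True}},
            {"name": "sidecar"},
        ]},
    }
    print(mutate_pod(pod, "prod", "app")["spec"]["containers"])
```

A real policy controller applies this kind of change at admission time, before the object is stored, which is what removes the manual fix-up step.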
Secrets. Security best-practices mandate that they stay away from the code—or else! And that’s what we did for a long time.
But as CI/CD practices evolved, for a myriad of reasons, we now want to ship the code, the environment, and the secrets, all in one lump. So we can’t hide the secrets anymore… unless?
Tools like HashiCorp Vault attempt to address this by managing secrets outside the delivery chain. Great! But you can’t use those inside local dev environments, so… When that’s exactly what you need to do, then what?
In this talk, Lian will show the audience how to manage secrets the GitOps way, so you can maintain security best-practices while also being able to use them in your local environment for development. Sound like magic? That’s because it is!
After this talk, the audience will be able to understand secret management solutions that work seamlessly in a variety of environments.
All Kubernetes projects need to define the APIs for CRDs and the lifecycle of each API generally starts with the alpha version. The API definition evolves over time and eventually moves to a stable version. But this evolution leads to multiple releases and each release should provide support for handling multiple API versions.
API changes between versions can include the addition, deletion, and renaming of fields in CRDs, but while introducing these changes we need to ensure stability and backward compatibility. To support multiple versions simultaneously, we write conversion functions that convert API objects from one version to another. Code-generation tools help to write conversions, but manual intervention is still needed in many cases. There is a lack of resources on this topic, so many developers face difficulties when dealing with conversions, especially those who are newer to this ecosystem. As a result, API changes become a complicated and time-consuming experience, which is a prominent issue for the majority of contributors because APIs and conversions are part of almost all k8s projects.
This talk aims to explain the concepts behind conversion functions and demonstrate how to write conversions for k8s CRDs.
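As a rough illustration of what a conversion function does, the sketch below converts a custom resource between two versions in which one field was renamed. The API group `example.com`, the kind, and the fields `size` and `replicas` are hypothetical, and in many projects these functions are written in Go rather than Python.

```python
import copy

GROUP = "example.com"  # hypothetical API group


def alpha_to_beta(obj):
    """Convert a v1alpha1 object to v1beta1 (spec.size -> spec.replicas)."""
    out = copy.deepcopy(obj)
    out["apiVersion"] = f"{GROUP}/v1beta1"
    spec = out.setdefault("spec", {})
    if "size" in spec:
        spec["replicas"] = spec.pop("size")
    return out


def beta_to_alpha(obj):
    """Convert a v1beta1 object back to v1alpha1 (spec.replicas -> spec.size)."""
    out = copy.deepcopy(obj)
    out["apiVersion"] = f"{GROUP}/v1alpha1"
    spec = out.setdefault("spec", {})
    if "replicas" in spec:
        spec["size"] = spec.pop("replicas")
    return out


if __name__ == "__main__":
    old = {
        "apiVersion": f"{GROUP}/v1alpha1",
        "kind": "Widget",
        "metadata": {"name": "demo"},
        "spec": {"size": 3},
    }
    new = alpha_to_beta(old)
    print(new)
    print(beta_to_alpha(new) == old)
```

The round trip at the end is the property worth testing: converting to the newer version and back should not lose information. A pure rename makes that easy; added or removed fields are where manual conversion logic tends to be needed.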
In this talk Frederic will walk through the design decisions of arcticDB, the database used for storing continuous profiling data as part of the Parca project. ArcticDB is an embedded database building on Apache Parquet and Apache Arrow.
Frederic will walk through the use cases of arcticDB as well as an example of how it could be used for other Observability workloads.
Frederic will end with an outlook on where arcticDB is headed and, finally, how attendees can contribute to that future!
Deployment of Machine Learning (ML) to production is notoriously difficult, made so by variations in models, engines, platforms, and networks. How can we deploy distributed ML in production across dissimilar devices from edge to cloud, make optimal use of available resources, and support practical considerations like blue/green testing, privacy preservation, and live updates?
In this talk, learn how to meet these challenges with wasmCloud, the distributed WebAssembly platform for portable business logic. Discover how you can make use of the open source machine learning capability provider with the open WASI-NN API to deploy a common code base, for use with inference engines like TensorFlow or ONNX, on embedded devices, LAN workstations, and the cloud. We will discuss how inference models can be dynamically and securely updated in the field, and discuss design decisions that have a direct impact on privacy, latency, throughput, and model accuracy.
Helm is a truly excellent ecosystem and is rightly valued the world over for giving full customisation of deployments. For open-source projects with a finite number of support engineers, full customisation is not always desirable. Sometimes, you need to provide opinionated guide rails for people in order to provide effective support for your product.
This session will focus on the reasons why Gitpod has deprecated its Helm charts and switched to a custom-built Installer. Simon will explore some of the benefits and pitfalls experienced and how the community reacted to such a seismic change. He will also answer the question - "would he do it again?"
Do you know how much your workloads cost? Are you worried about your underutilised resources? Do you have a tag allocation strategy in place? Would you like a fairly approximate, real-time cost report for your Kubernetes resources? Could your teams have control and visibility of their Kubernetes resource costs?
The motivation of this talk is to show how the actions of our workloads impact the final invoice from our cloud provider, and how we can get visibility and make decisions based on metrics. Embrace Cloud FinOps culture in your company, highlighting the sustainability.
Everyone has heard about supply chain security in the last year. The SolarWinds hack and President Biden's Cybersecurity Executive Order have forced the industry to start taking it seriously. This has resulted in the emergence of credible solutions for addressing provenance concerns in Cloud Native platforms.
This session will begin with an overview of the issues and why they're important, before moving on to look at how we can use tooling to begin addressing them. In particular, we will look at using Sigstore to add provenance data to a container image and Kyverno to verify the data in a Kubernetes cluster.
Finally, we will end with a look at what still needs to be done to truly address our supply chain security issues.
Over two years ago, we introduced the Flatcar project to the Cloud Native Rejekts community in San Diego. A lightweight Linux built specifically for running container workloads, Flatcar builds on the incredibly successful foundation laid by CoreOS Container Linux for enabling security and manageability of container-based distributed systems at scale.
A lot has happened in the meantime, including the end-of-life of the original CoreOS, rapid growth in the Flatcar user community, and not least the acquisition by Microsoft of Kinvolk, the company behind Flatcar.
In this talk, we will hear directly from both the product and engineering managers responsible for the Flatcar project, about the past, present and future of this foundational project that is still highly relevant to many in the cloud native community.
Media Streaming Mesh is a new open-source project which enables real-time media applications to be first class citizens in cloud-native environments.
This talk introduces Buildscaler, an open source framework based on the Horizontal Pod Autoscaler that provides seamless CI autoscaling for any build agent (Buildkite, CircleCI, etc.) and any compute shape (x86, ARM, Mac). We will also share lessons learnt from running Buildscaler in production for 2+ years.
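For readers unfamiliar with the Horizontal Pod Autoscaler, its documented core rule is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below applies that rule to build agents. The choice of metric (queued jobs per agent) and the replica bounds are assumptions made for illustration, not details of Buildscaler, and refinements such as the HPA's tolerance band and stabilization window are not modelled.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Apply the basic HPA scaling rule, clamped to the replica bounds.

    current_metric / target_metric are per-replica values, for example
    queued CI jobs per build agent (a hypothetical metric).
    """
    raw = math.ceil(current_replicas * (current_metric / target_metric))
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 4 agents, 6 queued jobs per agent on average, target of 2 per agent.
    print(desired_replicas(4, 6.0, 2.0, max_replicas=20))
    # Queue is draining: 1 job per agent against the same target.
    print(desired_replicas(4, 1.0, 2.0))
```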
While Kubernetes has become a de facto standard for running Cloud Native workloads, the platforms on which Kubernetes runs remain pretty diverse. Several projects have emerged to solve the challenges around Kubernetes lifecycle management, with Cluster API becoming a standard way to tackle the problem.
This talk will walk through the challenges around Kubernetes lifecycle management from Day-0 to Day-n and deep dive into the Cluster API projects that tackle them in a declarative way. As an example, we will demonstrate how to use CAPI providers to manage Kubernetes clusters on AWS/Azure.
Ever wondered how remote worker nodes feel when they are far from the control plane and often not even connected? We want to share some real-world tips for managing far Edge deployments without dying trying.
Kubernetes authentication is difficult for admins to configure. With Pinniped, we sought to make the process easy and secure by abstracting away much of the complexity. In this talk you will learn tips and tricks that we used to make our users' lives easier. Come learn the extension points that make authentication easier for users.
Online IDEs improve learning outcomes for programming and STEM education. Lab.computer is a SaaS platform for AI teachers and students that offers on-demand Jupyter notebooks with all required packages, data, software and background processes. This enables students and teachers to focus on learning AI concepts and not worry about setup.
This talk describes design goals for Lab.Computer’s Jupyter Notebook as a Service product and outlines QoS metrics needed to provide a good user experience for teachers and students connecting across the globe from US, India, and China. We will then share why we picked Kubernetes as a building block for the platform, and how we architected a multi-region multi-cluster hybrid cloud Kubernetes environment to meet our design goals and customer SLAs.
Have you ever wondered how kube-proxy originated in Kubernetes? Are you familiar with the userspace mode of kube-proxy? Have you thought about what it takes to add a mode to kube-proxy? In this session we will go through the evolution of the kube-proxy, from userspace, to iptables mode to Next-Generation-Kube-Proxy also known as KPNG.
We will dig into the working of userspace mode of kube-proxy and showcase what it takes to add it as a backend to KPNG. Attendees will get to know about the improvements introduced in KPNG over the current implementation of kube-proxy, the algorithms behind an intuitive “user space” proxy, and how to reason about kube-proxy’s logic in any mode, using a generic model. We promise to demonstrate KPNG in userspace mode and compare the performance with kube-proxy in userspace mode and exhibit how it performs better with KPNG.
If you build, maintain, or deploy applications, you probably also work with, or at least encounter, databases. Have you ever tried to troubleshoot a database performance issue in an application that was built using an ORM? Or have you tried to determine which of many microservices was issuing a problematic query? Database observability is important, but tools and libraries for it have lagged behind other areas of observability. sqlcommenter, which is now part of OpenTelemetry, is an open source library that enables application developers or ORMs to augment SQL statements with comments about the code that caused their execution, making it easier to correlate your application code with SQL statements.
In this session, we will demonstrate how to set up and use sqlcommenter in an application to diagnose query performance, look at frameworks and ORMs that sqlcommenter supports, show how you can comment your SQL statements if you don't use an ORM, and demonstrate how you can view this data in database logs and observability tools.
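For statements built without an ORM, the commenting step can be approximated by hand. The sketch below follows the sqlcommenter comment format as commonly described (URL-encoded values in single quotes, pairs sorted by key, comma-separated, appended as a trailing comment); it is not the library itself, and the function name `add_sql_comment` and the tag names are assumptions chosen for illustration.

```python
from urllib.parse import quote


def add_sql_comment(sql, tags):
    """Append key='value' pairs to `sql` as a trailing /* ... */ comment.

    Keys and values are URL-encoded and pairs are sorted by key so the
    output is deterministic. Statements that already contain a comment
    are returned unchanged (an assumed safety rule).
    """
    if not tags or "/*" in sql:
        return sql
    pairs = ",".join(
        f"{quote(str(key), safe='')}='{quote(str(value), safe='')}'"
        for key, value in sorted(tags.items())
    )
    return f"{sql} /*{pairs}*/"


if __name__ == "__main__":
    print(add_sql_comment(
        "SELECT * FROM users",
        {"route": "/users", "framework": "django"},
    ))
```

Because the comment travels with the statement, it shows up wherever the query text does, such as slow-query logs, so a problematic query can be traced back to the route or service that issued it.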
Infrastructure as code. Network as code. Everything as code. It looks like everything can be defined as code, versioned and tested automatically. Everything except development environments. The industry hasn’t come up with a file format to define software environments yet.
Red Hat, AWS and JetBrains are introducing the Devfile. The goal is to accelerate and simplify developers' environment setup. Vagrantfiles and Dockerfiles set the path, a decade ago, with file formats defining general purpose computing environments. Devfile wants to be a file format specialized in the definition of software development environments.
Can one bring the open-source style of community inside a company? Yes! It can be done and it should be done.
All enterprises aim to be agile. The chance of having a group of people passionate about a specific technology grows with company size. Often, especially in enterprises, engineers are siloed in their tribes/product centers and can't really collaborate. The "Spotify model" praised by the business is not helping here.
This presentation will cover the bootstrapping of a Kubernetes community in an enterprise. It will showcase a step-by-step framework that can be adapted and replicated for building similar communities. The focus will be on the many benefits it brought to internal engineers and the business, and on the impact it can have on the wider open source community.
Kubernetes Operators are more popular than ever, but not all operators are created equal. How do we maximize the value that Operators have promised IT teams and ensure that they can deliver a true "as-a-service" experience? We will present a step-by-step guide on how to raise your Operator's capability level to ensure it can live up to its potential. Participants will be presented with a basic Operator and learn real-world strategies on how to add capabilities including seamless upgrades, offsite backups, rollbacks, and deep insights via metrics, logging, and events as well as intelligent autoscaling.
At previous conferences we had the chance to present our upcoming work on checkpoint and restore in Kubernetes. Now that the corresponding Kubernetes Enhancement Proposal (KEP) has been merged and the first code that enables container checkpointing is available in Kubernetes 1.24, we want to present our next steps concerning checkpoint and restore.
- How can we restore containers in Kubernetes?
- How can we checkpoint and restore pods?
- What is missing to be able to migrate containers from one node to another node?
In contrast to the previous session which focused very much on the technical and historical background we want to use this session to present our ideas about possible next steps using checkpoint and restore.
One of our main goals of this session is to get feedback from the community.
- How is the community using checkpoint support?
- Which of the possible next steps are most important to the community?
- What should we focus on in our future development plans?
Kubernetes is hard to operate in a multi-tenant manner.
As organizations add APIs and privileged controllers to their clusters, it becomes infeasible to build clusters that teams can share with each other safely.
This is a design issue with the way projects extend Kubernetes.
While policy engines like Gatekeeper and Kyverno enable cluster owners to patch over insecure API surfaces to protect tenants, there are patterns that produce APIs resistant to cross-tenant issues.
It's possible to extend Kubernetes without relying on admission-based policy engines to restrict API boundaries and controller implementations.
This session will cover the new strategies being used in Flux 2's APIs and controllers that allow for multiple organizations and teams to work safely together.
Come learn how RBAC, Impersonation, and kubeConfig Secrets allow Flux to safely compose objects across Namespaces and Clusters!
In this presentation, the authors will share their experience working with a vast, ever-changing ecosystem and will demonstrate how important it is to adapt to evolving requirements as the journey progresses.
GitOps and its methodologies help developers automate their Software Development Life Cycle (SDLC) process. The SDLC also includes tasks from Operations Management during runtime. Therefore you need to cover dependencies to other software components, e.g., Data Management Software. Those other components are, in most cases, delivered by third-party providers.
Ideally, third-party software is incorporated into one's development life cycle. However, linking multiple SDLCs creates a new life cycle to manage, this time for the IT architecture.
GitOps conveniently enables you to do so. Using a central and standardized CI/CD pipeline allows you to manage your Application Stack better, including external components. Typical methods such as shifting-left testing or continuous configuration automation accelerate the approach.
This talk will present you with a way to connect different SDLCs, manage your whole application stack, and facilitate collaboration between service providers and developers.
Kubernetes patterns, such as sidecars, are increasingly becoming part of modern software architectures. Writing software with these patterns in place, effectively running it in Kubernetes, is very hard. Gefyra makes this possible while providing infrastructure for debugger capabilities and more.
I sit here and reflect back to 2008 when my supervisor suggested I look into the CCNA and Network+. My world changed from plugging a cable into a switch to setting up BGP peers, to configuring Load Balancers for High Availability. Network Engineering has evolved and from my eyes, has been entirely reimagined, retaining the foundations of networking.
As I've slowly pivoted to the world of Cloud Native technologies and DevOps, a lot of my previous Network Engineering experience has translated to today's approach to microservices architecture. FOLLOW THE PACKET I SAY!
What does Kubernetes provide that allows us to reduce the complexity of Apache Cassandra while making it better suited for cloud native deployments? That was the question we started with as we began a mission to bring Cassandra closer to Kubernetes and eliminate the redundancy. Many great open source databases have been adapted to run on Kubernetes, without relying on the deep ecosystem of projects that it takes to run in Kubernetes (there is a difference). This talk will discuss the design and implementation of the Astra Serverless Database, which re-architected Apache Cassandra to run only on Kubernetes infrastructure. Built to be optimized for multi-tenancy and auto-scaling, we set out with a design goal to completely separate compute and storage. Decoupling different aspects of Cassandra into scalable services and relying on the benefits of Kubernetes and its ecosystem created a simpler, more powerful database service than a standalone, bare-metal Cassandra cluster. The entire system is now built on Apache Cassandra, Stargate, etcd, Prometheus, and object storage like MinIO or Ceph. In this talk we will discuss the downstream changes coming to several open source projects based on the work we have done.
What if development tools, including the IDE and application runtimes, could be specified with a declarative syntax? What if containers were used as the developers' lingua franca and Kubernetes as their platform? Those are the ideas behind DevWorkspaces: containerized development environments running on Kubernetes.
Windows is by far the most used desktop operating system in the world; however, when it comes to the Cloud Native ecosystem, it is also the least documented. In this session I'll walk through the struggles of a Cloud Native Windows developer and share my experiences as part of documentation teams on how we can help to be more inclusive.
Kubescape is a K8s open-source tool providing a multi-cloud K8s single pane of glass, including risk analysis, security compliance, an RBAC visualizer, and image vulnerability scanning. Kubescape scans K8s clusters, YAML files, and Helm charts, detecting misconfigurations according to multiple frameworks (such as NSA-CISA and MITRE ATT&CK®), software vulnerabilities, and RBAC (role-based access control) violations at early stages of the CI/CD pipeline. It calculates a risk score instantly and shows risk trends over time. In the last 6 months, we have scanned over 10K unique clusters and learned a great deal about the state of Kubernetes risk, compliance, and vulnerability. In this session, Shauli Rozen, ARMO CEO & Co-Founder, will share interesting insight on why and where Kubernetes deployments are failing, weak spots, and how to get better. He will reveal interesting statistics on K8s cluster risk scores and trends, which controls usually fail, and what kind of vulnerabilities everyone has in their clusters.
Deep Learning (DL) has been successfully applied to many fields, including computer vision, natural language, business, and science. The open-source platforms Ray and Ludwig make DL accessible to diverse users, by reducing the complexity barriers to training, scaling, deploying, and serving DL models. However, DL’s cost and operational overhead present significant challenges. The DL model dev/test/tuning cycle requires intermittent use of substantial GPU resources, which cloud vendors are well-positioned to provide, though at non-trivial prices. Given the expense, managing GPU resources judiciously is critical to the practical use of DL. Nodeless Kubernetes commoditizes compute for Kubernetes clusters. It provisions just-in-time right-sized cost-effective compute for a Kubernetes application when the application starts, and terminates the compute when the application terminates. There are no autoscaling knobs to configure/maintain and no compute shape decisions (e.g., on-demand/spot/CaaS) to be made.
This talk describes running Ray and Ludwig on cloud Kubernetes clusters, using Nodeless K8s as a smart cluster provisioner to add right-sized GPU resources to the K8s cluster when they are needed and to remove them when they are not. Experiments comparing the cost and operational overhead of using Nodeless K8s vs using fixed-size Ray clusters running directly on EC2 show sizable improvements in efficiency and usability, reducing elapsed time by 61%, computing cost by 54%, and idle Ray cluster cost by 66%, while retaining the performance quality of the AutoML results and reducing operational complexity.
A series of exploits and vulnerabilities made everybody aware of the importance of having a Secure Supply Chain story in place.
But how hard is it to implement a Secure Supply Chain and, most important of all, how can we take advantage of it inside our Kubernetes clusters?
Moreover, how can we ensure our clusters stay compliant and how can we quickly assess whether we are running workloads that are affected by the latest CVE that has just been announced?
This talk explains how to implement a Secure Supply Chain using Open Source projects, and enforce it in our cluster with an Admission Controller.
In the Kubernetes world, it is a common use case to convert API resources written in Go to YAML manifests for further distribution, whether as part of a Helm chart, a Kustomize template, or other tools. How hard can it be to go the other way around: take a YAML manifest and generate valid Go code from it? This session looks at Kubernetes codecs, scheme, Go reflection, and Go AST parsers from a slightly unusual perspective.
It is pretty easy to deploy and run your application container on Kubernetes. All you need is a container image in a registry and a kubectl command. Kubernetes has a lot of settings and applies some defaults for your deployments. Is it safe to continue with those in terms of application security and reliability? We will discuss this and demonstrate the critical configuration we need to set.
The role of an API gateway in building large-scale, cloud-native microservices APIs is sometimes important. It provides rich traffic management features such as load balancing, dynamic upstream, canary release, circuit breaking, authentication, observability, and more. An API gateway takes on these concerns, allowing your microservices to focus on the business task at hand. Plugins are a core mechanism of an API gateway; by using them, we can create high-performance systems under tight deadlines. In this talk, we will describe how Apache APISIX implemented plugin orchestration. Plugin orchestration is a form of low code that can help enterprises reduce usage costs and increase operation and maintenance efficiency. With the plugin orchestration capability in the low-code API gateway Apache APISIX, we can easily orchestrate 50+ plugins in a “drag-and-drop” way on the UI dashboard.
Minikube is a tool used to easily deploy Kubernetes locally.
Sadly, it comes with an old kernel which does not permit running eBPF code.
This contribution is about bumping the minikube kernel to 5.10 and adding the options needed to play with eBPF.