Gitpod, a cloud development environment platform, recently moved away from Kubernetes after relying on it for six years. When Gitpod first adopted Kubernetes, it offered scalability, effective container orchestration, and a rich ecosystem that suited the company’s infrastructure management needs. These attributes contributed significantly to Gitpod’s early success. However, scaling development environments for 1.5 million users exposed challenges that Kubernetes was not well equipped to handle.
Unique Challenges with Kubernetes for Development
Development environments inherently differ from typical production workloads. They are fundamentally stateful and interactive, marked by unpredictable resource usage, and require extensive permissions and capabilities such as root access and package installations. These unique aspects of development posed substantial hurdles in resource management, security, and storage performance when operating on Kubernetes.
Resource management and storage emerged as major pain points. Because development tasks have highly variable CPU requirements, Gitpod had difficulty predicting and allocating CPU time, and it experimented with several CPU scheduling and prioritization methods to address the problem. Storage performance presented further complications, leading Gitpod to trial different configurations, including SSD RAID 0, block storage, and Persistent Volume Claims (PVCs). Each configuration had its own trade-offs in performance, reliability, and flexibility. Backing up and restoring local disks also proved resource-intensive, requiring a careful balance of I/O, network bandwidth, and CPU usage.
Efforts to Optimize Kubernetes
To optimize autoscaling and reduce startup times, Gitpod experimented with several strategies, including ghost workspaces, ballast pods, and custom cluster-autoscaler plugins. To speed up image pulls, it tested daemonset pre-pulls, maximized layer reuse, and used pre-baked images. Networking added further complexity, with access control and bandwidth sharing posing significant challenges.
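The idea behind placeholder capacity such as ballast pods can be illustrated with a toy model. This is only a sketch under stated assumptions, not Gitpod's implementation: the `Node` class, the `schedule` function, and the slot-based capacity model are all hypothetical, and the sketch assumes that a placeholder simply holds a slot until a real workspace needs it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node with a fixed number of workspace slots."""
    capacity: int
    workspaces: list = field(default_factory=list)
    ballast: int = 0  # slots held by low-priority placeholders (assumption)

    def free(self) -> int:
        # Slots that are neither used by workspaces nor held by ballast.
        return self.capacity - len(self.workspaces) - self.ballast


def schedule(node: Node, workspace: str):
    """Place a workspace on the node.

    Returns "free-slot" if an unused slot was available,
    "evicted-ballast" if a placeholder had to give up its slot,
    or None if the node is completely full of real workspaces.
    """
    if node.free() > 0:
        node.workspaces.append(workspace)
        return "free-slot"
    if node.ballast > 0:
        node.ballast -= 1  # the placeholder yields to the real workload
        node.workspaces.append(workspace)
        return "evicted-ballast"
    return None
```

In this model a workspace can start immediately as long as either a free slot or a placeholder exists. In a real cluster the evicted placeholder would typically become pending and prompt the autoscaler to add capacity; that step is not modeled here.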
Despite all these efforts, Kubernetes remained a cumbersome solution for the specific needs of Gitpod’s development environments. Recognizing the limitations, Gitpod began exploring micro-VM technologies, including Firecracker, Cloud Hypervisor, and QEMU. These options promised better resource isolation and enhanced security but introduced new challenges like overhead and complexity in image conversion.
The Birth of Gitpod Flex
Ultimately, Gitpod concluded that while Kubernetes could technically meet their requirements, the trade-offs in security and operational overhead were too significant to ignore. This realization led to the birth of Gitpod Flex, a new architecture designed specifically to overcome the limitations encountered with Kubernetes while enhancing security and simplifying infrastructure management. Gitpod Flex retains core principles of Kubernetes such as control theory and declarative APIs but introduces abstraction layers tailored to development environments.
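The combination of control theory and declarative APIs mentioned above is often expressed as a reconcile loop: the user declares a desired state, and a controller repeatedly compares it with the observed state and acts to close the gap. The following is a minimal sketch of that pattern only; `EnvironmentSpec`, `EnvironmentStatus`, and the `start`/`stop` actions are hypothetical names chosen for illustration and do not describe Gitpod Flex's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvironmentSpec:
    """Desired state, declared by the user (hypothetical fields)."""
    name: str
    running: bool


@dataclass
class EnvironmentStatus:
    """Observed state, as reported by the runtime (hypothetical fields)."""
    running: bool = False


def reconcile(spec: EnvironmentSpec, status: EnvironmentStatus) -> list:
    """Return the actions needed to move observed state toward desired state."""
    if spec.running and not status.running:
        return ["start"]
    if not spec.running and status.running:
        return ["stop"]
    return []  # observed state already matches the declaration


def apply_action(action: str, status: EnvironmentStatus) -> None:
    """Stand-in for real side effects; here it only updates the status."""
    if action == "start":
        status.running = True
    elif action == "stop":
        status.running = False


def control_loop(spec, status, max_iterations=5) -> list:
    """Reconcile until converged (or the iteration budget is spent)."""
    log = []
    for _ in range(max_iterations):
        actions = reconcile(spec, status)
        if not actions:
            break
        for action in actions:
            apply_action(action, status)
            log.append(action)
    return log
```

The caller never issues a "start" command directly. It changes the declared spec, and the loop derives whatever actions are needed, which is what makes repeated runs safe once the state has converged.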
One of the standout features of Gitpod Flex is its capability to integrate with devcontainers while allowing development environments to run on desktop machines. This not only streamlines the infrastructure but also provides enhanced flexibility and security. Additionally, Gitpod Flex supports quick, self-hosted deployments across multiple regions, giving organizations greater control over compliance and organizational boundaries.
Conclusion: A New Path Forward
Gitpod’s decision to move away from Kubernetes after six years marks a significant evolution in its approach to managing cloud-based development environments. Kubernetes originally gave Gitpod scalability, efficient container orchestration, and a robust ecosystem, all of which were integral to its initial success. As the platform scaled to support 1.5 million developers, however, it became clear that Kubernetes was not well suited to the unique demands of development environments at that scale. Gitpod therefore chose to transition to alternatives better matched to its evolving requirements and its expanding user base.