Our experiences, learnings & directions with Kubernetes

Scality’s journey through Kubernetes now spans the better part of four years. Back in 2016 and early 2017, our customers started requesting solutions for integrating data stored on-premises in RING with public cloud services, such as AWS. It quickly became apparent that other cloud services like Azure and Google were also on their radar screens with an eye toward avoiding lock-in to any particular cloud. The central aspect of their request was to provide a way to view and manage data across on-premises data centers in the enterprise and their preferred public clouds. In 2017, we made a strategic decision to launch a new product called Zenko into the market as a multi-cloud data management solution.

So, how does this relate to our journey to Kubernetes? For our flagship product, RING, we had for years depended on traditional Linux software package deployment methods (such as RPM) and automated them with frameworks such as SaltStack. Over time, the effort and cost to maintain this type of packaging increased dramatically due to new features introducing dependencies, aging operating system (OS) distributions (for example, RHEL 6), and occasional conflicts with customer-installed packages on the same servers.

New capabilities in RING were also being rapidly introduced in this timeframe, so we started to embrace development using new environments such as Node.js. This introduced further complex dependencies into our traditional software packaging. We then decided to take a key step forward: we embraced Docker-style containers to help solve these packaging issues for new services in RING. Along with that, we started to experiment with container orchestration solutions such as Docker Swarm, Mesosphere and, ultimately, Kubernetes.

Zenko was launched with the goal of portable deployment across on-premises and public clouds, so it was our first foray into “cloud-native” offerings that could, in essence, run on physical hardware or in the cloud(s). Zenko is built as a set of container-based microservices that are deployed and managed on Kubernetes. This type of architecture really exemplifies the future of software, and we were fully embracing it for new product development. This approach started to reap benefits in packaging simplicity and deployment flexibility, for example, by scaling out services as needed to address larger workloads.

Many of us in the tech industry have now heard of Kubernetes and understand that it is basically responsible for scheduling resources (CPU and memory, for example) across a set of nodes. Given our experience, we can offer a simple analogy: Kubernetes is like an operating system, but for a cluster of servers such as the ones we use for deploying RING. Like an OS, Kubernetes is responsible for scheduling the “scarce resources” on each physical node: CPUs, memory, other devices (GPUs, TPUs) and, very important for a storage system, local storage devices such as hard disks and flash media. Kubernetes is also responsible for giving applications network connectivity and access to external storage through plugin interfaces: the Container Network Interface (CNI) and the Container Storage Interface (CSI).
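
To make the scheduling analogy concrete, here is a minimal sketch in Python of placing a workload on a node that has enough free CPU and memory. It is an illustration only, not the actual Kubernetes scheduler: the `Node`, `PodRequest` and `schedule` names are hypothetical, and the "most free CPU wins" rule is an assumption chosen for simplicity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical view of one server's remaining capacity."""
    name: str
    free_cpu_millicores: int
    free_memory_mib: int


@dataclass
class PodRequest:
    """Hypothetical workload with its resource requests."""
    name: str
    cpu_millicores: int
    memory_mib: int


def schedule(pod: PodRequest, nodes: List[Node]) -> Optional[str]:
    """Place the pod on the fitting node with the most free CPU.

    Returns the chosen node name, or None if no node has enough
    free CPU and memory. The chosen node's free capacity is reduced.
    """
    candidates = [
        n for n in nodes
        if n.free_cpu_millicores >= pod.cpu_millicores
        and n.free_memory_mib >= pod.memory_mib
    ]
    if not candidates:
        return None  # the pod would stay pending
    best = max(candidates, key=lambda n: n.free_cpu_millicores)
    best.free_cpu_millicores -= pod.cpu_millicores
    best.free_memory_mib -= pod.memory_mib
    return best.name


if __name__ == "__main__":
    cluster = [Node("node-1", 2000, 4096), Node("node-2", 4000, 8192)]
    for request in (
        PodRequest("a", 3000, 2048),
        PodRequest("b", 1500, 1024),
        PodRequest("c", 1500, 512),
    ):
        print(request.name, "->", schedule(request, cluster))
```

The real scheduler weighs many more signals (affinity rules, taints, storage topology and so on), but the core idea is the same: filter the nodes that can fit the request, then pick one.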

After several years of real-world experience with Kubernetes, our view is that it provides a fundamentally “simple” architecture thanks to its consistency. In Kubernetes, we always start from a “declarative” model in which you describe the desired state and behavior (how things should work). Various controller loops (translating high-level objects into lower-level ones within the model) and system services (interacting with the real world) are then responsible for realizing it. Ultimately, Kubernetes manages and schedules containers to run on physical servers, but it executes them through a container runtime such as Docker, containerd or CRI-O. Overall, we believe Kubernetes has a solid and dependable architecture and is a high-quality implementation. As with any software, there are shortcomings and issues that need to be fixed, but Kubernetes has really strong community support that we have engaged with to resolve issues, contribute code and make it work for our needs.
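
The declarative model and its controller loops can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Kubernetes code: `ClusterState` and `reconcile` are hypothetical names, and "state" is reduced to a replica count per service.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ClusterState:
    """Toy stand-in for observed state: service name -> running replicas."""
    running: Dict[str, int] = field(default_factory=dict)


def reconcile(desired: Dict[str, int], state: ClusterState) -> List[str]:
    """Compare desired replica counts with observed ones and converge.

    Returns the actions taken, so each pass of the loop can be inspected.
    """
    actions = []
    for name, want in desired.items():
        have = state.running.get(name, 0)
        if have < want:
            actions.append(f"start {want - have} x {name}")
        elif have > want:
            actions.append(f"stop {have - want} x {name}")
        state.running[name] = want
    # Anything still running that is no longer declared gets removed.
    for name in list(state.running):
        if name not in desired:
            actions.append(f"stop {state.running[name]} x {name}")
            del state.running[name]
    return actions


if __name__ == "__main__":
    observed = ClusterState(running={"api": 1, "legacy": 2})
    declared = {"api": 3, "worker": 2}
    print(reconcile(declared, observed))  # first pass converges the state
    print(reconcile(declared, observed))  # second pass finds nothing to do
```

The useful property is that the loop is idempotent: running it again once the observed state matches the declared one produces no actions, which is what makes the declarative approach predictable.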

With a container orchestration system in place, we can further increase automation of operations. Until recently, many products focused on automating the installation and upgrade of software versions (which in a distributed environment can be non-trivial). Kubernetes and, more importantly, the operator pattern allow us to go further and codify procedures that were historically often manual and documented rather than automated, such as reconfiguration and troubleshooting. By implementing operators for all our software solutions, we can easily automate (and test!) deployment, upgrade and reconfiguration of the software, all in line with Kubernetes standard practices of infrastructure-as-code, as well as detect and handle various runtime events, such as a cluster node becoming unreachable.

Our mission is to deliver enterprise-ready storage on-premises. Kubernetes was originally designed for cloud-oriented deployments such as AWS or Google, where all related resources are available “as a service.” Since we decided to use Kubernetes to deploy Zenko on-premises as well, we wanted to ensure we did not push the responsibility of deploying Kubernetes on physical servers to our customers. A “plain” Kubernetes cluster is sufficient for developer purposes, but it’s not quite ready for use in production deployment and it takes effort to bring it to that level. For production deployments, we need things like monitoring capabilities, log management, user authentication and other security concerns, load balancing and many other key services that are not part of Kubernetes’ core. We need to address disconnected “dark site” environments, and we must integrate with the customer’s custom network, user directory services and more — often not supported by existing Kubernetes distributions.

We therefore decided to create MetalK8s as an open-source project: an opinionated Kubernetes distribution with a focus on long-term on-prem deployments. As we take the journey more fully into the cloud-native world, MetalK8s provides a solid foundation for our customers. As commercial distributions of Kubernetes emerge, mature and provide these capabilities, we will broaden support to them as well for on-prem deployments. This includes emerging offerings such as VMware Tanzu, Red Hat OpenShift, HPE Ezmeral and Rancher (now owned by SUSE).

We also see the requirements of new “cloud native” applications changing things significantly for data storage, as the preference for these apps becomes API-based access to storage services. For this reason, object APIs such as AWS S3 will continue to grow in popularity. Moreover, there are cloud-native ways of exposing storage interfaces and automation through Kubernetes, including an emerging object storage management API called COSI (the Container Object Storage Interface).

Watch this space as Scality continues to learn from real-world experiences and uses this knowledge to deliver new and exciting solutions for this emerging world.
