Kolo Rahl's Blog

The Path to Containerized Services

Published 2017 Dec 21, 03:35

This is designed to be a multi-part article. This first post, the introduction, covers what the plan was and how it came to be, along with some of the concerns and questions we had before starting implementation. The following parts will discuss the details behind each of the goals we were trying to achieve.

It began with a con…

While I was the Software Architect at Loot Crate, the company began gearing up for a spin-off of their core product(s), which would later be known as Sports Crate. I was told that I would head up the technical design and architecture for that project, as well as serve as the engineering lead in the beginning while I got the team ramped up on everything.

This was shortly before KubeCon 2016, and I knew I wanted to use Docker and Kubernetes if possible: the vision I had seemed most easily realized with Docker containers and the up-and-coming Kubernetes container orchestration tool. So, to gather as much information as possible on Kubernetes and how it is used in production, I asked the company to send me and some of my colleagues to KubeCon 2016, a request they gladly obliged.

The event was wonderful. We met a lot of interesting people working on interesting projects in the container technology space: monitoring, deployment, development flows, image verification, hosting solutions, and more. The most intriguing part of it all, however, was that no one really knew the “right” way of doing anything in this space yet. Sure, people knew how to monitor a giant cluster of services, but whether to deploy those monitoring tools as a separate service container or bundle them into existing containers was new territory. Automating a lot of the boring stuff was even newer, and different companies had different ideas on how to tackle it. Overall it was a great font of knowledge that we eagerly chugged down.

I used the remainder of November to consolidate the notes from the convention and test out a few ideas to see if they would even be possible in the budding architecture design growing in my head. It seemed more and more feasible with each test, and my results lined up with testing our Director of DevOps was doing in tandem with my own work; we agreed that this path was possible.

Initial Architecture

In December I began to hammer out the official design. I’ll post other articles to talk about some of these design points in more detail, but here was the basic architecture:

  • Use Docker for all deployable services. The application code, databases, anything we wanted to use in our new system would have to be deployed from a container image, and we would use Docker to build/fetch those images.
  • Use docker-compose for local development. One of the major pain-points with local development is that the environment you’re developing in tends to be significantly different from the one you’re deploying to; docker-compose greatly reduced that disparity (see the sketch after this list).
  • Use Jenkins for automation. Not just kicking off tests, but general automation. Specifically, we used it to build images, run tests, upload images, and kick off deployments.
  • Use Google Cloud Platform to store images, host compute instances, manage network configuration (ingress/egress rules, load balancing, etc.), and manage Kubernetes.
  • Use Kubernetes to manage container deployments.
  • Use Datadog to monitor container instances.
  • Use PagerDuty to automate the on-call rotation and escalation rules, with alerts typically coming in from Datadog.
  • Use Cloud Pub/Sub to build an asynchronous and decentralized message bus. This was primarily to allow our legacy software and our new prototype to communicate effectively.
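
To give a concrete flavor of the local-development piece, here is a minimal docker-compose sketch in the spirit of what we ran. The service names, images, ports, and environment variables are illustrative placeholders, not our actual configuration:

    # docker-compose.yml -- illustrative only; service names, images, and
    # environment variables are placeholders, not the real configuration.
    version: '2'
    services:
      api:
        build: .                  # build the app image from the local Dockerfile
        command: bundle exec rackup -o 0.0.0.0 -p 3000
        ports:
          - "3000:3000"
        environment:
          DATABASE_URL: postgres://postgres@db:5432/app_development
        depends_on:
          - db
      db:
        image: postgres:9.6       # run the same database engine as production
        ports:
          - "5432:5432"

With something like this, a developer runs docker-compose up and gets the application plus its database in containers that mirror the production images far more closely than a hand-configured laptop would.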

The “legacy” platform at Loot Crate was written as a monolithic Ruby on Rails application. Our front-end developers were writing and managing ERB and CoffeeScript files, which we already knew we didn’t want in the prototype. Aside from merely trying to improve the technology we were using, I also wanted to improve our process, so I separated the development and deployment needs of front-end and back-end code. For the prototype we therefore built a Grape API (still using Ruby) for the back-end and a React project for the front-end. I will speak about this - why we did it and what it accomplished - in another article, but the TL;DR is that this approach allowed the back-end and front-end teams to develop and deploy independently of each other, greatly speeding up time-to-deployment.
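
As a rough sketch of what that split looked like on the back-end side (the endpoint, fields, and class name here are hypothetical, not the actual Sports Crate code), a Grape API exposes plain JSON resources that the React front-end fetches over HTTP:

    # store_api.rb -- hypothetical endpoint for illustration only; the real
    # API's resources and fields were different.
    require 'grape'

    class StoreAPI < Grape::API
      format :json
      prefix :api

      resource :crates do
        desc 'Return a single crate by ID'
        params do
          requires :id, type: Integer
        end
        get ':id' do
          # The real service would query the database here; this returns
          # a canned payload just to show the shape of the response.
          { id: params[:id], name: 'Sports Crate', status: 'available' }
        end
      end
    end

Because the front-end only depends on that JSON contract, the React project could be built, tested, and deployed on its own schedule.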

Concerns and the Unknown

There was a lot of new stuff in this design that most people in the company hadn’t even heard of before. Kubernetes was still new, I was the only person who had ever used Docker before, no one had used Google Cloud, and many people weren’t sure how or why to use a message queue. We were also using Heroku for deployments and a third-party CI service for automated testing, so moving away from both of those onto a completely custom solution (Kubernetes + Jenkins) was quite polarizing for some. Overall, though, people were excited… assuming we could bring it all together.

The biggest concern was: if we started implementing all of this in January, could we have a functioning website/service deployed by mid-March? Well, some of the design decisions were made specifically to accommodate that timeline. For example, we ultimately went with Google Cloud because it supported a lot of what we wanted to do, like container image hosting and Kubernetes-as-a-service, at a time when AWS wasn’t touching that stuff. I also knew my team well and had no doubt they would pick up the work I started and see it through to completion. And again, we made some concessions to ensure developers would be as effective as possible, such as using Ruby as the back-end language since all the back-end engineers knew it well. One of the reasons we were comfortable going with React was that our front-end engineers had been learning React up to that point in preparation for a switch in the legacy system, so they were all trained up and ready to go.

I made an estimated timeline and worked closely with our Director of DevOps and Product Manager to ensure that all of our requirements were within reason and that any corners we cut were known about ahead of time. I didn’t mind having a less-than-perfect prototype because that’s why you build prototypes: to test things without worrying about getting everything exactly right. Everyone was on board with that, and by the end of December we had pinned down our largest tasks according to estimated effort and time-to-completion. Getting docker-compose to work was a large developer task, since no one had used it before. Getting Kubernetes configured and working was a large DevOps task. Setting up Jenkins was a medium task but crucial, since it automated all of our builds and deployments. And finally there was the task of getting the back-end and front-end work completed so that we had something customers could interact with, which was actually the least risky part of all.

Coming Up

The following articles are planned to provide more detail about parts of the design, along with a final article that discusses the successes and failures of the architecture.

  • Containers are a win: Using Docker and docker-compose
  • Automate all the things! With Jenkins!
  • Google Cloud vs Amazon Web Services (A Mostly-Opinion Piece)
  • Kubernetes might be the future of containerization
  • Message queues: distributed architectures need them
  • Planning an API for React consumption
  • Epilogue: How it all came together