Saltside recently migrated production applications to Kubernetes, using Helm to deploy them. This post describes how we built our chart to fit Saltside’s unique business requirements.
Saltside makes classifieds (user posted ads, not secret information) sites. We operate in multiple markets: Tonaton.com, Bikroy.com, Efritin.com, and Ikman.lk. I’ll refer to this as MARKET going forward. I suggest you check out the sites. You’ll notice they look and feel the same. That’s by design!
Market infrastructure is deployed, scaled, and operated separately from the others. This ensures each market’s infrastructure scales correctly for load and other requirements while guarding against failures propagating to others. Deploying separately requires different configuration for each MARKET. Configurations must be tested in different stages in the deployment pipeline. Example stages are production or uat. I’ll refer to this as STAGE going forward.
Naturally this propagates down to individual codebases. Each code base must support different MARKET and STAGE values. Our product is composed of ~20 independent code bases that may themselves be composed of multiple deployed processes (such as a thrift server, web server, or working off messages in a queue). I’ll use COMPONENT to refer to components or code bases and PROCESS to refer to processes inside each COMPONENT.
You may be thinking “this sounds like a lot of stuff!” You’re right. It is. The total possible configuration is MARKET * STAGE * COMPONENT * PROCESS. You can learn more about our previous setup in my talk at DevOps Days India 2016. This is the jumping off point for discussing our Kubernetes solution.
Requirements
The big idea is that we’ll move from independently configured and deployed COMPONENT repos to a centrally configured and deployed mono-repo. The implementation is no surprise given the title. The entire product (all configuration for MARKET, STAGE, COMPONENT, and PROCESS) is wrapped up in a single repo that builds Helm charts. I’m not going into the reasons why we chose Helm. There are plenty of articles, talks, and presentations on what Helm is and what it can do. Let’s move on to the specifics we need from our Helm chart.
Market, Stage, Component and Process Configuration
We need MARKET, STAGE, COMPONENT, and PROCESS specific environment variables and secrets. There is a hierarchy between those four values: MARKET is the most general and PROCESS is the most specific. A specific COMPONENT may need different environment variables depending on MARKET or STAGE. A PROCESS may need the same. Here’s a concrete example. The same BUGSNAG_API_KEY may apply to all PROCESS values inside a given COMPONENT. However, something like EXTERNAL_REDIRECT_URL may need different values depending on MARKET or STAGE.
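To make the hierarchy concrete, here is a purely illustrative fragment (not our actual file) showing one key shared by every PROCESS in a COMPONENT next to one that varies by MARKET:
# Illustrative only: a shared key and a MARKET specific key for one COMPONENT
admin_service:
  common:
    BUGSNAG_API_KEY: "abc123"            # same for every PROCESS in this COMPONENT
  production:
    bikroy:
      EXTERNAL_REDIRECT_URL: "https://bikroy.com"
    ikman:
      EXTERNAL_REDIRECT_URL: "https://ikman.lk"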
A Single Values File
No separate values files and no requirement to pass -f or --set on helm install or helm upgrade. There are too many possible configurations; forcing engineers to manage these files or pass in command line flags is simply unmanageable in our problem domain.
Encrypted Secrets Values
Our current solution is subpar in this area so we’d like to, at a minimum, encrypt sensitive data that’s committed to SCM.
Preconfigured Charts for MARKET and STAGE
This is somewhat of a follow up to the previous point. The idea here is that anyone can run a command like helm install $MARKET and everything will work as expected. Preconfigured versions for different STAGE values are communicated through different semantic versions. This requirement opens up many use cases such as automatic environment creation from topic branches or allowing any engineer to stand up test environments against arbitrary configurations.
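As a rough usage sketch (the repository name and version numbers are made up), installing a market is a one-liner and the STAGE is selected purely by version:
# Hypothetical commands; repo name and versions are illustrative
helm install our-charts/bikroy --version 1.4.0         # production STAGE
helm install our-charts/bikroy --version 1.5.0-beta.3  # sandbox STAGE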
Easier or Better than our Current Solution
Different COMPONENT and PROCESS configurations should be as easy as in our current solution or easier. This one is subjective so here’s some background. Different COMPONENT values have roughly the same shape. All PROCESS values within a COMPONENT share the same Docker image; they run as different containers using that image. This may require a different command, ports, environment variables, or other things. This generally boils down to expressing configuration at our abstraction level rather than Helm’s.
Deployable to any Kubernetes Cluster
The chart should be deployable on any Kubernetes cluster. This means it should be self-contained; specifically, it should contain an image pull secret for access to private Docker images.
Those are the high level requirements around the chart. Let’s see how these requirements drive the implementation.
Implementation at 10,000ft
The requirements dictate a single values.yaml file and a build process that can spit out preconfigured charts for different MARKET and STAGE values. Here’s what this looks like for us:
- Everything is declared in values.yaml with a ton of YAML anchors (more on that later).
- script/lint-values tests values.yaml for correct semantics. Example: each combination of MARKET, STAGE, COMPONENT, and PROCESS has declared environment variables (so range in chart templates works as expected).
- Secret values are kept in a secrets.yaml managed with git-crypt. We initially tried using Ansible vault but aborted that effort. More on that later as well.
- A parameterized build script injects the correct values for MARKET and STAGE into values.yaml before building the final chart. The production STAGE produces a chart named MARKET-VERSION. The sandbox STAGE produces a chart named MARKET-VERSION.betaN. Any STAGE is supported through different version modifiers. The versioning information is determined by the commit and git branch.
- Chart templates expect a specific structure in $.Values to generate each Service, Deployment, Pod, etc. This allows defining everything about COMPONENT and PROCESS in values.yaml without creating new templates.
In short, our mono-repo is a template for producing MARKET * STAGE Helm charts. Let’s dive into implementation specifics.
Templates & Values
The chart templates are straightforward. There is one template each for Service, Deployment, and Secret. Each template produces multiple resources (separated by ---) in YAML from range functions over various $.Values. The templates essentially loop over $.Values.applications[].containers[] (where application refers to COMPONENT and containers refers to PROCESS). One Deployment is created for each COMPONENT and PROCESS combination because this is how our system scales. One Service is generated for each COMPONENT with ports for each PROCESS. “Our” is emphasized on purpose. Your application may be different.
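A minimal sketch of that loop looks roughly like the following; the image field and label names are assumptions based on the values shown later in this post, not the exact template:
# Sketch only: one Deployment per COMPONENT and PROCESS pair
{{- range $app := .Values.applications }}
{{- range $container := $app.containers }}
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: {{ $app.name }}-{{ $container.name }}
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: {{ $app.name }}
        process: {{ $container.name }}
    spec:
      containers:
        - name: {{ $container.name }}
          image: {{ $app.image }}
          command:
{{ toYaml $container.command | indent 12 }}
          ports:
            - containerPort: {{ $container.ports.container }}
{{- end }}
{{- end }}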
Our values.yaml and secrets.yaml rely on YAML anchors. This is especially useful for “overriding” the MARKET/STAGE/COMPONENT/PROCESS hierarchy. Initially this logic started out in the templates (one loop for each level). It was easier to move everything into values using YAML anchors.
The values.yaml defines images for each COMPONENT value, as well as common things like MARKET and STAGE values. There are many shared environment variables (like MARKET, STAGE, RACK_ENV, or NODE_ENV) that are defined as anchors and appended to. Let’s look at the main anchors. Anchors are prefixed with _ to prevent accidental clashes with real variables. They are also annotated with comments so readers know they are not intended for direct use, only referenced later in the file. Here is an example:
_anchors:
  common_env: &COMMON_ENV
    APP_VERSION: "${CHART_ID}"
    APP_ENV: "${CHART_STAGE}"
    NODE_ENV: "${CHART_STAGE}"
    RACK_ENV: "${CHART_STAGE}"
    RAILS_ENV: "${CHART_STAGE}"
    MARKET: "${CHART_MARKET}"
    STATSD_URL: "udp://localhost:8125"
    LOG_LEVEL: info
    # TODO: AWS variables
The ${VARIABLE} is special. That’s inserted by the build process with envsubst. I’ll cover that later on. This anchor is reused in each STAGE and COMPONENT declaration as shown below:
env:
  admin_service:
    production: &ADMIN_SERVICE_PRODUCTION_ENV_VARS
      <<: *COMMON_ENV
    sandbox: &ADMIN_SERVICE_SANDBOX_ENV_VARS
      <<: *COMMON_ENV
      # FIXME: This is wrong!
      APP_ENV: development
      AWS_ACCESS_KEY_ID: foo
      AWS_SECRET_ACCESS_KEY: foo
Injecting the COMMON_ENV anchor into the COMPONENT * STAGE context is easy, as is adding new values (in the sandbox STAGE in this example). That COMPONENT * STAGE configuration is assigned a new anchor name which can be customized by MARKET or PROCESS later on. Here’s a MARKET example:
env:
  admin_service:
    production:
      bikroy:
        <<: *ADMIN_SERVICE_PRODUCTION_ENV_VARS
      ikman:
        <<: *ADMIN_SERVICE_PRODUCTION_ENV_VARS
      tonaton:
        <<: *ADMIN_SERVICE_PRODUCTION_ENV_VARS
      efritin:
        <<: *ADMIN_SERVICE_PRODUCTION_ENV_VARS
    sandbox:
      bikroy:
        <<: *ADMIN_SERVICE_SANDBOX_ENV_VARS
      ikman:
        <<: *ADMIN_SERVICE_SANDBOX_ENV_VARS
      tonaton:
        <<: *ADMIN_SERVICE_SANDBOX_ENV_VARS
      efritin:
        <<: *ADMIN_SERVICE_SANDBOX_ENV_VARS
It’s not the prettiest but it works. It removes effort for shared configuration and makes it easy to override all levels in the hierarchy. This same structure applies to secrets.yaml.
Here’s an example COMPONENT and PROCESS configuration from values.yaml:
- name: image_service
  tier: thrift
  service_type: ClusterIP
  containers:
    - name: thrift
      command:
        - "thrift-server"
      ports:
        container: 9090
        service: 9090
      livenessProbe:
        initialDelaySeconds: 5
        tcpSocket:
          port: 9090
      resources:
        sandbox:
          requests:
            memory: 128Mi
        production:
          requests:
            memory: 128Mi
            cpu: 50m
          limits:
            memory: 128Mi
            cpu: 100m
We mix custom structures and existing Kubernetes ones (like livenessProbe, or everything inside resources). This works well because that data may be dumped directly into the manifest.
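Inside the container loop, the Kubernetes-shaped values can be passed through more or less verbatim. A hedged sketch, reusing a $container variable like the one in the loop above and the $.Values.stage key covered below:
# Sketch: dump Kubernetes native structures straight into the manifest
          livenessProbe:
{{ toYaml $container.livenessProbe | indent 12 }}
          resources:
{{ toYaml (index $container.resources $.Values.stage) | indent 12 }}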
This gist is our complete deployment template. There are things in it that I’m not going to describe in this post. Note the use of index in combination with $.Values.market, $.Values.stage, and $.Values.topology.pods. Also note that generating JSON init containers is quite strange inside YAML, since YAML is whitespace sensitive but JSON requires delimiters.
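Those index calls boil down to looking up the right slice of the hierarchy for the chart being rendered. A simplified sketch of the environment variable lookup, assuming an env map shaped like the one shown earlier:
# Sketch: resolve the COMPONENT * STAGE * MARKET environment variables
{{- $env := index $.Values.env $app.name $.Values.stage $.Values.market }}
          env:
{{- range $name, $value := $env }}
            - name: {{ $name }}
              value: {{ $value | quote }}
{{- end }}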
Kubernetes Services are declared through a combination of custom data structures and the previously shown COMPONENT and PROCESS configuration.
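A trimmed-down sketch of that Service template, using the field names from the values example above:
# Sketch: one Service per COMPONENT with a port per PROCESS
{{- range $app := .Values.applications }}
---
apiVersion: v1
kind: Service
metadata:
  name: {{ $app.name }}
spec:
  type: {{ $app.service_type }}
  selector:
    app: {{ $app.name }}
  ports:
    {{- range $container := $app.containers }}
    - name: {{ $container.name }}
      port: {{ $container.ports.service }}
      targetPort: {{ $container.ports.container }}
    {{- end }}
{{- end }}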
Repo Organization & Building
I find this section the most interesting personally since it demonstrates what you can do with Helm when you throw some Bash into the mix. It requires a bit of funkiness to get everything to work. I’ll touch on those parts the most because they highlight curious choices in Helm chart packaging.
Here’s the tree:
.
├── Makefile
├── README.md
├── VERSION
├── chart
│ ├── charts
│ └── templates
│ ├── _helpers.tpl
│ ├── app_deployments.yaml
│ ├── app_services.yaml
│ ├── sandbox.yaml
│ ├── secrets.yaml
│ └── smoke_test.yaml
├── config
│ ├── Chart.yaml
│ └── values.yaml
├── script
│ ├── await-release
│ ├── build
│ ├── ci
│ │ └── test
│ ├── clean-releases
│ ├── lint-manifest
│ ├── lint-values
│ ├── package
│ ├── publish
│ ├── test-chart
│ ├── test-release
│ └── yaml2json
├── secrets.yaml
Building
make coordinates the entire process. /chart and /config are the most relevant directories for this section. /chart contains files that are not parameterized and may be copied directly into the final output directory. /config contains files that are parameterized, so each must pass through envsubst before copying into the final output directory. This is where ${MARKET} etc. is inserted into the final values.yaml file.
script/build coordinates the process. It determines the correct VERSION along with all other template variables, creates a specifically named directory for the chart, copies over everything in /templates, templates everything in /config, and dumps that into the output directory. It’s shared in its entirety:
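Roughly, the script does something like the following simplified sketch; the argument handling, paths, and version handling here are illustrative rather than the real thing:
#!/usr/bin/env bash
# Simplified sketch of the build flow described above; not the real script
set -euo pipefail

market="$1"               # e.g. bikroy
stage="$2"                # e.g. production or sandbox
version="$(cat VERSION)"  # the real script modifies this per STAGE and git branch

output="build/${market}-${version}"
rm -rf "$output"
mkdir -p "$output"

# Files that are not parameterized are copied straight across
cp -R chart/. "$output/"

# Parameterized files pass through envsubst so ${CHART_*} placeholders are filled in
export CHART_MARKET="$market" CHART_STAGE="$stage" CHART_ID="${market}-${version}"
for file in config/*; do
  envsubst < "$file" > "${output}/$(basename "$file")"
done

echo "chart prepared in ${output}"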
You’ll also notice our chart versioning semantics coded into that file. Astute readers will notice that this script does not package or publish the chart. This script only prepares the directory. That is enough to install the chart for testing. Other scripts produce the final .tgz archive and upload it to our internal chart repository. There is nothing interesting in those scripts. They accept a directory as an argument and call the various helm commands.
This approach is working well enough for our production continuous delivery pipeline. The CD pipeline is an entire post in itself. Check back in the future for all the details once all the kinks are sorted out. I want to close out this post by discussing things we tried that didn’t work out.
What Didn’t Work
- Ansible vault for secrets. Internally we use Ansible to automate processes and glue various workflows together. We did not want to commit secrets in plain text to source control, but we could accept encrypting them in source control. Ansible vault was a natural fit for us. It worked in the beginning, then we hit increasing friction beyond the dev stage. We generated a Kubernetes Secret using Ansible templates and variables. The chart required a $.Values.secret_name. This meant the secret had to exist in the cluster(s) where the chart would be installed. This became more problematic as we have multiple Kubernetes clusters for different stages (namely production & non-production). It also opened up the question of what to do if the chart itself failed tests: the secret would need to be deleted from the clusters. That is possible but it’s an avoidable problem. Everything became much easier when we switched to git-crypt, which made the chart self-contained. I advise teams adopting Helm to start and continue with git-crypt until it becomes a pain point.
- Duplicating configuration in templates. Notice the Deployment template (naturally) defines the template pods and their init containers. The init containers use the same Docker image as the COMPONENT to run the init PROCESS. Each PROCESS generally requires the same set of configuration values (COMPONENT specific values like MYSQL_URL, or global values like MARKET and STAGE). Previous versions duplicated this generation in different parts of the templates. Things became much easier by moving everything into values.yaml through YAML anchors. I doubt other teams will encounter problems of this scope. Regardless, it’s an interesting approach to keep in your back pocket.
- This is tangentially related to the chart but I think it’s important to discuss since it impacts the chart’s usability. Initially we wanted to pre-authenticate all cluster nodes to our Docker registry. We did this by updating the default ServiceAccount to also use a custom image pull secret. This seemed like a good idea in the beginning, but turned out to be a bad one when it came down to it. All this did was introduce more prerequisites for where the chart could be installed. In the end it was easier to create an image pull secret as part of the chart (also not an issue since git-crypt manages our plain text secrets) and use that in all the templates; see the sketch after this list. This was another step in making the chart more self-contained.
- The chart’s flexibility correlates with that of the internal applications (COMPONENT and PROCESS). Our process currently has two stages: production and sandbox. Unfortunately our code bases are in various stages of being completely configured by environment variables and/or command line options. We cannot easily introduce a new stage because many code bases hard code logic on something like NODE_ENV or RACK_ENV. This is a reminder that hard coding conditionals is an anti-pattern and should be avoided.
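For reference, the self-contained registry auth from the third point boils down to something like this sketch; the secret name and values key are hypothetical:
# Sketch: ship the registry credential with the chart
apiVersion: v1
kind: Secret
metadata:
  name: registry-credentials
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: {{ $.Values.docker_registry_auth }}  # assumed base64 encoded in secrets.yaml

# ...and reference it from every pod spec in the Deployment template:
#   spec:
#     imagePullSecrets:
#       - name: registry-credentials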
What’s Next?
We plan future posts documenting our test strategy, workflow around the chart, and continuous delivery (and hopefully deployment!) processes. Stay tuned for more information. Also don’t hesitate to leave a comment or ask a question. You can also find the SRE team on the Kubernetes slack in #helm-users, #helm-dev, #in-users, and #kops if you want to chat with us.
Good luck out there and happy shipping!