11 Oct 2017, 18:00

Continuous Delivery and Continuous Deployment for Kubernetes microservices


THIS IS A DRAFT VERSION OF POST TO COME, PLEASE DO NOT SHARE

Starting Point

Over the last few years we’ve been adopting several concepts for our project, struggling to make them work together.

The first one is the Microservice Architecture. We did not start clean and by the book; rather, we applied it to an already existing project: splitting big services into smaller ones and breaking excessive coupling. The refactoring work is not finished yet. The new services we are building look more like “microservices”, while there are still a few that I would call “micro-monoliths”. I have a feeling that this is a typical situation for an existing project that tries to adopt this new architecture pattern: you are almost there, but there is always work to be done.

Another concept is using Docker for building, packaging and deploying application services. We bet on Docker from the very beginning and used it for most of our services, and it turned out to be a good bet. There are still a few pure cloud services that we use when running our application on a public cloud: things like databases, error analytics, push notifications and some others.

One of the latest bets we made was Kubernetes. Kubernetes became the main runtime platform for our application. Adopting Kubernetes allowed us not only to hide away lots of operational complexity, achieving better availability and scalability, but also to run our application on any public cloud and on-premise deployment.

With the great flexibility that Kubernetes provides comes additional deployment complexity. Suddenly your services are not just plain Docker containers; there are a lot of new (and useful) Kubernetes resources that you need to take care of: ConfigMaps, Secrets, Services, Deployments, StatefulSets, PVs, PVCs, Ingress, Jobs and others. And it’s not always obvious where to keep all these resources and how they relate to the Docker images built by the CI tool.

“Continuous Delivery” vs. “Continuous Deployment”

The ambiguity of the CD term annoys me a lot. Different people mean different things when using it. And it’s not only about what the abbreviation stands for, Continuous Deployment vs Continuous Delivery, but also about what people really mean when they use it.

Still, it looks like there is common agreement that Continuous Deployment (CD) is a super-set of Continuous Delivery (CD). The main difference, so far, is that Continuous Deployment is 100% automated, while in Continuous Delivery some steps are still done manually.

In our project, for example, we have achieved Continuous Delivery, which serves us well both for the SaaS and on-premise versions of our product. Our next goal is to create fluent Continuous Deployment for the SaaS version. We would like to be able to release a change to production automatically, without human intervention, and to roll back to the previous version if something goes wrong.

Kubernetes Application and Release Content

Now let’s talk about a Release and try to define what Release Content is.

When we release a change to some runtime environment (development, staging or production), it’s not always a code change represented by a newly baked Docker image with some tag. A change can also be made to application configurations, secrets, ingress rules, jobs we are running, volumes and other resources. It would be nice to be able to release all these changes in the same way as we release a code change. Actually, a change can be a mixture of both, and in practice this is not a rare use case.

So, we need to find a good methodology and supporting technology that will allow us to release a new version of our Kubernetes application, which might be composed of multiple changes, and these changes are not only new Docker image tags. This methodology should allow us to do it repeatedly on any runtime environment (a Kubernetes cluster in our case) and to roll back ALL changes to the previous version if something goes wrong.

That’s why we adopted Helm as our main release management tool for Kubernetes.

Helm recap

This post is not about Helm, so the Helm recap will be very short. I encourage you to read the Helm documentation; it’s complete and well written.

Just to remind - core Helm concepts are:

  • (Helm) Chart - a package (tar archive) with Kubernetes YAML templates (for different Kubernetes resources) and default values (also stored in YAML files). Helm uses a chart to install a new or update an existing (Helm) release.
  • (Helm) Release - a Kubernetes application instance, installed with Helm. It is possible to create multiple releases from the same chart version.
  • (Release) Revision - when updating an existing release, a new revision is created. Helm can roll back a release to a previous revision. Helm stores all revisions in ConfigMaps, and it’s possible to list previous revisions with the helm history command.
  • Chart Repository - a location where packaged charts can be stored and shared. Any web server that can store and serve static files can be used as a Chart Repository (Nginx, GitHub, AWS S3 and others).

Helm consists of a server, called Tiller, and a command-line client, called helm. When releasing a new version (or updating an existing one), the helm client sends the chart (template files and values) to the Helm server. The Tiller server generates valid Kubernetes yaml files from the templates and values and deploys them to Kubernetes, using the Kubernetes API. Tiller also saves the generated yaml files as a new revision inside ConfigMaps and can use a previously saved revision for a rollback operation.

It was a short recap. Helm is a flexible release management system and can be extended with plugins and hooks.
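
To make these concepts concrete, here is a minimal Helm 2 command sequence covering the chart/release/revision lifecycle (the chart and release names here are hypothetical):

# package the chart located in ./chart into a tar archive (e.g. my-service-0.1.0.tgz)
helm package ./chart

# install a new release named "my-service" from the packaged chart
helm install --name my-service my-service-0.1.0.tgz

# upgrade the release to a new chart version, creating a new revision
helm upgrade my-service my-service-0.2.0.tgz

# list all revisions of the release
helm history my-service

# roll back to revision 1 if something goes wrong
helm rollback my-service 1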

Helm Chart Management

A typical Helm chart contains a list of template files (yaml files with go template commands) and values files (with configurations and secrets).

We use Git to store all our Helm chart files and Amazon S3 as the chart repository.

Short How-To guide:

  1. Adopt some Git management methodology. We use something very close to the GitHub Flow model.
  2. Have a git repository for each microservice. Our typical project structure:

    # chart files
    chart/
        # chart templates
        templates/
        # external dependency
        requirements.yaml
        # default values
        values.yaml
        # chart definition
        Chart.yaml
    # source code
    src/
    # test code
    test/
    # build scripts
    hack/
    # multi-stage Docker build file
    Dockerfile
    # Codefresh CI/CD pipeline
    codefresh.yaml
    
  3. We keep our application chart in a separate git repository. The application chart does not contain templates, only a list of the third-party charts it needs (the requirements.yaml file) and values files for different runtime environments (testing, staging and production)

  4. All secrets in values files are encrypted with the sops tool, and we defined a .gitignore file and set up a git pre-commit hook to avoid unintentional commits of decrypted secrets (see the sketch below).
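
For illustration, here is a minimal pre-commit hook sketch; the secrets*.yaml naming pattern is an assumption for this example, and the check relies on the sops: metadata block that sops adds to encrypted files:

#!/bin/sh
# .git/hooks/pre-commit - reject commits that contain decrypted secrets (sketch)
for f in $(git diff --cached --name-only | grep -E 'secrets.*\.ya?ml$'); do
  # sops-encrypted files contain a top-level "sops:" metadata block
  if ! grep -q '^sops:' "$f"; then
    echo "ERROR: $f looks decrypted; encrypt it with 'sops -e -i $f' before committing"
    exit 1
  fi
done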

Docker Continuous Integration

Building and testing code on a git push/tag event and packaging it into some build artifact is common knowledge, and there are tons of tools, services, and tutorials on how to do it.

Codefresh is one of such services, tuned to build Docker images effectively.

Codefresh Docker CI has one significant benefit versus other similar services - besides being a fast CI for Docker, it maintains traceability links between git commits, builds, Docker images and Helm Releases running on Kubernetes clusters.

Typical Docker CI flow

Docker CI

  1. Trigger the CI pipeline on a push event
  2. Build and test the service code. Tip: give Docker multi-stage builds a try.
  3. Tip: embed the git commit details into the Docker image (using Docker labels). I suggest following the Label Schema convention.
  4. Tag the Docker image with {branch}-{short SHA}
  5. Push the newly created Docker image to the preferred Docker Registry (a plain Docker sketch of this flow follows below)
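
The same flow can be reproduced locally with plain Docker commands; a rough sketch (the registry and image names are placeholders):

# collect git metadata
BRANCH=$(git rev-parse --abbrev-ref HEAD)
SHA=$(git rev-parse --short HEAD)

# build the image; the multi-stage Dockerfile also runs lint/test stages,
# and the Label Schema labels embed the commit details
docker build \
  --label org.label-schema.vcs-ref=${SHA} \
  --label org.label-schema.vcs-url=$(git config --get remote.origin.url) \
  -t myregistry/myservice:${BRANCH}-${SHA} .

# push the newly created image to the preferred Docker Registry
docker push myregistry/myservice:${BRANCH}-${SHA}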

Docker multistage build

With a Docker multi-stage build, you can even remove the need to learn a custom CI DSL syntax, like Jenkins Job/Pipeline or other YAML-based DSLs. Just use the familiar imperative Dockerfile syntax to describe all required CI stages (build, lint, test, package) and create a thin and secure final Docker image that contains only the bare minimum required to run the service.

Using a multi-stage Docker build has other benefits.

It allows you to use the same CI flow both on the developer machine and on the CI server. It can also help you switch easily between different CI services, using the same Dockerfile. The only thing you need is the right Docker daemon version (>= 17.05). So, select a CI service that supports the latest Docker daemon versions.

Example: Node.js multi-stage Dockerfile

#
# ---- Base Node ----
FROM alpine:3.5 AS base
# install node
RUN apk add --no-cache nodejs-npm tini
# set working directory
WORKDIR /root/chat
# Set tini as entrypoint
ENTRYPOINT ["/sbin/tini", "--"]
# copy project file
COPY package.json .

#
# ---- Dependencies ----
FROM base AS dependencies
# install node packages
RUN npm set progress=false && npm config set depth 0
RUN npm install --only=production 
# copy production node_modules aside
RUN cp -R node_modules prod_node_modules
# install ALL node_modules, including 'devDependencies'
RUN npm install

#
# ---- Test ----
# run linters, setup and tests
FROM dependencies AS test
COPY . .
RUN  npm run lint && npm run setup && npm run test

#
# ---- Release ----
FROM base AS release
# copy production node_modules
COPY --from=dependencies /root/chat/prod_node_modules ./node_modules
# copy app sources
COPY . .
# expose port and define CMD
EXPOSE 5000
CMD npm run start

Kubernetes Continuous Delivery (CD)

Building a Docker image on git push is the very first step you need to automate, but …

Docker Continuous Integration is not a Kubernetes Continuous Deployment/Delivery

After CI completes, you just have a new build artifact - a Docker image file.

Now, you somehow need to deploy it to the desired environment (Kubernetes cluster) and maybe also modify other Kubernetes resources, like configurations, secrets, volumes, policies, and others. Or maybe you do not have a “pure” microservice architecture and some of your services still have some kind of inter-dependency and have to be released together. I know, this is not “by the book”, but this is a very common use case: people are not perfect and not all architectures out there are perfect either. Usually, you start from an already existing project and try to move it to a new, ideal architecture step by step.

So, on one side you have one or more freshly baked Docker images. On the other side, there are one or more environments where you want to deploy these images together with the related configuration changes. And most likely, you would like to reduce the required manual effort to the bare minimum or eliminate it completely, if possible.

Continuous Delivery is the next step we are taking. Most of the CD tasks should be automated, while there may still be a few tasks that have to be done manually. The reasons for having manual tasks can be different: either you cannot achieve full automation, or you want to have a feeling of control (deciding when to release by pressing some “Release” button), or there is some real manual effort required (bring up the new server and switch it on :) )

For our Kubernetes Continuous Delivery pipeline, we manually update the Codefresh application Helm chart with the appropriate image tags, and sometimes we also update different Kubernetes YAML template files (defining a new PVC or environment variable). Once changes to our application chart are pushed into the git repository, an automated Continuous Delivery pipeline execution is triggered.

Codefresh includes some helper steps that make building a Kubernetes CD pipeline easier. First, we have a built-in helm update step that can install or update a Helm chart on a specified Kubernetes cluster or namespace, using a Kubernetes context defined in the Codefresh account.

Codefresh also provides a nice view of what is running in your Kubernetes cluster, where it comes from (release, build) and what it contains: images, image metadata (quality, security, etc.), code commits.

We use our own service (Codefresh) to build an effective Kubernetes Continuous Delivery pipeline for deploying Codefresh itself. We also constantly add new features and useful functionality that simplify our life (as developers) and hopefully help our customers too.

Codefresh Helm Release View

Typical Kubernetes Continuous Delivery flow

Kubernetes Continuous Delivery

  1. Set up a Docker CI for the application microservices
  2. Update the microservice(s) code and chart template files, if needed (adding ports, env variables, volumes, etc.)
  3. Wait till Docker CI completes and you have a new Docker image for the updated microservice(s)
  4. Manage the application Helm chart code in a separate git repository; use the same git branch methodology as for the microservices
  5. Manually update imageTags for the updated microservice(s)
  6. Manually update the application Helm chart version
  7. Trigger the CD pipeline on a git push event for the application Helm chart git repository (a command-line sketch of these steps follows the list)
    • validate the Helm chart syntax: use helm lint
    • convert the Helm chart to Kubernetes template files (with the helm template plugin) and use kubeval to validate these files
    • package the application Helm chart and push it to the Helm chart repository
      • Tip: create a few chart repositories; I suggest having a chart repository per environment: production, staging, develop
  8. Manually (or automatically) execute helm upgrade --install from the corresponding chart repository
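
The validation and packaging steps from point 7 can be sketched with a few commands (the chart path and S3 bucket are placeholders; kubeval is a separate binary that validates Kubernetes yaml):

# validate chart syntax
helm lint ./app-chart

# render templates to plain Kubernetes yaml and validate the result
helm template ./app-chart > rendered.yaml
kubeval rendered.yaml

# package the chart and update the repository index
helm package ./app-chart --destination ./packaged
helm repo index ./packaged --url https://my-charts-bucket.s3.amazonaws.com/production

# upload the packaged chart and the index to the S3-backed chart repository
aws s3 sync ./packaged s3://my-charts-bucket/production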

After CD completes, we have a new artifact - an updated Helm chart package (tar archive) of our Kubernetes application with a new version number.

Now, we can run the helm upgrade --install command, creating a new revision for the application release. If something goes wrong, we can always roll back the failed release to the previous revision. For the sake of safety, I suggest first running helm diff (using the helm diff plugin), or at least using the --dry-run flag for the first run, and inspecting the difference between the new release version and the already installed revision. If you are ok with the upcoming changes, accept them and run the helm upgrade --install command without the --dry-run flag.
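
For example, a cautious release of the packaged application chart might look like this (the release and repository names are placeholders; helm diff requires the helm-diff plugin to be installed):

# inspect the difference between the new chart version and the installed revision
helm diff upgrade my-app myrepo/my-app

# or at least do a dry run first
helm upgrade --install my-app myrepo/my-app --dry-run

# if the changes look right, perform the real upgrade
helm upgrade --install my-app myrepo/my-app

# roll back if something goes wrong (find the previous revision with 'helm history my-app')
helm rollback my-app 2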

Kubernetes Continuous Deployment (CD)

Based on the above definition, to achieve Continuous Deployment we should try to avoid all manual steps besides git push for code and configuration changes. All actions running after git push should be 100% automated and deliver all changes to the corresponding runtime environment.

Let’s take a look at the manual steps from the “Continuous Delivery” pipeline and think about how we can automate them.

Kubernetes Continuous Deployment

Automate: Update microservice imageTag after successful docker push

After a new Docker image for some microservice is pushed to a Docker Registry, we would like to update the microservice Helm chart with the new Docker image tag. There are (at least) two options to do this.

  1. Add a Docker Registry WebHook handler (for example, using AWS Lambda). Take the new image tag from the DockerHub push event payload and update the corresponding imageTag in the application Helm chart. For GitHub, we can use the GitHub API to update a single file, or bash scripting with a mixture of sed and git commands.
  2. Add an additional step to every microservice CI pipeline, after the docker push step, that updates the corresponding imageTag for the microservice Helm chart (see the sketch below)
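
The second option can be as simple as a few extra lines at the end of the service CI pipeline; a rough sketch with sed and git (the chart repository, values file layout and imageTag key are assumptions for this example):

# clone the application chart repository
git clone --depth 1 git@github.com:myorg/app-chart.git
cd app-chart

# bump the image tag of the updated microservice in the values file
sed -i "s|imageTag:.*|imageTag: ${BRANCH}-${SHA}|" values.yaml

# commit and push the change; this push triggers the application CD pipeline
git commit -am "bump my-service image to ${BRANCH}-${SHA}"
git push origin master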

Automate: Deploy Application Helm chart

After a new chart version is uploaded to the chart repository, we would like to deploy it automatically to the “linked” runtime environment and roll back on failure.

A Helm chart repository is not a real server that is aware of deployed charts. Any web server that can serve static files can be used as a Helm chart repository. In general, I like simplicity, but sometimes it leads to naive design and a lack of basic functionality. With the Helm chart repository, this is the case. Therefore, I recommend using a web server that has a nice API and allows getting notifications about content changes without a pull loop. Amazon S3 can be a good choice for a Helm chart repository.

Once you have a chart repository up and running and can get notifications about a content update (as a WebHook or with a pull loop), you can make the next steps towards Kubernetes Continuous Deployment.

  1. Get updates from the Helm chart repository: a new chart version
  2. Run the helm upgrade --install command to update/install a new application version on the “linked” runtime environment
  3. Run post-install and in-cluster integration tests
  4. Roll back to the previous application revision on any “failure” (see the sketch below)
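
A naive sketch of such an automation step (the release name and repository are placeholders; helm test assumes the chart defines test hooks):

# remember the currently deployed revision for a possible rollback
CURRENT=$(helm history my-app | tail -1 | awk '{print $1}')

# refresh the repository index and upgrade to the new chart version
helm repo update
helm upgrade --install my-app myrepo/my-app

# run post-install / in-cluster integration tests and roll back on failure
if ! helm test my-app; then
  helm rollback my-app ${CURRENT}
fi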

Summary

This post describes the current Kubernetes Continuous Delivery pipeline we have managed to set up. There are still things we need to improve and change in order to achieve fully automated Continuous Deployment.

We constantly evolve Codefresh to be the product that helps us and our customers build and maintain effective Kubernetes CD pipelines. Give it a try and let us know how we can improve it.


Hope you find this post useful. I look forward to your comments and any questions you have.


*This is a working draft version. The final post version is published at Codefresh Blog on December 4, 2018.*

04 Oct 2017, 18:00

Chaos Testing for Docker Containers

What follows is the text of my presentation, Chaos Testing for Docker Containers that I gave at ContainerCamp in London this year. I’ve also decided to turn the presentation into an article. I edited the text slightly for readability and added some links for more context. You can find the original video recording and slides at the end of this post.

Docker Chaos Testing

Intro

Software development is about building software services that support business needs. The more complex the business processes we want to automate and integrate with, the more complex the software systems we are building. And solution complexity tends to grow over time and scope.

The reasons for growing complexity can be different. Some systems just tend to handle too many concerns, or require a lot of integrations with external services and internal legacy systems. These systems are written and rewritten multiple times over several years by different people with different skills, trying to satisfy constantly changing business requirements, using different technologies and following different technology and architecture trends.

So, my point is that building software that unintentionally becomes more and more complex over time is easy - we have all done it in the past or are doing it right now. Building a “good” software architecture for complex systems and preserving its “good” qualities over time is really hard.

When you have too many “moving” parts, integrations, constantly changing requirements that lead to code changes, security upgrades, hardware modernization, multiple network communication channels, etc., it can become a “Mission Impossible” to avoid unexpected failures.

Stuff happens!

All systems fail from time to time. And your software system will fail too. Take this as a fact of life. There will always be something that can — and will — go wrong. No matter how hard we try, we can’t build perfect software, nor can the companies we depend on. Even the most stable and respected services, from companies that practice CI/CD and test-driven development (TDD/BDD), have huge QA departments and well-defined release procedures, fail.

Just a few examples from last year’s outages:

  1. IBM, January 26
    • IBM’s cloud credibility took a hit at the start of the year when a management portal used by customers to access its Bluemix cloud infrastructure went down for several hours. While no underlying infrastructure actually failed, users were frustrated in finding they couldn’t manage their applications or add or remove cloud resources powering workloads.
    • IBM said the problem was intermittent and stemmed from a botched update to the interface.
  2. GitLab, January 31
    • GitLab’s popular online code repository, GitLab.com, suffered an 18-hour service outage that ultimately couldn’t be fully remediated.
    • The problem resulted when an employee removed a database directory from the wrong database server during maintenance procedures.
  3. AWS, February 28
    • This was the outage that shook the industry.
    • An Amazon Web Services engineer trying to debug an S3 storage system in the provider’s Virginia data center accidentally typed a command incorrectly, and much of the Internet – including many enterprise platforms like Slack, Quora and Trello – was down for four hours.
  4. Microsoft Azure, March 16
    • Storage availability issues plagued Microsoft’s Azure public cloud for more than eight hours, mostly affecting customers in the Eastern U.S.
    • Some users had trouble provisioning new storage or accessing existing resources in the region. A Microsoft engineering team later identified the culprit as a storage cluster that lost power and became unavailable.

Visit Outage.Report or Downdetector to see a constantly updating long list of outages reported by end-users.

Chasing Software Quality

As software engineers, we want to be proud of the software systems we are building. We want these systems to be of high quality: without functional bugs or security holes, providing exceptional performance, resilient to unexpected failures, self-healing, always available, and easy to maintain and modernize.

Every new project starts with a “high quality” picture in mind, and no one wants to create crappy software, but very few of us (if any) are able to achieve and keep all these good “qualities” intact. So, what can we do to improve overall system quality? Should we do more testing?

I tend to say “Yes” - software testing is critical. But just running unit, functional and performance testing is not enough.

Today, building a complex distributed system is much easier with all the new amazing technology we have and the experience we have gathered. Microservice Architecture is a real trend nowadays, and miscellaneous container technologies support this architecture. It’s much easier to deploy, scale, link, monitor, update and manage distributed systems composed of multiple “microservices”. When we are building distributed systems, we are choosing P (Partition Tolerance) from the CAP theorem and, second to it, either A (Availability - the most popular choice) or C (Consistency). So, we need to find a good approach for testing AP or CP systems.

Traditional testing disciplines and tools do not provide a good answer to the question: how does your distributed system behave when unexpected stuff happens in production? Sure, you can learn from previous failures, after the fact, and you should definitely do it. But learning from past experience should not be the only way to prepare for future failures.

Waiting for things to break in production is not an option. But what’s the alternative?

Chaos Engineering

The alternative is to break things on purpose. And Chaos Engineering is a particular approach to doing just that. The idea of Chaos Engineering is to embrace the failure! Chaos Engineering for distributed software systems was originally popularized by Netflix.

Chaos Engineering defines an empirical approach to resilience testing of distributed software systems. You are testing a system by conducting chaos experiments.

A typical chaos experiment:

  • define a normal/steady state of the system (e.g. by monitoring a set of system and business metrics)
  • pseudo-randomly inject faults (e.g. by terminating VMs, killing containers or changing network behavior)
  • try to discover system weaknesses by deviation from the expected or steady-state behavior

The harder it is to disrupt the steady state, the more confidence we have in the behavior of the system.  
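
As a toy illustration, a single-host chaos experiment could be scripted like this (the health endpoint URL and the container name filter are assumptions for this example):

# 1. steady state: the application health endpoint answers with HTTP 200
curl -fsS http://localhost:3000/health > /dev/null && echo "steady state OK"

# 2. pseudo-randomly inject a fault: kill one random application container
docker kill $(docker ps -q --filter "name=app" | shuf -n 1)

# 3. wait a bit and check for deviation from the steady state
sleep 30
curl -fsS http://localhost:3000/health > /dev/null || echo "weakness discovered: service did not recover"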

Chaos Engineering tools

Of course it’s possible to practice Chaos Engineering manually, or to rely on automatic system updates, but we, as engineers, like to automate boring manual tasks, so there are some nice tools to use.

Netflix built some useful tools for practicing Chaos Engineering in the public cloud (AWS):

  • Chaos Monkey - kill EC2 instances, kill processes, burn CPU, fill disk, detach volumes, add network latency, etc.
  • Chaos Kong - remove whole AWS Regions

These are very good tools, and I encourage you to use them. But when I started my new container-based project (2 years ago), it looked like these tools provided the wrong granularity for the chaos I wanted to create, and I wanted to be able to create the chaos not only in a real cluster, but also on a single developer machine, to be able to debug and tune my application. So, I searched Google for a Chaos Monkey for Docker, but did not find anything besides some basic Bash scripts. So, I decided to create my own tool. And since it happened to be quite a useful tool from the very first version, I shared it with the community as open source. It’s a Chaos Monkey Warthog for Docker - Pumba

Pumba - Chaos Testing for Docker

What is Pumba(a)?

Those of us who have kids or were kids in the 90s should remember this character from Disney’s animated film The Lion King. In Swahili, pumbaa means “to be foolish, silly, weak-minded, careless, negligent”. I like the Swahili meaning of this word. It matched the tool I wanted to create perfectly.

What can Pumba do?

Pumba disturbs a running Docker runtime environment by injecting different failures. Pumba can kill, stop, remove or pause Docker containers. Pumba can also perform network emulation, simulating different network failures, like: delay, packet loss (using different probability loss models), bandwidth rate limits and more. For network emulation, Pumba uses the Linux kernel traffic control tc with the netem queueing discipline, read more here. If tc is not available within the target container, Pumba uses a sidekick container with tc on board, attaching it to the target container network.

You can pass a list of containers to Pumba or just write a regular expression to select matching containers. If you do not specify containers, Pumba will try to disturb all running containers. Use the --random option to randomly select only one target container from the provided list. It’s also possible to define a repeatable time interval and duration parameters to better control the amount of chaos you want to create.

Pumba is available as a single binary file for Linux, MacOS and Windows, or as a Docker container.

# Download binary from https://github.com/gaia-adm/pumba/releases
curl -L https://github.com/gaia-adm/pumba/releases/download/0.4.6/pumba_linux_amd64 --output /usr/local/bin/pumba
chmod +x /usr/local/bin/pumba && pumba --help

# Install with Homebrew (MacOS only)
brew install pumba && pumba --help

# Use Docker image
docker run gaiaadm/pumba pumba --help

Pumba commands examples

First of all, run pumba --help to get help about available commands and options and pumba <command> --help to get help for the specific command and sub-command.

# pumba help
pumba --help

# pumba kill help
pumba kill --help

# pumba netem delay help
pumba netem delay --help

Killing a randomly chosen Docker container matching the ^test regular expression.

# on main pane/screen, run 7 test containers that do nothing
for i in {0..7}; do docker run -d --rm --name test$i alpine tail -f /dev/null; done
# run an additional container with 'skipme' name
docker run -d --rm --name skipme alpine tail -f /dev/null

# run this command in another pane/screen to see running docker containers
watch docker ps -a

# go back to main pane/screen and kill (once in 10s) random 'test' container, ignoring 'skipme'
pumba --random --interval 10s kill re2:^test
# press Ctrl-C to stop Pumba at any time

Adding a 3000ms (+-50ms) delay to the egress traffic of the ping container for 20 seconds, using a normal distribution model.

# run "ping" container on one screen/pane
docker run -it --rm --name ping alpine ping 8.8.8.8

# on second screen/pane, run pumba netem delay command, disturbing "ping" container; sidekick a "tc" helper container
pumba netem --duration 20s --tc-image gaiadocker/iproute2 delay --time 3000 --jitter 50 --distribution normal ping
# pumba will exit after 20s, or stop it with Ctrl-C

To demonstrate the packet loss capability, we will need three screens/panes. I will use the iperf network bandwidth measurement tool. On the first pane, run a server docker container with iperf on board and start a UDP server there. On the second pane, start a client docker container with iperf and send datagrams to the server container. Then, on the third pane, run the pumba netem loss command, adding packet loss to the client container. Enjoy the chaos.

# create docker network
docker network create -d bridge testnet

# > Server Pane
# run server container
docker run -it --name server --network testnet --rm alpine sh -c "apk add --no-cache iperf; sh"
# shell inside server container: run a UDP Server listening on UDP port 5001
sh$ iperf -s -u -i 1

# > Client Pane
# run client container
docker run -it --name client --network testnet --rm alpine sh -c "apk add --no-cache iperf; sh"
# shell inside client container: send datagrams to the server -> see no packet loss
sh$ iperf -c server -u

# > Server Pane
# see server receives datagrams without any packet loss

# > Pumba Pane
# inject 20% packet loss into client container, for 1m
pumba netem --duration 1m --tc-image gaiadocker/iproute2 loss --percent 20 client

# > Client Pane
# shell inside client container: send datagrams to the server -> see ~20% packet loss
sh$ iperf -c server -u

Session and slides


Hope you find this post useful. I look forward to your comments and any questions you have.


*This is a working draft version. The final post version is published at Codefresh Blog on October 4, 2017.*

03 Jun 2017, 18:00

Debugging remote Node.js application running in a Docker container

Teaser

Suppose you want to debug a Node.js application already running on a remote machine inside a Docker container. And you would like to do it without modifying the command arguments (enabling debug mode) and without opening the remote Node.js debugger agent port to the whole world.

I bet you didn’t know that it’s possible and also have no idea how to do it.

I encourage you to continue reading this post if you are eager to learn some new cool stuff.

The TodoMVC demo application

I’m using a fork of the TodoMVC Node.js application (by Gleb Bahmutov) as a demo application for this blog post. Feel free to clone and play with this repository.

Here is the Dockerfile I’ve added for the TodoMVC application. It allows running the TodoMVC application inside a Docker container.

FROM alpine:3.5

# install node
RUN apk add --no-cache nodejs-current tini

RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app

# Build time argument to set NODE_ENV ('production' by default)
ARG NODE_ENV
ENV NODE_ENV ${NODE_ENV:-production}

# install npm packages: clean obsolete files
COPY package.json /usr/src/app/
RUN npm config set depth 0 && \
    npm install && \
    npm cache clean && \
    rm -rf /tmp/*

# copy source files
COPY . /usr/src/app

EXPOSE 3000

# Set tini as entrypoint
ENTRYPOINT ["/sbin/tini", "--"]

CMD [ "npm", "start" ]

# add VCS labels for code sync and nice reports
ARG VCS_REF="local"
LABEL org.label-schema.vcs-ref=$VCS_REF \
      org.label-schema.vcs-url="https://github.com/alexei-led/todomvc-express.git"

Building and Running TodoMVC in a Docker container:

To build a new Docker image for TodoMVC application, run the docker build command.

$ # build Docker image; set VCS_REF to current HEAD commit (short)
$ docker build -t local/todomvc --build-arg VCS_REF=`git rev-parse --short HEAD` .
$ # run TodoMVC in a Docker container
$ docker run -d -p 3000:3000 --name todomvc local/todomvc node src/start.js

The Plan

Final Goal - I would like to be able to attach a Node.js debugger to a Node.js application already up and running inside a Docker container, running on a remote host machine in the AWS cloud, without modifying the application, the container, the container configuration, or restarting it with additional debug flags. Imagine that the application is running and there is some problem happening right now - I want to connect to it with a debugger and start looking at the problem.

So, I need a plan - a step by step flow that will help me to achieve the final goal.

Let’s start with exploring the inventory. On the server (AWS EC2 VM) machine, I have a Node.js application running inside a Docker container. On the client (my laptop), I have an IDE (Visual Studio Code, in my case), Node.js application code (git pull/clone) and a Node.js debugger.

So, here is my plan:

  1. Set already running application to debug mode
  2. Expose a new Node.js debugger agent port to enable remote debugging in a secure way
  3. Synchronize the client and server code: both should be on the same commit in the git tree
  4. Attach a local Node.js debugger to the Node.js debugger agent port on the remote server, and do it in a secure way
  5. And, if everything works, I should be able to perform regular debugging tasks, like setting breakpoints, inspecting variables, pausing execution and others.

Debug Node in Docker

Step 1: set the already running Node.js application to debug mode

The V8 debugger can be enabled and accessed either by starting Node with the --debug command-line flag or by signaling an existing Node process with SIGUSR1. (Node API documentation)

Cool! So, in order to switch on the Node debugger agent, I just need to send the SIGUSR1 signal to the Node.js process of the TodoMVC application. Remember, it’s running inside a Docker container. What command can I use to send process signals to an application running in a Docker container?

The docker kill command is my choice! This command does not actually “kill” the PID 1 process running in a Docker container, but sends a Unix signal to it (by default it sends SIGKILL).

Setting TodoMVC into debug mode

So, all I need to do is to send SIGUSR1 to my TodoMVC application running inside todomvc Docker container.

There are two ways to do this:

  1. Use the docker kill --signal command to send SIGUSR1 to the PID 1 process running inside the Docker container; if it’s a “proper” init application that forwards signals correctly (like tini), then this will work
  2. Or execute kill -s SIGUSR1 inside the already running Docker container, sending the SIGUSR1 signal to the main Node.js process.
$ # send SIGUSR1 with docker kill (if using proper init process)
$ docker kill --signal SIGUSR1 todomvc 
$ # OR run kill command for node process inside todomvc container
$ docker exec -it todomvc sh -c 'kill -s SIGUSR1 $(pidof -s node)'

Let’s verify that the Node application is set into debug mode.

$ docker logs todomvc

TodoMVC server listening at http://:::3000
emitting 2 todos
server has new 2 todos
GET / 200 31.439 ms - 3241
GET /app.css 304 4.907 ms - -
Starting debugger agent.
Debugger listening on 127.0.0.1:5858

As you can see the Node.js debugger agent was started, but it can accept connections only from the localhost, see the last output line: Debugger listening on 127.0.0.1:5858

Step 2: expose Node debug port

In order to attach a remote Node.js debugger to a Node application running in debug mode, I need to:

  1. Allow connections to the debugger agent from any (or a specific) IP (or IP range)
  2. Open the port of the Node.js debugger agent outside of the Docker container

How do you do this when the application is already running in a Docker container, the Node.js debugger agent is ready to talk only to a Node.js debugger running on the same machine, and the Node.js debugger agent port is not accessible from outside of the Docker container?

Of course it’s possible to start every Node.js Docker container with an exposed debugger port and allow connections from any IP (using the --debug-port and --debug Node.js flags), but we are not looking for easy ways :).

It’s not a good idea from a security point of view (allowing unprotected access to a Node.js debugger). Also, if I restart an already running application with debug flags, I’m going to lose the current execution context and may not be able to reproduce the problem I wanted to debug.

I need a better solution!

Unfortunately, Docker does not allow exposing an additional port of an already running Docker container. So, I need to somehow connect to the running container’s network and expose a new port for the Node.js debugger agent.

Also, it is not possible to tell a Node.js debugger agent to accept connections from different IP addresses once the Node.js process has already been started.

Both of the above problems can be solved with the help of a small Linux utility called socat (SOcket CAT). It is just like netcat, but with security in mind (e.g., it supports chrooting) and works over various protocols and through files, pipes, devices, TCP sockets, Unix sockets, a client for SOCKS4, proxy CONNECT, SSL, etc.

From the socat man page: “socat is a command line based utility that establishes two bidirectional byte streams and transfers data between them. Because the streams can be constructed from a large set of different types of data sinks and sources (see address types), and because lots of address options may be applied to the streams, socat can be used for many different purposes.”

Exactly, what I need!

So, here is the plan. I will run a new Docker container with the socat utility onboard, and configure Node.js debugger port forwarding for TodoMVC container.

socat.Dockerfile:

FROM alpine:3.5
RUN apk add --no-cache socat
CMD socat -h

Building the socat Docker image

$ docker build -t local/socat - < socat.Dockerfile

Allow connection to Node debugger agent from any IP

I need to run a “sidecar” socat container in the same network namespace as the todomvc container and define a port forwarding.

$ # define local port forwarding
$ docker run -d --name socat-nid --network=container:todomvc local/socat socat TCP-LISTEN:4848,fork TCP:127.0.0.1:5858

Now any traffic that arrives at port 4848 will be routed to the Node.js debugger agent listening on 127.0.0.1:5858. Port 4848 can accept traffic from any IP. It’s possible to use an IP range to restrict connections to the socat listening port by adding the range=<IP RANGE> option, as shown below.
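
For example, the same forwarding command could be limited to a trusted subnet (the 10.0.0.0/16 range is, of course, just an example):

$ # accept connections to port 4848 only from the 10.0.0.0/16 range
$ docker run -d --name socat-nid --network=container:todomvc local/socat socat TCP-LISTEN:4848,fork,range=10.0.0.0/16 TCP:127.0.0.1:5858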

Exposing Node.js debugger port from Docker container

First, we will get the IP of the todomvc Docker container.

$ # get IP of todomvc container
$ TODOMVC_IP=$(docker inspect -f "{{.NetworkSettings.IPAddress}}" todomvc)

Then, configure port forwarding to the “sidecar” socat port we defined previously, running on the same network as the todomvc container.

$ # run socat container to expose Node.js debugger agent port forwarder
$ docker run -d -p 5858:5858 --name socat local/socat socat TCP-LISTEN:5858,fork TCP:${TODOMVC_IP}:4848

Any traffic that will arrive at the 5858 port on the Docker host will be forwarded, first, to the 4848 socat port and then to the Node.js debugger agent running inside the todomvc Docker container.

Exposing Node.js debugger port for remote access

In most cases, I would like to debug an application running on a remote machine (AWS EC2 instance, for example). I also do not want to expose a Node.js debugger agent port unprotected to the whole world.

One possible and working solution is to use SSH tunneling to access this port.

$ # Open SSH Tunnel to gain access to servers port 5858. Set `SSH_KEY_FILE` to ssh key location or add it to ssh-agent
$ #
$ # open an ssh tunnel, send it to the bg, and wait 20 seconds for connections
$ # once all connections are closed after 20 seconds then close the tunnel
$ ssh -i ${SSH_KEY_FILE} -f -o ExitOnForwardFailure=yes -L 5858:127.0.0.1:5858 ec2_user@some.ec2.host.com sleep 20

Now all traffic to the localhost:5858 will be tunneled over SSH to the remote Docker host machine and after some socat forwarding to the Node.js debugger agent running inside the todomvc container.

Step 3: Synchronizing on the same code commit

In order to be able to debug a remote application, you need to make sure that you are using the same code in your IDE as the one that is running on the remote server.

I will try to automate this step too. Remember the LABEL command I’ve used in the TodoMVC Dockerfile?

These labels help me to identify git repository and commit for the application Docker image:

  1. org.label-schema.vcs-ref - contains the short SHA of the HEAD commit
  2. org.label-schema.vcs-url - contains the application git repository url (which I can use to clone/pull)

I’m using the [Label Schema Convention](http://label-schema.org/rc1/), since I really like it and find it useful, but you can select any other convention too.

This approach allows me, for each properly labeled Docker image, to identify the application code repository and the commit it was created from.

$ # get git repository url from Docker image
$ GIT_URL=$(docker inspect local/todomvc | jq -r '.[].ContainerConfig.Labels."org.label-schema.vcs-url"')
$ # get git commit from Docker image
$ GIT_COMMIT=$(docker inspect local/todomvc | jq -r '.[].ContainerConfig.Labels."org.label-schema.vcs-ref"')
$ 
$ # clone git repository, if needed
$ git clone $GIT_URL
$ # set HEAD to same commit as server
$ git checkout $GIT_COMMIT

Now, both my local development environment and remote application are on the same git commit. And I can start to debug my code, finally!

Step 4: Attaching local Node.js debugger to debugger agent port

To start debugging, I need to configure my IDE. In my case, it’s Visual Studio Code and I need to add a new Launch configuration.

This launch configuration specifies remote debugger server and port to attach and remote location for application source files, which should be in sync with local files (see the previous step).

{
    // For more information about Node.js debug attributes, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "type": "node",
            "request": "attach",
            "name": "Debug Remote Docker",
            "address": "127.0.0.1",
            "port": 5858,
            "localRoot": "${workspaceRoot}/",
            "remoteRoot": "/usr/src/app/"
        }
    ]
}

Summary

And finally, I’ve met my goal: I’m able to attach a Node.js debugger to a Node.js application, that is already up and running in a Docker container on a remote machine.

It was a long journey to find the proper solution, but after I found it, the process does not look complex at all. Now, once I meet a new problem in our environment, I can easily attach the Node.js debugger to the running application and start exploring the problem. Nice, isn’t it?

I’ve recorded a short movie, just to demonstrate all steps and prove that things are working fluently, exactly as I’ve described in this post.


Hope you find this post useful. I look forward to your comments and any questions you have.


*This is a working draft version. The final post version is published at Codefresh Blog on June 6, 2017.*

25 Apr 2017, 18:00

Create lean Node.js image with Docker multi-stage build

TL;DR

Starting from Docker 17.05+, you can create a single Dockerfile that builds multiple helper images with compilers, tools, and tests, and uses files from those images to produce the final Docker image.

Multi-stage Docker Build

The “core principle” of Dockerfile

Docker can build images by reading the instructions from a Dockerfile. A Dockerfile is a text file that contains a list of all the commands needed to build a new Docker image. The syntax of Dockerfile is pretty simple and the Docker team tries to keep it intact between Docker engine releases.

The core principle is very simple: 1 Dockerfile -> 1 Docker Image.

This principle works just fine for basic use cases, where you just need to demonstrate Docker capabilities or put some “static” content into a Docker image.

Once you advance with Docker and would like to create secure and lean Docker images, a single Dockerfile is not enough.

People who insist on following the above principle find themselves with slow Docker builds, huge Docker images (several GB in size), slow deployment times and lots of CVE violations embedded into these images.

The Docker Build Container pattern

Docker Pattern: The Build Container

The basic idea behind Build Container pattern is simple:

Create additional Docker images with required tools (compilers, linters, testing tools) and use these images to produce lean, secure and production ready Docker image.


An example of the Build Container pattern for typical Node.js application:

  1. Derive FROM a Node base image (for example node:6.10-alpine) with node and npm installed (Dockerfile.build)
  2. Add package.json
  3. Install all node modules from dependencies and devDependencies
  4. Copy application code
  5. Run compilers, code coverage, linters, code analysis and testing tools
  6. Create the production Docker image; derive FROM the same or another Node base image
  7. Install node modules required for runtime (npm install --only=production)
  8. Expose PORT and define the default CMD (command to run your application)
  9. Push the production image to some Docker registry

This flow assumes that you are using two or more separate Dockerfiles and a shell script or flow tool to orchestrate all steps above.

Example

I use a fork of Let’s Chat node.js application.

Builder Docker image with eslint, mocha and gulp

FROM alpine:3.5
# install node 
RUN apk add --no-cache nodejs
# set working directory
WORKDIR /root/chat
# copy project file
COPY package.json .
# install node packages
RUN npm set progress=false && \
    npm config set depth 0 && \
    npm install
# copy app files
COPY . .
# run linter, setup and tests
CMD npm run lint && npm run setup && npm run test

Production Docker image with ‘production’ node modules only

FROM alpine:3.5
# install node
RUN apk add --no-cache nodejs tini
# set working directory
WORKDIR /root/chat
# copy project file
COPY package.json .
# install node packages
RUN npm set progress=false && \
    npm config set depth 0 && \
    npm install --only=production && \
    npm cache clean
# copy app files
COPY . .
# Set tini as entrypoint
ENTRYPOINT ["/sbin/tini", "--"]
# application server port
EXPOSE 5000
# default run command
CMD npm run start

What is Docker multi-stage build?

Docker 17.05 extends the Dockerfile syntax to support a new multi-stage build, by extending two commands: FROM and COPY.

The multi-stage build allows using multiple FROM commands in the same Dockerfile. The last FROM command produces the final Docker image, all other images are intermediate images (no final Docker image is produced, but all layers are cached).

The FROM syntax also supports the AS keyword. Use the AS keyword to give the current image a logical name and reference it later by this name.

To copy files from intermediate images use COPY --from=<image_AS_name|image_number>, where the number starts from 0 (but it’s better to use a logical name through the AS keyword).

Creating a multi-stage Dockerfile for Node.js application

The Dockerfile below makes the Build Container pattern obsolete, allowing you to achieve the same result with a single file.

#
# ---- Base Node ----
FROM alpine:3.5 AS base
# install node
RUN apk add --no-cache nodejs-npm tini
# set working directory
WORKDIR /root/chat
# Set tini as entrypoint
ENTRYPOINT ["/sbin/tini", "--"]
# copy project file
COPY package.json .

#
# ---- Dependencies ----
FROM base AS dependencies
# install node packages
RUN npm set progress=false && npm config set depth 0
RUN npm install --only=production 
# copy production node_modules aside
RUN cp -R node_modules prod_node_modules
# install ALL node_modules, including 'devDependencies'
RUN npm install

#
# ---- Test ----
# run linters, setup and tests
FROM dependencies AS test
COPY . .
RUN  npm run lint && npm run setup && npm run test

#
# ---- Release ----
FROM base AS release
# copy production node_modules
COPY --from=dependencies /root/chat/prod_node_modules ./node_modules
# copy app sources
COPY . .
# expose port and define CMD
EXPOSE 5000
CMD npm run start

The above Dockerfile creates 3 intermediate Docker images and a single release Docker image (the final FROM).

  1. First image FROM alpine:3.5 AS base - is a base Node image with: node, npm, tini (init app) and package.json
  2. Second image FROM base AS dependencies - contains all node modules from dependencies and devDependencies, with an additional copy of the dependencies required for the final image only
  3. Third image FROM dependencies AS test - runs linters, setup and tests (with mocha); if this run command fails, no final image is produced
  4. The final image FROM base AS release - is a base Node image with the application code and all node modules from dependencies

Try Docker multi-stage build today

In order to try the Docker multi-stage build, you need to get Docker 17.05, which is going to be released in May and is currently available on the beta channel.

So, you have two options:

  1. Use the beta channel to get Docker 17.05
  2. Run a dind container (docker-in-docker)

Running Docker-in-Docker 17.05 (beta)

Running Docker 17.05 (beta) in a docker container (--privileged is required):

$ docker run -d --rm --privileged -p 23751:2375 --name dind docker:17.05.0-ce-dind --storage-driver overlay2

Try the multi-stage build. Add --host=:23751 to every Docker command, or set the DOCKER_HOST environment variable.

$ # using --host
$ docker --host=:23751 build -t local/chat:multi-stage .

$ # OR: setting DOCKER_HOST
$ export DOCKER_HOST=localhost:23751
$ docker build -t local/chat:multi-stage .

Summary

With Docker multi-stage build feature, it’s possible to implement an advanced Docker image build pipeline using a single Dockerfile. Kudos to Docker team!


Hope you find this post useful. I look forward to your comments and any questions you have.


_This is a working draft version. The final post version is published at Codefresh Blog on April 24, 2017._

07 Mar 2017, 18:00

Crafting perfect Java Docker build flow

TL;DR

What is the bare minimum you need to build, test and run your Java application in a Docker container?

The recipe: Create a separate Docker image for each step and optimize the way you are running it.

Duke and Container

Introduction

I started working with Java in 1998, and for a long time, it was my main programming language. It was a long love–hate relationship.

During my work career, I wrote a lot of code in Java. Despite that fact, I don’t think Java is usually the right choice for writing microservices running in Docker containers.

But, sometimes you have to work with Java. Maybe Java is your favorite language and you do not want to learn a new one, or you have a legacy code that you need to maintain, or your company decided on Java and you have no other option.

Whatever reason you have to marry Java with Docker, you better do it properly.

In this post, I will show you how to create an effective Java-Docker build pipeline to consistently produce small, efficient, and secure Docker images.

Be careful

There are plenty of “Docker for Java developers” tutorials out there, that unintentionally encourage some Docker bad practices.

For example:

For the current demo project, the first two tutorials took around 15 minutes to build (first build) and produced images of 1.3GB each.

Do yourself a favor and do not follow these tutorials!

What should you know about Docker?

Developers new to Docker are often tempted to think of it as just another VM. Instead, think of Docker as a “child process”. The files and packages needed for an entire VM are different from those needed by just another process running on a dev machine. Docker is even better than a child process because it allows better isolation and environmental control.

If you’re new to Docker, I suggest reading this Understanding Docker article. Docker isn’t so complex that a developer cannot understand how it works.

Dockerizing Java application

What files need to be included in a Java Application’s Docker image?

Since Docker containers are just isolated processes, your Java Docker image should only contain the files required to run your application.

What are these files?

It starts with a Java Runtime Environment (JRE). JRE is a software package, that has everything required to run a Java program. It includes an implementation of the Java Virtual Machine (JVM) with an implementation of the Java Class Library.

I recommend using OpenJDK JRE. OpenJDK is licensed under GPL with Classpath Exception. The Classpath Exception part is important. This license allows using OpenJDK with any software of any license, not just the GPL. In particular, you can use OpenJDK in proprietary software without disclosing your code.

Before using Oracle’s JDK/JRE, please read the following post: “Running Java on Docker? You’re Breaking the Law.”

Since it’s rare for Java applications to be developed using only the standard library, you most likely need to also add 3rd party Java libraries. Then add the application compiled bytecode as plain Java Class files or packaged into JAR archives. And, if you are using native code, you will need to add corresponding native libraries/packages too.

Choosing a base Docker image for Java Application

In order to choose the base Docker image, you need to answer the following questions:

  • What native packages do you need for your Java application?
  • Should you choose Ubuntu or Debian as your base image?
  • What is your strategy for patching security holes, including packages you are not using at all?
  • Do you mind paying extra (money and time) for network traffic and storage of unused files?

Some might say: “but, if all your images share the same Docker layers, you only download them just once, right?”

That’s true in theory, but the reality is often very different.

Usually, you have lots of different images: some you built lately, others a long time ago, others you pull from DockerHub. All these images do not share the same base image or version. You need to invest a lot of time to align these images to share the same base image and then keep these images up-to-date.

Some might say: “but, who cares about image size? we download them just once and run forever”.

Docker image size is actually very important.

The size has an impact on …

  • network latency - need to transfer Docker image over the web
  • storage - need to store all these bits somewhere
  • service availability and elasticity - when using a Docker scheduler, like Kubernetes, Swarm, DC/OS or other (scheduler can move containers between hosts)
  • security - do you really, I mean really need the libpng package with all its CVE vulnerabilities for your Java application?
  • development agility - small Docker images == faster build time and faster deployment

Without being careful, Java Docker images tend to grow to enormous sizes. I’ve seen 3GB Java images, where the real code and required JAR libraries only take around 150MB.

Consider using the Alpine Linux image, which is only a 5MB image, as a base Docker image. Lots of “Official Docker images” have an Alpine-based flavor.

Note: Many, but not all, Linux packages have versions compiled with the musl libc C runtime library. Sometimes you want to use a package that is compiled with glibc (the GNU C runtime library). The frolvlad/alpine-glibc image is based on the Alpine Linux image and contains glibc to enable proprietary projects compiled against glibc (e.g. OracleJDK, Anaconda) to work on Alpine.

Choosing the right Java Application server

Frequently, you also need to expose some kind of interface to reach your Java application, that runs in a Docker container.

When you deploy Java applications with Docker containers, the default Java deployment model changes.

Originally, Java server-side deployment assumes that you have already pre-configured a Java Web Server (Tomcat, WebLogic, JBoss, or other) and you are deploying an application WAR (Web Archive) packaged Java application to this server and run it together with other applications, deployed on the same server.

Lots of tools are developed around this concept, allowing you to update running applications without stopping the Java Application server, route traffic to the new application, resolve possible class loading conflicts and more.

With Docker-based deployments, you do not need these tools anymore, you don’t even need the fat “enterprise-ready” Java Application servers. The only thing that you need is a stable and scalable network server that can serve your API over HTTP/TCP or other protocol of your choice. Search Google for “embedded Java server” and take one that you like most.

For this demo, I forked Spring Boot’s REST example and modified it a bit. The demo uses Spring Boot with an embedded Tomcat server. Here is my fork on GitHub repository (blog branch).

Building a Java Application Docker image

In order to run this demo, I need to create a Docker image with JRE, the compiled and packaged Java application, and all 3rd party libraries.

Here is the Dockerfile I used to build my Docker image. This demo Docker image is based on slim Alpine Linux with OpenJDK JRE and contains the application WAR file with all dependencies embedded into it. It’s just the bare minimum required to run the demo application.

# Base Alpine Linux based image with OpenJDK JRE only
FROM openjdk:8-jre-alpine

# copy application WAR (with libraries inside)
COPY target/spring-boot-*.war /app.war

# specify default command
CMD ["/usr/bin/java", "-jar", "-Dspring.profiles.active=test", "/app.war"]

To build the Docker image, run the following command:

$ docker build -t blog/sbdemo:latest .

Running the docker history command on the created Docker image lets you see all the layers that make up the image. The three main layers are listed below, followed by the full docker history output:

  • 4.8MB Alpine Linux Layer
  • 103MB OpenJDK JRE Layer
  • 61.8MB Application WAR file

$ docker history blog/sbdemo:latest

IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
16d5236aa7c8        About an hour ago   /bin/sh -c #(nop)  CMD ["/usr/bin/java" "-...   0 B                 
e1bbd125efc4        About an hour ago   /bin/sh -c #(nop) COPY file:1af38329f6f390...   61.8 MB             
d85b17c6762e        2 months ago        /bin/sh -c set -x  && apk add --no-cache  ...   103 MB              
<missing>           2 months ago        /bin/sh -c #(nop)  ENV JAVA_ALPINE_VERSION...   0 B                 
<missing>           2 months ago        /bin/sh -c #(nop)  ENV JAVA_VERSION=8u111       0 B                 
<missing>           2 months ago        /bin/sh -c #(nop)  ENV PATH=/usr/local/sbi...   0 B                 
<missing>           2 months ago        /bin/sh -c #(nop)  ENV JAVA_HOME=/usr/lib/...   0 B                 
<missing>           2 months ago        /bin/sh -c {   echo '#!/bin/sh';   echo 's...   87 B                
<missing>           2 months ago        /bin/sh -c #(nop)  ENV LANG=C.UTF-8             0 B                 
<missing>           2 months ago        /bin/sh -c #(nop) ADD file:eeed5f514a35d18...   4.8 MB              

Running the Java Application Docker container

In order to run the demo application, run the following command:

$ docker run -d --name demo-default -p 8090:8090 -p 8091:8091 blog/sbdemo:latest

Let’s check that the application is up and running (I’m using the httpie tool here):

$ http http://localhost:8091/info

HTTP/1.1 200 OK
Content-Type: application/json
Date: Thu, 09 Mar 2017 14:43:28 GMT
Server: Apache-Coyote/1.1
Transfer-Encoding: chunked

{
    "build": {
        "artifact": "${project.artifactId}",
        "description": "boot-example default description",
        "name": "spring-boot-rest-example",
        "version": "0.1"
    }
}

Setting Docker container memory constraints

One thing you need to know about Java process memory allocation is that in reality it consumes more physical memory than specified with the -Xmx JVM option. The -Xmx option specifies only the maximum Java heap size. But the Java process is a regular Linux process, and what is interesting is how much actual physical memory this process consumes.

Or, in other words: what is the Resident Set Size (RSS) value for the running Java process?

Theoretically, in the case of a Java application, a required RSS size can be calculated by:

RSS = Heap size + MetaSpace + OffHeap size

where OffHeap consists of thread stacks, direct buffers, mapped files (libraries and jars) and JVM code itself.

There is a very good post on this topic: Analyzing java memory usage in a Docker container by Mikhail Krestjaninoff.

When using the --memory option in docker run make sure the limit is larger (at least twice) than what you specify for -Xmx.
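
For example, here is a minimal sketch (the numbers are illustrative, not tuned recommendations): give the JVM a 256MB heap and the container a 512MB limit, overriding the image's default command to pass -Xmx.

# illustrative only: 256MB heap inside a 512MB container limit
$ docker run -d --name demo-512m --memory 512m -p 8090:8090 -p 8091:8091 \
    blog/sbdemo:latest /usr/bin/java -Xmx256m -jar -Dspring.profiles.active=test /app.war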

Offtopic: Using OOM Killer instead of GC

There is an interesting JDK Enhancement Proposal (JEP) by Aleksey Shipilev: Epsilon GC (http://openjdk.java.net/jeps/8174901). This JEP proposes to develop a GC that only handles memory allocation, but does not implement any actual memory reclamation mechanism.

This GC, combined with --restart (Docker restart policy) should theoretically allow supporting “Extremely short lived jobs” implemented in Java.

For ultra-performance-sensitive applications, where developers are conscious about memory allocations or want to create completely garbage-free applications, a GC cycle may be considered an implementation bug that wastes cycles for no good reason. In such a use case, it could be better to let the OOM (Out of Memory) Killer kill the process and use the Docker restart policy to restart it.

Anyway, Epsilon GC is not available yet, so for the moment it’s just an interesting theoretical use case.

Building Java applications with Builder container

As you can probably see, in the previous step, I did not explain how I’ve created the application WAR file.

Of course, there is a Maven project file pom.xml, which most Java developers should be familiar with. But, in order to actually build, you need to install the same Java build tools (JDK and Maven) on every machine where you are building the application. You need to have the same versions, use the same repositories and share the same configurations. While it’s possible, managing different projects that rely on different tools, versions, configurations, and development environments can quickly become a nightmare.

What if you also want to run a build on a clean machine that does not have Java or Maven installed? What should you do?

Java Builder Container

Docker can help here too. With Docker, you can create and share portable development and build environments. The idea is to create a special Builder Docker image, that contains all tools you need to properly build your Java application, e.g.: JDK, Ant, Maven, Gradle, SBT or others.

To create a really useful Builder Docker image, you need to know well how your Java build tools work and how docker build invalidates the build cache. Without proper design, you will end up with ineffective and slow builds.

Running Maven in Docker

Java development is hard to imagine without some extra build tools. There are multiple Java build tools out there, but most of them share similar concepts and serve the same targets: resolve cumbersome package dependencies and run different build tasks, such as compile, lint, test, package, and deploy.

While most of these tools were created nearly a generation ago, they are still very popular and widely used by Java developers.

In this post, I will use Maven, but the same approach can be applied to Gradle, SBT, and other less popular Java Build tools.

It’s important to learn how your Java build tool works and how it can be tuned. Apply this knowledge when creating a Builder Docker image and deciding how you run a Builder Docker container.

Maven uses the project-level pom.xml file to resolve project dependencies. It downloads missing JAR files from private and public Maven repositories, and caches these files for future builds. Thus, next time you run your build, it won’t download anything if your dependencies have not changed.

Official Maven Docker image: should you use it?

The Maven team provides official Docker images. There are multiple images (under different tags) that allow you to select an image that answers your needs. Take a deeper look at the Dockerfile files and mvn-entrypoint.sh shell scripts when selecting the Maven image to use.

There are two flavors of official Maven Docker images: regular images (JDK version, Maven version, and Linux distro) and onbuild images.

What is the official Maven image good for?

The official Maven image does a good job containerizing the Maven tool itself. The image contains a specific JDK and Maven version. Using such an image, you can run a Maven build on any machine without installing a JDK and Maven.

Example: running mvn clean install on local folder

$ docker run -it --rm --name my-maven-project -v "$PWD":/usr/src/app -w /usr/src/app maven:3.2-jdk-7 mvn clean install

For the official Maven images, the Maven local repository is placed inside a Docker data volume. That means all downloaded dependencies are not part of the image and will disappear once the Maven container is destroyed. If you do not want to download dependencies on every build, mount the Maven repository Docker volume to some persistent storage (at least a local folder on the Docker host).

Example: running mvn clean install on local folder with properly mounted Maven local repository

$ docker run -it --rm --name my-maven-project -v "$PWD":/usr/src/app -v "$HOME"/.m2:/root/.m2 -w /usr/src/app maven:3.2-jdk-7 mvn clean install

Now, let’s take a look at onbuild Maven Docker images.

What is Maven onbuild image?

The Maven onbuild Docker image exists to “simplify” the developer’s life, allowing them to skip writing a Dockerfile. Actually, a developer should still write a Dockerfile, but it’s usually enough to have a single line in it:

FROM maven:<version>-onbuild

Looking into onbuild Dockerfile on the GitHub repository …

FROM maven:<version>

RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app

ONBUILD ADD . /usr/src/app
ONBUILD RUN mvn install

… you can see several Dockerfile commands with the ONBUILD prefix. The ONBUILD tells Docker to postpone the execution of these build commands until building a new image that inherits from the current image.

In our example, two build commands will be executed when you build an application Dockerfile created FROM maven:<version>-onbuild:

  • Add current folder (all files, if you are not using .dockerignore) to the new Docker image
  • Run mvn install Maven target

The onbuild Maven Docker image is not as useful as the previous image.

First of all, it copies everything from the current directory, so do not use it without a properly configured .dockerignore file.
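
For reference, here is a minimal .dockerignore sketch for a Maven project; the entries are illustrative assumptions, so adjust them to your own layout:

# version control and IDE metadata
.git
.idea
# build output and local junk
target
*.log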

Then, think: what kind of image are you trying to build?

The new image, created from onbuild Maven Docker image, includes JDK, Maven, application code (and potentially all files from current directory), and all files produced by Maven install phase (compiled, tested and packaged app; plus lots of build junk files you do not really need).

So, this Docker image contains everything, but, for some strange reason, does not contain a local Maven repository. I have no idea why the Maven team created this image.

Recommendation: Do not use Maven onbuild images!

If you just want to use Maven tool, use non-onbuild image.

If you want to create proper Builder image, I will show you how to do this later in this post.

Where to keep Maven cache?

The official Maven Docker image chooses to keep the Maven cache folder outside of the container, exposing it as a Docker data volume with the VOLUME /root/.m2 instruction in the Dockerfile. A Docker data volume is a directory within one or more containers that bypasses the Docker Union File System; in simple words, it’s not part of the Docker image.

What you should know about Docker data volumes:

  • Volumes are initialized when a container is created.
  • Data volumes can be shared and reused among containers.
  • Changes to a data volume are made directly to the mounted endpoint (usually some directory on host, but can be some storage device too)
  • Changes to a data volume will not be included when you update an image or persist Docker container.
  • Data volumes persist even if the container itself is deleted.

So, in order to reuse Maven cache between different builds, mount a Maven cache data volume to some persistent storage (for example, a local directory on the Docker host).

$ docker run -it --rm --volume "$PWD"/pom.xml:/usr/src/app/pom.xml --volume "$HOME"/.m2:/root/.m2 maven:3-jdk-8-alpine mvn install

The command above runs the official Maven Docker image (Maven 3 and OpenJDK 8), mounts the project pom.xml file into the working directory and mounts the "$HOME"/.m2 folder as the Maven cache data volume.

Maven running inside this Docker container will download all required JAR files into the host’s local folder $HOME/.m2. Next time you create a new Maven Docker container for the same pom.xml file and the same cache mount, Maven will reuse the cache and download only missing or updated JAR files.
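
As an alternative to a host folder, you might keep the cache in a named Docker volume; a hedged sketch (the volume name maven-repo is arbitrary):

# create a named volume for the Maven local repository
$ docker volume create --name maven-repo

# mount it as the Maven cache for every build container
$ docker run -it --rm -v "$PWD":/usr/src/app -v maven-repo:/root/.m2 -w /usr/src/app maven:3-jdk-8-alpine mvn install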

Maven Builder Docker image

First, let’s try to formulate what is the Builder Docker image and what should it contain?

A Builder is a Docker image that contains everything needed to create a reproducible build on any machine and at any point in time.

So, what should it contain?

  • Linux shell and some tools - I prefer Alpine Linux
  • JDK (version) - for the javac compiler
  • Maven (version) - Java build tool
  • Application source code and pom.xml file/s - the application code SNAPSHOT at a specific point in time; just code, no need to include a .git repository or other files
  • Project dependencies (Maven local repository) - all POM and JAR files you need to build and test the Java application, at any time, even offline, even if a library disappears from the web

The Builder image captures code, dependencies, and tools at a specific point in time and stores them inside a Docker image. The Builder container can then be used to create the application “binaries” on any machine, at any time, even without an internet connection (or with a poor one).

Here is the sample Dockerfile for my demo Builder:

FROM openjdk:8-jdk-alpine

# ----
# Install Maven
RUN apk add --no-cache curl tar bash

ARG MAVEN_VERSION=3.3.9
ARG USER_HOME_DIR="/root"

RUN mkdir -p /usr/share/maven && \
  curl -fsSL http://apache.osuosl.org/maven/maven-3/$MAVEN_VERSION/binaries/apache-maven-$MAVEN_VERSION-bin.tar.gz | tar -xzC /usr/share/maven --strip-components=1 && \
  ln -s /usr/share/maven/bin/mvn /usr/bin/mvn

ENV MAVEN_HOME /usr/share/maven
ENV MAVEN_CONFIG "$USER_HOME_DIR/.m2"
# speed up Maven JVM a bit
ENV MAVEN_OPTS="-XX:+TieredCompilation -XX:TieredStopAtLevel=1"

ENTRYPOINT ["/usr/bin/mvn"]

# ----
# Install project dependencies and keep sources 

# make source folder
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app

# install maven dependency packages (keep in image)
COPY pom.xml /usr/src/app
RUN mvn -T 1C install && rm -rf target

# copy other source files (keep in image)
COPY src /usr/src/app/src

Let’s go over this Dockerfile and I will try to explain the reasoning behind each command.

  • FROM openjdk:8-jdk-alpine - select and freeze the JDK version: OpenJDK 8 on Alpine Linux
  • Install Maven
    • ARG ... - Use build arguments to allow overriding Maven version and local repository location (MAVEN_VERSION and USER_HOME_DIR) with docker build --build-arg ...
    • RUN mkdir -p ... curl ... tar ... - Download and install (untar and ln -s) Apache Maven
    • Speed up Maven JVM a bit: MAVEN_OPTS="-XX:+TieredCompilation -XX:TieredStopAtLevel=1", read the following post
  • RUN mvn -T 1C install && rm -rf target - download project dependencies:
    • Copy the project pom.xml file, run the mvn install command and remove the build artifacts (as far as I know, there is no Maven command that will let you download dependencies without installing them)
    • This Docker image layer will be rebuilt only when project’s pom.xml file changes
  • COPY src /usr/src/app/src - copy project source files (source, tests, and resources)

Note: if you are using Maven Surefire plugin and want to have all dependencies for the offline build, make sure to lock down Surefire test provider.

When you build a new Builder version, I suggest you use the --cache-from option, passing the previous Builder image to it. This will allow you to reuse any unmodified Docker layer and avoid obsolete downloads most of the time (if pom.xml did not change or you did not decide to upgrade Maven or the JDK).

$ # pull latest (or specific version) builder image
$ docker pull myrep/mvn-builder:latest
$ # build new builder
$ docker build -t myrep/mvn-builder:latest --cache-from myrep/mvn-builder:latest .

Use the Builder container to run tests:

$ # run tests - test results are saved into $PWD/target/surefire-reports
$ docker run -it --rm -v "$PWD"/target:/usr/src/app/target myrep/mvn-builder -T 1C -o test

Use the Builder container to create the application WAR:

$ # create application WAR file (skip tests) - $PWD/target/spring-boot-rest-example-0.3.0.war
$ docker run -it --rm -v "$PWD"/target:/usr/src/app/target myrep/mvn-builder package -T 1C -o -Dmaven.test.skip=true

Summary

Take a look at the images below:

REPOSITORY      TAG     IMAGE ID     CREATED        SIZE
sbdemo/run      latest  6f432638aa60 7 minutes ago  143 MB
sbdemo/tutorial 1       669333d13d71 12 minutes ago 1.28 GB
sbdemo/tutorial 2       38634e4d9d5e 3 hours ago    1.26 GB
sbdemo/builder  mvn     2d325a403c5f 5 days ago     263 MB

  • sbdemo/run:latest - Docker image for demo runtime: Alpine, OpenJDK JRE only, demo WAR
  • sbdemo/builder:mvn - Builder Docker image: Alpine, OpenJDK 8, Maven 3, code, dependency
  • sbdemo/tutorial:1 - Docker image created following first tutorial (just for reference)
  • sbdemo/tutorial:2 - Docker image created following second tutorial (just for reference)

Bonus: Build flow automation

In this section, I will show how to use Docker build flow automation service to automate and orchestrate all steps from this post.

Build Pipeline Steps

I’m going to use Codefresh.io Docker CI/CD service (the company I’m working for) to create a Builder Docker image for Maven, run tests, create application WAR, build Docker image for application and deploy it to DockerHub.

The Codefresh automation flow YAML (also called a pipeline) is pretty straightforward:

  • it contains an ordered list of steps
  • each step can be of type:
    • build - for the docker build command
    • push - for docker push
    • composition - for creating an environment, specified with docker-compose
    • freestyle (default, if not specified) - for the docker run command
  • the /codefresh/volume/ data volume (git clone and files generated by steps) is mounted into each step
  • the current working directory for each step is set to /codefresh/volume/ by default (can be changed)

For detailed description and other examples, take a look at the documentation.

For my demo flow, I’ve created the following automation steps:

  1. mvn_builder - create the Maven Builder Docker image
  2. mvn_test - execute tests in the Builder container, place test results into the /codefresh/volume/target/surefire-reports/ data volume folder
  3. mvn_package - create the application WAR file, place the created file into the /codefresh/volume/target/ data volume folder
  4. build_image - build the application Docker image with the JRE and application WAR file
  5. push_image - tag and push the application Docker image to DockerHub

Here is the full Codefresh YAML:

version: '1.0'

steps:

  mvn_builder:
    type: build
    description: create Maven builder image
    dockerfile: Dockerfile.build
    image_name: <put_your_repo_here>/mvn-builder

  mvn_test:
    description: run unit tests 
    image: ${{mvn_builder}}
    commands:
      - mvn -T 1C -o test
  
  mvn_package:
    description: package application and dependencies into WAR 
    image: ${{mvn_builder}}
    commands:
      - mvn package -T 1C -o -Dmaven.test.skip=true

  build_image:
    type: build
    description: create Docker image with application WAR
    dockerfile: Dockerfile
    working_directory: ${{main_clone}}/target
    image_name: <put_your_repo_here>/sbdemo

  push_image:
    type: push
    description: push application image to DockerHub
    candidate: '${{build_image}}'
    tag: '${{CF_BRANCH}}'
    credentials:
      # set docker registry credentials in project configuration
      username: '${{DOCKER_USER}}'
      password: '${{DOCKER_PASS}}'

Hope you find this post useful. I look forward to your comments and any questions you have.


This is a working draft version. The final post version is published at Codefresh Blog on March 22, 2017.

02 Jan 2017, 18:00

Everyday hacks for Docker

In this post, I’ve decided to share some useful commands and tools I frequently use when working with the amazing Docker technology. There is no particular order or “coolness level” to these “hacks”. For each one, I will try to present the use case and how the specific command or tool helps me with my work.

Docker Hacks

Cleaning up

Working with Docker for some time, you start to accumulate development junk: unused volumes, networks, exited containers and unused images.

One command to “rule them all”

$ docker system prune

prune is a very useful command (it also works for the volume and network sub-commands), but it’s only available starting with Docker 1.13. So, if you are using an older Docker version, the following commands can help you replace the prune command.
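
For reference, on Docker 1.13+ the scoped prune variants look like this:

# remove all unused local volumes
$ docker volume prune
# remove all unused networks
$ docker network prune
# remove dangling images (add -a for all unused images)
$ docker image prune
# remove all stopped containers
$ docker container prune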

Remove dangling volumes

Dangling volumes are volumes not in use by any container. To remove them, combine two commands: first, list the volume IDs of dangling volumes and then remove them.

$ docker volume rm $(docker volume ls -q -f "dangling=true")

Remove exited containers

The same principle works here too: first, list the containers (only IDs) you want to remove (with a filter) and then remove them (consider rm -f to force removal).

$ docker rm $(docker ps -q -f "status=exited")

Remove dangling images

Dangling images are untagged Docker images that are leaves of the image tree (not intermediate layers).

$ docker rmi $(docker images -q -f "dangling=true")

Autoremove interactive containers

When you run a new interactive container and want to avoid typing the rm command after it exits, use the --rm option. Then, when you exit the created container, it will be automatically destroyed.

$ docker run -it --rm alpine sh

Inspect Docker resources

jq - jq is a lightweight and flexible command-line JSON processor. It is like sed for JSON data - you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text.

docker info and docker inspect commands can produce output in JSON format. Combine these commands with jq processor.

Pretty JSON and jq processing

# show whole Docker info
$ docker info --format "{{json .}}" | jq .

# show Plugins only
$ docker info --format "{{json .Plugins}}" | jq .

# list IP addresses for all containers connected to 'bridge' network
$ docker network inspect bridge -f '{{json .Containers}}' | jq '.[] | {cont: .Name, ip: .IPv4Address}'

Watching containers lifecycle

Sometimes you want to see containers being activated and exited when you run some docker commands or try different restart policies. The watch command combined with docker ps can be pretty useful here. The docker stats command, even with the --format option, is not useful for this use case since it does not show the same info as the docker ps command.

Display a table with ‘ID Image Status’ for active containers and refresh it every 2 seconds

$ watch -n 2 'docker ps --format "table {{.ID}}\t {{.Image}}\t {{.Status}}"'

Enter into host/container Namespace

Sometimes you want to connect to the Docker host. The ssh command is the default option, but it may be unavailable (due to security settings or firewall rules) or undocumented (try to find out how to ssh into the Docker for Mac VM).

nsenter, by Jérôme Petazzoni, is a small and very useful tool for the above cases. nsenter allows you to enter namespaces. I like to use the minimalistic (580 kB) walkerlee/nsenter Docker image.

Enter into Docker host

Use --pid=host to enter the Docker host namespaces

# get a shell into Docker host
$ docker run --rm -it --privileged --pid=host walkerlee/nsenter -t 1 -m -u -i -n sh

Enter into ANY container

It’s also possible to enter any container with nsenter and --pid=container:[id OR name]. But in most cases, it’s better to use the standard docker exec command. The main difference is that nsenter doesn’t enter the cgroups, and therefore evades resource limitations (which can be useful for debugging).

# get a shell into 'redis' container namespace
$ docker run --rm -it --privileged --pid=container:redis walkerlee/nsenter -t 1 -m -u -i -n sh

Heredoc Docker container

Suppose you want to get some tool as a Docker image, but you do not want to search for a suitable image or create a new Dockerfile (there is no need to keep it for future use, for example). Sometimes storing a Docker image definition in a file looks like overkill - you need to decide how you edit, store and share this Dockerfile. Sometimes it’s better just to have a single-line command that you can copy, share, embed into a shell script or create a special command alias for. So, when you want to create a new ad-hoc container with a single command, try the Heredoc approach.

Create Alpine based container with ‘htop’ tool

$ docker build -t htop - << EOF
FROM alpine
RUN apk --no-cache add htop
EOF

Docker command completion

Docker CLI syntax is very rich and constantly growing: new commands and new options are added all the time. It’s hard to remember every possible command and option, so having nice command completion in your terminal is a must-have.

Command completion is a kind of terminal plugin that lets you auto-complete or auto-suggest what to type next by hitting the tab key. Docker command completion works both for commands and options. The Docker team prepared command completion for the docker, docker-machine and docker-compose commands, both for Bash and Zsh.

If you are using a Mac and Homebrew, then installing Docker command completion is pretty straightforward.

# Tap homebrew/completions to gain access to these
$ brew tap homebrew/completions

# Install completions for docker suite
$ brew install docker-completion
$ brew install docker-compose-completion
$ brew install docker-machine-completion

For non-Mac installs, read the official Docker documentation: docker engine, docker-compose and docker-machine

Start containers automatically

When you run a process in a Docker container, it may fail for multiple reasons. Sometimes it is enough to rerun the failed container to fix the failure. If you are using a Docker orchestration engine, like Swarm or Kubernetes, the failed service will be restarted automatically. But if you are using plain Docker and want to restart a container, either based on the exit code of the container’s main process or always (regardless of the exit code), there is a very helpful option for the docker run command: --restart.

Restart always

Restart the redis container with a restart policy of always so that if the container exits, Docker will restart it.

$ docker run --restart=always redis

Restart container on failure

Restart the redis container with a restart policy of on-failure and a maximum restart count of 10.

$ docker run --restart=on-failure:10 redis

Network tricks

There are cases when you want to create a new container and connect it to an already existing network stack. It can be the Docker host network or another container’s network. This can be pretty useful for debugging and auditing network issues. The docker run --network/--net option supports this use case.

Use Docker host network stack

$ docker run --net=host ...

The new container will attach to the same network interfaces as the Docker host.

Use another container’s network stack

$ docker run --net=container:<name|id> ...

The new container will attach to the same network interfaces as the other container. The target container can be specified by id or name.

Attachable overlay network

Using the Docker engine in swarm mode, you can create a multi-host overlay network on a manager node. When you create a new swarm service, you can attach it to the previously created overlay network.

Sometimes, to inspect network configuration or debug network issues, you want to attach a new Docker container, filled with different network tools, to an existing overlay network, and do this with the docker run command rather than creating a new “debug service”.

Docker 1.13 brings a new option to the docker network create command: --attachable. The attachable option enables manual container attachment.

# create an attachable overlay network
$ docker network create --driver overlay --attachable mynet
# create net-tools container and attach it to mynet overlay network
$ docker run -it --rm --net=mynet net-tools sh

This is a working draft version. The final post version is published at Codefresh Blog on January 5, 2017.

18 Dec 2016, 14:00

Deploy Docker Compose (v3) to Swarm (mode) Cluster

Disclaimer: all code snippets below work only with Docker 1.13+

TL;DR

Docker 1.13 simplifies deployment of a composed application to a swarm (mode) cluster. And you can do it without creating a new dab (Distributed Application Bundle) file, just using the familiar and well-known docker-compose.yml syntax (with some additions) and the --compose-file option.

Compose to Swarm

Swarm cluster

Docker Engine 1.12 introduced a new swarm mode for natively managing a cluster of Docker Engines, called a swarm. Docker swarm mode implements the Raft Consensus Algorithm and no longer requires an external key-value store, such as Consul or etcd.

If you want to run a swarm cluster on a developer’s machine, there are several options.

The first option, and the most widely known one, is to use the docker-machine tool with some virtual driver (VirtualBox, Parallels or other).

But, in this post I will use another approach: using the docker-in-docker Docker image with Docker for Mac; see more details in my Docker Swarm cluster with docker-in-docker on MacOS post.

Docker Registry mirror

When you deploy a new service on a local swarm cluster, I recommend setting up a local Docker registry mirror and running all swarm nodes with the --registry-mirror option, pointing to the local Docker registry. By running a local Docker registry mirror, you can keep most of the redundant image fetch traffic on your local network and speed up service deployment.
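
As a rough sketch of this setup (the pull-through cache settings follow the registry:2 image documentation, but treat the exact values as assumptions; <registry-mirror-host> is a placeholder for an address reachable from the node):

# run a local Docker registry mirror (pull-through cache for Docker Hub)
$ docker run -d --name registry-mirror -p 5000:5000 \
    -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io registry:2

# start a docker-in-docker worker node that uses the mirror
$ docker run -d --privileged --name worker-1 docker:1.12.1-dind \
    --registry-mirror=http://<registry-mirror-host>:5000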

Docker Swarm cluster bootstrap script

I’ve prepared a shell script to bootstrap a 4-node swarm cluster with a Docker registry mirror and the very nice swarm visualizer application.

The script initializes the Docker engine as a swarm master, then starts 3 new docker-in-docker containers and joins them to the swarm cluster as worker nodes. All worker nodes run with the --registry-mirror option.

Deploy multi-container application - the “old” way

Docker Compose is a tool (and deployment specification format) for defining and running composed multi-container Docker applications. Before Docker 1.12, you could use the docker-compose tool to deploy such applications to a swarm cluster. With the 1.12 release, this is not possible anymore: docker-compose can deploy your application only on a single Docker host.

In order to deploy it to a swarm cluster, you need to create a special deployment specification file (also known as a Distributed Application Bundle) in dab format (see more here).

The way to create this file is to run the docker-compose bundle command. The output of this command is a JSON file that describes a multi-container composed application with Docker images referenced by @sha256 instead of tags. Currently, the dab file format does not support multiple settings from docker-compose.yml and does not allow you to use options supported by the docker service create command.

Such a pity: the dab bundle format looks promising, but currently it is totally useless (at least in Docker 1.12).

Deploy multi-container application - the “new” way

With Docker 1.13, the “new” way to deploy a multi-container composed application is to use docker-compose.yml again (hurrah!). Kudos to Docker team!

Note: you do not need the docker-compose tool, only a YAML file in docker-compose format (version: "3")

$ docker deploy --compose-file docker-compose.yml myapp

Docker compose v3 (version: "3")

So, what’s new in docker compose version 3?

First, I suggest you take a deeper look at the docker-compose schema. It is an extension of the well-known docker-compose format.

Note: docker-compose tool (ver. 1.9.0) does not support docker-compose.yaml version: "3" yet.

The most visible change is around swarm service deployment. Now you can specify all the options supported by the docker service create/update commands:

  • number of service replicas (or global service)
  • service labels
  • hard and soft limits for service (container) CPU and memory
  • service restart policy
  • service rolling update policy
  • deployment placement constraints link

Docker compose v3 example

I’ve created a “new” compose file (v3) for the classic “Cats vs. Dogs” example. This example application contains 5 services with the following deployment configurations (a sketch of the worker service’s deploy section follows this list):

  1. voting-app - a Python webapp which lets you vote between two options; requires redis
  2. redis - Redis queue which collects new votes; deployed on swarm manager node
  3. worker - .NET worker which consumes votes and stores them in db;
    • # of replicas: 2 replicas
    • hard limit: max 25% CPU and 512MB memory
    • soft limit: max 25% CPU and 256MB memory
    • placement: on swarm worker nodes only
    • restart policy: restart on-failure, with 5 seconds delay, up to 3 attempts
    • update policy: one by one, with 10 seconds delay and 0.3 failure rate to tolerate during the update
  4. db - Postgres database backed by a Docker volume; deployed on swarm manager node
  5. result-app - Node.js webapp which shows the results of the voting in real time; 2 replicas, deployed on swarm worker nodes
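
To make these options concrete, here is a hedged sketch of the worker service section from such a compose v3 file; the image reference is a placeholder, and max_failure_ratio support is an assumption that may depend on the exact compose file and Docker version:

version: "3"

services:
  worker:
    image: <put_your_repo_here>/worker   # placeholder image reference
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: '0.25'
          memory: 512M
        reservations:
          cpus: '0.25'
          memory: 256M
      placement:
        constraints:
          - node.role == worker
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
      update_config:
        parallelism: 1
        delay: 10s
        max_failure_ratio: 0.3   # assumption: may require a newer compose file version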

Run the docker deploy --compose-file docker-compose.yml command to deploy my version of “Cats vs. Dogs” application on a swarm cluster.


Hope you find this post useful. I look forward to your comments and any questions you have.


This is a working draft version. The final post version is published at Codefresh Blog on December 25, 2016.

26 Nov 2016, 16:00

Do not ignore .dockerignore

TL;DR

Tip: Consider defining and using a .dockerignore file for every Docker image you are building. It can help you reduce the Docker image size, speed up docker build and avoid unintended secret exposure.

Overloaded container ship

Docker build context

The docker build command is used to build a new Docker image. There is one argument you pass to the build command: the build context.

So, what is the Docker build context?

First, remember that Docker is a client-server application; it consists of a Docker client and a Docker server (also known as the daemon). The Docker client command-line tool talks to the Docker server and asks it to do things. One of these things is build: building a new Docker image. The Docker server can run on the same machine as the client, on a remote machine or in a virtual machine, which itself can be local, remote or even run on some cloud IaaS.

Why is that important and how is the Docker build context related to this fact?

In order to create a new Docker image, the Docker server needs access to the files you want to create the Docker image from. So, you need to somehow send these files to the Docker server. These files are the Docker build context. The Docker client packs all build context files into a tar archive and uploads this archive to the Docker server. By default, the client takes all files (and folders) in the current working directory and uses them as the build context. It can also accept an already created tar archive or a git repository. In the case of a git repository, the client will clone it with submodules into a temporary folder and create a build context archive from it.
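
For example, the three forms of passing a build context look like this (the git URL is a placeholder):

# default: use the current working directory as the build context
$ docker build -t myapp .

# use a git repository as the build context (the client clones it and sends the archive)
$ docker build -t myapp https://github.com/<user>/<repo>.git

# use an existing tar archive as the build context (read from stdin)
$ docker build -t myapp - < context.tar.gz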

Impact on Docker build

The first output line, that you see, running the docker build command is:

Sending build context to Docker daemon 45.3 MB
Step 1: FROM ...

This should make things clear. Every time you run the docker build command, the Docker client creates a new build context archive and sends it to the Docker server. So, you are always paying this “tax”: the time it takes to create the archive, plus the storage, network traffic and latency it costs.

Tip: The rule of thumb is: do not add files to the build context if you do not need them in your Docker image.

The .dockerignore file

The .dockerignore file is the tool that can help you define the Docker build context you really need. Using this file, you can specify ignore rules, and exceptions from these rules, for files and folders that won’t be included in the build context and thus won’t be packed into the archive and uploaded to the Docker server.

Why should you care?

Indeed, why should you care? Computers today are fast, networks are also pretty fast (hopefully) and storage is cheap. So, this “tax” may not be that big, right? I will try to convince you that you should care.

Reason #1: Docker image size

The world of software development is shifting lately towards continuous delivery, elastic infrastructure and microservice architecture.

How is that related?

Your systems are composed of multiple components (or microservices), each of them running inside a Linux container. There might be tens or hundreds of services and even more service instances. These service instances can be built and deployed independently of each other, and this can be done for every single code commit. More than that, elastic infrastructure means that new compute nodes can be added to or removed from the system, and its microservices can move from node to node to support scale or availability requirements. That means your Docker images will be frequently built and transferred.

When you practice continuous delivery and microservice architecture, image size and image build time do matter.

Reason #2: Unintended secrets exposure

Not controlling your build context can also lead to unintended exposure of your code, commit history, and secrets (keys and credentials).

If you copy files into your Docker image with the ADD . or COPY . command, you may unintentionally include your source files, the whole git history (the .git folder), secret files (like .aws, .env, private keys), caches and other files, not only in the Docker build context, but also in the final Docker image.

There are multiple Docker images currently available on DockerHub that expose application source code, passwords, keys and credentials (for example, Twitter Vine).

Reason #3: The Docker build - cache invalidation

A common pattern is to inject an application’s entire codebase into an image using an instruction like this:

COPY . /usr/src/app

In this case, we’re copying the entire build context into the image. It’s also important to understand that every Dockerfile command generates a new layer. So, if any file included in the build context changes, this change will invalidate the build cache for the COPY . /usr/src/app layer, and a new image layer will be generated on the next build.

If your working directory contains files that are frequently updated (logs, test results, git history, temporary cache files and similar), you are going to regenerate this layer for every docker build run.

The .dockerignore syntax

The .dockerignore file is similar to the .gitignore file used by the git tool. Like .gitignore, it allows you to specify patterns for files and folders that should be ignored by the Docker client when generating the build context. While the .dockerignore syntax used to describe ignore patterns is similar to .gitignore, it’s not the same.

The .dockerignore pattern matching syntax is based on Go filepath.Match() function and includes some additions.

Here is the complete syntax for the .dockerignore:

pattern:
    { term }
term:
    '*'         matches any sequence of non-Separator characters
    '?'         matches any single non-Separator character
    '[' [ '^' ] { character-range } ']'
                character class (must be non-empty)
    c           matches character c (c != '*', '?', '\\', '[')
    '\\' c      matches character c

character-range:
    c           matches character c (c != '\\', '-', ']')
    '\\' c      matches character c
    lo '-' hi   matches character c for lo <= c <= hi

additions:
    '**'        matches any number of directories (including zero)
    '!'         lines starting with ! (exclamation mark) can be used to make exceptions to exclusions
    '#'         lines starting with this character are ignored: use it for comments

Note: Using the ! character is pretty tricky. Combinations of it with patterns before and after the line containing the ! character can be used to create more advanced rules.

Examples

# ignore .git and .cache folders
.git
.cache
# ignore all *.class files in all folders, including build root
**/*.class
# ignore all markdown files (md) beside all README*.md other than README-secret.md
*.md
!README*.md
README-secret.md

Next


Hope you find this post useful. I look forward to your comments and any questions you have.


This is a working draft version. The final post version is published at Codefresh Blog on December 8, 2016.

06 Oct 2016, 16:00

Docker Swarm cluster with docker-in-docker on MacOS

TL;DR

Docker-in-Docker (dind) can help you run a Docker Swarm cluster on your MacBook with only Docker for Mac (v1.12+). No VirtualBox, docker-machine, Vagrant or other apps are required.

The Beginning

One day, I decided to try running a Docker 1.12 Swarm cluster on my MacBook Pro. The Docker team did a great job releasing Docker for Mac, and since then I have forgotten all the problems I used to have with boot2docker. I really like Docker for Mac: it’s fast, lightweight, tightly integrated with MacOS and significantly simplifies my life when working in a changing network environment. The only missing thing is that you can only create and work with a single Docker daemon running inside the xhyve VM. Shit! I want a cluster.

Of course, it’s possible to create a Swarm cluster with the docker-machine tool, but it’s not MacOS friendly and requires installing additional VM software, like VirtualBox or Parallels (why? I already have xhyve!). I have different networks at the work office and at home. At work I’m behind a corporate proxy with multiple firewall filters. At home, of course, life is better. docker-machine requires creating dedicated VMs for each environment and thus forces me to juggle different shell scripts when I switch from one to the other. It’s possible, but it’s not fun.

I just want to have a multi-node Swarm cluster with Docker for Mac (and the xhyve VM). As simple as that.

I’m a lazy person and if there is an already existing solution, I will always choose it, even if it’s not ideal. So, after googling for a while, I failed to find any suitable solution or blog post, and decided to create my own and share it with you.

The Idea

The basic idea is to use Docker for Mac for running the Swarm master and several Docker-in-Docker containers for running the Swarm worker nodes.

First, let’s init our Swarm master:

# init Swarm master
docker swarm init

… keep Swarm join token:

# get join token
SWARM_TOKEN=$(docker swarm join-token -q worker)

… and Docker xhyve VM IP:

# get Swarm master IP (Docker for Mac xhyve VM IP)
SWARM_MASTER=$(docker info | grep -w 'Node Address' | awk '{print $3}')

… now let’s create 3 worker nodes and join them to our cluster:

# run NUM_WORKERS workers with SWARM_TOKEN
NUM_WORKERS=3
for i in $(seq "${NUM_WORKERS}"); do
  docker run -d --privileged --name worker-${i} --hostname=worker-${i} -p ${i}2375:2375 docker:1.12.1-dind
  docker --host=localhost:${i}2375 swarm join --token ${SWARM_TOKEN} ${SWARM_MASTER}:2377
done

Listing all our Swarm cluster nodes:

# list Swarm nodes :)
docker node ls

… you should see something like this:

ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
1f6z8pioh3vuaz84gyp0biqt0    worker-2  Ready   Active
35z72o6zjhs9u1h99lrwzvx5n    worker-3  Ready   Active
d9ph5cmc604wp1vhhs754nnxx *  moby      Ready   Active        Leader
dj3gnpv86uqrw4b9mo9ux4jb5    worker-1  Ready   Active

That’s all folks! Now you have a running Swarm cluster on your MacBook and your Docker client is talking to the Swarm master.

Nice tip:

You can use the very nice Swarm visualizer by Mano Marks to see your Swarm cluster “in action”.

Run it with the following command:

docker run -it -d -p 8000:8000 -e HOST=localhost -e PORT=8000 -v /var/run/docker.sock:/var/run/docker.sock manomarks/visualizer

And you should be able to see something like this (after you deploy some demo app):

Docker Swarm visualizer: Voting App

01 Aug 2016, 20:00

Network emulation for Docker containers

TL;DR

Pumba netem delay and netem loss commands can emulate network delay and packet loss between Docker containers, even on a single host. Give it a try!

Introduction

Microservice architecture has been adopted by software teams as a way to deliver business value faster. Container technology enables delivery of microservices into any environment. Docker has accelerated this by providing an easy to use toolset for development teams to build, ship, and run distributed applications. These applications can be composed of hundreds of microservices packaged in Docker containers.

In a recent NGINX survey [Finding #7], the “biggest challenge holding back developers” is the trade-off between quality and speed. As Martin Fowler indicates, testing strategies in microservices architecture can be very complex. Creating a realistic and useful testing environment is an aspect of this complexity.

One challenge is simulating network failures to ensure resiliency of applications and services.

The network is a critical arterial system for ensuring reliability for any distributed application. Network conditions are different depending on where the application is accessed. Network behavior can greatly impact the overall application availability, stability, performance, and user experience (UX). It’s critical to simulate and understand these impacts before the user notices. Testing for these conditions requires conducting realistic network tests.

After Docker containers are deployed in a cluster, all communication between containers happen over the network. These containers run on a single host, different hosts, different networks, and in different datacenters.

How can we test for the impact of network behavior on the application? What can we do to emulate different network properties between containers on a single host or among clusters on multiple hosts?

Pumba with Network Emulation

Pumba is a chaos testing tool for Docker containers, inspired by Netflix Chaos Monkey. The main benefit is that it works with containers instead of VMs. Pumba can kill, stop, restart running Docker containers or pause processes within specified containers. We use it for resilience testing of our distributed applications. Resilience testing ensures reliability of the system. It allows the team to verify their application recovers correctly regardless of any event (expected or unexpected) without any loss of data or functionality. Pumba simulates these events for distributed and containerized applications.

Pumba netem

We enhanced Pumba with network emulation capabilities, starting with delay and packet loss. Using the pumba netem command, we can apply delay or packet loss to any Docker container. Under the hood, Pumba uses Linux kernel traffic control (tc) with the netem queueing discipline. To make this work, we need to add the iproute2 package to the Docker images we want to test. Some base Docker images already include the iproute2 package.
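
For example, a minimal sketch of baking iproute2 into an Alpine-based test image (a hypothetical Dockerfile fragment):

FROM alpine
# add iproute2 so the tc/netem tooling is available inside the container
RUN apk add --no-cache iproute2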

Pumba netem delay and netem loss commands can emulate network delay and packet loss between Docker containers, even on a single host.

Linux has built-in network emulation capabilities, starting from kernel 2.6.7 (released in 2004). Linux allows us to manipulate traffic control settings using the tc tool, available in iproute2; netem is an extension (queueing discipline) of the tc tool. It allows emulation of network properties: delay, packet loss, packet reordering, duplication, corruption, and bandwidth rate.

Pumba netem commands can help development teams simulate realistic network conditions as they build, ship, and run microservices in Docker containers.

Pumba wraps the low-level netem options and greatly simplifies their usage. We have made it easier to emulate different network properties for running Docker containers.

In the current release, Pumba modifies egress traffic only, by adding delay or packet loss to the specified container(s). Target containers can be specified by name (a single name or a space-separated list) or via a regular expression (RE2). Pumba modifies the container network conditions for a specified duration; after the set time interval, Pumba restores normal network conditions. Pumba also restores the original connection on a graceful shutdown of the pumba process (Ctrl-C) or when the Pumba container is stopped with the docker stop command. An option is available to apply an IP range filter to the network emulation. With this option, Pumba will modify outgoing traffic for the specified IP and will leave other outgoing traffic unchanged. Using this option, we can change network properties for specific inter-container connection(s) as well as for specific Docker networks, since each Docker network has its own IP range.

Pumba delay: netem delay

To demonstrate, we’ll run two Docker containers: one running a ping command and the other a Pumba Docker container that adds a 3 second network delay to the ping container for 1 minute. After 1 minute, the Pumba container restores the network connection properties of the ping container and exits gracefully.

# open two terminal windows: (1) and (2)

# terminal (1)
# create new 'tryme' Alpine container (with iproute2) and ping `www.example.com`
$ docker run -it --rm --name tryme alpine sh -c "apk add --update iproute2 && ping www.example.com"

# terminal (2)
# run pumba: add 3s delay to `tryme` container for 1m
$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock gaiaadm/pumba \
         pumba netem --interface eth0 --duration 1m delay --time 3000 tryme

# See `ping` delay increased by 3000ms for 1 minute
# You can stop Pumba earlier with `Ctrl-C`

netem delay examples

This section contains more advanced network emulation examples for delay command.

# add 3 seconds delay for all outgoing packets on device `eth0` (default) of `mydb` Docker container for 5 minutes

$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock gaiaadm/pumba \
    pumba netem --duration 5m \
      delay --time 3000 \
      mydb
# add a delay of 3000ms ± 30ms, with the next random element depending 20% on the last one,
# for all outgoing packets on device `eth1` of all Docker containers whose names start with `hp`,
# for 5 minutes

$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock gaiaadm/pumba \
    pumba netem --duration 5m --interface eth1 \
      delay \
        --time 3000 \
        --jitter 30 \
        --correlation 20 \
      re2:^hp
# add a delay of 3000ms ± 40ms, where the variation in delay is described by the `normal` distribution,
# for all outgoing packets on device `eth0` of a randomly chosen Docker container from the list,
# for 5 minutes

$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock gaiaadm/pumba \
    pumba --random \
      netem --duration 5m \
        delay \
          --time 3000 \
          --jitter 40 \
          --distribution normal \
        container1 container2 container3

Pumba packet loss: netem loss, netem loss-state, netem loss-gemodel

Let’s start with the packet loss demo. Here we will run three Docker containers: an iperf server and client for sending data, and a Pumba Docker container that will add packet loss on the client container. We are using iperf, a network throughput testing tool, to demonstrate packet loss.

# open three terminal windows

# terminal (1) iperf server
# server: `-s` run in server mode; `-u` use UDP;  `-i 1` report every second
$ docker run -it --rm --name tryme-srv alpine sh -c "apk add --update iperf && iperf -s -u -i 1"

# terminal (2) iperf client
# client: `-c` client connects to <server ip>; `-u` use UDP
$ docker run -it --rm --name tryme alpine sh -c "apk add --update iproute2 iperf && iperf -c 172.17.0.3 -u"

# terminal (3)
# run pumba: add 20% packet loss to `tryme` container for 1m
$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock gaiaadm/pumba \
         pumba netem --duration 1m loss --percent 20 tryme

# See server report on terminal (1) 'Lost/Total Datagrams' - should see lost packets there

It is generally understood that packet loss distribution in IP networks is “bursty”. To simulate more realistic packet loss events, different probability models are used. Pumba currently supports 3 different loss probability models for packet loss, and defines a separate loss command for each probability model:

  • loss - independent probability loss model (Bernoulli model); it’s the most widely used loss model, where packet losses are modeled by a random process consisting of Bernoulli trials
  • loss-state - 2-state, 3-state and 4-state Markov models
  • loss-gemodel - Gilbert and Gilbert-Elliott models

Papers on network packet loss models:

  • “Indepth: Packet Loss Burstiness” link
  • “Definition of a general and intuitive loss model for packet networks and its implementation in the Netem module in the Linux kernel.” link
  • man netem link

netem loss examples

# loss 0.3% of packets
# apply for `eth0` network interface (default) of `mydb` Docker container for 5 minutes

$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock gaiaadm/pumba \
    pumba netem --duration 5m \
      loss --percent 0.3 \
      mydb
# loss 1.4% of packets (14 packets from 1000 will be lost)
# each successive probability (of loss) depends by a quarter on the last one
#   Prob(n) = .25 * Prob(n-1) + .75 * Random
# apply on `eth1` network interface  of Docker containers (name start with `hp`) for 15 minutes

$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock gaiaadm/pumba \
    pumba netem --interface eth1 --duration 15m \
      loss --percent 1.4 --correlation 25 \
      re2:^hp
# use 2-state Markov model for packet loss probability: P13=15%, P31=85%
# apply on `eth1` network interface of 3 Docker containers (c1, c2 and c3) for 12 minutes

$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock gaiaadm/pumba \
    pumba netem --interface eth1 --duration 12m \
      loss-state -p13 15 -p31 85 \
      c1 c2 c3
# use Gilbert-Elliot model for packet loss probability: p=5%, r=90%, (1-h)=85%, (1-k)=7%
# apply on `eth2` network interface of `mydb` Docker container for 9 minutes and 30 seconds

$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock gaiaadm/pumba \
    pumba netem --interface eth2 --duration 9m30s \
      loss-gemodel --pg 5 --pb 90 --one-h 85 --one-k 7 \
      mydb

Contribution

Special thanks to Neil Gehani for helping me with this post and to Inbar Shani for initial Pull Request with netem command.

Next

To see more examples of how to use Pumba with the netem commands, please refer to the Pumba GitHub Repository. We have open sourced it. We gladly accept ideas, pull requests, issues, and any other contributions.

Pumba can be downloaded as a precompiled binary (Windows, Linux and MacOS) from the GitHub project release page. It’s also available as a Docker image.

Pumba GitHub Repository