
Tuesday, June 09, 2015

Building an S3 object store with Docker, Cassandra and Kubernetes

Docker makes building distributed applications relatively painless. At the very least, deploying existing distributed systems/frameworks is made easier, since you only need to launch containers. Docker Hub is full of MongoDB, Elasticsearch, Cassandra images and so on. Assuming that you like what is inside those images, you can just grab them, run a container, and you are done.

With a cluster manager/container orchestration system like Kubernetes, running clustered versions of these systems, where you need to operate multiple containers across multiple nodes, is also made dead simple. Swear to God, it is!

Just check the list of examples and you will find everything that is needed to run a Redis, a Spark, a Storm, a Hazelcast, even a GlusterFS cluster. Discovery of all the nodes can be a challenge, but with things like etcd, Consul and registrator, service discovery has never been easier.

What caught my eye in the list of Kubernetes examples is the ability to run an Apache Cassandra cluster. Yes, a Cassandra cluster based on Docker containers. It caught my eye especially because my buddies at exoscale have written an S3-compatible object store that uses Cassandra for storage. It's called Pithos and, for those interested, it is written in Clojure.

So I wondered: let's run Cassandra in Kubernetes, then let's create a Docker image for Pithos and run it in Kubernetes as well. That should give me an S3-compatible object store, built using Docker containers.

To start we need a Kubernetes cluster. The easiest is to use Google Container Engine, but keep an eye on Kubestack, a Terraform plan to create one; it could easily be adapted for different cloud providers. If you are new to Kubernetes, check my previous post, or get the Docker cookbook in early release: I just pushed a chapter on Kubernetes. Whatever technique you use, before proceeding you should be able to use the kubectl client and list the nodes in your cluster. For example:

$ ./kubectl get nodes
NAME                              LABELS                                                   STATUS
k8s-cookbook-935a6530-node-hsdb   kubernetes.io/hostname=k8s-cookbook-935a6530-node-hsdb   Ready
k8s-cookbook-935a6530-node-mukh   kubernetes.io/hostname=k8s-cookbook-935a6530-node-mukh   Ready
k8s-cookbook-935a6530-node-t9p8   kubernetes.io/hostname=k8s-cookbook-935a6530-node-t9p8   Ready
k8s-cookbook-935a6530-node-ugp4   kubernetes.io/hostname=k8s-cookbook-935a6530-node-ugp4   Ready

Running Cassandra in Kubernetes

You can use the Kubernetes example straight up or clone my own repo, where you can explore all the pod, replication controller and service definitions:

$ git clone https://github.com/how2dock/dockbook.git
$ cd ch05/examples

Then launch the Cassandra replication controller, increase the number of replicas and launch the service:

$ kubectl create -f ./cassandra/cassandra-controller.yaml
$ kubectl scale --replicas=4 rc cassandra
$ kubectl create -f ./cassandra/cassandra-service.yaml

Once the image is downloaded you will have your Kubernetes pods in running state. Note that the image currently used comes from the Google registry. That's because this image contains a discovery class specified in the Cassandra configuration. You could use the Cassandra image from Docker Hub, but you would have to put that Java class in there to allow all Cassandra nodes to discover each other. As I said, almost painless!

$ kubectl get pods --selector="name=cassandra"

Once Cassandra discovers all nodes and rebalances the database storage you will get something like:

$ ./kubectl exec cassandra-5f709 -c cassandra nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.16.2.4  84.32 KB   256     46.0%             8a0c8663-074f-4987-b5db-8b5ff10d9774  rack1
UN  10.16.1.3  67.81 KB   256     53.7%             784c8f4d-7722-4d16-9fc4-3fee0569ec29  rack1
UN  10.16.0.3  51.37 KB   256     49.7%             2f551b3e-9314-4f12-affc-673409e0d434  rack1
UN  10.16.3.3  65.67 KB   256     50.6%             a746b8b3-984f-4b1e-91e0-cc0ea917773b  rack1

Note that you can also access the logs of a container in a pod with kubectl logs, which is very handy.
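
For example, to check the logs of the Cassandra container in one of the pods listed above (the pod name will be different in your cluster):

$ ./kubectl logs cassandra-5f709 cassandra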

Launching Pithos S3 object store

Pithos is a daemon which "provides an S3 compatible frontend to a cassandra cluster". So if we run Pithos in our Kubernetes cluster and point it to our running Cassandra cluster, we can expose an S3-compatible interface.

To that end I created a Docker image for Pithos, runseb/pithos, on Docker Hub. It's an automated build, so you can check out the Dockerfile there. The image contains the default configuration file. You will want to change it to edit your access keys and bucket store definitions. I launch Pithos as a Kubernetes replication controller and expose a service with an external load balancer created on Google Compute Engine. The Cassandra service that we launched earlier allows Pithos to find Cassandra using DNS resolution. To bootstrap Pithos, we need to run a non-restarting Pod which installs the Pithos schema in Cassandra. Let's do it:

$ kubectl create -f ./pithos/pithos-bootstrap.yaml

Wait for the bootstrap to happen, i.e. for the Pod to reach the Succeeded state; you can check on it as shown below. Then launch the replication controller. For now we will launch only one replica. Using an rc makes it easy to attach a service and expose it via a public IP address.
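
A quick way to check on the bootstrap pod is to list the pods and grep for it (the exact pod name comes from the bootstrap manifest):

$ ./kubectl get pods | grep bootstrap

Once it shows Succeeded, create the replication controller and the service: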

$ kubectl create -f ./pithos/pithos-rc.yaml
$ kubectl create -f ./pithos/spithos.yaml
$ ./kubectl get services --selector="name=pithos"
NAME      LABELS        SELECTOR      IP(S)            PORT(S)
pithos    name=pithos   name=pithos   10.19.251.29     8080/TCP
                                      104.197.27.250 

Since Pithos will serve on port 8080 by default, make sure that you open the firewall for the public IP of the load balancer.
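
On Google Compute Engine this can be done with a firewall rule along these lines (the rule name is arbitrary, and you may want to restrict the allowed source ranges):

$ gcloud compute firewall-rules create pithos-8080 --allow tcp:8080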

Use an S3 client

You are now ready to use your S3 object store, offered by Pithos, backed by Cassandra, running on Kubernetes using Docker. Wow... a mouthful!

Install s3cmd and create a configuration file like so:

$ cat ~/.s3cfg
[default]
access_key = AKIAIOSFODNN7EXAMPLE
secret_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
check_ssl_certificate = False
enable_multipart = True
encoding = UTF-8
encrypt = False
host_base = s3.example.com
host_bucket = %(bucket)s.s3.example.com
proxy_host = 104.197.27.250 
proxy_port = 8080
server_side_encryption = True
signature_v2 = True
use_https = False
verbosity = WARNING

Note that we use an unencrypted proxy (the load-balancer IP created by the Pithos Kubernetes service; don't forget to change it to yours). The access and secret keys are the defaults stored in the Dockerfile.

With this configuration in place, you are ready to use s3cmd:

$ s3cmd mb s3://foobar
Bucket 's3://foobar/' created
$ s3cmd ls
2015-06-09 11:19  s3://foobar
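
From there you can push and pull objects like with any other S3 endpoint. For example, with a small local file (hello.txt is just an illustration):

$ echo "hello pithos" > hello.txt
$ s3cmd put hello.txt s3://foobar
$ s3cmd ls s3://foobar
$ s3cmd get s3://foobar/hello.txt hello-copy.txt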

If you wanted to use Boto, this would work as well:

#!/usr/bin/env python

# Boto 2.x example: connect to the Pithos endpoint and create a bucket.
from boto.s3.key import Key
from boto.s3.connection import S3Connection
from boto.s3.connection import OrdinaryCallingFormat

# Default credentials stored in the Pithos Dockerfile
apikey='AKIAIOSFODNN7EXAMPLE'
secretkey='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'

# Use path-style requests since we connect to a raw IP address
cf=OrdinaryCallingFormat()

# Point Boto at the load-balancer IP of the Pithos service (change it to yours)
conn=S3Connection(aws_access_key_id=apikey,
                  aws_secret_access_key=secretkey,
                  is_secure=False,host='104.197.27.250',
                  port=8080,
                  calling_format=cf)

bucket = conn.create_bucket('foobar')

# Illustrative addition: upload a small object to the new bucket
k = Key(bucket)
k.key = 'hello.txt'
k.set_contents_from_string('hello pithos')

And that's it. All of these steps may sound like a lot, but honestly it has never been this easy to run an S3 object store. Docker and Kubernetes truly make running distributed applications a breeze.

Monday, May 04, 2015

Running VMs in Docker Containers via Kubernetes

A couple of weeks ago Google finally published a technical paper describing Borg, the cluster management system that they built over the last ten years or more and that runs all Google services.

There are several interesting concepts in the paper, one of them of course being that they run everything in containers. Whether they use Docker or not is unknown. Some parts of their workloads probably still use LMCTFY (Let Me Contain That For You). What struck me is that they say they do not use full virtualization. It makes sense in terms of timeline, considering that Borg started before the advent of hardware virtualization. However, their Google Compute Engine offers VMs as a service, so it is fair to wonder how they run their VMs. This reminded me of John Wilkes' talk at MesosCon 2014. He discussed scheduling in Borg (without mentioning it by name) and, at 23 minutes into his talk, mentions that they run VMs in containers.

Running VMs in containers does make sense when you think in terms of a cluster management system that deals with multiple types of workloads. You treat your IaaS (e.g. GCE) as a workload, and contain it so that you can pack all your servers and maximize utilization. It also allows you to run some workloads on bare metal for performance.

Therefore let's assume that GCE is just another workload for Google and that it runs through Borg.

Since Borg laid out the principles for Kubernetes, the cluster management system designed for containerized workloads and open sourced by Google in June 2014, you are left asking:

"How can we run VMs in Kubernetes ?"

This is where Rancher comes in to help us prototype a little something. Two weeks ago, Rancher announced RancherVM, basically a startup script that creates KVM VMs inside Docker containers (not really doing it justice calling it a script...). It is available on GitHub and super easy to try. I will spare you the details and tell you to go to GitHub instead. The result is that you can build a Docker image that contains a KVM qcow image, and that running the container starts the VM with the proper networking.

Privilege gotcha

With a Docker image now handy to run a KVM instance, using Kubernetes to start this container is straightforward: create a Pod that launches this container. The only caveat is that the Docker host(s) that form your Kubernetes cluster need to have KVM installed, and that your containers will need some level of privileges to access the KVM devices. While this can be tweaked with Docker run parameters like --device and --cap-add, you can brute-force it in a very insecure manner with --privileged. However, Kubernetes does not run privileged containers by default (rightfully so). Therefore you need to start your Kubernetes cluster (i.e. the API server and the kubelet) with the --allow_privileged=true option.
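
For reference, this is roughly what the two approaches look like with plain Docker; the device list and flags below are only an illustration, check the RancherVM documentation for what it actually needs:

# Fine-grained: only expose the devices and capabilities KVM requires
$ docker run -e RANCHER_VM=true --device=/dev/kvm:/dev/kvm --device=/dev/net/tun:/dev/net/tun --cap-add=NET_ADMIN rancher/vm-rancheros

# Brute force: full privileges (very insecure)
$ docker run -e RANCHER_VM=true --privileged rancher/vm-rancheros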

If you are new to Kubernetes, check out my previous post where I show you how to start a one-node Kubernetes "cluster" with Docker Compose. The only modifications from that post are that I am running this on a Docker host that also has KVM installed, that the compose manifest specifies --allow_privileged=true in the kubelet startup command, and that I modify /etc/kubernetes/manifests/master.json by specifying a volume. This allows me not to tamper with the images from Google.

Let's try it out

Build your RancherVM images:

$ git clone https://github.com/rancherio/vm.git
$ cd vm
$ make all

You will now have several RancherVM images:

$ sudo docker images
REPOSITORY                           TAG                 ...
rancher/vm-android                   4.4                 ...
rancher/vm-android                   latest              ...
rancher/ranchervm                    0.0.1               ...
rancher/ranchervm                    latest              ...
rancher/vm-centos                    7.1                 ...
rancher/vm-centos                    latest              ...
rancher/vm-ubuntu                    14.04               ...
rancher/vm-ubuntu                    latest              ...
rancher/vm-rancheros                 0.3.0               ...
rancher/vm-rancheros                 latest              ...
rancher/vm-base                      0.0.1               ...
rancher/vm-base                      latest              ...

Starting one of those will give you access to a KVM instance running in the container.

I will skip the startup of the Kubernetes components; check my previous post. Once you have Kubernetes running you can list the pods (i.e. groups of containers/volumes). You will see that the Kubernetes master itself is running as a Pod.

$ ./kubectl get pods
POD         IP        CONTAINER(S)         IMAGE(S)                                     ...
nginx-127             controller-manager   gcr.io/google_containers/hyperkube:v0.14.1   ...
                      apiserver            gcr.io/google_containers/hyperkube:v0.14.1                                             
                      scheduler            gcr.io/google_containers/hyperkube:v0.14.1

Now let's define a RancherVM as a Kubernetes Pod. We do this in a YAML file:

apiVersion: v1beta2
kind: Pod
id: ranchervm
labels:
  name: vm
desiredState:
  manifest:
    version: v1beta2
    containers:
      - name: master
        image: rancher/vm-rancheros
        privileged: true
        volumeMounts:
          - name: ranchervm
            mountPath: /ranchervm
        env:
          - name: RANCHER_VM
            value: "true"
    volumes:
      - name: ranchervm
        source:
          hostDir: 
            path: /tmp/ranchervm

To create the Pod use the kubectl CLI:

$ ./kubectl create -f vm.yaml 
pods/ranchervm
$ ./kubectl get pods
POD         IP            CONTAINER(S)         IMAGE(S)                                     ....
nginx-127                 controller-manager   gcr.io/google_containers/hyperkube:v0.14.1   ....
                          apiserver            gcr.io/google_containers/hyperkube:v0.14.1                                             
                          scheduler            gcr.io/google_containers/hyperkube:v0.14.1                                             
ranchervm   172.17.0.10   master               rancher/vm-rancheros                         ....

The RancherVM image specified contains RancherOS. The container will start automatically, but of course the actual VM will take a couple more seconds to start. Once it's up, you can ping it and ssh into the VM instance.

$ ping -c 1 172.17.0.10
PING 172.17.0.10 (172.17.0.10) 56(84) bytes of data.
64 bytes from 172.17.0.10: icmp_seq=1 ttl=64 time=0.725 ms

$ ssh rancher@172.17.0.10 
...
[rancher@ranchervm ~]$ sudo docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
[rancher@ranchervm ~]$ sudo system-docker ps
CONTAINER ID        IMAGE               COMMAND                CREATED             STATUS              PORTS               NAMES
229a22962a4d        console:latest      "/usr/sbin/entry.sh    2 minutes ago       Up 2 minutes                            console             
cfd06aa73192        userdocker:latest   "/usr/sbin/entry.sh    2 minutes ago       Up 2 minutes                            userdocker          
448e03b18f93        udev:latest         "/usr/sbin/entry.sh    2 minutes ago       Up 2 minutes                            udev                
ff929cddeda9        syslog:latest       "/usr/sbin/entry.sh    2 minutes ago       Up 2 minutes                            syslog              

Amazing! I can feel that you are just wondering what the heck is going on :)

You want to kill the VM? Just kill the pod:

$ ./kubectl delete pod ranchervm

Remember that a Pod is not a single container; it could contain several containers as well as volumes.

Let's go a step further, and scale the number of VMs by using a replication controller.

Using a Replication Controller to scale the VM

Kubernetes is quite nice: it builds on years of experience with fault tolerance at Google and provides mechanisms for keeping your services up, scaling them and rolling out new versions. The replication controller is a primitive for managing the scale of your services.

So say you would like to automatically increase or decrease the number of VMs running in your datacenter. Start them with a replication controller. This is defined in a YAML manifest like so:

id: ranchervm
kind: ReplicationController
apiVersion: v1beta2
desiredState:
  replicas: 1
  replicaSelector:
    name: ranchervm
  podTemplate:
    desiredState:
      manifest:
        version: v1beta2
        id: vm 
        containers:
          - name: vm
            image: rancher/vm-rancheros
            privileged: true
            volumeMounts:
              - name: ranchervm
                mountPath: /ranchervm
            env:
              - name: RANCHER_VM
                value: "true"
        volumes:
          - name: ranchervm
            source:
              hostDir:
                path: /tmp/ranchervm
    labels:
      name: ranchervm
  

This manifest defines a Pod template (the one that we created earlier) and sets the number of replicas. Here we start with one. To launch it, use the kubectl binary again:

$ ./kubectl create -f vmrc.yaml 
replicationControllers/ranchervm
$ ./kubectl get rc
CONTROLLER   CONTAINER(S)   IMAGE(S)               SELECTOR         REPLICAS
ranchervm    vm             rancher/vm-rancheros   name=ranchervm   1

If you list the pods, you will see that your container is running and hence your VM will start shortly.

$ ./kubectl get pods
POD               IP            CONTAINER(S)         IMAGE(S)                                     ...
nginx-127                       controller-manager   gcr.io/google_containers/hyperkube:v0.14.1   ...
                                apiserver            gcr.io/google_containers/hyperkube:v0.14.1                                                    
                                scheduler            gcr.io/google_containers/hyperkube:v0.14.1                                                    
ranchervm-16ncs   172.17.0.11   vm                   rancher/vm-rancheros                         ...

Why is this awesome? Because you can scale easily:

$ ./kubectl resize --replicas=2 rc ranchervm
resized

And Boom, two VMs:

$ ./kubectl get pods -l name=ranchervm
POD               IP            CONTAINER(S)   IMAGE(S)               ...
ranchervm-16ncs   172.17.0.11   vm             rancher/vm-rancheros   ...
ranchervm-279fu   172.17.0.12   vm             rancher/vm-rancheros   ...

Now of course, this little test is done on one node. But if you had a real Kubernetes cluster, it would schedule these pods on available nodes. From a networking standpoint, RancherVM can provide a DHCP service or not. That means that you could let Kubernetes assign the IP to the Pod, and the VMs would be networked over whatever overlay is in place.

Now imagine that we had security groups via an OVS switch on all nodes in the cluster... we could have multi-tenancy with network isolation and full VM isolation, while still being able to run workloads in "traditional" containers. This has some significant impact on the current IaaS space, and even on Mesos itself.

Your Cloud as a containerized distributed workload, anyone ???

For more recipes like these, checkout the Docker cookbook.