Containers from scratch


This is a write-up of a talk I gave at CAT BarCamp, an awesome unconference at Portland State University. The talk started with the self-imposed challenge: “give an intro to containers without Docker or rkt.”

Often thought of as cheap VMs, containers are just isolated groups of processes running on a single host. That isolation leverages several underlying technologies built into the Linux kernel: namespaces, cgroups, chroots and lots of terms you’ve probably heard before.

Container images, the thing you download from the internet, are literally just tarballs (or tarballs in tarballs if you’re fancy). The least magic part of a container is the files you interact with.
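To make the “just tarballs” point concrete, here is a minimal Python sketch that packs a couple of files into an in-memory archive and lists them again, much as you might peek inside a downloaded image. The file names and helper names are made up for illustration; a real image would hold an entire root file system.

```python
import io
import tarfile


def build_demo_image() -> bytes:
    """Pack two made-up files into an in-memory gzip tarball."""
    files = [
        ("rootfs/etc/hostname", b"demo\n"),
        ("rootfs/bin/hello", b"#!/bin/sh\necho hi\n"),
    ]
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_image(blob: bytes) -> list[str]:
    """Return the member names stored in a tarball, in archive order."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:*") as tar:
        return tar.getnames()


print(list_image(build_demo_image()))
```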

For this post I’ve built a simple tarball by stripping down a Docker image. The tarball holds something that looks like a Debian file system and will be our playground for isolating processes.

$ wget

The resulting directory looks an awful lot like a Linux system. There’s a bin directory with executables, an etc with system configuration, a lib with shared libraries, and so on.

Actually building this tarball is an interesting topic, but one we’ll be glossing over here. For an overview, I’d strongly recommend the excellent talk “Minimal Containers” by my coworker Brian Redbeard.

chroot

The first tool we’ll be working with is chroot. A thin wrapper around the similarly named syscall, it allows us to restrict a process’ view of the file system. In this case, we’ll restrict our process to the “rootfs” directory then exec a shell.

It’s worth noting that this works because of all the things baked into the tarball. When we execute the Python interpreter, we’re executing rootfs/usr/bin/python, not the host’s Python. That interpreter depends on shared libraries and device files that have been intentionally included in the archive.
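For the curious, the same idea can be sketched directly against the syscall from Python. This is a rough illustration rather than what the chroot tool literally does: host_path and run_in_chroot are names I made up, and run_in_chroot needs root plus a populated rootfs directory to do anything useful.

```python
import os


def host_path(rootfs: str, path_in_container: str) -> str:
    """Map a path as seen inside the chroot to where it lives on the host."""
    return os.path.join(rootfs, path_in_container.lstrip("/"))


def run_in_chroot(rootfs: str, argv: list[str]) -> int:
    """Fork, confine the child to rootfs, then exec argv there (needs root)."""
    pid = os.fork()
    if pid == 0:
        try:
            os.chroot(rootfs)        # the syscall the chroot tool wraps
            os.chdir("/")            # don't keep a working directory outside the new root
            os.execv(argv[0], argv)  # argv[0] is now resolved inside rootfs
        finally:
            os._exit(127)            # only reached if one of the calls above failed
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


# The interpreter the chrooted shell runs is this file on the host:
print(host_path("rootfs", "/usr/bin/python"))
```

As root, something like run_in_chroot("rootfs", ["/bin/bash"]) should drop you into a shell that only sees the tarball’s contents, assuming the archive ships a /bin/bash.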

Creating a namespace is super easy: it takes just a single syscall with one argument, unshare. The unshare command line tool gives us a nice wrapper around this syscall and lets us set up namespaces manually. In this case, we’ll create a PID namespace for the shell, then execute the chroot like the last example.

$ sudo unshare -p -f --mount-proc=$PWD/rootfs/proc \

Having created a new process namespace, poking around our chroot we’ll notice something a bit funny. Our shell thinks its PID is 1?! What’s more, we can’t see the host’s process tree anymore.

Entering namespaces with nsenter

A powerful aspect of namespaces is their composability; processes may choose to separate some namespaces but share others. For instance, it may be useful for two programs to have isolated PID namespaces but share a network namespace (e.g. Kubernetes pods). This brings us to the setns syscall and the nsenter command line tool.

The kernel exposes namespaces under /proc/(PID)/ns as files. In this case, /proc/29840/ns/pid is the process namespace we’re hoping to join.

$ sudo ls -l /proc/29840/ns
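Each of those files is a symlink whose target names the namespace type and an inode number, for example pid:[4026531836]. Two processes share a namespace exactly when those targets match. Here’s a small Python sketch for reading them; the helper names are my own, and reading another process’s entries generally requires root.

```python
import os
import re


def parse_ns_link(target: str) -> tuple[str, int]:
    """Split a link target such as 'pid:[4026531836]' into (type, inode)."""
    match = re.fullmatch(r"(\w+):\[(\d+)\]", target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def namespace_id(pid: object = "self", kind: str = "pid") -> int:
    """Inode identifying the namespace of the given kind for a process (Linux only)."""
    return parse_ns_link(os.readlink(f"/proc/{pid}/ns/{kind}"))[1]


def share_namespace(pid_a: object, pid_b: object, kind: str = "pid") -> bool:
    """True if both processes live in the same namespace of that kind."""
    return namespace_id(pid_a, kind) == namespace_id(pid_b, kind)
```

Run from the host, share_namespace("self", 29840) would be expected to return False for the shell we unshared earlier, since it lives in its own PID namespace.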

The nsenter command provides a wrapper around setns to enter a namespace. We’ll provide the namespace file, then run unshare to remount /proc and chroot to set up a chroot. This time, instead of creating a new namespace, our shell will join the existing one.

$ sudo nsenter --pid=/proc/29840/ns/pid \
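To show how thin that wrapper is, here’s a hedged sketch that calls setns through ctypes. It only joins the PID namespace; it doesn’t remount /proc or chroot the way the command in this post does. The function names are mine, the CLONE_NEWPID value is the one from the Linux headers, and it needs root on a Linux host.

```python
import ctypes
import os

CLONE_NEWPID = 0x20000000  # namespace type flag from <linux/sched.h>


def ns_file(pid: int, kind: str = "pid") -> str:
    """Path of the namespace file the kernel exposes for a process."""
    return f"/proc/{pid}/ns/{kind}"


def spawn_in_pid_namespace(target_pid: int, argv: list[str]) -> int:
    """Join target_pid's PID namespace, then run argv as a child (needs root)."""
    libc = ctypes.CDLL(None, use_errno=True)
    fd = os.open(ns_file(target_pid), os.O_RDONLY)
    try:
        if libc.setns(fd, CLONE_NEWPID) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
    finally:
        os.close(fd)
    # Joining a PID namespace only affects children, so fork before exec.
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

The fork is the part that’s easy to miss: the calling process keeps its own PID, and only processes it creates afterwards land in the joined namespace.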

When deploying an “immutable” container it often becomes important to inject files or directories into the chroot, either for storage or configuration. For this example, we’ll create some files on the host, then expose them read-only to the chrooted shell using mount.

Next, we’ll create a target directory in our container and bind mount the directory, passing the -o ro argument to make it read-only. If you’ve never seen a bind mount before, think of this like a symlink on steroids.

$ sudo mkdir -p rootfs/var/readonlyfiles
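At the syscall level a read-only bind mount is traditionally two calls: one to create the bind mount, then a remount that flips it to read-only. My understanding is that recent versions of the mount tool do both for you when given -o ro. A rough ctypes sketch, with flag values taken from the Linux headers and function names invented for this post:

```python
import ctypes
import os

# Flag values from <sys/mount.h> on Linux.
MS_RDONLY = 1
MS_REMOUNT = 32
MS_BIND = 4096


def readonly_bind_flags() -> tuple[int, int]:
    """Flags for the two mount(2) calls: bind first, then remount read-only."""
    return MS_BIND, MS_BIND | MS_REMOUNT | MS_RDONLY


def bind_mount_readonly(source: str, target: str) -> None:
    """Expose source at target, read-only. Needs root; target must already exist."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mount.argtypes = (ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
                           ctypes.c_ulong, ctypes.c_void_p)
    for flags in readonly_bind_flags():
        rc = libc.mount(os.fsencode(source), os.fsencode(target), None, flags, None)
        if rc != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err), target)
```

With the directories from this section, the call would look like bind_mount_readonly("readonlyfiles", "rootfs/var/readonlyfiles"), where the source directory name is only a guess at what you created on the host.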

cgroups, short for control groups, allow kernel-imposed isolation on resources like memory and CPU. After all, what’s the point of isolating processes if they can still kill their neighbors by hogging RAM?

The kernel exposes cgroups through the /sys/fs/cgroup directory. If your machine doesn’t have one you may have to mount the memory cgroup to follow along.

$ ls /sys/fs/cgroup/

For this example we’ll create a cgroup to restrict the memory of a process. Creating a cgroup is easy, just create a directory. In this case we’ll create a memory group called “demo”. Once created, the kernel fills the directory with files that can be used to configure the cgroup.

$ sudo su

To adjust a value we just have to write to the corresponding file. Let’s limit the cgroup to 100MB of memory and turn off swap.

# echo "100000000" > /sys/fs/cgroup/memory/demo/memory.limit_in_bytes

The tasks file is special: it contains the list of processes which are assigned to the cgroup. To join the cgroup we can write our own PID.

# echo $$ > /sys/fs/cgroup/memory/demo/tasks
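Since the whole interface is “make a directory, write to files”, the steps above translate almost line for line into Python. This sketch assumes the cgroup v1 memory controller layout used in this post; writing 0 to memory.swappiness is my assumption for how to turn off swap, and the function names are invented.

```python
import os


def write_value(cgroup_dir: str, filename: str, value: object) -> None:
    """Write one setting into a cgroup control file."""
    with open(os.path.join(cgroup_dir, filename), "w") as f:
        f.write(f"{value}\n")


def confine_to_memory_cgroup(cgroup_dir: str, limit_bytes: int, pid: int) -> None:
    """Create the cgroup, cap its memory, discourage swap, then move pid into it."""
    os.makedirs(cgroup_dir, exist_ok=True)  # on cgroupfs the kernel fills in the files
    write_value(cgroup_dir, "memory.limit_in_bytes", limit_bytes)
    write_value(cgroup_dir, "memory.swappiness", 0)  # assumption: 0 disables swap here
    write_value(cgroup_dir, "tasks", pid)
```

As root, confine_to_memory_cgroup("/sys/fs/cgroup/memory/demo", 100000000, os.getpid()) would put the calling interpreter under the same 100MB limit as the shell commands above. Taking the directory as a parameter also makes it easy to try against a scratch directory first.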

cgroups can’t be removed until every process in the tasks file has exited or been reassigned to another group. Exit the shell and remove the directory with rmdir (don’t use rm -r).

# exit

Containers are extremely effective ways of running arbitrary code from the internet as root, and this is where the low overhead of containers hurts us. Containers are significantly easier to break out of than a VM. As a result, many technologies used to improve the security of containers, such as SELinux, seccomp, and capabilities, involve limiting the power of processes already running as root.

Predictably this program fails; listening on port 80 requires permissions we don’t have. Of course we can just use sudo, but we’d like to give the binary just the one permission to listen on lower ports.
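The program itself isn’t reproduced in this write-up. As a hypothetical stand-in, a Python sketch of the same experiment might look like this: try to listen on a port and report whether the kernel allowed it.

```python
import socket


def can_listen(port: int, host: str = "127.0.0.1") -> bool:
    """Try to bind and listen on a TCP port; report failure instead of raising."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind((host, port))
            sock.listen()
        return True
    except OSError as exc:
        print(f"listening on port {port} failed: {exc}")
        return False
```

As an unprivileged user on a typical Linux setup, can_listen(80) would be expected to print a permission error and return False, while a high port such as 8080 should work.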

Capabilities are a set of discrete powers that together make up everything root can do. This ranges from things like setting the system clock to killing arbitrary processes. In this case, CAP_NET_BIND_SERVICE allows executables to listen on lower ports.

For things already running as root, like most containerized apps, we’re more interested in taking capabilities away than granting them. First let’s see all powers our root shell has:

$ sudo su

Current: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,37+ep

Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,37
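Under the hood each of these sets is a bitmask, one bit per capability, in the order listed above. The kernel shows the raw masks as hex in /proc/(PID)/status (the CapEff and CapBnd lines). Here’s a small sketch that decodes a mask using the names from the listing; the helper names are mine, and without_caps only does the bit arithmetic, it doesn’t change any process.

```python
CAP_NAMES = [
    "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner",
    "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid", "cap_setpcap",
    "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast",
    "cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner",
    "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace",
    "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice",
    "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod",
    "cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap",
    "cap_mac_override", "cap_mac_admin", "cap_syslog", "cap_wake_alarm",
    "cap_block_suspend",
]


def decode_caps(mask: int) -> list[str]:
    """Turn a capability bitmask into names; unnamed bits come back as numbers."""
    names = []
    bit = 0
    while mask >> bit:
        if mask & (1 << bit):
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES) else str(bit))
        bit += 1
    return names


def without_caps(mask: int, *names: str) -> int:
    """Return the mask with the named capabilities cleared."""
    for name in names:
        mask &= ~(1 << CAP_NAMES.index(name))
    return mask


def effective_caps(pid: object = "self") -> int:
    """Read the effective capability mask of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("CapEff:"):
                return int(line.split()[1], 16)
    raise RuntimeError("CapEff line not found")
```

A mask with the lowest 38 bits set decodes to the 37 names above followed by "37", which lines up with the trailing 37 in the listing.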

As an example, we’ll use capsh to drop a few capabilities including CAP_CHOWN. If things work as expected, our shell shouldn’t be able to modify file ownership despite being root.

$ sudo capsh --drop=cap_chown,cap_setpcap,cap_setfcap,cap_sys_admin --chroot=$PWD/rootfs --

Conventional wisdom still states that VM isolation is mandatory when running untrusted code. But security features like capabilities are important to protect against hacked applications running in containers.

Beyond more elaborate tools like seccomp, SELinux, and capabilities, applications running in containers generally benefit from the same kind of best practices as applications running outside of one. Know what you’re linking against, don’t run as root in your container, and update for known security issues in a timely fashion.

Conclusion

Containers aren’t magic. Anyone with a Linux machine can play around with them and tools like Docker and rkt are just wrappers around things built into every modern kernel. No, you probably shouldn’t go and implement your own container runtime. But having a better understanding of these lower level technologies will help you work with these higher level tools (especially when debugging).

There’s a ton of topics I wasn’t able to cover today, networking and copy-on-write file systems probably being the biggest two. However, I hope this acts as a good starting point for anyone wanting to get their hands dirty. Happy hacking!