Two Objects not Namespaced by the Linux Kernel

Wednesday, April 26, 2017

If you are new to my blog then you might be new to the concept of Linux kernel namespaces. I suggest first reading “Getting Towards Real Sandbox Containers” and “Setting the Record Straight: containers vs. Zones vs. Jails vs. VMs”.

Linux namespaces are one of the primitives that make up what is known as a “container.” They control what a process can see. Cgroups, the other main ingredient of “containers”, control what a process can use. But let’s focus for this post on namespaces. The current set of namespaces in the kernel is: mount, pid, uts, ipc, net, user, and cgroup. These all cover basically exactly what they are named after. But what is not covered? Well, let’s go over two of the things not namespaced by the Linux kernel.
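To see which namespaces a process actually belongs to, you can read the symlinks under /proc/<pid>/ns. Here is a minimal sketch in Python; the helper name list_namespaces is mine, and it assumes a Linux host with /proc mounted. The exact entries depend on the running kernel, so you may see names beyond the ones listed above (for example helper entries such as pid_for_children).

```python
import os


def list_namespaces(pid="self"):
    """Return {namespace_name: identifier} for a process, read from /proc/<pid>/ns.

    Each entry is a symlink whose target looks like "mnt:[4026531840]".
    Two processes share a namespace when the identifiers match.
    """
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        # Not Linux, /proc not mounted, or no such process.
        return result
    for name in names:
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            result[name] = "(unreadable)"
    return result


if __name__ == "__main__":
    for name, ident in list_namespaces().items():
        print(f"{name:20} {ident}")
```

Run it once on the host and once inside a container and compare the identifiers: anything that matches is shared, and anything without an entry at all is not namespaced in the first place.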

Time

First, and my favorite to nerd out about, is time. Now, it should go without saying that if you want to set the time in Linux you need CAP_SYS_TIME. By default you do not get this capability in Docker containers. The settimeofday and related syscalls are also blocked by Docker's default seccomp profile.
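If you want to check whether a process holds CAP_SYS_TIME, one option is to decode the CapEff bitmask in /proc/self/status. This is a rough sketch, not an official API: the function names are my own, and the capability number 25 is taken from the kernel's linux/capability.h header.

```python
CAP_SYS_TIME = 25  # capability number from <linux/capability.h>


def has_capability(cap_mask_hex, cap_number):
    """Return True if bit `cap_number` is set in a hex capability mask."""
    return bool(int(cap_mask_hex, 16) & (1 << cap_number))


def effective_caps(status_path="/proc/self/status"):
    """Return the CapEff hex string of the current process, or None if unavailable."""
    try:
        with open(status_path) as f:
            for line in f:
                if line.startswith("CapEff:"):
                    return line.split()[1]
    except OSError:
        pass  # not Linux, or /proc is not mounted
    return None


if __name__ == "__main__":
    mask = effective_caps()
    if mask is None:
        print("could not read capabilities")
    else:
        print("CapEff:", mask)
        print("CAP_SYS_TIME:", has_capability(mask, CAP_SYS_TIME))
```

Keep in mind that holding the capability is only half the story: a seccomp filter can still refuse the syscall even when the bit is set.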

What happens if you do change the time in a container?

Well, it’s not namespaced, so obviously the time on the host would change as well. “But whaaaaa? I thought containers were just like a VM”, you ask. Again, you should read my post “Setting the Record Straight: containers vs. Zones vs. Jails vs. VMs”.
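A quick way to find out whether a container is even allowed to touch the clock is to try it. The sketch below uses Python's time.clock_settime; the function name can_set_clock is mine. Be careful: if the call is permitted it really does set the system clock. It writes back the value it just read, so the change should be tiny, but with no time namespace it is the host's clock being written.

```python
import time


def can_set_clock():
    """Return True if this process is allowed to set CLOCK_REALTIME.

    WARNING: on success this really sets the system-wide clock, using the
    value read a moment earlier. Only run it where that is acceptable.
    """
    now = time.clock_gettime(time.CLOCK_REALTIME)
    try:
        time.clock_settime(time.CLOCK_REALTIME, now)
    except PermissionError:
        # Missing CAP_SYS_TIME, or a seccomp filter rejected the syscall.
        return False
    return True


if __name__ == "__main__":
    if can_set_clock():
        print("this process can change the clock (and the host's with it)")
    else:
        print("setting the clock is not permitted here")
```

In a container started with default settings you would expect the "not permitted" branch. Adding the capability and loosening seccomp is what opens the door to the other branch.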

One of my favorite questions I have been asked at a conference is “If you could add any new namespace to Linux what would it be?” Obviously this is an awesome question, totally up my alley, and not even a statement from someone trying to prove to me “they know things.” But I digress. I always answer with “Time.” Obviously there is no production use case for this, other than making more NTP hell for yourself. I do believe there is a development use case. Say you want to change the time for a test running in one container but not mess with the other tests running in other containers. What a fun way to make a chaos monkey for NTP! 😛

Kernel Keyring

The kernel keyring is another item not namespaced. There have been recent efforts to fix this for user namespaces, but the problem still stands if you are creating containers without user namespaces. Again, the default Docker seccomp profile blocks these syscalls so you don’t shoot yourself in the foot.

What happens if you use the kernel keyring from within a container without a user namespace?

Well, if root in one container stores keys in the keyring, any other container on that same host can see them in its keyring, which is really just the exact same keyring.
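One way to look at this without calling the keyring syscalls at all is to read /proc/keys, which lists the keys the reading process is allowed to view. The sketch below parses that file in Python. The function names and the column labels in FIELDS are my own, based on the layout described in the keyrings(7) man page, so treat them as assumptions rather than an official schema.

```python
# Column labels are my own naming for the /proc/keys layout.
FIELDS = ("serial", "flags", "usage", "timeout", "perm",
          "uid", "gid", "type", "description")


def parse_proc_keys(text):
    """Parse the contents of /proc/keys into a list of dicts, one per key."""
    keys = []
    for line in text.splitlines():
        # The last column may contain spaces, so cap the number of splits.
        parts = line.strip().split(None, len(FIELDS) - 1)
        if len(parts) == len(FIELDS):
            keys.append(dict(zip(FIELDS, parts)))
    return keys


def visible_keys(path="/proc/keys"):
    """Return the keys visible to this process, or [] if the file is unavailable."""
    try:
        with open(path) as f:
            return parse_proc_keys(f.read())
    except OSError:
        return []


if __name__ == "__main__":
    for key in visible_keys():
        print(key["serial"], key["uid"], key["type"], key["description"])
```

If you run this in two containers on the same host and the same serial numbers show up in both, you are looking at the same keys.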

All in all, I hope this proves once again that you need more than just namespaces and cgroups to get any sort of “real” isolation with containers. Please, please don’t disable seccomp or add extra capabilities you don’t need. Happy containering! I must leave you with this gif… 😀

Source: Jessie Frazelle's Blog.
