Continuing with interesting security things in the Linux kernel, here’s v4.4. As before, if you think there’s stuff I missed that should get some attention, please let me know.
The CONFIG_STRICT_DEVMEM setting that has existed for a long time already protects system RAM from being accessible through the
/dev/mem device node to root in user-space. Dan Williams added CONFIG_IO_STRICT_DEVMEM to extend this so that if a kernel driver has reserved a device memory region for use, it will become unavailable to
/dev/mem also. The reservation in the kernel was to keep other kernel things from using the memory, so this is just common sense to make sure user-space can’t stomp on it either. Everyone should have this enabled.
If you’re looking to create a very bright line between user-space having access to device memory, it’s worth noting that if a device driver is a module, a malicious root user can just unload the module (freeing the kernel memory reservation), fiddle with the device memory, and then reload the driver module. So either just leave out
/dev/mem entirely (not currently possible with upstream), build a monolithic kernel (no modules), or otherwise block (un)loading of modules (
Mickaël Salaün added seccomp support (and selftests) for user-mode Linux. Moar architectures!
Tycho Andersen added a way to extract and restore seccomp filters from running processes via PTRACE_SECCOMP_GET_FILTER under CONFIG_CHECKPOINT_RESTORE. This is a continuation of his work (that I failed to mention in my prior post) from v4.3, which introduced a way to suspend and resume seccomp filters. As I mentioned at the time (and for which he continues to quote me) “this feature gives me the creeps.” 🙂
x86 W^X corrections
Stephen Smalley noticed that there was still a range of kernel memory (just past the end of the kernel code itself) that was incorrectly marked writable and executable, defeating the point of CONFIG_DEBUG_RODATA which seeks to eliminate these kinds of memory ranges. He corrected this and added CONFIG_DEBUG_WX which performs a scan of memory at boot time and yells loudly if unexpected memory protection are found. To nobody’s delight, it was shortly discovered the UEFI leaves chunks of memory in this state too, which posed an ugly-to-solve problem (which Matt Fleming addressed in v4.6 ).
x86_64 vsyscall CONFIG
I introduced a way to control the mode of the x86_64 vsyscall with a build-time CONFIG selection, though the choice I really care about is CONFIG_LEGACY_VSYSCALL_NONE , to force the vsyscall memory region off by default. The vsyscall memory region was always mapped into process memory at a fixed location, and it originally posed a security risk as a ROP gadget execution target. The vsyscall emulation mode was added to mitigate the problem, but it still left fixed-position static memory content in all processes, which could still pose a security risk . The good news is that glibc since version 2.15 doesn’t need vsyscall at all, so it can just be removed entirely. Any kernel built this way that discovered they needed to support a pre-2.15 glibc could still re-enable it at the kernel command line with “vsyscall=emulate”.
That’s it for v4.4. Tune in tomorrow for v4.5!
© 2016,Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License .