Linux 4.12

July 9th, 2017

Initial Radeon RX Vega support has been added, but in Linux 4.12 it is only very early support for these soon-to-launch GPUs. Don’t get too excited: the DC display code didn’t make it into this release, so there is no monitor/display support for Vega. When these GPUs launch you’ll need to build your own kernel with the out-of-tree code or use AMDGPU-PRO.
For open source NVIDIA graphics there is initial GTX 1000 “Pascal” accelerated support. The consumer Pascal cards now have hardware acceleration support when paired with NVIDIA’s recently released firmware images for the GTX 1050/1060/1070/1080 series. But there isn’t yet any re-clocking support so it’s very slow.
This release adds support for USB Type-C connectors. USB Type-C, commonly known as simply USB-C, is a 24-pin USB connector system allowing transport of data and energy.
BFQ (Budget Fair Queuing) is a new I/O scheduler. On desktop systems BFQ provides low latency for interactive applications, low latency for soft real-time applications, higher speed for code-development tasks, high throughput, and strong fairness, bandwidth and delay guarantees. For servers, besides the same benefits as above, BFQ guarantees: audio and video streaming with zero or very low jitter and drop rate; fast retrieval of web pages and embedded objects; real-time recording of data in live-dumping applications (e.g. packet logging); responsiveness in local and remote access to a server.
Live patching is a feature merged in Linux 4.0 that allows patching kernel code on running systems, which in turn makes it possible to fix security issues without rebooting. This release adds a so-called per-task consistency model, a foundation that will eventually make it possible to apply the ~10% of security patches that change function or data semantics. This is the biggest remaining piece needed to make livepatch more generally useful. The code stems from the design proposal made in November 2014. It’s a hybrid of kGraft and kpatch: it uses kGraft’s per-task consistency and syscall barrier switching combined with kpatch’s stack trace switching.
This release introduces pblk, a host-side FTL for Open-Channel SSDs that exposes them as block devices. Open-Channel SSDs are SSDs that do not include a Flash Translation Layer; support for them was included in Linux 4.4. Pblk is an implementation of an FTL in the Linux kernel, which allows data placement decisions and I/O scheduling to be managed by the host, enabling users to optimize the SSD for their specific workloads.
Another I/O scheduler has been added. The Kyber I/O scheduler is a low-overhead scheduler suitable for multiqueue and other fast devices. Given target latencies for reads and synchronous writes it will self-tune queue depths to achieve that goal, similarly to blk-wbt.
Please take your time to read the full changelog.

Linux 4.11

May 7th, 2017

This release adds a pluggable I/O scheduler framework to the multiqueue block layer. The Linux block layer has different I/O schedulers (deadline, cfq, noop, etc.), each with different performance characteristics, and users are allowed to switch between them on the fly. In Linux 3.13 a new multiqueue design that performs better with modern hardware (e.g. SSD, NVM) was added. However, this new multiqueue design did not include support for pluggable I/O schedulers. This release solves that problem with the merge of a multiqueue-ready I/O scheduling framework. A port of the deadline scheduler has also been added.
Based on work started in Linux 4.4, this release adds journalling support to RAID4/5/6 in the MD layer. With a journal device configured (typically NVRAM or SSD), a crash during degraded operations cannot result in data corruption.
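Switching on the fly goes through sysfs: each block device exposes a queue/scheduler file that lists the available schedulers and marks the active one with square brackets. The following is a minimal Python sketch of reading and writing that file. The function names are made up for this example, and the scheduler names you will actually see depend on the kernel version and on which schedulers are built in or loaded.

```python
from pathlib import Path


def parse_scheduler_line(line):
    """Split a line such as 'noop deadline [cfq]' into (available, active)."""
    available, active = [], None
    for token in line.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # the bracketed entry is the active scheduler
            active = token
        available.append(token)
    return available, active


def read_schedulers(device):
    """Read the scheduler list of a block device such as 'sda' (Linux sysfs)."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return parse_scheduler_line(path.read_text())


def set_scheduler(device, name):
    """Select a scheduler by writing its name to the same file (usually needs root)."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    path.write_text(name)
```

On a machine with an sda disk, read_schedulers("sda") returns the list of scheduler names plus the active one, and set_scheduler("sda", "deadline") switches it, provided that scheduler is available for the device.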
Modern storage devices such as SSDs are making the usage of swapping attractive not just as a way to deal with excessive memory load, but also as a performance enhancement technique. Cloud providers, for example, can overcommit memory more aggressively and fit more VMs to a platform with a fast swap device. However, the swapping implementation was designed for traditional rotating hard disks, where the performance and latency of the swap did not matter as much as it does with modern storage. This release makes the swap implementation more scalable, making it more suitable for use with modern storage devices.
Due to several shortcomings in the stat() system call (like not being y2038 ready or not playing well with network filesystems), a new system call has been in the works for years, with the final result being statx(), available in this kernel release.
A new perf ftrace tool has been added. This tool intends to be a front-end for the already existing ftrace interface. In this release it supports two tracers: function_graph and function.
The Opal Storage Specification is a set of specifications for features of data storage devices that enhance their security. For example, it defines a way of encrypting the stored data so that an unauthorized person who gains possession of the device cannot see the data. That is, it is a specification for self-encrypting drives (SED). This release adds Linux support for Opal-enabled NVMe controllers. It enables users to set up, unlock and lock locking ranges for SED devices using the Opal protocol.
This release adds optional support for scrollback history not being flushed when switching between consoles (this breaks tools like clear_console).
AMDGPU power management continues to mature with Linux 4.11. Likely due to TTM memory management improvements, Linux 4.11 is faster for RADV Vulkan.
Intel has enabled frame-buffer compression by default for Skylake hardware and newer. Intel’s DRM driver also now handles DisplayPort MST audio: multi-stream transport capabilities were added a few kernels ago, while this work is about allowing audio for DP MST displays.
Intel also has initial Geminilake graphics support. Geminilake SoCs will be shipping later this year.
The Nouveau DRM driver went through a Secure Boot code refactoring and based off that work NVIDIA even released the signed Pascal firmware for Pascal consumer cards along with code for bringing up accelerated support for the GeForce GTX 1050/1060/1070/1080. But that accelerated Pascal support isn’t landing until Linux 4.12.
As always read the full changelog.

Linux 4.10

February 25th, 2017

This release adds support for Intel GVT-g for KVM (KVMGT), a full GPU virtualization solution with mediated pass-through, starting from 4th generation Intel Core (Haswell) processors with Intel Graphics.
Perf c2c (cache to cache) is a new tool designed to analyse and track down performance problems caused by false sharing on NUMA systems. The tool is based on x86’s load latency and precise store facility events provided by Intel CPUs. In modern systems with multiple processors, different memory modules are physically connected to different CPUs. In these systems memory accesses to the local memory are faster than accesses to the memory connected to other processors. When a task is multi-threaded, different threads can run in different CPUs at the same time; if these threads try to access and modify the same memory, they can have performance issues due to the costs of synchronizing the CPU caches.
Perf sched timehist provides an analysis of scheduling events. By default it shows a table with the individual schedule events, including the wait time (time between sched-out and next sched-in events for the task), the task scheduling delay (time between wakeup and actually running) and run time for the task.
This release adds a mechanism that throttles back buffered writeback, which makes it more difficult for heavy writers to monopolize the I/O request queue and thus provides a smoother experience on Linux desktops and shells than people were used to. The way Linux synchronizes to disk the data written to memory by processes has always had issues. When Linux writes data in the background, it should have little impact on foreground activity. This was not always the case because heavy writes fill up the block layer, and other I/O requests have to wait to be attended. The new throttling algorithm monitors the latencies of requests and shrinks or grows the request queue depth accordingly (it’s auto-tunable). This feature needs to be enabled explicitly in the configuration (and there can be regressions).
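To picture the “shrink or grow the queue depth according to latency” idea, here is a toy feedback loop in Python. It is only an illustration of the principle, not the kernel’s actual algorithm: the function name, the halving/incrementing policy and the depth limits are all assumptions made up for this sketch.

```python
def next_queue_depth(depth, observed_latency_us, target_latency_us,
                     min_depth=1, max_depth=64):
    """Toy controller: back off quickly when latency is too high, recover slowly.

    All parameters and the halve/increment policy are hypothetical; the real
    kernel code uses its own heuristics.
    """
    if observed_latency_us > target_latency_us:
        # Requests are waiting too long: let fewer background writes in.
        return max(min_depth, depth // 2)
    # Latency is fine: cautiously allow more background writes.
    return min(max_depth, depth + 1)
```

With a 2000 µs target, an observed latency of 5000 µs cuts a depth of 32 down to 16, while an observed latency of 1000 µs lets a depth of 16 grow to 17.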
A new hybrid block polling method has been added. This method uses less CPU than pure polling: instead of polling after I/O submission, the kernel induces an artificial delay, and then polls after that. This still puts a sleep/wakeup cycle in the I/O path, but instead of the wakeup happening after the I/O has completed, it’ll happen before. Continuously polling a device can cause excessive CPU consumption and sometimes even worse throughput. With this hybrid scheme, Linux can achieve big latency reductions while still using the same (or less) amount of CPU.
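The trade-off can be put into numbers with a deliberately simplified model, sketched here in Python. It assumes one I/O with a known completion time and ignores the cost of the sleep/wakeup cycle itself; the function name and the microsecond figures are invented for the example.

```python
def hybrid_poll_cost(completion_us, sleep_us):
    """Return (cpu_busy_us, extra_latency_us) for a single I/O.

    sleep_us == 0 models classic polling: the CPU spins for the whole
    completion time. With a delay, the CPU only spins for what is left
    after the sleep. If the sleep overshoots, no CPU is burnt but the
    completion is noticed late.
    """
    if sleep_us >= completion_us:
        return 0, sleep_us - completion_us
    return completion_us - sleep_us, 0
```

For an I/O that takes 100 µs, classic polling spins for the full 100 µs; sleeping 50 µs first halves the spinning without adding latency; sleeping 120 µs costs no spinning but notices the completion 20 µs late.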
Support for ARM devices such as the Nexus 5 & 6 or Allwinner A64 has been improved.
This release adds eBPF (extended Berkeley Packet Filter) hooks for cgroups, to allow eBPF programs for network filtering and accounting to be attached to cgroups, so that they apply to all sockets of all tasks placed in that cgroup.
This release implements a raid5 writeback cache in the MD subsystem (Multiple Devices). Its goal is to aggregate writes into full-stripe writes and reduce read-modify-write cycles. It’s helpful, for example, for workloads that do sequential writes followed by fsync.
This release adds support for Intel Cache Allocation Technology, an Intel feature that allows setting policies on the L2/L3 CPU caches: for example, real-time tasks could be assigned dedicated cache space.
The open source NVIDIA DRM driver has initial support for allowing supported graphics cards to hit their “boost” clock frequencies for achieving higher performance.
Through various pull requests there have been efforts to get AMD Ryzen processor support squared away in the mainline Linux kernel.
For the AMDGPU driver, there have been various Radeon Southern Islands and Sea Islands improvements this cycle. There have also been more AMDGPU improvements around PowerPlay / power management.
The Linux 4.10 HID work includes a sensor-hub fix to support the Microsoft Surface 3 (other Surface 3 work landed in Linux 4.8) as well as supporting multi-touch data with the Surface 3. Surface 4 HID support was also added for this kernel cycle.
Here is the full changelog.

Linux 4.9

December 17th, 2016

This long-term release adds several key features to the XFS file system, based on the reverse mapping work introduced in the previous release. It adds the ability to share data extents between different files, that is, the ability to deduplicate data and to unshare it again. It also adds copy-on-write support for data: instead of overwriting data in place, it writes the new data to a new location.
The Linux kernel has always mapped the memory used by kernel stacks directly in the kernel memory, an approach that makes it harder to allocate stacks under memory load and provides no protection against stack overflows. This release allows mapping the kernel stacks in virtual memory, which makes it easier to allocate stacks under memory pressure and provides protection against stack overflows.
This release adds another TCP congestion control algorithm: BBR (Bottleneck Bandwidth and RTT). This algorithm is based on bandwidth measurements instead of packet loss. The Internet has predominantly used loss-based congestion control, relying on packet loss as the signal to slow down. On today’s Internet, loss-based congestion control causes the infamous bufferbloat problem, often causing seconds of needless queuing delay, since it fills the bloated buffers in many last-mile links. On today’s high-speed long-haul links using commodity switches with shallow buffers, loss-based congestion control has abysmal throughput because it overreacts to losses caused by transient traffic bursts.
BBR creates an explicit model of the network pipe by sequentially probing the bottleneck bandwidth and RTT. BBR has significantly increased throughput and reduced latency for connections on Google’s internal backbone networks and google.com and YouTube Web servers.
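The two quantities BBR probes multiply into the bandwidth-delay product, the amount of data that fills the pipe without building a queue. The Python sketch below works through that arithmetic and shows how an application could request BBR for a single socket. The function names and example figures are assumptions for illustration; the setsockopt call only succeeds on a Linux system where the bbr algorithm is available to the process.

```python
import socket


def bdp_bytes(bottleneck_bits_per_s, min_rtt_us):
    """Bandwidth-delay product in bytes: bandwidth times round-trip time."""
    # bits/s * microseconds = bits * 1e6; divide by 8 * 1e6 to get bytes.
    return bottleneck_bits_per_s * min_rtt_us // 8_000_000


def use_bbr(sock):
    """Ask the kernel to use BBR on this TCP socket (Linux only).

    Raises OSError if the bbr congestion control is not available.
    """
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"bbr")
```

A 100 Mbit/s bottleneck with a 40 ms round trip holds 500,000 bytes in flight; anything beyond that only sits in buffers, which is exactly the queuing that loss-based algorithms tend to build up.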
Protection keys is a memory protection hardware feature merged in Linux 4.6. But in that release, the use of this feature was limited to the kernel automatically using it in high-level APIs, such as mmap and mprotect. This release adds new syscalls that offer a more complete API to use protection keys.
Coming from the real-time patchset, the hardware latency tracer is a special purpose tracer that is used to detect large system latencies induced by the behaviour of certain underlying hardware or firmware interruptions, like SMIs (System Management Interrupts) on x86 systems, that the kernel is unaware of.
The hardware latency detector works by simply creating a thread that spins on a single CPU polling the CPU Time Stamp Counter for a specified amount of time (width) within a periodic window (window), and trying to find gaps where the polling was interrupted. This is useful for testing if a system is reliable for Real Time tasks.
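The same idea can be tried from user space as a rough analogue, sketched below in Python: spin reading a monotonic clock and look for consecutive samples that are suspiciously far apart. This is not the kernel tracer. It reads the OS clock rather than the TSC, it can be preempted by the scheduler, and the function names and thresholds are chosen just for the example.

```python
import time


def find_gaps(timestamps_ns, threshold_ns):
    """Return (start_ns, gap_ns) for consecutive samples further apart than threshold_ns."""
    gaps = []
    for prev, cur in zip(timestamps_ns, timestamps_ns[1:]):
        if cur - prev > threshold_ns:
            gaps.append((prev, cur - prev))
    return gaps


def sample_clock(width_s):
    """Spin on the monotonic clock for width_s seconds and return every reading."""
    samples = []
    deadline = time.monotonic_ns() + int(width_s * 1e9)
    while True:
        now = time.monotonic_ns()
        samples.append(now)
        if now >= deadline:
            return samples
```

Calling find_gaps(sample_clock(0.5), 100_000) lists every point in a half-second window where the loop lost more than 100 µs. In user space most of those gaps come from ordinary preemption, which is why the real detector lives in the kernel.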
Among the many driver changes there is experimental support for AMDGPU Southern Islands and AMDGPU virtual display support.
There is also improved P-State performance for some Intel Atom CPUs and added support for many ARM machines (from the Raspberry Pi Zero and Broadcom boards to many Qualcomm platforms).
As usual read the full changelog.

Linux 4.8

October 8th, 2016

This release adds support for using Transparent Huge Pages (pages bigger than 4K on x86) in the page cache, which backs filesystem data, automatically and without user intervention. In this release the support covers the page cache of tmpfs/shmem.
Support for eXpress Data Path (XDP), a high-performance, programmable network data path, has been added. XDP provides bare metal packet processing at the lowest point in the software stack. Use cases include pre-stack processing like filtering to do DoS mitigation, forwarding and load balancing, flow sampling and monitoring.
A new feature in XFS filesystem called reverse mapping allows XFS to track the owner of a specific block on disk precisely. This reverse mapping infrastructure is the building block of several upcoming features: reflink, copy-on-write data, deduplication, online metadata and data scrubbing, highly accurate bad sector/data loss reporting to users, and significantly improved reconstruction of damaged and corrupted filesystems.
A security feature has been ported from Grsecurity’s PAX_USERCOPY: stricter checking of memory copies with hardened usercopy. This feature kills entire classes of heap overflow exploits and similar kernel memory exposures. Performance impact is negligible.
Another feature taken from Grsecurity has been ported: GCC plugin support. GCC plugins are loadable compiler modules that can be used for runtime instrumentation and static analysis, making it possible to analyse, change and add further code during compilation. Grsecurity uses these mechanisms to improve security.
This release implements RFC 5570: Common Architecture Label IPv6 Security Option (CALIPSO). Its goal is to set Multi-Level Secure (MLS) sensitivity labels on IPv6 packets using a hop-by-hop option. It is intended for use only within MLS networking environments that are both trusted and trustworthy.
A new feature virtio-vsocks for easier guest/host communication has been added. This can be used to implement hypervisor services and guest agents (like qemu-guest-agent for example).
This release adds a new congestion control algorithm: TCP New Vegas (NV), a major update to TCP-Vegas. Like Vegas, NV is a delay-based congestion avoidance mechanism for TCP. Its filtering mechanism is similar: it uses the best measurement in a particular period to detect and measure congestion. It was developed to coexist with modern networks where link bandwidths are 10 Gbps or higher and RTTs can be tens of microseconds.
In an attempt to modernize it, the kernel documentation will be converted to the Sphinx system, which uses reStructuredText as its markup language.
As always, there were a lot of driver improvements.
AMDGPU OverDrive support: the open source AMD Linux driver stack now supports overclocking.
Initial NVIDIA Pascal support: Nouveau has initial support for Pascal GPUs, but unfortunately no support for the consumer GeForce GTX 1060/1070/1080 graphics cards. Only the GP100 is supported so far until NVIDIA ends up releasing the signed firmware blobs for supporting the open source driver with the consumer GeForce GTX 1000 series hardware.
The HDMI CEC framework has been talked about for years and developed out-of-tree. HDMI CEC is short for the Consumer Electronics Control and allows HDMI-connected devices to be commanded and controlled by a user with a single remote control. With Linux 4.8 this framework is finally included.
Raspberry Pi 3 SoC: The Broadcom BCM2837 SoC is supported by the mainline Linux 4.8 kernel. There is also a variety of other ARM improvements.
Here is the full changelog.

Linux 4.7

July 29th, 2016

Linux 4.7 kernel has been released. Here’s a recap of some of the biggest features.
Radeon RX 480 “Polaris” open-source support. With Linux 4.7 there is all the initial AMDGPU DRM support needed for firing up the RX 480, which can be used in conjunction with the latest Mesa, linux-firmware, and LLVM for having quite suitable open-source support for this newly launched graphics processor.
A number of new ARM platforms are now supported.
The Schedutil governor for the CPUFreq scaling driver is new and holds potential for making better CPU frequency scaling decisions based upon scheduler utilization data.
Async discard is now supported by the core block code.
Linux 4.7 adds support for various Corsair and ASUS keyboards, among other new peripherals.
The Microsoft Xbox One Elite Controller is now supported by the mainline Linux kernel.
Top features of 4.7:
– Support for Radeon RX480 GPUs
– Parallel directory lookups
– New ‘schedutil’ frequency governor
– Histograms of events in ftrace
– Call stacks in perf trace
– Allow BPF programs to attach to tracepoints
– EFI ‘Capsule’ firmware updates
– Support for creating virtual USB Device Controllers in USB/IP
– Android’s sync_file fencing mechanism considered stable
– LoadPin, a security module to restrict the origin of kernel modules
Have a look at the full changelog.

Linux 4.6

May 20th, 2016

Linux 4.6 kernel is out.
Now we finally have mainline support for a number of new ARM SoCs and platforms/boards.
There are various Radeon and AMDGPU improvements to make the open-source AMD graphics driver stack more stable and robust.
Initial NVIDIA GeForce GTX 900 “Maxwell” open-source support. While Pascal is days away from shipping, with Linux 4.6 there is finally 3D/acceleration support for Maxwell when grabbing NVIDIA’s signed firmware blobs they’ve made available. Before getting too excited, the support isn’t as mature as Kepler and they don’t yet have any re-clocking support for being able to provide good performance.
Runtime AHCI power management for greater power savings has been added to this release.
There’s also Dell and Alienware laptop support improvements, including for the popular Dell XPS 13 Skylake laptop.
Many sources of information will probably point out that this kernel release has new and better security features. Not everyone will take this piece of news for granted.
This is the “press release” of these new features with an interview of Greg Kroah-Hartman.
And this is a post from Brad Spengler, the creator and maintainer of Grsecurity, in response to that PR.
I’ll let you make up your own mind on this topic.
Top features of 4.6:
– USB 3.1 SuperSpeedPlus (10 Gbps) support
– Improve the reliability of the Out Of Memory task killer
– Support for Intel memory protection keys
– OrangeFS, a new distributed file system
– Kernel Connection Multiplexor, a facility for accelerating application layer protocols
– 802.1AE MAC-level encryption (MACsec)
– BATMAN V protocol
– dma-buf: new ioctl to manage cache coherency between CPU and GPU
– OCFS2 online inode checker
– Support for cgroup namespaces
– Add support for the pNFS SCSI layout
Read the full changelog.

Linux 4.5

March 18th, 2016

Linux 4.5 is out. Let’s see its prominent features.
A new tool called UBSAN checks a running kernel for various types of undefined behaviour that can lead to obscure bugs.
The new CONFIG_IO_STRICT_DEVMEM option, which blocks access to memory (via /dev/mem) claimed by device drivers, turned out to break booting on some systems, so it is now off by default.
The ARM multiplatform work that aims to build a single ARM kernel that can boot on a wide variety of processors has reached an important milestone with the merging of work to bring a number of minor platforms into the fold. This branch is the culmination of 5 years of effort to bring the ARMv6 and ARMv7 platforms together such that they can all be enabled and boot the same kernel.
The filesystems in user space (FUSE) subsystem has added support for the SEEK_HOLE and SEEK_DATA options to the lseek() system call.
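From an application’s point of view, these two lseek() options are what lets a tool walk a sparse file without reading its holes. The Python sketch below lists the data ranges of an open file. The helper name is made up for this example, and whether real holes are reported depends on the underlying filesystem (a filesystem that does not track holes simply reports the whole file as data).

```python
import errno
import os


def data_extents(fd):
    """Return a list of (start, end) byte ranges of fd that contain data."""
    extents = []
    size = os.fstat(fd).st_size
    offset = 0
    while offset < size:
        try:
            # Jump to the next byte that holds data at or after offset.
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:  # only a hole is left until end of file
                break
            raise
        # Jump to the next hole; the end of the file counts as a hole.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        extents.append((start, end))
        offset = end
    return extents
```

A copy or backup tool can then read only the returned ranges and seek over everything else. Note that the helper moves the file offset of fd as a side effect.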
The epoll_ctl() system call supports a new flag, EPOLLEXCLUSIVE, that causes epoll_wait() to only wake one process when a file descriptor becomes ready. See this article for a description of this option and the use case for it.
Direct-access (“DAX”) mappings now work properly with the msync() and fsync() system calls.
The ext4 filesystem has gained “project quota” support, wherein dispersed files can be assigned to the same “project” and given their own quota. The feature is rigorously undocumented, but some information can be found in the header of this patch posting.
The implementation of the XFS XFS_IOC_FSSETXATTR and XFS_IOC_FSGETXATTR ioctl() commands has been moved up to the virtual filesystem level, and an implementation for the ext4 filesystem has been added. This operation, also severely undocumented, allows the querying (and setting) of various file attributes, including immutability, whether writes should always be synchronous, exclusion from backups, and more. See the defines near the top of this commit for the list of supported attributes.
The Ceph filesystem now has support for asynchronous I/O.
So the top features of this release are:
– Copy offloading with new copy_file_range(2) system call
– Experimental PowerPlay support brings high performance to the amdgpu driver
– Btrfs free space handling scalability improvements
– Support for GCC’s Undefined Behavior Sanitizer
– Forward Error Correction support in the device-mapper’s verity target
– Add MADV_FREE flag to madvise(2)
– Better epoll multithread scalability
– cgroup unified hierarchy is considered stable
– Performance improvements for SO_REUSEPORT UDP sockets
– Proper control of socket memory usage in the memory controller
Read the full changelog.

Linux 4.4

January 12th, 2016

A new long-term support Linux kernel has been released. Let’s see its new features.
This release introduces support for Direct I/O and asynchronous I/O in the loop block device. The advantages of using direct I/O and AIO for reads and writes on the loop device’s backing file are:
– double caching is avoided thanks to Direct I/O, which reduces memory usage a lot;
– unlike user space direct I/O, there is no cost of pinning pages;
– context switches are avoided in some cases because concurrent submissions can be avoided.
The virtio-gpu driver now allows the virtualization guest to use the capabilities of the host GPU to accelerate 3D rendering. In practice, this means that a virtualized Linux guest can run an OpenGL game while using the GPU acceleration capabilities of the host. This also requires running QEMU 2.5.
LightNVM adds support for Open-Channel SSDs, devices that share responsibilities with the operating system in order to implement and maintain features that typical SSDs keep strictly in firmware. LightNVM is a specification that gives support to Open-channel SSDs. LightNVM allows the host to manage data placement, garbage collection, and parallelism. Device specific responsibilities such as bad block management, FTL extensions to support atomic I/Os, or metadata persistence are still handled by the device.
In this release, as the result of an effort that started two years ago, the TCP implementation has been refactored to make the TCP listener fast path completely lockless. During tests, a server was able to process 3,500,000 SYN packets per second on one listener and still have CPU cycles available, about 2 to 3 orders of magnitude more than was possible before. SO_REUSEPORT has also been extended to add proper CPU/NUMA affinities, so that heavy duty TCP servers can get proper siloing thanks to multi-queue NICs.
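For readers who have not used SO_REUSEPORT, the basic mechanism is that several sockets can listen on the same address and port, and the kernel spreads incoming connections across them. The Python sketch below shows only that basic setup; the function name is invented for the example, and the CPU/NUMA affinity extensions mentioned above are not shown.

```python
import socket


def reuseport_listener(host, port):
    """Open a TCP listening socket that shares its port via SO_REUSEPORT."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(), and on every socket that shares the port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen()
    return sock
```

A server would typically call reuseport_listener() once per worker, so that each worker accepts from its own socket instead of all of them contending on a single shared one.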
This release also adds journalled RAID 5 support to the MD (RAID/LVM) layer and basic support for polling for specific I/O to complete, which can improve latency and throughput in very fast devices.
As always, a wide range of driver changes has been made. Check the full changelog.

Linux 4.3

November 7th, 2015

Straight to the new features of this kernel.
On the processor side there is new ARM SoC support, and ARMv8.1 functionality is now integrated. There are also power management updates in many drivers and new Xen features.
On the filesystem side the EXT3 driver has been removed, and support for existing EXT3 filesystems will be handled by the EXT4 driver. There was some debate about whether to nuke the EXT3 driver, but in the end it was shown that EXT4 can reliably handle EXT3 filesystems without breaking compatibility.
Many other fixes for EXT4, XFS, F2FS and Btrfs.
On the graphics side there are many Intel changes, and Intel Skylake “Gen9” graphics are enabled by default.
Initial support for the AMD R9 Fury “Fiji” graphics processors has been merged. However, this initial support doesn’t yet have any re-clocking / power management so the performance remains quite slow for now. You’ll also need to be on Mesa 11.0+ for using the AMDGPU accelerated graphics.
A big rework of the Nouveau DRM driver has been done for NVIDIA graphics support. There have also been some re-clocking improvements for select GPUs and other changes.
OpenGL 3.3 support for VMware has been included. With the Linux VMWgfx kernel driver plus Mesa 11.0+, when using VMware Workstation 12 there will now be OpenGL 3.3 support exposed to Linux guest VMs rather than OpenGL 2.1.
As always, various input drivers were updated.