Discovering CPU features from userspace with ELF_HWCAP

As hardware architectures evolve they introduce new features. Many of these features are abstracted away by the kernel yet still benefit the user, for example through improved security or performance. However, some features, such as a new CPU instruction, can only benefit userspace if userspace is able to determine that the instruction is available on its hardware. One example is the CRC32 instruction, first introduced as an optional feature of ARMv8.0 and made mandatory in ARMv8.1: userspace processes can use it to accelerate CRC32 calculations and gain performance. In this post we’ll explore the ELF_HWCAP feature in the Linux kernel and see how userspace can use it to find out which features are available.

The kernel makes use of an ELF_HWCAP macro, which each architecture must define; it usually refers to a 32- or 64-bit variable. This variable is a bitfield in which each bit represents a feature, and a set bit indicates that the feature is available. The meaning of each bit is entirely architecture-dependent and is often described in the kernel’s Documentation directory. One simple way of viewing this bitfield from userspace is to cat /proc/cpuinfo. Here is an example on ARM64:

$ cat /proc/cpuinfo 
processor       : 0
BogoMIPS        : 200.00
Features        : fp asimd evtstrm crc32 atomics cpuid asimdrdm
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd0f
CPU revision    : 0

On ARM64 the bitfield is expressed in cpuinfo under the ‘Features’ label (on x86 the label is ‘flags’). Here you can see that this system supports the CRC32 instruction, amongst others.
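
Checking for a feature this way means looking for a whole word on the labelled line. A minimal sketch, assuming the cpuinfo text has already been read into a string (the helper name cpuinfo_has_feature is hypothetical, and the matching rules are an assumption based on the format shown above); for programs, getauxval (discussed later) avoids text parsing altogether:

```c
#include <stdbool.h>
#include <string.h>

/* Treat spaces and tabs as separators; real cpuinfo output uses tabs. */
static bool is_blank(char c)
{
    return c == ' ' || c == '\t';
}

/* Hypothetical helper: return true if 'feature' appears as a whole word on
 * the first line of 'cpuinfo' whose label is 'label' ("Features" on ARM64,
 * "flags" on x86). */
static bool cpuinfo_has_feature(const char *cpuinfo, const char *label,
                                const char *feature)
{
    size_t label_len = strlen(label);
    size_t feat_len = strlen(feature);
    const char *line = cpuinfo;

    while (line != NULL && *line != '\0') {
        const char *nl = strchr(line, '\n');
        const char *stop = nl ? nl : line + strlen(line);

        /* The label must match exactly and be followed by blanks or ':'. */
        if ((size_t)(stop - line) > label_len &&
            strncmp(line, label, label_len) == 0 &&
            (is_blank(line[label_len]) || line[label_len] == ':')) {
            const char *p = memchr(line, ':', (size_t)(stop - line));
            if (p == NULL)
                return false;
            /* Walk the space-separated words after the colon. */
            for (p++; p < stop; ) {
                while (p < stop && is_blank(*p))
                    p++;
                const char *word = p;
                while (p < stop && !is_blank(*p))
                    p++;
                if ((size_t)(p - word) == feat_len &&
                    strncmp(word, feature, feat_len) == 0)
                    return true;
            }
            return false;
        }
        line = nl ? nl + 1 : NULL;
    }
    return false;
}
```

Matching whole words matters: a naive substring search for "crc" would wrongly succeed on "crc32", and "asimd" would match inside "asimdrdm".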

Of course, each architecture has its own way of discovering capabilities. On ARM64 the main method is to read system registers that describe them. This is done in the arch/arm64/kernel/cpufeature.c file via the HWCAP_CAP macro. Once features are detected they are added to elf_hwcap (ARM64’s definition of ELF_HWCAP).
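
Each HWCAP_CAP entry roughly pairs an ID register and a field within it with a minimum value that indicates the feature. The userspace-runnable sketch below models only that comparison. It assumes, from the Arm architecture's ID register layout, that the CRC32 field of ID_AA64ISAR0_EL1 is the unsigned 4-bit field at bits [19:16]; the macro and function names are hypothetical, the register values are made up, and the signed fields the kernel also handles are ignored:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: ID_AA64ISAR0_EL1.CRC32 lives in bits [19:16]; a value of 1
 * or more means the CRC32 instructions are implemented. */
#define ISAR0_CRC32_SHIFT 16

/* Extract an unsigned 4-bit ID register field starting at bit 'shift'. */
static unsigned int id_reg_field(uint64_t reg, unsigned int shift)
{
    return (unsigned int)((reg >> shift) & 0xFu);
}

/* Report the feature when its field is at least 'min_value', the shape of
 * the (register, field, minimum) check a HWCAP_CAP entry describes.
 * Signed fields are not modelled here. */
static bool id_reg_has_feature(uint64_t reg, unsigned int shift,
                               unsigned int min_value)
{
    return id_reg_field(reg, shift) >= min_value;
}
```

For a made-up register value of 0x210020, the field at bits [19:16] is 1, so id_reg_has_feature(0x210020, ISAR0_CRC32_SHIFT, 1) returns true.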

If you’re feeling adventurous and explore cpufeature.c, you may notice that in addition to the elf_hwcap bitfield and the arm64_elf_hwcaps array that describes how to check for those features, the file also maintains another bitfield, cpu_hwcaps (described by arm64_features). This bitfield also describes features, but it is intended only for use within the kernel, and as a result its features differ from those in ELF_HWCAP. Within the kernel, ELF_HWCAP features are tested with cpu_have_feature, and CPU capabilities with cpus_have_const_cap and friends. Admittedly, it can be a little confusing.

Some of the complexity in cpufeature.c relates to supporting heterogeneous systems, that is, systems with multiple CPUs that differ in their microarchitecture. In these systems the individual CPUs may support different feature sets. Because a task can be preempted and migrated to another CPU at any point, a feature can often only be considered present if it is present on all CPUs.
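
As a simplified model of that rule, the system-wide feature set is the bitwise AND of every CPU’s feature mask. This is only an illustration: the kernel actually sanitises individual ID register fields rather than ANDing finished hwcap masks, and the helper name and per-CPU values below are hypothetical:

```c
#include <stddef.h>

/* Simplified model: a hwcap bit is advertised only if every CPU reports it,
 * so a task migrated between CPUs never loses a feature it relied on.
 * With no CPUs, nothing is advertised. */
static unsigned long common_hwcaps(const unsigned long *per_cpu, size_t ncpus)
{
    unsigned long common = ncpus ? ~0UL : 0UL;

    for (size_t i = 0; i < ncpus; i++)
        common &= per_cpu[i];
    return common;
}
```

With two hypothetical CPUs reporting 0x1987 and 0x0987, only 0x0987 survives: the bit set on just one CPU (0x1000) is dropped.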

Userspace becomes aware of ELF_HWCAP via the auxiliary vector. An LWN article describes it well: “In essence, it is a list of key-value pairs that the kernel’s ELF binary loader (fs/binfmt_elf.c in the kernel source) constructs when a new executable image is loaded into a process. This list is placed at a specific location in the process’s address space;”. In short, it is a list of information that the dynamic loader is likely to need. Conveniently, glibc provides a function, getauxval, that retrieves entries from the auxiliary vector, and the kernel documentation file arm64/elf_hwcaps.txt contains an example of its use:

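#include <stdbool.h>
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP; HWCAP_FP on arm64 glibc */

bool floating_point_is_present(void)
{
        unsigned long hwcaps = getauxval(AT_HWCAP);
        if (hwcaps & HWCAP_FP)
                return true;

        return false;
}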

A quick browse of GitHub turns up a good real-world example. The ceph project relies on AT_HWCAP to decide which CRC function to use: on a supported ARMv8 platform it uses the CRC32 instructions, and otherwise it falls back to a slower software algorithm.
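
The general pattern is to check AT_HWCAP once and pick an implementation through a function pointer. The sketch below is not ceph’s code. It computes the standard CRC-32 (reflected polynomial 0xEDB88320) in software, and on arm64 optionally uses the ACLE __crc32b intrinsic. It assumes a recent GCC, whose aarch64 target("+crc") function attribute lets a single function use the CRC32 instructions. The function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP; HWCAP_CRC32 on arm64 glibc */

#if defined(__aarch64__) && defined(HWCAP_CRC32)
#include <arm_acle.h>   /* __crc32b() */

/* Hardware path. Assumes GCC's target attribute so only this function
 * is compiled to use the CRC32 instructions. */
__attribute__((target("+crc")))
static uint32_t crc32_hw(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
        crc = __crc32b(crc, buf[i]);
    return ~crc;
}
#endif

/* Portable bit-at-a-time CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t crc32_sw(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

typedef uint32_t (*crc32_fn)(const uint8_t *buf, size_t len);

/* Pick an implementation once, based on what the kernel advertises. */
static crc32_fn select_crc32(void)
{
#if defined(__aarch64__) && defined(HWCAP_CRC32)
    if (getauxval(AT_HWCAP) & HWCAP_CRC32)
        return crc32_hw;
#endif
    return crc32_sw;
}
```

Both paths compute the same checksum. For the conventional check string "123456789", either implementation returns 0xCBF43926. Callers can store the result of select_crc32() once and reuse it.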

Finally, there is another way of finding out which ELF_HWCAPs are available: when the LD_SHOW_AUXV=1 environment variable is set, the dynamic loader displays the contents of the auxiliary vector, as follows:

$ LD_SHOW_AUXV=1 sleep 1
AT_SYSINFO_EHDR: 0xffff95f12000
AT_??? (0x33): 0x1270
AT_HWCAP:        1987
AT_PAGESZ:       4096
AT_CLKTCK:       100
AT_PHDR:         0x400040
AT_PHENT:        56
AT_PHNUM:        8
AT_BASE:         0xffff95ee8000
AT_FLAGS:        0x0
AT_ENTRY:        0x4073b0
AT_UID:          0
AT_EUID:         0
AT_GID:          0
AT_EGID:         0
AT_SECURE:       0
AT_RANDOM:       0xffffd8d86798
AT_EXECFN:       /bin/sleep
AT_PLATFORM:     aarch64
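
Read as hexadecimal, the AT_HWCAP value above (1987, i.e. 0x1987) has bits 0, 1, 2, 7, 8, 11 and 12 set. These line up exactly with the Features line from /proc/cpuinfo shown earlier, which suggests the loader prints this field in hex without a 0x prefix. The sketch below decodes such a mask back into names. The name table covers only the first 13 bits of the arm64 HWCAP_* definitions, and the helper name hwcap_to_features is hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* First 13 arm64 AT_HWCAP bits, in bit order, using the names that
 * appear in the "Features" line of /proc/cpuinfo. */
static const char *const arm64_hwcap_names[] = {
    "fp", "asimd", "evtstrm", "aes", "pmull", "sha1", "sha2",
    "crc32", "atomics", "fphp", "asimdhp", "cpuid", "asimdrdm",
};

/* Hypothetical helper: write the space-separated names of the set bits in
 * 'hwcap' into 'out'. Bits beyond the table are ignored; if 'out' is too
 * small, output stops before the first name that does not fit. */
static void hwcap_to_features(unsigned long hwcap, char *out, size_t outlen)
{
    size_t nnames = sizeof(arm64_hwcap_names) / sizeof(arm64_hwcap_names[0]);
    size_t used = 0;

    if (outlen == 0)
        return;
    out[0] = '\0';
    for (size_t bit = 0; bit < nnames; bit++) {
        if (!(hwcap & (1UL << bit)))
            continue;
        size_t len = strlen(arm64_hwcap_names[bit]);
        size_t sep = used ? 1 : 0;
        if (used + sep + len + 1 > outlen)
            break;  /* no room: stop rather than emit a partial name */
        if (sep)
            out[used++] = ' ';
        memcpy(out + used, arm64_hwcap_names[bit], len + 1);  /* copies the NUL too */
        used += len;
    }
}
```

Calling hwcap_to_features(0x1987, buf, sizeof buf) yields "fp asimd evtstrm crc32 atomics cpuid asimdrdm", the same list that /proc/cpuinfo printed.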

On some platforms you’ll also see an AT_HWCAP2 entry: as the number of features grows, it can exceed the size of the ELF_HWCAP bitfield, so AT_HWCAP2 provides an overflow.
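
Code that needs features from both words can read them together. A minimal sketch, assuming the C library’s <elf.h> defines AT_HWCAP2 (the code falls back to 0 when it does not), with a hypothetical struct and helper name:

```c
#include <elf.h>        /* AT_HWCAP, and AT_HWCAP2 where the C library defines it */
#include <sys/auxv.h>   /* getauxval() */

/* Hypothetical container for both capability words. */
struct hwcap_words {
    unsigned long hwcap;
    unsigned long hwcap2;
};

static struct hwcap_words read_hwcap_words(void)
{
    struct hwcap_words w;

    w.hwcap = getauxval(AT_HWCAP);
#ifdef AT_HWCAP2
    /* getauxval() returns 0 if the kernel did not supply this entry. */
    w.hwcap2 = getauxval(AT_HWCAP2);
#else
    w.hwcap2 = 0;   /* C library too old to know about AT_HWCAP2 */
#endif
    return w;
}
```

Bits in the second word are tested the same way as in the first, using the architecture’s HWCAP2_* constants instead of HWCAP_*.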
