The indirection of outb, iowrite and friends

The {in,out}{b,w,l} functions of the kernel provide the mechanism for reading from and writing to what we may describe as port I/O (PIO). The API is not architecture specific, so it is present in most architectures and used by a wide variety of device drivers. However, the mechanism that supports PIO varies widely between architectures, so under the hood these functions are a little more complex than first meets the eye. We’ll focus on the ARM implementation. This is interesting because ARM provides neither I/O instructions nor an I/O address space, yet it’s still possible to write a driver that talks to a device via I/O.

Let’s start by taking a look at the default implementation provided by include/asm-generic/io.h. The outb function, as follows, wraps up a __raw_writeb inside some barriers:

#ifndef outb
#define outb outb
static inline void outb(u8 value, unsigned long addr)
{
        __io_pw();
        __raw_writeb(value, PCI_IOBASE + addr);
        __io_paw();
}
#endif

The first point to note here is that we’re using an everyday memory write (__raw_writeb) to write to an address within our processor’s memory address space. In other words, this is memory mapped I/O (MMIO). The I/O address we are writing to is just another address in our large address space. This makes sense for ARM, as the architecture doesn’t support port I/O: unlike x86, it has no CPU instructions for issuing I/O transactions, no separate bus for them and no separate address space for them either. (And just in case you’re wondering, x86 doesn’t use the default implementation from include/asm-generic/io.h. It provides its own in arch/x86/include/asm/io.h; look for the BUILDIO macros, which simply translate to the ‘out’ assembly instruction.)

The second point to note is that the address is actually an offset from PCI_IOBASE. Why PCI? Well, even though a CPU architecture might not have support for I/O, it may be connected to a bus that does, and the most common bus with I/O support is the PCI bus. If you plug in a PCI card that needs to be programmed with I/O transactions, you need a way of issuing those transactions on its bus. On these architectures the PCI host bridge provides a memory region that generates I/O transactions on the PCI bus whenever it is read from or written to. PCI_IOBASE therefore points to this area in the processor’s address space.
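To make the idea concrete, here is a minimal user-space sketch in C of what a memory mapped outb boils down to. It is not kernel code: sim_io_window, SIM_PCI_IOBASE, sim_outb and sim_inb are hypothetical names invented for illustration, and an ordinary array stands in for the window that a real host bridge would expose.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the host bridge's I/O window (64 KiB here). */
#define SIM_IO_WINDOW_SIZE 0x10000u
static volatile uint8_t sim_io_window[SIM_IO_WINDOW_SIZE];

/* Plays the role of PCI_IOBASE: the fixed base that the port is added to. */
#define SIM_PCI_IOBASE (sim_io_window)

/* Sketch of a memory mapped outb: the port is just an offset from the base. */
static inline void sim_outb(uint8_t value, unsigned long port)
{
    assert(port < SIM_IO_WINDOW_SIZE); /* stay inside the simulated window */
    *(SIM_PCI_IOBASE + port) = value;
}

/* Matching inb, reading back from the same offset. */
static inline uint8_t sim_inb(unsigned long port)
{
    assert(port < SIM_IO_WINDOW_SIZE);
    return *(SIM_PCI_IOBASE + port);
}
```

Calling sim_outb(0x5a, 0x3f8) simply stores the byte at offset 0x3f8 from the base. On real hardware the bridge would turn that store into an I/O write on the PCI bus.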

The PCI_IOBASE define is a hard-coded virtual address for each architecture. Of course the actual physical address of the PCI host bridge could be anywhere depending on the design of the machine. A mapping is made between the bridge physical address and the hard-coded virtual address during PCI enumeration via the ‘pci_remap_iospace’ function inside the PCI core (as follows):

/**
 *      pci_remap_iospace - Remap the memory mapped I/O space
 *      @res: Resource describing the I/O space
 *      @phys_addr: physical address of range to be mapped
 *
 *      Remap the memory mapped I/O space described by the @res
 *      and the CPU physical address @phys_addr into virtual address space.
 *      Only architectures that have memory mapped IO functions defined
 *      (and the PCI_IOBASE value defined) should call this function.
 */
int pci_remap_iospace(const struct resource *res, phys_addr_t phys_addr)
{
#if defined(PCI_IOBASE) && defined(CONFIG_MMU)
        unsigned long vaddr = (unsigned long)PCI_IOBASE + res->start;

        if (!(res->flags & IORESOURCE_IO))
                return -EINVAL;

        if (res->end > IO_SPACE_LIMIT)
                return -EINVAL;

        return ioremap_page_range(vaddr, vaddr + resource_size(res), phys_addr,
                                  pgprot_device(PAGE_KERNEL));
#else
        /* this architecture does not have memory mapped I/O space,
           so this function should never be called */
        WARN_ONCE(1, "This architecture does not support memory mapped I/O\n");
        return -ENODEV;
#endif
}
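The checks in this function are easy to try out in user space. The following is only a sketch of the same decision logic, not the kernel implementation: the struct names, SIM_PCI_IOBASE and SIM_IO_SPACE_LIMIT are assumptions chosen for illustration, and no page tables are touched. It just records the mapping that would be requested.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_IORESOURCE_IO  0x00000100UL       /* bit value of the kernel's IORESOURCE_IO */
#define SIM_IO_SPACE_LIMIT 0xffffffULL        /* assumed 16 MiB limit, for illustration */
#define SIM_PCI_IOBASE     0x7f0000000000ULL  /* made-up virtual base address */

struct sim_resource {
    uint64_t start;       /* first I/O port in the window */
    uint64_t end;         /* last I/O port in the window (inclusive) */
    unsigned long flags;
};

struct sim_mapping {
    uint64_t vaddr;       /* where the window would appear virtually */
    uint64_t phys;        /* CPU physical address of the bridge window */
    uint64_t size;
};

static uint64_t sim_resource_size(const struct sim_resource *res)
{
    return res->end - res->start + 1;
}

/* Mirrors the validation steps, then records the mapping instead of creating it. */
static int sim_remap_iospace(const struct sim_resource *res, uint64_t phys_addr,
                             struct sim_mapping *out)
{
    assert(res != NULL && out != NULL);

    if (!(res->flags & SIM_IORESOURCE_IO))
        return -EINVAL;   /* not an I/O resource */
    if (res->end > SIM_IO_SPACE_LIMIT)
        return -EINVAL;   /* window does not fit in the reserved region */

    out->vaddr = SIM_PCI_IOBASE + res->start;
    out->phys = phys_addr;
    out->size = sim_resource_size(res);
    return 0;
}
```

The point to take away is that the I/O port number (res->start) chooses the offset within the fixed virtual region, while the physical address is whatever the platform says it is.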

The physical address usually comes from the device tree (or other firmware), as described by the host bridge’s ‘ranges’ property. The following snippet from the r8a77965.dtsi device tree shows the ranges property for a PCIe host bridge peripheral. The first line of the ranges property describes an I/O window (0x01000000) of 1MB (0 0x00100000), in which the first I/O address (0 0x00000000) maps to the CPU physical address 0xfe100000 (0 0xfe100000).

pciec0: pcie@fe000000 {
        compatible = "renesas,pcie-r8a77965",
                     "renesas,pcie-rcar-gen3";
        reg = <0 0xfe000000 0 0x80000>;
        #address-cells = <3>;
        #size-cells = <2>;
        bus-range = <0x00 0xff>;
        device_type = "pci";
        ranges = <0x01000000 0 0x00000000 0 0xfe100000 0 0x00100000
                0x02000000 0 0xfe200000 0 0xfe200000 0 0x00200000
                0x02000000 0 0x30000000 0 0x30000000 0 0x08000000
                0x42000000 0 0x38000000 0 0x38000000 0 0x08000000>;
        /* Map all possible DDR as inbound ranges */
        dma-ranges = <0x42000000 0 0x40000000 0 0x40000000 0 0x80000000>;
        interrupts = <GIC_SPI 116 IRQ_TYPE_LEVEL_HIGH>,
                <GIC_SPI 117 IRQ_TYPE_LEVEL_HIGH>,
                <GIC_SPI 118 IRQ_TYPE_LEVEL_HIGH>;
        #interrupt-cells = <1>;
        interrupt-map-mask = <0 0 0 0>;
        interrupt-map = <0 0 0 0 &gic GIC_SPI 116 IRQ_TYPE_LEVEL_HIGH>;
        clocks = <&cpg CPG_MOD 319>, <&pcie_bus_clk>;
        clock-names = "pcie", "pcie_bus";
        power-domains = <&sysc R8A77965_PD_ALWAYS_ON>;
        resets = <&cpg 319>;
        status = "disabled";
};
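Reading a ranges entry by eye takes some practice, so here is a small sketch in C that decodes one. It assumes each entry is seven 32-bit cells (three for the PCI address, two for the parent CPU address and two for the size), which matches the snippet above only if the parent node uses two address cells. The struct and function names are invented for illustration. The space code and prefetchable bit follow the standard PCI device tree encoding of the first cell.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Space codes carried in bits 25:24 of the first cell of a PCI address. */
enum pci_space {
    PCI_SPACE_CONFIG = 0,
    PCI_SPACE_IO     = 1,
    PCI_SPACE_MEM32  = 2,
    PCI_SPACE_MEM64  = 3
};

struct pci_range {
    enum pci_space space;
    int prefetchable;     /* bit 30 of the first cell */
    uint64_t pci_addr;    /* address as seen on the PCI bus */
    uint64_t cpu_addr;    /* CPU physical address of the window */
    uint64_t size;
};

#define RANGE_CELLS 7     /* assumed layout: 3 + 2 + 2 cells per entry */

static uint64_t join_cells(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Decode one ranges entry under the seven-cell layout described above. */
static struct pci_range decode_range(const uint32_t *cells)
{
    struct pci_range r;

    assert(cells != NULL);
    r.space = (enum pci_space)((cells[0] >> 24) & 0x3);
    r.prefetchable = (int)((cells[0] >> 30) & 0x1);
    r.pci_addr = join_cells(cells[1], cells[2]);
    r.cpu_addr = join_cells(cells[3], cells[4]);
    r.size = join_cells(cells[5], cells[6]);
    return r;
}
```

Feeding it the first entry from the snippet yields an I/O window whose PCI address 0 sits at CPU physical address 0xfe100000 and is 0x100000 bytes long. The last entry decodes as a prefetchable 32-bit memory window.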

This approach works well, so long as the peripheral that generates the I/O transactions does so when you read or write some MMIO. However, this wasn’t the case for the LPC peripheral of a recent ARM64 HiSilicon Hip06 board, where sending I/O transactions required a bit more work. As a result the kernel was adapted and the CONFIG_INDIRECT_PIO infrastructure was introduced. The outb function provided by this infrastructure does everything the asm-generic/io.h version does. However, if the provided address falls within a specific range then, rather than calling writeb, it searches a list of registered address ranges for a match and calls the outb handler associated with that range. This allows different ranges to be handled completely differently, as follows:

void logic_out##bw(type value, unsigned long addr)            \
{                                    \
    if (addr < MMIO_UPPER_LIMIT) {                   \
        write##bw(value, PCI_IOBASE + addr);            \
    } else if (addr >= MMIO_UPPER_LIMIT && addr < IO_SPACE_LIMIT) {  \
        struct logic_pio_hwaddr *entry = find_io_range(addr);   \
        if (entry && entry->ops)                \
            entry->ops->out(entry->hostdata,        \
                    addr, value, sizeof(type)); \
        else                            \
            WARN_ON_ONCE(1);                \
    }                               \
}
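The dispatch is easier to follow without the macro machinery. Below is a user-space sketch of the same idea in C. Every name prefixed with sim_ is hypothetical, the two limits are arbitrary small values chosen for illustration, and the handler simply records what it was asked to write rather than driving any hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_MMIO_UPPER_LIMIT 0x1000UL  /* below this: plain memory mapped window */
#define SIM_IO_SPACE_LIMIT   0x2000UL  /* from the limit above to here: indirect ranges */
#define SIM_MAX_RANGES       4

struct sim_pio_ops {
    void (*out)(void *hostdata, unsigned long addr, uint32_t value, size_t width);
};

struct sim_pio_range {
    unsigned long start, size;
    void *hostdata;
    const struct sim_pio_ops *ops;
};

static uint8_t sim_mmio_window[SIM_MMIO_UPPER_LIMIT];
static struct sim_pio_range sim_ranges[SIM_MAX_RANGES];
static size_t sim_range_count;

/* Example host: remembers the last write it was asked to perform. */
struct sim_lpc_host { unsigned long last_addr; uint32_t last_value; unsigned writes; };

static void sim_lpc_out(void *hostdata, unsigned long addr, uint32_t value, size_t width)
{
    struct sim_lpc_host *host = hostdata;

    assert(host != NULL);
    (void)width;
    host->last_addr = addr;
    host->last_value = value;
    host->writes++;
}

static const struct sim_pio_ops sim_lpc_ops = { .out = sim_lpc_out };

/* Register an indirect range; it must lie inside the indirect part of the space. */
static int sim_register_range(struct sim_pio_range range)
{
    if (sim_range_count == SIM_MAX_RANGES ||
        range.start < SIM_MMIO_UPPER_LIMIT ||
        range.start + range.size > SIM_IO_SPACE_LIMIT)
        return -1;
    sim_ranges[sim_range_count++] = range;
    return 0;
}

static const struct sim_pio_range *sim_find_io_range(unsigned long addr)
{
    for (size_t i = 0; i < sim_range_count; i++)
        if (addr >= sim_ranges[i].start &&
            addr < sim_ranges[i].start + sim_ranges[i].size)
            return &sim_ranges[i];
    return NULL;
}

/* Returns 0 if the write was handled, -1 if nothing claims the address. */
static int sim_logic_outb(uint8_t value, unsigned long addr)
{
    if (addr < SIM_MMIO_UPPER_LIMIT) {
        sim_mmio_window[addr] = value;  /* direct path, like write##bw() */
        return 0;
    }
    if (addr < SIM_IO_SPACE_LIMIT) {
        const struct sim_pio_range *entry = sim_find_io_range(addr);

        if (entry && entry->ops && entry->ops->out) {
            entry->ops->out(entry->hostdata, addr, value, sizeof(value));
            return 0;
        }
    }
    return -1;  /* the kernel would WARN here */
}
```

A write to a low address lands in the window directly, while a write to an address inside a registered range is routed to that range's handler. That is all the indirection amounts to.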

This may raise the question: how do drivers learn about the I/O addresses they need to write to? In the case of PCI Express, the PCI standard describes a method by which cards advertise their requirements and the enumerating software allocates an address. This makes life very easy for device drivers: they first obtain the assigned I/O address, often via ‘platform_get_resource’ with an IORESOURCE_IO flag. This provides a struct resource whose range can then be passed to ‘ioport_map’ to obtain a ‘cookie’ for making I/O accesses. We use the word cookie instead of address because the caller should treat it as an opaque value that may only be used with certain functions.

There are two main implementations of ioport_map: one selected by CONFIG_GENERIC_IOMAP and a generic version. Some architectures, such as alpha and SH, have their own. All of these implementations achieve the same thing in slightly different ways. The purpose of the function is to do the right thing so that the returned cookie can be used with the iowrite family of calls (another level of indirection over outb).

Where CONFIG_GENERIC_IOMAP is not set, ioport_map (oddly defined in asm-generic/io.h) adds PCI_IOBASE to the provided address. The iowrite functions are defined in the same file and translate to simple write{b,w,l} calls. (Hence here PCI_IOBASE is added at the ioport_map stage.)

When CONFIG_GENERIC_IOMAP is set, ioport_map, now defined in lib/iomap.c, adds an offset of PIO_OFFSET to the provided address (see why we wanted to call it a cookie?). This time, however, the iowrite functions are defined through some ugly macros in lib/iomap.c. When you call iowrite8 it looks at the value of the address and decides whether to call writeb (for MMIO) or outb (for anything with PIO_OFFSET added to it). In this case it’s the outb function that adds PCI_IOBASE.
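Here is a user-space sketch in C of how such a cookie can be told apart at iowrite8 time. The three constants are assumptions modelled on my recollection of lib/iomap.c and may not match your kernel version. All sim_ names are hypothetical, an array stands in for the port space, and the exact boundary comparisons are part of the sketch rather than a statement about the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, modelled on lib/iomap.c; check your own tree. */
#define SIM_PIO_OFFSET   0x10000UL  /* added to a port to form its cookie */
#define SIM_PIO_MASK     0x0ffffUL  /* recovers the port from a cookie */
#define SIM_PIO_RESERVED 0x40000UL  /* cookies at or above this are MMIO */

/* Simulated port space, written to by the PIO path. */
static uint8_t sim_port_space[SIM_PIO_MASK + 1];

/* Turn a port number into an opaque cookie; 0 means "cannot map". */
static uintptr_t sim_ioport_map(unsigned long port)
{
    if (port > SIM_PIO_MASK)
        return 0;
    return (uintptr_t)(port + SIM_PIO_OFFSET);
}

/* An MMIO cookie is just the register's address, which must not look like PIO. */
static uintptr_t sim_mmio_cookie(volatile uint8_t *reg)
{
    uintptr_t cookie = (uintptr_t)reg;

    assert(cookie >= SIM_PIO_RESERVED);
    return cookie;
}

/* Stand-in for outb: this is where PCI_IOBASE would be added in the kernel. */
static void sim_outb(uint8_t value, unsigned long port)
{
    sim_port_space[port] = value;
}

/* Inspect the cookie and pick the MMIO or PIO path; -1 for a bad cookie. */
static int sim_iowrite8(uint8_t value, uintptr_t cookie)
{
    if (cookie >= SIM_PIO_RESERVED) {
        *(volatile uint8_t *)cookie = value;              /* MMIO: plain store */
        return 0;
    }
    if (cookie > SIM_PIO_OFFSET) {
        sim_outb(value, (unsigned long)(cookie & SIM_PIO_MASK)); /* PIO path */
        return 0;
    }
    return -1;
}
```

With these values, port 0x3f8 becomes the cookie 0x103f8, which is meaningless as an address but is enough for sim_iowrite8 to recognise it as PIO and strip the offset again.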

It all seems a bit unnecessary and overly complex, but the lesson to take from this is to always use the iowrite functions for PIO, to always map an address with ioport_map first (remembering it’s a cookie) and, if you have a choice, perhaps to avoid PIO altogether.

To complicate the picture a little more, some older, difficult-to-share ISA devices work on fixed I/O addresses. This means that their device drivers expect to issue I/O to those fixed addresses, which is pretty horrible. Thankfully, due to the configuration of the PCI Express fabric (grep for PCI_BRIDGE_CTL_ISA and PCI_BRIDGE_CTL_VGA) and the automatic addition of the PCI_IOBASE offset, this is no problem. This is why drivers such as drivers/video/fbdev/vga16fb.c use hard-coded addresses and magically work.
