Sand Timer

We made Linux v6.19+ boot quicker for everyone!

Booting Linux quickly is highly desirable for embedded devices, particularly those that are battery-powered or require high availability. A common approach for reducing boot times with minimal effort is to remove unused features, effectively turning a general purpose operating system into a specialised one. However, this compromise of sacrificing features for boot time reduces flexibility and results in fragile systems, allowing boot times to easily regress as software development continues. Fortunately, as part of the effort to add real-time support to the v6.12 Linux kernel, the printk framework was updated to allow for console printing to be performed asynchronously. This eliminates the delays associated with slow consoles such as serial ports, potentially allowing the kernel to boot quicker without removing or suppressing console output as is common practice. In this blog post we’ll describe how slow consoles can reduce boot time, how the updated printk framework can potentially allow us to reduce boot times, and the work we’ve undertaken and upstreamed to deliver that benefit.

Consoles can be Slow

When Linux boots, the majority of its time is spent executing initialisation code provided by the kernel’s initcall mechanism, allowing each driver or subsystem to perform initialisation routines. However, this mechanism, which you can read more about here, isn’t generally parallelised. This means that a delay in a single initcall will delay the entire kernel boot time. A very common delay in drivers is the use of printk, especially where the attached console is a serial port, as is often the case in embedded systems. Printing characters to the serial port is very slow, thus every time a driver writes to a serial port, the whole kernel boot time is held up waiting for those characters to make it on the wire.

A common approach to improve boot time is to suppress console output, often by making use of the ‘quiet‘ and/or ‘loglevel‘ arguments on the kernel command line. Depending on the amount of data written during boot and the baud rate of the serial port, the boot time savings can be substantial. However, this is a compromise, why should we have to lose functionality (the loss of output during boot) in order to improve boot time?

A Happy and Unintended Consequence of PREEMPT_RT

For more than 20 years, kernel developers have been working to improve the Linux kernel’s real time performance to allow for a fully preemptible kernel with minimal unbounded delays. This has mostly existed as an out-of-tree patchset known as PREEMPT_RT, but thanks to the tireless work of the community, this was finally merged for v6.12. In fact, it was merged with much fanfare, with a printed pull request wrapped in gold being personally handed to Linus Torvalds!

The last sticking points for upstreaming this work involved the console and printk subsystems, a surprisingly complex part of the kernel. Of interest to us though, was the introduction of a legacy console printer kthread – a dedicated thread (when CONFIG_PREEMPT_RT is enabled) with the purpose of writing buffered console messages to the console. Thus making the process of writing messages to the console asynchronous to the calling task. Consequently, as printk no longer introduces these delays, we might expect the overall boot time to improve. But does it?

The Missing Piece

We were keen to see the impact the legacy console printer kthread had on boot time. Therefore we made use of a PocketBeagle 2 kindly gifted to us at last year’s Open Source Summit conference. We used Yocto and the latest release of the Linux kernel with PREEMPT_RT enabled. To highlight the impact of our a slow console, we used a baud rate of 9600. We also increased the amount of logging to the console by increasing the ‘loglevel‘ value and by enabling ‘initcall_debug‘.

We observed that whilst individual initcalls completed quicker (than before v6.12), we found that there was no significant improvement in overall boot time. Let’s take a look at our kernel output after adding some trivial debug:

Starting kernel …

[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
[    0.000000] Linux version 7.0.0-ti (oe-user@oe-host) (aarch64-poky-linux-gcc (GCC) 13.4.0, GNU ld (GNU Binutils) 2.42.0.20240723) #1 SMP PREEMPT_RT Tue Jul 22
  02:41:13 UTC 2025
[    0.000000] KASLR disabled due to lack of seed
[    0.000000] Machine model: BeagleBoard.org PocketBeagle2
...
[    0.000000] Kernel command line: console=ttyS2,9600 initcall_debug=1 loglevel=10 root=PARTUUID=076c4a2a-02 rw rootfstype=ext4 rootwait
...
[    6.250399] calling  of_platform_sync_state_init+0x0/0x30 @ 1
[    6.250839] initcall of_platform_sync_state_init+0x0/0x30 returned 0 after 426 usecs
[    6.250878] init/main.c:console_on_rootfs: +filp_open
[    6.306008] mmc1: SDHCI controller on fa00000.mmc [fa00000.mmc] using ADMA 64-bit
[    6.306371] probe of fa00000.mmc returned 0 after 146929 usecs
[    6.369551] mmc1: new UHS-I speed SDR104 SDHC card at address 5048
[    6.381275] mmcblk1: mmc1:5048 SD32G 29.7 GiB
[    6.411786]  mmcblk1: p1 p2
[    6.417850] probe of mmc1:5048 returned 0 after 45411 usecs
[   16.363508] ti_sci_pm_domains 44043000.system-controller:power-controller: sync_state() pending due to 30040000.pruss
[   16.363550] ti_sci_pm_domains 44043000.system-controller:power-controller: sync_state() pending due to e010000.watchdog
[   16.363565] ti_sci_pm_domains 44043000.system-controller:power-controller: sync_state() pending due to e000000.watchdog
[   16.363703] ti_sci_pm_domains 44043000.system-controller:power-controller: sync_state() pending due to 30300000.crc
[  184.228355] init/main.c:console_on_rootfs: -filp_open
[  184.323831] EXT4-fs (mmcblk1p2): recovery complete
[  184.334289] EXT4-fs (mmcblk1p2): mounted filesystem e5b814d1-5a20-4fc1-a1bc-4f7bf5194643 r/w with ordered data mode. Quota mode: none.
[  184.335081] VFS: Mounted root (ext4 filesystem) on device 179:2.

You’ll see that within 7 seconds, the kernel has quickly progressed through all of its initcalls. However, there is then a considerable amount of time that passes before the root filesystem has mounted. You would be forgiven for assuming this delay is related to the root filesystem, however thanks to our debug we can see that the time is, in fact sandwiched between the following printks:

/* Open /dev/console, for stdin/stdout/stderr, this should never fail */
 void __init console_on_rootfs(void)
 {
         printk("init/main.c:console_on_rootfs: +filp_open\n");
         struct file *file = filp_open("/dev/console", O_RDWR, 0);
         printk("init/main.c:console_on_rootfs: -filp_open\n");

This is code from init/main.c, at a late point during the kernel boot where it is opening /dev/console. However, this takes 178 seconds! We quickly identified that this delay was due to contention of the console lock against the new legacy threaded console printing, let’s take a look at that code:

static int legacy_kthread_func(void *unused)
 {
         for (;;) {
                wait_event_interruptible(legacy_wait, legacy_kthread_should_wakeup());
                if (kthread_should_stop())
                        break;
                console_lock();
                __console_flush_and_unlock();
         }
         return 0;
 }

As you’ll see, the background thread for printing our console messages calls console_lock to obtain a lock for the console. It then calls __console_flush_and_unlock, which is a function that will spin in a loop printing buffered console messages, before finally releasing the lock.

So despite printing our boot console via a thread, at the end of the boot the kernel is stuck waiting for a console lock such that it can create /dev/console for userspace to use. But that lock is held by the background printer, thread thus eliminating any boot time gains we’ve made up to this point.

Our Upstream Work to Improve Boot time

We could see that the introduction of kernel threads to print buffered console messages had the potential to provide a huge benefit to those wishing to optimise boot time. Therefore we came up with a solution to what is a lock contention problem, we updated the legacy printing thread such that the console lock is acquired and released for each and every log message that is written, rather than being held until the entire buffer has been flushed. This allows other users (such as the kernel init process) to obtain the lock briefly, allowing the boot process to continue. The updated source looks like this:

static int legacy_kthread_func(void *unused)
 {
        bool try_again;

 wait_for_event:
        wait_event_interruptible(legacy_wait, legacy_kthread_should_wakeup());

        do {
                bool handover = false;
                u64 next_seq = 0;
                if (kthread_should_stop())
                        return 0;
                console_lock();
                console_flush_one_record(true, &next_seq, &handover, &try_again);                    
                if (!handover)
                        __console_unlock();
        } while (try_again);

        goto wait_for_event;
 }

As you can see, rather than calling __console_flush_and_unlock() which will flush the entire buffer, we now call a new function console_flush_one_record() before releasing the lock and repeating until completion.

By working with the upstream community, we upstreamed our work which is now available in Linux kernels from v6.19 and onwards.

A huge reduction in Boot Time

Let’s now take a look at the boot time with and without our upstream patches. Let’s start without our patches:

Starting kernel …

[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
[    0.000000] Linux version 7.0.0-ti (oe-user@oe-host) (aarch64-poky-linux-gcc (GCC) 13.4.0, GNU ld (GNU Binutils) 2.42.0.20240723) #1 SMP PREEMPT_RT Tue Jul 22
  02:41:13 UTC 2025
[    0.000000] KASLR disabled due to lack of seed
[    0.000000] Machine model: BeagleBoard.org PocketBeagle2
...
[    0.000000] Kernel command line: console=ttyS2,9600 initcall_debug=1 loglevel=10 root=PARTUUID=076c4a2a-02 rw rootfstype=ext4 rootwait
...
[    6.250878] init/main.c:console_on_rootfs: +filp_open
...
[  184.228355] init/main.c:console_on_rootfs: -filp_open
...
[  184.335081] VFS: Mounted root (ext4 filesystem) on device 179:2.
[  184.337025] devtmpfs: mounted
[  184.383795] VFS: Pivoted into new rootfs
[  184.557231] Freeing unused kernel memory: 10816K
[  184.569127] Checked W+X mappings: passed, no W+X pages found
[  184.569199] Run /sbin/init as init process

As you can see from the above output, it takes 184 seconds for the kernel to boot and launch the first user process. The full dmesg output is approximately 171k characters, which will take at least 178 seconds to print at our baud rate of 9600. Let’s now boot the kernel again, but this time with our upstream patches, where the kernel isn’t held back by a slow console:

Starting kernel …

[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
[    0.000000] Linux version 7.0.0-ti (oe-user@oe-host) (aarch64-poky-linux-gcc (GCC) 13.4.0, GNU ld (GNU Binutils) 2.42.0.20240723) #1 SMP PREEMPT_RT Tue Jul 22
  02:41:13 UTC 2025
[    0.000000] KASLR disabled due to lack of seed
[    0.000000] Machine model: BeagleBoard.org PocketBeagle2
...
[    0.000000] Kernel command line: console=ttyS2,9600 initcall_debug=1 loglevel=10 root=PARTUUID=076c4a2a-02 rw rootfstype=ext4 rootwait
...
[    4.572019] init/main.c:console_on_rootfs: +filp_open
...
[    5.806344] init/main.c:console_on_rootfs: -filp_open
...
[    7.286016] VFS: Mounted root (ext4 filesystem) on device 179:2.
[    7.288029] devtmpfs: mounted
[    7.325621] VFS: Pivoted into new rootfs
[    7.497536] Freeing unused kernel memory: 10816K
[    7.509308] Checked W+X mappings: passed, no W+X pages found
[    7.509409] Run /sbin/init as init process

As you can see, our boot time has now reduced from 184.5 seconds to 7.5 seconds, a massive 95% reduction and a huge win for those wishing to reduce boot times. You can also see the reduction in delay for the ‘filp_open’ call.

The User Experience

When the legacy printer thread is used on a PREEMPT_RT kernel, console messages will not slow down the kernel’s progression through boot. So what happens when userspace starts and starts writing to /dev/console, or a getty is started on the console? In this case, the output is intermixed, here are some extracts.

[2026-04-24 16:37:21.964] [    0.000000] Linux version 7.0.0-ti (oe-user@oe-host) (aarch64-poky-linux-gcc (GCC) 13.4.0, GNU ld (GNU Binutils) 2.42.0.20240723) #1 SMP PREEMPT_
 RT Tue Jul 22 02:41:13 UTC 2025
...
[2026-04-24 16:37:27.468] [    0.000000] Root IRQ handler: gic_handle_irq
 INIT: [    0.000000] GICv3: GICv3 features: 16 PPIs
 [2026-04-24 16:37:27.598] [    0.000000] GICv3: GICD_CTLR.DS=0, SCR_EL3.FIQ=1
 [2026-04-24 16:37:27.662] [    0.000000] GICv3: CPU0: found redistributor 0 region 0:0x0000000001880000
 [2026-04-24 16:37:27.727] [    0.000000] ITS [mem 0x01820000-0x0182ffff]
 [2026-04-24 16:37:27.792] [    0.000000] GIC: enabling workaround for ITS: Socionext Synquacer pre-ITS
 [2026-04-24 16:37:27.857] [    0.000000] ITS@0x0000000001820000: Devices Table too large, reduce ids 20->19
 [2026-04-24 16:37:27.921] [    0.000000] ITS@0x0000000001820000: allocated 524288 Devices @80c00000 (flat, esz 8, psz 64K, shr 0)
 [2026-04-24 16:37:28.051] [    0.000000] ITS: using cache flushing for cmd queue
 [2026-04-24 16:37:28.116] [    0.000000] GICv3: using LPI property table @0x0000000080660000
 [2026-04-24 16:37:28.181] version 3.04 booting[    0.000000] GIC: using cache flushing for LPI property table
 [2026-04-24 16:37:28.310] [    0.000000] GICv3: CPU0: using allocated LPI pending table @0x0000000080670000
...
[2026-04-24 16:37:32.712] [    0.435295] /bus@f0000/interrupt-controller@1800000: Fixed dependency cycle(s) with /bus@f0000/interrupt-controller@1800000
 [2026-04-24 16:37:32.842] [    0.619008] HugeTLB: registered 1.00 GiB page size, pre-allocated 0 pages
 [2026-04-24 16:37:32.906] [    0.619029] HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
 [2026-04-24 16:37:32.971] Starting udev[    0.619037] HugeTLB: registered 32.0 MiB page size, pre-allocated 0 pages
 [2026-04-24 16:37:33.101] 
 [2026-04-24 16:37:33.101] [    0.619041] HugeTLB: 0 KiB vmemmap can be freed for a 32.0 MiB page
 [2026-04-24 16:37:33.166] [    0.619046] HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
...
[2026-04-24 16:37:41.783] [    3.638479] ti-udma 485c0000.dma-controller: Channels: 35 (tchan: 20, rchan: 15)
 [2026-04-24 16:37:41.913] [    3.701867] 2800000.serial: ttyS7 at MMIO 0x2800000 (irq = 289, base_baud = 3000000) is a 8250
 INIT: [    3.721904] 2860000.serial: ttyS2 at MMIO 0x2860000 (irq = 290, base_baud = 3000000) is a 8250
 [2026-04-24 16:37:42.108] [    3.722844] printk: legacy console [ttyS2] enabled
 [2026-04-24 16:37:42.173] [    4.075971] xhci-hcd xhci-hcd.3.auto: xHCI Host Controller
 [2026-04-24 16:37:42.238] Entering runlevel: 5[    4.076617] xhci-hcd xhci-hcd.3.auto: new USB bus registered, assigned bus number 1
 [2026-04-24 16:37:42.367]                                                    
 [2026-04-24 16:37:42.367] [    4.077865] xhci-hcd xhci-hcd.3.auto: USB3 root hub has no ports
...
[2026-04-24 16:37:43.989] [    5.995514] init/main.c:console_on_rootfs: -filp_open
 [2026-04-24 16:37:44.053] [    6.328078] EXT4-fs (mmcblk1p2): recovery complete
 [2026-04-24 16:37:44.119] 
 [2026-04-24 16:37:44.119] [    6.332884] EXT4-fs (mmcblk1p2): mounted filesystem e5b814d1-5a20-4fc1-a1bc-4f7bf5194643 r/w with ordered data mode. Quota mode: none.
 [2026-04-24 16:37:44.249] Poky (Yocto Project Reference Distro) 5.0.12 pocketbeagle2 /dev/[    6.333663] VFS: Mounted root (ext4 filesystem) on device 179:2.
 [2026-04-24 16:37:44.443] ttyS2
 [2026-04-24 16:37:44.443] [    6.335667] devtmpfs: mounted
 [2026-04-24 16:37:44.443] 
 pocketbeagle2 login: [    6.379415] VFS: Pivoted into new rootfs
 [2026-04-24 16:37:44.508] [    6.473992] Freeing unused kernel memory: 10816K

It’s worth pointing out that the printk timestamps represent the time the console message was created, not delivered via the serial port. We can enable local timestamps from our terminal emulator (i.e. Minicom), as we’ve done above, which show when the messages were received. For this run, we removed the ‘initcall_debug’ and ‘loglevel’ arguments from the kernel command line, thus reducing the amount of data printed.

In the above output you’ll see that the kernel booted in approximately 6.5 seconds (as indicated by the ‘Freeing unused kernel memory’ message which is printed towards the end of boot), with userspace starting shortly after. You can also see that the kernel starts its boot with a Minicom timestamp of approximately 16:37:21. As you may expect, if it takes the kernel just 6.5 seconds to boot, then its not a surprise to see the first userspace output appear around 16:37:27 (as indicated by the sysvinit ‘INIT:’ message). But what is interesting to see is that the ‘Freeing unused kernel memory’ message, despite occurring after only 6.5 seconds is timestamped locally at 16:37:44 – that’s 23 seconds after start. This is interesting to note, as it shows that despite the kernel starting quick and launching userspace applications, the serial console takes much much longer to catch up.

How to Improve Boot Time

As you have now seen, it’s possible to reduce the boot time delays associated with a slow console, without having to suppress or reduce the console output. However, for this to work you’ll need to ensure the following requirements are met:

  • Linux kernel v6.19 or greater, (it should include a commit named “printk: Use console_flush_one_record for legacy printer kthread”)
  • CONFIG_PREEMPT_RT enabled
  • A legacy serial console, this will be dependant on your hardware, however if you have one you’ll see the word ‘legacy’ in the kernel output that shows when the console is enabled, as follows.
[    3.722844] printk: legacy console [ttyS2] enabled

A Side Note on Nbcon

As part of the effort to improve real time performance in the kernel, the console/printk subsystem was overhauled and a new framework known as Non-Blocking Consoles (nbcon) was introduced. This eliminates the use of the Big Kernel Lock (bad for real time), and of particular interest to us, introduced a dedicated printk kthread to handle the task writing of buffered nbcon printk messages. However, the use of nbcon requires console drivers (the drivers that know how to write to a given physical device) to support it. Effort is required to migrate the legacy drivers, and as such the kernel now has ‘legacy consoles’ and regular consoles. It’s also worth pointing out that unlike legacy consoles, nbcon doesn’t require PREEMPT_RT to be enabled as the feature of printing in the background is fundamental to what nbcon aims.

You may wonder, if nbcon uses a kernel thread to print messages asynchronously, then will that also provide minimised boot times? We previously investigated this and initially found some issues relating to using nbcon with a boot console, you can read more in our previous blog post:

Since then we worked with the upstream community to raise this issue and provide a proof of concept solution. The original authors improved upon this and fixed this issue in the 6.17 kernel. Thus eliminating delays for initialisation code that uses printk, however much like legacy consoles, we found that, at least at the time of writing, the overall boot time doesn’t improve as much as we’d expect (and when compared to legacy consoles), thus some further work may be required here.

It Doesn’t Stop Here

If you’re interested in reducing Linux boot time, you may be interested to join the Boot Time Special Interest Group, you can find more information on the elinux.org wiki page. You can also read the Phoronix write up on this work here.

And as always, we’re here to help, so feel free to get in touch.

You may also like...

Popular Posts