RS232

The Impact of Non-Blocking Consoles (nbcon) on Linux Boot Time

During the Linux boot process, the boot console is responsible for relaying boot messages to the user. On embedded systems this console is often attached to a serial port that can only handle a maximum data rate of a few kilobytes per second which degrades overall boot time. In this post we’ll explore how the use of Non-Blocking Consoles (nbcon) can be used to allow for a serial based console without an impact on boot time.

The Status Quo

The impact of booting Linux with a serial console is significant. For example, a Yocto core-image-minimal image on a Raspberry Pi 3 B+ with a serial console at 115200 baud takes approximately 4.7 seconds to boot. Yet at 9600 baud, the same image on the same hardware takes around 20.7 seconds to reach init. The impact of using a serial console is so great because initcalls during boot are generally not parallelised, so any delays from using printk (which blocks waiting for characters to be emitted) directly impact the overall boot time.

Given this, if we wish to reduce boot times, one solution is to simply suppress printing kernel messages to the console. For example by adding loglevel=1 to the kernel command line, we can further reduce the boot time to around 3.2 seconds. However, this is not ideal – being able to see boot messages is extremely useful not only to verify that everything is working as intended, but also to diagnose issues when things stop working. Ideally we would like to have an minimal boot time without the compromise of limiting console output.

Introducing Nbcon

The release of Linux 6.12 marked a significant milestone in the development of the kernel. After two decades of development, the out-of-tree PREEMPT_RT patches were finally merged allowing for a fully preemptible kernel (at least for some architectures). One of the last major hurdles that had prevented real-time support from being mainlined related to the implementation of printk and the console subsystem on which printk operates. Over the last couple of years the console subsystem has undergone a complete overhaul, adding an entirely new interface which supports threaded and atomic write operations known as nbcon. Additionally, the “Big Kernel Lock” (BKL) that protected access to the console subsystem from simultaneous use by concurrent processes, was dropped in favour of a more granular per-console lock mechanism. The introduction of nbcon, is explained by the author in one of the nbcon patch sets notes:

The current console/printk subsystem is protected by a Big Kernel Lock,
(aka console_lock) which has ill defined semantics and is more or less
stateless. This puts severe limitations on the console subsystem and
makes forced takeover and output in emergency and panic situations a
fragile endeavour that is based on try and pray.

The goal of non-BKL (nbcon) consoles is to break out of the console lock
jail and to provide a new infrastructure that avoids the pitfalls and
allows console drivers to be gradually converted over.

The proposed infrastructure aims for the following properties:

  - Per console locking instead of global locking
  - Per console state that allows to make informed decisions
  - Stateful handover and takeover

A later patch introduced a dedicated printk kthread to handle the task writing of buffered printk messages. This is important as it means that callers of printk no longer get blocked while printing to the console as this work is now performed in a separate thread. And thus, a potential consequence of this may be an improvement in boot times! Let’s investigate this.

For our testing, we’ll use a Raspberry Pi 3 B+, and a Yocto core-image-minimal image with everything we need. After updating Yocto to use a 6.12 kernel and with matching kernel headers we were able to boot 6.12 on our Pi:

root@raspberrypi3-64:~# uname -a
 Linux raspberrypi3-64 6.12.8-v8 #1 SMP PREEMPT Mon Jan  6 14:23:38 UTC 2025 aarch64 GNU/Linux

Let’s now explore nbcon and its effect on boot time.

Measuring Printk without Nbcon

Let’s start by creating a simple kernel module to benchmark how long calls to printk take, we’ll use the following function to benchmark printk.

static int __init printk_test_init(void) {
        const int max_iter = 64;

        unsigned long long delta;
        ktime_t start, end;

        start = ktime_get();

        for (int i = 1; i <= max_iter; i++) {
                printk(KERN_INFO "%d/%d################################################################################\n", i, max_iter);
        }

        end = ktime_get();

        delta = ktime_us_delta(end, start);
        printk(KERN_INFO "Execution time: %lld usecs\n", delta);
        return 0;
}

Let’s run this on our Raspberry Pi, please note that the serial line is running at the default baudrate of 115200:

root@raspberrypi3-64:~# stty speed -F /dev/ttyS0
115200
root@raspberrypi3-64:~# modprobe printk-test
[   46.339181] printk_test: loading out-of-tree module taints kernel.
[   46.351831] 1/64################################################################################
[   46.360782] 2/64################################################################################
[   46.369744] 3/64################################################################################
...
[   46.920043] 64/64################################################################################
 [   46.929066] Execution time: 577235 usecs

And then after switching the console speed to 9600 baud using stty, we can run the test again:

root@raspberrypi3-64:~# stty speed 9600 -F /dev/ttyS0
root@raspberrypi3-64:~# modprobe printk-test
...
[  841.260147] Execution time: 6880447 usecs

The execution time of our module is 0.577 and 6.88 seconds at 115200 and 9600 baud respectively. Taking roughly 12 times longer to run at 9600 baud; this is to be expected, given that 115200 baud is 12 times faster than 9600 baud.

As we can see, the use of printk without the use of nbcon has a significant impact on execution time due to our chosen baudrate and the synchronous/blocking implementation of printk.

Making use of Nbcon

With the new nbcon interface, a per-console kthread now handles the task of writing data sent to the console out to the physical serial port. Let’s now see what impact nbcon has on calls to the printk function.

With the new nbcon interface already merged into the kernel, all that’s left for the kernel developers to do now is to implement this new interface within the serial port drivers. Thankfully, patches have been submitted for review to the kernel mailing list which bring nbcon support to the 8250 serial port driver that the Raspberry Pi uses.

Looking deeper into the patch notes, a more detailed description of the new nbcon interface is referenced in the v2 series of patches. Here we can see that a flag has been added to /proc/consoles to identify whether or not the console is an nbcon:

Users can verify the UART console is an NBCON console via the
proc filesystem. For example:

$  cat /proc/consoles
ttyS0                -W- (EC N  a)    4:64

The 'N' shows that it is an NBCON console.

There will also be a dedicated printing kthread. For example:

$ ps ax | grep pr/
   16 root       0:00 [pr/ttyS0]

As these patches are still in review and are yet to be merged into the kernel, it’s necessary to download and apply these patches to our kernel. Let’s take a look at /proc/consoles and check for the printk kthread on ttyS0:

root@raspberrypi3-64:~# uname -r
6.12.8-v8
root@raspberrypi3-64:~# cat /proc/consoles 
ttyS0                -W- (EC Np a)    4:64
root@raspberrypi3-64:~# ps ax | grep pr/
    75 ?        S<     0:00 [pr/ttyS0]
   362 ttyS0    S+     0:00 grep pr/

We can see that ttyS0 has ‘N’ set, indicating that our serial port (ttyS0) is indeed operating as an nbcon console, and we can see the associated printk kthread, [pr/ttyS0].

Measuring Printk with Nbcon

Now we’ll re-run our test, as we did before:

root@raspberrypi3-64:~# modprobe printk-test
...
[   47.578273] 1/64################################################################################
...
[   47.578689] 64/64################################################################################
[   47.578696] Execution time: 423 usecs

And again at a lower 9600 baud:

root@raspberrypi3-64:~# stty speed 9600 -F /dev/ttyS0
root@raspberrypi3-64:~# modprobe printk-test
 [  757.274444] 1/64################################################################################
...
[  757.274869] 64/64################################################################################
 [  757.274876] Execution time: 431 usecs

So now our test executes in a matter of microseconds, much much faster than it did when using a legacy console. So, what’s actually happening here? When we call printk with an nbcon enabled console, it now behaves asynchronously; all of the printk messages are buffered in the kernel’s printk buffer, and the task of writing these messages out to the serial port has been delegated to the printk kthread. Calling the printk function no longer introduces a significant overhead on the execution time of our module. Furthermore, we see that our choice of baudrate no longer impacts the execution time of printk. Though of course, the serial port baud rate is unchanged so it still takes the same amount of time for the characters to be printed.

The Effect of Nbcon on Boot Times

We’ve verified that printk for nbcon enabled consoles behave asynchronously and thus allows code that calls printk() to execute much faster. Given the volume of console messages produced during boot, we’d now like to see what effect this has on boot times.

Let’s measure the boot time with and without the nbcon driver patches at various baudrates. We’ll use the timestamps provided by printk (enabled using the CONFIG_PRINTK_TIME kernel configuration option) to measure the time taken to boot. We’ll use the “Run /sbin/init as init process” message as indication that the kernel has finished booting.

Baudrate (bps)Legacy console (s)nbcon console (s)Difference (s)
240082.25482.509+0.255
960019.10618.909-0.197
1152004.5064.461-0.045
Time taken to reach “clk: Not disabling unused clocks” both with and without nbcon at various baudrates, as reported by the printk timestamp.

As our results show, when using an nbcon enabled console there doesn’t appear to be any significant change in overall boot time. That’s unexpected and a little disappointing. Let’s take a closer look at the timings of the messages in the boot logs when the serial port is initialised, specifically the amount of time that passes between the line that relates to the console being enabled and the next line in the log:

[ 3.034870] printk: legacy console [ttyS0] enabled
[ 4.306170] Indeed it is in host mode hprt0 = 

Let’s measure the difference between these two lines (which is an approximation of the time taken to initialise the console) at various baud rates, with and without nbcon. It’s worth pointing out that when nbcon consoles are used the ‘legacy’ word is dropped from the above output.

BaudrateLegacy console (s)nbcon console (s)Difference (s)
240075.91776.848-0.931
960015.19715.307-0.110
1152001.2711.278-0.007

As the table above shows, immediately after console initialisation there is a delay in printk output, and this delay grows higher at lower baudrates. It’s also worth noting that this delay is unchanged between nbcon and legacy consoles. This suggests that after the serial console has been initialised it’s probably writing out it’s printk buffer to the serial port in a blocking manner.

Let’s further explore this by adding the “initcall_debug” option to the kernel command line. This will enable kernel module initcall tracing, printing the name of the function and its return code along with the execution time.

[    1.978032] probe of 3f215040.serial returned -517 after 8 usecs
...
[    3.065188] calling  deferred_probe_initcall+0x0/0xc8 @ 1
...
[    3.071900] 3f215040.serial: ttyS0 at MMIO 0x3f215040 (irq = 71, base_baud = 50000000) is a 16550
[    3.071944] printk: console [ttyS0] enabled
[    3.072055] probe of cam1_regulator returned 0 after 1515 usecs
[    3.503511] Indeed it is in host mode hprt0 = 00021501
[  123.043672] usb 1-1: new high-speed USB device number 2 using dwc_otg
[  123.177479] Indeed it is in host mode hprt0 = 00001101
[  123.318273] probe of 3f215040.serial returned 0 after 120247073 usecs
...
[  123.709772] initcall deferred_probe_initcall+0x0/0xc8 returned 0 after 120644576 usecs

With “initcall_debug” turned on (which results in a lot of additional output) and the serial port configured at 9600 baud, we can see that there is an initial attempt to initialise the serial port that happens around at 1.98 seconds, but this fails with -517 (EPROBE_DEFER), indicating that the serial port driver wishes to defer the initialisation to a later stage (due to dependent devices not yet being ready).

Some time later, we reach a call to deferred_probe_initcall where all of the deferred initcalls are retried. At this point, we see that the call to “probe of 3f215040.serial” takes around 120 seconds! To understand why this takes so long, we’ll need to look at the kernel source code, namely the register_console function, which is called by the serial port driver after device initialisation. This function makes a call to get_init_console_seq, which then calls console_flush_all.

Let’s take a look at the console_flush_all function:

static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handover)
{
	struct console_flush_type ft;
	bool any_usable = false;
	struct console *con;
	bool any_progress;
	int cookie;

	*next_seq = 0;
	*handover = false;

	do {
		any_progress = false;

		printk_get_console_flush_type(&ft);

		cookie = console_srcu_read_lock();
		for_each_console_srcu(con) {
			short flags = console_srcu_read_flags(con);
			u64 printk_seq;
			bool progress;

			/*
			 * console_flush_all() is only responsible for nbcon
			 * consoles when the nbcon consoles cannot print via
			 * their atomic or threaded flushing.
			 */
			if ((flags & CON_NBCON) && (ft.nbcon_atomic || ft.nbcon_offload))
				continue;

			if (!console_is_usable(con, flags, !do_cond_resched))
				continue;
			any_usable = true;

			if (flags & CON_NBCON) {
				progress = nbcon_legacy_emit_next_record(con, handover, cookie,
									 !do_cond_resched);
				printk_seq = nbcon_seq_read(con);
			} else {
				progress = console_emit_next_record(con, handover, cookie);
				printk_seq = con->seq;
			}

			/*
			 * If a handover has occurred, the SRCU read lock
			 * is already released.
			 */
			if (*handover)
				return false;

			/* Track the next of the highest seq flushed. */
			if (printk_seq > *next_seq)
				*next_seq = printk_seq;

			if (!progress)
				continue;
			any_progress = true;

			/* Allow panic_cpu to take over the consoles safely. */
			if (other_cpu_in_panic())
				goto abandon;

			if (do_cond_resched)
				cond_resched();
		}
		console_srcu_read_unlock(cookie);
	} while (any_progress);

	return any_usable;

abandon:
	console_srcu_read_unlock(cookie);
	return false;
}

So here we can see that console_flush_all is responsible for emitting all of the buffered printk messages on all registered consoles. Each message is emitted using either nbcon_legacy_emit_next_record or console_emit_next_record , depending on if the console is an nbcon or legacy console (flags & CON_NBCON). Unfortunately this process happens synchronously and, as expected, is the cause of the delay during boot that we see during console registration.

We can further confirm this is the case by calculating how long we’d expect printing to take to print all prior messages. For example, at 9600 baud, there were 14518 characters before “printk: console [ttyS0] enabled”, and data is sent at 960 bytes/second (9600 baud (bits/sec) = 9600/(8 data bits + 2 stop bits) = 960 bytes/sec). To transmit the 14518 characters, we expect this to take 15.122 seconds (14518/960). This aligns with what we see in the boot logs and tables above.

Now let’s compare the timings as reported by initcall_debug after console registration when using legacy vs nbcon console at 9600 baud:

initcallLegacy (s)nbcon (s)Difference (s)
initcall of_cfs_init+0x0/0xe00.0006820.000103-0.000579
initcall clk_disable_unused+0x0/0x158 0.0000050.000002-0.000003
initcall genpd_power_off_unused+0x0/0xc8 0.0000540.000038-0.000016
Comparison of initcall timings after console registration when using legacy vs nbcon console

Due to the fact that the 8250 serial driver is deferred and registration of the console happens quite late in the boot process, there’s only a few modules remaining that are still to be initialised that also make calls to printk. But we can see that there are consistent gains being made here emphasising that nbcon can improve the speed of boot. However, since these late initcalls only produce a small amount of output on the console, the speed up seen is marginal.

Improving Boot Time with Nbcon

We’ve seen that serial ports are slow, and we’ve seen that this can result in larger boot times due to the blocking nature of printk calls with legacy consoles. This is often mitigated by reducing the log level of the kernel, but it is a compromise. Fortunately, with the introduction of nbcon consoles, printk’s return immediately, leaving the slow emission of characters to a kthread. Unfortunately, the current implementation of nbcon performs an initial flush of the printk buffer upon initialisation in a synchronous manner. Thus where the console is initialised late in the boot process (as it appears to be on our Raspberry Pi), then much of the potential gain of using nbcon for reduced boot times is lost.

It’s possible that with further development of nbcon, the initial console_flush_all could be performed asynchronously thus resulting in no variation in boot time due to the level of console output or the baud rate.

Cover Image Credit: Seth Morabito

You may also like...

Popular Posts