Edge AI: 2 Second Linux Boot on i.MX8

The Demo

This demonstration shows that, with our optimisation experience and expertise, it’s possible to cold boot an i.MX8 running Linux to a useful state of functionality in only 2 seconds.

Our edge AI use-case is a ‘Tux Mascot Detector’. The i.MX8 based hardware captures video from the camera and performs image classification to determine if the object in view is the Linux Tux Mascot or something else. The OLED display indicates what has been detected, along with a confidence rating. The traffic lights will signal green if Tux has been detected, red if something else has been detected, or amber if there is insufficient confidence.
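
The traffic light rule described above can be sketched as a small C function. This is an illustration rather than the demo's actual code: the names choose_light and LIGHT_*, and the 0.6 confidence threshold, are assumptions, since the threshold used in the demo isn't stated.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names and threshold, for illustration only. */
enum light { LIGHT_RED, LIGHT_AMBER, LIGHT_GREEN };

#define MIN_CONFIDENCE 0.6f  /* assumed; the demo's real threshold is not stated */

/* Map a classification result onto one of the three traffic lights. */
enum light choose_light(bool is_tux, float confidence)
{
    assert(confidence >= 0.0f && confidence <= 1.0f);

    if (confidence < MIN_CONFIDENCE)
        return LIGHT_AMBER;              /* not confident enough either way */
    return is_tux ? LIGHT_GREEN : LIGHT_RED;
}
```

Checking the confidence first means a low-confidence "Tux" and a low-confidence "not Tux" both end up as amber.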

When the reset button is pushed the board resets and a cold boot occurs. The OLED display and traffic lights illuminate as soon as the first inference has been performed.

The Boot Time

The optimised boot time, as measured from the bootloader’s first line of output to the first image inference result, is approximately 1.9 seconds – with the userspace application outputting console messages after just 1.6 seconds.

Prior to our software optimisations the boot time was 19.6 seconds, so the optimised figure represents a reduction of roughly 90%.
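
As a quick check of that percentage, the arithmetic can be written as a one-line helper. The function name reduction_percent is ours, purely for illustration, and the inputs are the two measurements quoted above (19.6 seconds before, approximately 1.9 seconds after).

```c
#include <assert.h>

/* Percentage reduction when going from `before` seconds to `after` seconds. */
double reduction_percent(double before, double after)
{
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}
```

Calling reduction_percent(19.6, 1.9) evaluates (19.6 - 1.9) / 19.6 * 100, which is a little over 90.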

The Hardware

We used the following hardware:

Coral Dev Board – based on an NXP i.MX 8M SoC (quad Cortex-A53, Cortex-M4F) with Coral Edge TPU accelerator (4 TOPS @ int8) and 1 GB LPDDR4 RAM. Booted from eMMC (for optimised boot time) or micro SD (for original boot time). Powered by USB-C.
Coral Environmental Sensor Board – providing a 128×32 OLED display (via SPI).
Coral Camera – a 5MP camera using Omnivision’s OV5645 sensor with built-in ISP. Connected to the Coral Dev Board via MIPI CSI.
Pi Stop Traffic Lights – 3 LEDs connected to the Coral Dev Board via GPIO.

The Software

We used the following software:

Yocto distribution (based on meta-coral / warrior)
U-Boot and U-Boot SPL 2017.03 (coral u-boot-imx / release-chef)
Arm Trusted Firmware 1.5 (IMX ATF / imx_4.9.123_imx8mm_ga)
Linux 4.9.51 (coral linux-imx / release-chef)
Custom image classification application written in C using TensorFlow Lite 2.6.0 and OpenCV 3.4.5
MobileNet v2 convolutional neural network

The Boot Time Optimisation Process

There are many approaches to reducing boot time – the most common are hibernation, suspend/resume and checkpoint/restore. These all work on the assumption that restoring the previously saved state of a component takes less time than initialising it from scratch. For example, restoring the contents of RAM may be quicker than executing an init daemon, startup scripts and applications.

The approach we’ve taken is cold-boot optimisation. We used our experience and expertise to carefully examine the software flow in a cold boot, identify inefficiencies and optimise them out. We’ve amassed a wide range of tools and knowledge to help with this, including our own software that quickly flags inefficiencies we’ve seen before.

Typically we’ll remove software features that are not required (thus specialising the software to a single purpose) and optimise the software features that are required. If there is functionality that is required, but not immediately on boot, then we may also reorder the initialisation of software.

Here are some of the optimisations that we performed for the demo:

Removed unused features – specifically features that contribute to initialisation delays or increase the size of the software (which results in increased delays whilst loading from storage).
Allowed drivers with long initialisation times (e.g. hardware delays) to initialise in parallel, rather than one at a time as the kernel normally does during boot.
Replaced the systemd init process with our application and modified it to set up its own dependencies (e.g. mounting the sysfs filesystem so it can interact with /sys/class/gpio).
Removed inefficiencies in device drivers (e.g. the camera driver spends time writing to registers via I2C, however some of these writes were unnecessary as they match the power-on defaults of the sensor).
Suppressed output over the serial port across all boot components (as this is very slow).
Optimised the application by using V4L instead of GStreamer as OpenCV’s backend. We also used a thread to capture frames so that we can perform initialisation of the application whilst waiting for the first frame from the camera.
Used a read-only filesystem to reduce mount time and booted from eMMC (faster than SD).
Ensured that the kernel boots with an optimal CPU frequency.
Modified the application and TensorFlow model to make use of the Edge TPU hardware acceleration. This reduced the inference time and thus the latency of the first result.
Removed U-Boot as this provided little value.
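
To give a flavour of what running the application in place of an init system involves, here is a minimal sketch in C of a program mounting sysfs itself and then driving an LED through /sys/class/gpio. It is not the demo's code: the helper names (gpio_path, setup_sysfs, led_set) and any GPIO number passed in are hypothetical, and error handling is kept to a minimum.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Build "/sys/class/gpio/gpio<N>/<attr>" into buf. Returns 0 on success. */
int gpio_path(char *buf, size_t len, int gpio, const char *attr)
{
    assert(buf != NULL && attr != NULL);
    int n = snprintf(buf, len, "/sys/class/gpio/gpio%d/%s", gpio, attr);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Write a short string to a sysfs file. Returns 0 on success. */
static int write_str(const char *path, const char *s)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fwrite(s, 1, strlen(s), f) == strlen(s);
    return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* With no init system running, nothing has mounted /sys for us. */
int setup_sysfs(void)
{
    return mount("sysfs", "/sys", "sysfs", 0, NULL);
}

/* Export a GPIO, make it an output and set its level (0 or 1). */
int led_set(int gpio, int on)
{
    char path[64], num[16];

    snprintf(num, sizeof num, "%d", gpio);
    /* Result ignored: the write fails harmlessly if already exported. */
    write_str("/sys/class/gpio/export", num);

    if (gpio_path(path, sizeof path, gpio, "direction") || write_str(path, "out"))
        return -1;
    if (gpio_path(path, sizeof path, gpio, "value"))
        return -1;
    return write_str(path, on ? "1" : "0");
}
```

An application started as PID 1 would call setup_sysfs() once, early on, and could then call something like led_set(73, 1), where 73 is a made-up GPIO number rather than one taken from the demo hardware.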

The entire boot time optimisation process took approximately 3 weeks of effort.

Can it Boot Quicker?

Absolutely! We identified additional inefficiencies, however since there are diminishing returns on effort as the boot time decreases, we felt we had reached a good point to stop.

It’s also worth pointing out that optimising software is only one part of reducing boot time. Many gains can also be made by making ‘boot time conscious’ decisions whilst designing hardware.

How Representative is this?

Achieving a minimal boot time can often involve trade-offs, especially when targeting very small boot times. For example, it may require that user-space applications are redesigned, rewritten or made to rely on lighter-weight library alternatives. It may require changes to the development process that inconvenience developers, or a functional requirement of the product may have to be solved in a completely different way. In other words, unless boot time is of paramount importance above all other product features, product owners faced with these trade-offs will likely prefer a reduced boot time rather than a minimal one. It may be possible to have both a minimal boot time and no trade-offs, but a very large amount of effort would likely be needed.

Whilst our demo performs its intended purpose, it doesn’t have features such as firmware update and network connectivity. However, it’s unlikely that these features would be required immediately after the device has booted, so it would be reasonably straightforward to add them and to ensure their initialisation occurs after the first image inference.
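
One way to picture that ordering is a table of initialisation steps, each flagged as either needed for the first result or deferrable. The sketch below is an assumption about how this could be structured, not a description of the demo: every name in it is a placeholder, and the step functions only record the order in which they ran.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical placeholders, not the demo's code. */
enum step_id { STEP_CAMERA, STEP_INFERENCE, STEP_NETWORK, STEP_UPDATER };

#define MAX_LOG 8
static enum step_id boot_log[MAX_LOG];  /* records the order steps ran in */
static size_t boot_log_len;

static void log_step(enum step_id id)
{
    assert(boot_log_len < MAX_LOG);
    boot_log[boot_log_len++] = id;
}

/* Stand-ins for real initialisation work. */
static void init_camera(void)     { log_step(STEP_CAMERA); }
static void init_network(void)    { log_step(STEP_NETWORK); }
static void init_updater(void)    { log_step(STEP_UPDATER); }
static void first_inference(void) { log_step(STEP_INFERENCE); }

struct init_step {
    void (*fn)(void);
    bool deferred;   /* true: not needed for the first result */
};

static const struct init_step steps[] = {
    { init_network, true  },
    { init_camera,  false },
    { init_updater, true  },
};

/* Run every step whose `deferred` flag matches the requested phase. */
static void run_steps(bool deferred)
{
    for (size_t i = 0; i < sizeof steps / sizeof steps[0]; i++)
        if (steps[i].deferred == deferred)
            steps[i].fn();
}

void boot(void)
{
    run_steps(false);   /* only what the first result depends on */
    first_inference();
    run_steps(true);    /* everything else, once the result is out */
}
```

Calling boot() runs the camera step, then the first inference, then the network and updater steps, regardless of the order in which the table lists them.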

How we Can Help You

Our experts can perform the boot time optimisation process on your product. Reach out to us at sales@thegoodpenguin.co.uk to arrange a call and discuss how best we can assist you.

Read more about how we support Coral customers on our Coral and The Good Penguin page.

Alternatively read more about the boot time reduction services we offer or visit our blog to learn more on boot time optimisation.