Automated tests that make use of interactive consoles are commonplace. Whether it’s a Linux console over a serial port or a proprietary CLI over a network socket, they require little in terms of hardware yet give access to lots of functionality. However, writing a reliable automated test can be more challenging and time-consuming than you may expect. In this post we’re going to share 5 common gotchas.
1. Go Slow, like a Human
A console or CLI is designed to be operated by a human, thus as soon as you replace the human with a machine and drive it out of spec you risk falling victim to issues and bugs. The most common occurrence of this is when an automated test transmits characters at super-human speeds. If you transmit characters too quickly, especially on serial interfaces without flow-control, not all of those characters will arrive. Buffers from the hardware UART right through to the application may overflow.
It’s not only buffer overflows you need to worry about, though. It’s not uncommon for bootloaders to contain hidden timing-related bugs, such that some functionality like networking doesn’t work properly if you access it too soon after reset. Whilst these may be valid bugs, they may not be the bugs you want your tests to find.
The best mitigation is to add inter-character delays, thus emulating human typing. The original Expect application supports this via the -h flag of its send command, which not only adds an inter-character delay but also varies that delay.
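Where the Expect library in use has no equivalent of that flag, the same effect can be approximated by hand. The sketch below is a minimal illustration rather than part of any library: the name `send_like_human`, its delay bounds and the injectable `sleep` parameter are all assumptions made for this example. It writes one character at a time and pauses for a random interval after each.

```python
import random
import time


def send_like_human(send, text, min_delay=0.02, max_delay=0.12, sleep=time.sleep):
    """Send `text` one character at a time, pausing a random interval after each.

    `send` is any callable that writes a string to the console, for example
    the `send` method of a pexpect spawn object. The delay bounds (in seconds)
    are illustrative guesses, not measured values.
    """
    for char in text:
        send(char)
        sleep(random.uniform(min_delay, max_delay))
```

With a pexpect session named `ss`, this might be called as `send_like_human(ss.send, "version\n")`. Suitable delay values depend on the device and are best found by experiment.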
2. Expect the Future
In many cases a variant of the Expect library is used to assist in the nitty-gritty of obtaining the input/output stream from the device, performing pattern matching and sending commands. These libraries are available in many languages including Tcl, Python and Go. However, it’s easy to make incorrect assumptions about how they work, and those assumptions lead to unexpected or unreliable results. Let’s consider automating the following U-Boot interaction:
=> version
U-Boot 2008.10-00273-g040dcfce8b24 (Sep 15 2020 - 10:23:58)
=> sleep 3
=> echo DONE
DONE
=>
Here we’re printing the U-Boot version, sleeping for 3 seconds and then echoing ‘DONE’ to the console. We can automate this by sending the characters “version” and then waiting to see “U-Boot 2008” before continuing. We could continue in this fashion until we end up with an Expect script such as the following:
#! /usr/bin/env python
import pexpect
ss = pexpect.spawn('picocom /dev/ttyUSB0 -b 115200')
ss.send(chr(3))
ss.expect("=>")
ss.send("version\n")
ss.expect("U-Boot 2008")
ss.send("sleep 3\n")
ss.expect("=>")
ss.send("echo DONE\n") # we unexpectedly send this whilst sleep is still running (!)
ss.expect("DONE") # this will timeout
Unfortunately if we ran this it would time out waiting for ‘DONE’ to be printed to the console. We actually sent the ‘echo DONE’ command whilst the ‘sleep 3’ command was still being executed – and as a result U-Boot ignored our input.
After we sent the ‘version’ command we correctly waited for the ‘U-Boot 2008’ string – at this point Expect makes a note of where it is in the captured buffer. We then send the ‘sleep 3’ command and wait for the ‘=>’ prompt – at this point Expect searches from where it left off in the buffer for the prompt. Thus we instantly match the prompt that was sent after the completion of the version command and before the sleep command – we matched the wrong prompt. This is because Expect doesn’t clear the captured buffer each time you send characters, it carries on from where it left off.
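The buffer behaviour described above can be reproduced without any hardware. The following is a toy model written for this post, not how pexpect or any other Expect library is actually implemented: `CaptureBuffer` and its methods are hypothetical names, and matching is plain substring search rather than regular expressions.

```python
class CaptureBuffer:
    """Toy stand-in for the buffer an Expect library keeps per session."""

    def __init__(self):
        self.pending = ""  # received from the device but not yet matched

    def feed(self, text):
        """Record output that has arrived from the device."""
        self.pending += text

    def expect(self, pattern):
        """Match `pattern` in pending data and discard everything up to its end."""
        index = self.pending.find(pattern)
        if index < 0:
            raise TimeoutError("pattern not seen: %r" % pattern)
        self.pending = self.pending[index + len(pattern):]


buffer = CaptureBuffer()

# Output that arrived in response to 'version', including the prompt after it.
buffer.feed("version\r\nU-Boot 2008.10-00273-g040dcfce8b24 (Sep 15 2020 - 10:23:58)\r\n=> ")
buffer.expect("U-Boot 2008")  # consumes up to the banner; the prompt is left over

# 'sleep 3' would be sent here, but the device has produced no new output yet.
buffer.expect("=>")  # succeeds at once, against the stale prompt
```

The second `expect` call returns immediately even though nothing has arrived since the banner, which is exactly how the script runs ahead of the device.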
Of course we could have avoided this by always expecting a command prompt before issuing a command, effectively using prompts as ‘fence posts’. Issues such as this can lead to unexpected results and unreliable tests, so it’s well worth understanding how Expect works so that you’re always expecting data from the future and not the past.
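One way to apply the fence-post idea is to wrap every command in a helper that always finishes by consuming the next prompt. This is a sketch under assumptions: `run_command` and `demo` are hypothetical names, the console object only needs pexpect-style `send`, `expect_exact` and `before` members, and the device path is the one from the script above.

```python
def run_command(console, command, prompt="=>"):
    """Send one command, then block until the *next* prompt appears.

    Assumes the prompt that preceded this command was already consumed by an
    earlier expect call, so the match below can only be satisfied by output
    that arrives after the command has finished.
    """
    console.send(command + "\n")
    console.expect_exact(prompt)
    # pexpect stores everything received before the match in `before`.
    return console.before


def demo(device="/dev/ttyUSB0"):
    """Fence-posted version of the script above (needs pexpect and hardware)."""
    import pexpect  # third-party; imported here so run_command has no dependencies

    console = pexpect.spawn("picocom %s -b 115200" % device, encoding="utf-8")
    console.send(chr(3))         # Ctrl-C to get a fresh prompt
    console.expect_exact("=>")   # consume that prompt before sending anything
    assert "U-Boot 2008" in run_command(console, "version")
    run_command(console, "sleep 3")  # returns only once the sleep has finished
    output = run_command(console, "echo DONE")
    # Look for DONE on a line of its own, not just in the echoed command.
    assert "DONE" in [line.strip() for line in output.splitlines()]
```

Because each call ends on a prompt, the next call starts with no stale prompt left in the buffer.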
3. Tolerate Failure, Sometimes
When operating a CLI we instinctively overcome and tolerate minor issues. We’re so good at it that we don’t even realise we are doing it. Let’s look at an example: the following is output from the end of a Linux kernel boot:
[ 8.443905] urandom_read: 1 callbacks suppressed
[ 8.443918] random: dd: uninitialized urandom read (512 bytes read)
INIT: Entering runlevel: 5
Configuring network interfaces... [ 8.662784] fsl-gianfar e0024000.ethernet eth0: Link is Up - 100Mbps/Full - flow control off
[ 9.163080] fsl-gianfar e0025000.ethernet eth1: Link is Up - 100Mbps/Full - flow control off
done.
Starting syslogd/klogd: done
Poky (Yocto Project Reference Distro) 2.7.3 /dev/ttyS0
[ 168.009659] random: crng login: init done
A human watching the console may be waiting for a login prompt, but unfortunately the serial output from the login process that prints the ‘login: ’ message has been interleaved with kernel debug output (‘[ 168.009659] random: crng init done’). Instinctively a human may notice the kernel output has stopped and hit the enter key a couple of times to get a new prompt. An Expect script, however, may be specifically looking for a line that starts with the text ‘login: ’ and as a result times out. On Linux systems it’s a common occurrence for kernel output to get mixed up with user-space output on the serial port. In this case a mitigation may be to allow the test to retry (by sending an enter keystroke) upon a timeout.
If a human encounters an error, an initial reaction may be to simply repeat the command, and in some cases this results in forward progress. An example is instructing a boot loader to download a file from the network: the network stack or network may be unreliable, so trying again may overcome the initial error. Tests can be more resilient if they include handling for such issues.
Making tests more tolerant of failure will make them more reliable. However, great care should be taken to determine which failures can be worked around and which should be reported as test failures.
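The retry-on-timeout mitigation might be sketched as follows. The helper name `expect_with_nudge` and its defaults are assumptions for illustration. The timeout exception class is passed in so the helper does not depend on a particular library; with pexpect it would be `pexpect.TIMEOUT`.

```python
def expect_with_nudge(console, pattern, timeout_error, retries=2, timeout=10):
    """Wait for `pattern`; on timeout press Enter and wait again.

    Returns the number of Enter presses that were needed. After `retries`
    nudges the final timeout is re-raised so the test still fails.
    """
    for nudges in range(retries + 1):
        try:
            console.expect(pattern, timeout=timeout)
            return nudges
        except timeout_error:
            if nudges == retries:
                raise
            console.send("\n")  # ask the device to reprint its prompt
```

Returning the nudge count lets a test log how often the workaround was needed, which helps to spot a device that is getting worse over time.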
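One way to keep that distinction explicit is to retry only on errors named in an allow-list and to fail immediately on anything else. The sketch below assumes a hypothetical `run` callable that sends a command and returns its output as text. The marker strings are supplied by the caller, and which strings a given boot loader really prints has to be checked on the device.

```python
def run_with_retry(run, command, success_marker, transient_markers, attempts=3):
    """Run `command` until it succeeds, retrying only on known transient errors.

    `run` is any callable that sends a command and returns its output as text.
    """
    for _ in range(attempts):
        output = run(command)
        if success_marker in output:
            return output
        if not any(marker in output for marker in transient_markers):
            # Not on the allow-list: report it rather than hide it.
            raise AssertionError("%r failed: %r" % (command, output))
    raise AssertionError("%r still failing after %d attempts" % (command, attempts))
```

An unrecognised error fails on the first attempt, so the retry loop cannot mask a genuine regression.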
4. Avoid Delays
There is rarely a good reason for a test to sleep for a set period of time between sending commands or expecting output. When humans interact with a CLI, the triggers for interactions are based on events, and automated tests should do the same. If a test has ‘sleep’ calls in it, it’s likely to break when the relative timings of the software being tested change.
For example, when powering on the device, rather than sleeping for 30 seconds until the device has booted, the test should instead wait for a login prompt. Not only does this make the test more robust, it will result in the test completing faster as the commands are executed as soon as they can be.
Likewise, when sending commands, rather than adding a sleep between commands to allow enough time for their execution, the test should instead wait to receive a command prompt before issuing the next command.
A sleep in a test should be seen as a symptom of a missing expect statement.
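The boot example could be written in an event-driven way roughly as below. This is a sketch: `wait_for_boot` is a hypothetical helper, `power_on` stands for whatever switches the device on in a given test rig, and the 120 second default is an arbitrary upper bound.

```python
import time


def wait_for_boot(console, power_on, ready_pattern, timeout=120):
    """Power the device on and return the seconds taken to reach `ready_pattern`."""
    started = time.monotonic()
    power_on()
    # No fixed sleep: expect() returns as soon as the pattern arrives, and
    # `timeout` is only the upper bound before the test fails.
    console.expect(ready_pattern, timeout=timeout)
    return time.monotonic() - started
```

The timeout can be generous because it costs nothing on a healthy device. It is only reached when the boot has actually failed.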
5. A Test is for Life
Many of the gotchas in this post have focused on making automated tests more reliable – however it’s also important that tests stand the test of time. Changes to the software on the device-under-test shouldn’t translate to a large increase in technical debt related to maintaining the tests.
One way to improve the life expectancy of a test is to reduce the coupling between the test code and the software feature being tested. For example, if a test has been written to measure boot time (as measured to a boot prompt), then a good test would be to power on the device and wait until a prompt is seen, where a prompt is represented by a broad-matching regular expression that would likely match the vast majority of prompts on embedded devices. A poorly written test would likely power on the board, use expect statements to follow through the various boot loaders and kernel boot, and then try to match a specific boot prompt. As soon as someone changes the boot prompt, the test will need to be updated.
Tests are more likely to stand the test of time where they test typical use-cases and user interactions at a high level, as such use-cases will likely remain similar regardless of the wide range of technical implementations that underpin them. A test that physically pushes a button (perhaps via some USB relay to close button contacts) and visually checks that an LED has lit (perhaps via some light sensor) is much simpler and requires no upkeep as the underlying device firmware changes. It also represents a real-world use-case. The equivalent test with a CLI, by contrast, may involve logging into a device and accessing some proprietary interface that can be used to trigger a button press, and then using the same interface to verify that the firmware translated the button press into an output enable. The latter, being so highly coupled to the implementation, will need to be maintained as the device firmware evolves over time. Thus some balance needs to be struck between relying on software and relying on external hardware.
(This post, written by Andrew Murray, was originally published on the CanaryQA website).
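A broad-matching prompt expression might look something like the following. The pattern is an assumption made for this example, not a tested catalogue of real prompts, and it is meant to be applied to the last line of captured output.

```python
import re

# Deliberately loose: a login prompt, or a line ending in one of the usual
# prompt characters, optionally followed by a single space.
PROMPT_RE = re.compile(r"(?:login:|[#$>]) ?$")


def looks_like_prompt(line):
    """Return True if `line` ends the way most console prompts do."""
    return PROMPT_RE.search(line) is not None
```

The looseness is the point: the pattern will occasionally accept a line that is not a prompt, but it keeps working when someone renames the prompt.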