A recent customer asked us to make sense of USB bandwidth errors that prevented them from streaming from multiple USB audio and video devices simultaneously. Their GStreamer pipelines would fail, and a “uvcvideo: Failed to submit URB 0 (-28)” message would appear in dmesg. The interesting part is that the errors only occurred if the streams were started in a particular order. To make sense of this and propose a solution, we needed to gain a much better understanding of how the kernel reserves USB bandwidth and schedules its transfers. This post explores our new-found understanding and shows how we can calculate, monitor and debug bandwidth usage.
We should point out that our customer’s system exclusively used USB 2.0 High-speed devices; as a result, the majority of this post is specific to High-speed and doesn’t cover the added complexities of other speeds.
USB streaming of audio and video normally uses isochronous transfers – periodic, packet-based transfers, initiated by the host, with a fixed and guaranteed bandwidth. Host controllers guarantee this bandwidth by planning a schedule of transfers ahead of time, ensuring there is enough time reserved on the bus.
Let’s take a look at an Endpoint Descriptor of a typical UVC camera. This output can be obtained via ‘lsusb -v’.
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 5
Transfer Type Isochronous
Synch Type Asynchronous
Usage Type Data
wMaxPacketSize 0x0b20 2x 800 bytes
bInterval 1
The above output shows that we have an isochronous endpoint – of particular interest here are the fields which describe the bandwidth supported by this endpoint: wMaxPacketSize and bInterval. The wMaxPacketSize field describes how much data can be transferred in one bus interval – in the case of high-speed devices the bus interval is known as a microframe and is 125 microseconds long. A high-speed endpoint can transfer up to 3 packets per microframe, but in the above example we see “2x 800 bytes”, meaning we will read 2 packets of 800 bytes in each 125 microsecond microframe.
Finally, the bInterval field describes how often data is sent – in our case the interval is 1, which means data is sent every bus interval. A value of 2 means every second bus interval, a value of 3 means every fourth, and a value of 4 means every eighth – in general, every 2^(bInterval-1) bus intervals. Clearly this means that USB device makers can control how ‘bursty’ their data transfers are, i.e. send a little often, or send a lot less often.
As we’ve covered in a previous blog post, a typical USB device may contain multiple endpoint descriptors which use differing amounts of bandwidth. During the initialisation of an audio or video stream, the relevant drivers will communicate with the device to determine which descriptor should be used.
Let’s now use the above information to determine the payload bandwidth requirements of this endpoint:
bandwidth per uframe = (num of transfers x packet size) / 2^(bInterval-1)
bandwidth = bandwidth per uframe in bits * number of uframes per second
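As a quick sanity check, here’s a minimal Python sketch of that calculation, using the descriptor values shown above (the variable names are ours):

# Values from the endpoint descriptor above.
wMaxPacketSize = 0x0b20
bInterval = 1

packet_size = wMaxPacketSize & 0x7ff               # bits 10:0 -> 800 bytes
transactions = ((wMaxPacketSize >> 11) & 0x3) + 1  # bits 12:11 -> 2 per microframe

# High-speed microframes are 125 us long, so 8000 occur per second.
bytes_per_uframe = (transactions * packet_size) / 2 ** (bInterval - 1)
payload_mbps = bytes_per_uframe * 8 * 8000 / 1e6
print(payload_mbps)  # 102.4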
This works out to be 102.4 Mbps – however this isn’t the whole story. The kernel keeps track of how much bandwidth it has reserved for periodic (interrupt and isochronous) transfers in debugfs – let’s start a video stream and take a look:
$ grep "B:" /sys/kernel/debug/usb/devices
B: Alloc=272/800 us (34%), #Int= 4, #Iso= 5
This shows us that our single video stream is using 34% of the available bandwidth. The USB specification requires that no more than 80% of a microframe be used for periodic (isochronous and interrupt) transfers – as microframes are 125 microseconds long, this gives a maximum of 100 microseconds of bus time per microframe. Due to the periodic nature of some isochronous transfers (i.e. endpoints that use a bInterval greater than 1), packets may be sent as infrequently as every 8 microframes, so it makes sense to consider bandwidth usage over this larger period of time. It’s for these reasons that bandwidth is accounted against the available isochronous bandwidth (80%) of 8 microframes, which is 800 microseconds. This reflects the maximum amount of time the bus can be used for isochronous transfers in every 1 millisecond period.
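Plugging in the numbers from the debugfs output above:

isochronous budget = 8 microframes * 100 microseconds = 800 microseconds per 1 ms frame
reserved = 272 microseconds / 800 microseconds = 34%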
You may have thought that on a bus that supports 384 Mbps of isochronous bandwidth (80% of 480 Mbps), a stream of 102.4 Mbps would use 26.7% of it (102.4/384) – yet the kernel output indicates 34%. This difference arises because we haven’t considered overheads. We’ve described the bandwidth of the endpoint in terms of payload data (the stuff we put in a packet) – yet we’re comparing that with time on the bus, which also has to cover protocol overhead, signalling-imposed bit stuffing, host delays, etc. The USB specification provides a helpful formula for converting payload size to bus time – naturally it is a worst-case calculation:
High-speed Isochronous Transaction (result in nanoseconds)
= (38 * 8 * 2.083) + (2.083 * Floor(3.167 + BitStuffTime(Data_bc))) +
Host_Delay
The Linux kernel diligently uses this formula to determine how much bus time (and thus bus bandwidth) needs to be reserved. So in our case, our 102.4 Mbps payload stream actually uses up to 130 Mbps of bus bandwidth.
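To see where that overhead comes from, here is a rough Python sketch of the specification’s formula. The 7/6 worst-case bit-stuffing factor comes from the spec’s definition of BitStuffTime; the Host_Delay value is controller-specific, so the 1 microsecond used here is purely an assumption of ours (the kernel’s real implementation is usb_calc_bus_time() in drivers/usb/core/hcd.c):

import math

HOST_DELAY_NS = 1000  # controller-specific; assumed value for illustration

def hs_iso_bus_time_ns(nbytes):
    # Worst-case bit stuffing inflates the on-wire bit count by a factor of 7/6.
    bit_stuff = 7.0 / 6.0 * 8 * nbytes
    return (38 * 8 * 2.083) + (2.083 * math.floor(3.167 + bit_stuff)) + HOST_DELAY_NS

per_transaction = hs_iso_bus_time_ns(800)  # ~17.2 us per 800 byte packet
per_uframe = 2 * per_transaction           # 2 transactions: ~34 us per microframe
per_ms = 8 * per_uframe                    # ~275 us in each 1 ms frame

With these assumptions we land within a few microseconds of the 272/800 figure reported by the kernel above – the exact value depends on the controller’s Host_Delay.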
This brings us back to our customer: we added up the bus bandwidth requirements of their devices, and despite the total being less than 384 Mbps we still saw bandwidth errors. To understand why, we needed to understand how the EHCI host controller driver schedules isochronous packets. What we learnt is best illustrated with an example…
Let’s consider that we have 5 audio endpoints that each require 7 microseconds of bus time every 1 millisecond (i.e. every 8th microframe). When we start these individual streams the EHCI host driver must schedule (i.e. reserve bandwidth for) these transfers – it does this by looking ahead and allocating transfers to microframes via a ‘first-fit’ algorithm, as follows:
microframe 1: 7+7+7+7+7 = 35 / 100 microseconds
microframe 2:
microframe 3:
microframe 4:
microframe 5:
microframe 6:
microframe 7:
microframe 8:
microframe 9: 7+7+7+7+7 = 35 / 100 microseconds
microframe 10:
...
As shown above, the EHCI scheduler adds the transfers to its schedule. It tries to schedule transfers as soon as possible, so it has put all the transfers, one at a time, into the first microframe – and then repeats the schedule every 8 microframes. In these microframes we’ve used up 35 of the 100 available microseconds.
Now let’s add the video streams. Our video streams each require 34 microseconds in every microframe. Let’s add them one at a time – here’s the first video stream:
microframe 1: 7+7+7+7+7+34 = 69 / 100 microseconds
microframe 2: 34 = 34 / 100 microseconds
microframe 3: 34 = 34 / 100 microseconds
microframe 4: 34 = 34 / 100 microseconds
microframe 5: 34 = 34 / 100 microseconds
microframe 6: 34 = 34 / 100 microseconds
microframe 7: 34 = 34 / 100 microseconds
microframe 8: 34 = 34 / 100 microseconds
microframe 9: 7+7+7+7+7+34 = 69 / 100 microseconds
microframe 10: 34 = 34 / 100 microseconds
...
Finally, let’s start the second video stream:
microframe 1: 7+7+7+7+7+34 = 69 / 100 microseconds <- no room to start here
microframe 2: 34+34 = 68 / 100 microseconds <- start 2nd video stream from here
microframe 3: 34+34 = 68 / 100 microseconds
microframe 4: 34+34 = 68 / 100 microseconds
microframe 5: 34+34 = 68 / 100 microseconds
microframe 6: 34+34 = 68 / 100 microseconds
microframe 7: 34+34 = 68 / 100 microseconds
microframe 8: 34+34 = 68 / 100 microseconds
microframe 9: 7+7+7+7+7+34 = 69 / 100 microseconds <- no room, -ENOSPC
microframe 10: 34 = 34 / 100 microseconds
...
This time there was no room in the first microframe: the existing 69 microseconds plus an additional 34 would exceed the budget of 100 microseconds. But that’s OK, as the scheduler will move on to the next microframe. As you can see, it scheduled the first transfer of this stream at microframe 2, and then continued to schedule packets in the subsequent microframes. Unfortunately we hit a problem at microframe 9 – there isn’t enough room. And we can’t skip this microframe and move on to the next, because this endpoint (bInterval 1) requires a packet in every microframe. In such a scenario the EHCI driver reports an -ENOSPC error – errno 28, exactly the “-28” our customer saw in dmesg – and the stream fails to start.
Let’s take a look at our bus bandwidth requirements – 5 audio streams requiring 7 microseconds every 8 microframes: thus 35 microseconds of bus time in each 800 microsecond window. And 2 video streams requiring 34 microseconds every microframe: thus 544 microseconds of bus time in each 800 microsecond window. So a total bus bandwidth requirement of 35+544 = 579 microseconds – yet our bandwidth budget is 800 microseconds. The catch is the per-microframe limit: a microframe holding all five audio packets plus both video packets would need 35+68 = 103 microseconds, exceeding the 100 microseconds available. In other words, due to scheduling constraints the instantaneous bandwidth (per microframe) can exceed the limit even whilst the average bandwidth (per 1 ms frame) is within it.
Fortunately we can overcome this in our scenario by starting the streams in a different order. Let’s start with the video:
microframe 1: 34+34 = 68 / 100 microseconds
microframe 2: 34+34 = 68 / 100 microseconds
microframe 3: 34+34 = 68 / 100 microseconds
microframe 4: 34+34 = 68 / 100 microseconds
microframe 5: 34+34 = 68 / 100 microseconds
microframe 6: 34+34 = 68 / 100 microseconds
microframe 7: 34+34 = 68 / 100 microseconds
microframe 8: 34+34 = 68 / 100 microseconds
microframe 9: 34+34 = 68 / 100 microseconds
microframe 10: 34+34 = 68 / 100 microseconds
...
No problem, now let’s start the audio streams:
microframe 1: 34+34+7+7+7+7 = 96 / 100 microseconds <- start 4 audio streams here
microframe 2: 34+34+7 = 75 / 100 microseconds <- start the 5th audio stream here
microframe 3: 34+34 = 68 / 100 microseconds
microframe 4: 34+34 = 68 / 100 microseconds
microframe 5: 34+34 = 68 / 100 microseconds
microframe 6: 34+34 = 68 / 100 microseconds
microframe 7: 34+34 = 68 / 100 microseconds
microframe 8: 34+34 = 68 / 100 microseconds
microframe 9: 34+34 = 68 / 100 microseconds
microframe 10: 34+34+7+7+7+7 = 96 / 100 microseconds
...
As you can see, we were able to schedule 4 of the audio streams in the first microframe but then ran out of room, so the remaining audio stream’s packets were scheduled from the next microframe. This is acceptable because we’re still meeting the bInterval requirements – the stream as a whole simply starts a microframe later. In other words, by starting the streams in a different order, the scheduler was able to spread out the transfers, thus avoiding the per-microframe bandwidth limit.
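To make this behaviour concrete, here is a toy Python model of first-fit scheduling over a single 8-microframe window. This is a simplification for illustration only – it is not the EHCI driver’s actual code:

BUDGET_US = 100  # 80% of a 125 us microframe

def schedule(frame, cost_us, period):
    # First-fit: claim cost_us in every period-th microframe, starting from
    # the earliest offset at which the whole periodic sequence still fits.
    for start in range(period):
        slots = range(start, len(frame), period)
        if all(frame[s] + cost_us <= BUDGET_US for s in slots):
            for s in slots:
                frame[s] += cost_us
            return start
    raise RuntimeError("-ENOSPC")

frame = [0] * 8            # one 1 ms frame = 8 microframes

for _ in range(5):         # audio first: all five land at offset 0 (35 us)
    schedule(frame, 7, 8)
schedule(frame, 34, 1)     # first video stream fits (69 us in microframe 1)
schedule(frame, 34, 1)     # second video stream: 69+34 > 100 -> "-ENOSPC"

Running the streams in the opposite order – video first, then audio – succeeds: the first four audio transfers land at offset 0 (96 microseconds) and the fifth at offset 1 (75 microseconds), reproducing the tables above.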
In this scenario we’ve experienced some of the limitations of the EHCI scheduler. The scheduler aims to place transfers as soon as possible, i.e. ‘first-fit’ – if it instead scheduled packets into the emptiest microframe first, then we likely could have started our streams in any order. However, comments in the source code do suggest that if we have better ideas then we should “…write a smarter scheduler!”.
As you can see, there is a lot more complexity than first meets the eye (and we’ve only covered high-speed devices in this blog post). We’ve looked at bus bandwidth vs payload bandwidth and the algorithm of the EHCI scheduler. We’ve also previously written about other bandwidth issues that can arise when supporting multiple UVC USB cameras – find out more below: