Jetson Nano and PCI Screamer PCBs

PCIe DMA Attack against a secured Jetson Nano (CVE-2022-21819)

Congratulations! Your Jetson Nano (T210) product is finished and ready to ship worldwide. Secure boot is enabled, Linux and its bootloaders are locked down, and the file system holding your precious IP is encrypted. Even JTAG is disabled with an OTP (One-Time-Programmable) security fuse. Yet lurking in the Jetson Nano’s Linux kernel is a PCIe IOMMU vulnerability that lets an attacker circumvent all that security. An off-the-shelf $500 tool, the PCI Screamer, and open source software, PCI Leech, give total RAM and kernel access at over 20MB/s. Yes – faster than the expensive JTAG probes you just locked out. How is this possible?

This blog post aims to answer that question and show how we investigated and fixed the problem. To begin, we need to look at the architecture and peripheral interconnect of a modern SoC (System-On-Chip).

Background

Central to every SoC is a bus fabric connecting the CPU(s), RAM and peripherals together. For the Jetson Nano this is ARM’s AXI (Advanced eXtensible Interface) with lower performance AHB buses attached for low-speed peripherals. To enhance performance, work is off-loaded from the CPU to various accelerators. The most fundamental accelerator is the DMA (Direct Memory Access) controller, a co-processor dedicated to moving data between peripherals and memory. The CPU programs the DMA controller to perform a transfer and to notify the CPU when the transfer completes. Dedicated flow control signals from the bus and peripherals allow the DMA controller to operate autonomously. The AXI bus lets multiple ‘bus-masters’ perform transfers simultaneously and independently, including the CPU(s), DMA and GPU.

DMA is a key element of all high performance systems, but there is a security catch. The DMA controller has full access to all RAM and peripherals in the system. If incorrectly programmed it can write to application or kernel process memory, causing an immediate segfault or mysterious untraceable bugs.  It can also read from all RAM, including the kernel areas holding passwords, encryption keys and your super-secret application IP. This is the same situation as a CPU without an MMU (Memory Management Unit) where software tasks can trample over each other.

A single central DMA controller can be a bottleneck in high-performance systems, so the highest bandwidth peripherals are usually equipped with their own independent bus-mastering DMA controllers. Each has full access to the system bus. This gets interesting when the peripheral itself is an expansion bus controller such as Firewire, Thunderbolt or PCIe. Devices attached to these expansion buses can themselves operate as DMA bus masters and initiate transfer requests to their bus (e.g. PCIe). If the addresses are right then the read/write requests pass through to the host bus (e.g. AXI). This is obviously great if your NVMe drive can independently be asked to DMA data into host memory, but is terrible if a rogue PCIe device were to read your kernel memory…

Obviously bus masters need to be limited to only the address ranges required to do their job, e.g. an NVMe drive only needs to access the buffer it’s supposed to be reading or writing, not all of RAM! Enter the IOMMU (Input Output Memory Management Unit), which provides a private virtual address space for each bus-master, with a translation table mapping virtual addresses to physical bus addresses. A properly configured IOMMU will prevent memory corruption caused by bad drivers and also prevent an attacker using DMA. Except the IOMMU isn’t always enabled correctly.
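As a rough illustration of the idea, the sketch below models an IOMMU as one translation table per bus-master. It is a toy model only: the class, the 4 KB page size and the addresses are all made up for the example and do not describe the Tegra hardware.

```python
PAGE_SIZE = 4096  # assumed page size for this toy model


class IommuFault(Exception):
    """Raised when a bus-master touches an address with no mapping."""


class ToyIommu:
    """Per-master translation tables: master -> {virtual page: physical page}."""

    def __init__(self):
        self.tables = {}

    def map_page(self, master, virt_addr, phys_addr):
        table = self.tables.setdefault(master, {})
        table[virt_addr // PAGE_SIZE] = phys_addr // PAGE_SIZE

    def translate(self, master, virt_addr):
        table = self.tables.get(master, {})
        page = virt_addr // PAGE_SIZE
        if page not in table:
            raise IommuFault(f"{master}: no mapping for {virt_addr:#x}")
        return table[page] * PAGE_SIZE + virt_addr % PAGE_SIZE


iommu = ToyIommu()
# The (hypothetical) NVMe driver maps only the buffer it handed to the drive.
iommu.map_page("nvme", 0x1000, 0x9000_0000)
buffer_pa = iommu.translate("nvme", 0x1010)
```

Any access by "nvme" outside the one mapped page raises IommuFault in this model, which is the behaviour we want from real hardware when a rogue device goes looking for kernel memory.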

Note: the IOMMU only maps accesses originating from the peripheral (e.g. PCI bus). It does not map requests in the other direction from the host AXI bus to the peripheral. In PCI the BARs (Base Address Registers) perform the mapping and protection of accesses from the AXI bus to the PCI bus’s address space. For T210 the BARs live in the AFI bridge between AXI and Root Complexes.

DMA attacks have been well described on x86 platforms, and comprehensive hardware and software tools exist to exploit the holes. The PCI Leech software is a great example – see this presentation by the author Ulf Frisk. Dedicated PCIe hardware hacking tools are also available off-the-shelf, such as the PCI Screamer devices from Lambda Concepts – an M.2 PCIe card with FPGA and USB bridge allowing DMA attacks to be performed. These tools are used by PC gamers seeking to gain advantage in online tournaments by modifying game parameters in real-time.

Test for PCIe vulnerability

This brings us to NVIDIA’s Jetson Nano SoM (System on Module) fitted with the Tegra T210. We recently implemented a Yocto BSP with secure file system encryption for a customer, alongside a security review and hardening. Our review highlighted the potential for a PCIe DMA attack. Ideally we’d simply disable PCIe, but our customer requires PCIe for bulk data storage to an M.2 NVMe SSD. We could not find examples of PCIe attacks against ARM platforms, so we investigated to see if the theoretical attack was possible in practice.

In embedded systems it often helps an investigation to check how the hardware is actually configured, and then compare that against what the OS thinks. We begin by looking at the IOMMU registers, which show that the IOMMU (also called the SMMU) is enabled. However, the ASID register for the PCIe controller (called AFI in the T210) has a value of 0x00000000, meaning it is not assigned to an ASID (Address Space ID) and is unmapped.

0x70019010 = 0xFFFFFFFF  MC_SMMU_CONFIG_0   = IOMMU Enabled
0x70019238 = 0x00000000  MC_SMMU_AFI_ASID_0 = Translation disabled!
0x70019228 = 0xFFFFFFFF  MC_SMMU_TRANSLATION_ENABLE_0_0 = All mapped
0x7001922C = 0xFFFFFFFF  MC_SMMU_TRANSLATION_ENABLE_1_0 = All mapped
0x70019230 = 0xFFFFFFFF  MC_SMMU_TRANSLATION_ENABLE_2_0 = All mapped
0x70019234 = 0xFFFFFFFF  MC_SMMU_TRANSLATION_ENABLE_3_0 = All mapped

If a peripheral is not mapped by the Tegra IOMMU then all its requests are routed through to the AXI bus with a 1:1 mapping. So, from the hardware configuration, we’d expect a PCIe device to have full DMA access to the system RAM.
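The register value can be decoded with a few lines of Python. This is a sketch under assumptions: the enable flag in bit 31 is inferred from the 0x8000000f value we write to the same register later in this post, and the 7-bit width of the ASID field is an assumption, not something read from the register dump.

```python
SMMU_ASID_ENABLE = 1 << 31   # assumed: top bit turns translation on
SMMU_ASID_MASK = 0x7F        # assumed width of the ASID field


def decode_asid_register(value):
    """Return (translation_enabled, asid) for a MC_SMMU_*_ASID_0 value."""
    return bool(value & SMMU_ASID_ENABLE), value & SMMU_ASID_MASK


def bus_address(value, addr, lookup):
    """Pass addr through 1:1 unless translation is enabled for the master.

    lookup is a caller-supplied function (asid, addr) -> physical address.
    """
    enabled, asid = decode_asid_register(value)
    return lookup(asid, addr) if enabled else addr


afi_state = decode_asid_register(0x00000000)   # the value read from AFI
```

With the value 0x00000000 the decode reports translation off, so every address a PCIe device presents reaches the AXI bus unchanged.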

Next we check the Linux kernel’s view of IOMMU configuration. DebugFS shows us all the assigned ASIDs:

$ ls /sys/kernel/debug/70019000.iommu/
     
    as000    as002    as004    as006    as008    as010    masters
    as001    as003    as005    as007    as009    as011

DebugFS also shows the bus-masters mapped by the IOMMU that use these ASIDs. Note that there are more bus-masters than ASIDs because one ASID can be shared by multiple masters using the same address range:

$ ls /sys/kernel/debug/70019000.iommu/masters/

    50000000.host1x                  7000d100.i2c
    54080000.vi                      7000d400.spi
    54100000.tsecb                   7000d600.spi
    54340000.vic                     70012000.se
    54380000.nvjpg                   70090000.xusb
    54480000.nvdec                   700d0000.xudc
    544c0000.nvenc                   702ef000.adsp
    54500000.tsec                    aconnect@702c0000:adsp_audio
    54600000.isp                     flush_all_threshold_map_pages
    54680000.isp                     flush_all_threshold_unmap_pages
    546c0000.i2c                     mc
    57000000.gpu                     sdhci-tegra.2
    70006000.serial                  sdhci-tegra.3
    70006040.serial                  serial8250
    70006200.serial                  smmu_test
    7000c000.i2c                     snd-soc-dummy
    7000c400.i2c                     tegra-carveouts
    7000c500.i2c                     tegradc.0
    7000c700.i2c                     tegradc.1
    7000d000.i2c

AFI is not listed, which confirms what we saw in the IOMMU registers: the PCIe controller is not mapped by the IOMMU.

Before we can use the PCI Screamer to attack over PCIe we need to know which addresses to dump. The Nano SoM has 4GB of RAM located as 2x 2GB banks at these physical addresses:

1st 2GB of RAM   base=0x0000000080000000 size=0x80000000 (2GB)
2nd 2GB of RAM   base=0x0000000100000000 size=0x80000000 (2GB)

Tegra Linux reports it is using RAM in 3 regions:

$ cat /sys/kernel/debug/memblock/memory

    0: 0x0000000080000000..0x00000000afffffff
    1: 0x00000000b0200000..0x00000000fedfffff
    2: 0x0000000100000000..0x000000017f1fffff

The gaps are there because the Tegra Memory Controller also supports Carve-Outs and Protection Regions that operate independently of the IOMMU. They protect critical systems such as the TSEC Security Processor.

iram              base=0x0000000040001000 size=0x0003f000
ramoops           base=0x00000000b0000000 size=0x00200000
VPR               base=0x00000000d7000000 size=0x19000000
Nck Carveout      base=0x00000000ff080000 size=0x00200000
RamDump Carveout  base=0x00000000ff280000 size=0x00080000
GSC5 Carveout     base=0x00000000ff300000 size=0x00100000
GSC4 Carveout     base=0x00000000ff400000 size=0x00100000
GSC2 Carveout     base=0x00000000ff500000 size=0x00100000
GSC1 Carveout     base=0x00000000ff600000 size=0x00100000
BpmpFw Carveout   base=0x00000000ff700000 size=0x00080000
Lp0 Carveout      base=0x00000000ff780000 size=0x00001000
SecureOs Carveout base=0x00000000ff800000 size=0x00800000
GSC3 Carveout     base=0x000000017f300000 size=0x00d00000
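Working out which address ranges are worth dumping is interval arithmetic: take a RAM bank and remove the protected regions from it. The helper below is a hypothetical sketch of that calculation, shown here removing only the VPR region from the first bank, using the base and size values listed above.

```python
def subtract_ranges(region, holes):
    """Return the parts of region (start, end) not covered by any hole.

    All ranges are half-open: start is included, end is not.
    """
    start, end = region
    remaining = []
    cursor = start
    for hole_start, hole_end in sorted(holes):
        if hole_end <= cursor or hole_start >= end:
            continue  # hole lies outside what is left of the region
        if hole_start > cursor:
            remaining.append((cursor, hole_start))
        cursor = max(cursor, hole_end)
    if cursor < end:
        remaining.append((cursor, end))
    return remaining


first_bank = (0x8000_0000, 0x8000_0000 + 0x8000_0000)
vpr = (0xD700_0000, 0xD700_0000 + 0x1900_0000)
readable = subtract_ranges(first_bank, [vpr])
```

The result splits the first bank at 0xd7000000 and resumes at 0xf0000000, which is where VPR ends.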

Now we have all the information we need, we can fit the PCI Screamer into the M.2 socket and dump memory using PCI Leech:

$ sudo pcileech dump -min  0x80000000 -max  0xd7000000 -device fpga
$ sudo pcileech dump -min  0xf0000000 -max  0xff280000 -device fpga
$ sudo pcileech dump -min 0x100000000 -max 0x17f300000 -device fpga

The PCI Leech software reads all the regions perfectly at 20MB/s:

[+] using FTDI device: 0403:601f (bus 1, device 25)
 [+] FTDIFTDI SuperSpeed-FIFO Bridge000000000001
  Current Action: Dumping Memory                             
  Access Mode:    Normal                                                       
  Progress:       848 / 2034 (42%)           
  Speed:          20 MB/s                        
  Address:        0x00000000B5000000                      
  Pages read:     217088 / 520832 (41%)           
  Pages failed:   0 (0%)

We end up with 3 memory dumps from the running system. A quick strings dump shows dmesg logs, kernel symbols, application symbols and even the bash prompt buffer and history. On the Jetson Nano command line we typed an easily recognisable command:

root@jetson-nano:~$ echo "This is typed on the root bash prompt"
This is typed on the root bash prompt

After dumping the memory we can easily see the command string, as well as every version of it on the way from bash to the serial port:

...
6a4bc750 00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 |........!.......|
6a4bc760 6c 6f 67 6f 75 74 00 00 88 ec d2 78 7f 00 00 00 |logout.....x....|
6a4bc770 e4 60 6c 6c 55 00 00 00 21 00 00 00 00 00 00 00 |.`llU...!.......|
6a4bc780 35 30 00 78 7f 00 00 00 88 ea d2 78 7f 00 00 00 |50.x.......x....|
6a4bc790 b0 17 70 75 55 00 00 00 41 00 00 00 00 00 00 00 |..puU...A.......|
6a4bc7a0 65 63 68 6f 20 22 54 68 69 73 20 69 73 20 74 79 |echo "This is ty|
6a4bc7b0 70 65 64 20 6f 6e 20 74 68 65 20 72 6f 6f 74 20 |ped on the root |
6a4bc7c0 62 61 73 68 20 70 72 6f 6d 70 74 22 00 00 00 00 |bash prompt"....|
6a4bc7d0 f0 ac 6c 6c 55 00 00 00 41 00 00 00 00 00 00 00 |..llU...A.......|
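Finding text like this in a multi-gigabyte dump is what the strings tool does. A minimal Python equivalent might look like the sketch below; the function name and the 8-character minimum are choices made for this example, and the sample bytes are three rows copied from the hexdump above.

```python
import re


def ascii_strings(data, min_len=8):
    """Yield (offset, text) for each run of printable ASCII in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")


sample = bytes.fromhex(
    "b0 17 70 75 55 00 00 00 41 00 00 00 00 00 00 00 "
    "65 63 68 6f 20 22 54 68 69 73 20 69 73 20 74 79 "
    "70 65 64 20 6f 6e 20 74 68 65 20 72 6f 6f 74 20 "
    "62 61 73 68 20 70 72 6f 6d 70 74 22 00 00 00 00"
)
found = list(ascii_strings(sample))
```

Short runs such as "puU" fall below the minimum length and are skipped, leaving only the command we typed.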

We have full read/write memory access (except secure and carveout areas) despite locking down Linux and blowing the security fuses to disable JTAG.
Oh dear!
Now to see why this has happened and what we can do about it.

Test if IOMMU can protect from a PCIe attack

We need to prove the IOMMU is capable of operating correctly, and can deny PCI Leech access, before we figure out how to make Linux use it. To do this we manually configure the IOMMU to protect the AFI (PCIe controller).

  • Construct our own page table for the PCIe SWGROUP of the IOMMU
  • Use a free ASID of 0xf
  • Set a single large region (4MB) PDE (Page Directory Entry) for our test region
  • Build a PDE in an unused area of RAM between 0x17f200000 and 0x17f300000
$ devmem2 0x17f200000 w 0xe0080000    # PDE entry VA 0x0 to PA 0x80000000
$ devmem2  0x7001901c w 0x0000000f    # Select ASID 0xf for the next write
$ devmem2  0x70019020 w 0xe017f200    # Set address of the PDE base
$ devmem2  0x70019238 w 0x8000000f    # Enable SMMU for AFI using ASID 0xf
$ devmem2  0x70019030 w 0x0           # Flush the TLB
$ devmem2  0x70019034 w 0x0           # Flush the page table cache
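The two 0xe0... values written above follow the same pattern: attribute bits in the top of the word and the physical address shifted right by 12 in the rest. The sketch below reproduces them. The attribute names (readable, writable, non-secure) are our reading of the top three bits and should be treated as an assumption; the helper itself is hypothetical.

```python
SMMU_ATTR_READABLE = 1 << 31   # assumed bit meanings, inferred from the
SMMU_ATTR_WRITABLE = 1 << 30   # 0xe prefix on both values written above
SMMU_ATTR_NONSECURE = 1 << 29
SMMU_ATTR_ALL = SMMU_ATTR_READABLE | SMMU_ATTR_WRITABLE | SMMU_ATTR_NONSECURE


def smmu_entry(phys_addr, attrs=SMMU_ATTR_ALL):
    """Pack a 4 KB-aligned physical address and attribute bits into 32 bits."""
    if phys_addr % 4096:
        raise ValueError("address must be 4 KB aligned")
    return attrs | (phys_addr >> 12)


pde_for_ram = smmu_entry(0x8000_0000)        # 4 MB region at the start of RAM
ptb_for_table = smmu_entry(0x1_7F20_0000)    # where the page directory lives
```

Here pde_for_ram comes out as 0xe0080000 and ptb_for_table as 0xe017f200, matching the values passed to devmem2.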

This changes the PCI Leech behaviour as expected. Reading from the PCIe Virtual Address (VA) 0x00000000 returns the values at physical address 0x80000000. However reading from an unmapped region generates IOMMU errors from the target kernel. Yay, we’ve proven the IOMMU can protect from PCIe! So why isn’t it being used by Linux?

Discover why Linux doesn’t use the IOMMU for PCIe

We start with the Linux device tree. The base device tree contains entries enabling the IOMMU mapping for AFI.

/nvidia/soc/t210/kernel-dts/tegra210-soc/tegra210-soc-base.dtsi
pcie@1003000 {
             /* items removed for clarity */

             iommus = <&smmu TEGRA_SWGROUP_AFI>;
             iommu-map = <0x0 &smmu TEGRA_SWGROUP_AFI 0x1000>;
             iommu-map-mask = <0x0>;
             bus-range = <0x00 0xff>;

             /* items removed for clarity */
}

These entries look identical to those for other IOMMU mapped peripherals, so they ought to work. We need to confirm these settings make it into the final device tree, using the dtc tool to dump out the device tree from the running Jetson Nano system.

$ dtc -q -I fs /sys/firmware/devicetree/base > /tmp/dev-tree-local.txt

The dumped device tree shows the above 3 lines have gone missing. The very ones that enable the PCIe IOMMU! How has this happened?
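The same check can be scripted against the live device tree, where each property of a node appears as a file. The sketch below is a hypothetical helper; the node path in the usage note is an assumption about where the pcie@1003000 node sits on the target.

```python
from pathlib import Path

IOMMU_PROPERTIES = ("iommus", "iommu-map", "iommu-map-mask")


def missing_iommu_properties(node_dir):
    """Return the IOMMU properties that have no file under node_dir."""
    node = Path(node_dir)
    return [name for name in IOMMU_PROPERTIES if not (node / name).exists()]
```

On the target this could be called as missing_iommu_properties("/sys/firmware/devicetree/base/pcie@1003000"), with the path adjusted to match your tree. An empty list means all three properties survived into the final device tree.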

NVIDIA’s device trees are constructed from a base device tree that includes many other sub-device tree files, each of which can add or remove items from the final device tree. A search through the kernel source revealed:

/nvidia/platform/t210/porg/kernel-dts/porg-platforms/tegra210-porg-pcie.dtsi

pcie@1003000 {
             /delete-property/ iommus;
             /delete-property/ iommu-map;
             /delete-property/ iommu-map-mask;
             /* items removed for clarity*/
}

Ah ha, these lines remove the very entries in the base device tree that enable the IOMMU for the PCIe.

Why has NVIDIA done this? It’s in the current Jetpack and has been in the OE4T repo for years. Also it only affects the Jetson Nano (T210) and not other platforms. Forum postings suggest that NVIDIA customers have experienced problems with third-party Linux drivers that don’t support IOMMU properly for PCIe devices – especially WiFi cards. NVIDIA’s advice has been to fix the failing driver, or temporarily disable the IOMMU. We speculate that at some point a ‘turn off the IOMMU’ patch must have made its way into the Tegra Linux tree, where it stayed.

Fix the problem

We now rebuild the kernel with the /delete-property/ entries removed and re-test. This time when we check the IOMMU masters listed in DebugFS we see three new entries:

$ ls /sys/kernel/debug/70019000.iommu/masters/

    0000:00:02.0
    0000:01:00.0
    1003000.pcie
    # other entries removed for clarity

These are for the PCIe controller itself, and both the PCIe attached devices (PCI Screamer and a Realtek network controller). This is looking promising. Now we need to test with the PCI Screamer. Do we still have access?

PCI Leech still runs and reports that it can read RAM. It dumps just as fast as before. Yet now the memory dumps are all 0xff because the IOMMU returns 0xff to peripherals reading from unmapped areas.

...
00000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000010 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000020 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000030 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000040 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
...
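Rather than eyeballing a hexdump, a dump file can be checked for this pattern programmatically. The sketch below is a hypothetical helper that reads a file in chunks and reports what fraction of it is a given byte value; the file name in the usage note is only an example.

```python
def fraction_of_byte(path, value=0xFF, chunk_size=1 << 20):
    """Return the fraction of bytes in the file at path equal to value."""
    total = 0
    matching = 0
    with open(path, "rb") as dump:
        while chunk := dump.read(chunk_size):
            total += len(chunk)
            matching += chunk.count(bytes([value]))
    return matching / total if total else 0.0
```

A call such as fraction_of_byte("dump.raw") returning 1.0 would mean the device read nothing but 0xff.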

The Jetson Nano Linux kernel also reports memory protection faults for the unmapped regions:

smmu_dump_pagetable(): fault_address=0x0000000080000000 pa=0xffffffffffffffff bytes=ffffffffffffffff #pte=0 in L2
mc-err: (0) csr_afir: EMEM decode error on PDE or PTE entry
mc-err:   status = 0x6000000e; addr = 0x80000000
mc-err:   secure: no, access-type: read, SMMU fault: nr-nw-s
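When testing many address ranges it helps to pull the faulting address out of the log automatically. The sketch below parses the mc-err lines in the format shown above; the regular expressions are written against this one sample, so other kernel versions may need adjustments.

```python
import re

MC_ERR_ADDR = re.compile(r"status = (0x[0-9a-fA-F]+); addr = (0x[0-9a-fA-F]+)")
MC_ERR_ACCESS = re.compile(r"access-type: (\w+)")


def parse_mc_err(lines):
    """Collect the status, address and access type from mc-err log lines."""
    fault = {}
    for line in lines:
        if m := MC_ERR_ADDR.search(line):
            fault["status"] = int(m.group(1), 16)
            fault["addr"] = int(m.group(2), 16)
        if m := MC_ERR_ACCESS.search(line):
            fault["access"] = m.group(1)
    return fault


fault = parse_mc_err([
    "mc-err:   status = 0x6000000e; addr = 0x80000000",
    "mc-err:   secure: no, access-type: read, SMMU fault: nr-nw-s",
])
```

For the sample above this yields a read fault at address 0x80000000, the first address PCI Leech tried to dump.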

Brilliant, the problem is solved, and it was an interesting journey to arrive at the solution.

We fixed the issue for our customer and verified that PCIe works correctly for Ethernet and NVMe drivers. Then we reported the issue to NVIDIA under responsible disclosure, and waited for a Linux Driver Package (L4T) update to be issued before publishing this blog post.

Please get in touch if you require assistance with your NVIDIA or Linux project. We’re here to help. For customers using the Jetson Nano, we recommend reviewing the Security Bulletin and updating to L4T 32.7.1 or later.

Disclosure timeline

  • 17th Dec 2021 – Problem identified and reported to our customer
  • 20th Dec 2021 – We report the issue to NVIDIA
  • 20th Dec 2021 – NVIDIA acknowledge the issue
  • 23rd Dec 2021 – We release a fix to our customer
  • 9th Feb 2022 – NVIDIA inform us the issue will be fixed in a Jetson Nano security update
  • 10th Feb 2022 – NVIDIA confirm fix is being tested and will be in L4T 32.7.1 update for Nano.
  • 8th Mar 2022 – NVIDIA inform us that the issue is assigned CVE‑2022‑21819 with a CVSS v3.1 score of 7.6.
  • 8th Mar 2022 – NVIDIA release L4T 32.7.1 and make available a Security Bulletin describing the vulnerability.
  • 10th Mar 2022 – We publish this blog post
