Baking Android for ARM Morello without Morello

Access to ARM Morello boards is fairly limited at the moment, but we can still explore the new architecture with the help of a Fixed Virtual Platform (FVP) and the software stack packages that are available from ARM. In a previous blog post we provided an introduction to Arm Morello and CHERI. In this post we’re going to boot Morello (in simulation) and see what we can learn. Let’s first download and install the Fixed Virtual Platform for the Morello architecture, as follows (for a list of installation options, run the FVP bash script with the --help argument):

$ wget https://developer.arm.com/-/media/Arm%20Developer%20Community/Downloads/OSS/FVP/Morello%20Platform/FVP_Morello_0.11_33.tgz
$ tar -xzf FVP_Morello_0.11_33.tgz
$ cd FVP_Morello_0.11_33
$ ./FVP_Morello.sh --destination ${YOUR FVP INSTALL DIR}

Now we can run the Morello model in simulation using:

$ ./${YOUR FVP INSTALL DIR}/models/Linux64_GCC-6.4/FVP_Morello

This should result in a pop-up window similar to this one:

Now that we have a working simulator, let’s close the window and download the Morello software stack from ARM. We follow these instructions, though there is also a video from Linaro that can walk you through the process. The setup steps consist of installing the required Ubuntu packages and checking out a GitHub repo; we found that putting all of the installation steps in a Dockerfile is the most convenient way to go about it.

It is worth pointing out that the software stacks for the Android and Busybox file systems both use a shimming layer in user space that wraps all system calls. This is done via a library which is integrated into the system libc and acts as glue code between the non-capability-aware A64 ABI used by the Android kernel and the C64 ABI used in user space. The shimming library’s limitations can be found here. The kernel-user ABI can be compiled in pure capability mode, but that feature is still experimental; the transitional ABI will, for example, not check calls to mmap() against any capability. Out of the box, the Morello kernel tree provides only core architecture functionality: handling registers, faults, access to capabilities in memory etc. A pure capability kernel is still a work in progress, though there are some public signs of life here.

Next, from the workspace directory context we issue the following commands:

$ ./build-scripts/build-all.sh -p fvp -f android-nano
$ cd /android
$ source build/envsetup.sh
$ lunch morello_nano-eng
$ m bionic-unit-tests
$ cd ..

This builds the images and the bionic unit tests provided with the software stack. We are now ready to run the Android images on the FVP using a convenient helper script that is provided with the stack:

$ ./run-scripts/run_model.sh -m ${YOUR FVP INSTALL DIR}/models/Linux64_GCC-6.4/FVP_Morello -f android-nano

We should see more pop-up windows on top of the FVP with the serial output from Android:

At this point we can run the Android Debug Bridge (adb) to connect to the virtual device and send files across. From a new console window within the Morello workspace context:

$ cd /android
$ source build/envsetup.sh
$ lunch morello_nano-eng  
$ adb shell

In yet another console window we will push the unit tests:

$ cd /android
$ for BUILD_SUFFIX in 64 c64; do
    adb push out/target/product/morello/data/nativetest${BUILD_SUFFIX}/bionic-unit-tests/bionic-unit-tests \
      /data/nativetest${BUILD_SUFFIX}/bionic-unit-tests/bionic-unit-tests
    adb push out/target/product/morello/data/nativetest${BUILD_SUFFIX}/bionic-loader-test-libs \
      /data/nativetest${BUILD_SUFFIX}/
  done

Then run the tests from the adb shell on the device:

$ for BUILD_SUFFIX in 64 c64; do
    /data/nativetest${BUILD_SUFFIX}/bionic-unit-tests/bionic-unit-tests
  done

We should then see output confirming that the tests are running on the device; this might take some time.

Let’s now push a simple custom app to the device using adb in the same way and start exploring the new user space restrictions. To add our app to the build tree we need to create a blueprint file; we follow the steps described here.

After running clang --help we can see that the capability-enabled toolchain comes with new CHERI options that can be passed to the compiler. These allow us to fine-tune and control the way capabilities are handled: tightening of bounds, comparison, conversion to integers etc. We can compile the code for the old integer-based ABI or the new one by passing aapcs or purecap to the -mabi option, and there are new error types that catch the now-illegal gotchas in your code.

Let’s see what happens when we compile and run the following code:

size_t sizeInt      = sizeof(int64_t);
size_t sizePointer  = sizeof(void*);

printf("The size of int64 is %zu-bit and pointers are %zu-bit \r\n", sizeInt*8, sizePointer*8);

The output is as follows:

The size of int64 is 64-bit and pointers are 128-bit

As expected, the new enhanced pointer type now spans 128 bits of visible memory.

We can use new helper functions that live in <archcap.h> to probe the capabilities:

static void printPointer(void *pointer) 
{
    size_t addr   = cheri_address_get(pointer);
    size_t base   = cheri_base_get(pointer);
    size_t offset = cheri_offset_get(pointer);
    size_t len    = cheri_length_get(pointer);
    size_t perm   = cheri_perms_get(pointer);
    bool   tag    = cheri_tag_get(pointer);

    printf("addr %zX base %zX offset %zX length %zu "
           "permissions %zX tag %d \r\n", addr, base, offset, len, perm, tag);
}

So now we can inspect the following common C pointer operations on an array:

    uint8_t   array[21] = {42};    
    uint8_t * pArray10  = &array[10];
    uint8_t * pArray15  = pArray10 + 5;

    printPointer(array);
    printPointer(pArray10);
    printPointer(pArray15);

The output from our printPointer function is as follows:

addr FFFFD6DE8C20 base FFFFD6DE8C20 offset 0 length 21 permissions 37041 tag 1
addr FFFFD6DE8C2A base FFFFD6DE8C20 offset A length 21 permissions 37041 tag 1
addr FFFFD6DE8C2F base FFFFD6DE8C20 offset F length 21 permissions 37041 tag 1

We can see that the offset and base are indeed correct for the sub-capabilities derived from their parent and that their capability metadata is inherited from the original pointer. Note that the capabilities themselves are 128 bits wide and 128-bit aligned, but their value/offset fields are subject to the same old pointer arithmetic we know and love. The tag bit is set, as these are all valid pointers that were created explicitly. Now, referring to the ARM Morello architecture document, we can interpret the permission field (0x37041) in binary as:

1101 1100 0001 0000 0 1

Among other things, it can be seen that Load and Store are 1, Execute is 0 and that the pointer is not locked; these are clearly data pointers. Please note that we could also have used the provided API to read each of these fields individually.

Now, let’s try to access an out-of-bounds global value and catch the exception using a signal handler:

    int someValue  = 0x8000;
    int *pAddress = &someValue;

    printPointer(pAddress);
    pAddress++;
    printPointer(pAddress);

    printf("Some value is %d \r\n", *pAddress);

This snippet results in a segmentation fault:

addr 2BAE18 base 2BAE18 offset 0 length 4 permissions 37041 tag 1
addr 2BAE1C base 2BAE18 offset 4 length 4 permissions 37041 tag 1
Signal [11] caught segfault at address 0x2BAE1C with code 12

We have new codes for interpreting the cause of the associated segfault:

#define SEGV_CAPTAGERR    10
#define SEGV_CAPSEALEDERR 11
#define SEGV_CAPBOUNDSERR 12
#define SEGV_CAPPERMERR   13
#define SEGV_CAPACCESSERR 14

As can be seen, 12 stands for an out-of-bounds dereference error, which is exactly what we have done. We can also use the API to obtain new pointers from a parent pointer and have fine-grained control over their address and bounds:

    uint8_t array[21] = {42};

    uint8_t *pArray10 = &array[10];
    uint8_t *pArray10Bounded = cheri_bounds_set(pArray10, 4);

    printPointer(array);
    printPointer(pArray10);
    printPointer(pArray10Bounded);

    pArray10Bounded--;

The result is:

addr FFFFF421BF28 base FFFFF421BF28 offset 0 length 21 permissions 37041 tag 1 
addr FFFFF421BF32 base FFFFF421BF28 offset A length 21 permissions 37041 tag 1
addr FFFFF421BF32 base FFFFF421BF32 offset 0 length 4  permissions 37041 tag 1
Signal [11] caught a segfault at address 0xfffff421bf32 with code 12

The base of the new capability was set to the parent’s current address and the length is 4, so we have constrained the accessible memory area of the “child” pointer. Now let’s see if we can increase the bounds obtained from that child pointer in a grandchild pointer using the same API call:

addr FFFFF421BF32 base FFFFF421BF32 offset 0 length 5 permissions 37041 tag 0
Signal [11] caught segfault at address FFFFF421BF32 with code 10

The API is happy to do this, but the tag is now set to 0 because, as per the specification, bounds cannot be increased. When trying to access the data we get a segfault again. This time we get code 10, which is reserved for capabilities with invalid tags, as such pointers cannot be dereferenced.

Let’s have a look at function pointers.

    typedef void (*functionPointer)(void);

    functionPointer fp = &functionPointersTest; 

    printPointer(fp);

We get:

addr 21DDD1 base 200300 offset 1DAD1 length 760832 permissions 2C243 tag 1

Starting with the permissions (0x2C243):

1011 0000 1001 0000 1 1

Store is disabled, while load to register and execution are enabled; it also has access to system registers and instructions. The base and offset point into the executable text segment of our user-space application. Let’s try to derive a new pointer with the store permission set to 1 using the API and try to execute it:

    archcap_perms_t perm = archcap_c_perms_get(fp) | ARCHCAP_PERM_STORE;

    functionPointer fpE = archcap_c_perms_set(fp, perm);

    printPointer(fpE);

    fpE();

We will get:

addr 21DDD1 base 200300 offset 1DAD1 length 760832 permissions 2C243 tag 0
Signal [11] caught segfault at address 0x21E00 with code 10

The tag has been invalidated and the pointer cannot be dereferenced, as again we tried a non-monotonic capability manipulation, which is forbidden in this architecture. Finally, let’s see what happens if we try to cast a data pointer into a function pointer and vice versa:

    int *pData = (int *)fp;
    printPointer(pData);

    int *pAddress = &someValue;
    functionPointer fpAddr = (functionPointer)pAddress;
    printPointer(fpAddr);

This will result in:

addr 21E3A1 base 200300 offset 1E0A1 length 762880 permissions 2C243 tag 1
addr 2BB700 base 2BB700 offset 0     length 4      permissions 37041 tag 1

So we can see that, despite our naive attempts, the capability permissions of these pointers have not changed. This does not mean that using a capability-enabled toolchain will completely save the application from bad code. Quite the contrary: everything the language allows is still a valid operation, as shown above.

We can use capabilities to give one pointer write permission and other pointers read permission, in the spirit of a single-writer/multiple-reader scheme. It is neat that we can express this design intent at the architecture level:

    uint8_t array[21] = {42};

    uint8_t *pWriter = archcap_c_perms_set(&array[0], 0x10041);
    uint8_t *pReader = archcap_c_perms_set(&array[0], 0x20041);

    printPointer(pWriter);
    printPointer(pReader);

    pWriter[0] = 1;

    uint8_t value = pReader[0];

    printf("Value is %u \r\n", value);

    pReader[0] = 1;

This results in:

addr FFFFC6C06C48 base FFFFC6C06C48 offset 0 length 21 permissions 10041 tag 1
addr FFFFC6C06C48 base FFFFC6C06C48 offset 0 length 21 permissions 20041 tag 1
Value is 1 
Signal [11] caught a segfault at address 0xffffc6c06c48 with code 13

Code 13 stands for an error caused by an invalid permission: we tried to write a value to the array via our reader capability and thus, yet again, tried to dereference a pointer against the usage defined in its metadata. We have barely scratched the surface and, as mentioned before, the whole stack is not yet fully capability aware, but hopefully the basic examples above give you some idea of what is possible. The stack comes with a more advanced demo: look for the compartment-demo app.

The Morello architecture is a good candidate platform for secure applications that require fine-grained memory control. It is up to the programmer to design their application in a way that makes the most of the new referential and spatial security benefits. In some cases it may be as simple as recompiling the current source code with the correct flags; in others some extra work might be involved (some examples can be found here). We will be looking out for a pure-capability Linux kernel and plan to come back to this subject. It would also be very interesting to see integration of C64 for languages that have memory security at their core, like Rust. Happy tinkering.