ARM Cortex-M Run-Time Library Analysis

by Tom Vajzovic

The ARM Run-Time ABI defines a set of library functions which compilers for ARM platforms use to carry out basic operations that are not written as a library function call in the source language, but which are too complicated to translate efficiently into inline machine instructions.

For example, calculating the length of a null-terminated string in the C programming language is carried out using the C standard library function strlen(). The compiler translates this source-level (API) call into a binary-level (ABI) call to the library function of the same name (found in libc). On the other hand, multiplying two 32-bit integers together is translated to simple machine instructions; no library function is needed because the target processor is able to carry out this operation directly. Multiplying two 64-bit numbers falls in the gap between these two more common cases. The syntax in the source language is exactly the same as for the 32-bit case. No header file declares a function to be called, but neither is there a single machine instruction available to the compiler. Instead, the compiler has to translate the operation into a call to a function from the run-time library.
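The gap described above can be sketched in C. This is only an illustration, not the code of any library examined here: mul64_sketch and mul64_selfcheck are hypothetical names, and the decomposition into 32-bit halves is one common approach rather than what a particular compiler emits. Under the ARM run-time ABI the corresponding helper is named __aeabi_lmul; whether a given compiler calls it or expands the multiplication inline depends on the target core and the optimization settings.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: the low 64 bits of a 64x64-bit product,
 * built from 32-bit halves.  Not taken from any real library. */
uint64_t mul64_sketch(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* Low halves: a full 32x32 -> 64-bit product. */
    uint64_t low = (uint64_t)a_lo * b_lo;

    /* Cross terms land in the upper word, so only their low
     * 32 bits survive; a_hi * b_hi overflows out entirely. */
    uint32_t cross = a_lo * b_hi + a_hi * b_lo;

    return low + ((uint64_t)cross << 32);
}

/* Compare the sketch against the compiler's own 64-bit multiply. */
void mul64_selfcheck(void)
{
    static const uint64_t v[] = {
        0u, 1u, 0xFFFFFFFFu, 0x100000000ull,
        0x0123456789ABCDEFull, 0xFFFFFFFFFFFFFFFFull
    };
    const unsigned n = sizeof v / sizeof v[0];

    for (unsigned i = 0; i < n; i++)
        for (unsigned j = 0; j < n; j++)
            assert(mul64_sketch(v[i], v[j]) == v[i] * v[j]);
}
```

In ordinary source code none of this is visible: the programmer simply writes a * b with 64-bit operands, and the compiler decides whether a library call is needed.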

Historically, compilers provided their own run-time libraries, with which all code they generated had to be linked. For GCC the run-time library is libgcc. On the ARM platform, however, the names and specifications of these functions are standardized. Code from multiple compilers can be linked together and use the run-time library from any of them.

The run-time library functions tend to be very simple leaf or near-leaf functions (that is, they have no or few dependencies on other functions). Certainly no run-time library function may call a function from outside the same library. Also, almost all of them are pure functions (that is, they access no global data and their return value depends only on their arguments).

On these pages some of the run-time library functions from various compilers for ARM Cortex-M microcontrollers are disassembled and examined to determine their efficiency in terms of code size and speed. Where they are suboptimal, hand-written assembly code alternatives are provided.

Why Optimize?

It is widely recognized that premature optimization adds needless complexity to software projects. This increases both development time and the risk of programming mistakes going undetected. It is a common policy to only optimize once the overall design is complete, and then to only optimize specifically identified bottlenecks as required. However, when writing a library that is going to be re-used in many different projects, it is impossible to know in advance whether a particular function will become a hotspot.

Further, the risk of introducing bugs by optimizing the run-time library is mitigated in two ways. Firstly, because these functions are simple and short, a very high level of confidence can be gained from a relatively painless code review. Secondly, because these are generally leaf or near-leaf pure functions, they are very easy candidates for extensive (or even exhaustive) unit testing.
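As a rough sketch of how cheap such testing can be, the following C code checks a 64-bit left shift built from 32-bit halves against the compiler's own << operator. The names lsl64_candidate, lsl64_count_mismatches and lsl64_selftest are hypothetical, and the candidate is illustrative C rather than one of the assembly routines analysed on these pages. The test covers every shift count from 0 to 63 but only a handful of chosen bit patterns; covering every 64-bit input value is not practical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical candidate under test: 64-bit logical shift left
 * built from 32-bit halves.  n must be in the range 0..63. */
uint64_t lsl64_candidate(uint64_t x, unsigned n)
{
    uint32_t lo = (uint32_t)x, hi = (uint32_t)(x >> 32);

    if (n == 0)
        return x;

    if (n >= 32) {
        /* The low word moves wholly into the high word. */
        hi = lo << (n - 32);
        lo = 0;
    } else {
        /* Bits shifted out of the low word carry into the high word. */
        hi = (hi << n) | (lo >> (32 - n));
        lo <<= n;
    }
    return ((uint64_t)hi << 32) | lo;
}

/* Try all 64 shift counts on a few boundary patterns and
 * return how many results differ from the native operator. */
unsigned long lsl64_count_mismatches(void)
{
    static const uint64_t patterns[] = {
        0u, 1u, 0x80000000u, 0xFFFFFFFFu, 0x100000000ull,
        0x8000000000000000ull, 0x0123456789ABCDEFull,
        0xFFFFFFFFFFFFFFFFull
    };
    unsigned long bad = 0;

    for (unsigned i = 0; i < sizeof patterns / sizeof patterns[0]; i++)
        for (unsigned n = 0; n < 64; n++)
            if (lsl64_candidate(patterns[i], n) != (patterns[i] << n))
                bad++;
    return bad;
}

void lsl64_selftest(void)
{
    assert(lsl64_count_mismatches() == 0);
}
```

Because the function has no state and no dependencies, the whole test is two nested loops and needs no set-up or mocking.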

A final objection to optimization, and one which is very valid in most cases, is that it creates code which is complicated and time-consuming for future developers to work with. Because the run-time library should never be called directly, but only by compiler-generated code, there is no increase in complexity of any other source module. Because the interface to the run-time library is defined by a fixed standard, it is unlikely that the library code itself will ever have to be reworked either.

In conclusion, some of the most popular compilers for the ARM Cortex-M platform provide suboptimal run-time libraries, and optimizing these functions is a very good idea with few drawbacks.

Compiler Packages Tested

Both the ARM Compiler version 5 (which is proprietary) and the ARM Compiler version 6 (based on open-source Clang) use the same closed-source run-time library. This has two variants: standardlib is the default, and microlib is more optimized for size. Either variant of the library can be used with either version of the compiler. The library tested is from ARM Compiler 5.06 update 6 and ARM Compiler 6.9. It is distributed as part of both ARM Development Studio DS-5 v5.28.1 and Keil MDK v5.25. In fact, the functions tested have not changed since at least version 5.05 update 1 (from Keil MDK v5.13).

The GCC compiler is fully open source. The run-time library tested is part of libgcc versions 4.9.3, 5.4.1, 6.3.1 and 7.2.1, all packaged by ARM.

Function Groups

The following function groups have been examined so far:

64-bit integer shifting functions for Cortex-M0.

64-bit integer shifting functions for Cortex-M3 and Cortex-M4.

64-bit integer multiplication function for Cortex-M0.

64-bit integer multiplication function for Cortex-M3 and Cortex-M4.

Copyright

The text and presentation of this analysis are copyright 2018 Tom Vajzovic. You may not copy them except as permitted by law.

The ARM and GCC routines presented here are subject to separate copyright. Displaying them in this way is academic fair use and so I have not sought a licence from the copyright holders. You must not take them from here to use them for any other purpose. You shouldn't want to anyway, because they are suboptimal.

You may use my versions (which are better) according to the terms of The Truly Free Licence (public domain).