Profiling

There are times when you might want to find out what resources (time and RAM) a particular operation requires. Not because you are nosy, but because the resources of a microcontroller are limited; if you are out of luck, the desired operation might not even fit within the constraints of the chip. In order to locate the bottleneck, you will need to do a bit of profiling. Or perhaps a lot. This is what we are going to discuss now.

Since you are not going to face serious difficulties when running micropython on a computer, profiling really makes sense only in the context of the microcontroller, so this might be a golden opportunity to brush up on how the firmware has to be compiled and uploaded. It is not by accident that we spent some time on this at the very beginning of this document.

Profiling in python

Measuring time

If you are interested in the execution time of a complete function, you can measure it simply by making use of the python interpreter

%%micropython

from utime import ticks_us, ticks_diff

def test_function(n):
    for i in range(n):
        q = i*i*i
    return q # return the last value computed

now = ticks_us()
test_function(100)
then = ticks_diff(ticks_us(), now)

print("function test_function() took %d us to run"%then)
function test_function() took 27 us to run

In fact, since our function is flanked by two other statements, this construct easily lends itself to a decorator implementation, as in the following example (taken from http://docs.micropython.org/en/v1.9.3/pyboard/reference/speed_python.html)

%%micropython

import utime

def timed_function(f, *args, **kwargs):
    myname = str(f).split(' ')[1]
    def new_func(*args, **kwargs):
        t = utime.ticks_us()
        result = f(*args, **kwargs)
        delta = utime.ticks_diff(utime.ticks_us(), t)
        print('Function {} time = {:6.3f}ms'.format(myname, delta/1000))
        return result
    return new_func

@timed_function
def test_function(n):
    for i in range(n):
        utime.sleep_ms(10)

test_function(10)
Function test_function time = 100.682ms

(If you need an even better estimate, you can get the ticks twice, and drop the call to test_function() in the second pass: in this way, you get the cost of measuring time itself:

from utime import ticks_us, ticks_diff

now = ticks_us()
then = ticks_diff(ticks_us(), now) # no useful work between the two time stamps

print("the time measurement took %d us"%then)

Then you subtract the results of the second measurement from those of the first.)

The memory cost of a function

While time is money, RAM is gold. We shouldn’t pass up on that! The micropython module has a very handy function for printing a pretty summary of the state of the RAM. You would call it like this

%%micropython

import micropython
print(micropython.mem_info())
mem: total=2755, current=663, peak=2289
stack: 928 out of 80000
GC: total: 2072832, used: 704, free: 2072128
 No. of 1-blocks: 6, 2-blocks: 3, max blk sz: 6, max free sz: 64745
None

If you call mem_info() after you have executed your function, but before the garbage collector runs (if it is enabled, that is), then from the two reports you can figure out how many bytes the function has eaten.

Profiling in C

With the profiling method above, we can measure the cost of a complete function only, but we cannot say anything about individual instructions in the body. Execution time is definitely a significant issue, but even worse is the problem of RAM: it might happen that the function allocates a huge amount of memory, but cleans up properly before returning. Such a function could certainly wreak havoc, even though it looks rather innocuous from the outside. So, what do we do? We should probably just measure. It is not going to hurt.

In the example below (profiling.c), I discuss time and RAM measurements in a single module, because splitting them wouldn’t be worth the trouble. The function whose behaviour we inspect does nothing but calculate the length of a three-dimensional vector. With that, we can figure out how much the assignments cost, and how much the actual calculation does.

https://github.com/v923z/micropython-usermod/tree/master/snippets/profiling/profiling.c

#include <math.h>
#include <stdio.h>
#include "py/obj.h"
#include "py/runtime.h"
#include "mphalport.h"  // needed for mp_hal_ticks_cpu()
#include "py/builtin.h" // needed for mp_micropython_mem_info()

STATIC mp_obj_t measure_cpu(mp_obj_t _x, mp_obj_t _y, mp_obj_t _z) {
    size_t start, middle, end;
    start = m_get_current_bytes_allocated();   // bytes allocated before we touch the arguments

    float x = mp_obj_get_float(_x);
    float y = mp_obj_get_float(_y);
    float z = mp_obj_get_float(_z);
    middle = m_get_current_bytes_allocated();  // bytes allocated after the three assignments

    float hypo = sqrtf(x*x + y*y + z*z);
    end = m_get_current_bytes_allocated();     // bytes allocated after the calculation
    mp_obj_t tuple[4];
    tuple[0] = MP_OBJ_NEW_SMALL_INT(start);
    tuple[1] = MP_OBJ_NEW_SMALL_INT(middle);
    tuple[2] = MP_OBJ_NEW_SMALL_INT(end);
    tuple[3] = mp_obj_new_float(hypo);
    return mp_obj_new_tuple(4, tuple);
}

STATIC MP_DEFINE_CONST_FUN_OBJ_3(measure_cpu_obj, measure_cpu);

STATIC const mp_rom_map_elem_t profiling_module_globals_table[] = {
    { MP_ROM_QSTR(MP_QSTR___name__), MP_ROM_QSTR(MP_QSTR_profiling) },
    { MP_ROM_QSTR(MP_QSTR_measure), MP_ROM_PTR(&measure_cpu_obj) },
};
STATIC MP_DEFINE_CONST_DICT(profiling_module_globals, profiling_module_globals_table);

const mp_obj_module_t profiling_user_cmodule = {
    .base = { &mp_type_module },
    .globals = (mp_obj_dict_t*)&profiling_module_globals,
};

MP_REGISTER_MODULE(MP_QSTR_profiling, profiling_user_cmodule, MODULE_PROFILING_ENABLED);

The above-mentioned mem_info() function of the micropython module can be called directly from C: after including the builtin.h header, we can issue mp_micropython_mem_info(0, NULL);, defined in modmicropython.c, which prints everything we need. Although its signature contains two arguments, a size_t and an mp_obj_t pointer to the arguments, the function does not seem to care about them, so we can simply pass 0 and NULL.
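
To see how this plays out in practice, here is a minimal sketch of a function that prints the report before and after some heap-hungry work. The name report_ram and the float buffer are made up for this example, and the entry in the module’s globals table is omitted:

#include "py/obj.h"
#include "py/runtime.h"
#include "py/builtin.h" // needed for mp_micropython_mem_info()

STATIC mp_obj_t report_ram(mp_obj_t _n) {
    mp_micropython_mem_info(0, NULL);       // RAM report before the allocation
    mp_int_t n = mp_obj_get_int(_n);
    float *buffer = m_new(float, n);        // grab n floats from the heap
    for(mp_int_t i = 0; i < n; i++) {
        buffer[i] = (float)i;
    }
    mp_micropython_mem_info(0, NULL);       // RAM report with the buffer still alive
    m_del(float, buffer, n);                // hand the buffer back
    return mp_const_none;
}

STATIC MP_DEFINE_CONST_FUN_OBJ_1(report_ram_obj, report_ram);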

The function mp_micropython_mem_info() doesn’t carry out any measurements itself; it is responsible only for the pretty printing. The statistics are collected by mp_micropython_mem_total(), mp_micropython_mem_current(), and mp_micropython_mem_peak(). Unfortunately, these functions are all declared STATIC, so we cannot call them from outside modmicropython.c. If you need a numeric representation of the state of the RAM, you can make use of the m_get_total_bytes_allocated(void), m_get_current_bytes_allocated(void), and m_get_peak_bytes_allocated(void) functions of py/malloc.c. All three return a size_t.

With the help of these three functions, we could, e.g., return the size of the consumed memory to the micropython interpreter at the end of our calculations. This is what we do when collecting the bits and pieces, and returning the 4-tuple at the end of the measure_cpu function.
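
If, on the other hand, all you need is a quick printout at a particular point in your C code, without returning anything to the interpreter, a couple of lines along the lines of the sketch below should do. (This is only a sketch: print_mem_stats is a made-up name, and the three m_get_* functions are available only if the firmware is compiled so that it collects the memory statistics.)

#include "py/obj.h"
#include "py/mpprint.h"

void print_mem_stats(void) {
    // print the three counters by hand, in bytes, just like mem_info() does
    mp_printf(&mp_plat_print, "total: %u, current: %u, peak: %u\n",
              (unsigned int)m_get_total_bytes_allocated(),
              (unsigned int)m_get_current_bytes_allocated(),
              (unsigned int)m_get_peak_bytes_allocated());
}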

https://github.com/v923z/micropython-usermod/tree/master/snippets/profiling/micropython.mk

USERMODULES_DIR := $(USERMOD_DIR)

# Add all C files to SRC_USERMOD.
SRC_USERMOD += $(USERMODULES_DIR)/profiling.c

# We can add our module folder to include paths if needed
# This is not actually needed in this example.
CFLAGS_USERMOD += -I$(USERMODULES_DIR)

!make clean
!make USER_C_MODULES=../../../usermod/snippets CFLAGS_EXTRA=-DMODULE_PROFILING_ENABLED=1 all

%%micropython

import profiling
print(profiling.measure(123, 233, 344))
(672, 672, 672, 433.305908203125)
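
Finally, a comment on time: mphalport.h was included at the top of profiling.c for mp_hal_ticks_cpu(), though measure_cpu() itself inspects only the RAM. If you also want to time a piece of C code, you can bracket it with the tick counters of the HAL, very much like we did with utime.ticks_us() in python. The snippet below is only a sketch of how that might look: measure_time is a made-up name, I use the microsecond counter mp_hal_ticks_us() here (the CPU tick counter could be substituted on ports that provide it), and the registration in the globals table would follow the same pattern as for measure_cpu above.

#include <math.h>
#include "py/obj.h"
#include "py/runtime.h"
#include "mphalport.h"  // tick counters of the port

STATIC mp_obj_t measure_time(mp_obj_t _x, mp_obj_t _y, mp_obj_t _z) {
    float x = mp_obj_get_float(_x);
    float y = mp_obj_get_float(_y);
    float z = mp_obj_get_float(_z);

    mp_uint_t start = mp_hal_ticks_us();            // microsecond counter before the calculation
    float hypo = sqrtf(x*x + y*y + z*z);
    mp_uint_t delta = mp_hal_ticks_us() - start;    // elapsed microseconds

    mp_obj_t tuple[2];
    tuple[0] = mp_obj_new_int_from_uint(delta);
    tuple[1] = mp_obj_new_float(hypo);
    return mp_obj_new_tuple(2, tuple);
}

STATIC MP_DEFINE_CONST_FUN_OBJ_3(measure_time_obj, measure_time);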