Avoiding Race Conditions in ChibiOS/RT: a guide to Mutex

Published on April 19, 2023. Updated on February 24, 2024. Posted by Rocco Marco Guglielmi. Category: ChibiOS - Further readings.

In this article, we will explain what race conditions are and provide examples to help you understand their impact on multi-threading applications.

We will also introduce a powerful tool to prevent race conditions: the mutex. We will explain what a mutex is, how it works, and provide examples of how to use it in ChibiOS. By the end of this article, you will have a better understanding of how to avoid race conditions and write more reliable ChibiOS code.

Race condition

A race condition is a phenomenon that can occur in concurrent or multi-threaded programs when two or more threads access shared resources or execute critical sections of code in an unpredictable order, leading to unexpected behavior and incorrect results. In a race condition, the outcome of the program depends on the exact timing and interleaving of the threads, which can vary between different runs of the same program. Race conditions can cause various problems, including data corruption, deadlocks, and inconsistent behavior, and they can be difficult to diagnose and fix.

To understand the problem better, let us look at the following code, where two threads are juggling a shared variable. One thread increments the variable while the other decrements it.

static int32_t shared_variable = 0;

static THD_FUNCTION(Thread1, arg) {
  ...
  while (true) {
    shared_variable++;
    chThdSleepMilliseconds(1);
  }
}

static THD_FUNCTION(Thread2, arg) {
  ...
  while (true) {
    shared_variable--;
    chThdSleepMilliseconds(1);
  }
}

Both threads run at the same interval, so we would expect shared_variable to stay at 0, only briefly becoming 1 or -1 depending on which thread runs first. However, there is a problem with the shared_variable++ and shared_variable-- operations. These operations are not atomic, meaning they are not performed as a single, indivisible step and can be interrupted partway through.

For instance, the operation shared_variable++ entails reading the current value of shared_variable from memory, incrementing this value, and then writing the updated value back to memory. On an ARM Cortex architecture, the assembly translation of this C statement would look roughly as follows:

LDR R0, =shared_variable ; Load the address of shared_variable into R0
LDR R1, [R0]             ; Load the value of shared_variable into R1
ADD R1, R1, #1           ; Increment the value in R1
STR R1, [R0]             ; Store the value back into shared_variable

The shared_variable-- statement, in turn, would be translated as:

LDR R0, =shared_variable ; Load the address of shared_variable into R0
LDR R1, [R0]             ; Load the value of shared_variable into R1
SUB R1, R1, #1           ; Decrement the value in R1
STR R1, [R0]             ; Store the value back into shared_variable

In a fully pre-emptive operating system like ChibiOS, a thread can be interrupted at any time by another thread with a higher priority. For example, Thread1 can be preempted in the middle of the operation shared_variable++, after the current value of the variable has been loaded into a CPU register but before the updated value has been stored back.

Consider a scenario illustrating a race condition with assembly code. Initially, shared_variable is 0. Thread 1 begins by loading shared_variable into a register. However, before Thread 1 can increment the value and store it, Thread 2 preempts. Thread 2 performs its entire cycle, decrementing shared_variable, which results in a value of -1. When Thread 1 resumes, it continues with the outdated value it loaded earlier (which was 0), increments it to 1, and stores it, overwriting the update made by Thread 2. After one increment and one decrement, shared_variable ends up being 1 instead of the expected 0, due to concurrent access and modification by both threads without synchronization.

; Initial state: shared_variable is 0. Thread 1 starts incrementing.
LDR R0, =shared_variable ; Load the address of shared_variable into R0
LDR R1, [R0]             ; Load the value of shared_variable into R1

; Pre-emption occurs here. Thread 2 runs its complete cycle.
LDR R0, =shared_variable ; Load the address of shared_variable into R0 (again for Thread 2)
LDR R1, [R0]             ; Load the value of shared_variable into R1 (for Thread 2)
SUB R1, R1, #1           ; Decrement the value in R1 (Thread 2 operation)
STR R1, [R0]             ; Store the decremented value back into shared_variable
; shared_variable is now -1 after Thread 2's operation.

; Thread 1 resumes, using the value of shared_variable before preemption.
ADD R1, R1, #1           ; Increment the value in R1 (Thread 1 resumes)
STR R1, [R0]             ; Store the incremented value back into shared_variable
; shared_variable is now 1, but it was expected to be 0!

This is a race condition and its result can be unpredictable because it depends on the order of execution of the threads accessing the same variable.
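The interleaving above can also be replayed deterministically on a host machine. The following is a minimal sketch in plain C, not ChibiOS code: the names shared_model, t1_r1, t2_r1, and replay_lost_update are hypothetical, and each thread's R1 register is modeled as a separate local variable, on the assumption that a context switch saves and restores the registers of each thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for shared_variable. */
static int32_t shared_model;

/* Replays the interleaving step by step and returns the final value. */
int32_t replay_lost_update(void) {
  int32_t t1_r1;  /* Thread 1's private copy of R1 */
  int32_t t2_r1;  /* Thread 2's private copy of R1 */

  shared_model = 0;

  t1_r1 = shared_model;        /* Thread 1: LDR, reads 0 */

  /* Preemption: Thread 2 runs its complete cycle. */
  t2_r1 = shared_model;        /* Thread 2: LDR, reads 0 */
  t2_r1 = t2_r1 - 1;           /* Thread 2: SUB */
  shared_model = t2_r1;        /* Thread 2: STR */
  assert(shared_model == -1);  /* checkpoint after Thread 2's cycle */

  /* Thread 1 resumes with its stale copy (0). */
  t1_r1 = t1_r1 + 1;           /* Thread 1: ADD */
  shared_model = t1_r1;        /* Thread 1: STR, overwrites -1 */

  return shared_model;
}
```

Calling replay_lost_update() returns 1: one increment and one decrement have both executed, yet the effect of the decrement has been lost.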

Mutual exclusion

The solution to the issue of race conditions is to prevent threads from interrupting each other during operations on shared resources. Specifically, we want to ensure that two threads sharing a resource cannot operate on it simultaneously, which is where mutual exclusion comes in. The idea behind mutual exclusion is to create a zone in which only one thread can access the resource at any given time.

The mechanism for enforcing mutual exclusion is called a mutex. In ChibiOS/RT, mutexes are implemented as a modular library that can be disabled at compile time. To use mutexes in your code, you need to ensure that they are enabled in chconf.h.

/**
 * @brief   Mutexes APIs.
 * @details If enabled then the mutexes APIs are included in the kernel.
 *
 * @note    The default is @p TRUE.
 */
#if !defined(CH_CFG_USE_MUTEXES)
#define CH_CFG_USE_MUTEXES                  TRUE
#endif

While the ChibiOS/RT mutex API includes several functions, in practice, you can achieve mutual exclusion in most cases using just three main functions: chMtxObjectInit, chMtxLock, and chMtxUnlock.

All these functions operate on a variable of type mutex_t that represents a mutex. As these APIs need to modify the mutex, they receive a pointer to it.

chMtxObjectInit

The chMtxObjectInit function initializes a mutex object, setting it to its initial (unlocked) state. It must be called before the mutex is used: in practice, you may want to initialize the object in main() before creating the threads that are going to rely on the mutex.

/**
 * @brief   Initializes a @p mutex_t structure.
 *
 * @param[out] mp       pointer to a @p mutex_t structure
 *
 * @init
 */
void chMtxObjectInit(mutex_t *mp)

chMtxLock and chMtxUnlock

The chMtxLock function is used to acquire the mutex and enter a mutually exclusive zone. If the mutex is already held by another thread, the calling thread will be blocked until the mutex is released. The chMtxUnlock function is used to release the mutex and exit the mutually exclusive zone, allowing other threads to acquire the mutex and access the shared resource.

/**
 * @brief   Locks the specified mutex.
 * @post    The mutex is locked and inserted in the per-thread stack of owned
 *          mutexes.
 *
 * @param[in] mp        pointer to the @p mutex_t structure
 *
 * @api
 */
void chMtxLock(mutex_t *mp)

/**
 * @brief   Unlocks the specified mutex.
 * @note    Mutexes must be unlocked in reverse lock order. Violating this
 *          rule will result in a panic if assertions are enabled.
 * @pre     The invoking thread <b>must</b> have at least one owned mutex.
 * @post    The mutex is unlocked and removed from the per-thread stack of
 *          owned mutexes.
 *
 * @param[in] mp        pointer to the @p mutex_t structure
 *
 * @api
 */
void chMtxUnlock(mutex_t *mp)
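The @post and @note clauses above describe a per-thread stack of owned mutexes, which is why unlocks are expected in reverse lock order. The following host-side sketch models that bookkeeping in plain C. It is an illustration under stated assumptions, not the kernel's code: toy_owned_t, toy_push, toy_pop_ok, and TOY_MAX_OWNED are hypothetical names, and mutexes are represented by plain integer ids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_OWNED 4

/* Toy per-thread stack of owned mutex ids (most recently locked on top). */
typedef struct {
  int    ids[TOY_MAX_OWNED];
  size_t count;
} toy_owned_t;

/* Models the lock post-condition: the mutex id is pushed on the stack. */
bool toy_push(toy_owned_t *op, int mutex_id) {
  assert(op != NULL);
  if (op->count == TOY_MAX_OWNED) {
    return false;  /* limit of this toy model only */
  }
  op->ids[op->count++] = mutex_id;
  return true;
}

/* Models the unlock rule: only the most recently locked mutex may be
   released. Returns false in the cases that the documentation above
   describes as a violation (empty stack or wrong order). */
bool toy_pop_ok(toy_owned_t *op, int mutex_id) {
  assert(op != NULL);
  if (op->count == 0 || op->ids[op->count - 1] != mutex_id) {
    return false;
  }
  op->count--;
  return true;
}
```

In this model, locking mutex 1 and then mutex 2 allows unlocking 2 and then 1, while an attempt to unlock 1 first is rejected.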

How to use the mutex

The working principle is quite simple: if a thread tries to acquire a mutex that is already locked, it is suspended until the mutex is released. In effect, this prevents that thread from entering its mutual exclusion zone while another thread is keeping the mutex locked.

    /* Beginning of the mutual exclusion zone. */
    chMtxLock(&my_mutex);

    /* Doing some operation on the shared variable. */

    chMtxUnlock(&my_mutex);
    /* End of the mutual exclusion zone. */
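To see why this prevents the lost update, the interleaving from the race condition example can be replayed on a host machine with a toy lock. This is a minimal sketch in plain C under stated assumptions, not the ChibiOS implementation: toy_mutex_t, toy_try_lock, toy_unlock, and replay_with_mutex are hypothetical names, and a failed toy_try_lock stands in for the point where chMtxLock would suspend the calling thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy lock: owner is 0 when free, otherwise the id of the owning thread. */
typedef struct {
  int owner;
} toy_mutex_t;

/* Returns true if thread 'tid' acquired the lock, false if it would block. */
bool toy_try_lock(toy_mutex_t *mp, int tid) {
  if (mp->owner != 0) {
    return false;
  }
  mp->owner = tid;
  return true;
}

/* Releases the lock; only the owner is allowed to do so. */
void toy_unlock(toy_mutex_t *mp, int tid) {
  assert(mp->owner == tid);
  mp->owner = 0;
}

/* Replays the interleaving with the lock in place; returns the final value. */
int32_t replay_with_mutex(void) {
  toy_mutex_t mtx = {0};
  int32_t shared = 0;
  int32_t t1_r1, t2_r1;
  bool acquired;

  acquired = toy_try_lock(&mtx, 1);  /* Thread 1 enters the zone */
  assert(acquired);
  t1_r1 = shared;                    /* Thread 1: load, reads 0 */

  /* Thread 2 preempts but cannot enter: a real kernel would suspend it. */
  acquired = toy_try_lock(&mtx, 2);
  assert(!acquired);

  t1_r1 = t1_r1 + 1;                 /* Thread 1: increment */
  shared = t1_r1;                    /* Thread 1: store, shared is 1 */
  toy_unlock(&mtx, 1);               /* Thread 1 leaves the zone */

  acquired = toy_try_lock(&mtx, 2);  /* Thread 2 can now enter */
  assert(acquired);
  t2_r1 = shared;                    /* Thread 2: load, reads 1 */
  t2_r1 = t2_r1 - 1;                 /* Thread 2: decrement */
  shared = t2_r1;                    /* Thread 2: store, shared is 0 */
  toy_unlock(&mtx, 2);

  return shared;
}
```

Here replay_with_mutex() returns 0, as expected after one increment and one decrement: Thread 2 can only start its load once Thread 1 has stored its result.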

Protecting a shared variable

The following code revisits the earlier example, where two threads access a shared variable. This time, the two threads are synchronized by acquiring and releasing the same mutex.

#include "ch.h"
#include "hal.h"
#include "chprintf.h"

/* Shared variable. */
static volatile int32_t shared_var = 0;

/* Mutex. */
static mutex_t my_mutex;

static THD_WORKING_AREA(waThread1, 128);
static THD_FUNCTION(Thread1, arg) {

  (void) arg;

  while (true) {

    /* Beginning of the mutual exclusion zone. */
    chMtxLock(&my_mutex);
    shared_var++;
    chMtxUnlock(&my_mutex);
    /* End of the mutual exclusion zone. */

    chThdSleepMilliseconds(1);
  }
}

static THD_WORKING_AREA(waThread2, 128);
static THD_FUNCTION(Thread2, arg) {

  (void) arg;

  while (true) {

    /* Beginning of the mutual exclusion zone. */
    chMtxLock(&my_mutex);
    shared_var--;
    chMtxUnlock(&my_mutex);
    /* End of the mutual exclusion zone. */

    chThdSleepMilliseconds(1);
  }
}


int main(void) {

  /* ChibiOS/HAL and ChibiOS/RT initialization. */
  halInit();
  chSysInit();

  sdStart(&SD5, NULL);

  /* Initializing the mutex. */
  chMtxObjectInit(&my_mutex);

  /* Contenders. */
  chThdCreateStatic(waThread1, sizeof(waThread1), NORMALPRIO - 2, Thread1, NULL);
  chThdCreateStatic(waThread2, sizeof(waThread2), NORMALPRIO - 1, Thread2, NULL);

  while (true) {
    chprintf((BaseSequentialStream*)&SD5, "%d\r\n", shared_var);
    chThdSleepMilliseconds(200);
  }
}

Note that the mutex is initialized in main() before the threads that use it are created.

Protecting a shared hardware resource

In certain use cases, it may be necessary for two threads to access the same hardware resource. In such cases, a mutex can be used to ensure that only one thread at a time can access the resource. Some drivers in ChibiOS/HAL have native support for mutexes to facilitate multithreaded applications. For example, the SPI driver offers specific APIs, such as spiAcquireBus and spiReleaseBus, which lock and unlock a mutex associated with that driver.

To better understand how this works, let’s consider the following example.

static THD_FUNCTION(Thread1, arg) {
  ...
  while (true) {
    spiAcquireBus(&SPID1);
    spiStart(&SPID1, &spicfg_thd1);
 
    /* Some SPI operations. */
    spiSelect(&SPID1);
    spiExchange(&SPID1, ...);
    spiUnselect(&SPID1);
   
    spiReleaseBus(&SPID1);
    chThdSleepMilliseconds(100);
  }
}

static THD_FUNCTION(Thread2, arg) {
  ...
  while (true) {
    spiAcquireBus(&SPID1);
    spiStart(&SPID1, &spicfg_thd2);
 
    /* Some SPI operations. */
    spiSelect(&SPID1);
    spiExchange(&SPID1, ...);
    spiUnselect(&SPID1);
   
    spiReleaseBus(&SPID1);
    chThdSleepMilliseconds(100);
  }
}

In the example, both Thread1 and Thread2 are operating on the same SPI driver, but before starting any activity, they first acquire the bus by calling the spiAcquireBus function, which locks the mutex associated with SPID1. Similarly, when the operation is complete, each thread releases the bus by calling the spiReleaseBus function, allowing the other thread to use the SPI driver.

It is important to note that after acquiring the bus, each thread reconfigures the SPI with its own unique configuration.

Note that this API is only available if the SPI_USE_MUTUAL_EXCLUSION switch is enabled in halconf.h. The reason behind this switch is that acquiring and releasing the bus relies on ChibiOS/RT to provide the mutex functionality. Since ChibiOS/HAL can also run standalone, switches like this allow HAL to be decoupled from RT completely.
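For reference, enabling the switch in halconf.h would look roughly like the fragment below. It mirrors the layout of the chconf.h excerpt shown earlier. The exact comments and guards in your halconf.h may differ between ChibiOS versions, so treat it as a sketch rather than a verbatim copy.

```c
/* halconf.h fragment: enables the spiAcquireBus() and spiReleaseBus() APIs. */
#if !defined(SPI_USE_MUTUAL_EXCLUSION)
#define SPI_USE_MUTUAL_EXCLUSION            TRUE
#endif
```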

Priority inversion and priority inheritance

Blocking on shared resources can lead to priority inversion, where a higher-priority thread is blocked by a lower-priority thread holding the resource. Unlike semaphores, ChibiOS mutexes mitigate this issue because ChibiOS/RT applies priority inheritance: while a thread holds a mutex that a higher-priority thread is waiting for, the owner temporarily runs at the priority of the waiting thread. To gain a better understanding of this topic, it’s necessary to have a broader explanation of how ChibiOS/RT scheduling works. Therefore, we invite you to read The Complete Reference for Multithreading in ChibiOS/RT if you want to learn more.
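The effect of priority inheritance can be illustrated with a small host-side model. This is a sketch in plain C under simplifying assumptions, not how ChibiOS/RT implements it: toy_thread_t, toy_inherit, toy_restore, and toy_runs_before are hypothetical names, a higher number means a higher priority, and only a single mutex is considered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy thread descriptor. */
typedef struct {
  int base_prio;     /* priority assigned at creation */
  int current_prio;  /* priority currently seen by the scheduler */
} toy_thread_t;

/* Called when 'waiter' blocks on a mutex owned by 'owner': the owner is
   boosted to the waiter's priority if that is higher than its own. */
void toy_inherit(toy_thread_t *owner, const toy_thread_t *waiter) {
  assert(owner != NULL && waiter != NULL);
  if (waiter->current_prio > owner->current_prio) {
    owner->current_prio = waiter->current_prio;
  }
}

/* Called when 'owner' releases the mutex: it falls back to its base
   priority (valid in this model because only one mutex exists). */
void toy_restore(toy_thread_t *owner) {
  assert(owner != NULL);
  owner->current_prio = owner->base_prio;
}

/* Returns true if 'a' would be scheduled ahead of 'b'. */
bool toy_runs_before(const toy_thread_t *a, const toy_thread_t *b) {
  return a->current_prio > b->current_prio;
}
```

Take three threads with priority 1 (low, holding the mutex), 2 (medium), and 3 (high, waiting for the mutex). Normally the medium thread runs ahead of the low one and so indirectly delays the high one. After toy_inherit(&low, &high), the low thread runs at priority 3, ahead of the medium thread, until it releases the mutex and toy_restore(&low) brings it back to priority 1.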