A further reading about ChibiOS multi-threading

A detailed explanation of multithreading in ChibiOS/RT

Multithreading explained

In the article “A close look to ChibiOS demos for STM32” we have briefly explained how multithreading works but in this article we want to discuss more in detail the mechanisms behind ChibiOS/RT. This article is addressed to those developers already experienced with this RTOS.

A single thread application could be imagined like a straight line. Executing code we are travelling along that line and the beginning of our line is the application entry point. We can jump from a point of the line to an other, but we will always be on the same line.

In a multithreaded application, the code execution can split in more than a branch: this mean that our line could fork and execution proceeds long two or more separate branches. Each branch is a thread and our line forks when a thread creates a new thread. As example let us consider Fig.2.

Threads flow
An example of execution flow in a multi-thread application.

The application starts at point 0 and the execution proceed with a single thread from here and out named Yellow. When the execution reaches the point 1 Yellow calls a new thread, Blue, and the execution forks. From this moment the execution proceed long two separate lines until the point 2 when Yellow creates another thread: Azure. At point 3 is Blue which creates Red and we have four threads which are executed together until the point 4 when Red terminate returning on its caller and the three remaining threads proceed over.

Most MCUses have only one CPU with single core: this means that actually only one line at time can proceed. Anyway, in multithreaded application it seems like all the lines proceed forward simultaneously. This because a kernel assigns the resources alternately to each line, according to established rules, pushing each line forward. The operation of pausing a line saving its state and resuming an another is called context switch.

The adopted rules depends on the kernel. As example ChibiOS/RT is a real time preemptive kernel based on priorities capable also of preemptive round-robin and cooperative scheduling. The rule is simple:

among all the threads ready for execution, the one with the highest priority is the one being executed, no exceptions to this rule.

Priority multithreading

So that multi-threading could work, the execution of a thread must be continuously suspended and resumed granting resources to each thread.

A thread could suspend itself or be preempted by another thread:

  • A thread could be preempted: this means that a thread could be stopped to assign resources to another thread in any moment. Preemption happens when another thread with an highest priority becomes ready for the execution. In this case there is a context switch from the lower priority thread to the higher one.
  • Suspension happens when thread voluntarily releases resources calling certain functions.

As example if a thread named Thread1 calls the function

chThdSleepMilliseconds(200);

it is releasing resources entering in suspension status, returning ready after 200 milliseconds. This happens because the thread has configured a virtual timer for an alarm before to suspended it-self. When the alarm is triggered Thread1 becomes ready. If Thread1 is the one at highest priority among all the threads ready for the execution, it preempts the current thread. Some examples will clarify better.

Example 1: suspension and preemption

Lets consider the following code:

static THD_WORKING_AREA(waThread1, 128);
static THD_FUNCTION(Thread1, arg) {
  (void)arg;
  chRegSetThreadName("Thread1");
  while (true) {
    palTogglePad(GPIOA, GPIOA_LED_GREEN);
    chThdSleepMilliseconds(200);
  }
}
int main(void) {
  halInit();
  chSysInit();
  chThdCreateStatic(waThread1, sizeof(waThread1), NORMALPRIO + 1, Thread1, NULL);
  while (true) {
    palTogglePad(GPIOA, GPIOA_LED_RED);
    chThdSleepMilliseconds(375);
  }
}

In this case we have three threads. Ordering them by decreasing priority they are Thread1, main and idle. The last one is a thread always ready with the lowest priority which is executed only when other thread are not ready. As its name suggest, it is an empty thread which does nothing waiting for an event. Lets take a look to an hypothetical timing flow for the considered scenario. To improve readability, we have overestimate the time consumed by operations.

Example 1 threads timing flow
The timing flow of the example 1

In the previous figure threads are ordered by priority on the y-axis, on the x-axis there is time. The colored rectangles represent the current thread. A grey arrow represent that related thread is sleeping, otherwise green arrow represent time in which a thread is ready to be executed but waits other threads with higher priority. Now, we can certainly say that the timing flow is:

  1. The application enters in main(). When chSysInit() is executed main() becomes a thread named main: it runs until the creation of Thread1,
  2. When Thread1 has been created, both main and Thread1 are ready but the last one has higher priority, so main is preempted. Thread1 execute its code toggling the green LED then goes sleep for 200ms.
  3. Now main is resumed and enters into its loop toggling red LED then goes sleep for 375ms.
  4. Both main and Thread1 are suspended and idle is the only one to be ready then is executed.
  5. Thread1 becomes ready and preempts idle: it toggles the green LED then goes sleep for 200ms.
  6. and so on

We should also notice that:

  • Thread1 is never preempted,
  • If Thread1 never release the CPU ownership, other threads would be never executed.
  • The main could be preempted by Thread1 (1-2 and 7-8).
  • The most of time we are in idle thread (and this would be noticed more markedly if we hadn’t overestimate the time consumed by operations).

The reader should figure out that:

We could compute the CPU usage as the inverse of the running percentage of the idle thread.

Before to proceed we should spent few more words about priorities. In ChibiOS priorities are integer in a range which starts from LOWPRIO up to HIGHPRIO. The main thread is started at the NORMALPRIO priority level which is in the middle of the whole range. Quoting chibios.org:

When designing your application you should never think in terms of absolute priorities because the numeric range may change in future versions, much better reason in terms of relative priorities from the above symbolic levels, for example NORMALPRIO+5, NORMALPRIO-1, this is guaranteed to be portable.

Priority inversion and priority inheritance

Priority multithreading implies a known problem named priority inversion. In this scenario low priority thread (L) and an higher priority thread (H) uses a shared resource. In certain cases L preempt H “inverting” their relative priorities. The solution to this scenario is the priority inheritance: in this case L temporarily inherit priority from H. In this way resources are freed as soon as possible and H could be executed. The following example will clarify better.

Example 2: benefits achieved through priority inheritance

Lets consider the following code ignoring its correctness.

static THD_WORKING_AREA(waThread1, 128);
static THD_FUNCTION(Thread1, arg) {
  (void)arg;
  chRegSetThreadName("Thread1");
  while (true) {
    /* Somehow the SPI block requires 200ms */
    spiAcquireBus(&SPID1);
    spiStart(&SPID1, &spicfg);
    spiSelect(&SPID1);
    spiExchange(&SPID1, 200, txbuf, rxbuf);
    spiUnselect(&SPID1);
    spiReleaseBus(&SPID1);
    chThdSleepMilliseconds(200);
  }
}
static THD_WORKING_AREA(waThread2, 128);
static THD_FUNCTION(Thread2, arg) {
  (void)arg;
  chRegSetThreadName("Thread2");
  while (true) {
    /*
        ...
        Operation block requires (50ms)
        ...   
     */
    chThdSleepMilliseconds(100);
  }
}
static THD_WORKING_AREA(waThread3, 128);
static THD_FUNCTION(Thread3, arg) {
  (void)arg;
  chRegSetThreadName("Thread3");
  while (true) {
    /*
        ...
        Operation block requires (75ms)
        ...   
     */
    chThdSleepMilliseconds(75);
  }
}
static THD_WORKING_AREA(waThread4, 128);
static THD_FUNCTION(Thread4, arg) {
  (void)arg;
  chRegSetThreadName("Thread4");
  while (true) {
    /* Somehow the SPI block requires 150ms */
    spiAcquireBus(&SPID1);
    spiStart(&SPID1, &spicfg);
    spiSelect(&SPID1);
    spiExchange(&SPID1, 150, txbuf, rxbuf);
    spiUnselect(&SPID1);
    spiReleaseBus(&SPID1);
    chThdSleepMilliseconds(100);
  }
}
int main(void) {
  halInit();
  chSysInit();
  
  chThdCreateStatic(waThread2, sizeof(waThread2), NORMALPRIO - 2, Thread2, NULL);
  chThdCreateStatic(waThread3, sizeof(waThread3), NORMALPRIO - 3, Thread3, NULL);
  chThdCreateStatic(waThread4, sizeof(waThread4), NORMALPRIO - 4, Thread4, NULL);
  chThdSleepMilliseconds(250);
  chThdCreateStatic(waThread1, sizeof(waThread1), NORMALPRIO - 1, Thread1, NULL);
  while (true) {
    chThdSleepMilliseconds(5000);
  }
}

In this example we have six threads. We can figure out that:

  • Ordering thread by decreasing priority they are mainThread1Thread2, Thread3, Thread4 and idle.
  • Thread1 and Thread4 uses the same resources acquiring it (SPID1);
  • The main insert a delay between the creation of Thread2, Thread3, Thread4 and the creation of Thread1;
  • Thread1 loop requires 200ms for its loop activities plus 200ms of sleep;
  • Thread2 loop requires 50ms for its loop activities plus 100ms of sleep;
  • Thread3 loop requires 75ms for its loop activities plus 75ms of sleep;
  • Thread4 loop requires 150ms for its loop activities plus 100ms of sleep;
  • After creating threads main enter in a loop in which it does nothing.

If priority inheritance paradigm isn’t applied an hypothetical timing flow would be:

Example 2-1 threads timing flow
The timing flow of the example 2 without priority inheritance.

Note that even if Thread1 is ready it cannot run until Thread4 releases SPID1. Even more, during the whole time, Thread4 is preempted by Thread2 and Thread3 building up a considerable delay.

Because priority inversion, Thread1 has a priority lower than Thread2, Thread3 and Thread4.

If priority inheritance paradigm is applied the delay is considerably reduced:

Example 2-2 threads timing flow
Fig.5 – The timing flow of the example 2 with priority inheritance.

Note that, when Thread1 is ready Thread4 inherits its priority and preempt Thread3: now Thread2 and Thread3 cannot preempt Thread4 anymore. When Thread4 releases the shared resource, priorities are restored and Thread1 can run.

ChibiOS/RT mitigates the priority inversion problem adopting the priority inheritance paradigm.

Threads with same priority: round robin and cooperative scheduling

In ChibiOS/RT, the scheduling of threads having the same priority could happens in two ways:

  • Round-robin scheduling
  • Cooperative scheduling.

The preemptive round-robin is a scheduling method based on time-sharing: each threads takes an equal time slot, known as quantum, to perform its job. In ChibiOS this time slot is a parameter named CH_CFG_TIME_QUANTUM, expressed in system ticks and defined in chconf.h. When this value is greater than zero a thread is preempted after a quantum in order to give resources to other threads at same priority.

This preemption could be also disabled setting CH_CFG_TIME_QUANTUM to zero. In this case thread cannot be preempted by equal priority threads. The only way to make each thread run is the cooperative scheduling where each thread voluntarily yields to other threads through the api

chThdYield();

The following examples will clarify what we said.

Example 3: Preemptive round-robin

Lets consider the follow example when CH_CFG_TIME_QUANTUM is 500 and system tick period is 100us (which means to have a quantum of 50ms):

static THD_WORKING_AREA(waThread1, 128);
static THD_FUNCTION(Thread1, arg) {
  (void)arg;
  chRegSetThreadName("Thread1");
  while (true) {
    /*
        ...
        Operation block requires (100ms)
        ...   
     */
  }
}
static THD_WORKING_AREA(waThread2, 128);
static THD_FUNCTION(Thread2, arg) {
  (void)arg;
  chRegSetThreadName("Thread2");
  while (true) {
    /*
        ...
        Operation block requires (50ms)
        ...   
     */
  }
}
static THD_WORKING_AREA(waThread3, 128);
static THD_FUNCTION(Thread3, arg) {
  (void)arg;
  chRegSetThreadName("Thread3");
  while (true) {
    /*
        ...
        Operation block requires (150ms)
        ...   
     */
  }
}
int main(void) {
  halInit();
  chSysInit();
  chThdCreateStatic(waThread1, sizeof(waThread1), NORMALPRIO, Thread1, NULL);
  chThdCreateStatic(waThread2, sizeof(waThread2), NORMALPRIO, Thread2, NULL);
  chThdCreateStatic(waThread3, sizeof(waThread3), NORMALPRIO, Thread3, NULL);
 
  while (true) {
    /*
        ...
        Operation block requires (200ms)
        ...   
     */
  }
}

In this example we have five threads. Lets do some statement.

  • All threads have the same priority except idle which has a lower priority.
  • All the operation blocks never release the CPU

An hypothetical timing flow would be:

Example 3 threads timing flow
The timing flow of the example 3 when quantum is 50ms.

Looking to code and Fig.6 we can figure out that:

  • each thread is preempted after that quantum has elapsed;
  • the idle thread never runs since it has a lower priority;
  • every thread most likely is preempted in the middle of its operation block;

Note that an higher priority thread would preempt these threads anyway creating a mixed scheduling. In this case when the higher priority thread becomes ready round-robin would be paused and resumed when higher priority thread goes sleep.

Example 4: cooperative scheduling

Lets consider the follow example when CH_CFG_TIME_QUANTUM is 0:

static THD_WORKING_AREA(waThread1, 128);
static THD_FUNCTION(Thread1, arg) {
  (void)arg;
  chRegSetThreadName("Thread1");
  while (true) {
    /*
        ...
        Operation block requires (100ms)
        ...   
     */
    chThdYield();
  }
}
static THD_WORKING_AREA(waThread2, 128);
static THD_FUNCTION(Thread2, arg) {
  (void)arg;
  chRegSetThreadName("Thread2");
  while (true) {
    /*
        ...
        Operation block requires (50ms)
        ...   
     */
    chThdYield();
  }
}
static THD_WORKING_AREA(waThread3, 128);
static THD_FUNCTION(Thread3, arg) {
  (void)arg;
  chRegSetThreadName("Thread3");
  while (true) {
    /*
        ...
        Operation block requires (150ms)
        ...   
     */
    chThdYield();
  }
}
int main(void) {
  halInit();
  chSysInit();
  chThdCreateStatic(waThread1, sizeof(waThread1), NORMALPRIO, Thread1, NULL);
  chThdCreateStatic(waThread2, sizeof(waThread2), NORMALPRIO, Thread2, NULL);
  chThdCreateStatic(waThread3, sizeof(waThread3), NORMALPRIO, Thread3, NULL);
 
  while (true) {
    /*
        ...
        Operation block requires (200ms)
        ...   
     */
    chThdYield();
  }
}

In this example we have five threads. Lets do some statement.

  • All threads have the same priority except idle.
  • The operation blocks never release the CPU
  • The code is identical to the Example 3 except for the insertion of a yield statement after each operation block.

An hypothetical timing flow would be:

Example 4 threads timing flow
The timing flow of the example 4 with cooperative round-robin.

Lets do some statement:

  • unlike the previous case, there isn’t a preemption
  • each thread must yield or it will run forever
  • each threads execute the whole operation block before to yield
  • when a thread yield is queued and has to wait the execution of every other thread having equal priority.

Conclusions

At this point the reader should figure out that the scheduling in ChibiOS/RT is deterministic and follow specific rules. Note that, mixing the cases here presented it is possible to create new mixed scheduling scenarios. As example, it is possible to use a sleep instead of yield in cooperative scheduling or it is possible use yield also when during round-robin releasing resources before the quantum has elapsed.

Anyway, the reader should understand that scheduling strategy should be chosen wisely. More details about how and when use all these strategies will be the topic in the next article about multithreading.

Replies to A detailed explanation of multithreading in ChibiOS/RT

  • I have to regret this post.
    I’ve always been on tightrope, between chibios and nuttx.
    Now I have to stay on my fingertips on the tightrope.
    Excellent post, because it clarified details that I did not notice in the chibios.
    Thank you.

    • Thanks, and sorry for this late reply.
      I will write more about this argument in future. It is just matter of time. Graphs and examples require a lot of work and time. It takes something like 6/7 days to write a detailed article.

  • Hi,
    Your tutorial is great. I want to implement chibios in my project. Is it possible?

    My application is to

    1.Read the 2 sensors data (First sensor polling time is 1 minute and second sensor polling time is for every one sec)
    2.Store the sensor data in the SD card
    3.Transmit the data into FTP server using GSM module for every pre defined interval (minimum 5 mins).

    Can I run multitasking with these requirements.

    It would greatly help how to do that using chibios or any thing.

    • Sure, you can use multi-threading as well as all the utilities coming from ChibiOS\HAL

      For example, you could read data from sensors using a thread to do the read and store the data in a queue within a precise time window and an additional thread that serves the queue storing the SD card and pushes the data into the GSM module.

      There are a lot of things involved. I suggest you create the project step by step solving one problem at a time.
      For more info about generical questions related to ChibiOS, you would use the official forum of ChibiOS instead of PLAY Embedded, where comments are always welcome but the response rate is quite lower.

Leave a Reply