Demystifying C: Types

Introduction

In this article, we will give an exhaustive overview of types and the associated keywords that often generate confusion while working with C.

C is a rigorous language when it comes to variables. Through its syntax, it requires you to specify how large the associated memory area must be, in which memory section the variable will be stored, what its scope is, and which operations and functions can work on it. This is achieved by specifying the type of the variable when declaring it. Optionally, a type can be accompanied by specific keywords that influence the behavior of the compiler.

But let’s start from the basics.

Primitive types

C offers four built-in data types, also known as primitives:

  • char, the smallest addressable type, with a size of 1 byte. While a char is de facto an integer, it is typically used to represent a character. This is done by assigning each character a number from 0 to 127 (see ASCII table)
  • int, an integer number type
  • float, a real floating-point type, usually implemented as an IEEE 754 single-precision floating-point type. In a nutshell, a float uses scientific notation to cover a very wide range (see Floating-point arithmetic)
  • double, a real floating-point type, usually implemented as an IEEE 754 double-precision floating-point type. It is typically twice the size of a float, hence it has a wider range and a higher precision.

As an example, declaring a variable of type char and assigning it the value 'a' looks like

char myvar = 'a';
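As a small illustration of a char being an integer under the hood, the sketch below prints the same variable both as a character and as a number. The helper names char_code and show_char are made up for this example, and the value 97 assumes an ASCII-compatible character set.

```c
#include <stdio.h>

/* Hypothetical helper: returns the numeric code stored in a char. */
int char_code(char c) {
  return (int)c;
}

/* Prints the same variable as a character and as a number. */
void show_char(void) {
  char myvar = 'a';
  printf("%c has code %d\n", myvar, char_code(myvar));
  /* Being an integer, a char also supports arithmetic. */
  printf("the next character is %c\n", myvar + 1);
}
```

Calling show_char() from main should report the code 97 for 'a' on an ASCII system.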

As mentioned, the type specifies how many bits need to be reserved in memory when a variable is declared.

Type   | Size                                                              | Range
char   | 8 bit                                                             | depends on compiler options
int    | depends on the compiler and architecture, influenced by modifiers | depends on the size
float  | 32 bit                                                            | 1.2E-38 to 3.4E+38
double | 64 bit                                                            | 2.3E-308 to 1.7E+308
Size and range of Primitives in C
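Since some of these figures depend on the toolchain, they can be checked directly on the platform in use. The following is a minimal sketch, assuming a C99 compiler; the function name print_primitive_sizes is made up for this example, while sizeof and the float.h macros are standard.

```c
#include <float.h>
#include <stdio.h>

/* Prints the size of each primitive and the range of the
   floating-point types as seen by the current compiler. */
void print_primitive_sizes(void) {
  printf("char:   %zu byte(s)\n", sizeof(char));
  printf("int:    %zu byte(s)\n", sizeof(int));
  printf("float:  %zu byte(s), %e to %e\n", sizeof(float), FLT_MIN, FLT_MAX);
  printf("double: %zu byte(s), %e to %e\n", sizeof(double), DBL_MIN, DBL_MAX);
}
```

The only value guaranteed to be the same everywhere is sizeof(char), which is 1 by definition.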

Type modifiers

The primitives can be combined with four special C keywords, known as modifiers, that directly influence the size and range of the variable:

  • signed/unsigned, to specify whether the type carries a sign. This modifier directly impacts the range too.
  • long/short, mainly used with integers to change their size.

The modifier is added in the declaration of a variable, typically in front of the type. While the order of the keywords is generally not significant to the compiler, only certain combinations make sense. The following combinations are possible:

/* Char with no explicit signedness. */
char myvar;

/* Char with sign. */
signed char myvar;

/* Char without sign. */
unsigned char myvar;

/* Short integer with sign. */
short myvar;
short int myvar;
signed short myvar;
signed short int myvar;

/* Short integer without sign. */
unsigned short myvar;
unsigned short int myvar;

/* Integer with sign. */
int myvar;
signed myvar;
signed int myvar;

/* Integer without sign. */
unsigned myvar;
unsigned int myvar;

/* Long integer with sign. */
long myvar;
long int myvar;
signed long myvar;
signed long int myvar;

/* Long integer without sign. */
unsigned long myvar;
unsigned long int myvar;

/* Very long integer with sign. */
long long myvar;
long long int myvar;
signed long long myvar;
signed long long int myvar;

/* Very long integer without sign. */
unsigned long long myvar;
unsigned long long int myvar;

The following table explains how the size and range are influenced by these modifiers.

Note that if no sign modifier is applied to an integer, the compiler assumes signed as default. Furthermore, the default signedness of a char depends on some compiler options. Finally, close attention needs to be paid to int and long int whose size and range depend on the compiler, architecture, or OS in use.
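Because these sizes and ranges are platform dependent, a practical way to learn them is to ask the compiler. The sketch below relies on the standard limits.h macros; the function names plain_char_is_signed and print_integer_ranges are hypothetical and only serve the example.

```c
#include <limits.h>
#include <stdio.h>

/* Returns 1 if a plain char is signed with the current compiler
   options, 0 otherwise. */
int plain_char_is_signed(void) {
  return CHAR_MIN < 0;
}

/* Prints size and range of the integer types on this platform. */
void print_integer_ranges(void) {
  printf("plain char is %s\n", plain_char_is_signed() ? "signed" : "unsigned");
  printf("short:              %zu bytes, %d to %d\n",
         sizeof(short), SHRT_MIN, SHRT_MAX);
  printf("unsigned short:     %zu bytes, 0 to %u\n",
         sizeof(unsigned short), (unsigned)USHRT_MAX);
  printf("int:                %zu bytes, %d to %d\n",
         sizeof(int), INT_MIN, INT_MAX);
  printf("unsigned int:       %zu bytes, 0 to %u\n",
         sizeof(unsigned), UINT_MAX);
  printf("long:               %zu bytes, %ld to %ld\n",
         sizeof(long), LONG_MIN, LONG_MAX);
  printf("unsigned long:      %zu bytes, 0 to %lu\n",
         sizeof(unsigned long), ULONG_MAX);
  printf("long long:          %zu bytes, %lld to %lld\n",
         sizeof(long long), LLONG_MIN, LLONG_MAX);
  printf("unsigned long long: %zu bytes, 0 to %llu\n",
         sizeof(unsigned long long), ULLONG_MAX);
}
```

The output differs between, for instance, a 32-bit microcontroller and a 64-bit desktop; the only ordering that can be relied upon is that short is not larger than int, int is not larger than long, and long is not larger than long long.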

Portable Fixed-Width Integer

This problem of loosely defined integer sizes is well known and was later addressed by the Portable Fixed-Width Integer types introduced with C99 and made available through the stdint.h header

Type     | Description                 | Size    | Range
int8_t   | 8-bit integer with sign     | 1 byte  | -128 to 127
uint8_t  | 8-bit integer without sign  | 1 byte  | 0 to 255
int16_t  | 16-bit integer with sign    | 2 bytes | -2^15 to 2^15-1
uint16_t | 16-bit integer without sign | 2 bytes | 0 to 2^16-1
int32_t  | 32-bit integer with sign    | 4 bytes | -2^31 to 2^31-1
uint32_t | 32-bit integer without sign | 4 bytes | 0 to 2^32-1
int64_t  | 64-bit integer with sign    | 8 bytes | -2^63 to 2^63-1
uint64_t | 64-bit integer without sign | 8 bytes | 0 to 2^64-1
Size and range of C99 Portable Fixed-Width Integer

For this reason, using these types is recommended when you want to keep strict control of your application as typically happens when developing firmware on embedded systems.
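A short sketch of these types in use follows. It assumes a platform that provides the exact-width types (virtually all mainstream ones do); add_u8 and print_fixed_width are hypothetical names, while the PRI macros come from the standard inttypes.h header.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: adds two 8-bit unsigned values.
   The result wraps around modulo 256. */
uint8_t add_u8(uint8_t a, uint8_t b) {
  return (uint8_t)(a + b);
}

/* Prints a few fixed-width values using the matching format macros. */
void print_fixed_width(void) {
  uint32_t big = UINT32_MAX;
  int16_t small = INT16_MIN;
  printf("uint32_t max: %" PRIu32 "\n", big);
  printf("int16_t min:  %" PRId16 "\n", small);
  printf("250 + 10 as uint8_t: %" PRIu8 "\n", add_u8(250, 10));
}
```

Here add_u8(250, 10) yields 4, because 260 does not fit in 8 bits. With a fixed-width type this wrap-around happens at the same point on every platform.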

Storage classes

The storage classes are C keywords that can be used to determine the lifetime, visibility and memory location of a variable. There are 4 classes: auto, static, register and extern. Like the type modifiers, the storage class is added during the declaration of a variable, typically in front of the type.

auto

This is the default storage class for variables declared inside a block, and it limits the visibility of a variable to the block where it is declared. A block is delimited by braces {}. Examples of blocks are

/* Simple block. */
{
  /* unsigned int a = 0; would have the same effect. */
  auto unsigned int a = 0;
}

/* Nested block. */
{
  {
    int a = 0;
  }
}

/* Still a block. The block is executed when main is called. */
int main() {
  int a = 0;
}

/* Still a block. The block is executed when condition is true. */ 
if(condition) {
  int a = 0;
}

/* Still a block. The block is executed N times. */ 
for(i = 0; i < N; i++) {
  int a = 0;
}

When a function is called, some space in the Stack section (RAM) is automatically allocated: it is called a Stack Frame and it contains the return address of the function, the function arguments and all the automatic variables declared in the function. The Stack Frame is deallocated when the function returns. This implies that the amount of Stack in use grows and shrinks over the program lifetime. For this reason, the memory area of a Stack Frame can contain garbage values left by a previously executed function, which affects the initial value of automatic variables. With this in mind, we can state that:

  • an automatic variable is a local variable that is visible only in the block in which it is declared
  • an automatic variable cannot be declared outside of a function
  • its initial value can be garbage so it is good practice to initialize it before using it
  • the lifetime of an automatic variable is tied to the block in which it is declared

To conclude, let’s look at some examples.

Not an automatic variable

#include <stdio.h>

auto int myvar = 3;

int main() {

  printf("%d", myvar);
  return 0;
}

This code will throw an error at compile time such as

main.c:3:10: error: file-scope declaration of ‘myvar’ specifies ‘auto’
    3 | auto int myvar = 3;

This is because the variable myvar cannot be an automatic variable as it is outside of a function. A variable declared outside of any function without a storage class keyword is a global variable in C. Its lifetime is the entire application’s lifetime. It is stored in the RAM, more precisely in the .data section if it is initialized or in the .bss section if it is uninitialized.

Uninitialized automatic variable

#include <stdio.h>

int main() {
  int myvar;
  printf("%d", myvar);

  return 0;
}

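This code may print a garbage value, as the Stack Frame is not necessarily initialized to 0. Formally, reading an uninitialized automatic variable yields an indeterminate value, so the result cannot be relied upon.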

Automatic variable scope

#include <stdio.h>

int main() {
  {
    int myvar;
  }
  printf("%d", myvar);

  return 0;
}

This code will return an error on the printf line at compile time as the variable myvar is not accessible outside of its block.

#include <stdio.h>

void myfunction(void) {
  int myvar = 0;
  printf("%d", myvar);
}

int main() {

  myfunction();

  myvar += 10;
  printf("%d", myvar);
  return 0;
}

This code will also fail at compile time, as the variable myvar exists only in myfunction and not in main.

Lexical scoping

#include <stdio.h>
 
/* 1. myvar declared outside of any block. */
int myvar = 50;

void myfunc(void) {
  printf("% d\r\n", myvar);
}

int main() {
  /* 2. myvar declared inside main. */
  int myvar = 100;
  {
    /* 3. myvar declared inside a sub-block in main. */
    int myvar = 200;
    printf("% d\r\n", myvar);
    myvar += 50;
    printf("% d\r\n", myvar);
  }
  printf("% d\r\n", myvar);
  
  myfunc();
  return 0;
}

This code will compile and print out

 200
 250
 100
 50

It may appear weird, as the three declared variables have the same name. This is actually allowed, although strongly discouraged because it hurts code readability and may be a source of confusion.

In C, variables with the same name that are declared in different blocks of code are treated as separate variables. The C programming language follows the rule of lexical scoping, or static scoping, which means that a name always refers to the declaration in the innermost enclosing block: a local variable takes precedence over a global variable with the same name.

This means that, within a block of code, if a local variable with the same name as a global variable is declared, the local variable will be used instead of the global variable. If no local variable with that name is declared in that block of code or in an enclosing one, then the global variable will be used.

In this example, we have three variables with the same name:

  1. myvar declared outside of any block with a value of 50. This is a global variable.
  2. myvar declared inside the main function with the value 100. This is a local variable to the main function.
  3. myvar declared inside a sub-block within the main function with the value 200. This is a local variable to the sub-block.

Within the sub-block, the local variable with the value 200 takes precedence over the other two variables, so it is used. Outside of the sub-block, the local variable with the value 100 takes precedence over the global variable with the value 50, so it is used. When myfunc is called, the global variable with the value 50 is used because it is the only variable with that name in the scope of myfunc.

static

This storage class instructs the compiler to keep a local variable in existence during the entire lifetime of the program. This means the variable continues to exist and keeps its value even after the function returns.

Like an automatic variable, a static one is visible only in the block where it is declared. If it is declared outside of any block, it is visible only in the file where it is declared. It is stored in the RAM, more precisely in the .data section if it is initialized or in the .bss section if it is uninitialized.

Let’s look at some examples

An automatic variable vs a static variable

#include <stdio.h>

void myfunc(void) {
  int auto_var = 3;
  static int static_var = 3;
  printf("The auto_var is %d, static_var is %d\r\n", auto_var, static_var);
  auto_var++;
  static_var++;
}

int main() {
  int count = 0;
  for(count = 0; count < 5; count++) {
      printf("Iteration %d: ", count);
      myfunc();
  }
  return 0;
}

This will generate the following output

Iteration 0: The auto_var is 3, static_var is 3
Iteration 1: The auto_var is 3, static_var is 4
Iteration 2: The auto_var is 3, static_var is 5
Iteration 3: The auto_var is 3, static_var is 6
Iteration 4: The auto_var is 3, static_var is 7

While auto_var is destroyed every time myfunc returns and recreated at the next call, static_var survives and keeps its value. It is worth noticing that the initialization of the static variable happens only once.

Visibility within the file

#include <stdio.h>

static int myvar = 0;

void myfunc() {
  myvar++;
}

int main() {

  int count = 0;
  for(count = 0; count < 5; count++) {
      printf("%d\r\n", myvar);
      myfunc();
  }
  return 0;
}

In this example, myvar is visible both from myfunc and main. The result will be

0
1
2
3
4

register

This storage class asks the compiler to keep a variable in a register of the CPU. It is only a hint, and the compiler is free to ignore it. This has some implications:

  • The size of the variable is tied to the size of a register
  • It is not possible to use the & operator on the variable, as a register has no memory address, hence it is not possible to reference the variable through a pointer
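As a minimal sketch of the keyword in use, the function below keeps its loop index in a register-qualified variable. The name sum_array is made up for this example, and whether the index really ends up in a CPU register is left to the compiler.

```c
#include <stddef.h>

/* Hypothetical helper: sums an array using a register-qualified index. */
long sum_array(const int *values, size_t count) {
  register size_t i;
  long total = 0;
  for (i = 0; i < count; i++) {
    total += values[i];
  }
  /* size_t *p = &i;  would not compile: the address of a register
     variable cannot be taken. */
  return total;
}
```

Modern optimizing compilers usually perform register allocation on their own, so the main observable effect of the keyword is the restriction on the & operator.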

extern

This storage class instructs the compiler to access a global variable defined elsewhere, typically in some other source file. Note that the extern keyword declares the variable but does not define it: in other words, no memory is allocated when using extern, and if the variable is not defined somewhere the linker reports an error.

An example will shed light on the matter

Typical extern use case

In this example, we have two source files for our project.

/* main.c */

#include <stdio.h>

extern int myvar;

int main() {
  printf("myvar is %d", myvar);

  return 0;
}

/* mylib.c */

int myvar = 10;

Building and running this project will print

myvar is 10

Note that myvar is a global variable made accessible using the extern storage class. Note also that the variable has been defined and initialized in mylib.c and only declared in main.c, where it has been used.
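The difference between declaring and defining can also be observed inside a single file. The sketch below is only illustrative: in a real project the definition would normally live in another .c file, and the names shared_counter and bump_counter are hypothetical.

```c
/* Declaration only: no memory is reserved here. */
extern int shared_counter;

/* Uses the variable before the compiler has seen its definition.
   The declaration above is enough to compile this function. */
int bump_counter(void) {
  shared_counter++;
  return shared_counter;
}

/* Definition: this is where memory is reserved and initialized.
   In a real project this line would typically be in another file. */
int shared_counter = 10;
```

Removing the last line would still let the file compile, but the build would fail at link time because shared_counter would be declared and never defined.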

Summary table

Storage class | Storage       | Default value | Scope  | Lifetime
auto          | Stack frame   | garbage       | block  | end of block
static        | .data or .bss | zero          | block  | end of program
register      | CPU register  | garbage       | block  | end of block
extern        | .data or .bss | zero          | global | end of program
Data storage classes in C summary
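The zero default value of variables with static storage can be verified with a few lines. This is a sketch with hypothetical names (global_var, file_static, read_local_static); automatic variables are deliberately left out, since reading them uninitialized gives no reliable result.

```c
#include <stdio.h>

int global_var;         /* file scope, visible from other files: starts at 0 */
static int file_static; /* file scope, visible in this file only: starts at 0 */

/* Returns the value of a block-scope static that was never
   explicitly initialized. */
int read_local_static(void) {
  static int local_static; /* block scope, static storage: starts at 0 */
  return local_static;
}

/* Prints the three default values. */
void print_defaults(void) {
  printf("%d %d %d\n", global_var, file_static, read_local_static());
}
```

Calling print_defaults() before anything modifies these variables prints three zeros.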

Qualifiers

To complete the picture, C offers type qualifiers that modify the properties of a variable and allow the compiler to perform optimizations correctly. The two most commonly used are:

  • const
  • volatile

Like the type modifiers and the storage classes, qualifiers are added in the declaration of a variable, typically in front of the type.

const

This qualifier tells the compiler that a variable will remain constant during the entire lifetime of the program. This opens the door to optimizations such as placing the variable in the read-only data section of the memory (often named .rodata), which is typically mapped onto the Flash memory to spare RAM space.

/* This is a global constant variable. */
const int myvar_global = 10;

/* This is a static constant variable. It needs a different name,
   as two file-scope declarations of the same name would clash. */
static const int myvar_static = 10;

int myfunc() {
  /* This is an automatic constant variable. */
  const int myvar = 10;
  return myvar;
}
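A typical use of const is a lookup table that is never modified after initialization. The sketch below uses made-up names (powers_of_ten, power_of_ten); whether the table actually ends up in read-only memory depends on the toolchain and on the linker configuration.

```c
#include <stddef.h>

/* Hypothetical lookup table. Being const and initialized at file
   scope, it is a candidate for placement in read-only memory. */
static const int powers_of_ten[] = {1, 10, 100, 1000};

/* Returns 10^exponent for exponents 0 to 3, or -1 if out of range. */
int power_of_ten(size_t exponent) {
  const size_t entries = sizeof(powers_of_ten) / sizeof(powers_of_ten[0]);
  if (exponent >= entries) {
    return -1;
  }
  /* powers_of_ten[exponent] = 0;  would not compile: the data is const. */
  return powers_of_ten[exponent];
}
```

Any attempt to assign to an element of the table is rejected at compile time, which is the guarantee that const provides to the rest of the code.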

volatile

This qualifier tells the compiler that the value of the variable can change at any time because of something unrelated to the C code. The implications are crucial: the compiler must not make any assumption about that variable. To shed some light, follow me in this last digression.

Let us say that a 32-bit secret register is located at a specific memory address (e.g. 0x40000010) and allows us to unlock some special features of our machine. To unlock these features we need to write a secret sequence to it (e.g. 0x10105555, 0x11105522, 0xFF505050). Keen to unlock the full potential of our machine, we will promptly write

#include <stdint.h>

/* Pointer to an unsigned 32-bit. Initializing this to the memory
   address of the register, I may use the pointer to write into
   the register. The cast turns the integer address into a pointer. */
uint32_t * mypointer = (uint32_t *)0x40000010;

int main() {

  /* Writing the sequence. */
  *mypointer = 0x10105555;
  *mypointer = 0x11105522;
  *mypointer = 0xFF505050;
}

On paper, that does exactly what we said. Unfortunately, if the compiler optimizations are enabled, this may not work. The compiler sees that we are writing three values to the same memory location without any operation in between, so it may drop the first two assignments and perform only the last one.

This is where volatile comes into play. The following code solves the issue

#include <stdint.h>

/* Pointer to a volatile unsigned 32-bit. */
volatile uint32_t * mypointer = (volatile uint32_t *)0x40000010;

int main() {

  /* Writing the sequence. */
  *mypointer = 0x10105555;
  *mypointer = 0x11105522;
  *mypointer = 0xFF505050;
}

With the volatile keyword added, the compiler stops making assumptions about the value stored at that specific memory location. Consequently, all three assignments will be performed.

The same reasoning applies to three typical real-life scenarios:

  • Memory-mapped peripheral registers
  • Global variables modified by an Interrupt Service Routine
  • Global variables shared within a multi-threaded application (keeping in mind that volatile alone provides neither atomicity nor synchronization)
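The Interrupt Service Routine scenario can be approximated on a desktop machine with a signal handler, which also runs asynchronously with respect to the main flow. This is only an analogy, not real firmware code: event_pending, on_signal and wait_for_event are hypothetical names, while signal, raise and sig_atomic_t come from the standard signal.h header.

```c
#include <signal.h>

/* Flag shared between the handler and the main flow. */
static volatile sig_atomic_t event_pending = 0;

/* Plays the role of the Interrupt Service Routine. */
static void on_signal(int signum) {
  (void)signum;
  event_pending = 1;
}

/* Installs the handler, triggers it and waits for the flag.
   Returns 1 once the flag has been observed, 0 on setup failure. */
int wait_for_event(void) {
  event_pending = 0;
  if (signal(SIGINT, on_signal) == SIG_ERR) {
    return 0;
  }
  raise(SIGINT);
  while (!event_pending) {
    /* Without volatile, an optimizer could read the flag only once
       and turn this into an endless loop. */
  }
  signal(SIGINT, SIG_DFL);
  return 1;
}
```

On a microcontroller the handler would be a real interrupt routine and the loop would usually live in main, but the role of volatile on the shared flag is the same.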
