
Demystifying C: Types
Introduction
In this article, we will give an exhaustive overview of types and the associated keywords that often generate confusion while working with C.
C is a rigorous language when it comes to variables. Its syntax requires you to specify how large the associated memory area must be, in which memory section the variable is stored, what its scope is, and which operations and functions can act on it. Most of this is expressed by the type given when declaring the variable. Optionally, the type can be accompanied by specific keywords that influence the behavior of the compiler.
But let’s start from the basics.
Primitive types
C offers four built-in data types, also known as primitive types:
- char, the smallest addressable type with a size of 1 Byte. While a char is de-facto an integer number, it is used to represent a character. This is done by assigning each character to a number from 0 to 127 (see ASCII table)
- int, an integer number type
- float, real floating-point type, usually implemented as an IEEE 754 single-precision floating-point type. In a nutshell, a float uses scientific notation to cover a very wide range (see Floating-point arithmetic)
- double, real floating-point type, usually implemented as an IEEE 754 double-precision floating-point type. It has double the size of a float, hence a wider range and a higher precision.
As an example, declaring a variable of type char and assigning it the value 'a' looks like
char myvar = 'a';
As said, the type specifies how many bits need to be reserved in memory and allocated when a variable is declared.
Type | Size | Range |
---|---|---|
char | 8 bit | depends on compiler options |
int | depends on the compiler and architecture, influenced by modifiers | depends |
float | 32 bit | 1.2E-38 to 3.4E+38 |
double | 64 bit | 2.3E-308 to 1.7E+308 |
Type modifiers
The primitives can be combined with four special C keywords known as modifiers, which directly influence the size and range of the variable:
- signed/unsigned, to specify whether the type can hold negative values. This modifier directly impacts the range too.
- long/short, mainly used with integers to change their size.
The modifier is added during the declaration of a variable, typically in front of the type. While compilers generally don't care about the order of these keywords, only certain combinations make sense. The following combinations are possible:
    /* Char with no explicit signedness. */
    char myvar;

    /* Char with sign. */
    signed char myvar;

    /* Char without sign. */
    unsigned char myvar;

    /* Short integer with sign. */
    short myvar;
    short int myvar;
    signed short myvar;
    signed short int myvar;

    /* Short integer without sign. */
    unsigned short myvar;
    unsigned short int myvar;

    /* Integer with sign. */
    int myvar;
    signed myvar;
    signed int myvar;

    /* Integer without sign. */
    unsigned myvar;
    unsigned int myvar;

    /* Long integer with sign. */
    long myvar;
    long int myvar;
    signed long myvar;
    signed long int myvar;

    /* Long integer without sign. */
    unsigned long myvar;
    unsigned long int myvar;

    /* Very long integer with sign. */
    long long myvar;
    long long int myvar;
    signed long long myvar;
    signed long long int myvar;

    /* Very long integer without sign. */
    unsigned long long myvar;
    unsigned long long int myvar;
The following table explains how the size and range are influenced by these modifiers.
Note that if no sign modifier is applied to an integer, the compiler assumes signed as default. Furthermore, the default signedness of a char depends on some compiler options. Finally, close attention needs to be paid to int and long int whose size and range depend on the compiler, architecture, or OS in use.
Portable Fixed-Width Integer
This problem of loosely defined integer sizes is well known and was later addressed by the portable fixed-width integer types introduced with C99 in the header stdint.h:
Type | Description | Size | Range |
---|---|---|---|
int8_t | 8-bit integer with sign | 1 byte | -128 to 127 |
uint8_t | 8-bit integer without sign | 1 byte | 0 to 255 |
int16_t | 16-bit integer with sign | 2 bytes | -2^15 to 2^15-1 |
uint16_t | 16-bit integer without sign | 2 bytes | 0 to 2^16-1 |
int32_t | 32-bit integer with sign | 4 bytes | -2^31 to 2^31-1 |
uint32_t | 32-bit integer without sign | 4 bytes | 0 to 2^32-1 |
int64_t | 64-bit integer with sign | 8 bytes | -2^63 to 2^63-1 |
uint64_t | 64-bit integer without sign | 8 bytes | 0 to 2^64-1 |
For this reason, using these types is recommended when you want to keep strict control of your application as typically happens when developing firmware on embedded systems.
Storage classes
The storage classes are C keywords that can be used to determine the lifetime, visibility and memory location of a variable. There are 4 classes: auto, static, register and extern. Like the type modifiers, the storage class is added during the declaration of a variable, typically in front of the type.
auto
This is the default storage class for variables declared inside a block, and it limits the visibility of a variable to the block where the variable is declared. A block is delimited by braces {}. Examples of blocks are
    /* Simple block. */
    {
        /* unsigned int a = 0; would have the same effect. */
        auto unsigned int a = 0;
    }

    /* Nested block. */
    {
        {
            int a = 0;
        }
    }

    /* Still a block. The block is executed when main is called. */
    int main() {
        int a = 0;
    }

    /* Still a block. The block is executed when condition is true. */
    if(condition) {
        int a = 0;
    }

    /* Still a block. The block is executed N times. */
    for(i = 0; i < N; i++) {
        int a = 0;
    }
When a function is called, some space in the stack section (RAM) is automatically allocated. This space is called a stack frame and typically contains the return address of the function, the function arguments and all the automatic variables declared in the function. The stack frame is deallocated when the function returns, so the amount of stack in use grows and shrinks over the program lifetime. As a consequence, the memory area of a new stack frame may still hold garbage values left by a previously executed function, and these become the initial values of uninitialized automatic variables. With this in mind, we can state that:
- an automatic variable is a local variable that is visible only in the block in which it is declared
- an automatic variable cannot be declared outside of a function
- its initial value can be garbage so it is good practice to initialize it before using it
- the lifetime of an automatic variable is tied to the block in which it is declared
To conclude, let's look at some examples.
Not an automatic variable
    #include <stdio.h>

    auto int myvar = 3;

    int main() {
        printf("%d", myvar);
        return 0;
    }
This code will throw an error at compile time such as
    main.c:3:10: error: file-scope declaration of ‘myvar’ specifies ‘auto’
        3 | auto int myvar = 3;

This is because the variable myvar cannot be an automatic variable, as it is declared outside of a function. A variable declared outside of any function without a storage class keyword is a global variable in C. Its lifetime is the entire application's lifetime. It is stored in RAM, more precisely in the .data section if it is initialized or in the .bss section if it is uninitialized.
Uninitialized automatic variable
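    #include <stdio.h>

    int main() {
        int myvar;
        printf("%d", myvar);
        return 0;
    }

This code may print a garbage value, as the stack frame is not necessarily initialized to 0. Formally, an uninitialized automatic variable holds an indeterminate value, so the result must not be relied upon.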
Automatic variable scope
    #include <stdio.h>

    int main() {
        {
            int myvar;
        }
        printf("%d", myvar);
        return 0;
    }

This code will return an error on the printf line at compile time, as the variable myvar is not accessible outside of its block.
    #include <stdio.h>

    void myfunction(void) {
        int myvar = 0;
        printf("%d", myvar);
    }

    int main() {
        myfunction();
        myvar += 10;
        printf("%d", myvar);
        return 0;
    }

This code will also return an error at compile time, as the variable myvar exists only in myfunction and not in main.
Lexical scoping
    #include <stdio.h>

    /* 1. myvar declared outside of any block. */
    int myvar = 50;

    void myfunc(void) {
        printf("%d\r\n", myvar);
    }

    int main() {
        /* 2. myvar declared inside main. */
        int myvar = 100;

        {
            /* 3. myvar declared inside a sub-block in main. */
            int myvar = 200;
            printf("%d\r\n", myvar);
            myvar += 50;
            printf("%d\r\n", myvar);
        }

        printf("%d\r\n", myvar);
        myfunc();
        return 0;
    }
This code will compile and print out
    200
    250
    100
    50

It may appear weird, as the three declared variables have the same name. This can actually be done, although it is strongly discouraged because it hurts code readability and may be a source of confusion.
In C, variables with the same name that are declared in different blocks of code are treated as separate variables. C follows the rule of lexical scoping (or static scoping): a name refers to the declaration in the innermost enclosing block, so a local variable takes precedence over a global variable with the same name.
This means that, within a block of code, if a local variable with the same name as a global variable is declared, the local variable is used instead of the global one. If no local variable with that name is declared in the block or in an enclosing block, the global variable is used.
In this example, we have three variables with the same name:
- myvar declared outside of any block with a value of 50. This is a global variable.
- myvar declared inside the main function with the value 100. This is a local variable to the main function.
- myvar declared inside a sub-block within the main function with the value 200. This is a local variable to the sub-block.
Within the sub-block, the local variable with the value 200 takes precedence over the other two variables, so it is used. Outside of the sub-block, the local variable with the value 100 takes precedence over the global variable with the value 50, so it is used. When myfunc is called, the global variable with the value 50 is used because it is the only variable with that name in the scope of myfunc.
static
This storage class instructs the compiler to keep a local variable in existence during the entire lifetime of the program. This means the variable continues to exist and keeps its value even when the function is terminated.
Like an automatic variable, a static one is visible only in the block where it is declared. If it is declared outside of any block, it is visible only in the file where it is declared. It is stored in RAM, more precisely in the .data section if it is initialized or in the .bss section if it is uninitialized.
Let's look at some examples.
An automatic variable vs a static variable
    #include <stdio.h>

    void myfunc(void) {
        int auto_var = 3;
        static int static_var = 3;
        printf("The auto_var is %d, static_var is %d\r\n", auto_var, static_var);
        auto_var++;
        static_var++;
    }

    int main() {
        int count = 0;
        for(count = 0; count < 5; count++) {
            printf("Iteration %d: ", count);
            myfunc();
        }
        return 0;
    }

This generates the following output

    Iteration 0: The auto_var is 3, static_var is 3
    Iteration 1: The auto_var is 3, static_var is 4
    Iteration 2: The auto_var is 3, static_var is 5
    Iteration 3: The auto_var is 3, static_var is 6
    Iteration 4: The auto_var is 3, static_var is 7

While auto_var is destroyed every time myfunc returns and recreated on the next call, static_var survives and keeps its value. It is worth noticing that the initialization of the static variable happens only once.
Visibility within the file

    #include <stdio.h>

    static int myvar = 0;

    void myfunc() {
        myvar++;
    }

    int main() {
        int count = 0;
        for(count = 0; count < 5; count++) {
            printf("%d\r\n", myvar);
            myfunc();
        }
        return 0;
    }

In this example, myvar is visible both from myfunc and main. The result will be

    0
    1
    2
    3
    4
register
This storage class asks the compiler to keep a variable in a register of the CPU. It is only a hint, and the compiler is free to ignore it. This has some implications:
- The size of the variable is tied to the size of a register
- It is not possible to use the & operator on the variable: a register has no memory address, so the variable cannot be referenced through a pointer
extern
This storage class tells the compiler that a global variable is defined elsewhere, typically in another source file. Note that the extern keyword declares the variable but does not define it: in other words, no memory is allocated when using extern, and if the variable is not defined anywhere, an error is reported at link time.
Some examples will shed light on the matter
Typical extern use case
In this example, we have two source files for our project.
    /* main.c */
    #include <stdio.h>

    extern int myvar;

    int main() {
        printf("myvar is %d", myvar);
        return 0;
    }

    /* mylib.c */
    int myvar = 10;

Building and running this project will print
myvar is 10
Note that myvar is a global variable made accessible using the extern storage class. Note also that the variable is defined and initialized in mylib.c and only declared in main.c, where it is used.
Summary table
Storage class | Storage | Default value | Scope | Lifetime |
---|---|---|---|---|
auto | Stack frame | garbage | block | end of block |
static | .data or .bss | zero | block | end of program |
register | CPU register | garbage | block | end of block |
extern | .data or .bss | zero | global | end of program |
Qualifiers
To complete the picture, C offers type qualifiers that modify the properties of a variable and allow the compiler to perform optimizations correctly. The two most common ones are:
- const
- volatile
Like the type modifiers and the storage classes, qualifiers are added during the declaration of a variable, typically in front of the type.
const
This qualifier tells the compiler that a variable will stay constant during the entire lifetime of the program. This opens the door to optimizations such as placing the variable in a read-only section of the memory (for example .rodata), which is typically mapped onto the Flash memory to spare RAM space.
    /* This is a global constant variable. */
    const int myglobal = 10;

    /* This is a static constant variable. */
    static const int mystatic = 10;

    int myfunc() {
        /* This is an automatic constant variable. */
        const int myvar = 10;
        return myvar;
    }
volatile
This qualifier tells the compiler that the value of the variable can change at any time because of something outside the C code. The implication is crucial: the compiler must not make any assumption about that variable. To shed some light on this, follow me in this last digression.
Let us say that at a specific memory address (e.g. 0x40000010) there is a 32-bit secret register that allows us to unlock some special features of our machine. To unlock these features we need to write a secret sequence to it (e.g. 0x10105555, 0x11105522, 0xFF505050). Keen to unlock the full potential of our machine, we promptly write
    #include <stdint.h>

    /* Pointer to an unsigned 32-bit. Initializing this to the memory address
       of the register, I may use the pointer to write into the register. */
    uint32_t * mypointer = (uint32_t *)0x40000010;

    int main() {
        /* Writing the sequence. */
        *mypointer = 0x10105555;
        *mypointer = 0x11105522;
        *mypointer = 0xFF505050;
    }
On paper, that does exactly what we said. Unfortunately, if the compiler optimizations are enabled, this may not work. The compiler sees that we are writing three values in the same memory location without doing any operation in between so it skips the first two assignments and performs only the last one.
This is where volatile comes into play. The following code solves the issue

    #include <stdint.h>

    /* Pointer to a volatile unsigned 32-bit. */
    volatile uint32_t * mypointer = (volatile uint32_t *)0x40000010;

    int main() {
        /* Writing the sequence. */
        *mypointer = 0x10105555;
        *mypointer = 0x11105522;
        *mypointer = 0xFF505050;
    }

With the volatile keyword, the compiler stops making assumptions about the value stored at that specific memory location. Consequently, all three assignments will be performed.
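The same reasoning applies to three typical real-life scenarios:
- Memory-mapped peripheral registers
- Global variables modified by an Interrupt Service Routine
- Global variables within a multi-threaded application (note that volatile alone does not make accesses atomic or ordered, so proper synchronization is still needed)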
Some of your C-99 types in the table of stdint.h have typos. And the unsigned still range from 0 to 2^n -1, not 0 to 2^n .
Thanks Tim, I fixed it.
Great work. The sample code are given precisely to know the working of these variables. I hadn’t seen them before commenting on LinkedIn post though! But –
I would love to give a code example to understand the span and lifetime of variable!
code:
—————————————–
int a = 50; // no.1
void main()
{
int a; // no. 2
a = 100; //no. 3
{
int a;// no. 4
a = 200; //no. 5
printf("%d", a);
}
printf("%d", a);
}
—————————————–
try all combinations of commenting these “numbered” lines , you will understand the full working of variables,
Note: in one of the combinations, when the working is too weird, try to search for "shadowing" of variables!
Thanks Rohit,
this is actually a great example. I slightly modified it for a more comprehensive example and added it here
Cool article! I will link to your article from within our own C-tutorial (see https://embeetle.com/#embedded-dev/c-tutorial)
Thanks Kristof for sharing it