Demystifying C: Pointers

Published on January 9, 2023. Updated on January 17, 2024. Posted by Rocco Marco Guglielmi. Category: General knowledge - Advanced concepts

Introduction

In our last article in the series Demystifying C, we discussed the various types available in the C programming language. In this article, we’ll build on that foundation by exploring a concept that can be intimidating for C newbies: pointers.

Pointers can be a daunting concept for those new to the C programming language, but they are a very powerful tool that can greatly improve the efficiency of your code. In this article, we will demystify pointers and explain how they work, why they are useful, and how to use them in your C programs.

Whether you are a beginner looking to improve your understanding of pointers or an experienced programmer looking to refresh your knowledge, this article has something for you.

So let us get started on demystifying pointers in C!

What is a pointer

A pointer is a particular type of variable in C that stores the memory address of another variable (as its name suggests a pointer points to a memory address). They are commonly used to manipulate data stored in arrays and structures, as well as to pass large blocks of data to functions more efficiently.

Some typical use cases for pointers include:

Accessing elements of an array.
Implementing dynamic memory allocation.
Building complex data structures.

Declaration, initialization and access

Pointers can have different types depending on the type of variable they point to. For example, a pointer to an int can only store the memory address of an int variable, and a pointer to a double can only store the memory address of a double variable.

To declare a pointer with a specific type in C, we write the type of the variable it points to, followed by the * symbol and the name of the pointer, as shown below

/* Declaration of a pointer to an integer. */
int* mypointer;

/* Declaration of a pointer to a char. */
char* mypointer;

/* Declaration of a pointer to a double. */
double* mypointer;

Let us now consider the following code:

int myvar = 42;
int* mypointer = &myvar;

In the first line, we declare myvar, a regular integer variable that stores the value 42 (the stored value is often informally called the right value, or rvalue, of myvar). This variable is stored in memory at a specific location, which in this example we will assume to be the address 0x40000080 (this location is what the left value, or lvalue, of myvar refers to).

In the second line, we declare an integer pointer called mypointer and we initialize it with the address of the myvar variable. We can hence notice that:

The & operator, also known as the “address of” operator, is a unary operator that returns the memory address of a given variable.
Initializing a pointer involves assigning it a memory address.

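It is worth noting that, when the memory address of a variable is known, it is possible to initialize a pointer with a constant value that represents that address. The constant must be explicitly cast to the pointer type, and the result of the conversion is implementation-defined, so this is mostly seen in low-level and embedded code

int* mypointer = (int*)0x40000080;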

Now that mypointer contains the memory address of myvar, it can be used to access the value stored in myvar through the use of the * operator.

#include <stdio.h>

int main(void) {
  int myvar = 42;
  int* mypointer = &myvar;

  printf("The value of myvar is: %d\n", *mypointer);

  return 0;
}

that will result in

The value of myvar is: 42

Size and type

As mentioned previously, a pointer is a variable specifically designed to store a memory address. It should therefore come as no surprise that, on most common platforms, pointers to different data types have the same size (the C standard does not strictly guarantee this, but it holds on mainstream architectures). The size of a pointer depends on the architecture of the machine on which the program is running: pointers are typically 4 bytes (32 bits) on a 32-bit system and 8 bytes (64 bits) on a 64-bit system.

An easy way to check the size of a pointer on the system in use is to exploit sizeof, a unary compile-time operator that returns the size, in bytes, of a given data type or expression.

#include <stdio.h>

int main(void) {

  /* sizeof yields a size_t, which is printed with %zu. */
  printf("Size of int*: %zu\n", sizeof(int*));
  printf("Size of double*: %zu\n", sizeof(double*));

  return 0;
}

On a 32-bit architecture, this would result in

Size of int*: 4
Size of double*: 4

So, what is then the purpose of specifying a type while declaring a pointer?

In C, the type of a pointer is important because it helps the compiler check for errors and generate correct code for accessing and manipulating the value the pointer points to. This is related to how the data is stored in memory. In modern architectures, each memory address refers to a cell of memory that is 8 bits (1 byte) in size.

Let us look at an example to clear up any doubts.

#include <stdio.h>
#include <stdint.h>

int main(void) {
  uint32_t myvar = 0x12345678;
  
  /* %p expects a pointer to void, hence the cast. */
  printf("Memory address of myvar: %p\n", (void*)&myvar);
  return 0;
}

and let us say that this results in

Memory address of myvar: 0x40001000

What would the memory look like? The answer is

Memory address	Element Value
0x40001000	0x78
0x40001001	0x56
0x40001002	0x34
0x40001003	0x12

Memory layout of a uint32_t variable stored at the memory address 0x40001000 with little-endian byte order

In this hypothetical example, the uint32_t variable myvar occupies 4 consecutive memory addresses (4 bytes). The value of myvar, which is 0x12345678, is stored in little-endian order, with the least significant byte (LSB) at the lowest memory address (0x40001000) and the most significant byte (MSB) at the highest memory address (0x40001003).

This should give the astute reader an idea of where we are heading. In any case, let us extend our previous example.

#include <stdio.h>
#include <stdint.h>

int main(void) {
  uint32_t myvar = 0x12345678;
  uint32_t* mypointer = &myvar;
  
  printf("Memory address of myvar: %p\n", (void*)&myvar);
  printf("Value of myvar: 0x%lX\n", (unsigned long)*mypointer);
  return 0;
}

that will print out

Memory address of myvar: 0x40001000
Value of myvar: 0x12345678

The pointer type is crucial for the compiler to interpret the expression *mypointer, as it determines that 4 contiguous memory cells should be loaded when dereferencing the pointer.

We will see in the next section that the type of the pointer is also crucial for pointer arithmetic.

Pointer arithmetic

Pointer arithmetic is a type of arithmetic that is specific to pointers in C programming. It involves performing arithmetic operations on pointers, such as incrementing ++, decrementing --, adding +, and subtracting -, to manipulate and access contiguous memory areas. The astute reader may have already deduced that pointer arithmetic is particularly well-suited for handling arrays or structures.

When used with pointers, these operators behave differently than when used with other types of operands. For example, adding an integer n to a pointer does not add n to the address: it moves the pointer forward by n elements of the type it points to, that is, by n * sizeof(type) bytes.

Let us dive into some examples.

A simple increment

#include <stdio.h>
#include <stdint.h>

int main(void) {
  uint16_t myvar = 0x1234;
  uint16_t *mypointer = &myvar;

  /* print the address of myvar using mypointer */
  printf("mypointer = %p\n", (void*)mypointer);
  /* increment the pointer */
  mypointer++;
  /* print the incremented address */
  printf("mypointer = %p\n", (void*)mypointer);

  return 0;
}

If myvar were located at 0x40001000, the result of this code would be

mypointer = 0x40001000
mypointer = 0x40001002

This program declares a uint16_t type variable myvar initialized to 0x1234, and a pointer mypointer to a uint16_t pointing to the address of myvar. It then prints the value of mypointer, increments it using the ++ operator, and it prints the pointer again.

In this specific case, mypointer moves forward by 2 bytes, the size of a uint16_t, and now holds the address immediately after myvar.

Type matters

Let us extend the concept of the previous example by changing the type

#include <stdio.h>
#include <stdint.h>

int main(void) {
  uint32_t myvar = 0x12345678;
  uint32_t *mypointer = &myvar;

  /* print the address of myvar using mypointer */
  printf("mypointer = %p\n", (void*)mypointer);
  /* increment the pointer */
  mypointer++;
  /* print the incremented address */
  printf("mypointer = %p\n", (void*)mypointer);

  return 0;
}

assuming again that myvar is located at 0x40001000 the result of this code is

mypointer = 0x40001000
mypointer = 0x40001004

The Art of Back-and-Forth Navigation

Let us consider this final example about pointer arithmetic

#include <stdio.h>
#include <stdint.h>

int main(void) {
  uint32_t myvar = 0x12345678;
  uint32_t *mypointer = &myvar;

  /* print the address of myvar using mypointer */
  printf("mypointer = %p\n", (void*)mypointer);
  /* add 4 to the pointer, which moves it 4 uint32_t elements ahead */
  mypointer += 4;
  /* print the incremented address */
  printf("mypointer = %p\n", (void*)mypointer);
  /* decrement the pointer */
  mypointer--;
  /* print the decremented address */
  printf("mypointer = %p\n", (void*)mypointer);
  return 0;
}

In this case, with the usual assumption about the address of myvar, the result will be

mypointer = 0x40001000
mypointer = 0x40001010
mypointer = 0x4000100C

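In other words, the operation mypointer += 4; increments the initial value of mypointer by 4 * sizeof(uint32_t) = 16 bytes, which explains the jump of 0x10. The decrement operation moves the pointer back by one element, that is, by 4 bytes. Keep in mind that the C standard only defines the result of pointer arithmetic as long as the pointer stays within an array (or one element past its end): the jump above is shown only to illustrate how the address is computed, and real code should keep pointers inside the object they were derived from.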

Pointers and arrays

Pointers and arrays are fundamental concepts in C programming, and they are often used in conjunction with one another. In this chapter, we will compare and contrast pointer arithmetic and array indexing and will learn how to use these techniques to access and manipulate memory addresses and array elements. We will also see how pointers can be used to dynamically allocate memory for variables, including arrays, and how they can be used to manipulate strings in C.

Introducing arrays

An array is a data structure in C that allows us to store and access a fixed-size sequenced collection of elements of the same data type. Arrays are stored in contiguous blocks of memory, with each element of the array occupying a specific memory location.

Let us consider the following code

uint16_t myarray[5] = {1, 2, 3, 4, 5};

In this example, we have declared an array myarray of five uint16_t elements and we have initialized the elements of the array with the values 1, 2, 3, 4, and 5.

We can access the elements of the array using indices, with the first element of the array having an index of 0 and the last element having an index of n - 1, where n is the size of the array. For example, accessing the third element of the array will look like

uint16_t third_element = myarray[2];

But what does the memory layout of an array look like? Let us assume that the elements of myarray are stored in memory starting at the address 0x40001000. Then each element will occupy a contiguous block of memory as shown below:

Memory Address	Element Value
0x40001000	1
0x40001002	2
0x40001004	3
0x40001006	4
0x40001008	5

Memory layout of an array of uint16_t elements

In C, an array name is automatically converted to a pointer to the first element of the array in most expressions (notable exceptions are when it is the operand of sizeof or of the & operator). This means that we can use the array name myarray as a pointer to access and manipulate the elements of the array. For example, if we print out the value of myarray:

#include <stdio.h>
#include <stdint.h>

int main(void) {
  uint16_t myarray[5] = {1, 2, 3, 4, 5};
  printf("Memory address of myarray: %p\n", (void*)myarray);
  
  return 0;
}

The result will be

Memory address of myarray: 0x40001000

However, the array name myarray can only be considered a “pseudo-pointer”: it behaves like a pointer in many contexts, but it is not a true pointer variable and cannot be modified like one. The array name always designates the same array, located at a fixed memory address, so it is not possible to assign a new value to it or to increment it.

Exploiting pointers with arrays

Building on this introduction to arrays, let us now explore some examples of using pointers with arrays.

Dereferencing a pointer to retrieve array elements

We can use the * operator to dereference the pointer and access the value of the first element of the array as follows:

uint16_t first_element = *myarray;

Similarly, it is possible to use pointer arithmetic to access a generic element of the array. For example, the following code

uint16_t third_element = *(myarray + 2);

behaves exactly like

uint16_t third_element = myarray[2];

Traversing an array using a pointer

If we wanted to print the contents of myarray, we would most likely write

#include <stdio.h>
#include <stdint.h>

int main(void) {
  uint16_t myarray[5] = {1, 2, 3, 4, 5};

  printf("Elements of myarray: ");
  for (int i = 0; i < 5; i++) {
    printf("%d ", myarray[i]);
  }
  printf("\n");
  return 0;
}

the result of this code is

Elements of myarray: 1 2 3 4 5

It is possible to perform the same operation using the array name as a pseudo-pointer together with pointer arithmetic

#include <stdio.h>
#include <stdint.h>

int main(void) {
  uint16_t myarray[5] = {1, 2, 3, 4, 5};

  printf("Elements of myarray: ");
  for (int i = 0; i < 5; i++) {
    printf("%d ", *(myarray + i));
  }
  printf("\n");
  return 0;
}

Pointers and functions

In a typical C implementation, when a function is called, a block of memory called a stack frame is created to store the function’s local variables and arguments. The stack frame is created on the top of the call stack, and it is destroyed when the function returns. This allows the function to have its own set of variables that are separate from the variables in the calling function.

The takeaway from this is that, in C, arguments are passed to functions by value (call-by-value), which means that the function receives a copy of the argument: any changes that the function makes to the argument will not affect the original value of the argument in the calling function. For example, the following code

#include <stdio.h>

void increment(int x) {
  x++;
}

int main() {
  int myvar = 5;
  increment(myvar);
  printf("The value of myvar is %d\n", myvar);

  return 0;
}

will result in

The value of myvar is 5

This is because x is a copy of myvar, so any change made inside increment() does not change the value of myvar.

The result of the following code will clear any doubts

#include <stdio.h>

void increment(int x) {
  printf("The address of x is %p\r\n", (void*)&x);
}

int main() {
  int myvar = 5;
  increment(myvar);
  printf("The address of myvar is %p\r\n", (void*)&myvar);

  return 0;
}

The address of x is 0x4000DF1C
The address of myvar is 0x4000AD34

Modify variables in a function

In this specific scenario, pointers play a crucial role. Using pointers it is possible to simulate a call-by-reference behavior. To operate on the original value within the function increment, the solution is to pass a pointer to myvar and dereference the pointer inside the function.

#include <stdio.h>

void increment(int *x) {
  (*x)++;
}

int main() {
  int myvar = 5;
  increment(&myvar);
  printf("The value of myvar is %d\n", myvar);

  return 0;
}

that will now result in

The value of myvar is 6

Returning multiple values from a function

Similarly, we can use pointers to return multiple values from a function by modifying the values of the variables that the pointers point to. A simple example would be

#include <stdio.h>

void divide(int a, int *quotient, int *remainder) {
  *quotient = a / 3;
  *remainder = a % 3;
}

int main() {
  int x = 10;
  int quotient;
  int remainder;

  divide(x, &quotient, &remainder);
  printf("Quotient: %d, Remainder: %d\n", quotient, remainder);

  return 0;
}

Quotient: 3, Remainder: 1

Passing large amounts of data more efficiently

When you pass a large amount of data to a function, it can be more efficient to pass a pointer to the data rather than making a copy of the data. This is because making a copy of the data can be time-consuming and use up a lot of memory. A simple example would be

#include <stdio.h>

struct Data {
  int *array;
  size_t size;
};

void print_sum(struct Data *data) {
  int sum = 0;
  for (size_t i = 0; i < data->size; i++) {
    sum += data->array[i];
  }
  printf("Sum: %d\n", sum);
}

int main() {
  int large_array[1000];
  for (size_t i = 0; i < 1000; i++) {
    large_array[i] = i;
  }

  struct Data data = {large_array, 1000};
  print_sum(&data);

  return 0;
}

Sum: 499500

Passing a pointer of an unknown type to a function

There can be situations where a function needs to accept a pointer to any type of object without specifying the actual type of the object. This is often done for abstraction purposes, where the type of the object is not important for the function to perform its task. In such cases, it is possible to use a “pointer to void” or void*. This mechanism relies on the fact that a pointer to any object type can be converted to void* and back without losing information. The type of the object is only needed for dereferencing the pointer or performing pointer arithmetic on it.

The following example explains how this mechanism can be used

#include <stdio.h>

void print_value(void* value, int type) {
  if(type == 1) {
    int *int_value = (int*) value;
    printf("Integer value: %d\n", *int_value);
  } 
  else if(type == 2) {
    float *float_value = (float*) value;
    printf("Float value: %f\n", *float_value);
  }
}

int main() {
  int int_var = 5;
  float float_var = 3.14;
  print_value(&int_var, 1);
  print_value(&float_var, 2);
  return 0;
}

In this example, the function print_value accepts a pointer of type void* as an argument, which can point to any type of object. Depending on the type argument passed to the function, the void* pointer is cast to the appropriate type and used to access the value. While this example may look silly, imagine that such a function is part of a library and everything will fall into place: when defining a function at the library level, it is sometimes not possible to know the type of data structure the function will operate on, and sometimes knowing it does not actually make any difference for the function to perform its task.

To clarify this let us now consider the following example

#include <string.h>
#include <stdio.h>

int main() {
  char src[] = "Hello World!";
  char dest[20];
  
  memcpy(dest, src, sizeof(src));
  printf("Copied string: %s\n", dest);
  
  return 0;
}

and the prototype of the memcpy function

void* memcpy(void* dest, const void* src, size_t n);

This is a standard C library function that is used to copy a block of memory (n bytes) from a source location (src) to a destination location (dest); the two blocks must not overlap. Because its parameters are pointers to void, the function can be used to copy arrays, structures, or any other type of memory block.

Pointers and dynamic allocation

Pointers are key in dynamic memory allocation because they are how the allocated memory is accessed. When we allocate memory dynamically, we are requesting a block of memory from the heap at runtime. The memory allocator (backed by the operating system, where one is present) then sets aside a block of memory for our program and returns a pointer to the first byte of that block. We can then use the pointer to access and manipulate the memory block as needed.

malloc is a standard C function that is used to dynamically allocate memory. It takes a single argument, which is the number of bytes of memory to allocate and returns a pointer to the allocated memory or NULL if the memory allocation fails.

The following example shows how to use the malloc function to dynamically allocate memory for an array of integers:

#include <stdio.h>
#include <stdlib.h>

int main() {
  int n = 10;
  /* Allocating memory for an array of 10 integers. */
  int *array = malloc(n * sizeof(int));
  
  /* As the malloc can fail returning NULL, it is good practice 
     to check the operation result. */
  if (array == NULL) {
    printf("malloc failed\n");
    exit(1);
  }

  /* Using the array as needed. Note that pointer arithmetic
     could have been used here as well. */
  for (int i = 0; i < n; i++) {
    array[i] = i * 2;
  }

  for (int i = 0; i < n; i++) {
    printf("%d ", array[i]);
  }
  printf("\n");

  /* Freeing the memory once done with it. */
  free(array);

  return 0;
}

0 2 4 6 8 10 12 14 16 18

Common mistakes when working with pointers

In this final chapter, we will explore the most common mistakes made when working with pointers and how to avoid them for seamless programming.

Dereferencing a null pointer

This occurs when we try to access the value at a memory address that is NULL. To avoid this mistake, we should always check that a pointer is non-null before dereferencing it.

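/* msg_t, MSG_OK and MSG_ERROR are assumed to be defined elsewhere,
   for example by the RTOS or library in use. */
msg_t myfunct (int32_t* ptr) {
  if(ptr == NULL) {
    return MSG_ERROR;
  }
  
  /* Normal function execution. */
  return MSG_OK;
}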

Accessing memory out of bounds (aka Overflow)

This occurs when we try to access memory that is outside the bounds of the memory block that we have allocated. This problem may result in a segmentation fault: as the program tries to read or write to a memory location that it does not have permission to access, it gets terminated with an error message. While this can be frustrating for the programmer, as it can be difficult to track down the cause of the error, faults of this type are typically the least dangerous. In other cases, an overflow silently corrupts neighboring data and results in unexpected behavior at a seemingly random moment in time: this category of errors is more problematic, as it may go unnoticed for a long time.

Good practices that can alleviate this pain are:

Allocate enough memory to store all necessary data.
Use valid array indices when working with arrays, remembering that array indices start at 0 and go up to size - 1.
Use appropriate loop limits especially when loops are iterating over the elements of an array.
Check the return value of functions that allocate memory (such as malloc, calloc, or realloc). If the function returns NULL, it means that the memory allocation has failed and the memory should not be accessed.

Confusing operators

The address-of operator & and the dereference operator * perform opposite operations, and it is easy to confuse them. A common mistake is to use the wrong operator in the wrong context, such as using address-of when trying to get the value of a variable that a pointer points to, or using dereference to get the address of a variable.

Another common mistake is forgetting to use one of the operators when it is needed. A good practice here is to always look into your warnings while compiling your code.

Memory leaks and memory fragmentation

To conclude, there are two special ones, perhaps more related to dynamic allocation than to pointers, but I will mention them anyway.

A memory leak occurs when a program allocates memory dynamically but fails to properly deallocate it when it is no longer needed. This can lead to reduced performance, as the system must continuously allocate blocks of memory to keep up with the demands of the program.

In addition, if a program has a large number of memory leaks, it can eventually run out of memory, which can cause the program to crash. To avoid these issues, it is important to carefully manage the allocation and deallocation of memory in a program and to make sure to free memory when it is no longer needed.

Memory fragmentation occurs when the blocks of memory in a computer’s RAM are not contiguous. This can happen when a program dynamically allocates and deallocates blocks of memory. Over time, chunks of free memory are created here and there in the RAM. When the system needs to allocate memory for a new program or task, it will need to get contiguous space. If the system is limited in resources, it may happen that there is enough free memory but not enough contiguous memory to carry on the allocation.

When this happens the allocation fails, which can be catastrophic if the program is not prepared to handle it. This problem is typically solved using memory pools or memory defragmentation. This issue specifically impacts embedded systems, where having a dedicated task for memory defragmentation may suck up too many resources. For this reason, schedulers that run on these machines often rely almost completely on static allocation.