matthiaserb.dev

Understanding C Compilation Steps

When we run a simple command like gcc program.c, the multi-stage process turning our source code into an executable is hidden from the user. Understanding what happens behind the scenes not only demystifies compilation but also helps debug complex issues and optimize code.

In this article, we’ll break down the compilation of a C program into its fundamental steps and show how to manually execute each one using GCC flags.

The Four Stages of Compilation

The compilation of C source code into an executable binary involves four distinct phases:

  1. Preprocessing (to final source code)
  2. Compilation (to human-readable assembly language)
  3. Assembly (to machine-readable object code)
  4. Linking (to executable binary)

Notice how “compilation”, depending on context, can refer to the entire process or just the second step of turning code into assembly. The terminology of steps 2 and 3 can be confusing, but this is not an error on the author’s side: The compilation step indeed turns source code into assembly (language), while the following assembly process (performed by the assembler) takes this assembly (language) and creates object code from it. We’ll go through it step by step using a simple example.

1. Preprocessing

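The preprocessor does nothing more than text processing on human-readable source: it replaces some text, deletes some, and adds more. It is controlled by directives beginning with #, such as #include, #define, and #ifdef. In practice, it copies the contents of included header files into the source, expands macros, keeps or drops conditionally compiled blocks, and removes comments.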

Manually Running the Preprocessor

To get the output of the preprocessing stage, use the -E flag:

gcc -E program.c -o program.i

This command processes all directives and produces a .i file containing the preprocessed source code. Let’s look at a simple example:

#include <stdio.h>
#define MAX 100

int main() {
    /* Comments are removed by preprocessor */
    printf("Max value is: %d\n", MAX);
    return 0;
}

After preprocessing, and omitting the hundreds of lines copied in from stdio.h, the end of the resulting program.i looks like this:

int main() {

    printf("Max value is: %d\n", 100);
    return 0;
}

On my machine the file contains 821 lines, up from the 8 lines of the original source code. If we include stdlib.h as well, it grows to 1924 lines!
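The example above only exercises #include and a simple #define. As a further sketch, the translation unit below adds a function-like macro and a conditionally compiled block. The names VERBOSE, SQUARE, LOG, clamp_to_max, and square_of_next are made up for illustration and are not part of the original program. The file deliberately has no main, since the point is only to inspect the preprocessor output.

```c
#include <stdio.h>

#define MAX 100

/* Function-like macro (hypothetical). The parentheses keep an argument
   such as n + 1 from being split apart by operator precedence. */
#define SQUARE(x) ((x) * (x))

/* Conditional compilation: VERBOSE is an assumed macro name that can be
   defined on the command line. Exactly one LOG definition survives. */
#ifdef VERBOSE
#define LOG(msg) puts(msg)
#else
#define LOG(msg) ((void)0)
#endif

/* Returns value, limited to at most MAX. */
int clamp_to_max(int value) {
    LOG("clamp_to_max called");
    return value > MAX ? MAX : value;
}

/* SQUARE(n + 1) is rewritten to ((n + 1) * (n + 1)) before compilation. */
int square_of_next(int n) {
    return SQUARE(n + 1);
}
```

Assuming the file is saved as macros.c, running gcc -E macros.c should show the LOG line reduced to ((void)0), while gcc -E -DVERBOSE macros.c should show a call to puts instead. In both cases MAX and SQUARE no longer appear in the function bodies, only their expansions.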

2. Compilation (to human-readable assembly language)

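During this stage, the preprocessed code is translated into assembly language specific to the target processor architecture. The compiler parses the code, checks syntax and types, optionally optimizes it, and generates the corresponding assembly instructions.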

Manually Running the Compiler

To stop after compilation and output assembly code:

gcc -S program.i -o program.s

The resulting .s file contains human-readable assembly instructions. Here’s a snippet from our simple example compiled for the x86-64 platform:

.LC0:
        .string "Max value is: %d\n"
        .text
        .globl  main
        .type   main, @function
main:
.LFB6:
        .cfi_startproc
        endbr64
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        movl    $100, %esi
        leaq    .LC0(%rip), %rax
        movq    %rax, %rdi
        movl    $0, %eax
        call    printf@PLT
        movl    $0, %eax
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc

Notice how the literal value 100, which the preprocessor substituted for MAX, appears as an immediate operand in the line

        movl    $100, %esi

and the assembly instructions for calling printf were generated.
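The .globl main directive in the listing marks main as visible to other object files. A small sketch can make this visible in the compiler output: the function names twice and quadruple below are hypothetical, and the file has no main because only the generated assembly is of interest here.

```c
/* Internal linkage: static makes this function visible only
   inside this translation unit. */
static int twice(int x) {
    return 2 * x;
}

/* External linkage: other object files may call this function. */
int quadruple(int x) {
    return twice(twice(x));
}
```

Compiling this with gcc -S and no optimization flags, I would expect a .globl quadruple directive, analogous to .globl main above, and no .globl line for twice. With optimization enabled, twice may be inlined and disappear from the assembly entirely, so the exact output depends on compiler version and flags.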

3. Assembly (to machine-readable object code)

The assembler converts assembly code into machine-readable object code, stored in .o object files.

Manually Running the Assembler

To generate object code from assembly:

gcc -c program.s -o program.o

The -c flag tells GCC to compile or assemble its input, but to stop before linking. Given a .s file, only the assembler runs. The resulting .o file is in binary format and no longer human-readable. It holds object code that reflects the assembled instructions and data from the source code, but isn’t yet a complete, runnable program.
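To illustrate that an object file need not be a complete program, here is a minimal sketch of a translation unit without a main function. The file name counter.c and the names count, counter_next, and counter_reset are assumptions made for this example.

```c
/* counter.c (hypothetical): a translation unit without main(). */

/* File-local state; static keeps the symbol out of other object files. */
static int count = 0;

/* Increments the counter and returns the new value. */
int counter_next(void) {
    return ++count;
}

/* Resets the counter to zero. */
void counter_reset(void) {
    count = 0;
}
```

Running gcc -c counter.c -o counter.o should succeed, because assembling does not require an entry point. Trying to link the object file on its own with gcc counter.o -o counter is expected to fail with an undefined reference to main, since that symbol has to come from some other object file.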

4. Linking

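The final stage combines one or more object files with library code to produce an executable. The linker resolves references to symbols defined in other object files or libraries (such as printf), merges their code and data sections, and assigns final addresses.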

Manually Running the Linker

To link object files into an executable:

gcc program.o -o program

If your program uses external libraries, you’ll need to specify them here

gcc program.o -o program -lm  # Links with the math library

with the exception of some common libraries that GCC links by default, most prominently libc (the C standard library).

Putting It All Together

gcc -E program.c -o program.i # Preprocessor
gcc -S program.i -o program.s # Compiler
gcc -c program.s -o program.o # Assembler
gcc program.o -o program # Linker

[Figure: visualization of the compilation steps: Preprocessing, Compilation, Assembly, Linking]

GCC decides from the input file’s extension which stages still need to run and performs any earlier ones automatically. For example, we can directly generate assembly from the source code like this

gcc -S program.c -o program.s

without explicitly running the preprocessing step.

Conclusion

While modern IDEs and build systems often hide the compilation steps, understanding this process is crucial for becoming a well-rounded C programmer. Next time you encounter a cryptic compiler or linker error, hopefully these insights help you find the root cause.