Understanding C Compilation Steps
When we run a simple command like gcc program.c, the multi-stage process turning our source code into an executable is hidden from the user. Understanding what happens behind the scenes not only demystifies compilation but also helps debug complex issues and optimize code.
In this article, we’ll break down the compilation of a C program into its fundamental steps and show how to manually execute each one using GCC flags.
The Four Stages of Compilation
The compilation of C source code into an executable binary involves four distinct phases:
- Preprocessing (to final source code)
- Compilation (to human-readable assembly language)
- Assembly (to machine-readable object code)
- Linking (to executable binary)
Notice how “compilation”, depending on context, can refer to the entire process or just the second step of turning code into assembly. The terminology of steps 2 and 3 can be confusing, but this is not an error on the author’s side: The compilation step indeed turns source code into assembly (language), while the following assembly process (performed by the assembler) takes this assembly (language) and creates object code from it. We’ll go through it step by step using a simple example.
1. Preprocessing
All the preprocessor does is word processing on human-readable text: replacing some characters, deleting others, adding new ones. It is controlled by directives beginning with #, such as #include, #define, and #ifdef, and performs the following tasks:
- Includes header files
- Expands macros
- Handles conditional compilation
- Removes comments
Manually Running the Preprocessor
To get the output of the preprocessing stage, use the -E flag:
gcc -E program.c -o program.i
This command processes all directives and produces a .i file containing the preprocessed source code. Let’s look at a simple example:
#include <stdio.h>
#define MAX 100
int main() {
    /* Comments are removed by preprocessor */
    printf("Max value is: %d\n", MAX);
    return 0;
}
After preprocessing, the file will contain:
- All the code from stdio.h
- The literal value 100 replacing MAX
- The original code with comments removed
Omitting the hundreds of lines copied in from stdio.h, the end of the resulting program.i looks like this:
int main() {
    printf("Max value is: %d\n", 100);
    return 0;
}
On my machine the file contains 821 lines, up from 8 of the original source code. If we include stdlib.h as well, it’s 1924 lines of code!
2. Compilation (to human-readable assembly language)
During this stage, the preprocessed code is translated into assembly language specific to the target processor architecture. The compiler performs:
- Syntax checking
- Type checking
- Optimization (if enabled)
- Generation of assembly code
Manually Running the Compiler
To stop after compilation and output assembly code:
gcc -S program.i -o program.s
The resulting .s file contains human-readable assembly instructions. Here’s a snippet from our simple example compiled for the x86-64 platform:
.LC0:
    .string "Max value is: %d\n"
    .text
    .globl main
    .type main, @function
main:
.LFB6:
    .cfi_startproc
    endbr64
    pushq %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq %rsp, %rbp
    .cfi_def_cfa_register 6
    movl $100, %esi
    leaq .LC0(%rip), %rax
    movq %rax, %rdi
    movl $0, %eax
    call printf@PLT
    movl $0, %eax
    popq %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
Notice how MAX was replaced with the literal value 100 in the line movl $100, %esi and the assembly instructions for calling printf were generated.
3. Assembly (to machine-readable object code)
The assembler converts assembly code into object code, stored in .o object files.
Manually Running the Assembler
To generate object code from assembly:
gcc -c program.s -o program.o
The -c flag tells GCC to assemble, but stop before linking. The resulting .o file is in binary format and no longer human-readable. It holds object code that reflects the assembled instructions and data from the source code, but isn’t yet a complete, runnable program.
4. Linking
The final stage combines one or more object files with library code to produce an executable. The linker:
- Resolves external symbol references
- Combines multiple object files
- Incorporates library code
- Sets up program entry points
Manually Running the Linker
To link object files into an executable:
gcc program.o -o program
If your program uses external libraries, you’ll need to specify them here, with the exception of some common libraries that GCC links by default, most prominently libc (the C standard library):
gcc program.o -o program -lm # Links with the math library
Putting It All Together
gcc -E program.c -o program.i # Preprocessor
gcc -S program.i -o program.s # Compiler
gcc -c program.s -o program.o # Assembler
gcc program.o -o program # Linker
GCC automatically runs any earlier stages that the input file still needs. For example, we can generate assembly directly from the source code, without explicitly running the preprocessing step:
gcc -S program.c -o program.s
Conclusion
While modern IDEs and build systems often hide the compilation steps, understanding this process is crucial for becoming a well-rounded C programmer. Next time you encounter a cryptic compiler or linker error, hopefully these insights help you find the root cause.