Thursday, February 20, 2025

Discover the Secrets of How the C Programming Language Compiler Works, and Learn How Your Code Becomes Executable

A compiler is a software tool that translates source code written in a programming language into executable code that can run on a computer. The process involves several stages, including lexical, syntax, and semantic analysis of the source code, after which the compiler generates executable code that is ready to run. Compilers can be explained in simple terms, but understanding how they work and how translation proceeds is essential for software engineers and programmers. That is why we separated this topic from the previous post and gave it special attention.

The first question you will probably ask is: in what language is the compiler for the C programming language itself written? Looking back in history, older computers were mostly programmed in assembly language, and higher-level programming languages began to develop as the benefits of reusing software on different processors grew. The first higher-level programming language, Plankalkül, was proposed as early as 1943. Since then, several experimental compilers have been developed. The Fortran team, led by John Backus of IBM, introduced the first complete compiler in 1957. Compilers have since become increasingly complex as computer architectures have evolved.

Today, it is common practice to implement a compiler in the same language that it compiles. It is therefore reasonable to assume that the compiler of the C programming language is written in C, just as Roslyn, Microsoft's open-source compiler for C# and Visual Basic, is written in those languages. However, to create the first C compiler, its creator Dennis Ritchie used the earlier programming language B, which was developed by Ken Thompson.

Compiling files written in the C programming language is not uncommon even in the most modern corporations

Dennis Ritchie later extended the B programming language and created the C programming language, so the original C compiler was also written in B. We mostly use GCC version 14 or newer (this text was written in 2025), and you will not find a single line of B in it. GCC does not support the B programming language at all, because the language has long been obsolete. What we do know is that GCC is written in a combination of the C and C++ programming languages, and parts of it may be written in other programming languages as well.

Once you have written your C code in a text editor and saved it as a text file, you can call the C compiler to translate it into machine code so that your program can run. The compiler works on a translation unit, which consists of the source file together with the header files it references through #include directives. If your code is correct, the compiler produces an object file, which we recognize by the .o or .obj suffix, and we call such object files modules. The standard library of the C programming language is shipped as already compiled machine code, so the standard functions we call in our programs do not have to be compiled again.

It is important to note that when a C file is translated into machine language, it is first translated into assembly language in a temporary file, which is then translated into machine language, after which the temporary file is deleted. If you keep that file, you will recognize it by the .s suffix. The compiler translates each source file, together with all the header files it includes, into a separate object file, i.e., a module. The linker is then called to combine all object files and all the library functions they use into an executable file. Do not confuse this process with .NET technology; in .NET and the C# programming language, things work differently.

How the C Programming Language 'Understands' Your Code: A Journey Through the Compilation Stages

Let's remember: a compiler is a software tool that translates program code written in a high-level language, such as C, into machine code that a processor can execute. The main functions of a compiler are:
  • Checking the syntax and semantics of the code.
  • Translating from source code to intermediate code.
  • Optimizing the code for better efficiency.
  • Generating machine code specific to the processor and operating system.
The most well-known compilers for the C language are:
  • GCC - GNU Compiler Collection
  • Clang
  • MSVC - Microsoft Visual C++
  • Intel C Compiler
The compilation process of a C program can be broken down into eight detailed phases. Although it is common to talk about 4-5 main steps, namely preprocessing, analysis, code generation, optimization and linking, the complete process goes through the following 8 phases. Each phase plays a key role in the transformation of the code, and optimization and linking especially affect the final efficiency of the program.

1. Preprocessing

Processing directives (#include, #define, #ifdef, etc.):

  • Inserting the contents of header files (#include).
  • Replacing macros (#define).
  • Removing comments.
  • Conditional compilation (#ifdef, #endif).


2. Lexical Analysis

The source code is divided into the smallest logical units - tokens (keywords, identifiers, numbers, operators).

For example, the code:

int x = 10;

is broken down into tokens: int, x, =, 10, ;.


3. Syntax Analysis (Parsing)

  • An Abstract Syntax Tree (AST) is generated, which shows the structure of the code.
  • The validity of the syntax is checked (e.g., whether an if statement is written correctly).
  • If there are syntax errors, the compiler stops the process and reports them.
The C compilation process from object files to executable file

4. Semantic Analysis


Checks the meaning of the code:

  • Are the data types compatible?
  • Are variables properly declared before use?
  • Is there a name conflict in the scope of variables?

This phase catches errors in code that is syntactically valid but has no valid meaning, such as using a float variable in a switch statement.

5. Intermediate Code Generation

The compiler creates intermediate code, which is not directly dependent on the processor architecture.
This intermediate code can be in the form of three-address instructions or SSA - Static Single Assignment form.

Example of intermediate code in three-address form for the statement x = 10:

t1 = 10

x = t1


6. Code Optimization

Improving code performance by removing redundant instructions. Optimization techniques include:

  • Dead code elimination (removing code that is never executed or whose result is never used).
  • Loop optimization (e.g., loop unrolling).
  • Function inlining (inline expansion).
  • Common subexpression elimination (removing redundant expressions).
7. Machine Code Generation

Transformation of optimized intermediate code into machine code specific to the processor.

For example, for x = x + 5; on the x86 architecture, with x held in the EAX register, it might look like this:

ADD EAX, 5

8. Linking
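  • Linking object code (.o or .obj files) with libraries (libc, libm, etc.). Header files such as stdio.h and math.h only declare the functions; the libraries contain their compiled code.
  • Creating the final executable file (an .exe file on Windows or an ELF binary on Linux).
There are two types of linking:
  • Static (the library code the program uses is copied into the executable file).
  • Dynamic (the program loads external *.so or *.dll files at run time).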
The compilation process in modern programming languages

Compiling C Programs in Practice


All of this may seem complicated in theory, but in practice everything is simple. Just keep in mind that a C program compiled for the Linux operating system will not work on the Windows operating system, because the two systems use different executable formats. However, if you work on Linux and need a Windows *.exe file, it is enough to install MinGW-w64 and use a different command:

sudo apt-get install mingw-w64

x86_64-w64-mingw32-gcc -o program.exe program.c

These commands will compile your file to work on the Windows operating system, but then the file will not work on the Linux operating system due to differences in operating systems and their binary formats. To compile a program for Linux itself, open your terminal and enter the following commands.

manuel@manuel-virtual-machine:~$ sudo apt-get update

manuel@manuel-virtual-machine:~$ sudo apt-get upgrade

manuel@manuel-virtual-machine:~$ clear

manuel@manuel-virtual-machine:~$ ls

manuel@manuel-virtual-machine:~$ cd tutorials

manuel@manuel-virtual-machine:~/tutorials$ ls

manuel@manuel-virtual-machine:~/tutorials$ cd c_tutorial

manuel@manuel-virtual-machine:~/tutorials/c_tutorial$ nano program.c

Write the following C code into a file.

#include <stdio.h>

int main(void) {
    printf("Hello, World!\n");
    return 0;
}

Then, compile the program.c file.

manuel@manuel-virtual-machine:~/tutorials/c_tutorial$ cat program.c

manuel@manuel-virtual-machine:~/tutorials/c_tutorial$ clear

manuel@manuel-virtual-machine:~/tutorials/c_tutorial$ gcc program.c -o program

manuel@manuel-virtual-machine:~/tutorials/c_tutorial$ ls

manuel@manuel-virtual-machine:~/tutorials/c_tutorial$ ./program

You will get the following result.

Hello, World!

You can also see the same example of compiling a file with C code in the following video.


C Tutorial - 2. How to Compile a Text File Written in the C Programming Language?
