C++ meta-programming part 1: introduction to the preprocessor

This article is meant as an introduction to the preprocessor for C++ programmers. It assumes basic knowledge of types, functions, structures and pointers, as well as the difference between a definition and a declaration.

Technology C++
Level Pre-intermediate
Time to read ~15 minutes

Compilation steps

First things first. What exactly is the C++ preprocessor? To answer that, we must look at the build process itself. Let's see which steps are performed when you order your favorite gcc/clang/Visual Studio to build an executable from your cpp file(s).

Before we go further, note that cpp files are just plain text. Other files, like makefiles or Visual Studio solutions, are there just for your convenience. You can write C++ in Notepad and invoke all the tools from the chart above yourself. Modern IDEs just make this process easier for you.

Going back to the picture: the first step is to have the source files preprocessed. A program called the preprocessor goes through each file and executes the preprocessor directives. Directives are just lines starting with the '#' character (like #include). The preprocessor transforms one text file into another text file that will be fed to the actual compiler. We'll get into the details shortly.
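To make the "text in, text out" idea concrete, here is a minimal sketch. It assumes the #if 0 / #endif directive pair, which this article does not otherwise cover, and a hypothetical function named greeting. The point is only that the text between the two directives is thrown away before the compiler runs, so it does not have to be valid C++.

```cpp
#include <string>  // directive: replaced by the text of the standard <string> header

#if 0
This text is removed by the preprocessor, so the compiler never sees it.
#endif

// Hypothetical function, used only to show that the rest of the file
// compiles normally once the block above has been stripped out.
std::string greeting()
{
    return "hello";
}
```

If you delete the #if 0 and #endif lines, the same file should stop compiling, because the plain English sentence would then reach the compiler.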

The next step is the actual compilation. The compiler translates a text file that is understandable by you (the source code) into something that is understandable by the CPU (machine code). The compiler works on a single source file at a time and has absolutely no idea what to expect from other source files. In particular, it does not know about variables, structures or anything else declared in other source files. Each separately compiled source file, together with everything it includes, is called a translation unit.

Then the final step: the linker. The linker takes all the object files produced by the compiler and merges them into the final executable, filling in all the gaps across translation units. The result is a complete program.

Oh, and this is why a name collision between different translation units shows up as a linker error, not a compiler error. Specifically, you usually won't get a line number for the error, because the line numbers were wiped out by the compiler in the previous step.

What does the #include directive actually do?

Copy & paste. Nothing more. It takes the file found on the #include path and pastes its contents in place of the directive.

To explain the reason for this we need to get back to definition/declaration semantics. Each variable can have only one definition (which says: "Compiler, I need you to allocate memory for this variable") but can have as many declarations as you need (which say: "Hey, compiler, there is a variable with that name elsewhere!").

If you do not define a variable, you'll get a linker error ("Hey", the linker says, "you told me there is such a variable, but it is nowhere to be found!"). If you define it twice, you'll get another linker error ("Dang, I've already fetched some memory for this variable, what am I supposed to do with the second one?"). Both situations are fatal errors at the linking stage. The same story goes for functions (which also have separate declarations and definitions).
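The rule above can be sketched in a single file. The names counter and next_id are hypothetical and exist only for this illustration; in a real project the declarations would typically live in a header and the definitions in exactly one cpp file.

```cpp
// Declarations: "this exists somewhere". Repeating them is harmless.
extern int counter;
extern int counter;
int next_id();
int next_id();

// Definitions: exactly one of each in the whole program.
int counter = 0;

int next_id()
{
    counter = counter + 1;
    return counter;
}
```

Under these assumptions, removing the line "int counter = 0;" should produce the "nowhere to be found" linker error, while adding a second definition of counter in another cpp file should produce the "defined twice" one. The exact wording of both messages depends on your toolchain.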

Scenario: you've written an awesome clamping function, which clamps a variable to a given range. Let's declare it as: int clamp(int value, int min, int max). You want your buddy, Mark, to use it in his own module. So you put this declaration in a header file. Now you both include the header and you can both use the function.

So you have three files:

header.h

int clamp(int value,
    int min, int max);

clamp.cpp

#include "header.h"
int clamp(int value,
    int min, int max)
{
    if(value < min)
        return min;
    if(value > max)
        return max;
    return value;
}

mark.cpp

#include "header.h"
#include <cstdio>
using namespace std;
int main()
{
    printf("Clamped: %d",
            clamp(5, 2, 3));

    // wait for pressing the key
    getchar();
}

After preprocessing you will have two files to be fed to the compiler:

clamp.i

int clamp(int value,
    int min, int max); // Copy-pasted from header.h

int clamp(int value,
    int min, int max)
{
    if(value < min)
        return min;
    if(value > max)
        return max;
    return value;
}

mark.i

int clamp(int value,
    int min, int max); // Copy-pasted from header.h

int printf ( const char * format, ... ); // Copy-pasted from cstdio
int getchar ( void );  // Copy-pasted from cstdio
/* A lot of other declarations
    copy-pasted from cstdio */

using namespace std;
int main()
{
    printf("Clamped: %d",
            clamp(5, 2, 3));

    // wait for pressing the key
    getchar();
}

Note that the headers are no longer used in the later stages of the build! And yes, you can have a declaration just before the definition in clamp.i; that's perfectly fine.
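The copy & paste behaviour described above can be mimicked in a few lines. This is a toy model, not how a real preprocessor is implemented. The function expand_includes and the files map (file name to file text) are hypothetical. The sketch only understands lines that start with #include "name", and it ignores search paths, angle-bracket includes and circular includes.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Toy model of #include "..." handling: returns the text of `name` with every
// #include "other" line replaced by the (recursively expanded) text of `other`.
// `files` stands in for the file system. Assumption: every included name is
// present in `files` and no file includes itself, directly or indirectly.
std::string expand_includes(const std::string& name,
                            const std::map<std::string, std::string>& files)
{
    const std::string prefix = "#include \"";
    std::istringstream input(files.at(name));
    std::string line;
    std::string output;

    while (std::getline(input, line)) {
        const bool is_include = line.compare(0, prefix.size(), prefix) == 0;
        const std::size_t end =
            is_include ? line.find('"', prefix.size()) : std::string::npos;

        if (end != std::string::npos) {
            // Paste the included file in place of the directive.
            const std::string included =
                line.substr(prefix.size(), end - prefix.size());
            output += expand_includes(included, files);
        } else {
            // Any other line is copied through unchanged.
            output += line + "\n";
        }
    }
    return output;
}
```

Calling expand_includes("clamp.cpp", files) with the header.h and clamp.cpp texts from above stored in files would return the declaration followed by the definition, which is the same shape as clamp.i.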

Why can't I just copy-paste the declaration and not use the preprocessor at all?

Good question... in fact you can! But imagine your product has 50+ files that use your awesome clamp function, and your boss comes and tells you that from now on you need to clamp variables of type float instead of int. You would have to waste a lot of time carefully copy-pasting the new declaration into every one of those files. This is why headers are so convenient: you make the change in a single place.

Stay tuned for part 2: What else can the preprocessor do? Macros!