MANUEL RADOVANOVIĆ : Understanding Primitive Data Types in the C Programming Language

C is a statically typed programming language, which means that every variable must have a clearly defined type. Data types in C determine the amount of memory that a variable occupies, as well as the operations that can be performed on it. In C, the term "data type" also covers things that other programming languages treat as separate concepts, which is why we will learn them all together. Keep in mind that C is a deliberately small language: it lacks many features that C++ offers, let alone those of more modern programming languages. There are four main categories of data types in C:

Basic or primitive data types
Certain type modifiers
Complex data types
User-defined types
Primitive types in C are simple, close to the hardware, and flexible, but their sizes are less strictly standardized than in modern languages. C++ extends them with additional tools (e.g. a built-in bool and <cstdint>), while languages like Java and Python introduce a higher level of abstraction and standardization. If you're programming in C, it's important to understand the platform you're working on because it affects the behavior of these types.

Data types in the C programming language work differently compared to more modern programming languages

Let's take a look at the first category of data types. Here's an overview of the basic or primitive data types in C:

Integers - Integer Types:

int - Used for whole numbers (e.g. 5, -10, 42). The size depends on the system architecture (usually 4 bytes on 32-bit or 64-bit systems).
short - A shorter integer (usually 2 bytes).
long - A longer integer (4 or 8 bytes, depending on the system).
long long - An even longer integer (at least 8 bytes, introduced in C99).

Modifiers:

signed - Allows positive and negative values (the default for int, short, long, and long long).
unsigned - Allows only non-negative values, which roughly doubles the largest value that can be stored (e.g. unsigned int).

Floating-Point Numbers - Floating-Point Types:

float - Single precision (usually 4 bytes), for decimal numbers (e.g. 3.14).
double - Double precision (usually 8 bytes), for greater accuracy of decimal numbers.
long double - Extended precision (size depends on the system, often 10, 12 or 16 bytes).

Characters - Character Type:

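char - A single character (e.g. 'A', 'b') or a small integer (1 byte). Whether plain char is signed or unsigned depends on the compiler and platform.
signed char - From -128 to 127 on typical platforms.
unsigned char - From 0 to 255 (with 8-bit bytes).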

Logical Type - Boolean:

Before C99, standard C had no dedicated Boolean type. C99 introduced _Bool, which holds 0 for false and 1 for true. Including the <stdbool.h> header lets you write bool as an alias for _Bool, along with true and false.

Empty Type - Void:

void - Indicates the absence of a type. It is used for functions that do not return a value and, as void*, for generic pointers that can point to data of any type.

Why Are Primitive Types in C Close to Hardware and How Does This Differ from Other Languages?

Primitive data types in C are very similar to those in other languages like C++, but there are subtle differences in implementation, flexibility, and additional types that other languages introduce. Here is a comparison with C++ and some other popular languages:

C vs. C++

C++ inherits all primitive types from the C language and is fully compatible with them, but adds some improvements and differences:

Logical type:

In the C language, before C99, there was no native logical type; int was used instead, with 0 for false and any non-zero value for true. Since C99, _Bool exists.
In C++, bool is a native primitive type from the very beginning, with values true and false, and does not require a special library.

Size and range:

In C and C++, the size of types such as int and long depends on the compiler and platform. For portability, both languages provide fixed-width types such as int32_t and uint64_t, which guarantee an exact size (e.g. 32 bits for int32_t). C added them in C99 via <stdint.h>; C++ adopted the same types in C++11 via <cstdint>.

Default modifiers:

In both languages, int is signed by default, but C++ programmers more often use explicit modifiers for clarity and modern practices.

Flexibility:

C++ extends C types through objects and classes, which are not primitive, but primitive types remain identical in behavior and use.

C vs. Java

Java has a completely different approach to primitive types:

Fixed sizes:

In Java, the sizes of types are fixed and do not depend on the platform, e.g. int is always 32 bits, long 64 bits, while in C sizes are variable and depend on the architecture.

Types:

Java has byte (8 bits), short (16 bits), int (32 bits), long (64 bits), float (32 bits), double (64 bits), char (16 bits, for Unicode), and boolean (logical type). There are no unsigned versions of types, which is a big difference compared to C.

C does not have a native byte type and uses char (typically unsigned char) for similar purposes. A char in C is 8 bits on virtually all modern platforms, whereas in Java it is 16 bits due to Unicode support.

Void:

In Java, void exists only as a method return type, while in C it can also be used for generic pointers (void*).

C vs. Python

Python does not have classic primitive types in the sense of the C language because it is a high-level language:

Dynamic typing:

In Python, there are no explicit type declarations as in C (int x = 5;). Types like int, float, and str are objects, and int has no fixed size: it supports arbitrarily large numbers.

Lack of low-level types:

Python has no equivalents for short, long, or unsigned because it abstracts hardware details.

Data types in C programming language

Practical Example: Managing Student Data

All of this may sound very complicated, but through practical examples and pieces of C code, you will understand the essence. This program stores and displays information about a student:

student ID - integer
average grade - floating-point
initials - characters
enrollment status – boolean

Open your terminal and type the following commands.

manuel@manuel-virtual-machine:~$ sudo apt-get update
manuel@manuel-virtual-machine:~$ sudo apt-get upgrade
manuel@manuel-virtual-machine:~$ clear
manuel@manuel-virtual-machine:~$ ls
manuel@manuel-virtual-machine:~$ cd tutorials
manuel@manuel-virtual-machine:/tutorials$ ls
manuel@manuel-virtual-machine:/tutorials$ cd c_tutorial
manuel@manuel-virtual-machine:/tutorials/c_tutorial$ ls
manuel@manuel-virtual-machine:/tutorials/c_tutorial$ mkdir student_data
manuel@manuel-virtual-machine:/tutorials/c_tutorial$ cd student_data
manuel@manuel-virtual-machine:/tutorials/c_tutorial/student_data$ code .

Create a file and name it student.c, then type the following code.

#include <stdio.h>
#include <stdbool.h> // For bool, true, false

int main(void) {
    // Declaration of variables with different primitive types
    unsigned int student_id = 20230015; // Student ID is a positive number only
    float average_grade = 8.75f; // Average grade is a decimal number (the f suffix makes the literal a float)
    char initials[3] = {'M', 'R', '\0'}; // Initials as a character array, \0 marks the end of the string
    bool is_enrolled = true; // Enrollment status, true or false

    // Printing the data
    printf("Student ID: %u\n", student_id);
    printf("Average Grade: %.2f\n", average_grade);
    printf("Initials: %s\n", initials);
    printf("Enrolled: %d\n", is_enrolled); // %d because bool in C is technically 0 or 1

    // Demonstration of additional types
    short birth_year = 1975; // Birth year, small integer
    double height = 1.86; // Height in meters, higher precision
    unsigned char exams_passed = 25; // Number of passed exams, small positive number

    printf("\nAdditional Data: \n");
    printf("Birth Year: %hd\n", birth_year);
    printf("Height: %.2lf\n", height);
    printf("Exams Passed: %hhu\n", exams_passed);

    // Demonstration of overflow
    unsigned char max_value = 255; // Maximum value for unsigned char, 8 bits
    max_value = max_value + 1; // What happens when we exceed the limit?
    printf("Overflow of unsigned char: %hhu\n", max_value); // Wraps around to 0

    return 0;
}

When you execute the given code, you will get the following result.

[Running] cd "d:\tutorials\c_tutorials\student_data\" && gcc student.c -o student && "d:\tutorials\c_tutorials\student_data\"student
Student ID: 20230015
Average Grade: 8.75
Initials: MR
Enrolled: 1

Additional Data:
Birth Year: 1975
Height: 1.86
Exams Passed: 25
Overflow of unsigned char: 0

[Done] exited with code=0 in 11.951 seconds

The first thing you may notice in the code is that for the boolean data type, you need to include the header file <stdbool.h> to use this data type with the values true and false. You do not have to do this in the C++ programming language. Next, pay attention to how comments are written in the C programming language. In the C programming language, comments are parts of the code that are not executed, but are used by programmers to document, explain, or record information within the code. They help make the code more readable and understandable, both for the author and for others who later review or maintain it. C supports two types of comments: single-line and multi-line.

Single-line comment: // This is a single-line comment
Multi-line comment: /* This is a multi-line
comment */

Single-line comments were introduced in the C99 standard. In older versions of C such as ANSI C or C89, they were not supported, so programmers used only multi-line comments. As for multi-line comments, please note that they cannot be nested.

Variables in the C programming language are fundamental elements for storing data in memory during program execution. They allow the programmer to name a specific location in memory, assign a value to it, and manipulate that value during execution. Each variable has:

Name – an identifier used by the programmer.
Data type – defines the type of data that the variable can store, e.g. int, float, char, etc.
Value – the data currently stored in the variable.
Address – the location in memory, accessible through a pointer.

In C, before using a variable, you must declare it, that is, inform the compiler about its type and name. Declaration and initialization (assigning an initial value) are often done in a single statement.

data_type variable_name; // Declaration
data_type variable_name = value; // Declaration with initialization

Naming variables has its rules. A name may contain letters, digits, and underscores, and it must start with a letter or an underscore, not a digit. The C programming language is case-sensitive, which means that it distinguishes between uppercase and lowercase letters. The type determines how much memory the variable occupies and what values it can store, e.g. int usually occupies 4 bytes on 32-bit and 64-bit systems. Variables in C have a scope and a lifetime, which depend on where they are declared in the code. A variable declared inside a function or a code block is local: it is visible only within that block and is destroyed when the block ends. A variable declared outside all functions is global and exists for the whole run of the program.

In addition to variables, you can also see the use of format specifiers, literals, and escape sequences in the program above.

Format specifiers

Format specifiers are used in the C language to format data during input (scanf) and output (printf). They determine how the value of a variable will be interpreted and displayed.

Basic Format Specifiers

Literals

A literal is a fixed value that appears directly in the code, without the need for a variable. Literals have a data type that the compiler automatically recognizes based on their form. In the C language, there are different types of literals:

Integer Literals: E.g. 10, 42, -5. They are of type int by default, but can be long (e.g. 42L) or unsigned (e.g. 42U).
Floating-Point Literals: E.g. 3.14, 0.5, -2.0. They are of type double by default, unless f is added for float (e.g. 3.14f).
Character Literals: E.g. 'A', '7', '\n'. They are written in single quotes ('') and represent a single character. In C their type is int, not char (in C++ it is char).
String Literals: E.g. "Hello", "Number: %d\n". They are written in double quotes ("") and represent a sequence of characters (type char[]).

Escape sequences