Systems Guide
Table of Contents
General Skills
Basics of C++
Basics of C
Tooling for C/C++
Object Oriented Programming
STL
Concurrency
Compilers and Interpreters
Lexing
Parsing
Variables, Conditions and More
Compilers and Code Generation
The Road Ahead
KernelDev
Network Programming
Network Models and Architecture
OSI (Open Systems Interconnection) Model
TCP/IP Model
Protocols (by layer)
Advanced Network Programming Concepts
Blocking and Non-Blocking I/O
Concurrency Models
Performance Optimization
Socket Programming
Some Practical Assignments
Further Resources
General Skills
Basics of C++
- learncpp.com: A good tutorial for beginners
- C++ Primer, 5th Edition: For beginners who prefer working with a book. Dives more in-depth than learncpp.com.
- A Tour of C++ by Bjarne Stroustrup: For those with experience in another programming language, needing a quick start with C++.
It is not essential to finish all of the resources mentioned above. However, for a smooth experience with the rest of this guide, we highly recommend completing the parts of them that correspond to the General Skills sections. Most topics in Systems require a good deal of prior programming experience in these languages, so get well acquainted with them.
Basics of C
- Beej’s tutorial: Most comprehensive tutorial introducing all concepts.
- Beej’s library reference: Library reference, covering all C stdlib functions.
The above two books should introduce a beginner to all features of the language and the functions in stdlib, providing examples, common pitfalls, and references to the C standard.
Tooling for C/C++
Apart from the language itself, a few tools become really important when you are dealing with large projects:
- Build systems: makefiletutorial.com. Makefiles make it harder to manage multiple libraries, so a preferred, albeit equally tedious, way to manage libraries is CMake: cmake.org
- Debugging is an important skill in all of programming. For C/C++ a popular debugger is GDB. Here’s a quick tutorial that should get you started. And here’s a popular cheatsheet. You need not know every single command but you should be familiar with commonly used ones at least.
Make sure to interlace studying theory and tutorials with making some simple projects on your own. It’s ok, and in fact recommended, to aim for projects where you don’t know all the concepts required to complete them.
Object Oriented Programming
This is relevant if you’re using C++. The book and website mentioned above for the basics of C++ should also cover this. Apart from this, make projects in C++ and those will inevitably need OOP concepts.
STL
STL is an important tool for C++ programmers as it implements a lot of useful data structures and algorithms.
Again, if you’re following the book or website then this will probably be covered. Otherwise, to get familiar with STL you can watch this:
- Complete C++ STL in 1 Video | Time Complexity and Notes
OR read this:
- [The Complete Practical Guide to C++ STL (Standard Template Library) by Abhishek Rathore Medium](https://abhiarrathore.medium.com/the-magic-of-c-stl-standard-template-library-e910f43379ea)
(Taken from the CP roadmap). Rather than having extensive knowledge, it’s important to be familiar with the available data structures and how to apply them, since you’ll need those quite often in software development.
Concurrency
Most software appears to do several things at the same time. Dealing with multiple tasks at once is concurrency. It is slightly different from parallelism, which, in a sense, is “True Concurrency”: it uses (and thus requires) multiple cores of a CPU, executing tasks in parallel on different cores.
Initially, a general idea of how concurrent systems work is enough. Before watching the talk recommended at the end of this paragraph, get an idea of what a thread is, what a process is, etc. For this, properly go through multiple answers AND comments (always a good practice with Stack Exchange) in the following thread ;) on Stack Exchange: What is the difference between a process and a thread?. For a decently complete, C++-specific treatment of concurrency and concurrent systems, refer to Chapter 13 of A Tour of C++, titled Concurrency. We highly recommend the following talk to get an idea of when to use concurrency: CppCon 2017: Ansel Sermersheim “Multithreading is the answer. What is the question?”.
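To make the idea of threads sharing state a bit more concrete, here is a minimal sketch. It is written in Python rather than C++ purely for brevity, and the function name `count_words` is made up for this illustration. Several threads each count the words in one chunk of text and add their result to a shared total, with a lock guarding the shared variable. Note that in the standard CPython build, threads usually take turns rather than running on several cores at once, so this shows concurrency and not necessarily parallelism.

```python
import threading


def count_words(chunks):
    """Count words across all chunks, using one thread per chunk."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        words_in_chunk = len(chunk.split())
        # The lock makes the read-modify-write on the shared total atomic.
        with lock:
            total += words_in_chunk

    threads = [threading.Thread(target=worker, args=(chunk,)) for chunk in chunks]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait for every worker before reading the total
    return total
```

Calling `count_words(["a b", "c d e"])` starts two threads and returns the combined count once both have finished.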
Compilers and Interpreters
This section deals with how to build compilers and interpreters. By the end of it you should be able to create your own programming language and maybe even contribute to an open source compiler/interpreter of your favourite language.
Lexing
The input to a compiler is the source code, which is simply a raw list of characters. This is too low level to make sense of so we first extract tokens from a raw string - this process is called lexing/lexical analysis.
- This chapter of the Crafting Interpreters book focuses on building a lexer for the Lox programming language implemented in the book.
- LLVM’s Kaleidoscope tutorial on writing a lexer.
- This is a tutorial on writing a parser. The first part involves making a lexer: https://lisperator.net/pltut/parser/
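As a rough illustration of what lexing looks like, here is a small sketch of a regex-based lexer for arithmetic expressions. The token names (`NUMBER`, `IDENT`, ...) and the `lex` function are assumptions made for this example and are not taken from any of the tutorials above.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Each entry is (token kind, regular expression). Earlier entries win ties.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))


def lex(source: str) -> Iterator[Token]:
    """Yield tokens from the raw character string, skipping whitespace."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        pos = match.end()
        if match.lastgroup != "SKIP":
            yield Token(match.lastgroup, match.group())
```

For instance, `list(lex("3 + sin(0)*3"))` turns the raw string into eight tokens, starting with `Token("NUMBER", "3")` and `Token("OP", "+")`.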
Parsing
Once you have a list of tokens the next step is to organize them in a more meaningful manner than a simple list. Usually this means making an abstract syntax tree.
- This chapter and the next chapter from the book “Crafting Interpreters” explain it well. This book is a great resource that I will be referring to again later. Note that this chapter includes code that builds on some previous code in the book and it also deals with error handling. These can be safely ignored for now.
- The same tutorial referenced above. As I said its main focus is writing a parser: https://lisperator.net/pltut/parser/
Now that you know how to make a lexer and a parser, try to apply your knowledge to a simple project.
Try to make a mathematical expression parser which takes an input string such as 3 + sin(0)*3 for example and prints the answer.
Note: This can also be done without following the usual pipeline needed for a programming language compiler. In particular, you don’t necessarily need to make an AST.
Look into the Shunting Yard algorithm and Reverse Polish notation. But since our goal is to make a compiler/interpreter, try doing it that way.
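If you get stuck on the exercise above, the following is one possible sketch of the lexer → AST → evaluation pipeline, using a hand-written recursive descent parser. The grammar, the tuple-based AST shape, and all names (`tokenize`, `Parser`, `evaluate`, `calculate`) are choices made for this example only; it supports just `+ - * /`, unary minus, parentheses, and a few single-argument functions.

```python
import math
import operator
import re

FUNCTIONS = {"sin": math.sin, "cos": math.cos, "sqrt": math.sqrt}
BINARY = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def tokenize(src):
    tokens = re.findall(r"\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*/()]", src)
    # findall skips what it cannot match, so check that nothing was dropped.
    if "".join(tokens) != "".join(src.split()):
        raise SyntaxError("unexpected character in input")
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expect(self, text):
        if self.advance() != text:
            raise SyntaxError(f"expected {text!r}")

    def expr(self):  # expr := term (('+' | '-') term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            node = ("bin", self.advance(), node, self.term())
        return node

    def term(self):  # term := unary (('*' | '/') unary)*
        node = self.unary()
        while self.peek() in ("*", "/"):
            node = ("bin", self.advance(), node, self.unary())
        return node

    def unary(self):  # unary := '-' unary | primary
        if self.peek() == "-":
            self.advance()
            return ("neg", self.unary())
        return self.primary()

    def primary(self):  # primary := NUMBER | NAME '(' expr ')' | '(' expr ')'
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            node = self.expr()
            self.expect(")")
            return node
        if token[0].isdigit():
            return ("num", float(token))
        if token[0].isalpha() or token[0] == "_":
            self.expect("(")
            argument = self.expr()
            self.expect(")")
            return ("call", token, argument)
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(node):
    """Walk the AST and compute its value."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "neg":
        return -evaluate(node[1])
    if kind == "bin":
        _, symbol, left, right = node
        return BINARY[symbol](evaluate(left), evaluate(right))
    _, name, argument = node  # the only remaining kind is "call"
    return FUNCTIONS[name](evaluate(argument))


def calculate(src):
    parser = Parser(tokenize(src))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("unexpected trailing input")
    return evaluate(tree)
```

With this sketch, `calculate("3 + sin(0)*3")` parses the multiplication before the addition because `term` sits below `expr` in the grammar, which is how operator precedence falls out of the structure of the parser.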
Variables, Conditions and More
Now we’ll move on to the more interesting stuff. But first, to get an idea of what the architecture of a compiler looks like and how you may go about designing one, read the top answer here: https://softwareengineering.stackexchange.com/questions/165543/how-to-write-a-very-basic-compiler
- Crafting Interpreters: A great introductory book. It teaches most of the fundamental concepts that we’ve covered and eventually develops an interpreter for a toy programming language.
Follow this book to make a relatively complete (although not yet usable for production) programming language.
The first part (tree walk interpreter) is written in Java but I recommend trying this out in any object-oriented language of your choice.
Note that the book makes an interpreter - first a tree walk interpreter and later a bytecode VM. The working of the bytecode VM is very similar to a compiler, the difference being that a compiler generates assembly/machine code for a specific CPU architecture. Bytecode, on the other hand is a made-up instruction set and the VM emulates a chip running this instruction set.
- This playlist on YouTube is about making an interpreter.
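To give a feel for the “made-up instruction set” idea behind a bytecode VM, here is a tiny sketch of a stack-based VM together with a compiler from a tuple-based AST to its bytecode. The instruction set (`Op`), the AST shape, and the function names are invented for this example and are far simpler than what Crafting Interpreters builds.

```python
from enum import Enum, auto


class Op(Enum):
    PUSH = auto()  # push a constant onto the stack
    ADD = auto()
    SUB = auto()
    MUL = auto()
    DIV = auto()


BINARY_OPS = {"+": Op.ADD, "-": Op.SUB, "*": Op.MUL, "/": Op.DIV}


def compile_expr(node):
    """Flatten an AST like ("bin", "+", ("num", 1), ("num", 2)) into bytecode."""
    kind = node[0]
    if kind == "num":
        return [(Op.PUSH, node[1])]
    if kind == "bin":
        _, symbol, left, right = node
        # Operands first, operator last: this is postfix order.
        return compile_expr(left) + compile_expr(right) + [(BINARY_OPS[symbol],)]
    raise ValueError(f"unknown node kind {kind!r}")


def run(bytecode):
    """Execute the bytecode on a stack machine and return the final value."""
    stack = []
    for instruction in bytecode:
        op = instruction[0]
        if op is Op.PUSH:
            stack.append(instruction[1])
            continue
        right = stack.pop()
        left = stack.pop()
        if op is Op.ADD:
            stack.append(left + right)
        elif op is Op.SUB:
            stack.append(left - right)
        elif op is Op.MUL:
            stack.append(left * right)
        elif op is Op.DIV:
            stack.append(left / right)
    return stack.pop()
```

The `run` loop plays the role of the emulated chip: it fetches one instruction at a time and updates the stack, much like a real CPU would update its registers.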
Compilers and Code Generation
An interpreter goes through the source code statement by statement and executes it on the fly. A compiler, on the other hand, converts the source code into native machine code, which can then be run as an executable. An interpreter may also take an intermediate approach, such as the bytecode VM in Crafting Interpreters.
Interpreters and compilers are quite similar and use a lot of the same techniques. Once you’ve learned how to make an interpreter you can attempt to make a compiler for one specific CPU architecture. You will need to go through a reference manual for assembly and CPU instruction sets.
- This playlist goes through the development of a compiler for a simple language through assembly code generation.
- Chapter 2 of The dragon book goes through making a mini-compiler (Yes in just a chapter, that’s how comprehensive this book is). It is a classic book on compiler design, but it is a fairly advanced writeup and is to be mainly used as a reference.
If you just want to dip your toes into generating assembly code, then a simple project could be to make a compiler for an esolang like Brainfuck. BF is a famous esolang with a very simple instruction set, yet it is Turing complete.
Since it has very simple instructions, writing an interpreter would be trivial. But it could be a good exercise to write a compiler for it that converts these instructions into assembly or even machine code.
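As a warm-up before targeting real assembly, here is a sketch of the same idea with an easier target: it translates Brainfuck into Python source code instead of assembly or machine code, then loads the generated code. This is a deliberate simplification, and `compile_bf`, `load`, and the generated `program` function are names invented for this example. The structure (one fixed snippet of output per BF instruction, with loops mapped to the target’s loop construct) carries over to an assembly backend.

```python
def compile_bf(source: str) -> str:
    """Translate Brainfuck source into the text of a Python function."""
    lines = [
        "def program(input_bytes=b''):",
        "    tape = [0] * 30000",
        "    ptr = 0",
        "    inp = iter(input_bytes)",
        "    out = bytearray()",
    ]
    depth = 1  # current indentation level inside the generated function

    def emit(statement):
        lines.append("    " * depth + statement)

    for ch in source:
        if ch == ">":
            emit("ptr += 1")
        elif ch == "<":
            emit("ptr -= 1")
        elif ch == "+":
            emit("tape[ptr] = (tape[ptr] + 1) % 256")
        elif ch == "-":
            emit("tape[ptr] = (tape[ptr] - 1) % 256")
        elif ch == ".":
            emit("out.append(tape[ptr])")
        elif ch == ",":
            emit("tape[ptr] = next(inp, 0)")  # 0 on end of input
        elif ch == "[":
            emit("while tape[ptr]:")
            depth += 1
            emit("pass")  # keeps an empty loop body syntactically valid
        elif ch == "]":
            if depth == 1:
                raise SyntaxError("unmatched ']'")
            depth -= 1
        # Any other character is a comment in Brainfuck and is ignored.
    if depth != 1:
        raise SyntaxError("unmatched '['")
    emit("return bytes(out)")
    return "\n".join(lines)


def load(source: str):
    """Compile Brainfuck to Python source and return the resulting function."""
    namespace = {}
    exec(compile_bf(source), namespace)
    return namespace["program"]
```

For example, `load("++++++++[>++++++++<-]>+.")()` builds 8 × 8 + 1 = 65 in the second cell and outputs the byte for `A`.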
The Road Ahead
By this point you probably have a good handle on the techniques used in compiler and interpreter development. Now you can go wild and explore new techniques on your own.
I will not make this section like a roadmap but instead put out some random ideas and resources.
- Compiler architectures like LLVM will often handle the backend (code generation and optimisation) for you in a practical setting since this is a very hard task and generating machine code for several different platforms on your own is virtually impossible.
So learn how to use tools like LLVM. https://llvm.org/docs/tutorial/
- Play around with open source compilers like GCC or your favourite language’s compiler/interpreter. Maybe you could try to contribute a few bug fixes or simply locally tweak some stuff and see what changes.
- JIT Compilation - This is a hybrid of interpreting and compilation where native machine code is generated on the fly and executed. For example, JavaScript is usually JIT compiled.
This technique allows for optimisations on parts of code that are being executed more often. Once again, you could start out by attempting to do JIT compilation on a simple esolang like Brainfuck, like in this video.
- Compiler optimisations. This is another advanced topic you may explore. A lot of code written by the user can be optimised during compilation.
For example, if you `x++` twice in a row you could replace that with a single `x += 2`. The dragon book has a chapter or two on this. Here’s a paper on some compiler optimisations: https://www.clear.rice.edu/comp512/Lectures/Papers/1971-allen-catalog.pdf
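The “two increments become one `+= 2`” idea can be sketched as a tiny peephole pass. The instruction format used here, `(op, variable, amount)` tuples, is a hypothetical intermediate representation invented for this example; real compilers work on much richer IRs.

```python
def fold_increments(instructions):
    """Merge adjacent ("add", var, n) instructions that target the same variable."""
    optimised = []
    for op, var, amount in instructions:
        if op == "add" and optimised:
            prev_op, prev_var, prev_amount = optimised[-1]
            if prev_op == "add" and prev_var == var:
                # Two back-to-back adds on one variable collapse into one.
                optimised[-1] = ("add", var, prev_amount + amount)
                continue
        optimised.append((op, var, amount))
    return optimised
```

So `[("add", "x", 1), ("add", "x", 1)]` becomes `[("add", "x", 2)]`, while adds separated by any other instruction, or adds to different variables, are left alone because merging them could change what the program observes.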
KernelDev
OSDev Wiki - A great resource for learning operating system development from scratch. Make sure to read the Introduction and Beginner Mistakes sections. Learn about the environment, CPU, kernels, storage devices, memory management, booting, and the “Tools” section. Read the in-page links.
Another good resource is Linux Kernel Development. I would suggest going through the first 3-4 pages of every chapter of this book - this will give you a high-level understanding of how the Linux kernel is structured and maintained. You can of course go deeper and read more in-depth.
Extended learning & Roadmaps
- Linux Roadmap - While it’s not focused directly on writing kernel code, Linux Roadmap is an important foundation for anyone aiming to contribute to or build a kernel.
- Linux Kernel Developer Roadmap - You can learn up to module 3. The rest of it has yet to be updated.
Network Programming
At its core, network programming deals with communication between processes over a network using protocols like TCP/IP and UDP. Examples can include sending a message, fetching a webpage, transferring files, syncing game state and more.
Network Models and Architecture
OSI (Open Systems Interconnection) Model
This one is a classic. It might seem a bit academic at first glance, but this blog does a pretty good job of explaining the OSI model.
TCP/IP Model
While most modern networks use the TCP/IP stack, the OSI model remains a foundational tool for learning and discussing network architecture.
Guide: If you are just getting started, Beej’s Guide to Networking Concepts is an absolute gem. This amazing guide should give you a basic understanding of networking concepts. Don’t get overwhelmed by it though, you can cover it at your own pace :)
Protocols (by layer)
We will now get a high-level overview of what these are and where and how they are used.
- Application Layer:
- HTTP/1.x, HTTP/2: This is basically how your web browsers talk to websites.
- FTP (File Transfer Protocol): for, well, file transfer across networks.
- SMTP: Simple Mail Transfer Protocol
- DNS: It is like the phonebook of the internet, mapping domain names to IP addresses.
- SSH (Secure Shell), TLS/SSL (security): These are like your guardians, keeping your connections secure and private.
- WebSockets: For real-time and interactive experience – think live chat or game updates without constantly refreshing.
- RPC frameworks: These let programs on different computers “call” functions on each other as if they were local. Modern ones like gRPC leverage the power of HTTP/2’s multiplexed streams. Check out this article for an in-depth overview of HTTP/2.
- HTTP/3: This runs over QUIC, promising faster and more reliable connections.
- Transport Layer:
- This layer manages end-to-end data transmission between systems using protocols like TCP and UDP.
- TCP ensures that data arrives reliably and in order, retransmitting anything that gets lost along the way.
- UDP is more like throwing the packet – fast but unreliable. Some use cases for UDP include media streaming, DNS queries, and certain online games where speed outweighs the guaranteed delivery.
- Protocols like RTP (Real-time Transport Protocol) and RTCP are often grouped here too, although they usually run on top of UDP. They are crucial for streaming audio and video data over IP networks.
- Network / Internet Layer:
This layer is responsible for routing packets between different networks – more like a GPS for your data.
- IP (Internet Protocol): It gives every device a unique address. (Difference between IPv4 and IPv6)
- ICMP: This helps with error messages and diagnostics (example: `ping`).
- ARP: This maps IP addresses to MAC addresses on the local network.
- You may want to get familiar with concepts such as Subnetting, NAT, CIDR.
- Data Link Layer
It manages MAC addressing, switches/bridges, etc., aiming for error-free transmission of data between directly connected devices.
- MAC address vs IP address: MAC addresses are hardware-based unique identifiers for network interfaces, while IP addresses are logical addresses used for identifying devices and routing data across networks.
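Subnetting and CIDR from the Network / Internet Layer notes above are easier to grasp with a few numbers in hand. The sketch below uses Python’s standard `ipaddress` module; the helper names `summarise` and `subnet_containing` and the example addresses are made up for this illustration.

```python
import ipaddress


def summarise(cidr):
    """Return a few basic facts about a network written in CIDR notation."""
    network = ipaddress.ip_network(cidr)
    return {
        "netmask": str(network.netmask),
        "total_addresses": network.num_addresses,
        "usable_hosts": len(list(network.hosts())),
        "broadcast": str(network.broadcast_address),
    }


def subnet_containing(address, cidr, new_prefix):
    """Split `cidr` into subnets of length `new_prefix` and find the one holding `address`."""
    ip = ipaddress.ip_address(address)
    for subnet in ipaddress.ip_network(cidr).subnets(new_prefix=new_prefix):
        if ip in subnet:
            return str(subnet)
    return None
```

For example, `summarise("192.168.1.0/24")` reports 256 addresses of which 254 are usable for hosts, and `subnet_containing("192.168.1.70", "192.168.1.0/24", 26)` shows which of the four /26 subnets that address falls into.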
Advanced Network Programming Concepts
Blocking and Non-Blocking I/O
With blocking I/O, when a client makes a request to connect with the server, the thread that handles that connection is blocked until there is some data to read, or the data is fully written. With non-blocking I/O, we can use a single thread to handle multiple concurrent connections.
This explains how frameworks like Node.js handle concurrency efficiently.
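The difference between the two modes can be seen in a few lines. This sketch uses a connected pair of local sockets from `socket.socketpair()` so that it needs no real server; the function name `demo` is just for illustration. A non-blocking `recv` with nothing to read raises `BlockingIOError` immediately instead of waiting, and `select` is then used to learn when data is actually available.

```python
import select
import socket


def demo():
    """Show a non-blocking read failing fast, then succeeding once data exists."""
    sender, receiver = socket.socketpair()
    receiver.setblocking(False)
    events = []

    try:
        receiver.recv(1024)  # nothing has been sent yet
    except BlockingIOError:
        # A blocking socket would have waited here; this one returns at once.
        events.append("would block")

    sender.sendall(b"hello")
    # Ask the OS which sockets are ready to read, waiting at most one second.
    readable, _, _ = select.select([receiver], [], [], 1.0)
    if receiver in readable:
        events.append(receiver.recv(1024))

    sender.close()
    receiver.close()
    return events
```

Because the thread is never stuck inside `recv`, it is free to check on other sockets in the meantime, which is what lets one thread serve many connections.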
Concurrency Models
These are strategies for how a program handles multiple tasks or operations at the same time.
- Threading vs Processes: This article explains this well.
- Event-Driven Programming: In this model, instead of spawning a new thread/process for every connection, the program sets up “event listeners”. When something happens (like new data arriving on a socket), an “event” is triggered, and a small piece of code (a “callback”) is executed. This allows a single thread to manage thousands of connections efficiently by simply reacting to events as they occur.
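The event-driven model described above can be sketched with Python’s standard `selectors` module. Here a single thread registers a callback for each connection and runs a small event loop; the connections are local socket pairs so the example is self-contained, and the names `serve`, `on_data`, and `demo` are invented for this illustration.

```python
import selectors
import socket


def serve(server_socks):
    """Handle all sockets from one thread, reacting to 'readable' events."""
    selector = selectors.DefaultSelector()
    received = {}

    def on_data(sock, conn_id):
        data = sock.recv(1024)
        if data:
            received[conn_id] = received.get(conn_id, b"") + data
        else:
            # An empty read means the peer closed the connection.
            selector.unregister(sock)
            sock.close()

    for conn_id, sock in enumerate(server_socks):
        sock.setblocking(False)
        # Attach the callback and its argument to the registration.
        selector.register(sock, selectors.EVENT_READ, data=(on_data, conn_id))

    # The event loop: wait for events, then dispatch to the stored callback.
    while selector.get_map():
        for key, _mask in selector.select(timeout=1.0):
            callback, conn_id = key.data
            callback(key.fileobj, conn_id)
    selector.close()
    return received


def demo():
    pairs = [socket.socketpair() for _ in range(3)]
    for index, (client, _server) in enumerate(pairs):
        client.sendall(f"hello from {index}".encode())
        client.close()
    return serve([server for _client, server in pairs])
```

The loop never creates a thread per connection; it only runs `on_data` for sockets the operating system reports as ready, and it exits once every connection has been closed.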
Performance Optimization
- Caching: Storing frequently accessed data closer to where it is needed (e.g., in memory, local file system or dedicated caching server like Redis). This reduces the need to refetch data over the network, speeding things up significantly.
- Load Balancing: Distributing traffic across multiple servers in a server farm. This ensures no single server is overwhelmed, improves response time, and provides high availability.
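Both ideas above fit in a few lines as a toy sketch. `Origin` stands in for a slow remote server, `make_cached_fetch` wraps it in an in-memory LRU cache from the standard library, and `RoundRobinBalancer` hands out servers in turn. All of these names are hypothetical, and round-robin is only one of several common balancing strategies.

```python
import functools
import itertools


class Origin:
    """Stand-in for a remote server; counts how often it is actually hit."""

    def __init__(self):
        self.requests = 0

    def fetch(self, key):
        self.requests += 1
        return f"value-for-{key}"


def make_cached_fetch(origin, maxsize=128):
    """Return a fetch function that remembers up to `maxsize` recent results."""

    @functools.lru_cache(maxsize=maxsize)
    def cached_fetch(key):
        return origin.fetch(key)

    return cached_fetch


class RoundRobinBalancer:
    """Pick servers in a fixed rotation so each gets an equal share."""

    def __init__(self, servers):
        self._rotation = itertools.cycle(servers)

    def pick(self):
        return next(self._rotation)
```

Fetching the same key twice through `make_cached_fetch(origin)` reaches the origin only once, and repeated calls to `pick()` on a balancer built from `["s1", "s2"]` alternate between the two servers.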
Socket Programming
- Beej’s Guide to Network Programming
A good resource. Skim through it and get an introduction, learn the basics like what sockets are, about TCP/IP connections, how to send/receive data, and open/close a TCP/IP connection.
- To get a high-level overview, check out Bytemonk’s video.
- This is a nice, quick read for understanding the core concept.
- Linux IP Networking - (chapters 2-9) For those who want to get deep into the Linux side of things, this is an excellent resource. It gets a bit more advanced, but it’s incredibly rewarding.
- Deep dive into iptables and netfilter architecture
Understanding iptables and netfilter is crucial for network security and traffic control on Linux.
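Coming back to the socket basics that Beej’s guide covers (opening a TCP connection, sending and receiving data, and closing the connection), here is a minimal sketch of a TCP echo server and client on the loopback interface. The function names `start_echo_server` and `echo_once` are made up for this example, the server handles exactly one client, and error handling is deliberately left out.

```python
import socket
import threading


def start_echo_server(host="127.0.0.1"):
    """Start a one-shot echo server in a background thread; return (port, thread)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0 asks the OS for any free port
    server.listen()
    port = server.getsockname()[1]

    def serve_one_client():
        connection, _address = server.accept()
        with connection:
            while True:
                data = connection.recv(1024)
                if not data:  # empty read: the client finished sending
                    break
                connection.sendall(data)
        server.close()

    thread = threading.Thread(target=serve_one_client, daemon=True)
    thread.start()
    return port, thread


def echo_once(port, message, host="127.0.0.1"):
    """Connect, send `message`, and return everything the server sends back."""
    with socket.create_connection((host, port), timeout=5) as client:
        client.sendall(message)
        client.shutdown(socket.SHUT_WR)  # signal that no more data will be sent
        chunks = []
        while True:
            data = client.recv(1024)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

A typical run is `port, thread = start_echo_server()` followed by `echo_once(port, b"ping")`. The client loops on `recv` because TCP is a byte stream: a single `recv` call is not guaranteed to return the whole message.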
Some Practical Assignments
Here are some project ideas with increasing complexity to solidify your understanding. Pick a language (Python is great for quick prototyping, C++ is great for performance).
- Simple Chat Application (Command Line)
- Goal: Build a client-server chat app.
- File Transfer
- Goal: Create a program to send and receive files from one computer to another with proper error handling.
- HTTP Proxy Server
- Goal: Build a basic HTTP proxy that forwards requests.
- Websocket-Based Real-Time Application
- You can brainstorm ideas for this. An example may include a collaborative whiteboard.
Further Resources
- Building a TCP/IP stack - Video
- Some channels and blogs you can follow
- NetworkChuck
- ByteMonk
- LowLevel
- Beej’s Blog
- Books
- Computer Networking: A Top-down Approach by Kurose and Ross (Classic one)
- For socket programming: Unix Network Programming, Volume 1: The Sockets Networking API by W. Richard Stevens.
- High-Performance Browser Networking by Ilya Grigorik
- Designing Data-Intensive Applications by Martin Kleppmann
Contributors
- Austin Shijo | +91 8946061070
- Shivansh Jaiswal | +91 9971104638
- Sujal Satish Montangi | +91 7349439500