What I Learned from OS Development?

Way more than building one...
By Apan Trikha at Wed Aug 25 2021


The operating system: the system software no one cares about until it funks up and makes your life miserable (which happened to me), or maybe you're a pleb who cares too much about it just to flex on others. Either way, it is the system software that makes our hardware go brrrr. That being said, building even a simple one is a hard task. For reference, I built an operating system called ArukaOS, which has a read-only FAT16 filesystem, a basic paging model, round-robin scheduling and a very basic shell. It took me 7 weeks to build, and it was an amazing experience, but saying that I had an easy time would be a bald-faced lie.

Here, I've documented what I've learned over these past 7 weeks about OS development and more. I think it's worth discussing, as this is a path very few will consider viable and even fewer will tread. This article is a summary of my experience and in no way a tutorial on how to build an OS. Even though I've built one, there's no conceivable way I can fit it all in one article; it's just too much.

Be prepared to deal with Dark Work

For those wondering what the hell I mean, allow me to introduce the term "Dark Work". Essentially, dark work is work that users can't immediately see but that is crucial for building the foundations of your system. Writing an OS is dark work in most cases. If you think the "Operating System" is those big buttons with pretty pictures aligned on a bar, you're dead wrong. Those are parts of the "Desktop Environment"; to be precise, they are icons and window decorations.

The crucial tasks of an operating system are:

All of these can be summed up as providing an abstract environment so that programmers can avoid the details of hardware configuration, and to provide that, a significant amount of groundwork needs to be done. This ranges from creating a bootloader, to taking your processor into the desired mode of operation such as Protected Mode (for 32-bit) or Unreal Mode (for 16-bit code to break the 64 KiB limit), to defining protocols for user programs to communicate with the kernel.

I've mentioned just a few of the things needed to create a programming environment, which is the bare minimum for a fully functional operating system. These take time and do not come to fruition quickly, and hence this tests one's patience too.

RTFM is not an option, it's the name of the game

If you're familiar with the famous Linux distributions Arch and Gentoo, then you probably know the acronym RTFM. If not, it means READ THE FUCKING MANUAL! I am not going to sugarcoat this if you feel bad about it, because those are the words anyone will use when they encounter a person who complains about something trivial and is too lazy to read the wikis of those distributions, where the question has already been precisely answered. This is perhaps why people consider these two distributions "elite".

Why RTFM to noobs?

I know what you're thinking, especially if you're using Windows, MacOS or a more "beginner-friendly" distribution of Linux.

What's with the hostility?

Don't you want Arch/Gentoo to be inclusive?

Arch prides itself on being simple, so why RTFM?

A curious reader who hasn't used Arch/Gentoo yet

The reason is simple: RTFM is used to call out the laziness of the individual. Given that both Arch and Gentoo Linux require you to install the distribution manually (using terminal commands), it isn't a wild conclusion that the individual asking a question is knowledgeable enough to

And when these instructions are clear and already available in the wiki, shouldn't reading them be your first step in troubleshooting? There's nothing wrong with asking questions, but solving problems yourself is often a lot quicker than asking on a forum. It's just faster to read an already proposed solution, albeit intimidating at first.

But how does that connect with OS Development?

If you want to write an OS that works with "real world" hardware and that "real people" can use or program on, then you must comply with the pre-existing standards of the industry. Even if you want to make your OS compatible with Microsoft's standards (because Microsoft has a long history of building and imposing its own), as ReactOS is trying to do, you still need to study how they work in order to implement them well.

To make that possible, reading the manual isn't something you do out of free will; you do it to avoid re-inventing other wheels when you are already re-inventing one yourself. For ArukaOS to work with x86 processors, read files and load programs, I had to consult the manuals of the following.

If you want your system to be POSIX compliant, guess what, you gotta read the manual. Want to add USB support? Read the manual to avoid running in circles. In most cases, the manuals only serve as guidelines, and the implementation is still up to you.

OS Development is Tedious

This will sound like I'm repeating my first point, but read it anyway, as there are some frustrating aspects of OS development, and one of them is that things take a lot more time than in other domains of software engineering. For example, take implementing fread in your C library. If you are using an existing library and merely recompiling it, then that isn't an issue. However, since my aim with ArukaOS is primarily to learn how an OS works and is implemented, I implemented my own. For fread to even work as intended, the OS must be able to understand filesystems and stream data to and from them. For data streaming, I had to work on the disk's read/write operations, which are done using assembly instructions and by interacting with the kernel.
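To make that layering concrete, here is a minimal sketch of how an fread-style call can sit on top of a sector reader. This is not ArukaOS's code: the names sketch_fread, sketch_file, disk_read_sector and fake_disk are hypothetical, the "disk" is just an array in memory, and the file is assumed to occupy contiguous sectors (a real FAT16 driver would follow the cluster chain instead, and a real sector read would talk to the disk controller).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE  512u
#define DISK_SECTORS 8u

/* Stand-in for a real disk: a few sectors held in memory. */
static uint8_t fake_disk[DISK_SECTORS * SECTOR_SIZE];

/* Lowest layer. In a real kernel this is where the assembly-level
 * disk access would live; here it only copies from fake_disk. */
static int disk_read_sector(uint32_t lba, uint8_t *out)
{
    if (lba >= DISK_SECTORS)
        return -1;
    memcpy(out, &fake_disk[lba * SECTOR_SIZE], SECTOR_SIZE);
    return 0;
}

/* Hypothetical open-file record, assuming contiguous sectors. */
struct sketch_file {
    uint32_t first_sector; /* where the file's data begins on disk */
    uint32_t size;         /* file length in bytes */
    uint32_t pos;          /* current read offset in bytes */
};

/* Top layer, following fread's contract: returns the number of
 * complete items read. Overflow of size * nmemb is not checked. */
static size_t sketch_fread(void *ptr, size_t size, size_t nmemb,
                           struct sketch_file *f)
{
    uint8_t sector[SECTOR_SIZE];
    uint8_t *dst = ptr;
    size_t want, done = 0;

    assert(f != NULL && f->pos <= f->size);
    if (size == 0 || nmemb == 0)
        return 0;

    /* Never read past the end of the file. */
    want = size * nmemb;
    if (want > f->size - f->pos)
        want = f->size - f->pos;

    while (done < want) {
        uint32_t lba = f->first_sector + f->pos / SECTOR_SIZE;
        uint32_t off = f->pos % SECTOR_SIZE;
        size_t chunk = SECTOR_SIZE - off; /* bytes left in this sector */

        if (chunk > want - done)
            chunk = want - done;
        if (disk_read_sector(lba, sector) != 0)
            break;
        memcpy(dst + done, sector + off, chunk);
        done += chunk;
        f->pos += (uint32_t)chunk;
    }
    return done / size;
}
```

Even in this toy form, a read that starts near the end of one sector has to fetch the next sector to finish, which is exactly the kind of detail a user-level fread call hides from you.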

All of these take time, and one must be patient while working on them; there's no easy way out here. And thinking about copy-pasting code is even more dangerous! Don't tell me I didn't warn you, since debugging it isn't going to be an easy task.

The most tedious of all is confronting your own stupidity. In the process of building an OS, there'll be mistakes that aren't problematic at the user level because the Big 3 OSes (Windows, MacOS, Linux) all handle memory management for you. But now it's your job to make sure that your OS can also handle the programmer's stupidity. This is the secret sauce behind the success of those 3. This also means being good at DSA (data structures and algorithms) is critical, as there's virtually no hand-holding because your program (the OS) is in the driver's seat of the hardware.

Meet the "Debugger"

For the most part, especially during the early phases of Aruka's development, I solved my problems by using a debugger. I think I should dedicate an article to it too, as it is a tool I never used while working at Amazon, yet it made my life easier to a degree I couldn't have imagined. For those wondering, I am using the GNU Debugger (GDB); it is CLI-based but seriously powerful. I can run the project and inspect it at the points that seem pretty "sus" to me.

From the above screenshot, you can see how I can debug my OS by running it and stopping it where needed, instead of keeping track of its state in my head all the time. This is hugely beneficial to me, and I know just how many trivial bugs I was able to find with it. As a result, it turned the wild and probably insane world of OS development into something palatable.

Shifted outlook on C

I don't exactly remember who said that "C is a high level programming language", but I gasped and probably laughed at the absurdity for a moment. I mean, why not? For the longest period of my life I was told it's a low-level programming language, and here was someone claiming the opposite. Then I encountered assembly. Now I thank God every day for the C language, an abstraction over assembly instructions that doesn't hide the key details of writing a program. Apart from that, I also got to see more advanced features of the C language.

C, the abstracted assembler

If you think C is hard and that you have to do a lot of work to write a user-level program, think again, because I'm going to introduce you to the real boss mode of programming: assembly. Assembly instructions are the direct instructions given to a CPU, the readable equivalent of machine code, which is unreadable. I can say that after building a CHIP-8 emulator in a weekend. To emulate CHIP-8, I had to decode the hex-encoded (compact binary) instructions and run commands according to the opcode, which is a simple thing to be honest. But writing that binary file by hand isn't simple, which is why no one goes lower than assembly.
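To give a feel for what "decoding the hex code" means, here is a minimal sketch of a CHIP-8 fetch/decode/execute step. It is not the emulator mentioned above: the struct and function names are made up for illustration, and only three instruction families are handled (1NNN jump, 6XNN load immediate, 7XNN add immediate).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The fields that can be sliced out of one 16-bit CHIP-8 instruction. */
struct chip8_fields {
    uint8_t  op;  /* top nibble: instruction family */
    uint8_t  x;   /* second nibble: register index */
    uint8_t  y;   /* third nibble: register index */
    uint8_t  n;   /* lowest nibble */
    uint8_t  nn;  /* low byte: 8-bit immediate */
    uint16_t nnn; /* low 12 bits: address */
};

/* Hypothetical, heavily trimmed CPU state. */
struct chip8_cpu {
    uint8_t  v[16]; /* registers V0..VF */
    uint16_t pc;    /* program counter */
};

/* CHIP-8 stores instructions big-endian: high byte first. */
static uint16_t chip8_fetch(const uint8_t *mem, uint16_t pc)
{
    return (uint16_t)((mem[pc] << 8) | mem[pc + 1]);
}

static struct chip8_fields chip8_decode(uint16_t opcode)
{
    struct chip8_fields f;

    f.op  = (uint8_t)(opcode >> 12);
    f.x   = (uint8_t)((opcode >> 8) & 0xF);
    f.y   = (uint8_t)((opcode >> 4) & 0xF);
    f.n   = (uint8_t)(opcode & 0xF);
    f.nn  = (uint8_t)(opcode & 0xFF);
    f.nnn = (uint16_t)(opcode & 0x0FFF);
    return f;
}

/* Runs one instruction. Returns 0 on success, -1 for anything this
 * sketch does not implement. */
static int chip8_step(struct chip8_cpu *cpu, const uint8_t *mem)
{
    struct chip8_fields f;

    assert(cpu != NULL && mem != NULL);
    f = chip8_decode(chip8_fetch(mem, cpu->pc));
    cpu->pc += 2; /* every instruction is two bytes long */

    switch (f.op) {
    case 0x1: /* 1NNN: jump to address NNN */
        cpu->pc = f.nnn;
        return 0;
    case 0x6: /* 6XNN: VX = NN */
        cpu->v[f.x] = f.nn;
        return 0;
    case 0x7: /* 7XNN: VX += NN, wrapping at 8 bits */
        cpu->v[f.x] = (uint8_t)(cpu->v[f.x] + f.nn);
        return 0;
    default:
        return -1;
    }
}
```

Reading the bytes 6A 05 as "put 5 in register VA" is easy enough with a table in front of you; producing a whole program as raw bytes like that is the part nobody wants to do by hand.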

The pesky thing about assembly is that your code has to be compatible with a particular architecture, and writing programs for another, even similar, architecture means rewriting the assembly code. This was even more true in the 1980s, when there was a crazy number of computer architectures; different computers weren't just different parts cobbled together but different architectures altogether. Today you can get a feel for this by writing a simple program that should run on both a regular desktop and a Raspberry Pi.

The thing about C is that at compile time, you can decide how the machine code should be generated for the deployment architecture. This is useful for developing an OS, as you can create a 32-bit OS on a 64-bit machine, but to generate 32-bit x86 code you must tell the C compiler to do so. This introduced me to the concept of cross-compiling because, obviously, I'm building my own platform to build and run programs on.
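A small way to see this for yourself is a sketch like the one below, which reports what the compiler was targeting when it built the code. The function names are made up for illustration; the `__x86_64__`, `__i386__`, `__aarch64__` and `__arm__` macros are predefined by GCC and Clang, so other compilers may report "unknown". With GCC on a 64-bit x86 Linux host, passing -m32 asks for 32-bit x86 code (the 32-bit support libraries must be installed). Hobby OS projects often use a dedicated cross-compiler such as i686-elf-gcc instead; which setup ArukaOS used is not stated here.

```c
#include <assert.h>
#include <limits.h>

/* Pointer width of the target the compiler generated code for. */
static int target_pointer_bits(void)
{
    assert(CHAR_BIT == 8); /* this sketch assumes 8-bit bytes */
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* Name of the target architecture, as seen by the compiler.
 * Relies on macros predefined by GCC and Clang. */
static const char *target_arch_name(void)
{
#if defined(__x86_64__)
    return "x86-64";
#elif defined(__i386__)
    return "x86 (32-bit)";
#elif defined(__aarch64__)
    return "AArch64";
#elif defined(__arm__)
    return "ARM (32-bit)";
#else
    return "unknown";
#endif
}
```

Printing both values from a main function and building the same file once normally and once with -m32 should show the pointer width change without touching a line of the source, which is the whole point of letting the compiler deal with the architecture.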

ELF and linking process in C

Ever wondered what those DLL files are if you're on Windows, or maybe the .so files on Linux or the .dylib files on MacOS? And in some cases, you have an executable that packs everything and doesn't need anything else, for example the Godot Game Engine, which is a single binary regardless of the OS you're using. The difference lies in how the program has been linked.

The former is the case of dynamic linking, where the dependencies are pulled in at run time, while in the latter the dependencies are linked statically. But what I learned goes far beyond what I just explained: I learned how the binary handles the links and how one can observe them. This may not be the most accurate description, but it's good enough to convey the difference. Right now, I won't go over the ELF format completely, as that would make this article way longer than needed; I'll link to this page for the information.

For ArukaOS, I built an ELF loader to load binaries whose dependencies are statically linked, and with that I observed how links are formed. For example, take a look at the symbol table of my OS' "hello world" program.

On looking, you'll realize that it has way more than needed, but that's because I included other files to test other library functions, as I have also built my own C library. But you can see that all functions from the source are included in the executable. This is what we mean by static linking: the links are made within the executable, and the functions are called as if someone had written one massive C program. On the other hand, here's the "hello world" program I made on Linux in the usual way (just using gcc hello.c).

Notice it has .dynsym along with .symtab. That's the symbol table for dynamic links: the dependencies aren't included in the executable, I mean they are dynamically linked. They just need to be pulled from the specified locations at run time.
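Going back to the ELF loader mentioned earlier, the very first thing any loader has to do is check that the file really is an ELF binary of the kind it can handle. The sketch below is not the ArukaOS loader; the function and constant names are hypothetical. It only validates the identification bytes and pulls the entry point out of a 32-bit little-endian header, using the offsets defined by the ELF format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define E_IDENT_SIZE       16 /* identification bytes at the start of the file */
#define EI_CLASS_OFFSET     4 /* 1 = 32-bit objects, 2 = 64-bit objects */
#define EI_DATA_OFFSET      5 /* 1 = little-endian, 2 = big-endian */
#define ELF32_ENTRY_OFFSET 24 /* e_entry field in a 32-bit ELF header */
#define ELF32_HEADER_SIZE  52 /* total size of a 32-bit ELF header */

enum elf_kind { ELF_INVALID, ELF_32, ELF_64 };

/* Checks the magic bytes 0x7F 'E' 'L' 'F' and reports the file class. */
static enum elf_kind elf_classify(const uint8_t *image, size_t len)
{
    if (image == NULL || len < E_IDENT_SIZE)
        return ELF_INVALID;
    if (image[0] != 0x7F || image[1] != 'E' ||
        image[2] != 'L' || image[3] != 'F')
        return ELF_INVALID;

    switch (image[EI_CLASS_OFFSET]) {
    case 1:  return ELF_32;
    case 2:  return ELF_64;
    default: return ELF_INVALID;
    }
}

/* Reads the entry point of a 32-bit little-endian ELF image.
 * Returns 0 on success, -1 if the image is not of that kind. */
static int elf32_entry(const uint8_t *image, size_t len, uint32_t *out_entry)
{
    const uint8_t *p;

    assert(out_entry != NULL);
    if (elf_classify(image, len) != ELF_32 || len < ELF32_HEADER_SIZE)
        return -1;
    if (image[EI_DATA_OFFSET] != 1)
        return -1;

    /* Assemble the value byte by byte so host endianness doesn't matter. */
    p = image + ELF32_ENTRY_OFFSET;
    *out_entry = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                 (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    return 0;
}
```

A real loader still has to walk the program headers, copy each segment to its address and only then jump to the entry point. For a statically linked binary there is no .dynsym to resolve at that stage, which is what makes it a reasonable first target.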

Conclusion

Developing my OS has been a great experience; it's a formidable intellectual challenge and an educational experience more than anything else. I don't recommend pursuing this project until you're confident dealing with C, assembly, computer architecture and reading official documentation. These are important things that one must be aware of before delving into OS development.

But if you're confident with these and want to build your own OS to better learn how they work, then go for it. Just don't expect it to be built in a weekend or two; a well-functioning OS will take weeks if not months to build. Despite being a quick learner with weeks' worth of free time, it took me 7 weeks to build the OS equivalent of "Hello World". And if you think you can make it after going through a semester of C programming class, read the beginner mistakes mentioned in the OSDev Wiki. Until then, stay safe.