So you've decided to master assembler, but first you want to understand what it will give you as a programmer. Is it worth entering the world of programming through assembler, or is it better to start with a high-level language? And do you need to know assembler at all to become a full-fledged programmer? Let's sort it all out in order.
Dive into assembler
This is the introductory article in the Dive Into Assembler course, which we are publishing to mark the course's completion. Its full text is available without a subscription. After reading it, you can move on to the other articles in the course.
Why learn assembler?
It is worth learning assembler if you want to:
- understand how computer programs work, in detail, at every level, right down to machine code;
- develop programs for tiny embedded systems, for example, 4-bit microcontrollers;
- understand what is under the hood of high-level languages;
- create your own compiler, optimizer, JIT runtime, virtual machine, or something similar;
- break, debug, or protect computer systems at the lowest level.
Many security flaws appear only at the machine code level and can only be addressed there.
You don't need to learn assembler if all you want is to:
- speed up your other programs.
Modern optimizing compilers do this very well, and you are unlikely to outperform them.
Who will produce the best assembly code?
Why is it almost impossible to outperform the compiler? Look: is it obvious to you that you can't beat a computer at chess, even if you play better than the author of the chess program? It's the same story with optimizing compilers, except that the compiler plays not with chess pieces but with the context in which the code runs.
In modern processors, virtually nothing that affects performance can be discussed out of context. The same sequence of ten assembly instructions can run at drastically different speeds (thousands or even millions of times apart), depending on a whole bunch of very different circumstances.
- Is the data you are accessing right now in the cache or not? And what about the instructions themselves?
- If neither the data nor the code is in the cache, is the processor quietly prefetching them, on the assumption that they will be needed soon?
- Which instructions were executed just before our ten? Are they still in the pipeline?
- Have we by chance reached the end of the current virtual memory page? Then, God forbid, a good half of our ten instructions will land on the next page, which, by Murphy's law, has just been swapped out to disk. And if we are lucky and the next page is still in physical memory, can we reach it through the TLB? Or will the processor have to translate the full address by walking the page tables? And are all the page tables we need in physical memory, or have some of them been swapped out to disk too?
- Which processor is executing the code: a cheap i3 or a powerful i7? Sometimes cheap processors have the same instruction set as powerful ones, but execute the advanced instructions in several steps rather than in one.
And all this is just the tip of the iceberg, a small part of what you would have to take into account and analyze in order to outplay the compiler.
There is a myth that programs written in assembly language run ten times faster. This myth goes back to the seventies, when compilers generated code so poorly that every self-respecting programmer kept a blacklist of forbidden language constructs.
When our colleagues from the past wrote programs, they either kept this blacklist in mind and did not let their fingers type the problematic constructs, or they set up a special preprocessor that converted the source code into a lower-level, problem-free representation in the same language. Some fifty years have passed since then. Compilers have matured, but the myth remains.
Of course, even today you may occasionally meet a rare person who writes faster code than the compiler. But it takes them so much time that the effort is simply absurd. Plus, this kind of optimization requires knowing the processor's entire instruction set by heart.
In addition, since you are polishing your code by hand, no compiler will back you up by catching the bugs you inevitably introduce when you write a program.
Also, your assembly code will be non-portable. If you want your program to run on a different type of processor, you will have to rewrite it completely for that processor's instruction set. And, of course, you will need to know those instructions by heart too.
As a result, you will spend tens or hundreds of times more time than if you had trusted the optimizing compiler, and the result is likely to be slower, not faster.
At the same time, the optimizing compiler sometimes spits out assembly code whose logic is, well, completely incomprehensible. But don't rush to accuse the compiler of being stupid. Let's take an example.
When you write something like x = a*2 + b*3 in C, you naturally expect to see an instruction in the assembly output that multiplies the variable a by two. But the compiler knows that addition is cheaper than multiplication. So instead of multiplying a by two, it adds a to itself.
Moreover, looking at b, the compiler may decide that b + b + b is preferable to b*3. Sometimes triple addition is faster than multiplication, sometimes not. And sometimes the compiler concludes that instead of the original expression it will be faster to evaluate (a + b)*2 + b. Or even ((a + b) << 1) + b.
And if x is used only once, together with a couple of lines of subsequent code, the compiler may not compute x at all, but simply substitute a*2 + b*3 where x is used. And even if x is used and the compiler sees something like y = x - b*3, it may rewrite that calculation as y = a + a, marveling at your extravagance. Extravagance in terms of computational complexity, that is.
Reflections of this kind inevitably lead you into a tangled maze of alternatives, all of which have to be evaluated in order to choose the best one. And even when you do that, the compiler-generated version of the assembly code is likely to run faster than yours.
By the way, if you use GCC or Clang, enable the optimization options for SSE, AVX, and whatever else your processor supports (for example, -O3 -march=native). Then sit back and be surprised when the compiler vectorizes your C code, and does it in ways you never dreamed of.
What programs cannot be written in assembly language?
None. Everything that can be done on a computer can be done in assembly language as well. Assembler is a textual representation of the raw machine code into which every program running on a computer is ultimately translated.
You can even write a website in assembler if you want. In the nineties, C was a perfectly reasonable choice for this purpose: through CGI (Common Gateway Interface), the web server could invoke a C program, usually placed in the cgi-bin directory. The program received the request through environment variables and stdin and sent the result to the browser through stdout. You can easily implement the same principle in assembler.
But why? You would have to be a masochist, because writing in assembler comes with the following problems:
- Your productivity is lower than in a high-level language.
- Your code has no structure, so other developers will find it hard to read.
- You have to type a lot more code, and more code means more potential bugs.
- Secure coding is a sad story here: assembler is the hardest language in which to write safe code. C is far more comfortable in this respect.
Yes, everything can be written in assembler, but today it is not practical. You are better off writing in C: the result is likely to be safer, faster, and more concise.
From the editor
The author of this article is a big fan of C and highly recommends it, and we won't deny him that. C is a great language that helps you both master the basic concepts of programming and get a feel for how a computer works. However, when choosing a language to study, you can be guided by a variety of considerations. For example:
- Learn Python or Lua to get immediate results. That's motivating!
- Learn Scheme or Haskell for the same reason schools teach algebra rather than, say, auto mechanics.
- Learn Go for the same reasons as C, but in 2020.
- Learn Java to maximize your earnings.
- Learn Swift, because why not?
- Learn HolyC to praise the Lord.
- Learn Perl in the name of Satan.
And so on. Which language to start with depends on many factors, and the choice is an individual matter.
Of course, if you know assembler, you will have significant advantages over programmers who don't. But before you get to know these benefits, remember one simple thing: good programmers know assembler, but almost never write in it.
What are the benefits of assembly language for the programmer?
To write efficient programs (in terms of speed and resource usage), you definitely need to know the assembler of the hardware you are writing for. When you know assembler, you are not fooled by the apparent simplicity and brevity of high-level functions: you understand what each of them eventually turns into, whether a couple of assembler instructions or a long sequence of them intertwined with loops.
If you work with high-level languages such as C, learn at least to read and understand assembly code. Even if you don't see yourself writing in assembler in the foreseeable future (in fact, very few people do), knowing it will be useful to you.
Knowing assembler will also serve you well in debugging. Having mastered it, you will understand what is happening under the hood of high-level languages, how the computer does what it does, and why compiled code sometimes does not behave the way you expect. You will be able to see the cause and understand how to eliminate it.
Plus, sometimes you just can't figure out what kind of bug you have until you step through the assembly code in the debugger.
And here's another subtle hint: some employers like to see the word "assembler" on your resume. It tells them that you have not just skimmed the surface, but are genuinely interested in programming and willing to dig deeper.
Should you start learning programming with assembly language?
Learning programming from the bottom up has its advantages. But assembler is not the very bottom. If you want to start at the bottom, start with logic gates and digital electronics. Then dig into machine code, and only then move on to assembler.
From time to time you will feel that what you are doing is pointless. But you will learn a lot of things that will be useful in your future work, even if that work involves only high-level languages. You will learn exactly how the computer does what it does.
However, I would not recommend starting with assembler and the layers below it. Everything listed in the previous two paragraphs is easier to understand when you start with a high-level language. This way you will reach your goal before you get bored.
But at some point you really need to become familiar with assembler, especially if you program in C. I doubt that you can become a full-fledged C programmer without knowing assembler. But it's not worth starting with assembler.
How much easier is it to learn other languages when you already know assembly language?
Assembler is completely different from high-level languages. Therefore, the saying "experience gained in one language easily carries over to another" does not apply to assembler.
If you start with assembler, then once you have learned it and decide to master a new language, you will have to start from scratch. I remember a classmate who learned assembler at school and wrote a game in it that won first place at a school conference. Yet when we studied at university, he never managed to get comfortable with C.
How is assembly language different from high-level languages? Its variables are just areas of memory. There are no types such as char. There are no arrays!
There is only memory, and you work with it differently than in a high-level language. You can forget that you put a string in some memory area and refer to it as a number. The program will assemble anyway. The problem only shows up at runtime, and then the program crashes hard, without a polite error message.
There is no if...else in assembler. Instead, there are only comparison instructions and conditional jumps. Strictly speaking, there are not even functions there.
But! By studying assembler, you will understand how functions, loops, and everything else are implemented. The difference between passing a parameter "by value" and "by reference" will become self-evident. Plus, if you write in C but can't fully grasp how pointers work, then once you learn what registers and relative addressing are, you will see that pointers are not hard to understand.
Better to start with C. It is a convenient way to master the basics: variables, conditions, loops, logical constructs, and the rest. The experience you gain from C carries over easily to any other high-level language, be it Java, Python, or anything else. And assembler is easier to tackle once you have mastered C.
How profitable is it to be able to program in assembly language?
If you search HH.ru, you will most likely not find a single vacancy with the word "assembler" in the title. But from time to time, some company is frantically looking for a wizard who knows the insides of a computer so deeply that he can completely bend the operating system to his will. A wizard who can (1) patch the system without having the source code on hand and (2) intercept data streams on the fly and interfere with them.
Some of this deep magic (and the need for it is becoming increasingly rare) can only be done in a very low-level language.
I have heard of a company looking for someone to develop a new high-frequency trading platform. The idea is that if you get quotes faster than your competitors and make decisions faster than they do, you will be raking in fabulous sums.
"When you get quotes by going through the entire TCP/IP stack, it's too slow," the people at the firm say. So they use a gadget that intercepts traffic at the Ethernet level, right inside the network card, which runs custom firmware.
But these guys went even further. They are going to build a device that filters Ethernet traffic on an FPGA. What for? To catch quotes at the hardware level, save precious microseconds of trading time, and gain a small, a very small, advantage over competitors. C did not suit them. Even assembler was not good enough. So these guys are writing their program right in the silicon!