Analyzing how a program written in a high-level language works traditionally begins with reconstructing the key structures of the source language: functions, local and global variables, branches, loops, and so on. This makes the disassembled listing more readable and greatly simplifies its analysis.
Fundamentals of Hacking
Fifteen years ago, Kris Kaspersky's epic work, Fundamental Fundamentals of Hacking, was the handbook of every aspiring computer security researcher. However, time passes, and the knowledge Kris published is no longer current. The Hacker editors have tried to update this voluminous work and bring it from the era of Windows 2000 and Visual Studio 6.0 to the era of Windows 10 and Visual Studio 2017.
Modern disassemblers are quite intelligent and take on the lion's share of the work of recognizing key structures. In particular, IDA Pro successfully identifies standard library functions, local variables addressed through the ESP register, case branches, and more. However, it sometimes makes mistakes and misleads the researcher; moreover, its high price is not always justified. For students learning assembly, for example (and the best way to learn assembly is to disassemble other people's programs), it is hardly affordable.
Of course, IDA is not the only option. There are other disassemblers, such as DUMPBIN, which is part of the standard SDK package. Why not use it as a last resort? If there is nothing better at hand, DUMPBIN will do, but in that case you will have to forget about any intelligence in the disassembler and rely exclusively on your own head.
First, we will get to know non-optimizing compilers: their code is relatively simple to analyze and quite understandable even for novice programmers. Then, having mastered the disassembler, we will move on to more complex things: optimizing compilers that generate very tricky, confusing, and convoluted code.
Put on your favorite music, pour your favorite drink, and plunge into the depths of disassembled listings.
A function (also called a procedure or subroutine) is the basic structural unit of procedural and object-oriented languages, so disassembling code usually begins with identifying functions and the arguments passed to them.
Strictly speaking, the term “function” is not present in all languages, and even where it is present, its definition varies from language to language. Without going into details, by a function we mean a separate sequence of instructions called from various parts of the program. A function may take one or more arguments or none at all; it may return the result of its work or return nothing. That is not the point. The key property of a function is that it returns control to the place from which it was called, and its characteristic feature is that it is called repeatedly from various parts of the program (although some functions are called from only one place).
How does the function know where to return control? Obviously, the calling code must first save the return address and pass it, along with the other arguments, to the called function. There are many ways to solve this problem: for example, you can place an unconditional jump to the return address at the end of the called function, or you can save the return address in a special variable and, after the function finishes, perform an indirect jump using this variable as the operand of the jump instruction… Without dwelling on the strengths and weaknesses of each method, we note that in the vast majority of cases compilers use the special machine instructions CALL and RET, designed respectively to call functions and to return from them.
CALL pushes the address of the instruction that follows it onto the top of the stack, and RET pops that address off and transfers control to it. The address that the CALL instruction points to is the address of the start of the function, and the function is closed by a RET instruction (but note: not every RET marks the end of a function!).
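The CALL and RET behavior described above can be illustrated with a toy model. This is a minimal sketch, not real x86: the instruction names `NOP` and `HALT`, the addresses, the one-slot-per-instruction layout, and the helper `run` are all assumptions made for illustration.

```python
# Toy machine: a minimal sketch of CALL/RET semantics, not real x86.
# Addresses, the NOP/HALT instructions and the program layout are
# illustrative assumptions; every instruction occupies one address.

def run(program, entry):
    """Execute a toy program and return the addresses in the order visited."""
    stack = []   # models the hardware stack
    ip = entry   # instruction pointer
    trace = []
    while True:
        op, arg = program[ip]
        trace.append(ip)
        if op == "CALL":
            stack.append(ip + 1)  # push the address of the next instruction
            ip = arg              # jump to the start of the function
        elif op == "RET":
            if not stack:         # nothing to return to: stop
                break
            ip = stack.pop()      # pop the return address and jump to it
        elif op == "HALT":
            break
        else:                     # NOP: just advance
            ip += 1
    return trace

PROGRAM = {
    0: ("CALL", 10),    # first call site
    1: ("CALL", 10),    # second call site, same function
    2: ("HALT", None),
    10: ("NOP", None),  # the function starts at address 10
    11: ("RET", None),  # and ends here
}

print(run(PROGRAM, 0))  # → [0, 10, 11, 1, 10, 11, 2]
```

The trace shows the same function body (addresses 10 and 11) entered from two different call sites and control coming back to the instruction after each CALL, which is exactly the property used to recognize functions.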
Thus, a function can be recognized in two ways: by the cross-references leading to it from CALL machine instructions, and by its epilogue, which ends with a RET instruction. Together, the cross-references and the epilogue let you determine the addresses of the start and end of a function. Looking ahead a bit, we note that many functions begin with a characteristic sequence of instructions called the prologue, which is also suitable for identifying functions. Now let us consider all these topics in more detail.
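To make the idea of cross-references and epilogues concrete, here is a rough sketch that scans raw 32-bit x86 machine code for near calls (opcode `E8` followed by a signed 32-bit relative offset) and for the one-byte near return (opcode `C3`). The helper names `find_call_targets` and `find_ret_after` and the sample bytes are hypothetical. A plain byte scan like this is an assumption-laden shortcut: it does not decode instructions, so it can report false positives, it ignores indirect calls and the `RET imm16` form, and, as noted above, the first RET it finds is not necessarily the end of the function.

```python
import struct

def find_call_targets(code, base=0):
    """Naive scan: treat every 0xE8 byte as a near CALL rel32.

    Returns {target_address: [call_site_addresses]}. Only targets that
    fall inside the buffer are kept, to cut down on false positives.
    """
    xrefs = {}
    for off in range(len(code) - 4):
        if code[off] != 0xE8:
            continue
        (rel,) = struct.unpack_from("<i", code, off + 1)
        target = base + off + 5 + rel  # rel32 counts from the next instruction
        if base <= target < base + len(code):
            xrefs.setdefault(target, []).append(base + off)
    return xrefs

def find_ret_after(code, start, base=0):
    """Address of the first 0xC3 byte at or after start, or None."""
    pos = code.find(b"\xC3", start - base)
    return None if pos < 0 else base + pos

# Hypothetical sample: two calls to one function at offset 0x0B.
SAMPLE = bytes.fromhex(
    "E806000000"  # 0x00: call 0x0B
    "E801000000"  # 0x05: call 0x0B
    "C3"          # 0x0A: ret (end of the caller)
    "90"          # 0x0B: nop (start of the callee)
    "C3"          # 0x0C: ret (end of the callee)
)

print(find_call_targets(SAMPLE))   # → {11: [0, 5]}
print(find_ret_after(SAMPLE, 11))  # → 12
```

Here the two call sites at offsets 0 and 5 both point to offset 11, which marks the probable start of a function, and the nearest RET at offset 12 is a candidate for its end. A real disassembler confirms both guesses by decoding the instructions in between.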