How do memory allocation and deallocation work? How do raw and smart pointers work in C++? How can you recognize the memory operators in a disassembler that does not understand their true nature? To answer these questions, we will take apart, byte by byte, the mechanisms that the two most popular compilers use to allocate an application's dynamic memory (in other words, the heap) and identify the differences between them. Expect plenty of disassembler listings and C++ code in this article.
Fundamentals of Hacking
Fifteen years ago, Kris Kaspersky's epic The Fundamentals of Hacking was the handbook of every aspiring computer security researcher. But time passes, and the knowledge Kris published is losing its relevance. The editors of Hacker have tried to update this voluminous work and bring it from the days of Windows 2000 and Visual Studio 6.0 to the days of Windows 10 and Visual Studio 2019.
Look for links to other articles from this series on the author's page.
Identifying the this pointer
The this pointer is a real golden key, or, if you like, a lifeline that keeps you from drowning in the stormy ocean of OOP. It is thanks to this that you can determine whether a called function belongs to a particular class. Since all non-virtual member functions of an object are called directly, at their actual address, the object is effectively split into its constituent functions at compile time. Were it not for the this pointer, it would be fundamentally impossible to reconstruct the hierarchy of functions!
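To make the idea concrete, here is a minimal sketch of how a non-virtual member function relates to an ordinary function that receives the object's address explicitly. The Counter type and the counter_add helper are hypothetical names invented for this illustration; the only point is that both forms do the same work, with this acting as the hidden link between a function and its object.

```cpp
// Hypothetical class used only for illustration.
struct Counter {
    int value;

    // Non-virtual member: the compiler passes the object's address as a hidden
    // first argument, available inside the body as `this`.
    constexpr int add(int delta) const { return this->value + delta; }
};

// The same logic written the way it looks after compilation: a free function
// whose first parameter plays the role of `this`.
constexpr int counter_add(const Counter* self, int delta) {
    return self->value + delta;
}
```

Calling c.add(2) on a Counter and calling counter_add(&c, 2) produce the same result; in a listing, only the pointer in the first argument slot tells you that the function belongs to an object.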
Thus, correctly identifying this is very important. The only problem is how to distinguish it from pointers to arrays and structures. After all, an instance of a class is identified by the this pointer (if allocated memory is pointed to by this, it is an instance of a class), yet this is, by definition, a pointer that refers to an instance of a class. A vicious circle! Fortunately, there is a loophole: the code that manipulates the this pointer is very specific, which makes it possible to distinguish this from all other pointers.
Strictly speaking, each compiler has its own handwriting, which you are strongly advised to study by disassembling your own C++ programs, but there are general guidelines that apply to most implementations. Since this is an implicit argument of every member function, it makes sense to postpone the discussion of its identification until the section "Identification of function arguments". Here we will look at how the most popular compilers pass the this pointer.
Here we are, of course, talking about the x64 architecture. On a 32-bit platform, parameters aligned to 32 bits are passed on the stack. Things are more interesting on a 64-bit platform: the first four integer arguments are passed in the registers RCX, RDX, R8, and R9. If there are more integer arguments, the rest are placed on the stack. Floating-point arguments are passed in the registers XMM0 through XMM3. 16-byte arguments are passed by reference. Note that all of this describes the calling convention of Microsoft operating systems (the Microsoft ABI), not that of Unix-like systems. But let's not scatter our attention on those.
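As a rough illustration of the rules just described, the sketch below annotates two ordinary functions with the registers their arguments would occupy under the Microsoft x64 convention. The function names sum6 and mix are hypothetical, and the comments describe what an unoptimized build is expected to show; an optimizer may inline such small functions entirely.

```cpp
// Register notes describe the Microsoft x64 convention only.

// a -> RCX, b -> RDX, c -> R8, d -> R9, e and f -> stack
constexpr long long sum6(long long a, long long b, long long c,
                         long long d, long long e, long long f) {
    return a + b + c + d + e + f;
}

// Argument slots are positional, so integer and floating-point registers
// interleave: i -> ECX, x -> XMM1, j -> R8D, y -> XMM3
constexpr double mix(int i, double x, int j, double y) {
    return i + x + j + y;
}
```

Note that in mix the second argument lands in XMM1 rather than XMM0: each of the first four positions owns one integer register and one floating-point register, and only one of the pair is used.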
Both compilers I tested, Visual C++ 2019 and C++ Builder 10.3, pass the this pointer in the RCX register regardless of the function's calling convention (__thiscall included), which matches its nature: this is an integer argument.
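The following sketch shows what that means for a member function with an explicit parameter. Rect and its members are hypothetical names; the register comments assume the Microsoft x64 convention described above and an unoptimized build.

```cpp
// Hypothetical class used only for illustration.
struct Rect {
    int w;
    int h;

    // this -> RCX, scale -> EDX. The explicit argument moves to the second
    // slot because the hidden pointer occupies the first one.
    constexpr int scaled_area(int scale) const { return w * h * scale; }

    // Returns the hidden argument itself, showing that it is nothing more
    // than the address of the object.
    constexpr const Rect* self() const { return this; }
};
```

So when the first instruction before a call loads an object address into RCX and the visible arguments start from RDX, that is a hint that the callee is a member function.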
Identifying the new and delete operators
The new and delete operators are translated by the compiler into calls to library functions, which can be recognized in the same way as ordinary library functions. In particular, IDA Pro can recognize library functions automatically, taking this burden off the researcher's shoulders. However, not everyone has IDA Pro, and it is not always at hand when needed; besides, it does not know every library function, and even among those it knows, it does not always recognize new and delete. In short, there are plenty of reasons to identify them manually.
The implementation of new and delete can be anything, but most Windows compilers rarely implement heap functions on their own. Why would they? It is much easier to rely on operating system services. However, it is naive to expect a call to HeapAlloc in place of new and a call to HeapFree in place of delete. No, the compiler is not that simple! How could it deny itself the pleasure of carving nesting dolls? The new operator is translated into a call to the function new, which calls malloc to allocate memory; malloc, in turn, calls HeapAlloc (or something like it, depending on the implementation of the memory-management library), a kind of wrapper around the Win32 API procedure of the same name. The picture for freeing memory is the same.
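The nesting can be observed from the C++ side with a small sketch. Widget is a hypothetical class that supplies its own allocation functions, so the outer layers of the chain become visible without touching the global operators; the bookkeeping members last_size and live_blocks exist only for this demonstration. The innermost layer (HeapAlloc on Windows, according to the text above) is inside the runtime library and cannot be seen from here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical class used only for illustration.
struct Widget {
    int id;
    double weight;

    static std::size_t last_size;  // size most recently requested
    static int live_blocks;        // allocations not yet released

    // Layer 1: the new-expression calls this function with sizeof(Widget).
    static void* operator new(std::size_t size) {
        assert(size == sizeof(Widget));  // the argument is a compile-time constant
        last_size = size;
        // Layer 2: the C runtime allocator, which wraps the OS heap service.
        void* block = std::malloc(size);
        if (block == nullptr) throw std::bad_alloc();
        ++live_blocks;
        return block;
    }

    // Mirror image: delete-expression -> operator delete -> free.
    static void operator delete(void* block) noexcept {
        if (block != nullptr) --live_blocks;
        std::free(block);
    }
};

std::size_t Widget::last_size = 0;
int Widget::live_blocks = 0;
```

After new Widget() the recorded size equals sizeof(Widget), and after the matching delete the block counter returns to zero, which mirrors the layered call chain a disassembler shows.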
Delving into the jungle of nested calls is too tedious. Can new and delete be identified in some other way, with less effort and without unnecessary headaches? Of course they can! Let's recall everything we know about new:
- new takes a single argument, the number of bytes to allocate, and in the overwhelming majority of cases this argument is computed at compile time, that is, it is a constant;
- if the object contains neither data nor virtual functions, its size is one (the minimum block of memory, allocated only so that the this pointer has something to point to); hence there will be many calls like
mov ecx, 1 ; size
call XXX
where XXX is the address of new! In general, objects are typically smaller than a hundred bytes, so look for a frequently called function whose constant argument is less than one hundred;
- new is one of the most popular library functions, so look for a function with a crowd of cross-references;
- most characteristic of all: new returns the this pointer, and this is very easy to identify even at a cursory glance at the code (on x64 the value comes back in RAX and is then handed to the constructor in RCX); the result returned by new is, as a rule, checked for equality to zero (with instructions like test or cmp), and if it is indeed zero, the constructor (if any) is not called.
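Two of the points above can be checked directly in C++. The sketch below is only an illustration: Empty and Probe are hypothetical classes, and Probe uses a class-level allocation function that can be told to fail, which is an assumption made to trigger the null path on demand. It shows that an empty object is requested as one byte and that a null result from the allocator causes the constructor to be skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// An object with neither data nor virtual functions still occupies one byte,
// so its allocation appears as a request for exactly 1 byte.
struct Empty {
    void hello() const {}
};
static_assert(sizeof(Empty) == 1, "an empty class has size one");

// Hypothetical class whose allocator can be told to fail on demand.
struct Probe {
    static bool fail_next;   // when true, the next allocation returns null
    static int constructed;  // how many times the constructor actually ran

    Probe() { ++constructed; }

    // A non-throwing allocation function reports failure by returning null;
    // the new-expression then yields null and skips the constructor.
    static void* operator new(std::size_t size) noexcept {
        assert(size == sizeof(Probe));
        if (fail_next) {
            fail_next = false;
            return nullptr;
        }
        return std::malloc(size);
    }
    static void operator delete(void* block) noexcept { std::free(block); }
};

bool Probe::fail_next = false;
int Probe::constructed = 0;
```

Setting Probe::fail_next before new Probe yields a null pointer and leaves the constructor counter untouched; this is the source-level counterpart of the compare-with-zero and conditional jump that sit between the call to new and the call to the constructor.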
What we know about new is more than enough for quick and reliable identification; there is absolutely no need to waste time analyzing the code of the function itself! One thing to keep in mind: new is used not only to create new object instances, but also to allocate memory for arrays (structures) and, occasionally, for single variables (such as an int, which is generally madness, but some people do it). Fortunately, the two cases are very easy to tell apart: neither arrays, nor structures, nor single variables have a this pointer.
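Taken together, the signs of new listed earlier can be turned into a rough scoring pass over call sites. The sketch below is not a real analysis tool: CallSite is a hypothetical record that some earlier step is assumed to have extracted from a listing, and the weights are arbitrary values chosen only to show the idea.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical record for one call instruction found in a listing.
struct CallSite {
    std::uint64_t callee;      // address of the function being called
    bool constant_arg;         // first argument loaded as an immediate value
    std::uint64_t arg_value;   // that immediate (ignored when constant_arg is false)
    bool result_null_checked;  // returned value compared with zero afterwards
};

// Returns the address that best matches the description of new, or 0 when
// nothing scores. The weights are arbitrary and chosen only for illustration.
std::uint64_t likeliest_new(const std::vector<CallSite>& sites) {
    std::map<std::uint64_t, int> score;
    for (const CallSite& site : sites) {
        int points = 1;  // every cross-reference counts a little
        if (site.constant_arg && site.arg_value > 0 && site.arg_value < 100)
            points += 2;  // small constant size
        if (site.result_null_checked)
            points += 2;  // result tested before the constructor call
        score[site.callee] += points;
    }
    std::uint64_t best = 0;
    int best_score = 0;
    for (const auto& entry : score) {
        if (entry.second > best_score) {
            best = entry.first;
            best_score = entry.second;
        }
    }
    return best;
}
```

A function called twice with small constants and a null check outscores one called three times with no such signs, which matches the intuition that the nature of the call sites matters more than their number alone.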
delete is harder to identify. This function has no distinctive features. Yes, it takes a single argument, a pointer to the memory region to be freed, and in most cases that is the this pointer. But dozens if not hundreds of other functions accept this as well! Back in the era of 32-bit processors, the researcher had a handy clue: in most cases delete received the this pointer through the stack, while the other functions received it through a register. Nowadays, as we have seen more than once, all functions receive their parameters through registers:
mov rcx, [rsp+58h+block] ; block
call operator delete(void *,unsigned __int64)
In this case, IDA recognized delete without any trouble. delete returns nothing, but how many other functions do the same? The only clue is that the call to delete follows the call to the destructor (if there is one), but since the destructor itself is identified as the function preceding delete, a vicious circle forms!
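The ordering described above is easy to confirm from the source side. In this sketch, Node is a hypothetical class that appends a letter to a shared trace from its destructor and from its own deallocation function, and trace_delete is a helper invented for the demonstration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Hypothetical class that records the order of events in a shared trace.
struct Node {
    static std::string trace;

    ~Node() { trace += 'D'; }  // destructor runs first

    static void* operator new(std::size_t size) {
        void* block = std::malloc(size);
        if (block == nullptr) throw std::bad_alloc();
        return block;
    }
    static void operator delete(void* block) noexcept {
        if (block != nullptr) trace += 'F';  // memory is freed second
        std::free(block);
    }
};

std::string Node::trace;

// Runs one delete-expression and reports what happened, in order.
std::string trace_delete(bool allocate) {
    Node::trace.clear();
    Node* node = allocate ? new Node : nullptr;
    delete node;
    return Node::trace;
}
```

For a real object the trace reads destructor first, deallocation second; for a null pointer the destructor is never invoked, which is why compiled code typically guards the pair of calls with a comparison against zero.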
There is nothing left but to analyze the contents of the function: sooner or later delete calls HeapFree (although variations are possible: for example, the Borland/Embarcadero libraries work with the heap at a low level and free memory by calling VirtualFree). Fortunately, in most cases IDA Pro recognizes delete, and you don't have to strain yourself.
What happens if IDA does not recognize delete? The code will look something like this:
mov rcx, [rsp+58h+block] ; block
call XXX
cmp [rsp+58h+block], 0
jnz short loc_1400010B0
A shallow analysis shows that the first line places a block of memory in the RCX register, obviously to pass it as a parameter. It looks like a pointer to an object. After the call to XXX, this block of memory is compared with zero, and if it is not zero, a jump is taken to the given address. In this simple way, we can identify delete even if IDA does not.
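The same reasoning can be written down as a tiny pattern matcher. This is a sketch under strong assumptions: Insn is a hypothetical, already-parsed instruction record, and the function checks only the four-instruction shape discussed above. A match marks a candidate worth inspecting, not proof that the callee is delete, since real code varies between compilers and optimization levels.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, already-parsed instruction: mnemonic plus up to two operands.
struct Insn {
    std::string mnemonic;
    std::string op1;
    std::string op2;
};

// Looks for the shape: mov rcx, X / call ... / cmp X, 0 / conditional jump.
// Returns the index of the call that is a candidate for delete, or -1.
int find_delete_candidate(const std::vector<Insn>& code) {
    for (std::size_t i = 0; i + 3 < code.size(); ++i) {
        const Insn& load = code[i];
        const Insn& call = code[i + 1];
        const Insn& test = code[i + 2];
        const Insn& jump = code[i + 3];
        const bool matches =
            load.mnemonic == "mov" && load.op1 == "rcx" &&
            call.mnemonic == "call" &&
            test.mnemonic == "cmp" && test.op1 == load.op2 && test.op2 == "0" &&
            !jump.mnemonic.empty() && jump.mnemonic[0] == 'j' &&
            jump.mnemonic != "jmp";
        if (matches) return static_cast<int>(i + 1);
    }
    return -1;
}
```

Fed the four instructions from the listing above, the function returns index 1, the position of call XXX.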