If you have ever run into a virus, you probably remember that the experience is not pleasant. But malware can be caught and dissected, and that is exactly what virus analysts do. Today I will introduce you to the basics of this work. We will go through all the main stages of analysis and see how to use professional tools, and I will explain the basic terminology along the way.
The first steps
So, let's imagine to start with that we have an infected machine in our hands. First of all, you need to perform three basic steps: isolate the computer from the rest of the network, take a memory dump, and take a disk image.
When disconnecting an infected computer from the network, remember that if you are dealing with ransomware, there is a chance of losing user data. The malware may start encrypting the data when it loses its network connection, so first evaluate how important the data on the infected PC is and whether the risk is worth taking.
It is best to create a virtual network (VLAN) for quarantine with Internet access and move infected PCs there. In a large company, this will require coordinated actions by cybersecurity service employees and network administrators, and in a small company, it will require the diverse skills of a single system administrator.
Next, you need to take the memory dump. This is done at the very beginning because copying the hard disk requires turning off the computer, and the contents of RAM will then be lost. The program that takes the dump is best run from external media so as not to leave unnecessary traces on the hard drive: we will need the data on it unchanged.
To prevent the malware from reacting to turning off the system, it is easiest to pull out the power cable. The method, of course, is barbaric, but it guarantees instant shutdown. Keeping the data on your hard drive intact is more important to us than worrying about the health of other components.
Next, boot the PC from bootable media carrying any operating system that offers a forensic mode, along with software for acquiring a hard disk image. The most commonly used disk image format is E01 (EnCase Image File Format). It is supported by a large number of applications for both Windows and Linux.
Forensic mode means booting without mounting the physical disks. This rules out any changes to the disk under examination. Professional forensic examiners also use hardware write blockers, devices that block any attempt to write to the disk.
The memory and disk dumps most likely contain everything we need – the body of the malicious program and the traces left by it. We can start the search.
I recommend that you start by examining the copy of the disk, and use the memory dump if no traces can be found on the disk or as an additional source. When analyzing the contents of a disk, we primarily pay attention to startup entries, the Task Scheduler, and the boot sector.
To understand how the malware got onto the computer, it is worth examining the creation and modification times of the files on the disk, the browser history, and the user's mail archive. Establishing even an approximate time of infection is a plus, because it narrows down what to look at. On the Internet there are good collections of file system and registry paths that are worth checking first (for example, a good Group-IB article on the subject).
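Once you have an approximate infection time, narrowing the file system down to what changed around that moment can be scripted. The sketch below is only an illustration: it assumes the disk image is already mounted read-only at some path, the function name and the 24-hour window are my own choices, and it looks only at modification times, since creation times are exposed differently on each platform.

```python
from datetime import datetime, timedelta
from pathlib import Path


def files_changed_near(root, moment, window_hours=24):
    """Return sorted (mtime, path) pairs for files modified within the window around moment."""
    delta = timedelta(hours=window_hours)
    hits = []
    for path in Path(root).rglob("*"):
        try:
            if not path.is_file():
                continue
            mtime = datetime.fromtimestamp(path.stat().st_mtime)
        except OSError:
            continue  # unreadable entry: skip it rather than abort the scan
        if abs(mtime - moment) <= delta:
            hits.append((mtime, path))
    return sorted(hits)
```

A call such as files_changed_near("/mnt/image", datetime(2020, 3, 1, 14, 0)) would list candidates for closer inspection; the mount point and the date here are placeholders.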
A memory dump can also yield information you will not find anywhere else: for example, which documents and browser tabs were open at the time of infection, which network connections were active, and which processes were running (including hidden ones).
When you have a suspicious file in your hands, you need to make sure that it really is an instance of a malicious program. Of course, if it is a known virus, an antivirus can detect it (virustotal.com lets you check a file against many antivirus engines at once). But more and more often there are unique samples that antivirus programs do not detect. Such malware has the status of FUD (Fully UnDetectable).
It also cannot be ruled out that you are facing a zero-day (0-day) threat, one that no one knows about yet (the developers have had zero days to fix it, hence the name). Malware that exploits a zero-day and has FUD status poses a serious threat not only to individual computers but also to entire companies.
Before you begin analyzing a suspicious file, you must do the following:
- Prepare a test environment: a virtual machine with an operating system suitable for running the file under test.
- Configure access to the Internet, preferably hiding your real IP address so as not to lose contact with the malware's management servers (they may recognize you as a virus analyst and restrict access in order to hide some functions).
- Take a snapshot of the initial state of the virtual machine.
Under no circumstances connect the test environment to the corporate network: this can cause mass infection of other computers.
There are two approaches to software analysis: dynamic and static. As a rule, you either pick the method that suits the situation better or use both together. For example, you may only need to study the behavior of a malicious program in order to identify characteristic markers, without analyzing its algorithms. The choice of methods and tools may therefore change in the course of the analysis.
Let's start with the static analysis, since it does not require the launch of malicious code and certainly will not cause infection on your computer.
Consider the headers at the start of a Windows executable.
- The DOS header, followed by the DOS stub. Thanks to it, the program can be started in DOS (usually it only displays the message This program cannot be run in DOS mode). You can recognize the start of this header by the characteristic letters MZ.
- Immediately after it comes the header used by modern operating systems, with all the parameters the executable needs (for example, the offsets of the import and export tables and the start of the executable code section). You can find the start of this header by the characteristic letters PE; a description of the format is available on the Microsoft website.
The header of an executable file is essentially a guide to what is inside the file and where, which permissions the sections should be granted, and every setting needed for it to run correctly, so analyzing the header can provide valuable initial information.
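To make the header layout more concrete, here is a minimal sketch that reads a few fields by hand. It relies on documented facts of the PE format (the file starts with the bytes MZ, the 4-byte offset of the PE header is stored at 0x3C, and the PE signature is followed by the COFF header and the optional header); the function name and the choice of returned fields are mine. Real tools parse far more than this.

```python
import struct


def read_pe_summary(data):
    """Parse a few DOS/PE header fields from the raw bytes of an executable."""
    if data[:2] != b"MZ":
        raise ValueError("no MZ signature: not a DOS/PE executable")
    # e_lfanew: little-endian offset of the PE header, stored at 0x3C
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("no PE signature at the offset given by e_lfanew")
    # The 20-byte COFF header follows the 4-byte signature
    machine, n_sections, timestamp = struct.unpack_from("<HHI", data, pe_offset + 4)
    # The optional header follows; AddressOfEntryPoint sits 16 bytes into it
    opt = pe_offset + 24
    (magic,) = struct.unpack_from("<H", data, opt)
    (entry_point,) = struct.unpack_from("<I", data, opt + 16)
    return {
        "machine": machine,            # 0x14C = x86, 0x8664 = x86-64
        "sections": n_sections,
        "timestamp": timestamp,        # link time, seconds since 1970
        "is_pe32_plus": magic == 0x20B,
        "entry_point_rva": entry_point,
    }
```

Usage would look like read_pe_summary(open(path, "rb").read()). Keep in mind that malware authors can forge fields such as the timestamp, so treat the values as hints rather than facts.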
In addition, you need to look for string data in the files. Using them, you can subsequently identify such files or even obtain important information such as addresses of management servers.
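Pulling strings out of a binary takes only a few lines. The sketch below imitates the basic behavior of the classic strings utility for ASCII text only; the function name and the minimum length of four characters are assumptions of mine, and it does not handle the UTF-16 strings that Windows programs often contain.

```python
import re


def extract_strings(data, min_len=4):
    """Return runs of printable ASCII characters at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]
```

Run it over the raw bytes of a sample and look through the output for URLs, IP addresses, file paths, and registry keys. On a packed file the output is usually sparse, which is a hint in itself.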
Malware developers always try to disguise their creation as well as possible and to make detection and analysis more difficult. That is why packers for executable files are often used.
Packers are utilities for compressing and encrypting executable files. They are used not only by malware authors but also by developers of legitimate software who want to protect their programs from cracking. Custom-written packers designed specifically to hinder analysis are not always recognized by detection tools, and the analyst has to take extra steps to remove the packing.
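One common heuristic for spotting packed or encrypted content, not covered in this article but widely used, is byte entropy: compressed and encrypted data looks almost random, so its Shannon entropy approaches 8 bits per byte. The function below is a minimal sketch of that calculation; any threshold you pick (values above roughly 7 are often treated as suspicious) is a rule of thumb, not a guarantee.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Return the Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

In practice the value is computed per section rather than for the whole file, because a packed payload may sit next to an ordinary low-entropy unpacking stub.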
A deeper (and more complex) way to analyze any executable file is to use a disassembler. Assembly code is easier for humans to understand than machine code, but its sheer volume makes reading it far from easy. In some cases the source code of a program can be restored in a high-level language by decompilation, provided you can determine which compiler and obfuscation algorithm were used.
Glossary of Terms
- Disassemblers – programs that translate machine code into relatively readable assembly language.
- Decompilation – restoration of the source code of the program in the original programming language.
- Obfuscation – changing the code of a program so that its functionality is preserved but it becomes harder to analyze, for example by adding junk code that does nothing useful.
- Obfuscators – programs that automate obfuscation.
- Pseudo-code – as a rule, the name for an informal language for describing algorithms, which lets us present assembly code in a more readable form. When code is converted to pseudo-code, insignificant elements of the algorithm are discarded.
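To illustrate what obfuscation does, here is a deliberately tiny, made-up example in Python rather than in assembly. Both functions return the same result, but the second one contains a junk computation and a control flow driven by a state variable, two tricks that real obfuscators apply at a much larger scale.

```python
def is_even(n):
    """Readable original."""
    return n % 2 == 0


def is_even_obfuscated(n):
    """Same behavior, hidden behind junk code and a flattened control flow."""
    junk = (n * 7 + 13) ^ 0x5A  # dead computation: the result is never used
    state = 3
    result = None
    while state != 0:  # a state machine replaces a simple if/else
        if state == 3:
            state = 1 if (n & 1) == 0 else 2
        elif state == 1:
            result = True
            state = 0
        elif state == 2:
            result = False
            state = 0
    return result
```

Removing obfuscation means recognizing such patterns and folding them back into the simple form.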
Analyzing assembly code is time-consuming and requires good low-level programming skills, so for a quick look you can convert the resulting code into pseudo-code. Pseudo-code is easier to read than assembly, as the following example shows: the original assembly code is on the left and the pseudo-code is on the right.
Reading pseudo-code or assembly code is like unraveling a ball of thread: painstaking and laborious. So you can turn to another type of analysis, dynamic analysis. It involves launching the executable file and tracking the actions it performs, such as accessing registry branches, sending and receiving data over the network, and working with files.
Because the file under investigation is actually executed, this must be done on a virtual machine isolated from other computers, so that the malware cannot spread over the network.
You can analyze a program's behavior by monitoring and intercepting application launches at the OS level, or by attaching to a running process and intercepting its library and API calls. To analyze the execution of a program in detail, it is best to use a debugger.
A debugger is a utility or set of utilities used to test and debug a target application. Some debuggers can emulate the processor instead of running the program on real hardware. This gives a higher level of control over execution and lets you stop the program under given conditions. Most debuggers can also execute the code under investigation step by step.
Whether you use a debugger or a program that monitors API calls, you will have to work manually. This requires in-depth knowledge of the operating system, but it gives you the most complete data about the object of study.
However, analysis can also be automated. For this there are so-called sandboxes: isolated test environments for running software.
Sandboxes are divided into two types: offline and online. To get the most complete picture, I recommend using several data sources at once and not ruling out either type of sandbox. The more information you collect, the better.
Since malware analysis always rests on practical skills, this article would not be complete without an example. We will carry out an express analysis and establish the nature of an executable file.
To begin with, we will determine the sequence of actions. Here is what we need to do:
- calculate the hash of the file;
- use the online service to check the file;
- collect static data from a file;
- check the file in the sandbox (local or on the Internet);
- run the file in a virtual environment to track actions;
- remove the packing and obtain the malware as it is unpacked in memory;
- parse the code in a disassembler.
Let's say we need to examine an unknown file.
Before storing the file, I recommend first of all changing its extension, for example to Sample._exe, to avoid launching it by accident.
We take a snapshot of the virtual machine on which we will run the executable file (the initial state of the system will come in handy later) and calculate the hash of the file.
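Any hashing utility will do for this step. As a sketch, here is one way to get several common hashes in a single pass using Python's standard hashlib module; the function name and the choice of MD5, SHA-1, and SHA-256 are my own assumptions, not something the article prescribes.

```python
import hashlib


def file_hashes(path, chunk_size=1 << 20):
    """Return MD5, SHA-1 and SHA-256 hex digests of a file, read in chunks."""
    digests = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}
```

Hashing never executes the file, so it is safe to do before anything else, and the resulting values let you search online services without uploading the sample itself.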
Copy the result and check it on virustotal.com.
As you can see from the screenshot above, the VT verdict is 54 out of 70. It is highly likely that this is malware, but let's not stop there and try another service, Any.run.
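If you check many files, the same verdict can be read programmatically. The sketch below assumes you have already fetched a file report from the VirusTotal API v3 and parsed the JSON into a dictionary, and that the per-category counts sit under data.attributes.last_analysis_stats; the function name is hypothetical, and the denominator shown on the website may count engines somewhat differently.

```python
def detection_ratio(report):
    """Return (malicious, total) engine counts from a parsed VirusTotal v3 file report."""
    stats = report["data"]["attributes"]["last_analysis_stats"]
    malicious = stats.get("malicious", 0)
    total = sum(stats.values())  # all categories: malicious, undetected, and so on
    return malicious, total


# Made-up report fragment, constructed only to mirror the 54 out of 70 verdict above
sample_report = {
    "data": {"attributes": {"last_analysis_stats": {"malicious": 54, "undetected": 16}}}
}
print(detection_ratio(sample_report))
```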
We see that it gives a similar result (see in the lower right corner). In addition, you can collect additional data about what the program did. Namely:
- after starting, it duplicated itself in memory;
- it accessed the server 184.108.40.206 on port 587. On the platform, you can view the network dump of the interaction with the malware management server (such servers are often called Command & Control, C2 or C&C);
- it blocked the launch of the task manager;
- it copied itself to a separate user directory;
- it added itself to startup.
Even if the verdict had not indicated a possible threat, a program that disables the task manager and adds itself to startup does not bode well for the user, especially when you consider that all this was done immediately after launch.
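Behavior like this leaves recognizable traces in the registry. The sketch below flags two of them in a list of registry paths the sample wrote to. The input format (one path per string) is hypothetical, and the article does not say which mechanism this particular sample used: the Run key and the DisableTaskMgr policy value are simply the usual Windows locations for startup entries and for blocking the task manager.

```python
SUSPICIOUS_REGISTRY_MARKERS = {
    r"\software\microsoft\windows\currentversion\run": "adds itself to startup",
    r"\policies\system\disabletaskmgr": "disables Task Manager",
}


def flag_registry_events(events):
    """Return (meaning, event) pairs for registry writes that match a known marker."""
    findings = []
    for event in events:
        lowered = event.lower()  # registry paths are case-insensitive
        for marker, meaning in SUSPICIOUS_REGISTRY_MARKERS.items():
            if marker in lowered:
                findings.append((meaning, event))
    return findings
```

You could feed it paths exported from a sandbox report or a monitoring tool, and extend the marker table as you meet new persistence tricks.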
So, two services have already confirmed that this is malware. Let's continue, this time with a tool called DIE (Detect It Easy).
As you can see in the screenshot, the malware is written in Visual Basic. The structure of Visual Basic 6.0 programs and a description of how they work are easy to find online. In short, they run in a virtual environment, which means we need to catch the moment when this code is unpacked in memory. You can also analyze the file structure and get the name of the project, the forms used and other data.
Another way to find out that the malware is written in Visual Basic is to use CFF Explorer.
CFF Explorer – A set of tools with a single minimalistic interface that allows you to view and, if necessary, edit all sections of the header of the executable file. Here you can see the import and export of functions from libraries, a list of the libraries themselves and the addressing of the sections.
In this case, we will see a characteristic imported library – its presence indicates that Visual Basic functions are used.
The next step is to launch Hiew. Going to the beginning of the executable code, we find a function call from the library.
Hiew – a binary editor with a built-in disassembler for x86, x86-64 and ARM. It can also open physical and logical drives as a file. Hiew is lightweight (unlike IDA) and at the same time a very powerful program that lets you form a first impression of the object being studied.
At this stage, it’s enough for us to know that when you start, Visual Basic code will be executed.
It's time to try extracting the code and recording the behavior at startup. For this we need a prepared virtual machine with Windows, Process Dump and API Monitor.
API Monitor – a program that lets you monitor API function calls made by applications and services on Windows. It intercepts information about application launches, or attaches to a running process to show the libraries and API calls it uses.
In API Monitor, we run Sample.exe and get the following picture: another process starts, after which the first one exits, and then the program is added to startup.
We find the specified executable file: it is the original file, written to the user directory.
The program also disables the ability to call the task manager.
This is already enough to say with confidence that the file is malicious.
Now let's dump the running process from RAM. For this we will use the process dumping utility Process Dump. We take the PID from the API Monitor data, where it is displayed next to the process name.
As a result, all the libraries that the application uses are dumped. We also find that, in addition to the main executable file, the address space contains hidden modules. This can be seen below: the word hiddenmodule appears in the file name.
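A dump produces many files, most of them ordinary system libraries, so it helps to pick out the interesting ones automatically. The sketch below relies only on the naming detail mentioned above (the word hiddenmodule in the file name); the function name and the directory layout are assumptions.

```python
from pathlib import Path


def find_hidden_modules(dump_dir):
    """Return dumped files whose names mark them as hidden modules."""
    return sorted(
        path for path in Path(dump_dir).iterdir()
        if path.is_file() and "hiddenmodule" in path.name.lower()
    )
```

The files it returns are the ones worth examining first.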
We check each of the extracted executable files in DIE.
We see that two of the three files are written in C++, and one in VB.NET.
Let's look at the application written in VB.NET. It can be opened with any debugger for .NET, for example dnSpy. We get readable Visual Basic code, and all that remains is to remove the obfuscation. In this case, the algorithm was complicated by adding a lot of code jumps using the command
To analyze the two remaining files, we will use the IDA disassembler.
IDA – a popular interactive disassembler from Hex-Rays. It has free and trial versions, which are quite enough for a first acquaintance. The company also releases a Pro version. The main task of the program is to translate executable files from binary form into readable assembly code.
As the example shows, IDA lets you obtain the program's assembly code, but for more convenient viewing and an initial assessment you can use the Snowman plugin and get pseudo-code.
Using pseudo-code simplifies the analysis but does not always give the expected result. Disassembly and pseudo-code generation are performed automatically, and virus writers have techniques for making them harder. Such is the eternal contest of sword and shield, of the intellect of the malware author against that of the virus analyst.
When we used API Monitor, we already established the malicious nature of this file from the actions it performs. But we do not yet know what this malware is capable of. To reconstruct the complete algorithm of this executable file, you need to dig deeper into both the assembly code and the Visual Basic program, but that is beyond the scope of this article.
If you have the impression that we abandoned the investigation at the very start of the journey, you are partly right: here we performed only those steps that do not require knowledge of assembly. However, as you can see, it is quite possible to carry out an express analysis and establish what to expect from a piece of malware even without it.
A full analysis, on the other hand, requires a deep understanding of how the operating system works and, of course, knowledge of assembly.
If you have seriously decided to take the path of a virus analyst, then literature on reverse engineering, malware analysis, system programming and assembly will help you, along with practice, lots of practice. I recommend solving crackmes and registering on hybrid-analysis.com to obtain samples of live malware. A long road awaits you, but the road is mastered by those who walk it!