I had many thoughts about what to write in the very first post on my own reversing blog. Fortunately, my wish to publish my findings, together with a reversing challenge that a good friend kindly pointed me to, resulted in this first post. So I’d like to describe the steps I took to complete the challenge and what I learned from it in the end. I hope someone will find something new and enriching here, as I did. I’ll be very glad to hear any comments on this post. So, here we go.
The challenge page pointed me to a binary, and the goal was to find the algorithm that the binary uses to prepare something before it communicates with its CnC. The algorithm had to be converted from assembly into a high-level language representation.
Although I’m really new to the reversing world, I’ve already learned that I must decide as precisely as possible what I’m looking for before I even dive into the binary. So, considering this and the challenge question, my starting point was to look for functions involved in communication and examine them for unique parameters:
- communication protocol
- domain names
Unpacking the things
So, we are looking for an API call that could establish communication with the outside world. Let’s take a look at the import table and try to find such an API:
Figure 1: Import Directory
This table looks pretty empty to me: it has a very small number of functions, and none of them are the ones I’m looking for. In addition, a quick check of the file layout in IDA showed little code and lots of data:
Figure 2: IDA file layout
So, I guessed that the author had packed the file, and the probable solution was to get closer to our buddy with my other friend, Olly. After opening the file in Olly, I landed at the Entry Point, and a little examination showed that the code there was responsible for analyzing the stack and calculating the base address of the kernel32.dll module. There were hard-coded offsets of functions from the kernel32 base address:
Figure 3: Using offsets to load the functions
The important ones were the functions used to allocate space and copy 0x2B2A bytes of data into it:
Figure 4: Moving code before decryption
It can be seen that the data is taken from the start of the file (remember the IDA analysis we saw earlier). In the next step, the moved code was decrypted and control was transferred to it using SEH for an “Access Violation Exception”:
Figure 5: Triggering “Access violation exception”
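The post does not show the exact instructions the stub uses to find kernel32.dll, so the following is only a minimal sketch of one common way such a lookup can work. It assumes the stub takes an address that lies inside the module (for example a return address found on the stack), rounds it down to a 64 KB boundary, and walks backwards until it finds an MZ/PE header. The FakeMemory class, the base address, the routine names and the offsets are all hypothetical stand-ins, not values from the sample.

```python
import struct

ALLOC_GRANULARITY = 0x10000  # Windows maps modules on 64 KB boundaries


class FakeMemory:
    """Hypothetical stand-in for process memory: one region at a base address."""

    def __init__(self, base, data):
        self.base = base
        self.data = bytes(data)

    def read(self, addr, size):
        off = addr - self.base
        if off < 0 or off + size > len(self.data):
            return None  # unmapped; a real stub would fault here
        return self.data[off:off + size]


def find_module_base(mem, addr_inside, max_steps=64):
    """Walk down in 64 KB steps from an address inside a module to its header."""
    candidate = addr_inside & ~(ALLOC_GRANULARITY - 1)
    for _ in range(max_steps):
        if mem.read(candidate, 2) == b"MZ":
            raw = mem.read(candidate + 0x3C, 4)  # e_lfanew field of the DOS header
            if raw is not None:
                pe_offset = struct.unpack("<I", raw)[0]
                if mem.read(candidate + pe_offset, 4) == b"PE\0\0":
                    return candidate
        candidate -= ALLOC_GRANULARITY
    return None


def resolve(base, offsets):
    """Turn hard-coded offsets into absolute addresses."""
    return {name: base + off for name, off in offsets.items()}


# Build a fake 192 KB image with a minimal DOS/PE header (all values made up).
FAKE_BASE = 0x7C800000
image = bytearray(0x30000)
image[0:2] = b"MZ"
struct.pack_into("<I", image, 0x3C, 0x80)
image[0x80:0x84] = b"PE\0\0"
mem = FakeMemory(FAKE_BASE, image)

return_address = FAKE_BASE + 0x16FD7  # pretend this was found on the stack
base = find_module_base(mem, return_address)
table = resolve(base, {"routine_a": 0x1234, "routine_b": 0x5678})  # hypothetical offsets
```

Hard-coded offsets like these only work against the exact kernel32 build the author targeted, which is probably why the stub can be so small.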
So, I landed at the Entry Point (after decryption), and the first thing to check was whether we are able to transmit to the outside world. Looking at the list of loaded modules shows nothing of particular interest, which means the malware needs to load libraries before doing anything “useful”:
Figure 6: Loaded modules at the decrypted EP
And after the load is finished, we get a very different picture:
Figure 7: All the needed modules
Looking at Fig. 7, one can notice two libraries that could be used for communication: winInet.dll and ws2_32.dll. As winInet.dll requires the second one, I decided to concentrate on winInet.dll first. Analyzing inter-modular calls, I was hoping to find the very specific functions that initiate the connection and send data to the remote host:
- InternetConnect: this function receives the actual server that the malware wants to connect to, probably with dynamic name generation.
- HttpSendRequest: this function is responsible for sending the request, which could contain dynamic parts, like an ID of the session or of the machine on which the malware is installed.
Fortunately for me, there were not many references to the above functions, so I decided to start with the second one, HttpSendRequest. Following the first referenced call, I landed in the calling function, which looked very promising as it was responsible, among other things, for generating the request string.
Figure 8: Suspicious pattern for the ID var
Going over the request generation (Fig. 8), I found the id parameter, which was concatenated with XXX_xxxxxxxxxxxxxxxxxxxxxxxxxxxx. This looked like reserved space to me. To verify my “intuition”, I followed the execution of the malware and monitored the above string, which was stored at [0x00390B93]. Eventually, the XXX_… was replaced with USA_VkJlY2ZkNGRiZi1lY2RiNTljMl8 and BINGO, I was right. So now the question is how this id was generated. I looked for references to the pre-allocated buffer in the code. The reference list showed only 4 potential places to start the quest from. After quickly analyzing them, I concentrated on only one:
Figure 9: Following the ID buffer
The special thing about this place was that the buffer was used to store the data from a registry query for the value named “w8”. Obviously, if there was a query for that value, there had to be some place responsible for storing it in the first place. Fortunately, this place was right above my current position: a RegSetValueExA API call, with the buffer holding the needed data at [EBP-4], which is the one I need to follow to get to the initial calculation of the id. That being said, I jumped to the beginning of the function to start the tracking.
Figure 10: Allocate buffer for ID
It was easy to see that [EBP-4] got the address of the allocated buffer, and in addition we get a new player in the game: [EBP-8], which points 0x100 bytes further into the newly allocated memory. Deeper in the function, it can be seen that those two “buffers” are actually parameters supplied to 2 functions, where [EBP-4]:
Figure 11: First part of the ID – the Locale
gets the locale info as the result of calling GetLocaleInfoA, which actually shows why we had USA in the final id, as my lab computer had the USA English locale, and [EBP-8] is supplied to a locally defined function. The examination reveals that the uniqueness of the id is based on DeviceIoControl with the following parameters:
Figure 12: Parameters to DeviceIoControl API call
Here IoControlCode and InBuffer (StorageDeviceProperty = 0) are of particular interest, because they result in filling the following struct, which holds the HDD information:
[Code listing: struct holding the HDD information; listing body not preserved]
where the final key piece of data is the serial number of the drive. The serial number is then passed to a third function as a dictionary for generating the final part of the id, which is based on the following algorithm:
[Code listing: id generation algorithm; listing body not preserved]
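As a cross-check of the id discussed above, here is a minimal sketch of pulling a serial number out of a device descriptor and turning it into an id of the observed shape. Two things in it are my assumptions and not taken from the sample's code: that the struct is the standard STORAGE_DEVICE_DESCRIPTOR returned for a StorageDeviceProperty query, and that the suffix is plain Base64. The second assumption comes only from the observation that the id seen earlier, VkJlY2ZkNGRiZi1lY2RiNTljMl8, happens to decode as Base64 to text that reads like a disk serial followed by an underscore. Where that trailing underscore comes from is not known, so the sketch simply appends it. The descriptor bytes below are synthetic.

```python
import base64
import struct

# Fixed-size head of STORAGE_DEVICE_DESCRIPTOR (little-endian, 36 bytes):
# Version, Size, DeviceType, DeviceTypeModifier, RemovableMedia,
# CommandQueueing, VendorIdOffset, ProductIdOffset, ProductRevisionOffset,
# SerialNumberOffset, BusType, RawPropertiesLength
HEAD = struct.Struct("<IIBBBBIIIIII")


def serial_from_descriptor(buf):
    """Return the NUL-terminated ASCII serial, or '' if none is reported."""
    fields = HEAD.unpack_from(buf, 0)
    serial_off = fields[9]  # SerialNumberOffset, relative to the struct start
    if serial_off == 0 or serial_off >= len(buf):
        return ""
    end = buf.find(b"\0", serial_off)
    if end == -1:
        end = len(buf)
    return buf[serial_off:end].decode("ascii", errors="replace").strip()


def build_id(locale_tag, serial):
    """ASSUMED scheme: locale + '_' + unpadded Base64 of (serial + '_')."""
    encoded = base64.b64encode((serial + "_").encode("ascii")).decode("ascii")
    return locale_tag + "_" + encoded.rstrip("=")


# Decode the suffix observed in the post (one '=' restores the padding).
observed_suffix = "VkJlY2ZkNGRiZi1lY2RiNTljMl8"
decoded_suffix = base64.b64decode(observed_suffix + "=")

# Synthetic descriptor: header followed directly by the serial string.
payload = decoded_suffix.rstrip(b"_") + b"\0"
descriptor = HEAD.pack(
    HEAD.size, HEAD.size + len(payload),  # Version, Size (made-up values)
    0, 0, 0, 0,                           # type, modifier, removable, queueing
    0, 0, 0,                              # vendor/product/revision offsets
    HEAD.size,                            # SerialNumberOffset
    0, len(payload),                      # BusType, RawPropertiesLength
) + payload

serial = serial_from_descriptor(descriptor)
rebuilt_id = build_id("USA", serial)
```

Under these assumptions rebuilt_id comes out identical to the id observed in the debugger, which at least makes Base64 a plausible reading of the routine.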
In addition to the above, it appeared that the domain names were also dynamically generated. Looking at the inter-modular calls in Olly, I noticed 3 calls to InternetConnectA, which has a domain name among its parameters. Those 3 calls were the starting points.
Figure 13: Domain name buffer to follow
The idea is to follow the second parameter and look for the piece of code that changes it. So, as Fig. 13 shows, the address of interest is 0x3900B4.
Olly presented a fairly short list of references to the buffer, and examining it did not take much time or effort. In many places the buffer was used to store data retrieved from a registry query. In addition, one of the examined functions revealed that the buffer data was saved under the following registry keys: “pre”, “net”, “tst” and “prh”, and from Fig. 14 it can be seen that the probably altered 0x3900B4 buffer was saved again under the “tst” key.
Figure 14: Manipulating suspicious domain name buffer.
Based on the above findings, the analysis concentrated on the function shown in Fig. 14. Going over the Fig. 14 code, I spotted a loop that kept waiting for a positive result, while the only parameter to the called function was the buffer of interest. Diving in at address 0x396B8F, we find the function shown in Fig. 15, which clearly changes the initial buffer
Figure 15: Changing the buffer
and translating the code into something more readable:
[Code listing: readable translation of the Fig. 15 buffer-changing function; listing body not preserved]