C++ Reversing Series - 0x00
Table of Contents
The more functional and developable a product is, the more complex it is. This complexity made it difficult to create “quality” applications in the software development world where functional programming was previously used effectively. Instead of functional programming, the object-oriented programming paradigm, which was introduced with a higher level approach, was introduced and enabled developers to reveal large projects more clearly by writing code that can be associated with real life.
The advantages of object-oriented programming for the programmer are indisputable, but do the same conditions apply to the reverse engineer analyzing an application?
The answer to the question is clearly no because while software coded with a classical (i.e. functional) approach is compiled by the compiler as is, software coded using OOP structure is compiled by the compiler using different approaches and rules. A reverse engineer who is not familiar with these approaches and rules has difficulty in analyzing such applications. Especially if advanced OOP features such as polymorphism and dynamic binding are utilized, this situation becomes unmanageable. In this article, I will talk about the topics and details of how to reversing C++ applications written using OOP. Since defining C++ structures in sophisticated and advanced applications and shortening the reverse engineering processes will not fit in one article, I will make this a series :)
Class & Struct Similarity
We mentioned that OOP provides great benefits and can simplify complex structures. There is an unknown here, isn’t there? If you have analyzed applications developed with traditional programming (ex: a malware written in C) and you are familiar with the x86 architecture and instruction set, it is not so unknown for you. Because the “high-level” concepts we are going to talk about are actually based on the basics you already know.
// class_struct.h
class cls_material{
public:
char code;
int count;
bool avail;
};
struct str_material {
char code;
int count;
bool avail;
};
The class and struct we defined above are basically the same. The compiler interprets and compiles both in the same way. Let’s make implementations to test this.
// class_struct.cpp
struct str_material gold;
cls_material copper;
gold.avail = true;
gold.code = 17;
gold.count = 15;
copper.avail = false;
copper.code = 18;
copper.count = 23;
In the small code example we kept data about the materials. An important point here is that the resources are initialized on the stack (we will also talk about dynamic initialization). There doesn’t seem to be any difference at the source code level. Let’s dissassemble and take a look.
.text:00401000 push ebp
.text:00401001 mov ebp, esp
.text:00401003 sub esp, 18h ; allocated for local variables
.text:00401006 mov [ebp+var_4], 1 ; gold.avail
.text:0040100A mov [ebp+var_C], 11h ; gold.code
.text:0040100E mov [ebp+var_8], 0Fh ; gold.count
.text:00401015 mov [ebp+var_10], 0 ; copper.avail
.text:00401019 mov [ebp+var_18], 12h ; copper.code
.text:0040101D mov [ebp+var_14], 17h ; copper.count
.text:00401024 xor eax, eax
.text:00401026 mov esp, ebp
.text:00401028 pop ebp
.text:00401029 retn
.text:00401029 _main endp endp
We have proved that there is no difference at the assembly level, and that struct and class are the same up to the memory footprint. In the simplest terms, we can say that classes are essentially the same as structs and that these two structures are really just a collection of memory addresses of types.
Class Constructor
Constructor is one of the most important concepts we need to recognize in order to reverse the OOP structure in C++. Let’s look at the declaration and definition of the constructor.
// constructor.hpp
class Material {
private:
char code;
int count;
bool avail;
public:
Material(char _code, int _count, bool _avail) : code{ _code }, count{ _count }, avail{ _avail }{}
char getCode();
int getCount();
bool isAvail();
};
// constructor.cpp
char Material::getCode() {
return code;
}
int Material::getCount() {
return count;
}
bool Material::isAvail() {
return avail;
}
// main.cpp (Implementation)
int main() {
Material* gold = new Material(17, 15, true);
std::cout << gold->getCode() << "\n" << gold->getCount() << "\n" << gold->isAvail();
return 0;
}
The constructor of our Material class takes the attributes at initialization time and makes the variables ready for use. There are also related member functions that return each attribute. In its implementation, it is worth noting that we use dynamic initialization compared to the previous simple class example.
.text:004012E3 push 0Ch ; Class boyutu
.text:004012E5 call ??2@YAPAXI@Z ; operator new(uint)
.text:004012EA mov [ebp+Block], eax
.text:004012ED sub esp, 8
.text:004012F0 mov [ebp+var_4], 0
.text:004012F7 mov ecx, eax ; eax = Class pointerı
.text:004012F9 call sub_401260 ; Material::Material()
Before calling the constructor, the heap allocated memory area that will hold our class pointer is assigned to the ecx
register and the constructor of the Material class (sub_401260
), which is basically a member function, is called.
.text:00401260 push ebp
.text:00401261 mov ebp, esp
.text:00401270 push esi
.text:00401271 mov esi, ecx ; ecx = this pointer
.text:00401273 push 0Ah
.text:00401275 mov byte ptr [esi], 11h ; code
.text:00401278 mov dword ptr [esi+4], 0Fh ; count
.text:0040127F mov byte ptr [esi+8], 1 ; avail
.text:0040129B mov eax, esi ; this pointer moved to eax register for return
.text:004012A5 mov esp, ebp
.text:004012A7 pop ebp
.text:004012A8 retn 0Ch
We can see that the sub_401260
function is a constructor. The this
pointer, which is the secret pointer of our object, is used in the constructor through the ecx
register. The parameters we pass in the initialize phase are moved to the memory space allocated for our object according to their size.
After the object is constructed, the arrangement of the attributes in memory is as above (I don’t mention the memory padding and the size of the variables).
Member Function Calls
Let’s continue our example with our Material class again. Since the attributes are specified as private, we have written get functions (you can see their definitions in the example above).
.text:00401299 call sub_401260 ; constructor
.text:0040129E mov edx, eax ; edx = object pointer
.text:004012A0 mov ecx, edx ; ecx = object pointer
.text:004012A2 call sub_401250 ; Material::isAvail()
.text:004012A7 movzx ecx, al
.text:004012AA push ecx
.text:004012AB mov ecx, edx ; ecx = object pointer
.text:004012AD call sub_401240 ; Material::getCount()
.text:004012B2 push eax
.text:004012B3 call sub_401230 ; Material::getCode()
We can call our member functions via our object pointer (this
). Note that C++ compilers (at least MSVC) assign the this
pointer to the ecx
register when calling a member function of an object. So the constant use of ecx may indicate the use of OOP primitives.
Inheritance
Inheritance is undoubtedly the concept that gives the OOP structure its inter-class relationship and extension structure. It is one of the two most difficult (in my opinion) fundamentals to reverse OOP based C++ applications.
It would be more useful to examine the topic of Inheritance under two headings as single and multiple. Let’s both explain and reverse with examples :)
Single Inheritance
In this situation derived class has only one base class.
// inheritance.hpp
class Plant {
private:
int age;
public:
Plant() : age{ 0 } {};
};
class Tree : public Plant {
private:
int leaf_count;
public:
Tree() : leaf_count{ 0 } {}
};
class Fruit : public Plant {
private:
int water_percent;
public:
Fruit() : water_percent{ 0 } {};
};
There are two important concepts when it comes to inheritance: base class and derived class. Derived classes take the attributes of base classes. Since we know that each plant has an age, we define this common attribute in our base class. And we inherit our base class to each plant type we create.
// main.cpp
int main() {
Tree oak;
Fruit apple;
return 0;
}
As you can see in the implementation, we have inherited the attributes of the base class to the derived classes. Let’s come to the reverse part, which is our topic, and this time let’s consider that we are creating our objects on the stack.
.text:004010D0 push ebp
.text:004010D1 mov ebp, esp
.text:004010D3 and esp, 0FFFFFFF8h
.text:004010D6 sub esp, 8
.text:004010D9 lea ecx, [esp+8+var_8]
.text:004010DC call sub_401090 ; Tree::Tree()
.text:004010E1 lea ecx, [esp+8+var_8]
.text:004010E4 call sub_4010B0 ; Fruit::Fruit()
.text:004010E9 xor eax, eax
.text:004010EB mov esp, ebp
.text:004010ED pop ebp
.text:004010EE retn
On the surface it looks like the constructors of the derived classes are called first, but the underlying concept is completely different.
.text:00401090 sub_401090 proc near
.text:00401090 push esi
.text:00401091 mov esi, ecx
.text:00401093 call sub_401070 ; Plant::Plant()
.text:00401098 push offset aTreeTree ; "Tree::Tree()\n"
.text:0040109D mov dword ptr [esi+4], 0
.text:004010A4 call _printf
.text:004010A9 add esp, 4
.text:004010AC mov eax, esi
.text:004010AE pop esi
.text:004010AF retn
.text:004010AF sub_401090 endp
.text:004010B0 sub_4010B0 proc near
.text:004010B0 push esi
.text:004010B1 mov esi, ecx
.text:004010B3 call sub_401070 ; Plant::Plant()
.text:004010B8 push offset aFruitFruit ; "Fruit::Fruit()\n"
.text:004010BD mov dword ptr [esi+4], 0
.text:004010C4 call _printf
.text:004010C9 add esp, 4
.text:004010CC mov eax, esi
.text:004010CE pop esi
.text:004010CF retn
.text:004010CF sub_4010B0 endp
.text:00401070 sub_401070 proc near
.text:00401070
.text:00401070 push esi
.text:00401071 mov esi, ecx
.text:00401073 push offset Format ; "Plant::Plant()\n"
.text:00401078 mov dword ptr [esi], 0 ; age = 0
.text:0040107E call _printf
.text:00401083 add esp, 4
.text:00401086 mov eax, esi
.text:00401088 pop esi
.text:00401089 retn
.text:00401089 sub_401070 endp
Now everything is clearer. Even though it looks like the constructors of the derived classes are called first, the compiler actually calls the base class constructor first. After the base class is constructed, the other elements in the derived class are constructed. Maybe it seems confusing, let’s describe it in our code:
Tree() : leaf_count{ 0 } {}
|
|
v
Tree() {
Plant::Plant(); // you might think there is a secret call
leaf_count = 0;
}
Multiple Inheritance
Derived class has more than one base class. Let’s update our existing Tree class and add Forest as a base class.
class Forest {
private:
int numof_trees;
public:
Forest() : numof_trees{ 0 } { printf("Forest::Forest()\n"); }
};
class Tree : public Plant, public Forest {
private:
int leaf_count;
public:
Tree() : leaf_count{ 0 } { printf("Tree::Tree()\n"); }
};
Currently we have a derived class with 2 base classes. Let’s disassemble it and see which one will be constructed first.
.text:00401090 sub_401090 proc near ; Tree::Tree() constructor
.text:00401090 push esi
.text:00401091 mov esi, ecx ; esi = this pointer
.text:00401093 call sub_401050 ; Plant::Plant()
.text:00401098 lea ecx, [esi+4]
.text:0040109B call sub_401070 ; Forest::Forest()
.text:004010A0 push offset aTreeTree ; "Tree::Tree()\n"
.text:004010A5 mov dword ptr [esi+8], 0 ; leaf_count = 0
.text:004010AC call _printf
.text:004010B1 add esp, 4
.text:004010B4 mov eax, esi
.text:004010B6 pop esi
.text:004010B7 retn
.text:004010B7 sub_401090 endp
We can see that base classes are constructed from left to right, unlike parameter passing. Let’s change the initialize values of the class variables to leaf_count = 1, age = 2, numof_trees = 3
, add the attribute int root_size = 50
to the Plant class and see how the object belonging to the Tree class is arranged in memory with multiple member variables.
.text:00401070 sub_401070 proc near ; Plant::Plant()
.text:00401070
.text:00401070 push esi
.text:00401071 mov esi, ecx ; esi = Tree this pointer
.text:00401073 push offset Format ; "Plant::Plant()\n"
.text:00401078 mov dword ptr [esi], 2 ; age = 2
.text:0040107E mov dword ptr [esi+4], 32h ; root_size = 50
.text:00401085 call _printf
.text:0040108A add esp, 4
.text:0040108D mov eax, esi ; return this
.text:0040108F pop esi
.text:00401090 retn
.text:00401090 sub_401070 endp
.text:004010A0 sub_4010A0 proc near ; Forest::Forest()
.text:004010A0 push esi
.text:004010A1 mov esi, ecx ; Tree this pointer
.text:004010A3 push offset aForestForest ; "Forest::Forest()\n"
.text:004010A8 mov dword ptr [esi], 3 ; numof_trees = 3
.text:004010AE call _printf
.text:004010B3 add esp, 4
.text:004010B6 mov eax, esi ; return this
.text:004010B8 pop esi
.text:004010B9 retn
.text:004010B9 sub_4010A0 endp
In Multiple inherit, we can see the content of the object created for the derived Tree class more clearly. As in Single, we have understood that it contains all the attributes of the base class in order from right to left.
Polymorphism
Polymorphism, which stands for Polymorphism, basically allows us to behave “of another type and in more than one form “. Polymorphism is most common for us reverse engineers with the concept of dynamic dispatch. Virtual methods are used to call the same function (type of function, number of parameters and types) of the most derived class that is derived and overridden from the base class and is one of the ways to provide dynamic dispatch.
Let’s continue with our Tree and Plant class examples, but this time simplify them a bit.
// poly.hpp
class Plant {
private:
int age;
public:
Plant() : age{ 20 } {}
virtual void create() { std::cout << "New plant type created!\n"; }
void del() {std::cout << "Plant type deleted!\n";
}
};
class Tree : public Plant{
private:
int leaf_count;
public:
Tree() : leaf_count{ 10 } {}
virtual void create() { std::cout << "New tree created!\n"; }
void del() { std::cout << "Tree deleted!\n"; }
};
The create() and del() methods, which are exactly the same, can be found in both base and derived classes. In order to fully understand the function of virtual methods, let’s be aware that our del() method is not a virtual method.
// main.cpp
int main() {
Tree* oak = new Tree;
Plant* plt { oak };
plt->create();
plt->del();
return 0;
}
output:
New tree created!
Plant type deleted!
When the virtual method is called from the base class, the method of the most derived class is called, while in the non-virtual method, it calls the method of the class (Plant) from which it is currently called without any lookup action on the objects.
So what exactly is this lookup and how does the compiler call the most derived method?
Virtual Function Table (VfTable)
Virtual function invocation, also referred to as dynamic binding, takes place at runtime and is as far as possible independent of compiler optimizations. The VfTable contains the address of each virtual method in sequential order and is dispatchable by the table.
.text:00401030 sub_401030 proc near ; Tree::Tree()
.text:00401030 call sub_401000 ; Plant::Plant()
.text:00401035 mov dword ptr [ecx], offset ??_7Tree@@6B@ ;&Tree::vftable
.text:0040103B mov eax, ecx
.text:0040103D mov dword ptr [ecx+8], 0Ah ; leaf_count = 10
.text:00401044 retn
.text:00401044 sub_401030 endp
.text:00401000 sub_401000 proc near ; Plant::Plant()
.text:00401000 mov dword ptr [ecx], offset ??_7Plant@@6B@ ; &Plant::vftable
.text:00401006 mov eax, ecx
.text:00401008 mov dword ptr [ecx+4], 14h
.text:0040100F retn
.text:0040100F sub_401000 endp
Since it is a base class, the Plant::vftable reference will be assigned to the object pointer, but later the Tree::vftable reference will be replaced and the Tree::create() method will be called. Since we first create an object from the Tree class and reference it to the Plant class, the compiler changes the vftables. The reason why we specify as independent of compiler optimization as possible is also explained in practice.
.rdata:00403204 ??_7Plant@@6B@ dd offset sub_401010 ; Plant::create()
.rdata:00403208 dd offset ??_R4Tree@@6B@ ; const Tree::`RTTI Complete Object Locator'
.rdata:0040320C ; const Tree::`vftable'
.rdata:0040320C ??_7Tree@@6B@ dd offset sub_401050 ; Tree::create()
We can also understand the content of VfTable from the above disassemble output. For a better understanding, let’s specify it via the UML template and show it in the memory array.
plt: 0C 32 99 00 | 14 00 00 00 | 0A 00 00 00
&vftable age leaf_count
&vftable(0099320C): 50 10 99 00
&Tree::create()
We have clearly explained the vftable sharing and memory allocations between base and derived classes.
.text:00401073 call sub_401030 ; Tree::Tree()
.text:00401078 mov ecx, eax ; eax = oak object pointer
.text:0040107A mov edx, [eax] ; [eax] = vftable reference
.text:0040107C call dword ptr [edx] ; [edx] = Tree::create() reference
.text:0040107E call sub_401020 ; Plant::delete()
In the disassemble fragment above, we can see how vftable is used. In our implementation, we override and call the Tree::create()
method. We can call the method at runtime via vftable, which is referenced in the first 4 bytes of the object pointer. Then, since the virtual keyword is not used, the Plant::delete()
method is called and the program terminates.
References
Microsoft Word - Reversing_CPP.doc (blackhat.com)
Reversing C++ programs with IDA pro and Hex-rays – Aris’ Blog (0xbadc0de.be)
comments powered by Disqus