Introduction to Malware & Malware Obfuscation:
The fact that you're here means a part of you is curious enough and willing to learn this topic. Even if you're still hesitant to fully invest your time, you don't have to rush yourself. Save this page for later and know you've already completed the hardest step, which is starting somewhere. Now, let me guide you through the rest if you're willing to stay.
What is this about:
Beginner-level guidance on Windows malware development and obfuscation.
Malware theory:
Malware is just code. It's not magic. It's not built by goblins. It's built by ordinary, everyday programmers who understand a specific set of programming concepts that are very often harmless on their own, which they then put to use for a malicious purpose. All malware ultimately works to damage your system, steal your information, or take control of it, but most of the time the end goal is extracting your stolen data, without you knowing, back to a remote server that the programmers control. With that out of the way: in this course we've got a stealer sample to play with later. But first, as I said, to reach that level we need to study the set of concepts I mentioned earlier. Be aware that I'm not presenting these subjects as the ultimate gospel, but I can confidently say that once you're comfortable with them, you'll have the confidence to explore the rest independently.
First you gotta start with:
A. Programming:
Your first and most critical goal is to learn real programming, not just code syntax!
Learning syntax alone without knowing how to build anything is bad. It's just the fast track to tutorial hell. There is much more to it than that, and there are other skills you need to learn in order to become useful in a project. Since this course focuses mainly on the Windows ecosystem, there is nothing better to start with than C (or C#), since most of the Windows API documentation is written for C and the kernel itself is written largely in C anyway. Furthermore, you must understand how innocent software normally interacts with its surrounding environment. This means getting comfortable with the tools of the trade: libraries, APIs (note I'm not talking about the Windows API yet!), and finally frameworks that enable your programs to communicate with the operating system and other applications.

Relying on a single language for everything is incredibly limiting. Staying on the darknet theme, imagine for a second that you're an administrator for a darknet market and you want to switch careers. Are you supposed to use PHP outside of web dev? Can you imagine how hard it would be to write a stealthy fileless virus in PHP? Borderline fucking impossible. And this goes for programmers in other fields too. Would you write a website's backend in raw Assembly? Are you going to create an interactive web page using only C++? Or how about an info stealer in plain HTML? I hope you get the point I'm trying to make: using one language for everything is the programming equivalent of trying to eat soup with a fork. This is also really relevant to malware, because you're not just using the wrong tool, you're fundamentally misunderstanding the entire environment.

Don't get me wrong. All programming languages can be "dangerous" in the same way that a steak knife can be a murder weapon. But only in their own little world.
With that being said, the last opinion I want to give in this segment is that programming languages are merely one drawer in a toolbox. They occupy just one big section next to reconnaissance, vision, infrastructure, and so on. With that out of the way, and now that we're finished talking about gears and boxes, let's shift gears to the next segment. Rate my head janitor cosplay from 1 to 10. I was contemplating insulting folks who drive automatics to avoid getting assaulted after saying this lame-ass shit, but here I am asking for mercy.
B. Compiler Toolchains:
Now ⚠️A skilled driver⚠️ doesn’t just know how to steer the wheels. Best believe he also understands what’s happening under the hood.
In the same way, a good programmer should know their compiler toolchain inside and out as much as possible. It's basically the car engine that turns gas (human-readable code) into mechanical energy (machine code). You don't have to rebuild it from scratch to prove a point. You just gotta learn how to use it. I'm emphasizing this because switching between compiler toolchains carelessly will cause you serious problems down the line. For example, MinGW is a port of GCC for Windows and my personal choice. I naturally lean towards Linux software because I started on Linux, but you'll have to make your own decision. While it works great, it's not identical to the native Windows toolchain. There are real differences in the Windows API headers compared to the MSVC ones: wrong import/export macros, mismatched type definitions, incorrect protection flags, and incomplete headers that were reverse engineered, left unfinished, and are incompatible with the latest kernel headers like winnt.h. That last one can break your builds at any moment. Sometimes it will compile if you manually fill in all the holes, but it will cause weird runtime issues later, and sometimes the compiler errors are practically unsolvable because of how much work fixing them takes. I've got an advanced course on malware obfuscation coming later, but for now we're still beginners and not up against any sophisticated antivirus tools, so we don't need that yet.
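One cheap habit that helps here is making your code tell you which toolchain actually built it. The sketch below is only an illustration: `toolchain_name` and `describe_toolchain` are made-up helper names, not part of any library. It relies on macros the compilers predefine themselves (`_MSC_VER`, `__MINGW32__`, `__MINGW64__`, `__clang__`, `__GNUC__`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Returns a short label for the compiler that built this file.
   Hypothetical helper; it only inspects compiler-predefined macros. */
const char *toolchain_name(void)
{
#if defined(_MSC_VER) && !defined(__clang__)
    return "MSVC";
#elif defined(__MINGW64__)
    return "MinGW-w64 (64-bit target)";
#elif defined(__MINGW32__)
    return "MinGW (32-bit target)";
#elif defined(__clang__)
    return "Clang";
#elif defined(__GNUC__)
    return "GCC";
#else
    return "unknown";
#endif
}

/* Writes "<toolchain>, N-byte pointers" into out and returns snprintf's result.
   The cast to unsigned keeps the format string portable across C runtimes. */
int describe_toolchain(char *out, size_t cap)
{
    assert(out != NULL && cap > 0);
    return snprintf(out, cap, "%s, %u-byte pointers",
                    toolchain_name(), (unsigned)sizeof(void *));
}
```

Call `describe_toolchain` once at startup and print the result. If you ever switch toolchains by accident, the label changes and you find out before the weird header problems start.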
C. Windows API:
Think of this as the interface that lets your malware access features of the Windows operating system that you can't reach with the standard ⚠️C Libraries⚠️ alone.
Your malware will typically access this API through system DLLs, which are libraries shared among all the programs in Windows user space.
Let me list a few example use cases of the Windows API:
- Retrieving information about a particular user account on a server: NetUserGetInfo() <lmaccess.h>
- Retrieving the number of milliseconds elapsed since the system started: GetTickCount() <sysinfoapi.h>
- Retrieving the temp path: GetTempPathA() <fileapi.h>
- Reading file content into a buffer: ReadFile() <fileapi.h>
This is a simple C program that uses the Windows API to retrieve and display the name of the current user. Note that the program itself is not malicious by any means, but understanding these principles can serve as a stepping stone toward developing more complex malware.
    #include <windows.h>
    #include <lmcons.h>   // UNLEN, the maximum user name length
    #include <stdio.h>

    #define INFO_BUFFER_SIZE (UNLEN + 1)

    int main(void) {
        // R234 = result of the function, UBdg23 = char username buffer,
        // NORB234 = in: buffer size in chars, out: chars written including the null terminator
        BOOL R234;
        char UBdg23[INFO_BUFFER_SIZE];
        DWORD NORB234 = INFO_BUFFER_SIZE;

        R234 = GetUserNameA(UBdg23, &NORB234);
        if (R234 == 0) {
            fprintf(stderr, "GetUserNameA() failed with err code %lu\n", GetLastError());
            return 1;
        }
        fprintf(stdout, "username is %s & is %lu long\n", UBdg23, NORB234 - 1);
        return 0;
    }
D. PE File Format:
I've strategically placed this topic last in the theory segment because it is, without exaggeration, the most important one you'll have to study. Recall that whole rant on compilers? This is their ultimate output: the final .exe binary. The Portable Executable, or PE as you may have seen it abbreviated somewhere, is the undisputed standard on the Windows operating system. It's the binary container for nearly all user-space programs (.exe), kernel drivers (.sys), and shared libraries (.dll), and, maybe surprisingly for some of you Linux users, it even extends beyond Windows. Your EFI bootloader is a PE-formatted file. Shocking to know, right? You can verify this for yourself.
Open a terminal and run:
Note: This path may vary. Adjust it for your specific Linux distribution.
file /boot/EFI/GRUB/*.efi
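If you'd rather see what `file` is keying on, the check is small enough to do by hand. Here is a minimal sketch, assuming the documented PE layout: the bytes "MZ" at offset 0, a 4-byte little-endian field at offset 0x3C (e_lfanew) holding the offset of the PE header, and the signature "PE\0\0" at that offset. `looks_like_pe` and `read_le32` are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads a little-endian 32-bit value from p. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if buf looks like a PE image, 0 otherwise (hypothetical helper).
   It checks only the two magic values, nothing else about the file. */
int looks_like_pe(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);

    /* The DOS header is 0x40 bytes long and starts with "MZ". */
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return 0;

    /* e_lfanew at offset 0x3C points at the PE signature. */
    uint32_t pe_off = read_le32(buf + 0x3C);
    if (pe_off > len || len - pe_off < 4)
        return 0;

    return buf[pe_off] == 'P' && buf[pe_off + 1] == 'E' &&
           buf[pe_off + 2] == 0 && buf[pe_off + 3] == 0;
}
```

To try it on a real file, read the file into a buffer with `fread` and pass the buffer and its length to `looks_like_pe`. An .efi bootloader and a Windows .exe should both pass this check.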
Alright, so the PE format is a massive topic, but we're not gonna get lost in the sauce. In this course our focus is going to be on specific data sections. We're only gonna tear apart the pieces that matter for our sample. Now, to make this clear, I'm gonna have to use some third-party image hosting for the diagrams. I don't love it, but the alternative is making this whole course look like a long cringey diary, and I don't want to scare anyone off. I get how confusing this can be. When I first started looking into PE files, I was so lost I ended up making my own diagrams in GIMP just to remember how everything connected, because some headers are referred to by a lot of different names, and the same goes for the structs inside the data sections that the section headers point to. I also scavenged screenshots from YouTube videos and pieced it all together at the end (this is a lie, stay tuned for part 2). I've also got a list of the best resources that actually helped me learn this, and I'll drop them all down below. I'll link them in the sample code too, just in case I forget.
- https://learn.microsoft.com/en-us/windows/win32/debug/pe-format#section-table-section-headers
- https://medium.com/@jasemalsadi/understanding-pe-file-for-reversers-9aa3b59ab13c
Coding the OSINT sample:
At this point in writing, I've been pulling info from everywhere and I've kinda lost track of who I even made this course for. I've also noticed this course is getting way too long for my liking. But rest assured, if you haven't studied the PE format yet, I got you. I'll explain everything we'll actually be interacting with in this segment, for example the import section, the read-only section, and the resources section. I'll break down what they store and, more importantly, how to find them and extract data manually, or do any other operation on them, without any tools, just using simple math and formulas. I'll be focusing on what you need to use, not what you need to memorize, so I highly recommend you put the time in and STUDY!
Enough yapping. Let's start working towards obfuscating this malicious OSINT sample, but first we gotta write the sample code together to get there.
Here are some terms I may be using:
- ImageBase: The preferred starting address in virtual memory where a PE file wants to be loaded.
- Virtual Address (VA): The actual memory address of an object while the program is running, calculated as ImageBase + RVA.
- Relative Virtual Address (RVA): The offset of an object within a loaded IMAGE (please research this term), measured from where the image starts, calculated as VA - ImageBase.
- Data Directory: An array of IMAGE_DATA_DIRECTORY entries (you can find the struct in winnt.h), 8 bytes each. The first 4 bytes give the RVA and the remaining 4 give the size of the specific table or resource in the PE file.
- Section header: An array of IMAGE_SECTION_HEADER structures that describe the sections I'm going to talk about, for example .text, .rdata, .idata, .rsrc.
- Cipher: The algorithm that transforms plaintext into something called ciphertext (encryption/obfuscation) and back (decryption/deobfuscation). Look into the Caesar cipher.
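To make the address terms concrete, here is a small sketch of the arithmetic. The structs are trimmed-down stand-ins I made up for this example (`data_directory`, `section_info`), not the real winnt.h definitions, and the helper names are hypothetical too. The one formula not spelled out in the list is the on-disk one: file offset = RVA - section VirtualAddress + section PointerToRawData.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for IMAGE_DATA_DIRECTORY: 4-byte RVA followed by 4-byte size. */
typedef struct {
    uint32_t virtual_address;
    uint32_t size;
} data_directory;
_Static_assert(sizeof(data_directory) == 8, "a data directory entry is 8 bytes");

/* Only the section header fields needed for address translation. */
typedef struct {
    uint32_t virtual_address;     /* RVA where the section starts in memory */
    uint32_t virtual_size;        /* size of the section in memory */
    uint32_t pointer_to_raw_data; /* file offset where the section starts on disk */
    uint32_t size_of_raw_data;    /* size of the section on disk */
} section_info;

/* VA = ImageBase + RVA */
uint64_t rva_to_va(uint64_t image_base, uint32_t rva)
{
    return image_base + rva;
}

/* RVA = VA - ImageBase */
uint32_t va_to_rva(uint64_t image_base, uint64_t va)
{
    assert(va >= image_base);
    return (uint32_t)(va - image_base);
}

/* Finds the section that contains rva and converts it to a file offset.
   Returns 0 on success, -1 if no section holds the RVA or the RVA has
   no bytes behind it on disk. */
int rva_to_file_offset(const section_info *secs, size_t count,
                       uint32_t rva, uint32_t *out)
{
    assert(secs != NULL && out != NULL);
    for (size_t i = 0; i < count; i++) {
        uint32_t start = secs[i].virtual_address;
        if (rva >= start && rva - start < secs[i].virtual_size) {
            if (rva - start >= secs[i].size_of_raw_data)
                return -1;
            *out = rva - start + secs[i].pointer_to_raw_data;
            return 0;
        }
    }
    return -1;
}
```

As a worked example with made-up numbers: with an ImageBase of 0x140000000 and an RVA of 0x2010, the VA is 0x140002010. If the section holding that RVA starts at RVA 0x2000 and sits at file offset 0xA00, the same byte lives at file offset 0xA10.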
Osint tool sample:
NEVER USE VAR NAMES WITH REAL MEANING, THIS IS ONLY FOR CLARITY
Why should you care about obfuscating malware?
So, why should you care about obfuscating your code?
Let's break it down simply.
At its core, malware is just a series of ones and zeros. What security companies do is scan these files to find malicious sequences of bytes. Think of it like a fingerprint for the malware. When they find a distinctive pattern that identifies what the malware is or does, they create a "signature" from it. This signature gets added to a massive shared database that your antivirus software checks against. So, if your program's byte sequence matches a signature in that database, the antivirus will immediately flag it as malicious and quarantine or delete it. OBFUSCATION is how you change that fingerprint so your program doesn't get recognized as malicious. But unless you're writing custom code you will be fine with Windows until eventually what I mentioned above happens.
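To picture the scanner's side of this, here is a deliberately naive sketch of signature matching: a plain search for one byte sequence inside a file's bytes. `find_signature` is a made-up name and the pattern in the usage note is arbitrary. Real engines go well beyond this (hashes, heuristics, behaviour monitoring), so treat it as the idea only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the offset of the first occurrence of sig inside data,
   or -1 if the signature does not appear (hypothetical helper). */
long find_signature(const unsigned char *data, size_t data_len,
                    const unsigned char *sig, size_t sig_len)
{
    assert(data != NULL && sig != NULL);
    if (sig_len == 0 || sig_len > data_len)
        return -1;

    /* Slide a window of sig_len bytes across the file contents. */
    for (size_t i = 0; i + sig_len <= data_len; i++) {
        if (memcmp(data + i, sig, sig_len) == 0)
            return (long)i;
    }
    return -1;
}
```

For example, searching the bytes `90 90 DE AD BE EF 00` for the pattern `DE AD BE EF` returns offset 2. A scanner built on this idea would run the same search once per signature in its database.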
Please stay tuned for Introduction to Malware & Malware Obfuscation 2/2!
What's coming in the next course:
- How to build this sample.
- Running this sample through Detect It Easy & VirusTotal, looking for anything suspicious that might stick out, and planning out ways to hide it.
- Inspecting the on-disk PE sections that VirusTotal interacts with in order to pull that important information. ദ്ദി(˵ •̀ ᴗ -˵✧)
- Fixing any logic errors I may have in the sample code.