00:04
Welcome to Cybrary. Hi, my name is Sean Pierce, and I'm the subject matter expert for
00:08
Introduction to Malware Analysis. Today we're gonna be talking about part two.
00:16
So, how do we go from raw assembly to an executable?
00:22
Well, the compiler created
00:26
raw executable code and then included it in a container.
00:33
That container is called a PE file. Windows uses these PE files to know
00:40
which section of the code to begin executing, which libraries it needs, and which version of which compiler made it, plus all this other metadata about the executable code.
00:55
If you want to go explore that, I highly suggest
00:57
it, because there's a wealth of information in there that we can use, such as timestamps that
01:04
say when the file was made. It usually also says what its target platform and architecture are, what function calls it's going to make, and other information like that.
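To make the metadata mentioned here concrete, the following is a minimal Python sketch (not something shown in the video) that reads the compile timestamp and target machine from a PE file's headers using only the standard library. The function name `parse_pe_header` is made up for illustration, the offsets follow the published PE/COFF layout, and only two machine values are mapped.

```python
import datetime
import struct


def parse_pe_header(data: bytes) -> dict:
    """Read the machine type, section count and timestamp from a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE file")
    # e_lfanew (offset 0x3C) holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # The COFF header follows the signature:
    # Machine (2 bytes), NumberOfSections (2 bytes), TimeDateStamp (4 bytes).
    machine, n_sections, timestamp = struct.unpack_from("<HHI", data, pe_offset + 4)
    machines = {0x014C: "x86", 0x8664: "x64"}  # only two values mapped here
    return {
        "machine": machines.get(machine, hex(machine)),
        "sections": n_sections,
        "compiled": datetime.datetime.fromtimestamp(timestamp, datetime.timezone.utc),
    }
```

Calling `parse_pe_header(open(path, "rb").read())` on a real executable returns the same three fields. Keep in mind that the timestamp is just a field in the file and can be forged, so treat it as a hint rather than proof.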
01:19
And there are several PE parsers, or PE file format parsers: PE Explorer, CFF Explorer (not COFF Explorer), PEiD, and PEStudio.
01:34
010 Editor, the hex editor, is really good if you use a PE
01:42
binary template. And, of course, you can make your own parser. It's not super hard to do, and there's a lot of work already done for that.
01:51
If you look at the Malware Analyst's Cookbook,
01:53
they have a lot of scripts to help you with that, and you get to know the file format very well.
02:02
What uses the PE file format? Simply put, .exe files are the most common. .dll files are the exact same format with just a few flags changed and a few more functions exported,
02:20
DllMain being the only one that a DLL file needs to export in order for it to be used and loaded up by the Windows loader for execution.
02:29
.scr files are screensaver files, but they execute code too.
02:35
As a demonstration,
02:39
we'll open up our Cygwin terminal here.
02:44
And when we were talking last,
02:54
prints "Hello, world."
02:57
I'll just cat the hello world file. You can see it's
03:00
a very simple program.
03:06
to any one of these things.
03:15
list, we see that there is an a.out file. We're gonna execute that,
03:22
and it prints "Hello, world." It's the same file format. Sometimes malware will use these other extensions, but it gets to be highly suspicious. So,
03:31
you know, the most I've seen is .scr.
03:37
This is a.scr, and we can execute a.scr,
03:43
and it still executes just like it would any other .exe file.
03:47
So it's good to be aware of this.
03:52
So we're talking about
03:53
assembly, and we're talking about actually executing it. I mentioned the flags register earlier.
04:03
Why it's so important is because it keeps the state
04:05
of the instructions.
04:08
And if you look over on the far right here, we have some x86 assembly.
04:12
The first line is: move one into EAX.
04:16
The next line, a compare instruction, is
04:20
compare EAX to the number two.
04:25
The instruction below that is JZ. "Jump if zero" is what it stands for.
04:34
if the above operation
04:45
So the compare instruction
04:48
does the same thing as a subtract instruction,
04:54
except it doesn't store the result back to the register.
04:58
All it does is change the flags register.
05:02
So when we compare EAX to two,
05:10
it would do a subtraction operation.
05:16
This would result in negative one,
05:18
but it would not result in zero.
05:23
If you look at the middle picture, there's the zero flag,
05:27
and right above it the sign flag. The sign flag would be flipped, so it would be a one in the sign flag, and the zero flag would be zero.
05:38
So false is zero, true is one.
05:42
So the zero flag would be zero and the sign flag would be one. And so, since
05:46
the zero flag was not set, this jump
05:50
will not be taken. It will not jump to the label is_two,
05:56
which is on line six. It will just simply fall through to the label is_not.
06:02
Labels don't actually do anything. There's no machine code equivalent. They're just placeholders. And so
06:11
what will happen is line five will be executed, where two will be moved into the EAX register.
06:18
will be executed. But since a label doesn't actually
06:21
equate to any assembly code, line seven will be executed: zero will be moved into EAX.
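As a rough way to check the walkthrough above, here is a small Python sketch that models what `cmp` does to the zero and sign flags. The assembly in the comment is reconstructed from the narration (the label names `is_two` and `is_not` are assumed spellings), and `cmp_flags` is a hypothetical helper, not a real API.

```python
# Reconstructed from the narration (the slide itself is not in the transcript):
#   1  mov eax, 1
#   2  cmp eax, 2
#   3  jz  is_two
#   4  is_not:
#   5  mov eax, 2
#   6  is_two:
#   7  mov eax, 0

def cmp_flags(a: int, b: int, bits: int = 32) -> dict:
    """Model `cmp a, b`: subtract, throw the result away, keep ZF and SF."""
    mask = (1 << bits) - 1
    result = (a - b) & mask            # wrap like a 32-bit register would
    return {
        "ZF": int(result == 0),        # zero flag: set when the result is zero
        "SF": (result >> (bits - 1)) & 1,  # sign flag: copy of the top bit
    }

flags = cmp_flags(1, 2)                # cmp eax, 2 with eax == 1
jump_taken = flags["ZF"] == 1          # jz only jumps when ZF is set
print(flags, "jump taken:", jump_taken)
```

With EAX equal to 1 the subtraction gives negative one, so ZF stays clear, SF is set, and `jz` falls through, which matches the description.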
06:30
So with these conditional jumps, compilers will implement if statements, loops, and other control flow mechanisms,
06:41
spaghetti code or switch statements or whatever.
06:47
There are other flags that you should be aware of, but the biggest are the sign flag and the zero flag.
06:57
The overflow flag will generally tell you if you tried to multiply or do something else and
07:04
you lost some information.
07:06
The overflow flag will tell you.
07:12
And sometimes code is good enough to check for that.
07:15
There's the carry flag, to see if,
07:16
you know, there's an operation that had,
07:19
you know, when you
07:20
added some bytes to some register, and, uh,
07:31
but like I said, the most important ones are zero and sign.
07:38
I talked about the push and the pop instructions and the ESP and the EBP
07:43
registers, and I said they had to do with the stack. And
07:47
here we're gonna look at how x86 uses this data structure,
07:53
which is almost always at the top of memory, whereas the operating system is down at the bottom of memory,
08:00
and the heap, where malloc and calloc
08:09
The stack is slowly growing down from
08:13
a really high memory address. And every time you do a push,
08:18
it'll decrease ESP, which points to the top of the stack.
08:24
EBP is always above it. So the base pointer is always pointing at the bottom of the stack frame, which is at the highest memory address. This might be kind of hard to wrap your head around, and a lot of people get confused by this.
08:41
Take care to just draw it out on a piece of paper every once in a while. That really helps; it helps me, at least.
08:50
So the push instruction
08:56
So it's kind of like you're adding
08:58
to the stack, like you have a stack of plates
09:03
at, like, an all-you-can-eat buffet or something. And you,
09:07
somewhere and every time you put a plate on top, the whole stack kind of moves down a little bit.
09:15
If you want to grab a plate, you have to
09:16
pop off the one on top.
09:18
And if you pop something, then you will take the data from
09:24
the stack, and the ESP value will increment by
09:31
four bytes. And there are push and pop instructions that can
09:37
do, like, 16 bytes,
09:39
or 16 bits, which is two bytes,
09:46
but the most common is push four bytes and pop four bytes.
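A small Python model can make the ESP bookkeeping easier to follow. This is only a sketch under simple assumptions: a hypothetical `Stack` class, a 4 KB region whose top is address 0x1000, and 4-byte pushes and pops only.

```python
import struct


class Stack:
    """Toy model of a 32-bit x86 stack that grows toward lower addresses."""

    def __init__(self, top: int = 0x1000, size: int = 0x1000):
        self.base = top - size          # lowest address backing the stack
        self.mem = bytearray(size)
        self.esp = top                  # an empty stack starts at the top

    def push(self, value: int) -> None:
        self.esp -= 4                   # push decrements ESP first...
        struct.pack_into("<I", self.mem, self.esp - self.base, value & 0xFFFFFFFF)

    def pop(self) -> int:
        (value,) = struct.unpack_from("<I", self.mem, self.esp - self.base)
        self.esp += 4                   # ...and pop increments it afterwards
        return value


stack = Stack()
stack.push(0x11111111)
stack.push(0x22222222)
print(hex(stack.esp))                   # two pushes moved ESP down by 8 bytes
print(hex(stack.pop()), hex(stack.esp)) # last value pushed comes off first
```

The last value pushed is the first one popped, just like the plates, and ESP ends up back where it started once every push has a matching pop.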
09:50
And the call instruction will also affect the stack, because each stack frame,
09:56
you know, between the ESP and the EBP pointer,
10:01
each stack frame
10:03
is the local scope for that function. So all local variables will be between ESP and EBP.
10:11
And if this sounds confusing, don't worry. You can look at some assembly
10:16
of what the functions do,
10:18
and you will see how the stack is being manipulated
10:22
for each function call.
10:26
When you call something,
10:30
you push the EIP value.
10:35
So wherever your return place in memory is,
10:39
where the next instruction to be executed is, gets pushed onto the stack,
10:43
and then you can jump to a location
10:46
in memory, another location.
10:48
And when you execute the ret instruction, or the return instruction,
10:54
EIP is going to be repopulated
10:58
with the value that the call instruction had pushed onto the stack. And it is going to
11:05
resume execution in the place that you called from.
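The call and ret behaviour described above can be sketched with a tiny interpreter. Everything here is hypothetical: the "instructions" are Python tuples, addresses are list indexes, and only the return-address handling is modeled.

```python
def run(program, entry=0):
    """Execute a toy program; only call, ret, note and halt are modeled."""
    stack = []   # holds return addresses pushed by call
    trace = []   # records the order in which notes were reached
    eip = entry  # index of the instruction about to execute
    while True:
        op, arg = program[eip]
        if op == "call":
            stack.append(eip + 1)  # push the address of the next instruction
            eip = arg              # then jump to the function
        elif op == "ret":
            eip = stack.pop()      # reload EIP with the saved return address
        elif op == "note":
            trace.append(arg)
            eip += 1
        elif op == "halt":
            return trace


PROGRAM = [
    ("note", "before call"),      # 0
    ("call", 4),                  # 1: pushes 2, jumps to 4
    ("note", "after call"),       # 2: execution resumes here after ret
    ("halt", None),               # 3
    ("note", "inside function"),  # 4
    ("ret", None),                # 5: pops 2 back into EIP
]

print(run(PROGRAM))
```

The trace comes out as before, inside, after, showing that ret resumes right behind the call.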
11:09
Yes, it sounds confusing. Don't worry.
11:11
You're gonna see some assembly, and it's gonna make a bit more sense.
11:16
Stack grows downwards.
11:22
When you look at assembly and there are local variables,
11:26
and we'll see an example in a little bit,
11:28
you'll see them usually addressed from EBP
11:35
plus a certain value
11:37
or, more appropriately, EBP minus a certain value. So if I want to access
11:50
it'll be EBP minus four.
11:58
do a printf and pass it a single parameter,
12:01
"Hello, world", or a pointer to the string "Hello, world".
12:07
The function will reference that parameter
12:13
it accesses that pointer
12:20
I'm gonna let that sink in for just a minute.
12:22
So just a little miscellaneous information. There is something called a NOP instruction. That stands for "no operation", and it's, actually,
12:31
well, it is an alias to the exchange
12:35
EAX, EAX instruction. So it moves EAX into EAX, so it results in nothing.
12:43
The NOP instruction is very useful if you wanna manipulate malware. You can say, oh, I see it's doing a function call to check whether there's any debugger, and it dies if
12:54
there's a debugger. So to
12:56
NOP that out, you can just put NOP instructions in there, and that will result in the compiler,
13:05
or rather the CPU, just completely doing nothing for those instructions. I can show you an example of NOPing out some instructions.
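As an illustration of the idea, here is a minimal Python sketch of NOPing out bytes in a buffer. The byte sequence is an invented example (a call, a `test eax, eax`, and a two-byte `jnz`), and `nop_out` is a hypothetical helper; a real patch would be applied at an offset you found in a disassembler or debugger.

```python
NOP = 0x90  # single-byte x86 NOP opcode


def nop_out(code: bytes, offset: int, length: int) -> bytes:
    """Return a copy of `code` with `length` bytes at `offset` replaced by NOPs."""
    patched = bytearray(code)
    patched[offset:offset + length] = bytes([NOP]) * length
    return bytes(patched)


# Invented example bytes:
#   e8 10 00 00 00   call <debugger check>
#   85 c0            test eax, eax
#   75 05            jnz  <exit path>
code = bytes.fromhex("e8 10 00 00 00 85 c0 75 05")

# Overwrite the two-byte conditional jump so execution always falls through.
patched = nop_out(code, offset=7, length=2)
print(patched.hex())
```

Replacing an instruction with the same number of NOP bytes keeps every later offset unchanged, which is why this is a common way to neutralize a check.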
13:16
I showed you the flags register. Bit masking is good to know,
13:22
because it's basically doing a logical AND to isolate the piece of memory that it wants. So
13:26
here are some examples where we want to get to a certain bit or a certain byte
13:33
to see what the value is.
13:35
And so we do a logical AND,
13:37
the Boolean logical AND. And, uh,
13:41
if you are curious about how this works, if you look at enough assembly, you'll eventually come across it.
13:48
But it's not that big a deal. You should just be aware of it.
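For anyone who wants to see the masking idea in action, here is a short Python sketch. The helper names are made up, and the sample EFLAGS value 0x286 is just an illustrative number; the bit positions (ZF is bit 6, SF is bit 7) are the documented x86 ones.

```python
ZF_BIT = 6  # zero flag position in EFLAGS
SF_BIT = 7  # sign flag position in EFLAGS


def get_flag(eflags: int, bit: int) -> int:
    """Isolate one bit: AND with a mask that has only that bit set."""
    return 1 if eflags & (1 << bit) else 0


def get_byte(value: int, index: int) -> int:
    """Isolate one byte (index 0 is the least significant) with an AND mask."""
    mask = 0xFF << (8 * index)
    return (value & mask) >> (8 * index)


eflags = 0x286  # illustrative value: SF set, ZF clear
print(get_flag(eflags, ZF_BIT), get_flag(eflags, SF_BIT))
print(hex(get_byte(0x12345678, 0)), hex(get_byte(0x12345678, 1)))
```

The AND keeps only the bits where the mask has a one and zeroes everything else, which is all a bit mask really is.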
13:52
Endianness, as I pointed out earlier, is when bytes are swapped around when in storage.
13:58
When they're in registers, they look normal, where
14:01
the most significant value is on the left-hand side,
14:05
since we typically read from left to right in our culture.
14:11
You should know that, and we're gonna cover that in a little bit.
14:15
So we should also know,
14:16
uh, nomenclature such as,
14:22
or you should also know the nomenclature surrounding data types, such as word, DWORD, and QWORD. In academia, "word" means
14:39
the base unit of memory, in architecture terms. So
14:46
if I'm talking about
14:48
a 32-bit computer, a word is generally
14:58
When Microsoft was making the programming languages, they had to keep compatibility, and the original word for the original
15:11
made "word" synonymous with 16 bits.
15:15
So you'll often see, in Microsoft APIs and websites and code,
15:20
the term DWORD, which is double word, which is pretty much synonymous with
15:26
32 bits, or four bytes.
15:28
QWORD is quad word, so it's double that: 64 bits, or eight bytes.
15:35
This is the difference between, like,
15:41
industry standard and, uh, academia.
15:45
Just be aware that when you see word or DWORD
15:48
or QWORD, it means 16, 32, and 64 bits,
15:56
unless you're reading a textbook.
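A quick way to keep these sizes straight is to ask Python's `struct` module, whose fixed-size format codes line up with the Windows names. The mapping of `H`, `I`, and `Q` to WORD, DWORD, and QWORD is my own labeling for illustration.

```python
import struct

# Standard-size (little-endian) unsigned formats and the Windows names
# they correspond to in this sketch.
SIZES = {
    "WORD": struct.calcsize("<H"),
    "DWORD": struct.calcsize("<I"),
    "QWORD": struct.calcsize("<Q"),
}

for name, size in SIZES.items():
    print(f"{name}: {size} bytes, {size * 8} bits")
```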
15:58
Ones' complement is something you should be familiar with. It basically means you just flip all the bits. So if it was 0010,
16:06
and you flip all of them, it will be 1101.
16:11
And you might go, OK, that's very simple. Two's complement
16:15
is where you flip all the bits and then add one.
16:26
and you flip all the bits, it would be 110,
16:29
and if you add one to that, it would be 111.
16:33
So I'm talking about
16:34
all this in terms of binary, and you might think, OK, that's kind of weird. Why would you ever use that or need that? It's used for negative numbers.
16:47
that if you store negative numbers as two's complement,
16:53
you can use the same circuitry
16:59
for addition and subtraction and other operations
17:03
if a negative number is in two's complement. That's pretty nifty. And it's a shortcut that the hardware designers took to make computers really fast and not have to have extra circuitry for both negative and positive numbers.
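The bit flipping is easy to try in Python. This is a small sketch with made-up helper names, working on a fixed bit width so the results match the examples in the narration.

```python
def ones_complement(value: int, bits: int) -> int:
    """Flip every bit within a fixed width."""
    return ~value & ((1 << bits) - 1)


def twos_complement(value: int, bits: int) -> int:
    """Flip every bit, then add one (wrapping at the fixed width)."""
    return (ones_complement(value, bits) + 1) & ((1 << bits) - 1)


def to_signed(raw: int, bits: int) -> int:
    """Interpret a raw bit pattern as a two's complement signed number."""
    return raw - (1 << bits) if raw >> (bits - 1) else raw


print(format(ones_complement(0b0010, 4), "04b"))  # flip 0010
print(format(twos_complement(0b001, 3), "03b"))   # flip 001, then add one
print(to_signed(0b111, 3))                        # 111 read as a signed value

# Same adder for subtraction: 5 - 3 is just 5 + (two's complement of 3).
print((5 + twos_complement(3, 8)) & 0xFF)
```

The last line is the point made above: once negative numbers are stored this way, plain addition handles subtraction too.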
17:18
You'll probably never see it, though, but it is something to be aware of
17:25
Oh, you know, the output of this
17:32
and then you see, like,
17:34
the output from a print statement,
17:37
negative zero. That's just like, how do you have a negative zero?
17:41
It's like, well, technically, it's possible to flip the sign bit
17:48
and not have one added to it. So it's negative zero, and then
17:53
negative one would be one
17:59
I mentioned earlier that endianness is important, and
18:03
it's kind of strange and rather confusing. And Intel was really the only one that does it that I'm aware of.
18:10
It means that it swaps the bytes.
18:15
The reference is kind of from Gulliver's Travels:
18:19
he came across the land of small people who were fighting viciously over which end to crack their egg.
18:30
And little endian, it does exactly what you think, where the value is
18:38
and it's stored that way,
18:41
stored little end first. So little endian is
18:48
the strange one, in that
18:49
the least significant byte is
18:56
in the lowest address.
19:00
So an example is 12345678.
19:06
The bytes are swapped, so it's 78 56 34 12.
19:11
Intel is little endian, and it's, like,
19:15
I think the only one that I've ever seen that's little endian. Maybe ARM is, but I don't think so.
19:22
Um, big endian is exactly
19:26
what you would think it is. So if the value of a
19:29
number or a spot in memory is 12345678,
19:36
it's stored like that. So this is how network traffic is sent across: whatever the device, it's sent as big endian. Like, if that's
19:48
then that's how it's stored in memory.
19:49
And Intel does this because
19:53
at some point, they found it to be more efficient,
19:57
and they could do operations faster.
20:00
So a visual representation of this
20:03
would be something like
20:06
little endian's on the left, big endian's on the right.
20:11
If you're still a little confused, I'm gonna do an example
20:18
so we can take this number.
20:23
So if we take this number,
20:26
I'm gonna split it up into two 8-byte segments.
20:40
and then the second value would be 0x, for hexadecimal,
20:52
the bytes. So 0x
21:02
and then the other value would be 0x
21:15
and then we combine them back again.
21:26
would be, in an Intel little-endian system,
21:30
stored as this value.
21:37
I'd suggest just practicing it once or twice.
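One easy way to practice is to let Python do the byte swapping and compare it with what you worked out by hand. This sketch uses the 12345678 value from earlier in the video as a 32-bit hexadecimal number; the variable names are my own.

```python
import struct

value = 0x12345678

little = struct.pack("<I", value)  # little endian: least significant byte first
big = struct.pack(">I", value)     # big endian (network order): as written

print("little endian in memory:", little.hex())
print("big endian in memory:   ", big.hex())

# Reading the little-endian bytes back with the right byte order
# recovers the original number.
print(hex(int.from_bytes(little, "little")))
```

Reading little-endian bytes as if they were big endian is a common slip when eyeballing a hex dump, so it is worth checking a few values this way.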
21:45
So, just some notes for the paranoid.
21:48
As I said earlier, disassemblers can simply be wrong.
21:56
It's an unsolvable problem: without actually executing the code, it's
22:00
impossible to know exactly what the instructions will do. Sometimes malware will do things that will break disassemblers or trick them.
22:08
One good trick that I've seen is switching from x86 assembly code to x64 code, which you can only do on a 64-bit system.
22:19
No disassembler or debugger right now can handle it.
22:25
They just get the disassembly wrong because they make certain assumptions about the code they're processing,
22:32
and also jumping into the middle of other instructions. So
22:40
they very purposefully take advantage of the assumptions made by disassemblers.
22:48
A disassembler will disassemble some code so that it looks like a bunch of move instructions,
22:53
but what actually happens is that a jump instruction jumps into the middle of one of those instructions
23:00
and is actually executing something else.
23:06
So when you statically disassemble something,
23:08
you can't make an assumption
23:10
that you're reading what is gonna be executed. Malware will very frequently
23:17
change its own code,
23:18
like, as it's executing, it will change the code ahead of it
23:23
to jump somewhere else or to decrypt something and then jump into that.
23:29
And we can see an example of that here
23:32
in the next few videos.
23:33
And it's pretty interesting because we might have to dump
23:37
it out of memory and then use a combination of static and dynamic analysis to figure out what's going on.
23:42
Some malware will statically compile
23:45
the libraries it's using into itself. Instead of relying on the printf function
23:52
being in a library that we can access,
23:56
malware authors will include the whole C,
24:00
the C standard library. Like, it will include the entire stdio
24:14
five-instruction or ten-instruction
24:18
analysis just turned into,
24:21
you know, millions of instructions.
24:22
IDA Pro and others will try to identify
24:27
frequently used libraries and then try to label them and just say, oh, this is string length, or oh, this is
24:37
but it doesn't always catch it.
24:40
There's a lot of malware out there that will have what is called junk code, which functionally does nothing.
24:47
One of the first pieces of malware that I was looking at for a good, long while,
24:52
I just spent a week on.
24:55
It had a lot of junk code, where it would make function calls like GetSystemTime or get system information and then jump into functions that just had a lot of move instructions and a lot of jump instructions. And they didn't really do anything.
25:10
But buried somewhere deep in one of the calls that it made,
25:17
it would unfold, decrypt some other code of the payload,
25:22
and use another function call to jump into that.
25:26
That's not uncommon,
25:27
but once you look at it for a while, you start to see some of the patterns that a lot of,
25:37
those kinds of tools will produce. Like,
25:42
one thing would go into an executable and just kind of insert random instructions like
25:52
mov eax, eax, or exchange EDX
25:57
and EAX, and then exchange them back a few instructions later. So it's code that guarantees that the function of the program doesn't change. And you can see that it's pretty easy to associate meaningless instructions with certain
26:15
programs, like packers or crypters.
26:19
When we're talking about push instructions loading up parameters, keep in mind that the compiler's the one generating these instructions.
26:36
Compilers don't necessarily have to follow those conventions,
26:38
but it does when it has to interface with APIs.
26:41
So internally it can load up,
26:45
like we saw with 64-bit code, it could load up the parameters in registers.
26:49
But when it comes to actually calling system libraries,
26:55
it has to have the stack in a good
26:57
situation, has to have it in a good state, if you want that API to work. And we will take a look at some malware which
27:06
corrupts its own stack,
27:10
and it's malware that we've already seen before in prior videos: that little IRC bot malware.
27:17
So we'll take a look at that, and it's
27:19
gonna be interesting. And there's many, many ways to do this stuff. Luckily, most malware does not try to be
27:29
Most malware authors aren't very sophisticated,
27:33
and most malware authors, frankly, just don't care.
27:36
If they develop a new
27:38
anti-analysis technique or an anti-debugging technique, or
27:42
they spend a week developing a new way to push and pop parameters on and off the stack, you know, that maybe,
27:52
you know, is good. It might be better
27:56
in defensive measures, but it would take them a week or two, and it would just take you, like,
28:03
15 minutes to figure out. So it's really not a great time tradeoff for the malware author, especially if their malware has already been analyzed. That means,
28:14
you know, their operation is probably already blown.
28:21
Depending on their motives, it's usually not
28:27
depending on the malware author's motives and depending on the purpose of the malware, it's usually not advantageous for malware authors to put a whole lot of defenses into their malware.
28:40
So, just to recap, and a list of good resources that I suggest you check out.
28:47
We went over the goals of static analysis. We want to understand what's going on
28:52
underneath the hood, and we want to find out more information, get IOCs, confirm dynamic analysis, and really gauge sophistication and maybe eventually attribution of
29:03
the malware and the intent of the authors.
29:07
We went over a lot of technical details in x86 assembly. I highly suggest you check out how different control flow structures and different data structures compile down and what the resulting assembly is, like how a switch statement compiles down or how nested
29:26
if statements compile down,
29:33
and what methodology the compiler is using for optimization.
29:38
And there's a lot of different ideas about how to do this. And if you are interested in that,
29:44
I would suggest
29:45
taking a look at The Art of Assembly.
29:48
That is a pretty good book, and it's relatively
29:52
neutral in terms of
29:55
looking at x86 and how to do things, and in terms of looking at ARM and how to do something. Also,
30:03
Reversing: Secrets of Reverse Engineering.
30:06
That is a fantastic book. Some people think it's dry and boring. I think it's fantastic for what you want to do.
30:14
And third, The IDA Pro Book.
30:18
Let's see, it's the unofficial IDA Pro book. But it's the only IDA Pro book, and it's pretty good,
30:25
especially if you're just beginning.
30:26
And I would also suggest checking out websites that host what are called crackmes.
30:33
And they're meant for reverse engineers and crackers to figure out
30:41
key generation algorithms or how to manipulate a program into getting the information that you want
30:48
puts on a fantastic intro to x86
30:52
at their OpenSecurityTraining.info website, and they have all the materials up there: the slides, challenges,
31:02
videos up on YouTube,
31:04
posted as a YouTube playlist. I also suggest checking out
31:10
x86 assembly language and the x86 calling conventions. Calling conventions are something we didn't quite cover,
31:18
but we will go over in the future,
31:30
they enter a function call.
31:34
Sometimes the function,
31:37
the callee, will clean up the stack. It says, OK, I'm gonna make
31:41
500 bytes for my own local variables,
31:47
and then at the end, it cleans up the 500 bytes off the stack.
31:51
And then sometimes, in other calling conventions,
31:53
like cdecl,
31:56
the caller cleans up the stack. It's like, OK, this function is going to need 500 bytes for its parameters. I'm gonna give it 500 bytes on the stack and call the function. After it gets back,
32:08
I'll clean up the 500 bytes. I'll clean up the stack.
32:12
So that's that's just what that is. Thank you for watching.
32:16
I highly suggest you go explore on your own if you're interested in this stuff. I'd suggest checking out malwarestats.org, where you can see how many malware samples they have doing anti-VM or anti-disassembly techniques.
32:30
And you can see what function calls they're using. I highly suggest you also check out Corkami.
32:38
All his information: he did a great job breaking down the PE
32:46
and displaying it in a very understandable way. And he
32:52
takes a lot of the values in the PE file format, and he
32:57
messes with them, and he says, OK, this one does this, this one does this, this one doesn't do anything, even though Microsoft says it does.
33:04
And he shows how much you can mess with
33:08
a Portable Executable file format before
33:13
the operating system refuses to try to execute it. And malware will use a lot of those same types of tricks,
33:22
um, automated analysis systems,
33:25
and we'll take a look at that stuff in the future as well.
33:29
Thank you for watching. And I hope you
33:32
follow along to the next video.