Suppose you have the following sequence of bytes, and you would like to know whether it is executable code or data:
OpCodes = "\x48\x65\x6C\x6C\x6F"
Generally speaking, there is no guaranteed way to differentiate code from data, but you can apply certain heuristics to estimate which one you are looking at. For example, if you compare the above bytes against the ASCII table, you will find that:
\x48 = H \x65 = e \x6C = l \x6F = o
In short, the bytes decode to “Hello”.
The following is a list of methods you can apply to differentiate code from data. Applying all of them together should increase your detection accuracy.
Method 1: Do all of the bytes decode to printable ASCII characters? If so, they are most likely data.
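A minimal sketch of Method 1 in Python follows. The function names and the threshold parameter are hypothetical, chosen only for illustration. The check counts how many bytes fall in the printable ASCII range and treats the buffer as text when all of them do.

```python
import string

# Printable ASCII (letters, digits, punctuation, whitespace) as byte values.
PRINTABLE = frozenset(string.printable.encode("ascii"))


def printable_ratio(data: bytes) -> float:
    """Return the fraction of bytes that are printable ASCII (0.0 if empty)."""
    if not data:
        return 0.0
    return sum(b in PRINTABLE for b in data) / len(data)


def looks_like_text(data: bytes, threshold: float = 1.0) -> bool:
    """Heuristic only: treat the buffer as text when enough bytes are printable."""
    return bool(data) and printable_ratio(data) >= threshold


op_codes = b"\x48\x65\x6C\x6C\x6F"
print(op_codes.decode("ascii"), looks_like_text(op_codes))  # prints: Hello True
```

Lowering the threshold (say, to 0.9) tolerates a few stray bytes such as a NUL terminator, at the cost of more false positives. Many valid x86 instructions are themselves printable bytes, so a positive result here is a hint rather than proof.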
Method 2: What do the starting bytes disassemble to? If they decode to instructions commonly found in executables (a function prologue, for example), the bytes may be code.
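As a rough illustration of Method 2, the sketch below skips the disassembler and simply compares the first bytes against a few well-known x86 function prologues. The table is a hypothetical, deliberately short sample and not an exhaustive list. A real tool would disassemble the buffer instead.

```python
# Hypothetical, non-exhaustive table of byte sequences that often open
# an x86 function. Extend it for the compilers and architectures you expect.
COMMON_PROLOGUES = {
    b"\x55\x89\xe5": "push ebp; mov ebp, esp (32-bit)",
    b"\x55\x8b\xec": "push ebp; mov ebp, esp (32-bit, alternate encoding)",
    b"\x55\x48\x89\xe5": "push rbp; mov rbp, rsp (64-bit)",
}


def match_prologue(data: bytes):
    """Return a description of the prologue the buffer starts with, or None."""
    for pattern, description in COMMON_PROLOGUES.items():
        if data.startswith(pattern):
            return description
    return None


print(match_prologue(b"\x55\x89\xe5\x83\xec\x10\xc9\xc3"))
print(match_prologue(b"\x48\x65\x6C\x6C\x6F"))
```

A miss does not mean the bytes are data: shellcode and hand-written assembly frequently start with something else entirely.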
Method 3: What are the ending bytes? Look for bytes that help the program return or terminate properly. For example, if the bytes end with one of the following, they may be code:
\xc9 (leave) or \xc3 (ret) or \xcd\x21 (int 0x21) or \xf4 (hlt), etc.
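The ending-byte check can be sketched in a few lines of Python. The table holds only the four sequences listed above, annotated with their usual x86 mnemonics. The function name is hypothetical.

```python
# The ending sequences named in the text, with their usual x86 meaning.
TERMINATORS = {
    b"\xc9": "leave",
    b"\xc3": "ret",
    b"\xcd\x21": "int 0x21",
    b"\xf4": "hlt",
}


def match_terminator(data: bytes):
    """Return the mnemonic of the terminating sequence the buffer ends with, or None."""
    for pattern, mnemonic in TERMINATORS.items():
        if data.endswith(pattern):
            return mnemonic
    return None


print(match_terminator(b"\x55\x89\xe5\xc9\xc3"))   # ends with ret
print(match_terminator(b"\x48\x65\x6C\x6C\x6F"))   # "Hello" ends with 0x6F
```

Because a single trailing byte can match by coincidence, this result is best treated as one vote among the other methods rather than a verdict on its own.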
Method 4: Do the bytes contain a compare followed by a jump? Suppose you disassemble the bytes, walk through each instruction, and come across the following:
cmp eax, ebx; jmp 0x0011223344
This also raises the likelihood that the bytes are code.
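One way to approximate the compare-and-jump check without a disassembler is to scan the raw bytes for a register-to-register cmp that is immediately followed by a jump opcode. The sketch below assumes 32-bit x86 encodings and covers only the two-byte cmp forms (opcodes 0x38 to 0x3B with a register ModRM byte). The function names are hypothetical, and because the scan ignores instruction boundaries it can report false positives.

```python
# cmp r/m, r and cmp r, r/m (8-bit and 32-bit operand forms).
CMP_REG_REG_OPCODES = {0x38, 0x39, 0x3A, 0x3B}


def is_jump_at(data: bytes, i: int) -> bool:
    """True if the byte(s) at offset i look like the start of an x86 jump."""
    if i >= len(data):
        return False
    b = data[i]
    if b in (0xEB, 0xE9):      # jmp rel8 / jmp rel32
        return True
    if 0x70 <= b <= 0x7F:      # conditional jump, rel8
        return True
    # Two-byte conditional jump: 0F 80..8F followed by rel32.
    return b == 0x0F and i + 1 < len(data) and 0x80 <= data[i + 1] <= 0x8F


def find_cmp_jump(data: bytes):
    """Return offsets where a register-register cmp is directly followed by a jump."""
    hits = []
    for i in range(len(data) - 2):
        opcode, modrm = data[i], data[i + 1]
        # ModRM with mod == 0b11 means both operands are registers.
        if opcode in CMP_REG_REG_OPCODES and modrm >> 6 == 0b11 and is_jump_at(data, i + 2):
            hits.append(i)
    return hits


# 39 D8 = cmp eax, ebx; E9 + 4 bytes = jmp rel32
print(find_cmp_jump(bytes.fromhex("39d8e944332211")))
```

In compiled code a compare is usually paired with a conditional jump such as jne or jl, which is why the sketch accepts those opcodes as well as the unconditional jmp.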
Method 5: Do the bytes contain a memory address? If you come across an instruction such as
jmp 0x0011223344
the bytes are most likely code.
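A disassembler shows the resolved target of a near jump, but the bytes themselves store a signed offset relative to the next instruction. The sketch below is one possible way to recover such targets from raw bytes. It assumes 32-bit x86, looks only at the E9 (jmp rel32) opcode, and takes a hypothetical load address `base` that you would have to supply yourself. Like any raw scan, it can match bytes in the middle of another instruction.

```python
import struct


def near_jump_targets(data: bytes, base: int = 0):
    """Return (offset, target) pairs for every E9 rel32 pattern in the buffer."""
    results = []
    for i in range(len(data) - 4):
        if data[i] == 0xE9:
            # rel32 is a signed little-endian offset from the next instruction.
            (rel32,) = struct.unpack_from("<i", data, i + 1)
            target = (base + i + 5 + rel32) & 0xFFFFFFFF
            results.append((i, target))
    return results


def targets_inside(data: bytes, base: int = 0):
    """Keep only the jumps whose target lands inside the buffer itself."""
    end = base + len(data)
    return [(off, t) for off, t in near_jump_targets(data, base) if base <= t < end]


code = b"\x90\xe9\xfa\xff\xff\xff"   # nop; jmp back to the nop
for offset, target in near_jump_targets(code, base=0x401000):
    print(hex(offset), hex(target))
```

Targets that land inside the buffer, or inside a range you know to be mapped, make the code interpretation more plausible. Targets scattered across the whole address space suggest the E9 byte was just data.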