BURNEYE preliminary documentation


Description of how per-function encryption works in burneye
===========================================================

If the binary that is wrapped by burneye contains symbolic debug information
(usually unstripped binaries compiled with the -g option), we apply a special
encryption method.

From the debug info, three pieces of information are extracted for each
function within the binary: the virtual address at which the function begins,
the length in bytes of the function's machine instructions, and its name.
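
As a rough illustration, the per-function record could be modelled as below.
The class name, field names and example values are hypothetical and are not
taken from the burneye source; only the three pieces of information themselves
come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuncInfo:
    """Hypothetical record for one function found in the debug info."""
    vaddr: int   # virtual address at which the function begins
    length: int  # length of its machine instructions in bytes
    name: str    # symbol name

    def contains(self, addr: int) -> bool:
        """True if addr lies inside this function's machine code."""
        return self.vaddr <= addr < self.vaddr + self.length


# Made-up example values, for illustration only.
funcs = [
    FuncInfo(vaddr=0x08048400, length=0x35, name="main"),
    FuncInfo(vaddr=0x08048440, length=0x20, name="helper"),
]
```

A 'contains' helper of this kind is one plausible way to map an arbitrary
address, such as a return address, back to the function it belongs to.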

Each function is then encrypted. Since it is very difficult to move the
functions, we cannot prepend a decryption stub to them. Instead, we encrypt
the function, then overwrite its first eight bytes with a special sequence of
machine instructions:

	push	eax		; 0x50
	pusha			; 0x60
	pushf			; 0x9c
	call	cg_entry	; 0xe8 0x00 0x00 0x00 0x00

The remaining encrypted function data follows the call instruction. The first
eight bytes, which we overwrite with our stub, are saved elsewhere (see
below). The eight byte stub is stored directly within the binary. Throughout
runtime the function can only be in one of two states: encrypted or decrypted.
If it is encrypted, its first eight bytes always contain the stub bytes. If it
is decrypted, the real original bytes are at that place.
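
The stub layout can be sketched in a few lines of Python. This is only an
illustration: the function names 'build_stub' and 'install_stub' are
hypothetical, and the sketch assumes the four zero bytes after 0xe8 are a
placeholder that is filled with the usual x86 32-bit displacement, which is
relative to the instruction following the call (here func + 8).

```python
import struct

STUB_LEN = 8


def build_stub(func_addr: int, cg_entry_addr: int) -> bytes:
    """Return the eight stub bytes for a function located at func_addr."""
    # The call operand is relative to the instruction after the call,
    # which is func_addr + 8. Mask to 32 bits so backward calls wrap.
    rel32 = (cg_entry_addr - (func_addr + STUB_LEN)) & 0xFFFFFFFF
    # push eax, pusha, pushf, call rel32
    return struct.pack("<BBBBI", 0x50, 0x60, 0x9C, 0xE8, rel32)


def install_stub(image: bytearray, offset: int, stub: bytes) -> bytes:
    """Overwrite eight bytes at offset with the stub, return the old bytes."""
    saved = bytes(image[offset:offset + STUB_LEN])
    image[offset:offset + STUB_LEN] = stub
    return saved
```

The bytes returned by 'install_stub' correspond to the eight bytes that are
saved elsewhere and restored again when the function is decrypted.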

Now for the interesting part. When the stub calls the 'cg_entry' entry point,
the stack looks like this:

	[(possible) function arguments]
	[return address from func caller]
	[eax]
	[pusha register block]
	[saved flag register]
	[cg_entry caller return address (func + 8)]

The code at 'cg_entry' pops the last address from the stack and computes the
original function address from it (by subtracting eight). Using this address,
it calls a search function which searches through a table of structures, one
for each encrypted function. Once it finds the appropriate structure, it
restores the first eight bytes of the function with the saved encrypted bytes,
then calls a decryption function on the entire function's data. Normally it
could then just restore the flags and all registers and jump to the function's
entry point. This would work perfectly, but the more functions are called in
the application, the more of it is decrypted. Once every function has been
called at least once, the entire .text segment is decrypted and can be dumped.
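
The lookup step can be sketched as follows. The table layout, the key names
and the linear search are assumptions made for illustration; the text only
states that there is one structure per encrypted function and that the
function address is the return address minus eight.

```python
STUB_LEN = 8


def find_entry(table, ret_addr):
    """Map the return address pushed by the stub's call to a table entry."""
    func_addr = ret_addr - STUB_LEN  # the call is the last stub instruction
    for entry in table:
        if entry["vaddr"] == func_addr:
            return entry
    return None


# Hypothetical table: one entry per encrypted function.
table = [
    {"vaddr": 0x08048400, "length": 0x35, "saved": bytes(8)},
    {"vaddr": 0x08048440, "length": 0x20, "saved": bytes(8)},
]
```

For example, 'find_entry(table, 0x08048448)' selects the second entry, since
0x08048448 - 8 is the address at which that function begins.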

To avoid this 'lazy-decryption' problem, the 'cg_entry' code also replaces the
return address of the function that is decrypted. Thus, when the now-decrypted
function returns through a simple 'ret' instruction, our code is called
again. The diagram shows how this works:

usual:  [outside function]---[core function]---[called function]

cg:     [outside function]. .[core function]. .[called function]
                          | |               | |
                .---------' '--------. .----' '------------.
                |                    | |                   |
                '-----[cg_entry]-----' '-----[cg_detry]----'

This way both the entry and return ('detry') points of the function are
redirected. As a clever reader you may have noticed that the parent function
remains decrypted in this setup. Therefore the 'cg_entry' code also re-encrypts
the calling (outside) function.

In detail, the stub and the 'cg_entry' code do the following:

	1. save necessary data (flags, registers)
	2. encrypt outside function
	3. restore first eight encrypted bytes of core function
	4. decrypt core function
	5. restore necessary data (flags, registers)
	6. pass control to entry point of core function (through jmp)


The 'cg_detry' code has to mirror the behaviour from the opposite perspective:

	1. save necessary data (flags, registers)
	2. encrypt core function
	3. overwrite first eight bytes in core function with stub
	4. decrypt outside function
	5. restore necessary data (flags, registers)
	6. return to real core function return address
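
The two step lists above can be simulated on plain byte buffers. The sketch
below is a toy model, not burneye's implementation: the XOR "cipher", the
'Func' class and all names other than 'cg_entry' and 'cg_detry' are
placeholders, and saving registers and patching return addresses is left out.
It only shows how the encrypted and decrypted states alternate.

```python
STUB = bytes([0x50, 0x60, 0x9C, 0xE8, 0x00, 0x00, 0x00, 0x00])
KEY = 0x5A  # placeholder; the real cipher is not described here


def xor(data):
    """Stand-in cipher: XOR with a fixed byte, so it is its own inverse."""
    return bytes(b ^ KEY for b in data)


class Func:
    def __init__(self, name, code):
        assert len(code) >= len(STUB)
        self.name = name
        self.plain = bytes(code)
        encrypted = xor(self.plain)
        self.saved = encrypted[:8]                  # kept in the table
        self.mem = bytearray(STUB + encrypted[8:])  # initial encrypted state

    def decrypt(self):
        self.mem[:8] = self.saved    # restore first eight encrypted bytes
        self.mem[:] = xor(self.mem)  # decrypt the whole function

    def encrypt(self):
        self.mem[:] = xor(self.mem)  # encrypt the whole function
        self.mem[:8] = STUB          # overwrite first eight bytes with stub

    def is_encrypted(self):
        return bytes(self.mem[:8]) == STUB


def cg_entry(outside, core):
    outside.encrypt()   # step 2 of the entry list
    core.decrypt()      # steps 3 and 4 of the entry list


def cg_detry(outside, core):
    core.encrypt()      # steps 2 and 3 of the detry list
    outside.decrypt()   # step 4 of the detry list


outside = Func("outside", range(32))
core = Func("core", range(100, 124))
outside.decrypt()           # the outside function is currently running
cg_entry(outside, core)     # call into core
state_during_call = (outside.is_encrypted(), core.is_encrypted())
cg_detry(outside, core)     # core returns
state_after_return = (outside.is_encrypted(), core.is_encrypted())
```

In this model only the function that is currently executing is ever in the
decrypted state, which is the property the entry/detry pair is meant to
provide.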


Possible runtime problems
-------------------------

This function wrapping method is quite reliable and can cope with various
situations, such as gotos, signal handlers, function pointers and generally
non-linear code. However, there is one case in which it does not work.

If an execution path leads from within one function into the middle of
another, encrypted function, it bypasses the stub at the target's entry point
and the target function is not decrypted. This sounds complicated, so here is a
rule of thumb: Do NOT use longjmp/setjmp within your code.

If you have to use it, or you have to protect a stock binary with symbols, you
can tag the function that the 'longjmp' passes control to (i.e. the function
that contains the 'setjmp' call) with a decrypt-lock. This means it is
decrypted once, before your binary receives any control at all, and remains
decrypted throughout the whole runtime.

Note that this way the function that receives the 'longjmp' remains unprotected
for the whole time the binary runs. Hence, use this only if there is no way
to replace the 'longjmp'. In most cases there is a way.

Also, you can use the decrypt-lock functionality for performance improvements,
see below.
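
The effect of a decrypt-lock can be illustrated with a toy model. Everything
here is an assumption made for the sake of the example: the XOR cipher is a
placeholder, the function names and contents are made up, and how a function
is actually tagged is not shown. The sketch compares the byte that a
'longjmp' would resume at, with and without the lock.

```python
STUB = bytes([0x50, 0x60, 0x9C, 0xE8, 0x00, 0x00, 0x00, 0x00])
KEY = 0x5A  # placeholder cipher key, not burneye's real cipher


def xor(data):
    return bytes(b ^ KEY for b in data)


def wrap(plain):
    """Encrypted in-memory form: stub followed by the encrypted remainder."""
    return bytearray(STUB + xor(plain)[8:])


def startup(functions, locked):
    """Wrap all functions, then decrypt the locked ones once, for good."""
    memory = {name: wrap(plain) for name, plain in functions.items()}
    for name in locked:
        mem = memory[name]
        mem[:8] = xor(functions[name])[:8]  # restore saved encrypted bytes
        mem[:] = xor(mem)                   # decrypt once, never re-encrypt
    return memory


functions = {
    "main": bytes(range(40)),
    "handler": bytes(range(50, 90)),  # imagine this one calls setjmp
}

unlocked = startup(functions, locked=set())
locked = startup(functions, locked={"handler"})

resume = 20  # offset in 'handler' where a longjmp would resume execution
plain_byte = functions["handler"][resume]
hits_code_unlocked = unlocked["handler"][resume] == plain_byte
hits_code_locked = locked["handler"][resume] == plain_byte
```

Without the lock the resume point holds encrypted data, so execution would
continue in garbage. With the lock it holds the original instruction bytes,
at the price of leaving 'handler' readable for the whole run.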


Performance overhead
--------------------

TODO

Performance is an issue with this protection. However, I have not yet gathered
concrete statistics about it. For most I/O based programs, the protected binary
may execute around 10 to 15 times as many instructions at runtime as the
unprotected version. With some per-function optimizations, such as leaving the
most frequently called functions unprotected at runtime, this may drop to a
factor of five or less. If only a few individual core functions are protected,
there may be no overhead at all.
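
A back-of-the-envelope model of the cipher work per call follows directly from
the 'cg_entry' and 'cg_detry' step lists. It is not a measurement: the
function name and parameters are hypothetical, and the model ignores the table
search, the register save/restore and any cache effects.

```python
def cipher_bytes_per_call(outside_len, core_len, core_wrapped=True):
    """Bytes run through the cipher for one call and its matching return."""
    if not core_wrapped:
        return 0  # no stub at the entry point, so cg_entry never runs
    entry = outside_len + core_len  # encrypt outside, decrypt core
    detry = core_len + outside_len  # encrypt core, decrypt outside
    return entry + detry
```

Under these assumptions a 200 byte function calling a 50 byte function pushes
2 * (200 + 50) = 500 bytes through the cipher per call, which suggests why
leaving small, frequently called functions unprotected helps the most.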

However, to make a real decision, statistics are required once the encrypter is
finished. TODO


--
vi:fo=tcrq:tw=79