1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
|
.TH objobf 1 "16 Apr 2003" "TEAM TESO"
.nh
.SH NAME
objobf \- burneye2 x86/Linux ELF relocateable object obfuscator
version 0.4.13.
.SH SYNOPSIS
.B objobf [options] inputfile
.SH DESCRIPTION
This page documents the
.B burneye2
object file obfuscator utility,
.I objobf.
It reads one ELF relocateable object file and
produces a functional equivalent output file, which is an obfuscated version
of the input file. To do this,
.I objobf
breaks up all functions in the file to the basic block level. This
representation is used to mutate the code while keeping it semantically
equivalent. This involves dataflow analysis and basic block transformations.
Afterwards, the basic block representation as control flow graph is linearized
into a new object file, which is created from scratch.
.SH BETA SOFTWARE
Please note that the software is in an early state of development, and does
not contain some of the features mentioned. Right now it only has the ability
to insert junk instructions, split and reschedule the order of basic blocks
among multiple functions ("entangling") and basic block duplication with
conditional codepathes between the copies. It does not yet provide substantial
protection against automatic analysis. Currently, it is meant as semi-public
beta test software. Please report bugs back (see the
.I BUG REPORTS
section).
.SH OPTIONS - GENERAL
.TP
.I "-o file"
Write the output to
.I file
rather than the default output file "output.o"
.TP
.I "-v num"
Set the verbosity level for both the frontend (analyzation phase) and the
backend (obfuscation and linearization). Values between zero (0) and three (3)
are allowed, where 0 means low and 3 high verbosity. Default is zero (low).
.TP
.I "-r file"
Read the random data required from
.I file.
This allows for easier reproduction of bugs, as otherwise the random data is
always pulled dynamically from /dev/urandom. If you can reproduce a bug on a
small object, use a constant
.I file
as randomization input.
.SH OPTIONS - GRAPHING
.TP
.I -n
Do a
.I natural loop analysis
on the control flow graph. This is only useful in conjunction with either the
.I -c
or
.I -l
output option. If this option is enabled, for every basic block in the CFG a
dominator set is build. Upon this set, a loop detection analysis is
performed. Only reducible flow graphs can be analyzed this way. This usually
includes all compiler generated code, though ;). The header block is marked in
aquamarine (light blue-green).
For a more detailed explanation of the loop detection, see below at
.I LOOP DETECTION.
Please note, that if you use this option, you cannot use
.I xvcg
anymore to display the graphs, as boxed subgraphs are used to display the loop
nesting. The
.I aiSee
program supports a superset of the VCG graph language which can handle this.
Get a free version of aiSee at http://www.absint.com/aisee/download/. You
might also consider symlinking
.I aisee.bin
to
.I xvcg
so that the graphs are displayed from
.I objobf.
.TP
.I -N
Do a
.I natural loop analysis
just as with the
.I -n
option, but using the heuristics recommended in the Dragon book. There, the
unification of multiple possible loops is prefered, whereas the method used
with
.I -n
prefers the nesting of those loops. Read below at
.I LOOP DETECTION.
.TP
.I "-l func"
Dump a register usage analysis for function
.I func.
More specifically, a live register dataflow analysis for the function is done
and written as VCG graph to the file "output.vcg". If the
.I VCG
program is installed as
.I xvcg,
an output file "output.ps" is created and displayed using
.I gv.
.TP
.I "-c func"
A simple control flow graph (CFG) of the function
.I func
is written as VCG file to "output.vcg". As with the
.I -l
option,
.I xvcg
is ran and the output is displayed with
.I gv.
.TP
.I "-d func"
Do a dominator tree analysis on the CFG of function
.I func
and dump the graph as VCG file to "output.vcg".
.I xvcg
and
.I gv
are ran as usual. A basic block B1 is dominated by another basic block B2, if
all codepathes to B1 go through B2. The dominator tree is visual candy, as
all blocks above a certain block B1 are executed before B1 is. The dominator
tree itself is not very useful to human analysis, but substantial for loop
detection.
.TP
.I -u
Do a register usage profile. Every instruction in the whole object is examined
for the CPU registers it uses. The final profile is dumped to stdout. Can be
useful to quickly see how optimized an object file is (equal distribution =
optimized, eax/edx prefered = unoptimized).
.TP
.I -i
Do an instruction usage profile. As with
.I -u,
every instruction in the object is processed. The main opcode index counted
(for example there is only one opcode index for all variants of mov). The
final profile of the object is printed to the standard output.
.TP
.I -I
Just like
.I -i,
but the output is given in a machine parseable format.
.SH OPTIONS - OBFUSCATION
.TP
.I -A
Activate all obfuscation options with reasonable values. This is the option to
use if you are unsure about each single option but still want a good level of
obfuscation.
The rest of the obfuscation options are to set individual parameters. If you
do not know exactly what you are doing, do not expect a higher level of
obfuscation by changing individual options.
.TP
.I -e
Entangle basic blocks of all functions in the object. That is, the basic block
order is randomized globally while the control flow is properly maintained.
However, there are no opaque predicates between basic blocks right now, and
the entanglement is easily seperateable (but it does fsck with IDA very well
already ;)
.TP
.I -j factor
Insert junk instructions to stretch the object files' total instruction count
by
.I factor.
The recommended value is between 3.0 and 10.0.
.TP
.I -m
Mark all inserted junk instructions by prepending a NOP. This is only helpful
if the
.I -j
option is used.
.TP
.I -s factor
Split basic blocks at given
.I factor.
This means, from all basic blocks in the object file roughly
.I factor * all_count
blocks are selected and split at a random point. The
.I factor
has to be inbetween 0.0 and 1.0. This works best with entangling, which can be
enabled using the
.I -e
option. See the
.I TIPS SECTION
for more hints how to use it.
.TP
.I -O probability
Set the opaque split probability. Only meaningful in conjunction with the
.I -s
option. Then, it overrides the default probability of 0.5 to a value between
0.0 and 1.0. For every basic block split, this probability determines if the
second part should be duplicated. If it is choosen to do so, the original
block is changed so that a conditional jump leads to either of the two. This
creates an opaque control flow path and will prove really useful with peephole
obfuscation in the future, as the duplicated basic block and the original can
be independantly modified.
.TP
.I -w factor
Experimental option: set the instruction swap factor. Not used by
.I -A
yet. This option introduces an extra pass in which all instructions in a basic
block are examined for order-exchangeability. The total number of swapable
instruction pairs are multiplied with
.I factor
and the resulting number of swaps is commited. This is a very non-trivial
thing and highly experimental. Theoretically, it should work best with
optimized code that has a high register liveness and does not utilize local
memory much. Use with care. As with any other factor option, the range of
.I factor
is inbetween 0.0 and 1.0.
.SH EXAMPLES
The included simple example shows how to use the obfuscator:
.PP
.in +4
.nf
gcc -c -o quicksort.o quicksort.c
objobf -v 3 -e -o quicksort_obf.o quicksort.o
gcc -o main main.c quicksort_obf.o
gcc -c -O2 -o reducebind.o reducebind.c
objobf -e reducebind.o
gcc -o reducebind-obfuscated output.o
strip reducebind-obfuscated
objobf -j 2.0 -e reducebind.o
gcc -o reducebind-more-obfuscated output.o
objobf -j 8.0 -e -s 0.95 reducebind.o
gcc -o reducebind-most-obfuscated output.o
objobf -A reducebind.o
gcc -o reducebind-vanilla output.o
objobf -n -l quicksort quicksort.o
aisee.bin -color -psoutput output.ps output.vcg
gv output.ps
.fi
.in -4
.SH TIPS
Obfuscation can be defined through three measures: Potency, Resilience and
Cost.
.I Potency
Measure of complexity added to obfuscated program, roughly equivalent to the
human difficulty of understanding the program.
.I Resilience
Measure how well a given transformation protects a program from an automatic
deobfuscator.
.I Cost
Measure of increase in resources required to run the program.
The definitions are taken from [Wroblewski2002], see the
.I REFERENCE SECTION.
.I objobf
tries to reach a high potency and resilience while not introducing a too high
cost. Hence, we take a look at how each option influences the three measures.
.SH TIPS - BY OPTION
.I Entangling (-e).
This increases the potency and cost of the obfuscation, but the resilience is
influenced only slightly. The cost depend on the objects size and the
computers instruction cache size and is roughly constant if no basic block
splitting is done. If basic block splitting is used, the cost increase by the
inverse of the splitting factor. So if a high split factor is used the
entangling cost increase, too. The resilience is not influenced by entangling,
as the entangling transformation does not change the property of functions
having disjunct control flow graphes. I suspect the potency increases by a
constant factor in the range of 0.5 to ~5 if helper tools such as IDA is used,
for completely human analysis by a simple disassembler alone this increases
much more.
.I Junk instruction insertion (-j factor).
This almost always increases the cost linear by
.I factor.
There are exceptions, especially if there are tightly optimized inner loops
that are modified. This may produce a cascade of cache misses, unreliable
branch prediction and other undesired results which lead to slower code. Junk
code insertion almost always introduces a constant factor potency to human
analysis. The junk instructions have to blend properly into the original
objects code and depend on valid registers. Else they are easily seperateable
automatically. Hence, the resilience of junk code depends largely on the
quality of the generated code. (This will be improved in future versions of
objobf). A
.I factor
of 8.0 to 12.0 is recommended to gain the most out of this option. Especially
with large object files this is helpful.
.I Basic block splitting (-s factor).
As of the current version of objobf this increases the potency and cost of the
code, but not the resilience (as no opaque constructs are used). This will be
changed in the future to jield truly resilient constructs against automatic
deobfuscation. A
.I factor
between 0.5 and 0.99 is recommended. This option is only useful in conjunction
with the entangling option. The reason this two options are separated is that
.I -e
intermixes function boundaries, while the basic block splitting works on basic
block level. The cost is roughly increased linear to the factor plus the
opaque split probability (set to 0.5 by default, or with the
.I -O
option).
.SH LOOP DETECTION
Within the control flow graph, there are edges which have the property of
being
.I back edges.
A back edge is an edge where the target basic block dominates its source basic
block (see the
.I -d option
for the definition of dominator blocks). All such edges may imply a loop in
the program. The list of basic blocks within this loop is called
.I natural loop.
One basic block of the loop is the so called
.I header block
which is the first one and the loops entry point. All natural loops only have
one single entry point.
.SH KNOWN BUGS
.I Multiple code sections.
Only one code section (usually .text) in the object is supported. The most
common case of multiple code sections are global constructors or destructors
in C++ programs or system library initialization functions (in both cases the
sections are called .init and .fini).
.I Dangling *common symbols.
To ease analyzation, no interpretation of symbols refering to the *common
section has been implemented (*common is what will turn into .bss at link
time). To fix this, use the following command line before feeding the object
into
.I objobf
.in +4
.nf
ld -q -r -d -o input-temp.o input.o
objobf input-temp.o
.fi
.in -4
.I Assertion failures.
If you experience one of the assertions below, please send me in the object
file, the options and the random seed you use for further investigation.
.in +4
.nf
bblock_fixup_end_single: Assertion `displ_relofs != 0' failed.
.fi
.in -4
.I Too many loops detected.
The loop analysis has trouble if there are more than one backedge to a loop
head. Then, two scenarios are possible. Either - the first case - it could be
that there are multiple loops nested within each other, and two loops should
be displayed. This case is correctly handled. Or, - the second case - it is
only one loop with multiple continuation points, such as continue statements.
This case is not correctly handled and multiple nested loops are displayed,
which all share the same loop head. I implemented algorithms for both cases,
but the problem is to choose the proper one. To solve this problem, further
loop analysis would be required. The Dragon book advises to assume it as one
loop, while I find it easier to manually group multiple nested loops as one,
once it is clear that it is just one loop. You can choose between the two
methods of loop detection with the
.I -n
(prefer the nesting of loops) and
.I -N
(prefer the unification of loops) options. Maybe in the future I will
implement induction variable analysis and provide better heuristics to select
the proper nested/group-as-one algorithm.
.SH REFERENCES
[Wroblewski2002]
.I http://www.mysz.org/papers/obfuscation.pdf
An all around very good introduction into the topic of obfuscation.
.SH BUG REPORTS
If you find a bug (and that is for sure), please try to reproduce it with an
object as small as possible, as the analysis will be easier then. Please mail
me at
.I scut@team-teso.net
|