0032 2001/02/03 explanations of malloc() overwrite technique ==== TESO Informational ======================================================= This piece of information is to be kept confidential. =============================================================================== Description ..........: Explanations of malloc() overwrite technique Date .................: 2001/02/03 23:00 Author ...............: scut Publicity level ......: known Affected .............: heap overflows in malloc'ed memory, special libc's Type of entity .......: exploitation technique Type of discovery ....: exploitation technique Severity/Importance ..: medium Found by .............: Solar Designer =============================================================================== Originally discovered by Solar Designer, this techniques allows arbitrary memory overwrites if you can overflow a malloc'ed buffer. On a common buffer overflow within the stack space, there are interesting variables after the overflowing buffer. Often this includes pointers, return addresses, function parameters, etc. In heap space there is far less interesting data that can be overwritten. But buffer overflows do occur in malloc'ed or static buffers, too. For static buffers, there are some techniques, namely to overwrite function pointers, jmp_buf'ers, or GCC created DTORS tables. For malloced space the situation was way more complicated, since the heap space used by malloc is dynamically ordered, giving away chunks of memory to the program if it requests them. Beside the order also the addresses and the content of the memory can vary a lot depending on the programs usage, system load or the moonlight. Well, nearly. However, it is desireable to find a more reliable way to exploit buffer overflows occuring within malloc'ed areas. This may have been the intention of our hero, Solar Designer, who thought upon this neat technique. Here it is, but first some technical background. The malloc(3) function is used to request memory from a dynamically growing memory area, called 'the heap'. Using the sbrk(2) system call a program can modify the border of this area. If the program needs a lot of memory in pieces and dynamically this can result in a lot of work to sort, use, reuse this memory areas efficiently. So this basic functionality of memory management is implemented in a efficient way within the standard C library. The internals are not interesting to the programmer, since he only uses malloc(3) to request memory of a certain size, and free(3) to release the memory after it is not in use anymore. But for taking advantage of a buffer overflow within a buffer that was created using malloc(3) we have to look at the details of the memory management system. In this case we use the all so popular GNU C Library. The C library keeps the information about the memory slices the application requests in so called 'chunks'. They look like this (adapted from malloc.c): +----------------------------------+ chunk -> | prev_size | +----------------------------------+ | size | +----------------------------------+ mem -> | data | : ... : +----------------------------------+ nextchunk -> | prev_size ... | : : Where mem is the pointer you get as return value from malloc(). So if you do a: unsigned char * mem = malloc (16); Then 'mem' is equal to the pointer in the figure, and (mem - 8) would be equal to the 'chunk' pointer. The 'prev_size' element has a special function: If the chunk before the current one is unused (it was free'd), it contains the length of the chunk before. If the chunk before the current one is used, the 'prev_size' value is zero. The 'size' field has a special meaning. As you would expect it contains the length of the current block of memory, the data section. As you call malloc(), it is set and is always padded up to the next double-word boundary. So a malloc(7) will become a malloc(8), and a malloc(20) will become malloc(24). For malloc(0) it will be padded to malloc(8), the reason we will see later in this text. Since this padding implies that the lower three bits are always zero and have no meaning to the real length, they are used otherwise, to indicate special attributes of this chunk. The lowest bit, called PREV_INUSE, indicates whether the previous chunk is used or not. It is set if it is used. The second least significant bit is set if the memory area is mmap'ed, a special case which we will not consider. The third least significant bit is unused. To test whether the current chunk is in use or not, we have to check the next chunks PREV_INUSE bit within its size value. Once we free() the chunk, using free(mem), some checks take place and the memory is released. If its neighbour blocks are free, too (checked using the PREV_INUSE flag), then they are merged, to keep the number of reuseable blocks low, but their sizes as large as possible. If a merge is not possible, the next chunk is tagged with a cleared PREV_INUSE bit, and the chunk changes a bit: +----------------------------------+ chunk -> | prev_size | +----------------------------------+ | size | +----------------------------------+ mem -> | fd | +----------------------------------+ | bk | +----------------------------------+ | (old memory, can be zero bytes) | : : nextchunk -> | prev_size ... | : : As you can see, where our data was previously stored at the 'mem' pointer returned by malloc, there are two new values, called 'fd' and 'bk', forward and backward namely. They are pointers and point into a double linked list of unconsolidated blocks of free memory. Everytime a new free is issued now this list is checked, and possible unconsolidated blocks are merged, or the whole memory is defragmented to release memory. Since the malloc size we give to it is at least 8 bytes all the time, there is always enough space for both pointers. If there is remaining old memory behind the 'bk' pointer it remains unused until this memory is malloc'ed again. The interesting essence of this whole stuff is that the memory management information itself is stored beside the data, a clear channeling problem (just as with format string commands within the string itself, as control channels in breakable phonelines, as return addresses within stack memory, etc). Since we can overwrite this management information if we can overwrite a malloced area, we should take a look at how it is processed later on. As every malloc'ed area is free()'d again in any sane program, we take a look at free, which is a wrapper to chunk_free() within malloc.c (simplified a bit to take out #ifdef's): static void chunk_free(arena *ar_ptr, mchunkptr p) { size_t hd = p->size; /* its head field */ size_t sz; /* its size */ int idx; /* its bin index */ mchunkptr next; /* next contiguous chunk */ size_t nextsz; /* its size */ size_t prevsz; /* size of previous contiguous chunk */ mchunkptr bck; /* misc temp for linking */ mchunkptr fwd; /* misc temp for linking */ int islr; /* track whether merging with last_remainder */ check_inuse_chunk(ar_ptr, p); sz = hd & ~PREV_INUSE; next = chunk_at_offset(p, sz); nextsz = chunksize(next); Since the malloc management keeps chunks within special structures called 'arenas', it is now tested whether the current chunk that should be free directly borders to the 'top' chunk, a special chunk. The 'top' chunk is always the top-most available memory chunk within an arena, it is bordering the end of available memory. The whole if block is not interesting to the typical buffer overflow within malloc space. if (next == top(ar_ptr)) /* merge with top */ { sz += nextsz; if (!(hd & PREV_INUSE)) /* consolidate backward */ { prevsz = p->prev_size; p = chunk_at_offset(p, -(long)prevsz); sz += prevsz; unlink(p, bck, fwd); } set_head(p, sz | PREV_INUSE); top(ar_ptr) = p; if ((unsigned long)(sz) >= (unsigned long)trim_threshold) main_trim(top_pad); return; } Now the 'size' of the current chunk is tested whether the previous chunk is unused (testing for the PREV_INUSE flag), and if it is the case both chunks are merged. islr = 0; if (!(hd & PREV_INUSE)) /* consolidate backward */ { prevsz = p->prev_size; p = chunk_at_offset(p, -(long)prevsz); sz += prevsz; if (p->fd == last_remainder(ar_ptr)) /* keep as last_remainder */ islr = 1; else unlink(p, bck, fwd); } Now the same is done into the other direction, it is checked whether the chunk in front of the current chunk is free (testing for the PREV_INUSE flag of the size two chunks ahead). If it is the case the chunk is merged into the current one. if (!(inuse_bit_at_offset(next, nextsz))) /* consolidate forward */ { sz += nextsz; if (!islr && next->fd == last_remainder(ar_ptr)) /* re-insert last_remainder */ { islr = 1; link_last_remainder(ar_ptr, p); } else unlink(next, bck, fwd); next = chunk_at_offset(p, sz); } else set_head(next, nextsz); /* clear inuse bit */ set_head(p, sz | PREV_INUSE); next->prev_size = sz; if (!islr) frontlink(ar_ptr, p, sz, idx, bck, fwd); } As Solar Designer showed us, it is possible to use the 'unlink' macro to overwrite arbitrary memory locations. Here is how: A usual buffer overflow situation might look like: mem = malloc (24); gets (mem); ... free (mem); This way the malloc'ed chunk looks like this: [ prev_size ] [ size P] [ 24 bytes ... ] (next chunk) [ prev_size ] [ size P] [ fd ] [ bk ] or [ data ... ] As you can see, the next chunk directly borders to our chunk we overflow. We can overwrite anything behind the data region of our chunk, including the header of the following chunk. If we take a closer look at the end of the chunk_free function, we see this code: if (!(inuse_bit_at_offset(next, nextsz))) /* consolidate forward */ { sz += nextsz; if (!islr && next->fd == last_remainder(ar_ptr)) /* re-insert last_remainder */ { islr = 1; link_last_remainder(ar_ptr, p); } else unlink(next, bck, fwd); next = chunk_at_offset(p, sz); } The inuse_bit_at_offset, is defined as macro in the beginning of malloc.c: #define inuse_bit_at_offset(p, s)\ (((mchunkptr)(((char*)(p)) + (s)))->size & PREV_INUSE) Since we control the header of the 'next' chunk we can trigger the whole if block at will. The inner if statement is uninteresting, except our chunk is bordering to the top-most chunk. So if we choose to trigger the outer if statement, we will call unlink, which is defined as macro, too: #define unlink(P, BK, FD) \ { \ BK = P->bk; \ FD = P->fd; \ FD->bk = BK; \ BK->fd = FD; \ } The unlink is called with a pointer to a free chunk and two temporary pointer variables, called bck and fwd. It does this to the 'next' chunk header: *(next->fd + 12) = next->bk *(next->bk + 8) = next->fd They are not swapped, but the 'fd' and 'bk' pointers point to other chunks. This two chunks being pointed to are linked, zapping the current chunk from the table. So to exploit a malloc based buffer overflow we have to write a bogus header in the following chunk and then wait for our chunk getting free'd. [buffer .... ] | [ prev_size ] [ size ] [ fd ] [ bk ] '|' is the chunk boundary. The values we set for 'prev_size' and 'size' do not matter, but two conditions have to be met, in case it should work: a) the two least significant bits of 'size' have to be zero b) both, 'prev_size' and 'size' should be add-safe to a pointer that is read from. So either use very small values up to a few thousand, or - to avoid NUL bytes - use big values such as 0xfffffffc. 'fd' and 'bk' can be set this way (as used in Solar Designers Netscape Exploit): fd = retloc - 12 bk = retaddr But beware, that (retaddr + 8) is being written to and the content there is destroyed. You can circumvent this by a simple '\xeb\x0c' at retaddr, which will jump twelve bytes ahead, over the destroyed content. Or you are an evil nerd and directly do shellcode which immediatly loads a register with it's own address and take advantage of the write ;-) (I am shooked SolarDiz' did not mention that ;) Well, however, exploitation is pretty straight forward now: <6> <4 bogus> | \xff\xff\xff\xfc \xff\xff\xff\xfc Where '|' is the chunk boundary (from that point we overflow). Now, the next two negative numbers are just to survive a few checks in free() and to avoid NUL bytes. Then we store (retloc - 12) properly encoded and then the return address, which will return to the 'jmp-ahead'. The buffer before the '|' is the same as with any x86 exploit, except for the first 12 bytes, because we have to take care of the extra write operation by the unlink macro. XXXTODO: add off-by-one techniques Thanks Solar Designer, for another nasty trick :-] ===============================================================================