Fuzzing Ruby C Extensions with Coverage and ASAN

16 jan 2024

Hello there!

In my recent bug bounty endeavors, I needed to fuzz some Ruby C extensions. However, that turned out to be more difficult and hacky than anticipated, so I'm sharing my setup here. Maybe it helps some of y'all.

Starting Point

Run ruby extconf.rb to generate the Makefile for your target. We will modify the Makefile in place but won't touch the source of the extension.

Coverage Information

Since C extensions are shared objects loaded dynamically via dlopen(3), we need to pick a coverage mechanism that works with dynamic loading. Unfortunately, AFL++'s instrumentation is off the table for two reasons:

Counter Collisions: Since each shared object requires a separate linker invocation, the coverage counters start from zero in each binary and collide.
Automatic Forkserver: AFL++'s runtime not only handles coverage but also provides a forkserver that starts whenever the instrumented extension is imported. We want manual control over the forkserver instead.
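
To make the collision problem concrete, here is a toy model in Python. It is not AFL++ code, just a sketch of the situation described above: every module numbers its edges from zero, so the same bitmap slots get written by several modules. The module names and edge counts are made up.

```python
from collections import defaultdict

def per_module_ids(modules):
    """Number each module's edges from zero, as separately linked objects would."""
    return {name: list(range(count)) for name, count in modules.items()}

def colliding_slots(ids):
    """Map each bitmap slot to the modules writing to it, keeping shared slots only."""
    owners = defaultdict(list)
    for name, slots in ids.items():
        for slot in slots:
            owners[slot].append(name)
    return {slot: names for slot, names in owners.items() if len(names) > 1}

# Hypothetical extensions with 4 and 2 instrumented edges
modules = {"ext_a.so": 4, "ext_b.so": 2}
print(colliding_slots(per_module_ids(modules)))
```

Every slot that ext_b.so uses is shared with ext_a.so, so the fuzzer cannot tell which of the two extensions produced a hit.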

Thus, we choose one of LLVM's mechanisms and compile the extension with SanitizerCoverage's PC guards by adding -fsanitize-coverage=trace-pc-guard to the CFLAGS. This inserts calls to __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) and __sanitizer_cov_trace_pc_guard(uint32_t *guard).
We handle these calls in a separate extension, but more on that later.

Address Sanitizer

To make ASAN work with shared libraries, we need to add -fsanitize=address -shared-libasan to the CFLAGS. This way, the ASAN runtime is provided as a dynamic dependency of the binary in libclang_rt.asan-x86_64.so instead of being statically linked into the application. The downside is that we must preload the runtime when executing the ruby interpreter. If we don't preload it, the interceptors cannot function properly, so we have to set:

LD_PRELOAD=/path/to/libclang_rt.asan-x86_64.so
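
Both the coverage flag and the ASAN flags end up on the CFLAGS line of the generated Makefile. If you don't want to edit it by hand, a small helper along these lines could do it. This is only a sketch: append_cflags is a hypothetical helper, and it assumes that the Makefile contains a plain "CFLAGS = ..." assignment and that the compiler is already set to clang.

```python
import re

SANCOV_FLAGS = ["-fsanitize-coverage=trace-pc-guard"]
ASAN_FLAGS = ["-fsanitize=address", "-shared-libasan"]

def append_cflags(makefile_text, flags):
    """Append flags to the first `CFLAGS =` line, skipping flags already present."""
    lines = makefile_text.splitlines()
    for n, line in enumerate(lines):
        if re.match(r"^CFLAGS\s*=", line):
            present = line.split()
            missing = [flag for flag in flags if flag not in present]
            lines[n] = " ".join([line.rstrip()] + missing)
            break
    else:
        raise ValueError("no CFLAGS assignment found")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Assumed location: the Makefile in the current directory
    with open("Makefile") as f:
        patched = append_cflags(f.read(), SANCOV_FLAGS + ASAN_FLAGS)
    with open("Makefile", "w") as f:
        f.write(patched)
```

Running it twice is harmless since flags that are already on the line are not added again.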

Fighting Address Sanitizer

Unfortunately, usage of ASAN has multiple side effects in our fuzzing setup.

Firstly, the garbage collector goes completely ham for reasons I do not know. Ruby's GC aggressively frees allocated chunks even when they are properly marked as in-use (?). I suspect this is due to the high memory pressure ASAN exerts with its roughly 20TB of reserved virtual memory. This leads to a lot of false-positive use-after-frees that do not happen otherwise, so we need to disable the GC in our fuzz target via GC.disable.

Secondly, ASAN has its own implementation of __sanitizer_cov_trace_pc_guard_init and __sanitizer_cov_trace_pc_guard. Since we preload the runtime, the coverage functions get bound to the ASAN runtime, and we lose the ability to handle them ourselves. The solution is to simply rename the relevant symbol exports in the runtime. The following script overwrites every occurrence of the given symbol names in a binary, including the name strings that the .dynsym entries refer to, and changes the first letter to an X.

#!/usr/bin/env python3
# Arguments: <binary> <symbol> [<symbol> ...]

import sys

def main(source, symbols):
    with open(source, "rb") as f:
        content = bytearray(f.read())

    # Handle longer names first since one symbol is a prefix of the other
    for symbol in sorted(symbols, key=len, reverse=True):
        symbol = symbol.encode("ascii")

        # Patch the first byte of every occurrence, including one at the very end
        i = content.find(symbol)
        while i != -1:
            print(f"Found {symbol.decode()} at {i}")
            content[i] = ord("X")
            i = content.find(symbol, i + 1)

    # The file is modified in place
    with open(source, "wb") as f:
        f.write(content)

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2:])
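
After patching, it can't hurt to check that the original names are really gone from the runtime. A possible sanity check, assuming the first letter was replaced by an X as described above (count_names and looks_patched are hypothetical helpers, not part of any tool):

```python
ORIGINAL = ["__sanitizer_cov_trace_pc_guard_init", "__sanitizer_cov_trace_pc_guard"]
RENAMED = ["X" + name[1:] for name in ORIGINAL]

def count_names(path, names):
    """Count how often each name occurs as raw bytes in the file at path."""
    with open(path, "rb") as f:
        data = f.read()
    return {name: data.count(name.encode("ascii")) for name in names}

def looks_patched(path):
    """True if no original name is left and every renamed one is present."""
    counts = count_names(path, ORIGINAL + RENAMED)
    originals_gone = all(counts[name] == 0 for name in ORIGINAL)
    renamed_present = all(counts[name] > 0 for name in RENAMED)
    return originals_gone and renamed_present

if __name__ == "__main__":
    import sys
    print("patched" if looks_patched(sys.argv[1]) else "NOT patched")
```

Note that this only looks at raw bytes, just like the renaming script itself. It does not parse the ELF file.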

Fuzzer Communication

The final puzzle piece is the communication with the fuzzer. We need to communicate:

Coverage information that comes in via __sanitizer_cov_trace_pc_guard
Forkserver information whenever we attempt a new run

We implement all that in a new extension called forkserver. As a first step we connect to AFL++'s shared memory channel via:

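#include <stdlib.h>
#include <sys/shm.h>

unsigned char* shm = NULL;
size_t shm_size = 0;

void setup_shm (void) {
    // The fuzzer passes the id of its coverage map in this variable
    char* var = getenv("__AFL_SHM_ID");
    if (var) {
        int id = atoi(var);
        void* map = shmat(id, NULL, 0);
        if (map == (void*) -1) {
            return; // attaching failed, so keep running without coverage
        }
        shm = map;

        // Write something into the map so it is never completely empty
        shm[0] = 1;

        var = getenv("AFL_MAP_SIZE");
        if (var) {
            shm_size = atoi(var);
        } else {
            shm_size = 65536;
        }
    }
}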

Then, we populate the coverage bitmap by assigning each PC guard an index into the coverage map:

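#include <stddef.h>
#include <stdint.h>

// Next free index in the coverage map (slot 0 is only written by setup_shm)
uint32_t sancov_cursor = 1;

// Called for every instrumented module, possibly repeatedly with the same range
void __sanitizer_cov_trace_pc_guard_init (uint32_t *start, uint32_t *stop) {
    if (start == stop || *start) {
        return; // empty range or guards already numbered
    }

    for (uint32_t* i = start; i < stop; ++i) {
        *i = sancov_cursor++;
    }
}

// Called on every instrumented edge
void __sanitizer_cov_trace_pc_guard (uint32_t *guard) {
    size_t idx = *guard;

    // Guards beyond the map size are ignored; shm_size is 0 before setup_shm()
    if (idx < shm_size) {
        shm[idx]++;
    }
}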
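
To see what the two callbacks do to the map, here is a small Python model of them. It mirrors the C code above under the assumption that shm is a byte array, so every counter is 8 bits wide and wraps around. CoverageModel and its method names are made up for this sketch.

```python
MAP_SIZE = 65536  # same default as in setup_shm when AFL_MAP_SIZE is unset

class CoverageModel:
    def __init__(self, map_size=MAP_SIZE):
        self.shm = bytearray(map_size)
        self.cursor = 1  # mirrors sancov_cursor

    def guard_init(self, num_guards):
        """Number the guards of one module, continuing where the last one stopped."""
        guards = list(range(self.cursor, self.cursor + num_guards))
        self.cursor += num_guards
        return guards

    def trace(self, guard):
        """Bump the counter of a guard; guards outside the map are ignored."""
        if guard < len(self.shm):
            # An unsigned char wraps around after 255
            self.shm[guard] = (self.shm[guard] + 1) % 256

cov = CoverageModel()
ext_a = cov.guard_init(3)  # hypothetical extension with 3 edges
ext_b = cov.guard_init(2)  # hypothetical extension with 2 edges
for guard in ext_a + ext_b:
    cov.trace(guard)
print(ext_a, ext_b)
```

Two things fall out of this model: guards of different modules never share a slot, and a counter that is hit exactly 256 times in one run reads as zero again.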

Note that this supports multiple instrumented extensions because sancov_cursor keeps counting across all of them. Finally, we define a method in our module that starts the forkserver:

#include <ruby.h>

VALUE launch_forkserver (VALUE self) {
    (void) self;

    setup_shm();
    forkserver(); // not shown here but it does the same as afl-compiler-rt.o.c

    return Qnil;
}

// For the ruby interpreter:
void Init_forkserver (void) {
    rb_define_global_function("launch_forkserver", launch_forkserver, 0);
}
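
Since forkserver() is not shown, here is a rough Python model of the classic AFL forkserver loop for orientation. The real thing would be C code inside the extension. This sketch assumes the conventional file descriptors 198 (control) and 199 (status) and a hello value of 0; the function signature is made up. AFL++'s own runtime additionally negotiates options during the handshake, so treat the afl-compiler-rt.o.c of your AFL++ version as the reference.

```python
import os
import struct

FORKSRV_FD = 198  # control pipe; FORKSRV_FD + 1 is the status pipe

def forkserver(ctl_fd=FORKSRV_FD, st_fd=FORKSRV_FD + 1):
    """Return once per fuzz run in a fresh child; the parent never returns."""
    try:
        # Hello message: tells the fuzzer that a forkserver is up
        os.write(st_fd, struct.pack("I", 0))
    except OSError:
        return  # no fuzzer attached, behave like a normal program

    while True:
        # Wait until the fuzzer requests the next run
        if len(os.read(ctl_fd, 4)) != 4:
            os._exit(0)  # the fuzzer went away

        pid = os.fork()
        if pid == 0:
            # Child: drop the pipes and continue with the actual fuzz run
            os.close(ctl_fd)
            os.close(st_fd)
            return

        # Parent: report the child's pid, then its wait status
        os.write(st_fd, struct.pack("I", pid))
        _, status = os.waitpid(pid, 0)
        os.write(st_fd, struct.pack("I", status & 0xFFFFFFFF))
```

The important property is that everything executed before the call (loading the interpreter, requiring the extensions) happens only once, while everything after it runs in a fresh child for each input.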

Fuzz Target

Let's combine everything from above and create our fuzz target now:

# Handles coverage and communication with the fuzzer:
require "./forkserver"

# Our instrumented target(s):
require "instrumented_target"

# GC doesn't seem to work with ASAN so disable it:
GC.disable 

# The setup is done so launch the forkserver now:
launch_forkserver

# Then we can read input from stdin and do stuff with it...

And start the fuzzer:

AFL_PRELOAD=/path/to/libclang_rt.asan-x86_64.so afl-fuzz -i corpus/ -o output/ -- ruby our-fuzz-target.rb

Thanks for reading!