Open-Obfuscator: A free and open-source obfuscator for mobile applications

Introduction

This post is about a new project on which I (intensively) worked this month. It aims at providing a free and open-source solution for obfuscating mobile applications.

TL;DR

open-obfuscator is available at https://obfuscator.re, and this blog post is more about the motivation behind the project. You can also check the GitHub repositories: O-MVLL & dProtect

The origin of this project

For several years now, I have been reverse engineering (obfuscated) mobile applications, and my latest publications tend to be more about how to “defeat” obfuscation than how to protect code. I actually enjoy both aspects, and I’m also aware that breaking[1] something or reversing a RASP check is, most of the time, easier than designing and building something that works at scale and for different users/environments (like an obfuscator).

An idea started to grow after a recent discussion about the legal aspects of reverse engineering obfuscated code. It’s true: reverse engineering an obfuscator is not permitted, and publishing the results even less so. Depending on the interlocutor, you might be lucky or not (cf. Promon vs University Researchers and this 36C3 talk).

That being said, the publications on this blog followed a responsible disclosure process in which the stakeholders were contacted ahead of publication. For instance, the feedback from Google about SafetyNet was very fair:

Google’s feedback

On the other hand, someone who has to decide which solution would best protect their assets can only refer to (and infer from) datasheets, demos[2], or testimonials.

In particular, there are no public benchmarks that give an idea of how the solutions compare with each other. Moreover, it can be difficult for developers to understand the purpose of an obfuscation scheme and how it really works underneath.

So why not put on another hat and create an obfuscator that fulfills these objectives:

  • Easy to use and easy to integrate for the developers.
  • Highly documented so that developers have a good understanding of the protections.
  • Open source:
    1. to welcome public improvements.
    2. to welcome public attacks, free from legal issues, which in the end are used to enhance the protections.
    3. to be objectively and technically benchmarked.
  • Realistic: trying to be as close as possible to the state of the art, for the benefit of both developers and attackers.

Obfuscation and open source sound a bit contradictory, no?

Somehow, yes. If the design is known, attacks are easier. But even if an obfuscation technique is known, attacks can remain costly.

Let’s take control-flow flattening as an example. The technique is known and documented, and there are public attacks[3], but defeating this pass statically can be painful and time-consuming. Usually, it forces reverse engineers (myself included) to analyze the program dynamically, which shifts the problem to dynamic protections. Thus, we can assume that the pass is effective against static analysis even though it is known.
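To make the idea concrete, here is a minimal hand-written sketch of control-flow flattening in Python. It only illustrates the principle: the function names and the block numbering are made up for this example, and a real pass works on compiler IR or bytecode and usually hides the state values as well.

```python
def collatz_steps(n: int) -> int:
    """Original version: the loop and the branches are directly visible."""
    steps = 0
    while n != 1:
        if n % 2 == 0:
            n //= 2
        else:
            n = 3 * n + 1
        steps += 1
    return steps


def collatz_steps_flattened(n: int) -> int:
    """Same logic, flattened by hand: every basic block becomes a case of
    a single dispatcher loop driven by a state variable."""
    steps = 0
    state = 0  # hypothetical block identifiers, chosen arbitrarily
    while True:
        if state == 0:      # loop condition
            state = 1 if n != 1 else 4
        elif state == 1:    # parity test
            state = 2 if n % 2 == 0 else 3
        elif state == 2:    # even branch
            n //= 2
            steps += 1
            state = 0
        elif state == 3:    # odd branch
            n = 3 * n + 1
            steps += 1
            state = 0
        else:               # exit block
            return steps
```

Both functions compute the same result, but in the flattened one the original structure (a loop containing an if/else) can no longer be read directly from the control-flow graph: every block simply jumps back to the dispatcher.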

Another example is SafetyNet. The overall design of the protection is based on a virtual machine. Once the different handlers have been reversed, we can consider the design “known”. Some layers of obfuscation are also based on known mixed boolean-arithmetic (MBA) expressions. But Google did something very smart: a new version is published regularly and the internal components are shuffled.

I do believe that the strongest protection against reverse engineering comes from a well-thought design, not an individual obfuscation pass.

Obfuscation is also about time, and I think that by combining several (known) obfuscation techniques, we can trigger a time-out for the attackers.

Talking about time, this month I had the time to bridge all these ideas.

Technical choices

So far, the ideas were only on paper, and I did not want to create “yet another O-LLVM fork”[4] or spend time on something that already exists.

Native code obfuscation

There was no doubt about using LLVM. While looking at the existing LLVM-based solutions, I found eshard/obfuscator-llvm on GitHub, which is developed by eShard (a company known for side-channel attacks on white-boxes[5][6]).

Compared to other LLVM-based obfuscators, it uses the new LLVM pass manager which enables us to load an out-of-tree plugin with clang:

clang -fpass-plugin=<path/to/llvm/obfuscation>/libLLVMObfuscator.so \
      hello_world.c -o hello_world

I was definitely convinced by this way of using an obfuscator, especially because it keeps the original compiler from the toolchain, which avoids this kind of issue: Hikari/Troubleshooting - AArch64e Support.

On the other hand, I was less convinced by the use of environment variables to trigger and configure the obfuscation passes.

Java/Kotlin obfuscation

To support Java and Kotlin obfuscation for Android, I had a look at Proguard/Proguard-core which actually provides all the components to create an obfuscation pipeline.

Proguard is known for obfuscating symbols (class names, methods, …) and optimizing code, but that is just the tip of the iceberg. The project is really well designed and modular, so it was very easy to add an obfuscation pipeline.

O-MVLL/dProtect

I ended up with two projects:

  1. O-MVLL, in reference to the well-known parent of all the forks: O-LLVM.
  2. dProtect for {dex,droid}-Protector

O-MVLL

O-MVLL uses the original idea of eShard: a pass-plugin obfuscator. On the other hand, it uses a Python API instead of environment variables.

In short, the user defines in Python what she/he wants to protect and how. For instance, to obfuscate strings, the developer must override the obfuscate_string method:

def obfuscate_string(self,
                     mod: omvll.Module,
                     func: omvll.Function,
                     string: bytes):

  # The method returns ONE of the following options.

  # Option 1: replace the string
  return "REDACTED"

  # Option 2: obfuscate the string on the stack
  return omvll.StringEncOptStack()

  # Option 3: obfuscate the string on the stack with a
  # loop if the length is too long
  return omvll.StringEncOptStack(loopThreshold=0)
  ...

There is also mod: omvll.Module and func: omvll.Function in the arguments of the function.

These arguments are LLVM objects wrapped with Python bindings. Thus, the user can use all the information (name, flags, visibility[7]) provided by LLVM for these objects.

If the user only wants to protect strings in the function omvll::decode(), she/he can use this condition:

def obfuscate_string(self,
                     mod: omvll.Module,
                     func: omvll.Function,
                     string: bytes):

    if func.demangled_name == "omvll::decode()":
        return omvll.StringEncOptGlobal()
    ...

You can find more details about this pass here and you can also explore the O-MVLL documentation here.
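To give an intuition of what a stack-based string encoding does, here is a small conceptual sketch in plain Python. It assumes a simple repeating-key XOR, and the helper names (encode_at_build_time, decode_on_stack) as well as the key are hypothetical; the encoding actually used by O-MVLL may differ.

```python
def encode_at_build_time(plain: bytes, key: bytes) -> bytes:
    """What an obfuscator would do at compile time:
    only the encoded bytes end up in the binary."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(plain))


def decode_on_stack(encoded: bytes, key: bytes) -> bytes:
    """What the protected function would do at run time,
    right before the string is used."""
    buf = bytearray(len(encoded))  # stands in for a local (stack) buffer
    for i, b in enumerate(encoded):
        buf[i] = b ^ key[i % len(key)]
    return bytes(buf)


KEY = bytes([0x5A, 0xC3, 0x17])  # arbitrary key chosen for this example
ENCODED = encode_at_build_time(b"TOKEN: secret", KEY)
```

The clear string never appears in ENCODED, so a tool like strings finds nothing useful, while decode_on_stack(ENCODED, KEY) gives back the original bytes when the function runs.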

dProtect

For dProtect, nothing really new in terms of API compared to Proguard. I added custom obfuscation passes that can be enabled as follows:

# Mixed boolean-arithmetic Obfuscation
-obfuscate-arithmetic,high class com.dprotect.salsa20 {
  *;
}
# Strings protection
-obfuscate-strings "XEYnuNOGoEQ*", "TOKEN: *"
-obfuscate-strings class dprotect.Connect {
  private static java.lang.String API_KEY;
  public static java.lang.String getToken();
}
# Constants
-obfuscate-constants    class com.dprotect.secret.** { *; }
# Control-Flow
-obfuscate-control-flow class com.dprotect.internal.** { *; }

The logic of these dProtect passes relies on existing internal components available in Proguard and Proguard-CORE, which have been developed by Eric Lafortune and James Hamilton.
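For readers not familiar with mixed boolean-arithmetic (MBA) obfuscation, the sketch below shows the principle with two classic identities. It is a plain-Python illustration with hypothetical function names, emulating 32-bit integers with a mask; the rewriting rules actually applied by dProtect may differ.

```python
MASK = 0xFFFFFFFF  # emulate 32-bit wrap-around arithmetic


def add_plain(x: int, y: int) -> int:
    return (x + y) & MASK


def add_mba(x: int, y: int) -> int:
    # Identity: x + y == (x ^ y) + 2 * (x & y)
    return ((x ^ y) + 2 * (x & y)) & MASK


def xor_plain(x: int, y: int) -> int:
    return (x ^ y) & MASK


def xor_mba(x: int, y: int) -> int:
    # Identity: x ^ y == (x | y) - (x & y)
    return ((x | y) - (x & y)) & MASK
```

The rewritten expressions give the same results as the plain ones, but they mix boolean and arithmetic operators, which makes them harder to simplify for a human and for many tools. An obfuscator can apply such rewritings several times in a row to increase the complexity.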

In its current version, dProtect provides the following obfuscation passes:

CI

O-MVLL and dProtect are built by a CI, with nightly packages available at these addresses:

The CI for O-MVLL could have been challenging to set up, but since the project is an out-of-tree plugin, once the right version of LLVM is pre-compiled, it takes about 2 minutes to compile the project from scratch for both the NDK and the Xcode toolchains.

The environment to compile O-MVLL for the Android NDK and Xcode is fully Dockerized and available on Docker Hub: https://hub.docker.com/u/openobfuscator

For those who are interested, you can check out the GitHub Actions configurations:

Demo

I think the best way to demonstrate the level of obfuscation that O-MVLL and dProtect can provide in their current version is to obfuscate an open-source project.

For that purpose, I chose optiv/android-ndk-crackme, and the configurations of O-MVLL and dProtect are available here:

As the cherry on the cake, I also applied some techniques described in “The Poor Man Obfuscator”:

In the end, we have this un-obfuscated version of the crackme: com.optiv.ndkcrackme.apk and this one, protected with O-MVLL and dProtect: com.optiv.ndkcrackme-protected.apk

About iOS, O-MVLL works as a PoC and the support is far from being finished:

iOS

The obfuscation passes Anti-Hooking and Control-Flow Breaking are not working correctly due to an issue (from O-MVLL) in the JIT engine.

The overall support for iOS is highly experimental.

Nevertheless, you can compare this (unprotected) version of zlib for iOS: libz.dylib with this protected one: libz-obfuscated.dylib.

The obfuscation has been done with this configuration file: ios-zlib-obf.py.

Last words

The project is one month old so you can expect bugs, limitations, and typos. I released a beta version on GitHub so feel free to try and send your feedback.

Regarding the time spent on the project:

  • 60% on the documentation, UI, website, etc
  • 30% on the obfuscation passes themselves
  • 5% on the CI
  • 5% on the tests

So yes, some obfuscation passes still need to be improved to be really effective, and the support for iOS is very experimental (i.e. depending on the configuration, the compiler might crash). There is also a test suite for both dProtect and O-MVLL, but it is not public yet.

Open-obfuscator is going to be my second largest project after LIEF and I hope it will serve its long-term purpose:

Providing an obfuscation playground for both developers and reverse engineers.


Thank you for reading and happy Halloween 🎃

Romain


  1. Breaking obfuscation is a moot topic. ↩︎

  2. I guess the terms of use of the demo do not allow you to reverse it. ↩︎

  3. See: https://obfuscator.re/omvll/passes/control-flow-flattening/#references ↩︎

  4. This fork exists: https://github.com/emc2314/YANSOllvm ↩︎

  5. https://gitlab.com/eshard/scared ↩︎

  6. https://gitlab.com/eshard/estraces ↩︎

  7. In the current version, only the name is provided. ↩︎