Romain Thomas

The first part is here: Part 1 – SingPass RASP Analysis

After SingPass, I had a look at another application protected with the same obfuscator but with enhanced protections.

Compared to the previous application, this new application crashes as soon as it is launched.

By checking the crash log, we don’t get any meaningful information since the obfuscator trashes some registers like LR before crashing. By trashing LR, the iOS crash analytics service is not able to correctly build the call stack of the functions that led to the crash.

On the other hand, by tracing the libraries loaded by the application, we can identify the library in which the application crashes. This library is therefore likely in charge of checking the environment’s integrity.

$ ijector.py --spawn ios.app
iTrace started
PID: 63969 | tid: 771
Home: /private/var/mobile/Containers/Data/Application/A59541E1-106A-4C31-8188-0830E651449E
...
ImageLoader::containsAddress(0x1065f948c): cxxreact!1948c
ImageLoader::containsAddress(0x10564e270): ReactCommon!1a270
ImageLoader::containsAddress(0x103e5ed84): GRDB!12ed84
ImageLoader::containsAddress(0x104407790): Intercom!1bb790
ImageLoader::containsAddress(0x104c29d7c): KaaSLogging!9d7c
ImageLoader::containsAddress(0x105871bb4): RxSwift!91bb4
ImageLoader::containsAddress(0x1056f00cc): RxBluetoothKit!440cc
ImageLoader::containsAddress(0x104633f50): KaaSBle!bbf50
---> CRASH!

So the application crashes when loading the KaaSBle library embedded as a third-party framework of the application.

Compared to SingPass, the library leaks symbols neither about the RASP checks nor about the obfuscator. In addition, some functions are obfuscated with control-flow flattening and Mixed Boolean-Arithmetic (MBA) expressions, as we can observe in the following figure:

iOS Control-Flow Flattening — Figure 1 - Control-Flow Flattening in the Constructor of KaaSBle

Based on the previous analysis of SingPass, we know that RASP checks related to jailbreak or debugger detection use uncommon functions like getpid, unmount or pathconf. It turns out that these functions are also imported by KaaSBle, which makes it possible to identify where some of the RASP checks are located.

Uncommon imported functions like unmount are usually a good signature to identify potential RASP checks

For instance, the function sub_EBDC, which uses getpid, is likely involved in the debugger detection. This function is obfuscated with an MBA and control-flow flattening, and its graph is represented in Figure 2¹.

Figure 2 - BinaryNinja HLIL Graph of sub_EBDC

Control-Flow Flattening

I won’t detail how control-flow flattening works in general, as there are already plenty of good articles on this topic.

Nevertheless, we can notice that the state variable that is used to drive the execution through the flattened blocks is linear and not encoded:

The state variable set at the end of the basic block exactly defines the next basic block to execute.

This means that given:

A state value
The switch table
The switch base address

It is possible to easily compute the targeted basic block:

Design of Graph Flattening — Fig 3. Computation of the Basic Block from a State Variable

Fig 4. Simplified Overview
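The computation can be sketched in a few lines of Python. This is only an illustration: it assumes that each switch-table entry is a little-endian signed 32-bit offset relative to the switch base address, which is a common jump-table layout but is not confirmed for this binary. The names `resolve_block`, `switch_table` and `switch_base` are made up for the example.

```python
import struct

def resolve_block(state: int, switch_table: bytes, switch_base: int) -> int:
    """Return the address of the basic block selected by `state`.

    Assumption: every table entry is a little-endian signed 32-bit offset
    that is added to `switch_base`.
    """
    (offset,) = struct.unpack_from("<i", switch_table, state * 4)
    return switch_base + offset

# Hypothetical table with three entries: +0x10, +0x40 and -0x20
table = struct.pack("<3i", 0x10, 0x40, -0x20)
targets = [resolve_block(state, table, 0x10000) for state in range(3)]
```

With a linear, non-encoded state variable, iterating over every constant written to the state variable and resolving it this way is enough to rebuild the edges of the original graph.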

Since there is no encoding, we can determine the next states of a basic block by looking at the constant written in the local stack variable [sp, 0x50+var_4c] or the state_variable of the BinaryNinja High Level IL representation (Figure 2).

From a graph recovery perspective, this design matches the case described in Quarkslab’s blog post, recovering an OLLVM-protected program, so the original graph can be completely recovered.

I also checked other large control-flow-flattened functions in the binary and they follow the same design with the same weakness.

Improvements

Spoiler: This example comes from an on-going larger project: open-obfuscator.

Actually, we can enhance the protection offered by control-flow flattening by encoding the state variable and by identifying the basic blocks of the switch table with random numbers (instead of 1, 2, 3, etc.).

The following figure outlines this design:

Fig 5. Control-Flow Flattening with Random ID and Encoding

Concretely, the code generated does not use a lookup-switch table and the dispatcher is a succession of conditions:

Head of the O-MVLL flattened function — Figure 6 - Head of the Control-Flow Flattening

We can also observe the encoding block at the end of the graph:

Tail of the O-MVLL flattened function — Figure 7 - Tail of the Control-Flow Flattening

In this example, the encoding is simply $E(X) = X \oplus A + B$ but it could be protected with an MBA and generated with different expressions, unique per function. Globally speaking, any injective (or bijective) function should fit as an encoding.
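To make the idea concrete, here is a toy model of a flattened loop whose blocks have random identifiers and whose state variable is stored encoded. It is a sketch under assumptions, not the code generated by the obfuscator: the encoding is read as $(X \oplus A) + B$ modulo $2^{32}$, and the helper names (`encode_state`, `decode_state`, `flattened_sum`) are invented for the example.

```python
import random

MASK = 0xFFFFFFFF  # assumption: the state variable is 32 bits wide

def encode_state(x: int, a: int, b: int) -> int:
    """E(X) = (X xor A) + B, modulo 2**32."""
    return ((x ^ a) + b) & MASK

def decode_state(y: int, a: int, b: int) -> int:
    """Inverse of encode_state: the encoding is a bijection on 32 bits."""
    return ((y - b) & MASK) ^ a

def flattened_sum(n: int, seed: int = 0) -> int:
    """Compute 0 + 1 + ... + (n - 1) with a flattened control flow."""
    rng = random.Random(seed)
    a, b = rng.getrandbits(32), rng.getrandbits(32)
    # Random, distinct block identifiers instead of 1, 2, 3, ...
    entry, loop, body, done = rng.sample(range(1 << 32), 4)

    state = encode_state(entry, a, b)
    i = total = 0
    while True:
        block = decode_state(state, a, b)
        # Dispatcher: a succession of conditions, no lookup table
        if block == entry:
            i, total = 0, 0
            state = encode_state(loop, a, b)
        elif block == loop:
            state = encode_state(body if i < n else done, a, b)
        elif block == body:
            total += i
            i += 1
            state = encode_state(loop, a, b)
        elif block == done:
            return total

result = flattened_sum(10)
```

The constants written to the state variable no longer identify the next block directly: an analyst first has to recover `a`, `b` and the encoding expression, which can differ for each function.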

In the end, it would increase the complexity of recovering the original graph at scale (even though the design is known).

Mixed-Boolean Arithmetic

We can also observe in Figure 2 that the function uses an MBA as an opaque zero or more precisely an opaque boolean.

Generally speaking, MBAs are widely used by the obfuscator but they usually appear in a simple form like $(A \oplus B) + (A \& B) \times 2$. In other words, we can’t immediately identify the underlying arithmetic operation but, with limited effort, we can simplify the expression using public tools.
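As a quick sanity check, the expression above is the classic rewrite of an addition: the XOR produces the sum without carries and the AND, shifted by one bit, produces the carries. The following snippet verifies the identity on random 64-bit values (the register width and the name `mba_add` are choices made for the illustration):

```python
import random

MASK = (1 << 64) - 1  # work modulo 2**64, like a 64-bit register

def mba_add(a: int, b: int) -> int:
    """(A xor B) + (A and B) * 2, which is equivalent to A + B."""
    return ((a ^ b) + ((a & b) * 2)) & MASK

rng = random.Random(1234)
samples = [(rng.getrandbits(64), rng.getrandbits(64)) for _ in range(1000)]
all_equal = all(mba_add(a, b) == (a + b) & MASK for a, b in samples)
```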

If you want to dig more into MBA deobfuscation, I highly recommend this recent blog post Improving MBA Deobfuscation using Equality Saturation by Tim Blazytko and Matteo which also lists open-source tools that can be used for simplifying MBA like:

sspam
msynth (Used for this binary)

Triton also supports program synthesis: synthesizing_obfuscated_expressions.py :)

Strings Encoding

Most of the strings used in the library are encoded, which prevents quickly identifying sensitive functions.

These encoded strings are decoded just-in-time, near the instruction that uses the given string. In the blog post about PokemonGO, all the strings were decrypted at once in the Mach-O constructors, which made it possible to recover all of these strings without having to reverse engineer the decoding routines. For the current obfuscator, we can’t apply exactly the same technique.

Fig 8. Differences in Designing String Encryption

To better understand the difficulty, let’s take a closer look at how strings are encoded with the _unmount() function. As a reminder, this function is used as a part of jailbreak detection.

In the KaaSBle library, there are five cross-references to _unmount():

When looking at the prologue of the _unmount() calls, we get the following basic blocks:

Decoding Routine for /.bootstrapped — Figure 9 - Decoding Routine for the String /.bootstrapped

Which is equivalent to this snippet:

from itertools import cycle

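def decode(encrypted: str, key: str, op) -> str:
    # Both the encoded string and the key are given as hex strings
    key_bytes = bytes.fromhex(key)
    enc_bytes = bytes.fromhex(encrypted)
    out = ""
    # op(index, encoded byte, key byte) returns the decoded byte
    for idx, (enc, k) in enumerate(zip(enc_bytes, cycle(key_bytes))):
        out += chr(op(idx, enc, k) & 0xFF)
    return out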

# /.bootstrapped
clear = decode("9f0b698a3abc17e70bb54332271180", # Encoded string
               "b0250be555c8649379d43342427580", # Key
               lambda _, k, v: (k ^ v))          # Operation

It is worth mentioning that the string is not decoded in place but into another __data variable. This means that an encoded string potentially takes twice its size in the final binary.

Another example of a decoding routine:

Decoding Routine for /.installed_odyssey — Figure 10 - Decoding Routine for the String /.installed_odyssey

Which is equivalent to:

# /.installed_odyssey
clear = decode("1bec336463362f66602b365d672e4f756f3353", # Encoded string
               "ecbdc8f3",                               # Key
               lambda i, k, v: (k - v - i))              # Operation

In this case, the key is a uint32_t integer whose bytes are accessed through a stack variable. The weird operation x12 = x8 & (x8 ^ 0xfffffffffffffffc) is simply a modulo sizeof(uint32_t) :)
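The equivalence is easy to check: XOR-ing with 0xfffffffffffffffc flips every bit except the two lowest ones, so the AND with the original value clears all the flipped bits and keeps only x8 & 3. A small verification, where `obfuscated_mod4` is a name made up for the example:

```python
def obfuscated_mod4(x8: int) -> int:
    """Model of x12 = x8 & (x8 ^ 0xfffffffffffffffc) on a 64-bit register."""
    x8 &= (1 << 64) - 1
    return x8 & (x8 ^ 0xFFFFFFFFFFFFFFFC)

# Index into the 4-byte key for the first 256 positions of a string
matches_modulo = all(obfuscated_mod4(i) == i % 4 for i in range(256))
```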

In summary, because of the disparity of the encodings, which are mixed with MBA and unique keys, it would be quite difficult to statically decode all the strings of the library. On the other hand, since the clear strings are written in the __data section of the binary, we can dump this section at some point in the execution and observe the clear strings (cf. Singpass RASP Analysis - Jailbreak Detection).

Crash Analysis

When the obfuscator detects that the environment is compromised (jailbroken device, debugger attached, …), it reacts by crashing the application. This crash occurs through different techniques among which:

Corrupting a global pointer
Executing a break instruction (BRK #1)
Trashing the link register and frame register (LR / FP)
Calling objc_msgSend with corrupted parameters

The instructions involved in crashing the application are inlined in the function where the check occurs. This means that there are as many crash routines as there are RASP checks.

In particular, with such a design, we can’t target a single function to bypass the different checks as I did for SingPass.

Hooking the Syscalls

This approach is inspired by this talk at Pass the Salt: Jailbreak Detection and How to Bypass Them

To better understand the problem, let’s recap the situation:

The code is obfuscated with CFG flattening, MBA, etc
The RASP checks are inlined in the code
The application crashes near the detection spot. In particular and compared to SingPass, there is no RASP endpoint that can be hooked.

The following figure depicts the differences in the RASP reaction between the two applications:

Figure 11 - RASP Reaction: User Callback vs Crash

We can’t actually hook a function to bypass the RASP checks but the structure of the AArch64 instructions has a valuable property:

The size of an AArch64 instruction is fixed

As a consequence, we can linearly search for the SVC #0x80 instructions, which are encoded as 0xD4001001.
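For reference, the value 0xD4001001 follows directly from the A64 encoding of SVC, in which the 16-bit immediate is stored in bits 5 to 20 of the instruction word. A small sketch (the helper names `encode_svc` and `svc_immediate` are made up for the example):

```python
SVC_BASE = 0xD4000001  # SVC #0
SVC_MASK = 0xFFE0001F  # every bit except the imm16 field

def encode_svc(imm16: int) -> int:
    """Return the 32-bit encoding of `SVC #imm16`."""
    if not 0 <= imm16 <= 0xFFFF:
        raise ValueError("imm16 must fit on 16 bits")
    return SVC_BASE | (imm16 << 5)

def svc_immediate(word: int):
    """Return the immediate of an SVC instruction, or None for any other word."""
    if word & SVC_MASK != SVC_BASE:
        return None
    return (word >> 5) & 0xFFFF

svc_0x80 = encode_svc(0x80)
```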

Interception

Let’s consider the following approach to intercept the syscalls:

We linearly scan the __text section to find the SVC instructions (i.e. the four bytes 0xD4001001)
We replace this instruction with a branch (BL #imm) to a function we control
We process the redirection to disable the RASP checks

For the first point, thanks to the fixed instruction size, we can search for syscalls by reading the whole __text section:

static constexpr uint32_t SVC         = 0xD4001001; // SVC #0x80
static constexpr size_t   SIZEOF_INST = 4;

for (size_t addr = text_start; addr < text_end; addr += SIZEOF_INST) {
  // Read the instruction
  auto inst = *reinterpret_cast<uint32_t*>(addr);
  if (inst != SVC) {
    continue;
  }

  // We found a syscall instruction at: `addr`
}

For the second point, once we have found a syscall instruction, we have to replace it with a branch. To do so, Frida’s gum_memory_patch_code is pretty convenient:

void* svc_addr = /* Address of the syscall to patch */ nullptr;

gum_memory_patch_code(svc_addr, /* sizeof an arm64 inst */ 4,
                      [] (void* addr, void*) {
                        GumArm64Writer* writer = gum_arm64_writer_new(addr);

                        /* Transform a SVC #0x80 into BL #AABBCC */
                        gum_arm64_writer_put_bl_imm(writer, 0xAABBCC);

                        /* Commit the instruction and release the writer */
                        gum_arm64_writer_flush(writer);
                        gum_arm64_writer_unref(writer);
                      }, nullptr);

The remaining question is: where should the new BL instruction branch to, instead of 0xAABBCC?

Ideally, we would like to jump on our own dedicated stub:

void handler() {
  // ...
}

{
  // ...
  gum_arm64_writer_put_bl_imm(writer, &handler);
}

But the bl #imm instruction only accepts a PC-relative immediate in the range ]-0x8000000; 0x8000000[. This range might be too narrow to reach our handler, whose absolute address &handler can be far away from the patched instruction.

The BL instruction encodes the signed #imm as a multiple of 4 on 26 bits. Thus, because of the sign bit, this #imm can range over ±(1 << (26 + 2 - 1)), i.e. ±0x8000000 (128 MiB).
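The range check and the encoding can be written down in a few lines. In this sketch the exact encodable offsets go from -0x8000000 to 0x7FFFFFC inclusive; `bl_in_range` and `encode_bl` are names made up for the example and are not part of Frida’s API.

```python
BL_OPCODE = 0x94000000  # top 6 bits of the instruction: 0b100101

def bl_in_range(pc: int, target: int) -> bool:
    """True if a BL located at `pc` can reach `target`."""
    delta = target - pc
    return delta % 4 == 0 and -(1 << 27) <= delta < (1 << 27)

def encode_bl(pc: int, target: int) -> int:
    """Return the 32-bit encoding of `BL target` for an instruction at `pc`."""
    if not bl_in_range(pc, target):
        raise ValueError("target is out of the BL range")
    imm26 = ((target - pc) >> 2) & 0x3FFFFFF
    return BL_OPCODE | imm26

forward = encode_bl(0x1000, 0x1008)   # branch 8 bytes forward
backward = encode_bl(0x1008, 0x1000)  # branch 8 bytes backward
```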

We can actually work around this restriction by using a trampoline located in the library where the RASP checks occur. In large binaries, it is quite common to find small functions of one or two instructions that are rarely, if ever, used.

The idea is to use one of these functions as a placeholder in which we write two instructions that branch to an absolute address:

LDR   x15, =&handler
BR    x15

Since this placeholder function is located within the library where the syscalls take place, we can BL #imm to this function without risking too much that #imm overflows the range ]-0x8000000; 0x8000000[.
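For illustration, here is how the raw bytes of such a trampoline could be assembled by hand. This sketch makes one assumption that goes beyond the text: the 64-bit address of the handler is stored as a literal right after the two instructions, so the placeholder needs 16 bytes in total. The function names are made up for the example.

```python
import struct

def encode_ldr_literal(rt: int, byte_offset: int) -> int:
    """Encoding of `LDR Xt, <pc + byte_offset>` (64-bit literal load)."""
    if byte_offset % 4 or not -(1 << 20) <= byte_offset < (1 << 20):
        raise ValueError("literal is out of range")
    imm19 = (byte_offset >> 2) & 0x7FFFF
    return 0x58000000 | (imm19 << 5) | rt

def encode_br(rn: int) -> int:
    """Encoding of `BR Xn`."""
    return 0xD61F0000 | (rn << 5)

def build_trampoline(handler: int) -> bytes:
    """LDR x15, #8 ; BR x15 ; .quad handler (little-endian)."""
    return struct.pack("<IIQ", encode_ldr_literal(15, 8), encode_br(15), handler)

# 0x1122334455667788 is an arbitrary address used for the demonstration
trampoline = build_trampoline(0x1122334455667788)
```

In practice, a code writer such as Frida’s GumArm64Writer can take care of emitting this kind of sequence, so the manual encoding is mostly useful to understand how many bytes the placeholder function must provide.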

Fig 14. Syscall Patch

Now that we have a mechanism to redirect the syscall instruction, we can focus on the handler function that receives the redirected syscall.

First, the SVC instructions are atomic, which means that our handler function must take care not to corrupt the values of the registers.

In particular, handler can’t follow the ARM64 calling convention. If we consider the following instructions:

mov x6, #0
...
svc #0x80
...
mov x2, x6

svc #0x80 does not corrupt x6 while this code:

mov x6, #0
...
BL #imm
...
mov x2, x6

could corrupt x6 according to the ARM64 calling convention. Therefore, our handler() function must really mimic an interrupt and take care of correctly saving and restoring the registers.

In other words, we must write a small assembly stub to save and restore the registers²:

stp x0,  x1,  [sp, -16]!
...
stp x28, x29, [sp, -16]!
stp x30, xzr, [sp, -16]!

mov x0, sp

bl _syscall_handler;

ldp x30, xzr, [sp], 16
ldp x28, x29, [sp], 16
...
ldp xzr, x1,  [sp], 16
ret

The syscall_handler function takes a pointer to the stack frame as a parameter. Thus, we can access the saved registers:

extern "C" {
uintptr_t syscall_handler(uintptr_t* sp) {
  uintptr_t x16 = sp[14]; // Syscall number
  return -1;
}
}

Apple prefixes (or mangles) symbols with a _, which is why syscall_handler is referenced as _syscall_handler in the assembly code.

Given our syscall_handler function, we have access to the original AArch64 registers, so we can read the syscall number and its parameters. We are also able to modify the return value since the original syscall is replaced by a branch.
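The index 14 used for x16 follows from the order of the stp instructions: the last pair pushed (x30, xzr) ends up at sp[0] and the first pair (x0, x1) at sp[30]. The sketch below models that layout, assuming that the lines elided by ... in the stub push the remaining pairs in ascending order (x2/x3, x4/x5, and so on). `saved_reg_index` is a hypothetical helper, not part of the PoC.

```python
def saved_reg_index(n: int) -> int:
    """Index of register xN in the frame passed to syscall_handler."""
    if not 0 <= n <= 30:
        raise ValueError("expected a register number between 0 and 30")
    return 2 * (15 - n // 2) + (n % 2)

# Model of the stack after the stub has pushed every pair.
# Each register holds its own number; xzr is represented by None.
pushes = [(n, n + 1) for n in range(0, 30, 2)] + [(30, None)]
frame = []
for pair in reversed(pushes):  # the last pair pushed sits at the lowest address
    frame.extend(pair)

syscall_number_index = saved_reg_index(16)
```

With this layout, the syscall arguments x0 to x7 sit at the indexes returned by `saved_reg_index(0)` through `saved_reg_index(7)`.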

Fig 14. Syscall Redirection

A PoC that wraps all this logic will be published on GitHub.

Conclusion

Whilst this application uses the same obfuscator as in the previous blog post, it was configured with multi-layered code obfuscation which includes control-flow flattening and MBA. In addition, the RASP checks are also configured to crash the application instead of calling a callback function and displaying a message. These improvements in the configuration of the obfuscator make the reverse engineering of the application harder compared to the previous SingPass application.

This blog post also detailed a new AArch64-generic technique to intercept RASP syscalls which resulted in a successful bypass of the RASP checks. This technique should also apply to Android AArch64.

This is the last part of this series about iOS obfuscation. As I said in the first disclaimer, the obfuscator used for protecting these applications is and remains a good choice to protect assets from reverse engineering.

¹ The graph is more convenient to explore if JavaScript is enabled.
² We don’t restore x0 as we want to change the return value from _syscall_handler.

Part 2 – iOS Native Code Obfuscation and Syscall Hooking
