Hybrid File-based (File)


Months ago, I started working on a prototype project related to attack surface management (ASM). The project needed some kernel programming to optimize and extend certain functionality. After days of work, I ran into a problem that turned out not to be a problem at all!

The Kallsyms

As you probably know, because of security and licensing concerns, newer kernel versions (5.7.0 and later) no longer export some functions to modules. One of these functions is kallsyms_lookup_name(), which kernel modules used to resolve symbol addresses. Because it is no longer exported, activities such as system call hooking have become more complicated.

Why was this decision made? The primary reason is as follows:

kallsyms_lookup_name() and kallsyms_on_each_symbol() are exported to modules despite having no in-tree users and being wide open to abuse by out-of-tree modules that can use them as a method to invoke arbitrary non-exported kernel functions.

Any Solution?

There were situations in which I needed this function (and some other unexported functions). I found some existing solutions, such as brute-forcing symbol addresses with the sprint_symbol() function or using a kprobe. sprint_symbol() does almost the opposite of kallsyms_lookup_name(): it turns an address into a symbol name, so we can walk through candidate addresses until the name we want shows up. The other solution registers a kprobe on kallsyms_lookup_name() itself, which makes the kernel resolve that function's address for us.

The first solution is straightforward, but the kernel could easily mitigate it later on. The second solution is remarkable. Still, after some experimenting, I decided to implement my own solution. Honestly, it was just a bit of mischief. I implemented my idea in several versions and named the first one Hybrid file-based, which is the subject of this post. The idea behind the first version is as follows:

The symbol addresses, including those of system calls, are listed in the /proc/kallsyms file, so we can read this file and get the desired symbol address. Note that to extract the correct addresses, we must read this file with root privileges (e.g. with sudo); otherwise the addresses are typically shown as zeros. This is not a problem because loading kernel modules requires root privileges anyway.

Reading /proc/kallsyms from inside the kernel is challenging because it requires bypassing security restrictions, so I wrote a separate user-space app that reads /proc/kallsyms, extracts the desired symbol, and writes the result to a file. The logic behind the app is simple: it opens /proc/kallsyms, finds the specific line, extracts the address from that line, and writes the address to a file. I pushed the code to my new GitHub here.
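The open, find, extract, and write steps described above can be sketched roughly as follows. This is a minimal illustration and not the code from the repository: the function names find_symbol() and export_symbol() are made up for this example, and it assumes the usual "address type name [module]" layout of each /proc/kallsyms line.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan a kallsyms-style stream ("<hex address> <type> <name> [module]")
 * for an exact symbol name. Returns 0 and stores the address on success,
 * -1 if the symbol is not present. (Hypothetical helper for illustration.)
 */
int find_symbol(FILE *in, const char *name, unsigned long long *addr)
{
    char line[512];
    char sym[256];
    char type;
    unsigned long long value;

    while (fgets(line, sizeof(line), in)) {
        /* A trailing "[module]" column, if present, is simply ignored. */
        if (sscanf(line, "%llx %c %255s", &value, &type, sym) != 3)
            continue;
        if (strcmp(sym, name) == 0) {
            *addr = value;
            return 0;
        }
    }
    return -1;
}

/*
 * Look up `name` in the file at `kallsyms_path` and write its address,
 * as hex text, to `out_path`. Returns 0 on success, -1 on any failure.
 */
int export_symbol(const char *kallsyms_path, const char *name,
                  const char *out_path)
{
    unsigned long long addr = 0;
    FILE *in = fopen(kallsyms_path, "r");
    FILE *out;
    int rc;

    if (!in)
        return -1;
    rc = find_symbol(in, name, &addr);
    fclose(in);
    if (rc != 0)
        return -1;

    out = fopen(out_path, "w");
    if (!out)
        return -1;
    fprintf(out, "%llx\n", addr);
    return fclose(out) == 0 ? 0 : -1;
}
```

A main() that calls export_symbol("/proc/kallsyms", SYMBOL_NAME, OUTPUT_FILE) would complete the program. Run without root, it would most likely write 0, since the addresses are hidden from unprivileged readers.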

Before compiling the code, change the value of SYMBOL_NAME to your desired symbol (e.g. __x64_sys_read), and OUTPUT_FILE to the path of the file you want the address written to.

# gcc -o userspace userspace.c

I will skip the explanation of this code because it is clear and simple.

Now, the kernel-space program should execute the user-space app and then read the address from the file that the app wrote. It has a function named run_user_program(). This function sets up the arguments and environment variables for the app and then returns the result of a call to call_usermodehelper(). The call_usermodehelper() function in the Linux kernel starts a user-space application from within kernel space. It is particularly useful when the kernel needs to execute a user-space program, for example in response to certain system events or on behalf of a kernel module. This function takes four arguments:

path(const char*): the path of the user-space executable
argv(char **): the argument array for the process
envp(char **): the environment variables for the process
wait(int): wait for the application to finish and return status

There are three possible values for wait:

UMH_NO_WAIT(0): The kernel does not wait for the user-space application to start or complete. The application runs asynchronously as a child of the keventd process
UMH_WAIT_EXEC(1): The kernel waits for the user-space application to start (i.e., for the exec system call to complete) but does not wait for the application to finish executing
UMH_WAIT_PROC(2): The kernel waits for the user-space application to finish executing and then returns its status

It’s clear that the best choice for our case is UMH_WAIT_PROC, because we want the user-space program to finish writing the address to the file before the kernel module reads it.
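To make the "wait until the process has finished" behavior concrete, here is a user-space analogy built on fork(), execve(), and waitpid(). It is not kernel code and not how call_usermodehelper() is implemented; run_and_wait() is a made-up name that only mirrors the path, argv, and envp arguments listed above.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * User-space analogy of UMH_WAIT_PROC: start a program and block until it
 * exits. Returns the child's exit code, or -1 on failure.
 */
int run_and_wait(const char *path, char *const argv[], char *const envp[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(path, argv, envp);
        _exit(127);             /* only reached if execve() failed */
    }
    /* Blocking here is the point: the caller resumes only after the
     * child has exited, so anything the child wrote is already on disk. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Returning right after fork() instead would roughly correspond to UMH_NO_WAIT, and the caller could then try to read the output file before the child had written it.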

The sym_addr_lookup() function reads OUTPUT_FILE and returns the address of SYMBOL_NAME; both names are defined in the user-space program. It first opens the file with filp_open() and reads it with the kernel_read() function. Finally, it converts the string it read to an unsigned long and returns that value.
I pushed this code here.
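The final conversion step can be illustrated in user space with strtoul(). This is only a sketch of the idea, with a hypothetical function name, and it assumes the file holds hex digits optionally followed by a newline. Inside a module, the kernel's kstrtoul() helper plays a similar role.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Convert the text read from the output file (hex digits, optionally
 * followed by one newline) into an address. Returns 0 on success, -1 on
 * malformed or out-of-range input. (Hypothetical helper for illustration.)
 */
int parse_hex_address(const char *buf, unsigned long *addr)
{
    char *end;
    unsigned long value;

    /* Reject empty input, leading whitespace, and signs up front. */
    if (!isxdigit((unsigned char)buf[0]))
        return -1;

    errno = 0;
    value = strtoul(buf, &end, 16);
    if (errno != 0 || end == buf)
        return -1;
    if (*end == '\n')
        end++;
    if (*end != '\0')
        return -1;              /* trailing garbage after the number */

    *addr = value;
    return 0;
}
```

Checking the end pointer matters here: a partially parsed string would otherwise yield a plausible-looking but wrong address, which is dangerous once the module starts dereferencing it.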

Note that the get_fs and set_fs functions have been removed from newer versions of the Linux kernel, primarily due to security concerns and the need for cleaner, safer code. Therefore, I used the kernel_read function.

Testing as a Kernel Module

Now, let’s test the solution as a real kernel module. I want to create a simple kernel module that reads the specific symbol address from the file written by the user-space program and prints the address to the kernel log (dmesg). We should add two functions to the kernel-space code: lkm_init and lkm_exit. Note that the first part of each name is arbitrary, so you may choose any prefix instead of lkm, as long as the functions keep the __init and __exit annotations.
Let’s look at the newly added functions:

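/* Called once when the module is loaded. */
static int __init lkm_init(void)
{
    unsigned long addr;

    addr = sym_addr_lookup();
    /* KERN_INFO is a string prefix, so no comma goes after it. */
    printk(KERN_INFO "Address of read: %lx\n", addr);
    return 0;
}

/* Called once when the module is unloaded. */
static void __exit lkm_exit(void)
{
    printk(KERN_INFO "Bye\n");
}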

The lkm_init() function is executed once when the module is loaded, and lkm_exit() when it is unloaded. Finally, add two lines of code to register them as the module's entry and exit points:

module_init(lkm_init);
module_exit(lkm_exit);

Now, create a Makefile to compile the code:

obj-m += lkm.o

all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

Note that my kernel module is named lkm, so all the generated files start with lkm.
Compile the code and load the module:

# make

The make command generates several files (e.g. object files). One of them is lkm.ko. This is the kernel object file, which contains the code and data structures that extend the kernel's functionality. We should load this file:

# insmod lkm.ko

Now, if you check the kernel log with dmesg, you should see the following message:

Address of read: ffffffffbb99f2e0

Note that this address is from my system. The address printed on yours may differ.

The kernel module code can be found here.

What will happen after this?

I will improve the idea by converting the solution to a fileless one. First, I will use shared memory instead of a file. Then, I will try some other ways to share data between user space and kernel space. Finally, I will remove the user-space program and execute the code in memory.

Hey, This is AmirReza!

Sense the Darkness! Discover the Brilliance!
