kretprobe

connect()


static __always_inline int trace_ret_generic(u32 id, struct pt_regs *ctx, u64 types, u32 scope)
{
    if (skip_syscall())
        return 0;

    sys_context_t context = {};
    args_t args = {};

    if (ctx == NULL)
        return 0;

    if (load_args(id, &args) != 0)
        return 0;

    init_context(&context);

    context.event_id = id;
    context.argnum = get_arg_num(types);
    context.retval = PT_REGS_RC(ctx);

    // skip if No such file/directory or if there is an EINPROGRESS
    // EINPROGRESS error, happens when the socket is non-blocking and the connection cannot be completed immediately.
    if (context.retval == -2 || context.retval == -115)
    {
        return 0;
    }

    if (context.retval >= 0 && drop_syscall(scope))
    {
        return 0;
    }

    set_buffer_offset(DATA_BUF_TYPE, sizeof(sys_context_t));

    bufs_t *bufs_p = get_buffer(DATA_BUF_TYPE);
    if (bufs_p == NULL)
        return 0;

    save_context_to_buffer(bufs_p, (void *)&context);
    save_args_to_buffer(types, &args);
    events_perf_submit(ctx);
    return 0;
}

Overview

  • The function starts by checking if skip_syscall() returns a truthy value. If it does, the function returns 0 and exits early.

  • Two local variables, context and args, are declared and initialized. context is of type sys_context_t and args is of type args_t.

  • If the ctx parameter is NULL, the function returns 0 and exits early.

  • The function calls the load_args function, passing id and a pointer to args. If load_args returns a non-zero value (indicating an error), the function returns 0 and exits early.

  • The init_context function is called to initialize the context variable.

  • Various fields of the context structure are set:

    • event_id is set to id.
    • argnum is set to the result of the get_arg_num function, passing types.
    • retval is set to the return value of the system call, obtained from PT_REGS_RC(ctx).
  • If the retval is -2 or -115 (indicating “No such file/directory” or “EINPROGRESS” error respectively), the function returns 0 and exits early.

  • If the retval is greater than or equal to 0 and the drop_syscall function returns a truthy value when passed scope, the function returns 0 and exits early.

  • The set_buffer_offset function is called to set the buffer offset for DATA_BUF_TYPE to sizeof(sys_context_t), so that argument data is written after the context.

  • The get_buffer function is called to retrieve a pointer to a buffer of type DATA_BUF_TYPE. If the pointer is NULL, the function returns 0 and exits early.

  • The save_context_to_buffer function is called to save the context structure to the buffer pointed by bufs_p.

  • The save_args_to_buffer function is called to save the arguments (args) to the buffer based on the provided types.

  • The events_perf_submit function is called, passing ctx as an argument. Finally, the function returns 0.

Overall, this function seems to be part of a tracing mechanism for system calls.

Breakdown of the function

static __always_inline int trace_ret_generic(u32 id, struct pt_regs *ctx, u64 types, u32 scope)
  • static: This keyword indicates that the visibility of the function is limited to the translation unit (source file) where it is defined. It means that the function cannot be accessed from other source files.

  • __always_inline: This is a compiler-specific attribute that tells the compiler to inline the function at every call site, regardless of the optimization level set by the user.

  • int: It specifies the return type of the function, which in this case is an integer.

  • trace_ret_generic: This is the name of the function.

  • (u32 id, struct pt_regs *ctx, u64 types, u32 scope): These are the function parameters. It expects four arguments:

  • id of type u32 (unsigned 32-bit integer)

  • ctx of type struct pt_regs * (a pointer to a structure of type pt_regs)

  • types of type u64 (unsigned 64-bit integer)

  • scope of type u32 (unsigned 32-bit integer)


The function is intended to be used for tracing and returns a value of type int.

  1. Declares a local variable context of type sys_context_t and zero-initializes it.
  sys_context_t context = {};

1.1 Required definitions

typedef struct __attribute__((__packed__)) sys_context
{
    u64 ts;

    u32 pid_id;
    u32 mnt_id;

    u32 host_ppid;
    u32 host_pid;

    u32 ppid;
    u32 pid;
    u32 uid;

    u32 event_id;
    u32 argnum;
    s64 retval;

    char comm[TASK_COMM_LEN];
} sys_context_t;

The line sys_context_t context = {}; initializes a variable named context of type sys_context_t using an empty initializer {}.

  • __attribute__((__packed__)): This is an attribute specified using the attribute syntax. It indicates that the structure should be packed tightly, without any padding between members. This ensures that the structure’s memory layout is compact and doesn’t contain any unused bytes due to alignment requirements.

The members of the sys_context structure are defined as follows:

  • u64 ts: An unsigned 64-bit integer representing a timestamp.
  • u32 pid_id: An unsigned 32-bit integer that most likely holds the PID namespace identifier of the process (an ID for the namespace, not the PID itself).
  • u32 mnt_id: An unsigned 32-bit integer that most likely holds the mount namespace identifier of the process.
  • u32 host_ppid: An unsigned 32-bit integer representing the host's parent process identifier (PPID).
  • u32 host_pid: An unsigned 32-bit integer representing the host's process identifier (PID).
  • u32 ppid: An unsigned 32-bit integer representing the parent process identifier (PPID).
  • u32 pid: An unsigned 32-bit integer representing the process identifier (PID).
  • u32 uid: An unsigned 32-bit integer representing the user identifier (UID).
  • u32 event_id: An unsigned 32-bit integer representing the event identifier.
  • u32 argnum: An unsigned 32-bit integer representing the number of arguments.
  • s64 retval: A signed 64-bit integer representing the return value of a system call.
  • char comm[TASK_COMM_LEN]: An array of characters representing the command name associated with the process. TASK_COMM_LEN is likely a predefined constant specifying the maximum length of the command name.

Overall, this structure is used to store various context information related to a system call or process, such as timestamps, process identifiers, user identifiers, event information, and return values. The use of the __attribute__((__packed__)) attribute ensures that the structure’s memory layout is tightly packed without any padding.
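To see what packing changes in practice, the following user-space sketch compares the packed layout with an otherwise identical unpacked one. The names ctx_packed and ctx_unpacked are hypothetical, and TASK_COMM_LEN is assumed to be 16; the size of the unpacked variant depends on the target ABI.

```c
#include <stddef.h>
#include <stdint.h>

#define TASK_COMM_LEN 16 /* assumption: not defined in this document */

/* Same member order as sys_context_t, with packing. */
struct __attribute__((__packed__)) ctx_packed {
    uint64_t ts;
    uint32_t pid_id, mnt_id;
    uint32_t host_ppid, host_pid;
    uint32_t ppid, pid, uid;
    uint32_t event_id, argnum;
    int64_t retval;
    char comm[TASK_COMM_LEN];
};

/* Identical members without packing, for comparison only. */
struct ctx_unpacked {
    uint64_t ts;
    uint32_t pid_id, mnt_id;
    uint32_t host_ppid, host_pid;
    uint32_t ppid, pid, uid;
    uint32_t event_id, argnum;
    int64_t retval;
    char comm[TASK_COMM_LEN];
};
```

With these assumptions the packed structure is 68 bytes and retval sits at offset 44, directly after the nine u32 fields. On a typical 64-bit ABI the unpacked variant would insert 4 bytes of padding before retval, so whatever reads the buffer has to use the packed layout.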


  2. This check ensures that the function can safely handle the case when the ctx pointer is NULL and avoids potential issues or crashes that may occur when trying to access or manipulate data through a null pointer.
  if (ctx == NULL)
        return 0;
  • ctx is a pointer of type struct pt_regs *. It is being checked to see if it is pointing to NULL, indicating that it doesn’t point to a valid memory location.

  • If ctx is indeed NULL, the code block following the condition is executed. In this case, the code simply returns 0, which means that the function trace_ret_generic exits early and returns a value of 0.


  3. In summary, the load_args function retrieves previously saved arguments from a BPF map (args_map) using a key built from the event_id and the lower 32 bits of bpf_get_current_pid_tgid() (stored in the variable tgid). It copies the retrieved arguments to the provided args structure and removes the entry from the map. If the lookup fails, it returns -1.
   if (load_args(id, &args) != 0)
        return 0;

3.1 Required definition

static __always_inline int load_args(u32 event_id, args_t *args)
{
    u32 tgid = bpf_get_current_pid_tgid();
    u64 id = ((u64)event_id << 32) | tgid;

    args_t *saved_args = bpf_map_lookup_elem(&args_map, &id);
    if (saved_args == 0)
    {
        return -1; // missed entry or not a container
    }

    args->args[0] = saved_args->args[0];
    args->args[1] = saved_args->args[1];
    args->args[2] = saved_args->args[2];
    args->args[3] = saved_args->args[3];
    args->args[4] = saved_args->args[4];
    args->args[5] = saved_args->args[5];

    bpf_map_delete_elem(&args_map, &id);

    return 0;
}

if (load_args(id, &args) != 0) return 0; is another conditional statement in the trace_ret_generic function. Here’s what it does:

The code calls the load_args function, passing id and a pointer to the args structure (&args) as arguments.

The result of the load_args function is compared to 0 using the inequality operator !=. As the definition above shows, load_args returns 0 on success and -1 when no saved entry is found.

If the result of load_args is not equal to 0, indicating an error occurred during the function call, the code block following the condition is executed.

In this case, the code simply returns 0, indicating that the trace_ret_generic function exits early and returns a value of 0.

This check is used to handle the case when the load_args function fails or encounters an error. By returning 0, the function indicates that it cannot proceed further due to the error in loading the arguments and terminates its execution.

Let’s look at how the load_args function works:

  • The function takes two parameters: event_id of type u32 and args of type args_t*, which is a pointer to a structure args_t.

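  • The function starts by calling bpf_get_current_pid_tgid(), which returns the thread group ID in the upper 32 bits and the thread ID in the lower 32 bits. Because the result is assigned to a u32, the variable tgid keeps only the lower 32 bits, that is, the thread ID, despite its name.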

  • The variable id is created by combining event_id and tgid using a bitwise shift and bitwise OR operations.

  • The bpf_map_lookup_elem function is called, passing the address of args_map and the address of id as arguments. It attempts to look up an element in the args_map BPF map using the id as the key.

  • If the return value saved_args is equal to 0 (indicating a missed entry or not a container), the function returns -1 to indicate an error.

  • If a valid saved_args element is found in the map, the individual elements of saved_args are copied to the corresponding elements of args using assignments.

  • After the values are copied, the bpf_map_delete_elem function is called to remove the element from the args_map using the id as the key.

  • Finally, the function returns 0 to indicate successful loading of arguments.
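The key computation described above can be tried out in user space. This is a minimal sketch, not the project's code: make_args_key is a hypothetical helper, and pid_tgid stands in for the value that bpf_get_current_pid_tgid() would return.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the key computation in load_args.
 * pid_tgid: upper 32 bits = thread group ID, lower 32 bits = thread ID. */
static uint64_t make_args_key(uint32_t event_id, uint64_t pid_tgid)
{
    /* Same truncation as `u32 tgid = bpf_get_current_pid_tgid();` */
    uint32_t low = (uint32_t)pid_tgid;

    /* event_id occupies the upper half of the key, the truncated ID the lower half. */
    return ((uint64_t)event_id << 32) | low;
}
```

Since the entry probe and the return probe of one syscall run on the same thread, both compute the same key, which is what lets the return probe find the arguments saved at entry.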


  4. Calls a function named init_context and passes the address of the context variable as an argument (&context).
    init_context(&context);
  • init_context is a function that initializes the context structure with default values or performs some necessary setup.

  • By passing the address of the context variable, the function can modify the contents of the context structure directly within the init_context function.

  • The function starts by getting the current task’s task_struct pointer using bpf_get_current_task() and assigns it to the local variable task.

  • The timestamp (ts) of the context structure is set to the current time in nanoseconds using bpf_ktime_get_ns().

  • The host_ppid field of the context structure is set by calling the get_task_ppid function, passing the task pointer.

  • The host_pid field of the context structure is set by shifting the result of bpf_get_current_pid_tgid() by 32 bits to the right.

  • The uid field of the context structure is set to the current user identifier (UID) obtained from bpf_get_current_uid_gid().

  • The command name (comm) of the current task is retrieved using bpf_get_current_comm and stored in the context->comm array with a size of sizeof(context->comm).

Finally, the function returns 0 to indicate successful initialization of the context structure.

In summary, the init_context function initializes the ts, host_ppid, host_pid, uid, and comm fields of the sys_context_t structure based on the current task’s information.

4.1 Required Definition

static __always_inline u32 get_task_ppid(struct task_struct *task)
{
    struct task_struct *parent = READ_KERN(task->parent);
    return READ_KERN(parent->pid);
}

Defines a function get_task_ppid that takes a pointer to a struct task_struct named task as an argument. It is marked with the __always_inline attribute, indicating that it should be inlined by the compiler whenever possible.

  • Inside the function, it performs the following steps:

  • It declares a local variable parent of type struct task_struct*.

  • It reads the value of task->parent using the READ_KERN macro or function, which suggests that it reads a kernel memory location.

  • It assigns the value read from task->parent to the parent variable.

  • It reads the value of parent->pid using the READ_KERN macro or function.

  • It returns the value read from parent->pid as the result of the function.

Overall, the get_task_ppid function retrieves the parent process ID (PPID) of a given task by accessing the parent field of the task_struct and reading its pid field.

4.2 Required Definition

#define GET_FIELD_ADDR(field) &field

#define READ_KERN(ptr)                                     \
    ({                                                     \
        typeof(ptr) _val;                                  \
        __builtin_memset((void *)&_val, 0, sizeof(_val));  \
        bpf_probe_read((void *)&_val, sizeof(_val), &ptr); \
        _val;                                              \
    })

Provides two macros: GET_FIELD_ADDR and READ_KERN.

GET_FIELD_ADDR(field) macro simply takes a field name as an argument and returns the address of that field. For example, if you have a field named my_field in a structure, you can use GET_FIELD_ADDR(my_field) to obtain its address.

READ_KERN(ptr) macro is a compound statement that reads a value from kernel memory. It uses the bpf_probe_read function so that an invalid address makes the read fail instead of faulting. The macro is defined using a GCC extension called a statement expression, denoted by ({ … }). Within the macro:

  a. It declares a local variable _val with the same type as the expression ptr.
  b. It uses __builtin_memset to zero out the memory occupied by _val.
  c. It calls bpf_probe_read to copy sizeof(_val) bytes from the address of the expression (&ptr) into _val, so the read never exceeds the size of _val.
  d. Finally, it yields _val as the value of the whole expression.

Note that, despite the parameter name, the argument is the expression to be read (for example task->parent), not a pointer to it; the macro takes its address itself.

The purpose of the READ_KERN macro is to provide a safe mechanism to read values from kernel memory within the eBPF program, using the bpf_probe_read function. It helps ensure that the memory access is valid and avoids potential issues.
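The shape of the macro is easier to follow in a user-space analogue. The sketch below is an illustration under assumptions: READ_LOCAL, task_sketch, and get_task_ppid_sketch are hypothetical names, and memcpy stands in for bpf_probe_read, so none of the kernel-side safety is reproduced. It relies on the same GCC/Clang extensions (statement expressions and __typeof__).

```c
#include <string.h>

/* User-space stand-in for READ_KERN: memcpy replaces bpf_probe_read. */
#define READ_LOCAL(expr)                                  \
    ({                                                    \
        __typeof__(expr) _val;                            \
        memset((void *)&_val, 0, sizeof(_val));           \
        memcpy((void *)&_val, &(expr), sizeof(_val));     \
        _val;                                             \
    })

/* Hypothetical, minimal stand-in for the kernel's task_struct. */
struct task_sketch {
    struct task_sketch *parent;
    int pid;
};

/* Same two-step read as get_task_ppid: first the parent pointer, then its pid. */
static int get_task_ppid_sketch(struct task_sketch *task)
{
    struct task_sketch *parent = READ_LOCAL(task->parent);
    return READ_LOCAL(parent->pid);
}
```

Each READ_LOCAL call copies exactly one field, which mirrors why get_task_ppid needs two READ_KERN calls: one per pointer dereference.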


  5. These assignments populate the event_id, argnum, and retval fields of the sys_context_t structure with relevant information related to the traced syscall.
    context.event_id = id;
    context.argnum = get_arg_num(types);
    context.retval = PT_REGS_RC(ctx);
  • The event_id field is assigned the value of the id variable. This value is used to identify the specific event or syscall being traced.

  • The argnum field is assigned the result of the get_arg_num function, which takes the types parameter as an argument.

  • The retval field is assigned the return value of the traced syscall, obtained from PT_REGS_RC(ctx). PT_REGS_RC is a macro that extracts the return value from the pt_regs structure pointed to by the ctx parameter.

5.1 Required definition

static __always_inline int get_arg_num(u64 types)
{
    unsigned int i, argnum = 0;

#pragma unroll
    for (i = 0; i < MAX_ARGS; i++)
    {
        if (DEC_ARG_TYPE(i, types) != NONE_T)
            argnum++;
    }

    return argnum;
}
  • get_arg_num function calculates the number of arguments based on the types parameter, which is of type u64 (unsigned 64-bit integer).

Here’s how the function works:

  • The function declares two variables: i, which represents the loop counter, and argnum, which is used to count the number of non-NONE_T argument types.

  • The #pragma unroll directive suggests that the loop should be unrolled by the compiler for performance optimization. This pragma is used to provide a hint to the compiler about loop unrolling, but its effect may vary depending on the compiler.

  • The loop iterates over MAX_ARGS (presumably a predefined constant) times, starting from 0 and incrementing i by 1 in each iteration.

  • Inside the loop, the function checks if the argument type for the current index (i) obtained from DEC_ARG_TYPE(i, types) is not equal to NONE_T. If the argument type is not NONE_T, the argnum variable is incremented.

  • After the loop completes, the function returns the final value of argnum, which represents the number of non-NONE_T argument types encountered during the loop.

In summary, the get_arg_num function iterates over a range of indices and counts the number of non-NONE_T argument types based on the provided types value.

5.2 Required definition

#define DEC_ARG_TYPE(n, type) ((type >> (8 * n)) & 0xFF)

The macro DEC_ARG_TYPE(n, type) takes two arguments:

  • n: Represents the index of the argument type to be extracted.
  • type: Represents the input value from which the argument type is extracted.

Here’s how the macro works:

  • The macro shifts the type value to the right by 8 * n bits. This effectively aligns the desired argument type at the least significant byte position.

  • The & 0xFF operation is performed to mask all but the least significant byte, ensuring that only the value of the desired argument type is retained.

  • The resulting value represents the argument type extracted from the type value at the specified index.

In summary, the DEC_ARG_TYPE macro extracts the argument type at the given index from the provided type value. The 8 * n bit shift aligns the desired argument type, and the & 0xFF operation masks the value to retain only the least significant byte.
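Decoding and counting can be exercised together in a small user-space sketch. The function name count_args is hypothetical, NONE_T is assumed to be encoded as 0, and the macro arguments are parenthesized here for safety, which the original definition does not do.

```c
#include <stdint.h>

#define MAX_ARGS 6
#define NONE_T 0UL /* assumption: "no argument" is encoded as 0 */
#define DEC_ARG_TYPE(n, type) (((type) >> (8 * (n))) & 0xFF)

/* Counts how many of the six one-byte slots hold a type other than NONE_T. */
static int count_args(uint64_t types)
{
    int argnum = 0;

    for (int i = 0; i < MAX_ARGS; i++) {
        if (DEC_ARG_TYPE(i, types) != NONE_T)
            argnum++;
    }
    return argnum;
}
```

For a types value of 0x01100F, the three lowest bytes are 0x0F, 0x10, and 0x01, so DEC_ARG_TYPE(1, types) yields 16 and count_args reports three arguments.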

5.3 Required definition

#define MAX_ARGS 6
#define ENC_ARG_TYPE(n, type) type << (8 * n)
#define ARG_TYPE0(type) ENC_ARG_TYPE(0, type)
#define ARG_TYPE1(type) ENC_ARG_TYPE(1, type)
#define ARG_TYPE2(type) ENC_ARG_TYPE(2, type)
#define ARG_TYPE3(type) ENC_ARG_TYPE(3, type)
#define ARG_TYPE4(type) ENC_ARG_TYPE(4, type)
#define ARG_TYPE5(type) ENC_ARG_TYPE(5, type)

This defines several macros related to argument types. Let’s go through each of them:

  • MAX_ARGS: This macro defines the maximum number of arguments as 6.

ENC_ARG_TYPE(n, type): This macro takes two arguments:

  • n: Represents the index of the argument type.

  • type: Represents the value of the argument type. The macro left-shifts the type value by 8 * n bits, effectively encoding the argument type at the specified index.

  • ARG_TYPE0, ARG_TYPE1, ARG_TYPE2, ARG_TYPE3, ARG_TYPE4, ARG_TYPE5: These macros are convenience macros for encoding argument types at specific indices. Each of these macros takes a single argument, type, which represents the value of the argument type. They use the ENC_ARG_TYPE macro to encode the argument type at the corresponding index.

For example, if you want to encode an argument type for index 2, you can use the ARG_TYPE2 macro like this: ARG_TYPE2(my_argument_type). This will effectively encode the my_argument_type value at index 2 by left-shifting it by 8 * 2 bits.

These macros provide a convenient way to encode and manipulate argument types based on their respective indices.

Example:

ARG_TYPE0(SOCK_DOM_T) | ARG_TYPE1(SOCK_TYPE_T) | ARG_TYPE2(INT_T)
  • This expression is a bitwise OR of the encoded argument types for index 0, index 1, and index 2. Here’s what it means:

  • ARG_TYPE0(SOCK_DOM_T): This macro expands to the encoding of the SOCK_DOM_T argument type at index 0. Based on the provided definitions, SOCK_DOM_T has a value of 15UL. Therefore, ARG_TYPE0(SOCK_DOM_T) results in the encoded value 15UL << (8 * 0), which is 15UL.

  • ARG_TYPE1(SOCK_TYPE_T): This macro expands to the encoding of the SOCK_TYPE_T argument type at index 1. According to the definitions, SOCK_TYPE_T has a value of 16UL. Therefore, ARG_TYPE1(SOCK_TYPE_T) results in the encoded value 16UL << (8 * 1), which is 4096UL.

  • ARG_TYPE2(INT_T): This macro expands to the encoding of the INT_T argument type at index 2. From the definitions, INT_T has a value of 1UL. Thus, ARG_TYPE2(INT_T) results in the encoded value 1UL << (8 * 2), which is 65536UL.

The bitwise OR operation (|) is applied between these encoded values, resulting in the final value:

15UL | 4096UL | 65536UL
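The arithmetic can be checked with a short user-space sketch. The names enc_arg_type, socket_types, and the *_SKETCH constants are hypothetical; the type values 15, 16, and 1 are the ones quoted in the example above.

```c
#include <stdint.h>

/* Function form of ENC_ARG_TYPE: place a one-byte type at slot n. */
static uint64_t enc_arg_type(unsigned int n, uint64_t type)
{
    return type << (8 * n);
}

/* Type values as quoted in the example. */
enum { INT_T_SKETCH = 1, SOCK_DOM_T_SKETCH = 15, SOCK_TYPE_T_SKETCH = 16 };

/* Equivalent of ARG_TYPE0(SOCK_DOM_T) | ARG_TYPE1(SOCK_TYPE_T) | ARG_TYPE2(INT_T). */
static uint64_t socket_types(void)
{
    return enc_arg_type(0, SOCK_DOM_T_SKETCH) |
           enc_arg_type(1, SOCK_TYPE_T_SKETCH) |
           enc_arg_type(2, INT_T_SKETCH);
}
```

The result is 69647, or 0x01100F in hexadecimal, where each byte is one argument type. Using a function here sidesteps a property of the original macro: its expansion is not parenthesized, which is harmless when the results are combined with | (because << binds tighter than |) but would give a different result with an operator such as +.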

  6. Checks the value of the context.retval variable and returns 0 if it is equal to either -2 or -115. This condition is used to skip further processing if the returned value indicates specific error conditions related to file/directory operations or non-blocking socket connections.

if (context.retval == -2 || context.retval == -115)
    {
        return 0;
    }
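The magic numbers correspond to errno constants, which a sketch can name explicitly. This is an illustration only: skip_retval is a hypothetical helper, and it assumes the Linux errno values, where ENOENT is 2 and EINPROGRESS is 115.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* True when the return value is one of the two errors the handler ignores. */
static bool skip_retval(int64_t retval)
{
    /* Syscalls report failure as a negated errno value. */
    return retval == -ENOENT || retval == -EINPROGRESS;
}
```

Any other negative return value, for example -EACCES, still produces an event, so failed calls are generally reported.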

  7. The set_buffer_offset function is used to update the value associated with a key in the bufs_offset map, allowing the buffer offset to be set for a specific buffer type.

    set_buffer_offset(DATA_BUF_TYPE, sizeof(sys_context_t));

7.1 Required definition

#define DATA_BUF_TYPE 0

7.2 Required definition

static __always_inline void set_buffer_offset(int buf_type, u32 off)
{
    bpf_map_update_elem(&bufs_offset, &buf_type, &off, BPF_ANY);
}
  • Defines a function called set_buffer_offset that takes two parameters: buf_type of type int and off of type u32. The function is declared with the __always_inline attribute, which suggests that the compiler should try to inline the function whenever possible.

  • The function uses the bpf_map_update_elem function to update an element in the bufs_offset map.

  • The bpf_map_update_elem function is used to update the value associated with the buf_type key in the bufs_offset map. It takes the address of buf_type as the key parameter, the address of off as the value parameter, and the BPF_ANY flag to indicate that the update should overwrite any existing value associated with the key.

7.3 Required definition

BPF_PERCPU_ARRAY(bufs_offset, u32, 3);

Defines a BPF (Berkeley Packet Filter) per-CPU array named bufs_offset. Here’s a breakdown of its components:

  • BPF_PERCPU_ARRAY: This is a macro provided by the BPF framework. It is used to declare a per-CPU array in the eBPF (extended Berkeley Packet Filter) program.

Per-CPU arrays allow each CPU to have its own separate copy of the array, providing efficient parallel access to the array elements.

  • bufs_offset: This is the name given to the per-CPU array. You can use this name to refer to the array in other parts of the code.

  • u32: This specifies the type of the elements in the array. In this case, u32 represents an unsigned 32-bit integer.

  • 3: This specifies the size of the array. It indicates that the bufs_offset array will have 3 elements.

Overall, the line of code declares a per-CPU array named bufs_offset with 3 elements, where each element is an unsigned 32-bit integer.


  8. This line retrieves a buffer of type DATA_BUF_TYPE and assigns its address to the bufs_p pointer variable. The bufs_p pointer can then be used to access and manipulate the buffer contents.
    bufs_t *bufs_p = get_buffer(DATA_BUF_TYPE);
    if (bufs_p == NULL)
        return 0;

The line bufs_t *bufs_p = get_buffer(DATA_BUF_TYPE); declares a pointer variable bufs_p of type bufs_t* and initializes it with the result of the get_buffer function call. The argument passed to the get_buffer function is DATA_BUF_TYPE.

After obtaining the pointer bufs_p, the code checks if it is NULL. If the pointer is NULL, it means that the buffer retrieval was unsuccessful, possibly indicating an error or absence of the desired buffer. In such a case, the code returns 0 to indicate failure or an error condition.

8.1 Required Definition

#define MAX_BUFFER_SIZE 32768
typedef struct buffers
{
   u8 buf[MAX_BUFFER_SIZE];
} bufs_t;

Defines a structure named buffers with a single member buf. Here’s a breakdown of its components:

  • typedef: This keyword is used to create a new type alias. In this case, it is used to create an alias bufs_t for the structure struct buffers.

  • struct buffers: This specifies the name of the structure being defined.

  • u8 buf[MAX_BUFFER_SIZE]: This declares a member variable buf, an array of u8 (unsigned 8-bit integer) elements. The array has a size of MAX_BUFFER_SIZE.

  • Each instance of bufs_t will have a buf member that can hold up to MAX_BUFFER_SIZE bytes of data.

8.2 Required Definition

static __always_inline bufs_t *get_buffer(int buf_type)
{
    return bpf_map_lookup_elem(&bufs, &buf_type);
}

The function get_buffer is defined as an inline function with the static and __always_inline attributes. It takes an integer buf_type as an argument.

  • Inside the function, it calls the bpf_map_lookup_elem function, passing the address of the buf_type variable and the bufs map as arguments. The bpf_map_lookup_elem function is used to retrieve the value associated with the specified key (buf_type) from the bufs map.

  • The function then returns the result of the bpf_map_lookup_elem function call, which is a pointer to the bufs_t structure corresponding to the buf_type key in the bufs map.

  • In summary, the get_buffer function is responsible for looking up and returning a pointer to a buffer of type bufs_t based on the provided buf_type.

BPF_PERCPU_ARRAY(bufs, bufs_t, 3);
  • BPF_PERCPU_ARRAY(bufs, bufs_t, 3); declares a per-CPU array named bufs with a capacity of 3 elements, where each element has the type bufs_t.

  • A per-CPU array is a special type of array in eBPF programs that provides individual instances of the array for each CPU core. Each CPU core has its own separate copy of the array, which allows for concurrent access without synchronization.

  • In this case, the per-CPU array bufs is declared to hold elements of type bufs_t, which is a structure with a member buf of type u8 array. The array size is determined by the MAX_BUFFER_SIZE constant defined earlier as 32768. Therefore, each element of the bufs array will have a buf member capable of holding up to MAX_BUFFER_SIZE bytes of data.

  • The per-CPU array bufs can be accessed and manipulated by different CPU cores concurrently, making it suitable for storing and processing data in a multi-core environment.


  9. In summary, the code snippet attempts to save the contents of the sys_context_t structure into a buffer by calling the save_context_to_buffer function. If successful, it returns the size of the structure; otherwise, it returns 0 to indicate failure.
    save_context_to_buffer(bufs_p, (void *)&context);   
static __always_inline int save_context_to_buffer(bufs_t *bufs_p, void *ptr)
{
    if (bpf_probe_read(&(bufs_p->buf[0]), sizeof(sys_context_t), ptr) == 0)
    {
        return sizeof(sys_context_t);
    }

    return 0;
}
  • Assuming the bufs_p pointer is not NULL, the code proceeds to call the save_context_to_buffer function. This function is responsible for copying the contents of the sys_context_t structure pointed to by ptr into the buffer pointed to by bufs_p->buf.

  • The bpf_probe_read function is used to safely read the memory pointed to by ptr and copy it into the buffer bufs_p->buf. If the read operation is successful (returns 0), the function returns the size of the sys_context_t structure (which is sizeof(sys_context_t)). Otherwise, if the read operation fails, the function returns 0.


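  10. Saves the syscall arguments (args) to the buffer according to the argument types encoded in types.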
save_args_to_buffer(types, &args);

10.1 Required Definition

static __always_inline int save_args_to_buffer(u64 types, args_t *args)
{
    if (types == 0)
    {
        return 0;
    }

    bufs_t *bufs_p = get_buffer(DATA_BUF_TYPE);
    if (bufs_p == NULL)
    {
        return 0;
    }

#pragma unroll
    for (int i = 0; i < MAX_ARGS; i++)
    {
        switch (DEC_ARG_TYPE(i, types))
        {
        case NONE_T:
            break;
        case INT_T:
            save_to_buffer(bufs_p, (void *)&(args->args[i]), sizeof(int), INT_T);
            break;
        case OPEN_FLAGS_T:
            save_to_buffer(bufs_p, (void *)&(args->args[i]), sizeof(int), OPEN_FLAGS_T);
            break;
        case FILE_TYPE_T:
            save_file_to_buffer(bufs_p, (void *)args->args[i]);
            break;
        case PTRACE_REQ_T:
            save_to_buffer(bufs_p, (void *)&(args->args[i]), sizeof(int), PTRACE_REQ_T);
            break;
        case MOUNT_FLAG_T:
            save_to_buffer(bufs_p, (void *)&(args->args[i]), sizeof(int), MOUNT_FLAG_T);
            break;
        case UMOUNT_FLAG_T:
            save_to_buffer(bufs_p, (void *)&(args->args[i]), sizeof(int), UMOUNT_FLAG_T);
            break;
        case STR_T:
            save_str_to_buffer(bufs_p, (void *)args->args[i]);
            break;
        case SOCK_DOM_T:
            save_to_buffer(bufs_p, (void *)&(args->args[i]), sizeof(int), SOCK_DOM_T);
            break;
        case SOCK_TYPE_T:
            save_to_buffer(bufs_p, (void *)&(args->args[i]), sizeof(int), SOCK_TYPE_T);
            break;
        case SOCKADDR_T:
            if (args->args[i])
            {
                short family = 0;
                bpf_probe_read(&family, sizeof(short), (void *)args->args[i]);
                switch (family)
                {
                case AF_UNIX:
                    save_to_buffer(bufs_p, (void *)(args->args[i]), sizeof(struct sockaddr_un), SOCKADDR_T);
                    break;
                case AF_INET:
                    save_to_buffer(bufs_p, (void *)(args->args[i]), sizeof(struct sockaddr_in), SOCKADDR_T);
                    break;
                case AF_INET6:
                    save_to_buffer(bufs_p, (void *)(args->args[i]), sizeof(struct sockaddr_in6), SOCKADDR_T);
                    break;
                default:
                    save_to_buffer(bufs_p, (void *)&family, sizeof(short), SOCKADDR_T);
                }
            }
            break;
        case UNLINKAT_FLAG_T:
            save_to_buffer(bufs_p, (void *)&(args->args[i]), sizeof(int), UNLINKAT_FLAG_T);
            break;
        }
    }

    return 0;
}

The save_args_to_buffer function is responsible for saving the arguments (args) to a buffer. The saving process depends on the types of the arguments specified by the types parameter.

Here’s a breakdown of the actions performed by the function:

  • If types is 0, indicating that there are no arguments to save, the function returns 0.
  • The function obtains a pointer to the buffer by calling the get_buffer function with the DATA_BUF_TYPE as the argument. If the obtained pointer is NULL, indicating an error in obtaining the buffer, the function returns 0.
  • The function then iterates over each argument using a loop. It uses the DEC_ARG_TYPE macro to extract the argument type at the given index i from the types.

Depending on the argument type, the function performs different actions:

  • For argument types such as INT_T, OPEN_FLAGS_T, PTRACE_REQ_T, MOUNT_FLAG_T, UMOUNT_FLAG_T, SOCK_DOM_T, SOCK_TYPE_T, and UNLINKAT_FLAG_T, it calls the save_to_buffer function to save the argument value, copied as sizeof(int) bytes, into the buffer.
  • For FILE_TYPE_T, it calls the save_file_to_buffer function to save the file information into the buffer.
  • For STR_T, it calls the save_str_to_buffer function to save the string argument into the buffer.
  • For SOCKADDR_T, it first checks that the argument pointer is non-zero, then reads the address family from it. Based on the family (AF_UNIX, AF_INET, AF_INET6, or others), it calls the save_to_buffer function to save the corresponding sockaddr structure or, for any other family, only the family value into the buffer.
  • After iterating over all the arguments, the function returns 0 to indicate successful saving of the arguments.
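The SOCKADDR_T branch boils down to choosing how many bytes to copy for a given address family. A minimal user-space sketch of that choice follows; sockaddr_copy_size is a hypothetical helper and is not part of the original source.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Returns the number of bytes the SOCKADDR_T branch would copy for a family. */
static size_t sockaddr_copy_size(short family)
{
    switch (family) {
    case AF_UNIX:
        return sizeof(struct sockaddr_un);
    case AF_INET:
        return sizeof(struct sockaddr_in);
    case AF_INET6:
        return sizeof(struct sockaddr_in6);
    default:
        /* Unknown family: only the family value itself is saved. */
        return sizeof(short);
    }
}
```

Because the copied size varies by family, the consumer of the buffer has to read the family field first to know how many bytes belong to the record.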

10.2 Required Definition

The save_to_buffer function is responsible for saving data to a buffer. Here’s an explanation of how the code works:

static __always_inline int save_to_buffer(bufs_t *bufs_p, void *ptr, int size, u8 type)
{
// the biggest element that can be saved with this function should be defined here
#define MAX_ELEMENT_SIZE sizeof(struct sockaddr_un)

    if (type == 0)
    {
        return 0;
    }

    u32 *off = get_buffer_offset(DATA_BUF_TYPE);
    if (off == NULL)
    {
        return -1;
    }

    if (*off > MAX_BUFFER_SIZE - MAX_ELEMENT_SIZE)
    {
        return 0;
    }

    if (bpf_probe_read(&(bufs_p->buf[*off]), 1, &type) != 0)
    {
        return 0;
    }

    *off += 1;

    if (*off > MAX_BUFFER_SIZE - MAX_ELEMENT_SIZE)
    {
        return 0;
    }

    if (bpf_probe_read(&(bufs_p->buf[*off]), size, ptr) == 0)
    {
        *off += size;
        set_buffer_offset(DATA_BUF_TYPE, *off);
        return size;
    }

    return 0;
}

This function takes four parameters: bufs_p, a pointer to the buffer structure (bufs_t); ptr, a pointer to the data to be saved; size, the size of the data in bytes; and type, a one-byte tag identifying the kind of data.

  • The function first checks if the type is zero. If it is, it returns 0, indicating that no data should be saved.

  • Next, it calls the get_buffer_offset function to retrieve the offset value associated with the data buffer type (DATA_BUF_TYPE). If the offset value is NULL, indicating that the buffer is not available, it returns -1.

  • The function then checks if the current offset value plus the maximum element size (MAX_ELEMENT_SIZE) exceeds the maximum buffer size (MAX_BUFFER_SIZE). If it does, it returns 0, indicating that there is not enough space in the buffer to save the data.

  • Next, it uses the bpf_probe_read function to read the type and save it to the buffer at the current offset position. If the bpf_probe_read operation fails, it returns 0, indicating that the data could not be saved.

  • The offset is then incremented by 1 to account for the saved type.

  • The function checks again if the current offset value plus the maximum element size exceeds the maximum buffer size. If it does, it returns 0, indicating that there is not enough space in the buffer to save the remaining data.

  • Finally, it uses the bpf_probe_read function again to read the data from the ptr pointer and save it to the buffer at the current offset position.

  • If the bpf_probe_read operation succeeds, the offset is incremented by size. The set_buffer_offset function is called to update the buffer offset value with the new offset.

The function returns size if the data was successfully saved, -1 if the buffer offset could not be looked up, and 0 in every other case (a zero type, not enough space left, or a failed read).
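
To make the resulting buffer layout concrete, here is a minimal user-space sketch of the same append logic: each element is stored as one type byte followed by its payload. It assumes small, hypothetical values for MAX_BUFFER_SIZE and MAX_ELEMENT_SIZE, takes the offset as a plain pointer instead of a map lookup, and uses memcpy in place of bpf_probe_read. The name save_to_buffer_sketch is illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_BUFFER_SIZE 64  /* hypothetical; chosen small for the sketch */
#define MAX_ELEMENT_SIZE 16 /* hypothetical stand-in for sizeof(struct sockaddr_un) */

typedef struct {
    uint8_t buf[MAX_BUFFER_SIZE];
} bufs_t;

/* Appends [type][payload] at *off; returns the payload size, or 0 if nothing was stored. */
static int save_to_buffer_sketch(bufs_t *bufs_p, uint32_t *off,
                                 const void *ptr, int size, uint8_t type)
{
    assert(size >= 0 && size <= MAX_ELEMENT_SIZE);

    if (type == 0)
        return 0;

    /* Refuse to start an element that might not fit. */
    if (*off > MAX_BUFFER_SIZE - MAX_ELEMENT_SIZE)
        return 0;

    bufs_p->buf[*off] = type; /* one-byte type tag */
    *off += 1;

    /* Same check again, now that the tag has consumed one byte. */
    if (*off > MAX_BUFFER_SIZE - MAX_ELEMENT_SIZE)
        return 0;

    memcpy(&bufs_p->buf[*off], ptr, (size_t)size); /* payload follows the tag */
    *off += (uint32_t)size;
    return size;
}
```

Saving a 4-byte integer with tag 1 into an empty buffer leaves the tag at index 0, the integer at indices 1 to 4, and the offset at 5. A reader of the buffer can then walk it by alternating between reading a tag and reading the payload that tag implies.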


  1. Calls the events_perf_submit function, passing the struct pt_regs pointer ctx, to submit the buffered event.
events_perf_submit(ctx);

11.1 Required definition

static __always_inline int events_perf_submit(struct pt_regs *ctx)
{
    bufs_t *bufs_p = get_buffer(DATA_BUF_TYPE);
    if (bufs_p == NULL)
        return -1;

    u32 *off = get_buffer_offset(DATA_BUF_TYPE);
    if (off == NULL)
        return -1;

    void *data = bufs_p->buf;
    int size = *off & (MAX_BUFFER_SIZE - 1);

    return bpf_perf_event_output(ctx, &sys_events, BPF_F_CURRENT_CPU, data, size);
}

The events_perf_submit function takes a struct pt_regs pointer, ctx, as its only argument.

Inside the function, it performs the following steps:

  • It retrieves a pointer to the buffer of type DATA_BUF_TYPE by calling the get_buffer function. If the pointer is NULL, indicating that the buffer is not found, it returns -1.
  • It retrieves the offset of the buffer by calling the get_buffer_offset function. If the offset pointer is NULL, it returns -1.
  • It assigns the pointer data to the buffer’s data array bufs_p->buf.
  • It calculates the size of the data based on the offset value, using a bitwise AND operation with (MAX_BUFFER_SIZE - 1).
  • Finally, it calls bpf_perf_event_output to submit the event, passing the ctx pointer, the sys_events map, the BPF_F_CURRENT_CPU flag, the data pointer, and the size as arguments, and returns the result of that call. This is the function that trace_ret_generic invokes as events_perf_submit(ctx).
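
The bitwise AND in the size calculation is worth a closer look. Masking with (MAX_BUFFER_SIZE - 1) guarantees that the result is strictly less than MAX_BUFFER_SIZE, which is presumably there so the eBPF verifier can prove that the submitted length stays within the buffer. The small user-space sketch below shows the arithmetic. It assumes MAX_BUFFER_SIZE is a power of two (the value 32768 is hypothetical), since the mask only behaves as a clean bound in that case. The name bounded_size is illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BUFFER_SIZE 32768u /* hypothetical value; must be a power of two */

/* Returns off limited to the range [0, MAX_BUFFER_SIZE - 1]. */
static uint32_t bounded_size(uint32_t off)
{
    /* A power of two has exactly one bit set, so x & (x - 1) is zero. */
    assert((MAX_BUFFER_SIZE & (MAX_BUFFER_SIZE - 1)) == 0);
    return off & (MAX_BUFFER_SIZE - 1);
}
```

Offsets below MAX_BUFFER_SIZE pass through unchanged, while an offset of exactly MAX_BUFFER_SIZE wraps to 0. The mask therefore acts as a safety bound, not as a way of handling oversized events.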

11.2 Required definition

static __always_inline u32 *get_buffer_offset(int buf_type)
{
    return bpf_map_lookup_elem(&bufs_offset, &buf_type);
}

The get_buffer_offset function is used to retrieve the offset value associated with a specific buffer type. Here’s how the code works:

  • This function takes the buf_type parameter, which represents the type of the buffer. It then calls the bpf_map_lookup_elem function to retrieve the offset value associated with the specified buffer type from the bufs_offset map.

  • The bpf_map_lookup_elem function searches for the given key (&buf_type) in the map (bufs_offset). If the key is found, it returns a non-NULL pointer to the associated offset value; otherwise it returns NULL, indicating that no offset is stored for the specified buffer type.
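
Because the lookup returns a pointer into the map's storage and not a copy, callers such as save_to_buffer can update the offset in place through that pointer. The user-space sketch below imitates this behaviour with a plain array. It assumes an array-style map with a single hypothetical key, DATA_BUF_TYPE = 0. The names bufs_offset_sim, get_buffer_offset_sim, and advance_offset_sim are illustrative and not part of the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DATA_BUF_TYPE 0 /* hypothetical key value */
#define NUM_BUF_TYPES 1 /* hypothetical number of entries */

/* Stand-in for the bufs_offset map: one u32 offset per buffer type. */
static uint32_t bufs_offset_sim[NUM_BUF_TYPES];

/* Mimics the lookup: a pointer to the stored value, or NULL for an unknown key. */
static uint32_t *get_buffer_offset_sim(int buf_type)
{
    if (buf_type < 0 || buf_type >= NUM_BUF_TYPES)
        return NULL;
    return &bufs_offset_sim[buf_type];
}

/* Advances the stored offset by n bytes; returns -1 if the key is unknown. */
static int advance_offset_sim(int buf_type, uint32_t n)
{
    uint32_t *off = get_buffer_offset_sim(buf_type);
    if (off == NULL)
        return -1;
    assert(n <= UINT32_MAX - *off); /* guard against wrap-around in the sketch */
    *off += n; /* writes straight into the "map" storage */
    return 0;
}
```

After advance_offset_sim(DATA_BUF_TYPE, 5), a fresh lookup of the same key sees the value 5, while a lookup of any other key yields NULL and must be checked before dereferencing.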
