Using DWARF to find call sites of inline functions

2023.02.07 | tags: programming · projects

Table of contents

  1. What is DWARF?
  2. Inline functions in DWARF
  3. Calculating call boundaries
  4. Finding the caller function
  5. inlinecall(1)

What is DWARF?

From the DWARF Debugging Standard’s documentation:

This document defines a format for describing programs to facilitate user source level debugging. This description can be generated by compilers, assemblers and linkage editors. It can be used by debuggers and other tools.

Debugging information entries (DIEs) are organized as a tree, one per compilation unit (CU). Each DIE has a tag (DW_TAG_*, see Figure 1 of the DWARF PDF) denoting its class, and a set of attributes (DW_AT_*, see Figure 2 of the DWARF PDF) describing its characteristics.

DIEs are stored one after another. If a DIE has children, the entry that follows it is its first child; if it doesn't, the next entry is its "sibling".

Consider the following structure:

CU1 (DW_TAG_compile_unit)
	func1 (DW_TAG_subprogram)
		DW_AT_foo
		DW_AT_bar
		func2 (DW_TAG_subprogram)
			DW_AT_foo
			DW_AT_bar
	myvar (DW_TAG_variable)
		DW_AT_foo
		DW_AT_bar
CU2 (DW_TAG_compile_unit)
	...

CU1 has func1 and myvar as children and CU2 as its sibling. func1 has func2 as a child.

Debug information is generated by compiling with the -g option. To dump the DWARF info, you can use readelf -wi <file> or dwarfdump <file>.

Inline functions in DWARF

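DIEs of inline function declarations have the DW_TAG_subprogram tag and the DW_AT_inline attribute. Each inline copy of such a function is described by a DIE with the DW_TAG_inlined_subroutine tag.

Attributes an inline copy can have include DW_AT_abstract_origin, DW_AT_low_pc and DW_AT_high_pc (or DW_AT_ranges), and DW_AT_call_file, DW_AT_call_line and DW_AT_call_column, which give the location of the call site.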

For example, if we dump the DWARF info for my FreeBSD kernel:

$ readelf -wi /usr/lib/debug/boot/kernel/kernel.debug > ~/foo

We find that vfs_freevnodes_dec gets inlined:

 <1><1dfa144>: Abbrev Number: 94 (DW_TAG_subprogram)
    <1dfa145>   DW_AT_name        : (indirect string) vfs_freevnodes_dec
    <1dfa149>   DW_AT_decl_file   : 1
    <1dfa14a>   DW_AT_decl_line   : 1447
    <1dfa14c>   DW_AT_prototyped  : 1
    <1dfa14c>   DW_AT_inline      : 1

Inline copies have DW_AT_abstract_origin pointing to the declaration DIE's offset, in this case 0x1dfa144. If we search for 0x1dfa144, we do indeed find a few inline copies:

 <3><1dfe45e>: Abbrev Number: 24 (DW_TAG_inlined_subroutine)
    <1dfe45f>   DW_AT_abstract_origin: <0x1dfa144>
    <1dfe463>   DW_AT_low_pc      : 0xffffffff80cf701d
    <1dfe46b>   DW_AT_high_pc     : 0x38
    <1dfe46f>   DW_AT_call_file   : 1
    <1dfe470>   DW_AT_call_line   : 3458
    <1dfe472>   DW_AT_call_column : 5

 <3><1dfd2e2>: Abbrev Number: 58 (DW_TAG_inlined_subroutine)
    <1dfd2e3>   DW_AT_abstract_origin: <0x1dfa144>
    <1dfd2e7>   DW_AT_ranges      : 0x1f1290
    <1dfd2eb>   DW_AT_call_file   : 1
    <1dfd2ec>   DW_AT_call_line   : 3405
    <1dfd2ee>   DW_AT_call_column : 3

  ...there are more

As described in the first section, a debug file may contain multiple CUs, and several of them may define the same inline function (for example, one declared in a header). We want to treat each CU independently; that is, each inline copy is handled relative to its own CU.

Calculating call boundaries

There are 2 cases we have to take care of when calculating the actual call boundaries of an inline copy.

The DIE has DW_AT_low_pc and DW_AT_high_pc

 <3><1dfe45e>: Abbrev Number: 24 (DW_TAG_inlined_subroutine)
    <1dfe45f>   DW_AT_abstract_origin: <0x1dfa144>
    <1dfe463>   DW_AT_low_pc      : 0xffffffff80cf701d
    <1dfe46b>   DW_AT_high_pc     : 0x38
    <1dfe46f>   DW_AT_call_file   : 1
    <1dfe470>   DW_AT_call_line   : 3458
    <1dfe472>   DW_AT_call_column : 5

In this case, the lower boundary is low_pc and the upper boundary is low_pc + high_pc, because high_pc is encoded here as an offset from low_pc rather than as an absolute address. For the DIE shown above, the boundaries are:

low = 0xffffffff80cf701d
high = 0xffffffff80cf701d + 0x38 = 0xffffffff80cf7055 

The DIE has DW_AT_ranges

 <3><1dfd2e2>: Abbrev Number: 58 (DW_TAG_inlined_subroutine)
    <1dfd2e3>   DW_AT_abstract_origin: <0x1dfa144>
    <1dfd2e7>   DW_AT_ranges      : 0x1f1290
    <1dfd2eb>   DW_AT_call_file   : 1
    <1dfd2ec>   DW_AT_call_line   : 3405
    <1dfd2ee>   DW_AT_call_column : 3

This is a bit more involved. The value of DW_AT_ranges is an offset into the .debug_ranges section of the debug file. We can dump the ranges:

$ dwarfdump -N /usr/lib/debug/boot/kernel/kernel.debug
.debug_ranges
 Ranges group 0:
                ranges: 3 at .debug_ranges offset 0 (0x00000000) (48 bytes)
                        [ 0] range entry    0x00000019 0x00000073
                        [ 1] range entry    0x0000007e 0x00000106
                        [ 2] range end      0x00000000 0x00000000
 Ranges group 1:
                ranges: 3 at .debug_ranges offset 48 (0x00000030) (48 bytes)
                        [ 0] range entry    0x00000022 0x0000006a
                        [ 1] range entry    0x0000007e 0x00000106
                        [ 2] range end      0x00000000 0x00000000
 ...

If we search for 0x1f1290 (the inline copy's DW_AT_ranges value), we find its range group:

 Ranges group 38809:
                ranges: 3 at .debug_ranges offset 2036368 (0x001f1290) (48 bytes)
                        [ 0] range entry    0x000025c8 0x000025f9
                        [ 1] range entry    0x0000261a 0x00002621
                        [ 2] range end      0x00000000 0x00000000

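To get the call boundaries, we add the start and end of each range entry to the base address of the CU, which is the DW_AT_low_pc of the CU's root DIE (its DW_TAG_compile_unit entry). The final entry, whose values are both 0, only marks the end of the list. A program would locate the root DIE while walking the CU, but I happen to know that in this case it is: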

 <0><1dee9fb>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <1dee9fc>   DW_AT_producer    : (indirect string) FreeBSD clang version 13.0.0 (git@github.com:llvm/llvm-project.git llvmorg-13.0.0-0-gd7b669b3a303)
    <1deea00>   DW_AT_language    : 12	(C99)
    <1deea02>   DW_AT_name        : (indirect string) /usr/src/sys/kern/vfs_subr.c
    <1deea06>   DW_AT_stmt_list   : 0x6cb448
    <1deea0a>   DW_AT_comp_dir    : (indirect string) /usr/obj/usr/src/amd64.amd64/sys/GENERIC
    <1deea0e>   DW_AT_low_pc      : 0xffffffff80cf4020
    <1deea16>   DW_AT_high_pc     : 0xde3d

Finally, we end up with the following boundaries:

low = 0xffffffff80cf4020 + 0x000025c8 = 0xffffffff80cf65e8
high = 0xffffffff80cf4020 + 0x000025f9 = 0xffffffff80cf6619

low = 0xffffffff80cf4020 + 0x0000261a = 0xffffffff80cf663a
high = 0xffffffff80cf4020 + 0x00002621 = 0xffffffff80cf6641

Finding the caller function

Sometimes we want to know which function an inline copy is called from. The inline copy's DIE has no attribute that names its caller, so we'll scan the ELF symbol tables instead:

$ readelf -s /usr/lib/debug/boot/kernel/kernel.debug

Since we know the inline copy's boundaries, we only have to find the symbol whose address range contains them. In other words, the following condition has to be met:

sym_lower_bound <= inline_lower_bound <= inline_upper_bound <= sym_upper_bound

Searching the ELF symbol tables and doing these calculations by hand would take far too long, so this is best done programmatically with libelf.

inlinecall(1)

I wrote a little program that does everything I talked about in this post automatically. It works on FreeBSD as-is, and most likely needs some modification to get it to work on other platforms.

The program takes an inline function name and a debug file as arguments:

inlinecall <function> <file>

And outputs the results in the following form:

cu1_func_declaration_file:line
	[low_bound - high_bound]	inline_copy1_file:line	caller_func()
	[low_bound - high_bound]	inline_copy2_file:line	caller_func()
	...
cu2_func_declaration_file:line
	...
...

For example:

$ inlinecall critical_enter /usr/lib/debug/boot/kernel/kernel.debug
/usr/src/sys/sys/systm.h:175
        [0xffffffff809eb51f - 0xffffffff809eb526]       /usr/src/sys/kern/kern_intr.c:1387      intr_event_handle()
/usr/src/sys/sys/systm.h:175
        [0xffffffff80a051f4 - 0xffffffff80a05208]       /usr/src/sys/kern/kern_malloc.c:431     malloc_type_freed()
        [0xffffffff80a0514c - 0xffffffff80a0515b]       /usr/src/sys/kern/kern_malloc.c:388     malloc_type_zone_allocated()
/usr/src/sys/sys/systm.h:175
        [0xffffffff80a263c4 - 0xffffffff80a263d3]       /usr/src/sys/kern/kern_resource.c:509   rtp_to_pri()
/usr/src/sys/sys/systm.h:175
        [0xffffffff80a28f59 - 0xffffffff80a28f5f]       /usr/src/sys/kern/kern_rmlock.c:775     _rm_assert()
        [0xffffffff80a29087 - 0xffffffff80a2908d]       /usr/src/sys/kern/kern_rmlock.c:801     _rm_assert()
        [0xffffffff80a29eb0 - 0xffffffff80a29eb7]       /usr/src/sys/kern/kern_rmlock.c:645     _rm_rlock_debug()
        [0xffffffff80a28c4b - 0xffffffff80a28c5a]       /usr/src/sys/kern/kern_rmlock.c:160     unlock_rm()
...more

Nested inline functions

inlinecall(1) resolves nested inline functions recursively:

$ ./inlinecall critical_enter /usr/lib/debug/boot/kernel/kernel.debug
/usr/src/sys/sys/systm.h:175
        [0xffffffff80a19d7a - 0xffffffff80a19d8b]       /usr/src/sys/sys/buf_ring.h:80  drbr_enqueue()
/usr/src/sys/sys/systm.h:175
        [0xffffffff80a6387a - 0xffffffff80a6388b]       /usr/src/sys/sys/buf_ring.h:80  drbr_enqueue()
...

Looking at the function in buf_ring.h that calls critical_enter():

static __inline int
buf_ring_enqueue(struct buf_ring *br, void *buf)
{
	...
	critical_enter();
	...
}

Although inlinecall(1) reported that critical_enter() is called from drbr_enqueue() at buf_ring.h:80, the source shows that it is actually called from buf_ring_enqueue(). However, buf_ring_enqueue() is itself an inline function:

$ ./inlinecall buf_ring_enqueue /usr/lib/debug/boot/kernel/kernel.debug
/usr/src/sys/sys/buf_ring.h:63
        [0xffffffff80a19d7a - 0xffffffff80a19dcd]       /usr/src/sys/net/ifq.h:337      drbr_enqueue()
        [0xffffffff80a19ddc - 0xffffffff80a19e18]       /usr/src/sys/net/ifq.h:337      drbr_enqueue()
        [0xffffffff80a19e1f - 0xffffffff80a19e3b]       /usr/src/sys/net/ifq.h:337      drbr_enqueue()
/usr/src/sys/sys/buf_ring.h:63
        [0xffffffff80a6387a - 0xffffffff80a638cd]       /usr/src/sys/net/ifq.h:337      drbr_enqueue()
        [0xffffffff80a638dc - 0xffffffff80a63918]       /usr/src/sys/net/ifq.h:337      drbr_enqueue()
        [0xffffffff80a6391f - 0xffffffff80a6393b]       /usr/src/sys/net/ifq.h:337      drbr_enqueue()
/usr/src/sys/sys/buf_ring.h:63
        [0xffffffff80d1f81a - 0xffffffff80d1f879]       /usr/src/sys/net/ifq.c:57       drbr_enqueue()
        [0xffffffff80d1f91d - 0xffffffff80d1f964]       /usr/src/sys/net/ifq.c:57       drbr_enqueue()
        [0xffffffff80d1f9dd - 0xffffffff80d1f9f5]       /usr/src/sys/net/ifq.c:57       drbr_enqueue()
/usr/src/sys/sys/buf_ring.h:63
        [0xffffffff80ff07ba - 0xffffffff80ff080d]       /usr/src/sys/net/ifq.h:337      drbr_enqueue()
        [0xffffffff80ff081c - 0xffffffff80ff0858]       /usr/src/sys/net/ifq.h:337      drbr_enqueue()
        [0xffffffff80ff085f - 0xffffffff80ff087b]       /usr/src/sys/net/ifq.h:337      drbr_enqueue()

Here drbr_enqueue() is defined twice: once in ifq.h and once in ifq.c. The definition in ifq.h is an inline one, while the one in ifq.c is not. We know that buf_ring_enqueue() is called from the non-inline version of drbr_enqueue(); otherwise, inlinecall(1) would have reported the function that calls the inline version of drbr_enqueue().