Table of Contents
Quick background
DTrace is a framework that gives administrators and kernel developers the ability to observe kernel behavior in real time. DTrace has modules called “providers”, that perform a particular instrumentation in the kernel (and sometimes userland) using “probes”.
kinst is a new low-level DTrace provider co-authored by Christos Margiolis and Mark Johnston for the FreeBSD operating system, which allows the user to trace arbitrary instructions in kernel functions. It is part of the base system as of FreeBSD 14.0.
kinst probes take the form of kinst::<function>:<instruction>
, where <function>
is the kernel function to be traced, and <instruction>
is the offset to
the instruction, relative to the beginning of the function, and can be obtained
from the function’s disassembly. If the <instruction>
field is left empty,
kinst will trace all instructions in that function. Unlike
FBT, kinst can also trace the
entry and return points of inline functions (see Inline function
tracing).
The origin of the name is inspired from an early paper written by A. Tamches and B. Miller discussing a tracing tool they developed called “KernInst”.
Usage
Find the offset corresponding to the third instruction in vm_fault()
and trace it,
printing the contents of the RSI register:
# kgdb
(kgdb) disas /r vm_fault
Dump of assembler code for function vm_fault:
0xffffffff80f4e470 <+0>: 55 push %rbp
0xffffffff80f4e471 <+1>: 48 89 e5 mov %rsp,%rbp
0xffffffff80f4e474 <+4>: 41 57 push %r15
...
# dtrace -n 'kinst::vm_fault:4 {printf("%#x", regs[R_RSI]);}'
2 81500 vm_fault:4 0x827c56000
2 81500 vm_fault:4 0x827878000
2 81500 vm_fault:4 0x1fab9bef0000
2 81500 vm_fault:4 0xe16cf749000
0 81500 vm_fault:4 0x13587c366000
^C
Trace the return point of critical_enter()
, which is an inline function:
# dtrace -n 'kinst::critical_enter:return'
dtrace: description 'kinst::critical_enter:return' matched 130 probes
CPU ID FUNCTION:NAME
1 71024 spinlock_enter:53
0 71024 spinlock_enter:53
1 70992 uma_zalloc_arg:49
1 70925 malloc_type_zone_allocated:21
1 70994 uma_zfree_arg:365
1 70924 malloc_type_freed:21
1 71024 spinlock_enter:53
0 71024 spinlock_enter:53
0 70947 _epoch_enter_preempt:122
0 70949 _epoch_exit_preempt:28
^C
Inline function tracing
How it works
To trace inline functions, libdtrace makes use of the DWARF Debugging Standard, to detect if the function specified is an inline call. If it is, D syntax is transformed to create kinst probes for each of the inline copies found. All work is done in libdtrace, instead of kinst(4). This feature has been added to FreeBSD with this patch.
Contrary to how kinst expects a <function>:<instruction>
tuple to create
probes, for inline functions, <instruction>
is replaced by entry
and
return
.
Syntactic transformations
Suppose the user wants to trace a probe of the form:
kinst::<func>:<entry|return>
/<pred>/
{
<acts>
}
libdtrace sees that we have specified entry
or return
, instead of an
offset, which is what a regular kinst probe would look like, so it loops
through all loaded kernel modules and parses their DWARF and ELF info to see if
this function is an inline — if not, the probe is converted to an FBT one,
so that we don’t duplicate FBT’s functionality in kinst:
# dtrace -dn 'kinst::malloc:entry {exit(0);}'
fbt::malloc:entry
{
exit(0x0);
}
dtrace: description 'kinst::malloc:entry ' matched 1 probe
CPU ID FUNCTION:NAME
2 31144 malloc:entry
If the function however is an inline, libdtrace will find all calls refering to this function and create new probes for each one of the inline copies found.
# dtrace -dn 'kinst::cam_iosched_has_more_trim:entry { printf("\t%d\t%s", pid, execname); }'
kinst::cam_iosched_get_trim:13,
kinst::cam_iosched_next_bio:13,
kinst::cam_iosched_schedule:40
{
printf("\t%d\t%s", pid, execname);
}
dtrace: description 'kinst::cam_iosched_has_more_trim:entry ' matched 4 probes
CPU ID FUNCTION:NAME
0 81502 cam_iosched_schedule:40 2 clock
0 81501 cam_iosched_next_bio:13 2 clock
2 81502 cam_iosched_schedule:40 2 clock
1 81502 cam_iosched_next_bio:13 0 kernel
1 81503 cam_iosched_schedule:40 0 kernel
^C
There can also be both inline and non-inline definitions of the same function. In this case, kinst creates an additional FBT probe for the non-inline definition.
The -d
flag used in these examples to dump the D script after libdtrace has
applied syntactic transformations, has been added to DTrace in
commit 1e136a9cbd3a.
Heuristic for calculating the entry
and return
offsets
libdtrace reuses parts of the mechanism implemented in my inlinecall(1) program, which finds and prints all call sites of a given inline function:
$ ./inlinecall vm_page_mvqueue /usr/lib/debug/boot/kernel/kernel.debug
/usr/src/sys/vm/vm_page.c:4142
[0xffffffff80f91541 - 0xffffffff80f91599] /usr/src/sys/vm/vm_page.c:4195 vm_page_readahead_finish()
[0xffffffff80f915f5 - 0xffffffff80f91603] /usr/src/sys/vm/vm_page.c:4195 vm_page_readahead_finish()
[0xffffffff80f9163d - 0xffffffff80f916c2] /usr/src/sys/vm/vm_page.c:4184 vm_page_activate()
[0xffffffff80f916cd - 0xffffffff80f916e5] /usr/src/sys/vm/vm_page.c:4184 vm_page_activate()
[0xffffffff80f916fe - 0xffffffff80f91747] /usr/src/sys/vm/vm_page.c:4195 vm_page_deactivate()
[0xffffffff80f91750 - 0xffffffff80f91768] /usr/src/sys/vm/vm_page.c:4195 vm_page_deactivate()
[0xffffffff80f94a59 - 0xffffffff80f94aa9] /usr/src/sys/vm/vm_page.c:4195 vm_page_reclaim_contig_domain()
[0xffffffff80f94de4 - 0xffffffff80f94df9] /usr/src/sys/vm/vm_page.c:4195 vm_page_reclaim_contig_domain()
[0xffffffff80f9661e - 0xffffffff80f96667] /usr/src/sys/vm/vm_page.c:4202 vm_page_deactivate_noreuse()
[0xffffffff80f96670 - 0xffffffff80f96688] /usr/src/sys/vm/vm_page.c:4202 vm_page_deactivate_noreuse()
[0xffffffff80f9669e - 0xffffffff80f966ea] /usr/src/sys/vm/vm_page.c:4212 vm_page_launder()
[0xffffffff80f966f3 - 0xffffffff80f9670b] /usr/src/sys/vm/vm_page.c:4212 vm_page_launder()
[0xffffffff80f96d4c - 0xffffffff80f96dac] /usr/src/sys/vm/vm_page.c:4212 vm_page_advise()
[0xffffffff80f96dac - 0xffffffff80f96e07] /usr/src/sys/vm/vm_page.c:4202 vm_page_advise()
Most of the entries above appear twice but with different boundaries:
[0xffffffff80f9163d - 0xffffffff80f916c2] /usr/src/sys/vm/vm_page.c:4184 vm_page_activate()
[0xffffffff80f916cd - 0xffffffff80f916e5] /usr/src/sys/vm/vm_page.c:4184 vm_page_activate()
This means that the inline copy’s boundaries are split into more than one
parts, which can be caused by having early return
s inside the inline
function. By using this assumption, we can deduce that the entry point of the
function is 0xffffffff80f9163d
, and each of the upper boundaries are the
return points (i.e 0xffffffff80f916c2
and 0xffffffff80f916e5
), so we end up
with one entry
and two return
trace points.
However, this is not exactly true, because the final return address given by DWARF corresponds to the instruction after the actual return instruction. We can verify this in GDB using the two return addresses from the example above:
# kgdb
(kgdb) disas vm_page_activate
...
0xffffffff80f916c0 <+144>: jmp 0xffffffff80f91673 <vm_page_activate+67>
0xffffffff80f916c2 <+146>: add $0x8,%rsp <-- first address given by DWARF
...
0xffffffff80f916e0 <+176>: call 0xffffffff80bed360 <panic> <-- last instruction
<-- second address should be here
It turns out that 0xffffffff80f916e5
is, in fact, outside
vm_page_activate()
altogether!
(kgdb) x/i 0xffffffff80f916e5
0xffffffff80f916e5: data16 cs nopw 0x0(%rax,%rax,1)
After running a couple of tests, I came to the conclusion that there are two possible cases with return addresses:
- If the inline copy’s DIE has
DW_AT_lowpc
andDW_AT_highpc
set, the return address is always outside the inline function’s boundaries. - If the inline copy’s DIE has
DW_AT_ranges
set, only the last return address is outside the inline function’s boundaries. - Combining the two bullets above, if the return address of the inline function is the same as the upper boundary of the caller function, it’s outside both the inline copy’s and the caller function’s boundaries.
In order to fix this, we have to go one instruction back whenever we come across one of those 3 cases.
Finally the entry
offset is calculated as:
inline_bound_lo - caller_bound_lo
And the return
one, including the modifications (if any) discussed above, as:
inline_bound_hi - caller_bound_lo
These offsets are then used to create regular kinst probes of the form
kinst::<func>:<instruction>
, which is what kinst actually expects:
# dtrace -dn 'kinst::vm_page_mvqueue:entry,kinst::vm_page_mvqueue:return'
dtrace: description 'kinst::vm_page_mvqueue:entry,kinst::vm_page_mvqueue:return' matched 22 probes
kinst::vm_page_readahead_finish:33,
kinst::vm_page_readahead_finish:121,
kinst::vm_page_activate:13,
kinst::vm_page_deactivate:14,
kinst::vm_page_reclaim_contig_domain:1961,
kinst::vm_page_deactivate_noreuse:14,
kinst::vm_page_launder:14,
kinst::vm_page_advise:236,
kinst::vm_page_advise:332,
kinst::vm_page_readahead_finish:220,
kinst::vm_page_activate:146,
kinst::vm_page_activate:176,
kinst::vm_page_deactivate:87,
kinst::vm_page_deactivate:115,
kinst::vm_page_reclaim_contig_domain:2041,
kinst::vm_page_reclaim_contig_domain:2884,
kinst::vm_page_deactivate_noreuse:87,
kinst::vm_page_deactivate_noreuse:115,
kinst::vm_page_launder:90,
kinst::vm_page_launder:118,
kinst::vm_page_advise:330,
kinst::vm_page_advise:421
{
}
CPU ID FUNCTION:NAME
3 95381 vm_page_activate:13
3 95389 vm_page_activate:146
2 95381 vm_page_activate:13
2 95389 vm_page_activate:146
1 95387 vm_page_advise:332
1 95400 vm_page_advise:421
1 95387 vm_page_advise:332
1 95400 vm_page_advise:421
1 95387 vm_page_advise:332
^C