|  | .. _tracing: | 
|  |  | 
|  | ======= | 
|  | Tracing | 
|  | ======= | 
|  |  | 
|  | Introduction | 
|  | ============ | 
|  |  | 
|  | This document describes the tracing infrastructure in QEMU and how to use it | 
|  | for debugging, profiling, and observing execution. | 
|  |  | 
|  | Quickstart | 
|  | ========== | 
|  |  | 
|  | Enable tracing of ``memory_region_ops_read`` and ``memory_region_ops_write`` | 
|  | events:: | 
|  |  | 
|  | $ qemu --trace "memory_region_ops_*" ... | 
|  | ... | 
|  | 719585@1608130130.441188:memory_region_ops_read cpu 0 mr 0x562fdfbb3820 addr 0x3cc value 0x67 size 1 | 
|  | 719585@1608130130.441190:memory_region_ops_write cpu 0 mr 0x562fdfbd2f00 addr 0x3d4 value 0x70e size 2 | 
|  |  | 
|  | This output comes from the "log" trace backend that is enabled by default when | 
|  | ``./configure --enable-trace-backends=BACKENDS`` was not explicitly specified. | 
|  |  | 
|  | Multiple patterns can be specified by repeating the ``--trace`` option:: | 
|  |  | 
|  | $ qemu --trace "kvm_*" --trace "virtio_*" ... | 
|  |  | 
|  | When patterns are used frequently it is more convenient to store them in a | 
|  | file to avoid long command-line options:: | 
|  |  | 
|  | $ echo "memory_region_ops_*" >/tmp/events | 
|  | $ echo "kvm_*" >>/tmp/events | 
|  | $ qemu --trace events=/tmp/events ... | 
|  |  | 
|  | Trace events | 
|  | ============ | 
|  |  | 
|  | Sub-directory setup | 
|  | ------------------- | 
|  |  | 
|  | Each directory in the source tree can declare a set of trace events in a local | 
|  | "trace-events" file. All directories which contain "trace-events" files must be | 
|  | listed in the "trace_events_subdirs" variable in the top level meson.build | 
|  | file. During build, the "trace-events" file in each listed subdirectory will be | 
|  | processed by the "tracetool" script to generate code for the trace events. | 
|  |  | 
|  | The individual "trace-events" files are merged into a "trace-events-all" file, | 
|  | which is also installed into "/usr/share/qemu". | 
|  | This merged file is to be used by the "simpletrace.py" script to later analyse | 
|  | traces in the simpletrace data format. | 
|  |  | 
|  | The following files are automatically generated in <builddir>/trace/ during the | 
|  | build: | 
|  |  | 
|  | - trace-<subdir>.c - the trace event state declarations | 
|  | - trace-<subdir>.h - the trace event enums and probe functions | 
|  | - trace-dtrace-<subdir>.h - DTrace event probe specification | 
|  | - trace-dtrace-<subdir>.dtrace - DTrace event probe helper declaration | 
|  | - trace-dtrace-<subdir>.o - binary DTrace provider (generated by dtrace) | 
|  | - trace-ust-<subdir>.h - UST event probe helper declarations | 
|  |  | 
|  | Here <subdir> is the sub-directory path with '/' replaced by '_'. For example, | 
|  | "accel/kvm" becomes "accel_kvm" and the final filename for "trace-<subdir>.c" | 
|  | becomes "trace-accel_kvm.c". | 
|  |  | 
|  | Source files in the source tree do not directly include generated files in | 
|  | "<builddir>/trace/". Instead they #include the local "trace.h" file, without | 
|  | any sub-directory path prefix. eg io/channel-buffer.c would do:: | 
|  |  | 
|  | #include "trace.h" | 
|  |  | 
|  | The "io/trace.h" file must be created manually with an #include of the | 
|  | corresponding "trace/trace-<subdir>.h" file that will be generated in the | 
|  | builddir:: | 
|  |  | 
|  | $ echo '#include "trace/trace-io.h"' >io/trace.h | 
|  |  | 
|  | While it is possible to include a trace.h file from outside a source file's own | 
|  | sub-directory, this is discouraged in general. It is strongly preferred that | 
|  | all events be declared directly in the sub-directory that uses them. The only | 
|  | exception is where there are some shared trace events defined in the top level | 
|  | directory trace-events file.  The top level directory generates trace files | 
|  | with a filename prefix of "trace/trace-root" instead of just "trace". This is | 
|  | to avoid ambiguity between a trace.h in the current directory, vs the top level | 
|  | directory. | 
|  |  | 
|  | Using trace events | 
|  | ------------------ | 
|  |  | 
|  | Trace events are invoked directly from source code like this:: | 
|  |  | 
|  | #include "trace.h"  /* needed for trace event prototype */ | 
|  |  | 
|  | void *qemu_vmalloc(size_t size) | 
|  | { | 
|  | void *ptr; | 
|  | size_t align = QEMU_VMALLOC_ALIGN; | 
|  |  | 
|  | if (size < align) { | 
|  | align = getpagesize(); | 
|  | } | 
|  | ptr = qemu_memalign(align, size); | 
|  | trace_qemu_vmalloc(size, ptr); | 
|  | return ptr; | 
|  | } | 
|  |  | 
|  | Declaring trace events | 
|  | ---------------------- | 
|  |  | 
|  | The "tracetool" script produces the trace.h header file which is included by | 
|  | every source file that uses trace events.  Since many source files include | 
|  | trace.h, it uses a minimum of types and other header files included to keep the | 
|  | namespace clean and compile times and dependencies down. | 
|  |  | 
|  | Trace events should use types as follows: | 
|  |  | 
|  | * Use stdint.h types for fixed-size types.  Most offsets and guest memory | 
|  | addresses are best represented with uint32_t or uint64_t.  Use fixed-size | 
|  | types over primitive types whose size may change depending on the host | 
|  | (32-bit versus 64-bit) so trace events don't truncate values or break | 
|  | the build. | 
|  |  | 
|  | * Use void * for pointers to structs or for arrays.  The trace.h header | 
|  | cannot include all user-defined struct declarations and it is therefore | 
|  | necessary to use void * for pointers to structs. | 
|  |  | 
|  | * For everything else, use primitive scalar types (char, int, long) with the | 
|  | appropriate signedness. | 
|  |  | 
|  | * Avoid floating point types (float and double) because SystemTap does not | 
|  | support them.  In most cases it is possible to round to an integer type | 
|  | instead.  This may require scaling the value first by multiplying it by 1000 | 
|  | or the like when digits after the decimal point need to be preserved. | 
|  |  | 
|  | Format strings should reflect the types defined in the trace event.  Take | 
|  | special care to use PRId64 and PRIu64 for int64_t and uint64_t types, | 
|  | respectively.  This ensures portability between 32- and 64-bit platforms. | 
|  | Format strings must not end with a newline character.  It is the responsibility | 
|  | of backends to adapt line ending for proper logging. | 
|  |  | 
|  | Each event declaration will start with the event name, then its arguments, | 
|  | finally a format string for pretty-printing. For example:: | 
|  |  | 
|  | qemu_vmalloc(size_t size, void *ptr) "size %zu ptr %p" | 
|  | qemu_vfree(void *ptr) "ptr %p" | 
|  |  | 
|  |  | 
|  | Hints for adding new trace events | 
|  | --------------------------------- | 
|  |  | 
|  | 1. Trace state changes in the code.  Interesting points in the code usually | 
|  | involve a state change like starting, stopping, allocating, freeing.  State | 
|  | changes are good trace events because they can be used to understand the | 
|  | execution of the system. | 
|  |  | 
|  | 2. Trace guest operations.  Guest I/O accesses like reading device registers | 
|  | are good trace events because they can be used to understand guest | 
|  | interactions. | 
|  |  | 
|  | 3. Use correlator fields so the context of an individual line of trace output | 
|  | can be understood.  For example, trace the pointer returned by malloc and | 
|  | used as an argument to free.  This way mallocs and frees can be matched up. | 
|  | Trace events with no context are not very useful. | 
|  |  | 
|  | 4. Name trace events after their function.  If there are multiple trace events | 
|  | in one function, append a unique distinguisher at the end of the name. | 
|  |  | 
|  | Generic interface and monitor commands | 
|  | ====================================== | 
|  |  | 
|  | You can programmatically query and control the state of trace events through a | 
|  | backend-agnostic interface provided by the header "trace/control.h". | 
|  |  | 
|  | Note that some of the backends do not provide an implementation for some parts | 
|  | of this interface, in which case QEMU will just print a warning (please refer to | 
|  | header "trace/control.h" to see which routines are backend-dependent). | 
|  |  | 
|  | The state of events can also be queried and modified through monitor commands: | 
|  |  | 
|  | * ``info trace-events`` | 
|  | View available trace events and their state.  State 1 means enabled, state 0 | 
|  | means disabled. | 
|  |  | 
|  | * ``trace-event NAME on|off`` | 
|  | Enable/disable a given trace event or a group of events (using wildcards). | 
|  |  | 
|  | The "--trace events=<file>" command line argument can be used to enable the | 
|  | events listed in <file> from the very beginning of the program. This file must | 
|  | contain one event name per line. | 
|  |  | 
|  | If a line in the "--trace events=<file>" file begins with a '-', the trace event | 
|  | will be disabled instead of enabled.  This is useful when a wildcard was used | 
|  | to enable an entire family of events but one noisy event needs to be disabled. | 
|  |  | 
|  | Wildcard matching is supported in both the monitor command "trace-event" and the | 
|  | events list file. That means you can enable/disable the events having a common | 
|  | prefix in a batch. For example, virtio-blk trace events could be enabled using | 
|  | the following monitor command:: | 
|  |  | 
|  | trace-event virtio_blk_* on | 
|  |  | 
|  | Trace backends | 
|  | ============== | 
|  |  | 
|  | The "tracetool" script automates tedious trace event code generation and also | 
|  | keeps the trace event declarations independent of the trace backend.  The trace | 
|  | events are not tightly coupled to a specific trace backend, such as LTTng or | 
|  | SystemTap.  Support for trace backends can be added by extending the "tracetool" | 
|  | script. | 
|  |  | 
|  | The trace backends are chosen at configure time:: | 
|  |  | 
|  | ./configure --enable-trace-backends=simple,dtrace | 
|  |  | 
|  | For a list of supported trace backends, try ./configure --help or see below. | 
|  | If multiple backends are enabled, the trace is sent to them all. | 
|  |  | 
|  | If no backends are explicitly selected, configure will default to the | 
|  | "log" backend. | 
|  |  | 
|  | The following subsections describe the supported trace backends. | 
|  |  | 
|  | Nop | 
|  | --- | 
|  |  | 
|  | The "nop" backend generates empty trace event functions so that the compiler | 
|  | can optimize out trace events completely.  This imposes no performance | 
|  | penalty. | 
|  |  | 
|  | Note that regardless of the selected trace backend, events with the "disable" | 
|  | property will be generated with the "nop" backend. | 
|  |  | 
|  | Log | 
|  | --- | 
|  |  | 
|  | The "log" backend sends trace events directly to standard error.  This | 
|  | effectively turns trace events into debug printfs. | 
|  |  | 
|  | This is the simplest backend and can be used together with existing code that | 
|  | uses DPRINTF(). | 
|  |  | 
|  | The -msg timestamp=on|off command-line option controls whether or not to print | 
|  | the tid/timestamp prefix for each trace event. | 
|  |  | 
|  | Simpletrace | 
|  | ----------- | 
|  |  | 
|  | The "simple" backend writes binary trace logs to a file from a thread, making | 
|  | it lower overhead than the "log" backend. A Python API is available for writing | 
|  | offline trace file analysis scripts. It may not be as powerful as | 
|  | platform-specific or third-party trace backends but it is portable and has no | 
|  | special library dependencies. | 
|  |  | 
|  | Monitor commands | 
|  | ~~~~~~~~~~~~~~~~ | 
|  |  | 
|  | * ``trace-file on|off|flush|set <path>`` | 
|  | Enable/disable/flush the trace file or set the trace file name. | 
|  |  | 
|  | Analyzing trace files | 
|  | ~~~~~~~~~~~~~~~~~~~~~ | 
|  |  | 
|  | The "simple" backend produces binary trace files that can be formatted with the | 
|  | simpletrace.py script.  The script takes the "trace-events-all" file and the | 
|  | binary trace:: | 
|  |  | 
|  | ./scripts/simpletrace.py trace-events-all trace-12345 | 
|  |  | 
|  | You must ensure that the same "trace-events-all" file was used to build QEMU, | 
|  | otherwise trace event declarations may have changed and output will not be | 
|  | consistent. | 
|  |  | 
|  | Ftrace | 
|  | ------ | 
|  |  | 
|  | The "ftrace" backend writes trace data to ftrace marker. This effectively | 
|  | sends trace events to ftrace ring buffer, and you can compare qemu trace | 
|  | data and kernel(especially kvm.ko when using KVM) trace data. | 
|  |  | 
|  | if you use KVM, enable kvm events in ftrace:: | 
|  |  | 
|  | # echo 1 > /sys/kernel/debug/tracing/events/kvm/enable | 
|  |  | 
|  | After running qemu by root user, you can get the trace:: | 
|  |  | 
|  | # cat /sys/kernel/debug/tracing/trace | 
|  |  | 
|  | Restriction: "ftrace" backend is restricted to Linux only. | 
|  |  | 
|  | Syslog | 
|  | ------ | 
|  |  | 
|  | The "syslog" backend sends trace events using the POSIX syslog API. The log | 
|  | is opened specifying the LOG_DAEMON facility and LOG_PID option (so events | 
|  | are tagged with the pid of the particular QEMU process that generated | 
|  | them). All events are logged at LOG_INFO level. | 
|  |  | 
|  | NOTE: syslog may squash duplicate consecutive trace events and apply rate | 
|  | limiting. | 
|  |  | 
|  | Restriction: "syslog" backend is restricted to POSIX compliant OS. | 
|  |  | 
|  | LTTng Userspace Tracer | 
|  | ---------------------- | 
|  |  | 
|  | The "ust" backend uses the LTTng Userspace Tracer library.  There are no | 
|  | monitor commands built into QEMU, instead UST utilities should be used to list, | 
|  | enable/disable, and dump traces. | 
|  |  | 
|  | Package lttng-tools is required for userspace tracing. You must ensure that the | 
|  | current user belongs to the "tracing" group, or manually launch the | 
|  | lttng-sessiond daemon for the current user prior to running any instance of | 
|  | QEMU. | 
|  |  | 
|  | While running an instrumented QEMU, LTTng should be able to list all available | 
|  | events:: | 
|  |  | 
|  | lttng list -u | 
|  |  | 
|  | Create tracing session:: | 
|  |  | 
|  | lttng create mysession | 
|  |  | 
|  | Enable events:: | 
|  |  | 
|  | lttng enable-event qemu:g_malloc -u | 
|  |  | 
|  | Where the events can either be a comma-separated list of events, or "-a" to | 
|  | enable all tracepoint events. Start and stop tracing as needed:: | 
|  |  | 
|  | lttng start | 
|  | lttng stop | 
|  |  | 
|  | View the trace:: | 
|  |  | 
|  | lttng view | 
|  |  | 
|  | Destroy tracing session:: | 
|  |  | 
|  | lttng destroy | 
|  |  | 
|  | Babeltrace can be used at any later time to view the trace:: | 
|  |  | 
|  | babeltrace $HOME/lttng-traces/mysession-<date>-<time> | 
|  |  | 
|  | SystemTap | 
|  | --------- | 
|  |  | 
|  | The "dtrace" backend uses DTrace sdt probes but has only been tested with | 
|  | SystemTap.  When SystemTap support is detected a .stp file with wrapper probes | 
|  | is generated to make use in scripts more convenient.  This step can also be | 
|  | performed manually after a build in order to change the binary name in the .stp | 
|  | probes:: | 
|  |  | 
|  | scripts/tracetool.py --backends=dtrace --format=stap \ | 
|  | --binary path/to/qemu-binary \ | 
|  | --probe-prefix qemu.system.x86_64 \ | 
|  | --group=all \ | 
|  | trace-events-all \ | 
|  | qemu.stp | 
|  |  | 
|  | To facilitate simple usage of systemtap where there merely needs to be printf | 
|  | logging of certain probes, a helper script "qemu-trace-stap" is provided. | 
|  | Consult its manual page for guidance on its usage. | 
|  |  | 
|  | Trace event properties | 
|  | ====================== | 
|  |  | 
|  | Each event in the "trace-events-all" file can be prefixed with a space-separated | 
|  | list of zero or more of the following event properties. | 
|  |  | 
|  | "disable" | 
|  | --------- | 
|  |  | 
|  | If a specific trace event is going to be invoked a huge number of times, this | 
|  | might have a noticeable performance impact even when the event is | 
|  | programmatically disabled. | 
|  |  | 
|  | In this case you should declare such event with the "disable" property. This | 
|  | will effectively disable the event at compile time (by using the "nop" backend), | 
|  | thus having no performance impact at all on regular builds (i.e., unless you | 
|  | edit the "trace-events-all" file). | 
|  |  | 
|  | In addition, there might be cases where relatively complex computations must be | 
|  | performed to generate values that are only used as arguments for a trace | 
|  | function. In these cases you can use 'trace_event_get_state_backends()' to | 
|  | guard such computations, so they are skipped if the event has been either | 
|  | compile-time disabled or run-time disabled. If the event is compile-time | 
|  | disabled, this check will have no performance impact. | 
|  |  | 
|  | :: | 
|  |  | 
|  | #include "trace.h"  /* needed for trace event prototype */ | 
|  |  | 
|  | void *qemu_vmalloc(size_t size) | 
|  | { | 
|  | void *ptr; | 
|  | size_t align = QEMU_VMALLOC_ALIGN; | 
|  |  | 
|  | if (size < align) { | 
|  | align = getpagesize(); | 
|  | } | 
|  | ptr = qemu_memalign(align, size); | 
|  | if (trace_event_get_state_backends(TRACE_QEMU_VMALLOC)) { | 
|  | void *complex; | 
|  | /* some complex computations to produce the 'complex' value */ | 
|  | trace_qemu_vmalloc(size, ptr, complex); | 
|  | } | 
|  | return ptr; | 
|  | } | 
|  |  |