| .. _tracing: | 
 |  | 
 | ======= | 
 | Tracing | 
 | ======= | 
 |  | 
 | Introduction | 
 | ============ | 
 |  | 
 | This document describes the tracing infrastructure in QEMU and how to use it | 
 | for debugging, profiling, and observing execution. | 
 |  | 
 | Quickstart | 
 | ========== | 
 |  | 
 | Enable tracing of ``memory_region_ops_read`` and ``memory_region_ops_write`` | 
 | events:: | 
 |  | 
 |     $ qemu --trace "memory_region_ops_*" ... | 
 |     ... | 
 |     719585@1608130130.441188:memory_region_ops_read cpu 0 mr 0x562fdfbb3820 addr 0x3cc value 0x67 size 1 | 
 |     719585@1608130130.441190:memory_region_ops_write cpu 0 mr 0x562fdfbd2f00 addr 0x3d4 value 0x70e size 2 | 
 |  | 
 | This output comes from the "log" trace backend that is enabled by default when | 
 | ``./configure --enable-trace-backends=BACKENDS`` was not explicitly specified. | 
 |  | 
 | Multiple patterns can be specified by repeating the ``--trace`` option:: | 
 |  | 
 |     $ qemu --trace "kvm_*" --trace "virtio_*" ... | 
 |  | 
 | When patterns are used frequently it is more convenient to store them in a | 
 | file to avoid long command-line options:: | 
 |  | 
 |     $ echo "memory_region_ops_*" >/tmp/events | 
 |     $ echo "kvm_*" >>/tmp/events | 
 |     $ qemu --trace events=/tmp/events ... | 
 |  | 
 | Trace events | 
 | ============ | 
 |  | 
 | Sub-directory setup | 
 | ------------------- | 
 |  | 
 | Each directory in the source tree can declare a set of trace events in a local | 
 | "trace-events" file. All directories which contain "trace-events" files must be | 
 | listed in the "trace_events_subdirs" variable in the top level meson.build | 
 | file. During build, the "trace-events" file in each listed subdirectory will be | 
 | processed by the "tracetool" script to generate code for the trace events. | 
 |  | 
 | The individual "trace-events" files are merged into a "trace-events-all" file, | 
 | which is also installed into "/usr/share/qemu". | 
 | This merged file is to be used by the "simpletrace.py" script to later analyse | 
 | traces in the simpletrace data format. | 
 |  | 
 | The following files are automatically generated in <builddir>/trace/ during the | 
 | build: | 
 |  | 
 |  - trace-<subdir>.c - the trace event state declarations | 
 |  - trace-<subdir>.h - the trace event enums and probe functions | 
 |  - trace-dtrace-<subdir>.h - DTrace event probe specification | 
 |  - trace-dtrace-<subdir>.dtrace - DTrace event probe helper declaration | 
 |  - trace-dtrace-<subdir>.o - binary DTrace provider (generated by dtrace) | 
 |  - trace-ust-<subdir>.h - UST event probe helper declarations | 
 |  | 
 | Here <subdir> is the sub-directory path with '/' replaced by '_'. For example, | 
 | "accel/kvm" becomes "accel_kvm" and the final filename for "trace-<subdir>.c" | 
 | becomes "trace-accel_kvm.c". | 
 |  | 
 | Source files in the source tree do not directly include generated files in | 
 | "<builddir>/trace/". Instead they #include the local "trace.h" file, without | 
 | any sub-directory path prefix. eg io/channel-buffer.c would do:: | 
 |  | 
 |   #include "trace.h" | 
 |  | 
 | The "io/trace.h" file must be created manually with an #include of the | 
 | corresponding "trace/trace-<subdir>.h" file that will be generated in the | 
 | builddir:: | 
 |  | 
 |   $ echo '#include "trace/trace-io.h"' >io/trace.h | 
 |  | 
 | While it is possible to include a trace.h file from outside a source file's own | 
 | sub-directory, this is discouraged in general. It is strongly preferred that | 
 | all events be declared directly in the sub-directory that uses them. The only | 
 | exception is where there are some shared trace events defined in the top level | 
 | directory trace-events file.  The top level directory generates trace files | 
 | with a filename prefix of "trace/trace-root" instead of just "trace". This is | 
 | to avoid ambiguity between a trace.h in the current directory, vs the top level | 
 | directory. | 
 |  | 
 | Using trace events | 
 | ------------------ | 
 |  | 
 | Trace events are invoked directly from source code like this:: | 
 |  | 
 |     #include "trace.h"  /* needed for trace event prototype */ | 
 |      | 
 |     void *qemu_vmalloc(size_t size) | 
 |     { | 
 |         void *ptr; | 
 |         size_t align = QEMU_VMALLOC_ALIGN; | 
 |       | 
 |         if (size < align) { | 
 |             align = getpagesize(); | 
 |         } | 
 |         ptr = qemu_memalign(align, size); | 
 |         trace_qemu_vmalloc(size, ptr); | 
 |         return ptr; | 
 |     } | 
 |  | 
 | Declaring trace events | 
 | ---------------------- | 
 |  | 
 | The "tracetool" script produces the trace.h header file which is included by | 
 | every source file that uses trace events.  Since many source files include | 
 | trace.h, it uses a minimum of types and other header files included to keep the | 
 | namespace clean and compile times and dependencies down. | 
 |  | 
 | Trace events should use types as follows: | 
 |  | 
 |  * Use stdint.h types for fixed-size types.  Most offsets and guest memory | 
 |    addresses are best represented with uint32_t or uint64_t.  Use fixed-size | 
 |    types over primitive types whose size may change depending on the host | 
 |    (32-bit versus 64-bit) so trace events don't truncate values or break | 
 |    the build. | 
 |  | 
 |  * Use void * for pointers to structs or for arrays.  The trace.h header | 
 |    cannot include all user-defined struct declarations and it is therefore | 
 |    necessary to use void * for pointers to structs. | 
 |  | 
 |  * For everything else, use primitive scalar types (char, int, long) with the | 
 |    appropriate signedness. | 
 |  | 
 |  * Avoid floating point types (float and double) because SystemTap does not | 
 |    support them.  In most cases it is possible to round to an integer type | 
 |    instead.  This may require scaling the value first by multiplying it by 1000 | 
 |    or the like when digits after the decimal point need to be preserved. | 
 |  | 
 | Format strings should reflect the types defined in the trace event.  Take | 
 | special care to use PRId64 and PRIu64 for int64_t and uint64_t types, | 
 | respectively.  This ensures portability between 32- and 64-bit platforms. | 
 | Format strings must not end with a newline character.  It is the responsibility | 
 | of backends to adapt line ending for proper logging. | 
 |  | 
 | Each event declaration will start with the event name, then its arguments, | 
 | finally a format string for pretty-printing. For example:: | 
 |  | 
 |     qemu_vmalloc(size_t size, void *ptr) "size %zu ptr %p" | 
 |     qemu_vfree(void *ptr) "ptr %p" | 
 |  | 
 |  | 
 | Hints for adding new trace events | 
 | --------------------------------- | 
 |  | 
 | 1. Trace state changes in the code.  Interesting points in the code usually | 
 |    involve a state change like starting, stopping, allocating, freeing.  State | 
 |    changes are good trace events because they can be used to understand the | 
 |    execution of the system. | 
 |  | 
 | 2. Trace guest operations.  Guest I/O accesses like reading device registers | 
 |    are good trace events because they can be used to understand guest | 
 |    interactions. | 
 |  | 
 | 3. Use correlator fields so the context of an individual line of trace output | 
 |    can be understood.  For example, trace the pointer returned by malloc and | 
 |    used as an argument to free.  This way mallocs and frees can be matched up. | 
 |    Trace events with no context are not very useful. | 
 |  | 
 | 4. Name trace events after their function.  If there are multiple trace events | 
 |    in one function, append a unique distinguisher at the end of the name. | 
 |  | 
 | Generic interface and monitor commands | 
 | ====================================== | 
 |  | 
 | You can programmatically query and control the state of trace events through a | 
 | backend-agnostic interface provided by the header "trace/control.h". | 
 |  | 
 | Note that some of the backends do not provide an implementation for some parts | 
 | of this interface, in which case QEMU will just print a warning (please refer to | 
 | header "trace/control.h" to see which routines are backend-dependent). | 
 |  | 
 | The state of events can also be queried and modified through monitor commands: | 
 |  | 
 | * ``info trace-events`` | 
 |   View available trace events and their state.  State 1 means enabled, state 0 | 
 |   means disabled. | 
 |  | 
 | * ``trace-event NAME on|off`` | 
 |   Enable/disable a given trace event or a group of events (using wildcards). | 
 |  | 
 | The "--trace events=<file>" command line argument can be used to enable the | 
 | events listed in <file> from the very beginning of the program. This file must | 
 | contain one event name per line. | 
 |  | 
 | If a line in the "--trace events=<file>" file begins with a '-', the trace event | 
 | will be disabled instead of enabled.  This is useful when a wildcard was used | 
 | to enable an entire family of events but one noisy event needs to be disabled. | 
 |  | 
 | Wildcard matching is supported in both the monitor command "trace-event" and the | 
 | events list file. That means you can enable/disable the events having a common | 
 | prefix in a batch. For example, virtio-blk trace events could be enabled using | 
 | the following monitor command:: | 
 |  | 
 |     trace-event virtio_blk_* on | 
 |  | 
 | Trace backends | 
 | ============== | 
 |  | 
 | The "tracetool" script automates tedious trace event code generation and also | 
 | keeps the trace event declarations independent of the trace backend.  The trace | 
 | events are not tightly coupled to a specific trace backend, such as LTTng or | 
 | SystemTap.  Support for trace backends can be added by extending the "tracetool" | 
 | script. | 
 |  | 
 | The trace backends are chosen at configure time:: | 
 |  | 
 |     ./configure --enable-trace-backends=simple,dtrace | 
 |  | 
 | For a list of supported trace backends, try ./configure --help or see below. | 
 | If multiple backends are enabled, the trace is sent to them all. | 
 |  | 
 | If no backends are explicitly selected, configure will default to the | 
 | "log" backend. | 
 |  | 
 | The following subsections describe the supported trace backends. | 
 |  | 
 | Nop | 
 | --- | 
 |  | 
 | The "nop" backend generates empty trace event functions so that the compiler | 
 | can optimize out trace events completely.  This imposes no performance | 
 | penalty. | 
 |  | 
 | Note that regardless of the selected trace backend, events with the "disable" | 
 | property will be generated with the "nop" backend. | 
 |  | 
 | Log | 
 | --- | 
 |  | 
 | The "log" backend sends trace events directly to standard error.  This | 
 | effectively turns trace events into debug printfs. | 
 |  | 
 | This is the simplest backend and can be used together with existing code that | 
 | uses DPRINTF(). | 
 |  | 
 | The -msg timestamp=on|off command-line option controls whether or not to print | 
 | the tid/timestamp prefix for each trace event. | 
 |  | 
 | Simpletrace | 
 | ----------- | 
 |  | 
 | The "simple" backend writes binary trace logs to a file from a thread, making | 
 | it lower overhead than the "log" backend. A Python API is available for writing | 
 | offline trace file analysis scripts. It may not be as powerful as | 
 | platform-specific or third-party trace backends but it is portable and has no | 
 | special library dependencies. | 
 |  | 
 | Monitor commands | 
 | ~~~~~~~~~~~~~~~~ | 
 |  | 
 | * ``trace-file on|off|flush|set <path>`` | 
 |   Enable/disable/flush the trace file or set the trace file name. | 
 |  | 
 | Analyzing trace files | 
 | ~~~~~~~~~~~~~~~~~~~~~ | 
 |  | 
 | The "simple" backend produces binary trace files that can be formatted with the | 
 | simpletrace.py script.  The script takes the "trace-events-all" file and the | 
 | binary trace:: | 
 |  | 
 |     ./scripts/simpletrace.py trace-events-all trace-12345 | 
 |  | 
 | You must ensure that the same "trace-events-all" file was used to build QEMU, | 
 | otherwise trace event declarations may have changed and output will not be | 
 | consistent. | 
 |  | 
 | Ftrace | 
 | ------ | 
 |  | 
 | The "ftrace" backend writes trace data to ftrace marker. This effectively | 
 | sends trace events to ftrace ring buffer, and you can compare qemu trace | 
 | data and kernel(especially kvm.ko when using KVM) trace data. | 
 |  | 
 | if you use KVM, enable kvm events in ftrace:: | 
 |  | 
 |    # echo 1 > /sys/kernel/debug/tracing/events/kvm/enable | 
 |  | 
 | After running qemu by root user, you can get the trace:: | 
 |  | 
 |    # cat /sys/kernel/debug/tracing/trace | 
 |  | 
 | Restriction: "ftrace" backend is restricted to Linux only. | 
 |  | 
 | Syslog | 
 | ------ | 
 |  | 
 | The "syslog" backend sends trace events using the POSIX syslog API. The log | 
 | is opened specifying the LOG_DAEMON facility and LOG_PID option (so events | 
 | are tagged with the pid of the particular QEMU process that generated | 
 | them). All events are logged at LOG_INFO level. | 
 |  | 
 | NOTE: syslog may squash duplicate consecutive trace events and apply rate | 
 |       limiting. | 
 |  | 
 | Restriction: "syslog" backend is restricted to POSIX compliant OS. | 
 |  | 
 | LTTng Userspace Tracer | 
 | ---------------------- | 
 |  | 
 | The "ust" backend uses the LTTng Userspace Tracer library.  There are no | 
 | monitor commands built into QEMU, instead UST utilities should be used to list, | 
 | enable/disable, and dump traces. | 
 |  | 
 | Package lttng-tools is required for userspace tracing. You must ensure that the | 
 | current user belongs to the "tracing" group, or manually launch the | 
 | lttng-sessiond daemon for the current user prior to running any instance of | 
 | QEMU. | 
 |  | 
 | While running an instrumented QEMU, LTTng should be able to list all available | 
 | events:: | 
 |  | 
 |     lttng list -u | 
 |  | 
 | Create tracing session:: | 
 |  | 
 |     lttng create mysession | 
 |  | 
 | Enable events:: | 
 |  | 
 |     lttng enable-event qemu:g_malloc -u | 
 |  | 
 | Where the events can either be a comma-separated list of events, or "-a" to | 
 | enable all tracepoint events. Start and stop tracing as needed:: | 
 |  | 
 |     lttng start | 
 |     lttng stop | 
 |  | 
 | View the trace:: | 
 |  | 
 |     lttng view | 
 |  | 
 | Destroy tracing session:: | 
 |  | 
 |     lttng destroy | 
 |  | 
 | Babeltrace can be used at any later time to view the trace:: | 
 |  | 
 |     babeltrace $HOME/lttng-traces/mysession-<date>-<time> | 
 |  | 
 | SystemTap | 
 | --------- | 
 |  | 
 | The "dtrace" backend uses DTrace sdt probes but has only been tested with | 
 | SystemTap.  When SystemTap support is detected a .stp file with wrapper probes | 
 | is generated to make use in scripts more convenient.  This step can also be | 
 | performed manually after a build in order to change the binary name in the .stp | 
 | probes:: | 
 |  | 
 |     scripts/tracetool.py --backends=dtrace --format=stap \ | 
 |                          --binary path/to/qemu-binary \ | 
 |                          --probe-prefix qemu.system.x86_64 \ | 
 |                          --group=all \ | 
 |                          trace-events-all \ | 
 |                          qemu.stp | 
 |  | 
 | To facilitate simple usage of systemtap where there merely needs to be printf | 
 | logging of certain probes, a helper script "qemu-trace-stap" is provided. | 
 | Consult its manual page for guidance on its usage. | 
 |  | 
 | Trace event properties | 
 | ====================== | 
 |  | 
 | Each event in the "trace-events-all" file can be prefixed with a space-separated | 
 | list of zero or more of the following event properties. | 
 |  | 
 | "disable" | 
 | --------- | 
 |  | 
 | If a specific trace event is going to be invoked a huge number of times, this | 
 | might have a noticeable performance impact even when the event is | 
 | programmatically disabled. | 
 |  | 
 | In this case you should declare such event with the "disable" property. This | 
 | will effectively disable the event at compile time (by using the "nop" backend), | 
 | thus having no performance impact at all on regular builds (i.e., unless you | 
 | edit the "trace-events-all" file). | 
 |  | 
 | In addition, there might be cases where relatively complex computations must be | 
 | performed to generate values that are only used as arguments for a trace | 
 | function. In these cases you can use 'trace_event_get_state_backends()' to | 
 | guard such computations, so they are skipped if the event has been either | 
 | compile-time disabled or run-time disabled. If the event is compile-time | 
 | disabled, this check will have no performance impact. | 
 |  | 
 | :: | 
 |  | 
 |     #include "trace.h"  /* needed for trace event prototype */ | 
 |      | 
 |     void *qemu_vmalloc(size_t size) | 
 |     { | 
 |         void *ptr; | 
 |         size_t align = QEMU_VMALLOC_ALIGN; | 
 |      | 
 |         if (size < align) { | 
 |             align = getpagesize(); | 
 |         } | 
 |         ptr = qemu_memalign(align, size); | 
 |         if (trace_event_get_state_backends(TRACE_QEMU_VMALLOC)) { | 
 |             void *complex; | 
 |             /* some complex computations to produce the 'complex' value */ | 
 |             trace_qemu_vmalloc(size, ptr, complex); | 
 |         } | 
 |         return ptr; | 
 |     } | 
 |  |