Design of NPTL Trace System


Version 0.3

2004/04/30


A set of ideas for adding a tracing mechanism to the Native POSIX Thread Library (NPTL) as a debugging facility.


Document Change Control

Change Summary

Version   Date of revision   Document status
0.1       2004/03/29         First draft
0.2       2004/04/05         Draft
0.3       2004/04/30         Guidelines draft

Approvers

Reviewers (Candidates)

Introduction

Currently, the only way to observe and debug the runtime execution of NPTL-based multithreaded programs is to use gdb or another debugging tool that relies on breakpoints and on reading static state. With multithreaded programs, such information is often irrelevant: one would rather get a trace of what happened during execution, without interfering with that execution. A program can of course trace itself, but it cannot get feedback from the NPTL's internal synchronization mechanisms without modifying the NPTL.

This document describes ideas for adding such a tracing mechanism to the NPTL, from the point of view of multithreaded program developers as well as that of NPTL maintainers.

Inspiration

This document is inspired by other proposals for NPTL trace mechanisms. It tries to take every need and wish into account, while nevertheless making a few definitive decisions. It is written in the hope of getting comments and of becoming design documentation before the tracing system is actually implemented, since such a system does seem to be wanted, whether as part of the official branch or as a patch.

Goals

Tracing the NPTL is relevant when a multithreaded program using it malfunctions, or when a developer wants to check whether threads behave and synchronize as expected. Traces should therefore help check thread concurrency and detect unwanted behavior or common problems such as deadlocks. They should also give enough information to track down the cause of the problems they reveal. Finally, traces should help check whether the NPTL itself behaves as expected.

Requirements to achieve these goals are:

Proposed method

Three parts

Building an NPTL tracing system involves three entities, studied in the three parts below: the traced program and its user, the NPTL itself, and the trace handler.

User's point of view

Needs

End users who want an execution trace of their multithreaded programs should set up:

Usage

To trace a multithreaded program by dynamically linking it with a "trace-enabled" NPTL, the user defines an environment variable indicating the trace handler to use, optional parameters for that handler, the trace verbosity and, possibly, optional parameters for the NPTL tracing mechanism itself (reserved for future extensions).
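The syntax of this variable (named PTHREAD_NPTL_TRACE in the initialization section) is not fixed yet. Purely as an illustration, the sketch below assumes a hypothetical format "handler.so[:level[:handler-args]]"; struct trace_config and parse_trace_config are made-up names, and parameters for the NPTL mechanism itself are left out.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of the PTHREAD_NPTL_TRACE value. */
struct trace_config {
  char handler[256];          /* path of the handler shared object */
  int level;                  /* requested detail level, 0 if absent */
  const char *handler_args;   /* points into the input, "" if absent */
};

/* Parses the assumed syntax "handler.so[:level[:handler-args]]".
   Returns 0 on success, -1 if the value is empty or malformed. */
static int parse_trace_config(const char *value, struct trace_config *cfg) {
  if (value == NULL || *value == '\0')
    return -1;

  const char *colon = strchr(value, ':');
  size_t len = colon ? (size_t)(colon - value) : strlen(value);
  if (len == 0 || len >= sizeof cfg->handler)
    return -1;
  memcpy(cfg->handler, value, len);
  cfg->handler[len] = '\0';
  cfg->level = 0;
  cfg->handler_args = "";
  if (colon == NULL)
    return 0;                 /* handler only */

  char *end;
  long level = strtol(colon + 1, &end, 10);
  if (end == colon + 1 || level < 0 || level > INT_MAX ||
      (*end != '\0' && *end != ':'))
    return -1;                /* missing or invalid level */
  cfg->level = (int) level;
  if (*end == ':')
    cfg->handler_args = end + 1;
  return 0;
}
```

Under this assumed syntax, PTHREAD_NPTL_TRACE=libmytrace.so:2:out=trace.log would select a (hypothetical) handler libmytrace.so at detail level 2 and pass it out=trace.log.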

Traced events

It should be possible to tell whether a trace comes from the NPTL, from the trace handler or from the traced program. Traces should indicate:

Example

A program could be traced with different kinds of handlers. For instance, a handler could:

The final output presentation and format will essentially depend on the trace handler chosen by the user. However, the default handler we plan to develop will write the traces to a file, if possible in a format that LTT's standard visualizer can read. We keep in mind that LTT and this trace mechanism do not trace the same things, and that their visualizers do not necessarily need the same features (for instance, we do not record which processor generated which event, and we may want to view only the events that concern a given mutex). The final decision will be taken once a few trace hooks have been successfully inserted in the NPTL code and a trace handler is able to output their traces in a trivial format.

NPTL maintainer's point of view

To let developers trace and debug their NPTL-based multithreaded programs, the NPTL itself must be changed to provide at least internal hooks. These hooks can then call external, customizable handler functions, but adding hooks to the NPTL is the very least that must be done to provide an NPTL tracing mechanism. Such modifications would probably be welcome in the official NPTL tree (i.e. the official glibc tree), but they will first be provided as a patch.

Here is a proposal for the minimal modifications required.

Initialization changes

At NPTL initialization, a function should read the PTHREAD_NPTL_TRACE environment variable and, if it requests a handler, load the handler's shared object, attach the trace hooks to its functions, and call the handler's nptl_trace_init(char**) function, passing it its parameters.

A boolean global variable of the NPTL should indicate whether an execution trace has been requested, for instance int __nptl_trace_enabled. An integer global variable should indicate the requested detail level, for instance int __nptl_trace_detail_level.

At NPTL termination, a function should call the handler's nptl_trace_fini() function, then unload the handler.
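A minimal sketch of these initialization and termination steps follows. It assumes the handler is loaded with the POSIX dlopen()/dlsym() interface and that PTHREAD_NPTL_TRACE simply holds the path of the handler's shared object. The names __nptl_trace_initialize and __nptl_trace_finalize, the void return type of nptl_trace_init, and passing the handler parameters and detail level as arguments are all assumptions of this sketch. On glibc versions before 2.34, programs calling dlopen() must also be linked with -ldl.

```c
#define _POSIX_C_SOURCE 200809L
#include <dlfcn.h>
#include <stddef.h>
#include <stdlib.h>

struct nptl_trace_event {
  char source, domain, event;
  size_t param_size;
  void *param;
};

/* Handler entry points (return types assumed). */
typedef void (*nptl_trace_init_fn)(char **);
typedef void (*nptl_trace_fini_fn)(void);
typedef void (*nptl_trace_getevent_fn)(struct nptl_trace_event);

int __nptl_trace_enabled = 0;
int __nptl_trace_detail_level = 0;
nptl_trace_getevent_fn nptl_trace_handler_getevent = NULL;  /* trace hook */

static void *trace_handler_dso = NULL;
static nptl_trace_fini_fn trace_handler_fini = NULL;

/* Called at NPTL initialization. Returns 0 if tracing is now enabled or was
   not requested, -1 if the requested handler could not be loaded. */
static int __nptl_trace_initialize(char **handler_args, int detail_level) {
  const char *path = getenv("PTHREAD_NPTL_TRACE");
  if (path == NULL || *path == '\0')
    return 0;                                   /* no trace requested */

  trace_handler_dso = dlopen(path, RTLD_NOW | RTLD_LOCAL);
  if (trace_handler_dso == NULL)
    return -1;

  /* Attach the hooks; POSIX guarantees dlsym() results convert to
     function pointers. */
  nptl_trace_init_fn init =
      (nptl_trace_init_fn) dlsym(trace_handler_dso, "nptl_trace_init");
  nptl_trace_getevent_fn getevent =
      (nptl_trace_getevent_fn) dlsym(trace_handler_dso, "nptl_trace_handler_getevent");
  trace_handler_fini =
      (nptl_trace_fini_fn) dlsym(trace_handler_dso, "nptl_trace_fini");
  if (getevent == NULL) {                       /* event entry point is mandatory */
    dlclose(trace_handler_dso);
    trace_handler_dso = NULL;
    trace_handler_fini = NULL;
    return -1;
  }

  if (init != NULL)
    init(handler_args);
  nptl_trace_handler_getevent = getevent;
  __nptl_trace_detail_level = detail_level;
  __nptl_trace_enabled = 1;                     /* enable last: hooks are ready */
  return 0;
}

/* Called at NPTL termination: stop tracing, let the handler flush, unload it. */
static void __nptl_trace_finalize(void) {
  if (!__nptl_trace_enabled)
    return;
  __nptl_trace_enabled = 0;
  if (trace_handler_fini != NULL)
    trace_handler_fini();
  nptl_trace_handler_getevent = NULL;
  dlclose(trace_handler_dso);
  trace_handler_dso = NULL;
  trace_handler_fini = NULL;
}
```

Setting __nptl_trace_enabled only after the hook is attached means that no event is generated while the handler is still half-initialized.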

Internal changes

At each point in the code where a significant event occurs, a trace must be generated if the event's detail level is less than or equal to the requested one. This could be done by building a struct nptl_trace_event and passing it to the handler's nptl_trace_handler_getevent(struct nptl_trace_event) function.

Ideally, whether to generate a trace should be decided both at compile time and at runtime. At runtime, a single test on the __nptl_trace_enabled and __nptl_trace_detail_level variables should be made before building the structure and calling the handler function, so that the library's normal behavior is barely disturbed when traces are not enabled. At compile time, a preprocessor check on an ENABLE_NPTL_TRACE macro around each event generator preserves the ability to build the library without any trace support.

We would thus define a helper function such as:

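#ifdef ENABLE_NPTL_TRACE
extern int __nptl_trace_enabled;      /* set when a trace was requested */
extern int __nptl_trace_detail_level; /* requested detail level */
#endif

/* Returns 1 if an event of the given detail level must be traced. */
static inline int __must_generate_trace_level(int level) {
#ifdef ENABLE_NPTL_TRACE
  return __nptl_trace_enabled && __nptl_trace_detail_level >= level;
#else
  (void) level;  /* tracing compiled out: never trace */
  return 0;
#endif
}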

Each event occurrence in the code would then look like:

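if (__must_generate_trace_level(event_detail_level)) {
  /* Placeholders: source, domain, event and parameters of this event. */
  struct nptl_trace_event ev = { source, domain, event, param_size, param };
  (*nptl_trace_handler_getevent)(ev);
}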

nptl_trace_event structure

The NPTL trace event structure contains an enumerated value identifying the event source (NPTL, handler or program), two others identifying the event itself (the event domain and the event identity within that domain), and a void pointer to the memory holding the event parameters when needed, along with the size of those parameters.

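struct nptl_trace_event {
   char source;        /* NPTL, handler or program */
   char domain;        /* function or function group, e.g. mutexes */
   char event;         /* event identity within the domain */
   size_t param_size;  /* size of the parameters, in bytes */
   void* param;        /* event parameters, or NULL if none */
};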

We decided not to provide timestamps, because they are a trace handler concern and may be irrelevant anyway. We plan to use an atomically incremented event counter instead.
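As a sketch of such a counter, assuming C11 <stdatomic.h> is available (the NPTL would more likely use its own atomic primitives), an atomic fetch-and-add hands out unique, increasing sequence numbers without a lock or a system call. The names nptl_trace_event_counter and nptl_trace_next_seq are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical global event counter shared by all threads of the process. */
static atomic_ulong nptl_trace_event_counter = 0;

/* Returns the sequence number of a new event. Every call gets a distinct
   value. Relaxed ordering suffices: increments are totally ordered on the
   counter itself, so if one increment happens before another (e.g. across
   a mutex unlock/lock pair), it receives the smaller number. */
static inline unsigned long nptl_trace_next_seq(void) {
  return atomic_fetch_add_explicit(&nptl_trace_event_counter, 1,
                                   memory_order_relaxed);
}
```

On LP64 targets such as x86-64 Linux, unsigned long is 64 bits wide, so wrap-around is not a practical concern there.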

Trace parameter formats depend on the traced event itself, so the structure cannot be more precise. The memory holding the parameters may be freed as soon as nptl_trace_handler_getevent returns, though: it is the handler's responsibility to copy and store them.
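A sketch of how a handler could honor this rule, assuming a hypothetical fixed per-record parameter limit (TRACE_MAX_PARAM) and records preallocated by the handler: the event is copied field by field, and oversized parameters are truncated and flagged. Not calling malloc() on this path keeps the entry point from allocating memory, and possibly making a system call, inside an NPTL primitive.

```c
#include <stddef.h>
#include <string.h>

struct nptl_trace_event {
  char source, domain, event;
  size_t param_size;
  void *param;
};

#define TRACE_MAX_PARAM 32   /* hypothetical per-record parameter limit */

/* Handler-owned copy of an event, independent of the NPTL's memory. */
struct trace_record {
  char source, domain, event;
  unsigned char truncated;           /* 1 if the parameters did not fit */
  size_t param_size;                 /* number of bytes actually stored */
  unsigned char param[TRACE_MAX_PARAM];
};

/* Copies ev into preallocated storage so the NPTL may free ev->param as
   soon as the handler returns. No memory is allocated here. */
static void trace_record_copy(struct trace_record *rec,
                              const struct nptl_trace_event *ev) {
  size_t n = ev->param_size < TRACE_MAX_PARAM ? ev->param_size : TRACE_MAX_PARAM;
  rec->source = ev->source;
  rec->domain = ev->domain;
  rec->event = ev->event;
  rec->truncated = ev->param_size > TRACE_MAX_PARAM;
  rec->param_size = n;
  if (n > 0)
    memcpy(rec->param, ev->param, n);
}
```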

The domain and event fields together identify an event, and make it reasonably simple to add new events inside the NPTL. The domain identifies a function or a group of functions (for instance the mutex functions), and the event identifies a unique event within the domain.
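To illustrate this two-level identification, here is a sketch with hypothetical enumerations: the thread events are taken from Annex A, while the mutex domain and its events are invented for the example, as is the helper nptl_trace_make_event. Because numbering restarts in each domain, the same value (0 here) means NEW_THREAD_CREATED in the thread domain and MUTEX_LOCK in the mutex domain; only the (domain, event) pair is unique.

```c
#include <stddef.h>

struct nptl_trace_event {
  char source, domain, event;
  size_t param_size;
  void *param;
};

/* Hypothetical identifiers, for illustration only. */
enum nptl_trace_source {
  NPTL_TRACE_SRC_NPTL, NPTL_TRACE_SRC_HANDLER, NPTL_TRACE_SRC_PROGRAM
};
enum nptl_trace_domain { NPTL_TRACE_DOM_THREAD, NPTL_TRACE_DOM_MUTEX };

/* Event numbering restarts at 0 in each domain. */
enum nptl_trace_thread_event {
  NPTL_TRACE_NEW_THREAD_CREATED, NPTL_TRACE_THREAD_BEGIN, NPTL_TRACE_THREAD_END
};
enum nptl_trace_mutex_event { NPTL_TRACE_MUTEX_LOCK, NPTL_TRACE_MUTEX_UNLOCK };

/* Builds an event generated by the NPTL itself. */
static struct nptl_trace_event
nptl_trace_make_event(enum nptl_trace_domain domain, int event,
                      void *param, size_t param_size) {
  struct nptl_trace_event ev = {
    (char) NPTL_TRACE_SRC_NPTL, (char) domain, (char) event, param_size, param
  };
  return ev;
}
```

Keeping each domain's list short also keeps the values within the range of the char fields (0 to 127 is portable whether char is signed or not).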

Supporting user events

Because each event structure has a field indicating its source, supporting traces that are specific to the trace handler or to the traced program is trivial. The trace handler may add its own traces to the stream without needing to communicate with the other two entities. The traced program needs a function to generate events and add them to the stream. Since making the traced program aware of the trace handler seems an unnecessary complication, the NPTL should provide the program with an nptl_trace_send_user_event(struct nptl_trace_event) function, which forwards the event to the trace handler.
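A minimal sketch of nptl_trace_send_user_event, assuming the hook pointer nptl_trace_handler_getevent is NULL while tracing is disabled, and using hypothetical source values. Overwriting the source field, so that a program cannot pass its events off as NPTL or handler events, is a design choice of this sketch rather than a requirement stated above. demo_getevent is a stand-in handler used only to show the flow.

```c
#include <stddef.h>

struct nptl_trace_event {
  char source, domain, event;
  size_t param_size;
  void *param;
};

/* Hypothetical source identifiers. */
enum { NPTL_TRACE_SRC_NPTL, NPTL_TRACE_SRC_HANDLER, NPTL_TRACE_SRC_PROGRAM };

/* Trace hook attached at initialization; NULL while tracing is disabled. */
static void (*nptl_trace_handler_getevent)(struct nptl_trace_event) = NULL;

/* Exported by the NPTL to traced programs. */
void nptl_trace_send_user_event(struct nptl_trace_event ev) {
  if (nptl_trace_handler_getevent == NULL)
    return;                            /* tracing disabled: drop the event */
  ev.source = NPTL_TRACE_SRC_PROGRAM;  /* programs cannot pose as the NPTL */
  nptl_trace_handler_getevent(ev);
}

/* Stand-in handler entry point, for demonstration only. */
static int demo_last_source = -1;
static int demo_last_event = -1;
static void demo_getevent(struct nptl_trace_event ev) {
  demo_last_source = ev.source;
  demo_last_event = ev.event;
}
```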

Trace handler developer's point of view

Collecting traces

The trace handler is in charge of collecting the traces generated during execution. To do so, it must implement a set of entry points that the modified NPTL calls to submit traces of occurring events. In theory, only this is mandatory, but a few constraints must be kept in mind to produce something usable. Since the only features we plan to add to the NPTL itself are the trace-generating hooks and the initialization of the trace mechanism, the trace handler carries all the remaining complexity.

Constraints

Since trace handler entry points are called on thread events, they run in the thread that triggered the event, inside an NPTL primitive function. This implies that these entry points and the trace collection mechanisms must respect a few constraints:

Proposed methods
Native specific method

Since no system call can be made while tracing, collected traces apparently have to be stored in memory and output after the program completes. An immediate issue with this method is the size of the buffer storing the traces: the buffer cannot be resized during execution, as resizing is time-consuming at best and thread-unsafe.

One proposal is to create a "big enough" circular buffer at initialization time, and to share the buffer memory with another process in charge of reading and outputting its content. How exactly traces are output is where the handler becomes really customizable: they could be written to standard output in a readable format, stored in a file for later analysis, or fed to some runtime analysis tool.

How exactly the buffer is built is still an open issue.
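One possible shape, sketched here under stated assumptions, is a fixed-size ring of fixed-size records in anonymous shared memory created with mmap() before fork(), so that the reader process shares the same pages. Writers reserve a slot by atomically incrementing a ticket counter and publish each record with a per-slot sequence stamp, which needs neither a lock nor a system call. The sketch assumes the ring is big enough that the reader consumes ticket t before ticket t + RING_SLOTS is handed out; without that guarantee, a lapping writer could overwrite a slot while it is being read. All names, sizes and the record layout are hypothetical.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS in glibc */
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>

#define RING_SLOTS 1024          /* hypothetical "big enough" capacity */

struct ring_slot {
  atomic_ulong seq;              /* ticket + 1 once the record is published */
  char domain, event;
  unsigned long payload;         /* hypothetical fixed-size event data */
};

struct trace_ring {
  atomic_ulong head;             /* next ticket to hand out */
  struct ring_slot slots[RING_SLOTS];
};

/* Maps the ring into anonymous shared memory at initialization time; a
   reader process created afterwards with fork() shares the same pages. */
static struct trace_ring *ring_create(void) {
  void *p = mmap(NULL, sizeof(struct trace_ring), PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return NULL;
  struct trace_ring *r = p;
  atomic_init(&r->head, 0);
  for (size_t i = 0; i < RING_SLOTS; i++)
    atomic_init(&r->slots[i].seq, 0);
  return r;
}

/* Writer side, called from trace hooks: no lock, no system call. */
static void ring_put(struct trace_ring *r, char domain, char event,
                     unsigned long payload) {
  unsigned long t = atomic_fetch_add_explicit(&r->head, 1, memory_order_relaxed);
  struct ring_slot *s = &r->slots[t % RING_SLOTS];
  s->domain = domain;
  s->event = event;
  s->payload = payload;
  atomic_store_explicit(&s->seq, t + 1, memory_order_release);  /* publish */
}

/* Reader side: copies the record for ticket t and returns 1 if it has been
   published and not yet replaced by a later ticket, 0 otherwise. */
static int ring_get(struct trace_ring *r, unsigned long t, char *domain,
                    char *event, unsigned long *payload) {
  struct ring_slot *s = &r->slots[t % RING_SLOTS];
  if (atomic_load_explicit(&s->seq, memory_order_acquire) != t + 1)
    return 0;
  *domain = s->domain;
  *event = s->event;
  *payload = s->payload;
  return 1;
}
```

The reader process would simply poll tickets 0, 1, 2, ... with ring_get and output each record it obtains in whatever format the handler chooses.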

Userspace ported RelayFS method

RelayFS already implements a lockless, thread-safe, in-memory storage algorithm, and it is claimed that making it run in userspace without any system call should be fairly doable. We have not studied this method much yet, but we plan to when we try to implement a smart trace handler.

Temporary trivial method

To be able to start implementing the NPTL trace hooks and initialization, as well as the trace handler entry points, we will implement a trivial trace handler that simply reserves a big buffer and stops storing new traces once it is full. It will still write an output file in a simple format at program termination, so that at least the beginning of an execution can be traced. Of course, this method is only meant for early debugging.
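A sketch of this temporary handler, with hypothetical names, capacity and output format: each event reserves a slot index with an atomic increment, events beyond the capacity are dropped (but still counted), and the buffer is written out as text at termination.

```c
#include <stdatomic.h>
#include <stdio.h>

#define TRIVIAL_CAPACITY 4096   /* hypothetical buffer size */

struct trivial_record { unsigned long seq; char source, domain, event; };

static struct trivial_record trivial_buf[TRIVIAL_CAPACITY];
static atomic_ulong trivial_next = 0;   /* next slot; keeps counting when full */

/* Called for every event; never blocks and never allocates. Once the buffer
   is full, further events are dropped. */
static void trivial_store(char source, char domain, char event) {
  unsigned long i = atomic_fetch_add_explicit(&trivial_next, 1,
                                              memory_order_relaxed);
  if (i >= TRIVIAL_CAPACITY)
    return;                             /* full: drop this event */
  trivial_buf[i] = (struct trivial_record){ i, source, domain, event };
}

/* Called at program termination, once no thread produces events anymore.
   Writes one "seq source domain event" line per stored event and returns
   the number of stored events. */
static unsigned long trivial_dump(FILE *out) {
  unsigned long total = atomic_load(&trivial_next);
  unsigned long stored = total < TRIVIAL_CAPACITY ? total : TRIVIAL_CAPACITY;
  for (unsigned long i = 0; i < stored; i++)
    fprintf(out, "%lu %d %d %d\n", trivial_buf[i].seq, trivial_buf[i].source,
            trivial_buf[i].domain, trivial_buf[i].event);
  if (total > stored)
    fprintf(out, "# %lu events dropped\n", total - stored);
  return stored;
}
```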

Visualizing traces

Once traces have been generated by the NPTL, submitted to the trace handler, stored in a buffer and read back by a parallel process, they still need to be output. Deciding how to output them is the trace handler's responsibility, but a few methods and directions can be listed.

Annex A: traced events list

(Work in progress)

Thread events

Event name           Parameters
NEW_THREAD_CREATED   void* address of the start routine; pthread_t id of the created thread
THREAD_BEGIN         none
THREAD_END           void* thread exit value, as passed to pthread_exit() (not very useful, but might help in some cases)
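The parameter layouts in this table could be expressed as structures passed through the param and param_size fields of the event. The structure names and the helper nptl_trace_wrap_params below are hypothetical. The sketch stores the start routine with its real function-pointer type rather than as a void*, because ISO C does not guarantee that a function pointer survives conversion to void*.

```c
#include <pthread.h>
#include <stddef.h>

struct nptl_trace_event {
  char source, domain, event;
  size_t param_size;
  void *param;
};

/* Hypothetical parameter layouts for the thread events above. */
struct nptl_trace_new_thread_created_param {
  void *(*start_routine)(void *);  /* start routine given to pthread_create() */
  pthread_t created;               /* id of the created thread */
};

struct nptl_trace_thread_end_param {
  void *retval;                    /* thread exit value */
};

/* THREAD_BEGIN carries no parameters: param is NULL and param_size is 0. */

/* Wraps a parameter structure into an event. The structure only needs to
   live until nptl_trace_handler_getevent returns. */
static struct nptl_trace_event
nptl_trace_wrap_params(char source, char domain, char event,
                       void *param, size_t param_size) {
  struct nptl_trace_event ev = { source, domain, event, param_size, param };
  return ev;
}
```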

End of document