 |
nptl.BullOpenSource.Org NPTL Tests & Trace project. Homepage.
|
Tony Project Maintener

Joined: 21 Apr 2004 Posts: 9
|
Posted: Thu Apr 22, 2004 1:58 pm Post subject: Frank Levine <levinef at us ibm com> |
|
|
| Frank Levine wrote: |
Hi Tony,
You might want to consider other approaches to tracing. One would be an
approach similar to kernel hooks, where NPTL implements the mechanism
that calls out to the trace facilities. The trace
facilities could dynamically connect to and disconnect from NPTL. The trace
facilities would be responsible for time stamps and whatever else they
implement. I think this would work with LTT and would definitely work
with Performance Inspector. Both LTT and Performance Inspector provide
ways to format tracing information. The mechanism is fairly efficient:
when there are no registered call outs, it is simply a branch, or a check
and a branch. If you are willing to dynamically change NPTL code, this
could even be an unconditional branch, which has virtually no performance
overhead on many hardware architectures. Another approach would be to
allow exactly one call out per instrumentation placement, where there is
simply a check to see if there is a call out. At any rate, I believe it is
important to support a limited number of call outs [dispatch, thread
creates, and thread destroys] without requiring a special patch.
Frank
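A minimal sketch of the "one call out per instrumentation placement" idea, assuming C11 atomics. Every name here (`nptl_hook`, `nptl_callout_register`, `nptl_hook_fire`, the example call out) is hypothetical and is not part of NPTL; the point is only that an idle hook costs one load, one check and one branch.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical event set, limited to the call outs named in the post. */
enum nptl_hook { HOOK_DISPATCH, HOOK_THREAD_CREATE, HOOK_THREAD_DESTROY, HOOK_COUNT };

/* Hypothetical call-out signature: the trace facility gets the event and a
 * thread id, and is responsible for time stamps and everything else. */
typedef void (*nptl_callout_fn)(enum nptl_hook event, unsigned long thread_id);

/* Exactly one call out per instrumentation point; a null entry means "none". */
static _Atomic(nptl_callout_fn) callouts[HOOK_COUNT];

/* Connect a trace facility. Returns 0 on success, -1 if the slot is taken. */
int nptl_callout_register(enum nptl_hook event, nptl_callout_fn fn)
{
    nptl_callout_fn expected = (nptl_callout_fn)0;
    if ((unsigned)event >= HOOK_COUNT || fn == (nptl_callout_fn)0)
        return -1;
    return atomic_compare_exchange_strong(&callouts[event], &expected, fn) ? 0 : -1;
}

/* Disconnect whatever is registered for this event. */
void nptl_callout_unregister(enum nptl_hook event)
{
    if ((unsigned)event < HOOK_COUNT)
        atomic_store(&callouts[event], (nptl_callout_fn)0);
}

/* The instrumentation point itself: a load, a check and a branch when idle. */
static inline void nptl_hook_fire(enum nptl_hook event, unsigned long thread_id)
{
    nptl_callout_fn fn = atomic_load_explicit(&callouts[event], memory_order_acquire);
    if (fn != (nptl_callout_fn)0)
        fn(event, thread_id);
}

/* Example trace facility, for illustration only: remembers the last thread. */
static unsigned long example_last_thread;
static unsigned example_calls;

static void example_callout(enum nptl_hook event, unsigned long thread_id)
{
    (void)event;
    example_last_thread = thread_id;
    example_calls++;
}
```

A trace facility would call `nptl_callout_register(HOOK_DISPATCH, example_callout)` to connect and `nptl_callout_unregister(HOOK_DISPATCH)` to disconnect, at any time while the program runs.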
|
|
|
|
 |
Tony Project Maintener

Joined: 21 Apr 2004 Posts: 9
|
Posted: Thu Apr 22, 2004 2:00 pm Post subject: Mayeul Marguet |
|
|
| Mayeul wrote: |
I am not sure I understood all of it. Here are answers on what I
understood are your points :
Adding hooks inside the NPTL to call out trace facilities is already
pretty much the idea I tried to describe. I agree that letting the trace
facilities take care of timestamps makes sense: the less NPTL
itself implements, the better. The same goes for identifying the currently
running thread, I guess, if possible. We still need to identify the hooks
and probably parameters.
As for LTT, Performance Inspector and other toolkits, the reasons why we
don't use them are:
- We want the trace mechanism to depend as little as possible on
projects other than NPTL itself, in the hope that it will
eventually be added to the official NPTL branch. This does not rule out
using LTT or Performance Inspector in the trace facilities, though.
- We do not want to add NPTL-unrelated syscalls while tracing, as it
would most likely break "normal" scheduling. As far as I have understood
existing trace toolkits, they all eventually rely on syscalls while
tracing userspace programs.
We would indeed appreciate being able to trace even a limited set of
events without requiring a patch, but it seems pretty much impossible. Well,
it is probably possible with [ls]trace, but that seems really limited, and
it leads to unwanted syscalls. If you were thinking about another
method, I am afraid I know nothing about it, and I am willing to learn.
Please let me know if I got something wrong. Thank you again.
|
Last edited by Tony on Thu Apr 22, 2004 2:58 pm; edited 1 time in total |
|
|
 |
Tony Project Maintener

Joined: 21 Apr 2004 Posts: 9
|
Posted: Thu Apr 22, 2004 2:05 pm Post subject: Answer from Frank Levine |
|
|
Frank's answers are the lines beginning with >> in the first column.
| Frank wrote: |
I am not sure I understood all of it. Here are answers on what I
understood are your points :
Adding hooks inside the NPTL to call out trace facilities is already
pretty much the idea I tried to describe.
>> I was trying to distinguish between environment variables and
>> registering call outs. The registering of call outs tends to be
>> dynamic and programmable. The use of environment variables suggests
>> that the support is only allowed at initialization.
I agree that letting the trace
facilities take care of timestamps makes sense: the less NPTL
itself implements, the better. The same goes for identifying the currently
running thread, I guess, if possible. We still need to identify the hooks
and probably parameters.
As for LTT, Performance Inspector and other toolkits, the reasons why we
don't use them are:
- We want the trace mechanism to depend as little as possible on
projects other than NPTL itself, in the hope that it will
eventually be added to the official NPTL branch. This does not rule out
using LTT or Performance Inspector in the trace facilities, though.
- We do not want to add NPTL-unrelated syscalls while tracing, as it
would most likely break "normal" scheduling. As far as I have understood
existing trace toolkits, they all eventually rely on syscalls while
tracing userspace programs.
>> I would like to see these toolkits be able to use your call out mechanism.
>> It is only while tracing that there would be any significant overhead.
>> It would be great if you could figure out a way to trace without affecting
>> performance, but if the placement of the hooks is done properly and the
>> routines being called are efficient, then we may be able to have negligible
>> impact. Specifically, having a call out when a thread is dispatched could allow
>> Performance Inspector to write that thread id to a mapped memory location for
>> use by tprof with only minor perturbation.
We would indeed appreciate being able to trace even a limited set of
events without requiring a patch, but it seems pretty much impossible. Well,
it is probably possible with [ls]trace, but that seems really limited, and
it leads to unwanted syscalls. If you were thinking about another
method, I am afraid I know nothing about it, and I am willing to learn.
>> Again, I was trying to suggest that there be a few places where at
>> least one user could register to be called. Essentially, NPTL allows
>> a trace facility to be called. The trace facility may or may not actually
>> make a syscall.
Please let me know if I got something wrong. Thank you again.
|
Last edited by Tony on Thu Apr 22, 2004 3:00 pm; edited 1 time in total |
|
|
 |
Tony Project Maintener

Joined: 21 Apr 2004 Posts: 9
|
Posted: Thu Apr 22, 2004 2:08 pm Post subject: Karim Yaghmour <karim at opersys com> answer |
|
|
| Karim Yaghmour wrote: |
Just thought I'd respond to this one:
Frank Levine wrote:
> - We do not want to add NPTL-unrelated syscalls while tracing, as it
> would most likely break "normal" scheduling. As far as I have understood
> existing trace toolkits, they all eventually rely on syscalls while
> tracing userspace programs.
True ... for now that is. With the lockless scheme now part of relayfs
(which LTT relies on), it should be fairly straightforward to map the
buffer index in user-space (in addition to the mmapped trace buffer) and
let the user-space do the same cmpxchg the kernel part of relayfs
currently already does. In other words, you would get user-space
tracing in relayfs without any system calls and other such disturbances.
Background info.: Using the cmpxchg, it's just a matter of trying to
allocate space, and retrying in case of failure (i.e. user-space got
interrupted by some kernel code that put something in the buffer, so
user-space must retry.) Once you've incremented the index, it's just a
matter of writing the actual data.
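The allocate-and-retry loop described above can be sketched as follows. This is only an illustration using C11 atomics on an ordinary in-process structure; `struct trace_buf`, `trace_reserve` and `trace_write` are hypothetical names, not the relayfs interface, and in the scheme described the index and the buffer would live in memory shared with the kernel.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define TRACE_BUF_SIZE 4096u

/* Hypothetical layout: a shared write index plus the trace buffer itself. */
struct trace_buf {
    _Atomic size_t index;               /* offset of the next free byte */
    unsigned char data[TRACE_BUF_SIZE];
};

/* Reserve len bytes with a compare-and-exchange loop.
 * Returns the start offset, or (size_t)-1 if the space is not available. */
size_t trace_reserve(struct trace_buf *b, size_t len)
{
    size_t old = atomic_load(&b->index);
    for (;;) {
        if (len > TRACE_BUF_SIZE - old)
            return (size_t)-1;
        /* If another writer moved the index first, the exchange fails,
         * old is refreshed with the current value, and we simply retry. */
        if (atomic_compare_exchange_weak(&b->index, &old, old + len))
            return old;
    }
}

/* Once the index has been advanced, writing the data needs no further
 * synchronization with other writers: the reserved range is ours. */
int trace_write(struct trace_buf *b, const void *event, size_t len)
{
    size_t off = trace_reserve(b, len);
    if (off == (size_t)-1)
        return -1;
    memcpy(b->data + off, event, len);
    return 0;
}
```

This sketch simply refuses events once the buffer is full; how a full buffer is switched or flushed is a separate question that it does not address.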
All in all, I really think it's worth looking at using relayfs for NPTL.
By using relayfs, NPTL should actually not need to rely on any of LTT's
kernel tracing infrastructure. The trace reading tools may need some
extension, but it really isn't esoteric. Plus, relayfs can be loaded as a
module (i.e. if relayfs were extended for user-space index sharing, NPTL
could start using it without having to wait for kernel inclusion.)
[I've cc'ed the other people involved in LTT and relayfs.]
Karim
|
Last edited by Tony on Thu Apr 22, 2004 3:02 pm; edited 1 time in total |
|
|
 |
Tony Project Maintener

Joined: 21 Apr 2004 Posts: 9
|
Posted: Thu Apr 22, 2004 2:34 pm Post subject: Perez-Gonzalez, Iñaky <inaky perez-gonzalez a |
|
|
Iñaky's answer to Mayeul's comments.
| Iñaky wrote: |
> > ... [code snippets omitted] ...
> > or even fold that into a static inline wrapper that does the if for you
> > and is #defined out as NOP if not enabled at compile time. The point here
> > is to suggest examples that clarify and don't make the code too thick.
> > I know it is an implementation problem, but you know Ulrich
>
> Your suggestion seems fine. Didn't think it was necessary to write
> actual code as an example.
Agreed--it is an implementation detail.
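For readers who do want to see it, here is a minimal sketch of such a wrapper. The macro `NPTL_TRACE_ENABLED`, the function names and the example handler are all hypothetical; a real build would set the switch from the command line rather than default it to 1 as done here.

```c
#include <stddef.h>

/* Hypothetical build-time switch, e.g. -DNPTL_TRACE_ENABLED=0 to disable. */
#ifndef NPTL_TRACE_ENABLED
#define NPTL_TRACE_ENABLED 1
#endif

typedef void (*trace_handler_fn)(int event_id);

#if NPTL_TRACE_ENABLED
static trace_handler_fn trace_handler;  /* NULL until a handler is installed */

static inline void trace_set_handler(trace_handler_fn fn)
{
    trace_handler = fn;
}

/* The wrapper "does the if for you", so call sites stay one line long. */
static inline void trace_event(int event_id)
{
    if (trace_handler != NULL)
        trace_handler(event_id);
}
#else
/* Compiled out: call sites expand to a no-op. */
#define trace_set_handler(fn) ((void)0)
#define trace_event(event_id) ((void)0)
#endif

/* Example handler, for illustration only. */
static int last_event_seen = -1;

static void remember_event(int event_id)
{
    last_event_seen = event_id;
}
```

A call site then reads `trace_event(3);` whether or not tracing is compiled in, which keeps the instrumented code from getting too thick.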
> > Where timestamps are not available, they could be substituted with an atomic
> > counter, so at least you get a way to sort them. In architectures/platforms
> > where there is no way to have a synced up TSC, it could be combined with
> > the sequential atomic counter to help in the sync up. In arches that have
> > none at all (like most ARMs, afaik), it could be done via kernel calls to get the
> > CLOCK_REALTIME value.
>
> Makes sense. Thank you for the suggestion.
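A small sketch of the counter-plus-clock combination suggested above, assuming C11 atomics and POSIX `clock_gettime`. The names `struct trace_stamp`, `trace_next_seq` and `trace_stamp_now` are hypothetical. The sequence number always gives a sort order; the clock value is filled in only when the call succeeds.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stamp: a global sequence number plus an optional clock value. */
struct trace_stamp {
    uint64_t seq;   /* always valid, gives a total order of events */
    uint64_t ns;    /* CLOCK_REALTIME in nanoseconds, 0 if unavailable */
};

static _Atomic uint64_t trace_seq;

/* Atomic counter: a way to sort events even without any usable clock. */
uint64_t trace_next_seq(void)
{
    return atomic_fetch_add(&trace_seq, 1);
}

/* Fallback path through the kernel, as suggested for platforms with no
 * cheap time source. */
struct trace_stamp trace_stamp_now(void)
{
    struct trace_stamp s;
    struct timespec ts;

    s.seq = trace_next_seq();
    s.ns = 0;
    if (clock_gettime(CLOCK_REALTIME, &ts) == 0)
        s.ns = (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
    return s;
}
```

Whether `clock_gettime` actually enters the kernel depends on the platform, so the overhead of this fallback would need to be measured rather than assumed.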
>
>
> > On the buffer storage: in my experience, I have had to use one of the three
> > methods for different problems at different times. I'd rule out the per-processor
> > one because it is kind of unworkable in user space.
>
> I would too, but never really checked whether possible methods would
> exist, so it couldn't be completely ruled out.
The main problem is how to determine which CPU you are running on
and how to make sure you aren't moved from one CPU to another while
you are logging--and without kernel help (big overhead), it is kind
of very difficult to do it.
In many cases, something that I have also used has been to associate
a buffer with each thread at thread creation time and then, at
destruction time, flush it to a global buffer [only the actual
thread exit handler had access to it].
It took some overhead on creation/destruction, but by preallocating a
series of small buffers and playing ping-pong with them, it worked
pretty well; in a nutshell, the ping pong game was: small same-size
buffers are allocated and put in a global free buffer list--each created
thread gets one from that list. When logging, log to the thread-specific
buffer; if full, attach the full buffer to a global full buffer list,
pick up a free one from the global free list and proceed with logging.
Another thread (the collector), runs around emptying the full buffer list
to a file and moving the now emptied buffers to the global free list; as
well, it makes sure there is enough space in the global free list by
allocating more if needed [although this didn't work too well, it was better
to repeat the execution with a higher number of buffers allocated at the
beginning].
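The ping-pong game described above can be sketched roughly as follows. This is an illustration under assumptions, not Iñaky's code: all names are hypothetical, one mutex guards both global lists, the pool is preallocated by the caller, and an event is dropped when no free buffer is available.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define BUF_BYTES 256               /* small, same-size buffers */

struct tbuf {
    struct tbuf *next;
    size_t used;
    char data[BUF_BYTES];
};

/* One lock guards both global lists; it is taken only on buffer swaps. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct tbuf *free_list, *full_list;

static struct tbuf *pop(struct tbuf **list)
{
    pthread_mutex_lock(&list_lock);
    struct tbuf *b = *list;
    if (b != NULL)
        *list = b->next;
    pthread_mutex_unlock(&list_lock);
    return b;
}

static void push(struct tbuf **list, struct tbuf *b)
{
    pthread_mutex_lock(&list_lock);
    b->next = *list;
    *list = b;
    pthread_mutex_unlock(&list_lock);
}

/* Preallocation: put n caller-provided buffers on the global free list. */
void pool_init(struct tbuf *pool, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        pool[i].used = 0;
        push(&free_list, &pool[i]);
    }
}

/* Each created thread gets one buffer; NULL if the pool is exhausted. */
struct tbuf *buf_get(void)
{
    return pop(&free_list);
}

/* Log to the thread's own buffer (*cur); when it is full, hand it to the
 * full list and continue in a fresh one. Returns -1 if the event is dropped. */
int buf_log(struct tbuf **cur, const void *ev, size_t len)
{
    if (len > BUF_BYTES)
        return -1;
    if (*cur == NULL || (*cur)->used + len > BUF_BYTES) {
        struct tbuf *fresh = buf_get();
        if (fresh == NULL)
            return -1;              /* no free buffer: keep the old one */
        if (*cur != NULL)
            push(&full_list, *cur);
        *cur = fresh;
    }
    memcpy((*cur)->data + (*cur)->used, ev, len);
    (*cur)->used += len;
    return 0;
}

/* At thread destruction: flush the current buffer to the full list. */
void buf_release(struct tbuf **cur)
{
    if (*cur != NULL) {
        push(&full_list, *cur);
        *cur = NULL;
    }
}

/* Collector: write full buffers out and recycle them. Returns bytes written. */
size_t collector_drain(FILE *out)
{
    size_t total = 0;
    struct tbuf *b;
    while ((b = pop(&full_list)) != NULL) {
        total += fwrite(b->data, 1, b->used, out);
        b->used = 0;
        push(&free_list, b);
    }
    return total;
}
```

The full list here is last-in, first-out, so buffers reach the file out of order; each event would need its own stamp for the analyzer to sort on. The collector would normally run `collector_drain` in a loop from its own thread.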
The only thing I never got to solve was how to recover information when
a thread was killed and didn't have the chance to clean up after itself
and flush its current buffer out to the global full list (and this list
to be flushed to disk). I was using cancellation handlers, so sig11s
(SIGSEGV) would kill most of that.
It probably could be handled by hacking it into the library (instead of
relying on the pthread interface). As well, for added safety, you could
do it by having a separate process mapping the memory where the buffers
are allocated, so it cannot get killed by a sig11 to the traced process
[gets to be more complex though].
> > I think a solution could be to let the user control which method to use using
> > environment variables, so he can decide depending on the type of application.
> > Same thing with the size and also a way to flush if the circular buffer fills up
> > [this toggle could do, if on: if I am about to fill up the buffer, flush to file so
> > and so--mark as an event the time just before we issued the write and the time
> > just before we regained control after it, so the analyzer can take into account
> > flushing times].
>
> So you suggest we provide a default trace handler that can use both
> methods. Setting it up could be done with the 'trace handler parameters'
> field of the proposed PTHREAD_NPTL_TRACE.
Yeah, as long as you use function pointers, you can instantiate different
methods [have a global structure with the methods].
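A minimal sketch of such a global structure of methods, with the method chosen by name. Everything here is hypothetical: the structure layout, the two toy methods, and the environment variable name `NPTL_TRACE_METHOD`, which is an assumption for illustration and not the proposed PTHREAD_NPTL_TRACE.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table of operations that a trace method must provide. */
struct trace_methods {
    const char *name;
    void (*log)(const void *event, size_t len);
    void (*flush)(void);
};

/* Two toy methods: one throws events away, one only counts buffered bytes. */
static size_t discarded_bytes, buffered_bytes;

static void discard_log(const void *event, size_t len)
{
    (void)event;
    discarded_bytes += len;
}

static void discard_flush(void)
{
}

static void buffer_log(const void *event, size_t len)
{
    (void)event;
    buffered_bytes += len;
}

static void buffer_flush(void)
{
    buffered_bytes = 0;
}

static const struct trace_methods methods[] = {
    { "discard", discard_log, discard_flush },
    { "buffer",  buffer_log,  buffer_flush  },
};

/* The global structure: hooks call through this pointer only. */
static const struct trace_methods *trace = &methods[0];

/* Select a method by name. Returns 0 on success, -1 if the name is unknown. */
int trace_select(const char *name)
{
    for (size_t i = 0; i < sizeof methods / sizeof methods[0]; i++) {
        if (strcmp(methods[i].name, name) == 0) {
            trace = &methods[i];
            return 0;
        }
    }
    return -1;
}

/* Let the user pick the method through the (assumed) environment variable. */
void trace_select_from_env(void)
{
    const char *name = getenv("NPTL_TRACE_METHOD");
    if (name != NULL)
        (void)trace_select(name);
}
```

A hook would then call `trace->log(event, len)` without knowing which storage method is behind it.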
> As for flushing the buffer, as it is necessary, it will probably be
> discussed in a future version of the draft.
Great
|
|
|
|
 |
Guest
|
Posted: Fri Apr 23, 2004 10:28 am Post subject: |
|
|
| Karim Yaghmour wrote: | True ... for now that is. With the lockless scheme now part of relayfs
(which LTT relies on), it should be fairly straightforward to map the
buffer index in user-space (in addition to the mmapped trace buffer) and
let the user-space do the same cmpxchg the kernel part of relayfs
currently already does. In other words, you would get user-space
tracing in relayfs without any system calls and other such disturbances. |
Seems like very good news to me. If I got it right, you are saying it is fairly doable to adapt relayfs so that it handles traces without any syscall during the tracing process. Sadly, relayfs was ruled out early just because of the 'syscall part,' and I do not know much about it, so it will take time to study and adapt it. I trust there are reasons why it is used by other toolkits, though.
| Karim Yaghmour wrote: | All in all, I really think it's worth looking at using relayfs for NPTL. |
I am not sure it is clear, so let me clarify: we very much want any modifications to NPTL itself to be only hooks, placed wherever it seems relevant to trace an event. The part that is called by these hooks, which I will call the trace handler, should be as independent and customizable as possible. The general idea is that making NPTL traceable and collecting the generated traces are different problems.
That said, I have the impression many people would be enthusiastic about a relayfs-based trace handler (especially if this can save us from solving known problems yet again). It is more a question of when. The first planned step is a kind of proof of concept, with few hooks, a 'trivial' trace handler and probably only ia32 support. I am not convinced we should start by adapting and using relayfs, because of the time that studying and adapting it would probably take, while we only want to verify that the hooks mechanism is fairly usable.
In any case, thank you very much for the hint. |
|
|
 |
sthibault Active contributor

Joined: 29 Apr 2004 Posts: 1 Location: LaBRI, Bordeaux
|
Posted: Thu Apr 29, 2004 3:16 pm Post subject: |
|
|
Using relayfs does indeed seem a good way to get traces from kernel space to user space.
But in FKT (Fast Kernel Tracing, see http://www.cs.unh.edu/~rdr), we chose exactly the opposite way: since data is (almost) always eventually flushed by the kernel (to disk, via NFS, ...), we keep data in kernel space and flush it in kernel space, instead of flushing at user level, which might involve useless work like copying data into the page cache.
This also avoids any system call except at initialization: a single sendfile() system call with an arbitrarily huge size value is made between a special block driver, /dev/fkt, and any file the user wants the data to go into; sendfile() returns only when the trace is finished. See http://perso.ens-lyon.fr/samuel.thibault/stage/2/report.ps.gz, particularly section 2, for details. |
|
|
 |
Guest
|
Posted: Thu Apr 29, 2004 4:51 pm Post subject: |
|
|
Well, I guess this is another 'collect and store traces' method to study in order to come up with the most convenient solution possible. I'll look at the details.
The point of using relayfs was that it already implements a high-speed algorithm for concurrent memory writes. In user or kernel space, we will need one anyway.
As long as performance is fair, the more convenient the better: ideally, we won't even need a kernel module. We keep daydreaming of a trace mechanism included directly in distributed NPTLs, so we would appreciate a trace collecting system that is easy to install as well. |
|
|
 |
|
|
|