NPTL (Native POSIX Threading Library) Tests and Trace

PROJECT RESULTS

This webpage contains a summary of the problems found in NPTL and the problems NOT found in NPTL (i.e. what is proved to work). All this information is extracted from the forum where more details can be found.

The results of our test campaigns can be found in this database.

 

 > Spotted problems

Disclaimer:

No enormous bug can be expected in a library such as NPTL, which has been in use for several years now and is included in several Linux distributions (Red Hat 9, Fedora,...). Features that are widely used, such as mutex locks, have been stable and reliable for a long time.
What we are tracking here are the minor problems (divergences from the standard, strange behaviors) that most people will ignore until someone runs into one of them and gets a headache.

 
  Function sem_open
Description When there is not enough space to create a new semaphore, the function should set the errno to ENOSPC, but currently the value is ENOMEM
Status This has been reported in glibc bugzilla (bug #774).
 
  Function sem_close
Description The function sem_close() is not scalable: the duration depends on the number of opened semaphores.
Status This is by design, and fixing it would mean reworking the semaphores. As an application rarely uses lots of different semaphores, no further work will be done there.
 
  Function sigaction
Description A signal handler set with signal() can be saved then restored with sigaction().
Status This has been reported in glibc bugzilla (bug #749).
* Update * The bug was in the test case -- now fixed.
 
  Function sigaction
Description When SA_RESTART is set, interruptible functions shall not return EINTR, but instead they shall restart silently.
Status This has been reported in glibc bugzilla (bug #748).
 
  Function sigaction
Description When SA_RESETHAND is set, the function shall behave as if SA_NODEFER was also set -- which is not the case yet.
Status This has been reported in glibc bugzilla (bug #746).
 
  Function <unistd.h>
Description The <unistd.h> header inclusion shall make visible the symbol ftruncate() for POSIX sources, but currently you have to include the XSI features as well.
Status This has been reported in glibc bugzilla (bug #640).
 
  Function pthread_create()
Description Threads created with an explicit scheduling policy (SCHED_RR) do not behave as SCHED_RR threads should. This is more likely a kernel scheduler issue.
Status This has been reported in kernel bugzilla (bug #3770).
 
  Function pthread_mutex_timedlock()
Description This function can sometimes result in a hang in the application. The bug was posted to comp.programming.threads and relayed by our project.
Status This has been reported in bugzilla (bug #417).
 
  Function pthread_create()
Description Under some circumstances (detail is in bugzilla), pthread_create() segfaults instead of returning EPERM.
Status This has been reported in bugzilla (bug #405).
 
  Function pthread_create()
Description When the function fails with EPERM, the stack reserved for the new thread is not reclaimed, which results in a memory leak.
Status This has been reported in bugzilla (bug #401).
 
  Function pthread_create()
Description When the function fails because of a lack of resources, it returns ENOMEM but should return EAGAIN.
Status This has been reported in bugzilla (bug #386).
 
  Function pthread_create()
Description In some circumstances, when pthread_create() fails, the new thread can execute some instructions before being destroyed.
Status This has been reported in bugzilla (bug #379).
 
  Function pthread_cond_destroy()
Description The POSIX standard requires that it is safe to destroy a condvar when no thread is blocked on it, but doing so in NPTL currently results in the waiting threads hanging.
Status This has been reported in bugzilla (bug #342).
 
  Function pthread_rwlock_t
Description The whole RWLock feature is unavailable when only _POSIX_C_SOURCE is defined (to 200112L), but it should be.
Status Reported in bugzilla as bug #320.
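
The visibility problem above is a compile-time one, so the simplest check is a source file that defines only _POSIX_C_SOURCE and then uses the rwlock API. The sketch below is an illustration with a made-up function name; it only builds if <pthread.h> exposes pthread_rwlock_t under that feature macro.

```c
/* Request strict POSIX.1-2001 visibility before any header is included. */
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <pthread.h>

/* Compiles only if the rwlock API is visible under the macro above.
 * Returns 0 when every call succeeded. */
int rwlock_roundtrip(void)
{
    pthread_rwlock_t lock;
    int rc;

    rc = pthread_rwlock_init(&lock, NULL);
    assert(rc == 0);

    rc  = pthread_rwlock_rdlock(&lock);
    rc |= pthread_rwlock_unlock(&lock);
    rc |= pthread_rwlock_wrlock(&lock);
    rc |= pthread_rwlock_unlock(&lock);
    rc |= pthread_rwlock_destroy(&lock);
    return rc;
}
```

The define has to stay on the first line of the file: once a system header has been included, the feature macros have already been evaluated.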
 
  Function pthread_cancel()
Description When trying to cancel a thread which is exiting, the process is sometimes killed with a segmentation fault.
Status A sample to demonstrate the problem could not be written -- looks like some other side effects are also involved.
 
  Function pthread_cancel()
Description On cancelation of a thread waiting on a condition variable, the canceled thread sometimes does not re-acquire the mutex before calling the cancelation handler.
Status This has been reported in bugzilla (bug #300).
 
  Function pthread_cond_timedwait()
Description When the timeout parameter is invalid (tv_nsec < 0 or >= 10^9), the function should return EINVAL, but it does not. This is not a non-conformance, as POSIX does not specify what an invalid timeout is for this function.
Status This has been reported and accepted as a defect in POSIX.
 
  Function pthread_cond_timedwait()
Description When a thread enters a wait with an absolute timeout T1 and the clock is then set to T2 > T1, the thread should be woken as if it had timed out, but it is not.
Status This one *should* be fixed with kernels 2.6.8 and later.
 
  Function sysconf()
Description The returned value for _SC_TIMERS and _SC_THREAD_PROCESS_SHARED was 1, when 200112 is expected.
Status Those values were corrected in NPTL around the end of May 2004. Additional checks are needed for other values.
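
Checking the other sysconf() values can be automated along these lines. This is a sketch with a made-up helper name; it assumes the caller passes a valid _SC_ constant and treats any value of 200112L or later as a proper revision number.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Returns 1 when sysconf() reports the option as a POSIX revision number
 * (200112L or later) rather than a bare 1 or -1, and 0 otherwise. */
int option_has_revision(int name)
{
    long v;

    errno = 0;
    v = sysconf(name);
    /* -1 with errno untouched means "not supported"; -1 with errno set
     * means the name itself was rejected, which is a caller error. */
    assert(v != -1 || errno == 0);
    return v >= 200112L;
}
```

For the two options mentioned above, the calls would be option_has_revision(_SC_TIMERS) and option_has_revision(_SC_THREAD_PROCESS_SHARED).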
 
  Function pthread_cond_broadcast()
Description The bug was not spotted inside the project, but we found a possible execution path leading to a hang. This is due to the FUTEX_REQUEUE usage.
Status Should have been corrected in the NPTL CVS main branch around mid-June 2004.
 
  Function malloc()
Description When there is no more memory left, malloc should return NULL and set errno to ENOMEM, but instead the process is killed.
Status This is the "Out Of Memory" (OOM) daemon. When the system becomes short on free memory, a process (theoretically the one responsible for the shortage) is killed. This mechanism is not quite reliable, as I have seen the wrong process being killed several times. Someone told me: do
 
  Function pthread_barrier_destroy()
Description The function could be called successfully while some threads were still inside the barrier, resulting in a hang for these threads. A complete description of the problem can be found here.
Status This has been corrected in NPTL since Feb 20, 2004.
 

 

 > Tested features

 
 

Coverage Progress Indicator

 
 

The NPTL routines have been split into 4 groups, according to their priorities. More detailed information on the routine priorities can be found here.

Click here for a graphical progress view.

 

 
Priority 1
15 functions
Started: Apr 22, 2004.
Current: 15 / 15 (100%) on Sep 28, 2004
(70 sample(s) written so far)
 
Priority 2
27 functions
Started: Nov 29, 2004.
Current: 27 / 27 (100%) on Mar 9, 2005
(149 sample(s) written so far)
 
Priority 3
45 functions
Started: not started.
Current: 0 / 45 (0%)
(0 sample(s) written so far)
 
Priority 4
60 functions
Started: not started.
Current: 0 / 60 (0%)
(0 sample(s) written so far)
 
 

Assertions coverage of conformance and stress tests

 

Click on the function name to get more information on assertions and coverage.

 Function #1 #2 #3
fork() 24 18 18
getpid() 1 1 1
pthread_atfork() 7 7 7
pthread_cancel() 7 7 1
pthread_cleanup_pop() 2 2 0
pthread_cleanup_push() 4 3 0
pthread_cond_broadcast() 6 3 3
pthread_cond_destroy() 4 2 2
pthread_cond_init() 9 8 5
pthread_cond_signal() 5 3 3
pthread_cond_timedwait() 13 12 8
pthread_cond_wait() 8 7 5
pthread_create() 11 10 8
pthread_detach() 4 4 3
pthread_equal() 3 3 1
pthread_exit() 7 7 7
pthread_getschedparam() 3 2 2
pthread_join() 5 4 3
pthread_kill() 6 6 2
pthread_mutex_destroy() 3 3 3
pthread_mutex_init() 10 10 5
pthread_mutex_lock() 10 7 3
pthread_mutex_trylock() 5 5 5
pthread_mutex_unlock() 9 7 2
pthread_once() 6 6 4
pthread_self() 1 1 1
pthread_setschedparam() 3 3 3
pthread_sigmask() 9 9 1
raise() 3 3 1
sched_yield() 2 2 0
sem_close() 3 2 1
sem_destroy() 2 2 0
sem_getvalue() 3 3 1
sem_init() 6 5 3
sem_open() 18 10 1
sem_post() 3 3 0
sem_trywait() 5 5 0
sem_unlink() 8 8 7
sem_wait() 4 4 1
sigaction() 25 23 10
sigprocmask() 8 8 0
sigwait() 8 8 4
 Total: 283 246 135

 Average OPTS coverage: 90% 
 Our average participation:  55%
 
Legend: The coverage figures above apply to these routines only.

#1

For each routine, POSIX specifies some criteria that the routine must match (mandatory criteria) and some others that the routine may match (optional criteria). The routine is said to be compliant when at least all the mandatory criteria are matched. This column in the table shows the number of different assertions that POSIX requires as mandatory for the routine.

#2

This column shows the number of POSIX assertions that are completely tested in the current Open POSIX Test Suite release. Some other assertions might be partially tested, but this is not reported here.

#3

This is the number of different assertions for which at least one test has been contributed by our project to the Open POSIX Test Suite.

 

 

Features coverage of scalability and stress tests

 

Cancelation:

Heavy thread cancelation does not break the system.

There is no problem when canceling a terminating thread.

Cond vars:

Condvar initialization and destruction duration does not depend on the number of condvars in use in the system -- for any kind of condvar.

Condvar initialization then destruction does not consume any system resource -- for any kind of condvar.

The latency between the pthread_cond_timedwait timeout parameter and the actual return of the function does not depend on the number of threads waiting on the condvar -- for any kind of condvar.

No condition signaling (signal or broadcast) is lost.

A condition signaling that happens after the waiting function has released the mutex is never missed by that function.

No condition signaling is consumed when a thread waiting for the condvar is canceled.

Mutexes:

Mutex initialization and destruction duration does not depend on the number of mutexes in use in the system -- for any type of mutex.

Mutex initialization then destruction does not consume any system resource (memory leak, ...) -- for any type of mutex.

With a large number of threads contending for some mutexes (of several types) with pthread_mutex_lock, pthread_mutex_trylock and pthread_mutex_timedlock, there is never more than one thread owning the same mutex at the same time.
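
The mutual exclusion property above can be checked with a small contention loop. The sketch below is a reduced illustration, not the project's stress test: it uses a single default mutex, covers only pthread_mutex_lock and pthread_mutex_trylock, and the thread and iteration counts (NTHREADS, NITERS) are arbitrary choices.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

#define NTHREADS 8
#define NITERS   20000

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int owners;       /* threads currently inside the critical section */
static int violations;   /* times a second owner was observed */
static long total;       /* completed critical sections */

/* Runs with the mutex held; any other value than 1 means two owners. */
static void enter_and_leave(void)
{
    if (++owners != 1)
        violations++;
    total++;
    --owners;
}

static void *contend(void *arg)
{
    int use_trylock = (arg != NULL);

    for (int i = 0; i < NITERS; i++) {
        if (use_trylock) {
            while (pthread_mutex_trylock(&lock) != 0)
                sched_yield();          /* retry until obtained */
        } else {
            pthread_mutex_lock(&lock);
        }
        enter_and_leave();
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Returns the number of exclusion violations seen (0 is expected),
 * or -1 if some critical sections were not accounted for. */
int mutex_exclusion_violations(void)
{
    pthread_t th[NTHREADS];

    for (int i = 0; i < NTHREADS; i++) {
        /* Odd-numbered threads use trylock, the others use lock. */
        int rc = pthread_create(&th[i], NULL, contend,
                                (i & 1) ? (void *)&lock : NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(th[i], NULL);

    return (total == (long)NTHREADS * NITERS) ? violations : -1;
}
```

The counters are deliberately plain integers: they are only touched while the mutex is held, so a broken mutex shows up either as a violation or as a wrong total.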

Other:

The process creation time in fork() does not depend on the number of running processes.

Scheduling:

pthread_getschedparam() always returns the scheduling parameters of the queried thread.

Semaphores:

sem_getvalue() always stores into the sval argument a value that the semaphore actually held at some point during the call.

The sem_init and sem_destroy duration does not depend on the number of opened unnamed semaphores.

The sem_open and sem_close duration does not depend on the number of opened semaphores.
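
For the sem_getvalue property listed above, a single-threaded baseline looks like the sketch below (made-up function name). With no concurrent sem_post or sem_wait, the value stored in sval has to match the semaphore count exactly; the stress tests check the weaker "some value held during the call" property under contention.

```c
#include <assert.h>
#include <semaphore.h>

/* Returns 0 when sem_getvalue() tracked every wait and post. */
int sem_value_tracks_operations(void)
{
    sem_t sem;
    int v0 = -1, v1 = -1, v2 = -1, rc;

    rc = sem_init(&sem, 0, 3);   /* unnamed, process-private, value 3 */
    assert(rc == 0);

    sem_getvalue(&sem, &v0);     /* initial value */
    sem_wait(&sem);
    sem_getvalue(&sem, &v1);     /* one unit taken */
    sem_post(&sem);
    sem_getvalue(&sem, &v2);     /* unit given back */

    sem_destroy(&sem);
    return (v0 == 3 && v1 == 2 && v2 == 3) ? 0 : 1;
}
```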

Signals:

Heavy signal delivery with pthread_kill() does not break the system, and no signal gets lost unless the same signal was already pending.

Threads:

pthread_self() always returns a unique thread ID.

The init_routine argument of pthread_once is never called more or less than once.

The thread creation time does not depend on the number of threads already present in the process.

When a thread exits or is joined (depending on its attributes), its resources are freed.

When pthread_create() fails because of a lack of resource, EAGAIN is returned.
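
The pthread_once property in the list above lends itself to a short sketch: several threads race on the same once control and a counter records how often init_routine actually runs. The number of callers (NCALLERS) and the function names are arbitrary choices made for this illustration.

```c
#include <assert.h>
#include <pthread.h>

#define NCALLERS 16

static pthread_once_t once = PTHREAD_ONCE_INIT;
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
static int init_calls;          /* how many times init_routine ran */

static void init_routine(void)
{
    /* The lock keeps the count reliable even if pthread_once is broken. */
    pthread_mutex_lock(&count_lock);
    init_calls++;
    pthread_mutex_unlock(&count_lock);
}

static void *caller(void *arg)
{
    (void)arg;
    pthread_once(&once, init_routine);
    return NULL;
}

/* Returns how many times init_routine ran (exactly 1 is expected). */
int count_once_invocations(void)
{
    pthread_t th[NCALLERS];

    for (int i = 0; i < NCALLERS; i++) {
        int rc = pthread_create(&th[i], NULL, caller, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < NCALLERS; i++)
        pthread_join(th[i], NULL);

    return init_calls;
}
```

The once control is static, so calling the function a second time in the same process still returns 1.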

 

 
 

Page maintained by: Tony Reix
Last update: 2004, July 30th