NPTL Test Campaign #1

~ Report ~


NPTL Test Campaign - Introduction

* Context:

The NPTL Tests and Trace Project (http://nptl.bullopensource.org) has reached a milestone: testing of its highest-priority function list (15 functions) is complete. During this first stage, 54 conformance tests, 6 scalability tests and 10 stress tests were written. This test campaign concludes and validates this stage.

* Goals:

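This testing has several goals:
  1. Ensure that the tests are bug-free and can execute on different hardware and software.
  2. Check for NPTL bugs in several situations:
       - bleeding-edge context, with vanilla kernel and glibc; or standard OS distributions
       - with large machines (many CPUs, a lot of I/O, a lot of memory)
       - with several CPU architectures (i686, Power-PC, Itanium 2)
  3. Establish the status of NPTL on the main distributions, on the same machine (i686).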

* Methodology:

On each configuration, a test script is run as root, with no other activity on the machine.
This script performs the following steps:
-> Run the complete Open POSIX Test Suite (OPTS) conformance test collection (1856 tests), and build a status of the results.
-> Run the 6 scalability tests.
-> Re-run the 3 scalability tests that support graphical output.
-> Run the 10 stress tests for 6 hours. During this run, vmstat records the evolution of the machine's memory usage.

The complete test package, with the scripts, can be found here. Just uncompress and run procedure.sh.
Several patches are also available:
Please note that the script may contain some dependencies on my environment (directory names, ...). If you want to use it and encounter problems, feel free to contact me (contact address on the website).

* Test machines:

NPTL:
This is the "small" machine. As its name indicates, it is dedicated to our project. It is a dual-Xeon 2.80GHz, with 2GB memory and 4 SCSI hard-drives.
nptl.jpg

FASTBOOT:
Our "big i686 machine". It is an 8xXeon 1.40GHz, with 32GB memory and more than 80 SCSI hard-drives.
fastboot.jpg

VERONE:
This is a quad Power-PC IV, clocked at 600MHz, with a total of 4GB memory.

verone.jpg

STLINUX12:
This is the "monster" :-). It is a partition of a Bull Server. The partition contains 24 Itanium 2 CPUs at 1.4GHz, and 96GB memory.
stlinux12.jpg



Conformance Tests - Results

Here is a table of the OPTS run results for every configuration.
The complete details of these runs can be found here:
http://nptl.bullopensource.org/Tests/results/run-browse.php

Configuration:      NPTL-latest  NPTL-SLES9      NPTL-FC3T2      FASTBOOT-FC3T2  FASTBOOT-SLES9     VERONE-SLES9          IA64-BAS3      IA64-latest
CPU:                2xIA32       2xIA32          2xIA32          8xIA32          8xIA32             4xPPC                 24xItanium2    24xItanium2
RAM:                2GB          2GB             2GB             32GB            32GB               4GB                   64GB           64GB
kernel:             2.6.8.1      2.6.5-7.97-smp  2.6.8-1.541smp  2.6.8-1.541smp  2.6.5-7.97-bigsmp  2.6.5-7.97-pseries64  2.6.7-BAS3V14  2.6.8.1
libc:               2004-10-11   NPTL 0.61       NPTL 2.3.3      NPTL 2.3.3      NPTL 0.61          NPTL 0.61             NPTL 0.60      2004-10-11

Total Tests (*):    1846         1846            1846            1846            1846               1846                  1846           1846
Tests Passed:       1467         1313            1439            1440            1311               1261                  1288           1338
Build Failures:     39           186             56              56              186                193                   194            48
Link Failures:      0            4               4               4               4                  4                     4              0
Link Skipped:       233          234             233             233             234                233                   230            230
Tests Failed:       17           19              21              20              21                 21                    36             61
Tests Untested:     46           46              46              46              46                 46                    46             46
Tests Unresolved:   5            5               6               6               5                  4                     4              85
Tests Unsupported:  29           29              29              29              29                 37                    39             29
Tests Killed:       8            8               9               9               8                  45                    5              7
Tests Hung:         2            2               3               3               2                  2                     0              2
Duration (min):     20.35        23.12           25.65           37.18           29.55              37.55                 21.02          25.18
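
As a reading aid, the figures of the conformance table can be cross-checked with a few lines of Python. This is only a sketch: the counts are transcribed from the table in this report, while the helper names (column_totals, pass_rates) are hypothetical and not part of the test suite or of the project's scripts.

```python
# Counts transcribed from the conformance results table of this report,
# one value per configuration, in the table's column order.
CONFIGS = [
    "NPTL-latest", "NPTL-SLES9", "NPTL-FC3T2", "FASTBOOT-FC3T2",
    "FASTBOOT-SLES9", "VERONE-SLES9", "IA64-BAS3", "IA64-latest",
]
TOTAL_TESTS = 1846
RESULTS = {
    "Tests Passed":      [1467, 1313, 1439, 1440, 1311, 1261, 1288, 1338],
    "Build Failures":    [39, 186, 56, 56, 186, 193, 194, 48],
    "Link Failures":     [0, 4, 4, 4, 4, 4, 4, 0],
    "Link Skipped":      [233, 234, 233, 233, 234, 233, 230, 230],
    "Tests Failed":      [17, 19, 21, 20, 21, 21, 36, 61],
    "Tests Untested":    [46] * 8,
    "Tests Unresolved":  [5, 5, 6, 6, 5, 4, 4, 85],
    "Tests Unsupported": [29, 29, 29, 29, 29, 37, 39, 29],
    "Tests Killed":      [8, 8, 9, 9, 8, 45, 5, 7],
    "Tests Hung":        [2, 2, 3, 3, 2, 2, 0, 2],
}


def column_totals(results):
    """Sum all outcome categories for each configuration (hypothetical helper)."""
    return [sum(column) for column in zip(*results.values())]


def pass_rates(results, total):
    """Percentage of tests passed for each configuration (hypothetical helper)."""
    return [100.0 * passed / total for passed in results["Tests Passed"]]


if __name__ == "__main__":
    rows = zip(CONFIGS, column_totals(RESULTS), pass_rates(RESULTS, TOTAL_TESTS))
    for name, subtotal, rate in rows:
        print(f"{name:15s} categories sum to {subtotal}, pass rate {rate:.1f}%")
```

With the figures above, the ten outcome categories add up to 1846 in every configuration, so they partition the test total.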


Scalability Tests - Results

Configuration:                  NPTL-latest  NPTL-SLES9      NPTL-FC3T2      FASTBOOT-FC3T2  FASTBOOT-SLES9     VERONE-SLES9          IA64-BAS3      IA64-latest
CPU:                            2xIA32       2xIA32          2xIA32          8xIA32          8xIA32             4xPPC                 24xItanium2    24xItanium2
RAM:                            2GB          2GB             2GB             32GB            32GB               4GB                   64GB           64GB
kernel:                         2.6.8.1      2.6.5-7.97-smp  2.6.8-1.541smp  2.6.8-1.541smp  2.6.5-7.97-bigsmp  2.6.5-7.97-pseries64  2.6.7-BAS3V14  2.6.8.1
libc:                           2004-10-11   NPTL 0.61       NPTL 2.3.3      NPTL 2.3.3      NPTL 0.61          NPTL 0.61             NPTL 0.60      2004-10-11

pthread_cond_init               PASS         COMPIL          COMPIL          COMPIL          COMPIL             COMPIL                COMPIL         PASS
pthread_cond_timedwait          PASS         COMPIL          COMPIL          COMPIL          COMPIL             COMPIL                COMPIL         KILL
pthread_create                  FAIL(*)      PASS            FAIL(*)         FAIL(*)         PASS               PASS                  PASS           PASS
pthread_mutex_init              PASS         PASS            PASS            PASS            PASS               PASS                  PASS           PASS
pthread_mutex_lock1             PASS         PASS            PASS            PASS            PASS               PASS                  KILL           KILL
pthread_mutex_lock2             PASS         PASS            PASS            PASS            PASS               PASS                  KILL           KILL
pthread_cond_timedwait (graph)  OK           COMPIL          COMPIL          COMPIL          COMPIL             COMPIL                COMPIL         KILL
pthread_create (graph)          FAIL(*)      OK              FAIL(*)         FAIL(*)         OK                 OK                    OK             OK
pthread_mutex_init (graph)      OK           OK              OK              OK              OK                 OK                    OK             OK

(*) This test fails because of a conformance issue, not because of a scalability problem.



Scalability Tests - Graphics

In addition to their return value (based on a statistical analysis of the results), some of the scalability tests can output their data in a format suitable for gnuplot. Three of them have this ability: pthread_cond_timedwait.c, pthread_create.c and pthread_mutex_init.c. As the previous table shows, the pthread_cond_timedwait.c test ran successfully only in the first test configuration, so its output is not included in this report. An example of this test's output can be found in the project's forum.

* pthread_create.c:

This test measures the time the implementation needs to create a new thread, depending on the number of threads already created in the process. A separate graph is generated for each set of thread attributes.
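
The measurement idea can be illustrated with a minimal sketch. The actual test is written in C against pthread_create() and varies the thread attributes; the Python analogue below does neither. It only times Thread.start() while the previously created threads are kept alive, and the function name time_thread_creation is hypothetical.

```python
import threading
import time


def time_thread_creation(n_threads):
    """Return a list whose entry i is the time (in seconds) taken to start
    one thread while i earlier threads are still alive.

    Illustrative only: this is not the C test from the test suite.
    """
    release = threading.Event()   # keeps every created thread alive
    threads = []
    durations = []
    try:
        for _ in range(n_threads):
            thread = threading.Thread(target=release.wait)
            start = time.perf_counter()
            thread.start()        # only the start-up itself is timed
            durations.append(time.perf_counter() - start)
            threads.append(thread)
    finally:
        release.set()             # let all threads terminate
        for thread in threads:
            thread.join()
    return durations


if __name__ == "__main__":
    samples = time_thread_creation(200)
    print(f"first: {samples[0]:.6f}s, last: {samples[-1]:.6f}s")
```

Plotting the returned durations against their index gives the same kind of curve as the graphs referenced in this section, although absolute values from an interpreter are not comparable with the C measurements.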

Note: The legend of the following graphics is:
2xi686 + latest glibc         Conformance issue - no graph was generated
2xi686 + SLES9                nptl-sles9.create.png
2xi686 + FC3T2                Conformance issue
8xi686 + FC3T2                Conformance issue
8xi686 + SLES9                fastboot-sles9.create.png
4xPPC + SLES9                 verone-sles9.create.png
24xItanium2 + BAS3            stlinux12-bas3.create.png
24xItanium2 + latest glibc    stlinux12-latest.create.png

Note: Some of the cases report a duration of '0'. This actually means that the thread creation failed with the given attributes.

Comments:

We can see that the duration of the thread creation operation does not depend on the number of already-created threads. The only exception is case 21 (Explicit FIFO, max param, alt scope) on the Itanium 2 box. This configuration runs an unpatched vanilla kernel on a NUMA machine -- this could explain the results.

One can also see that case 17 (Explicit FIFO, max param) is slightly slower than the other cases, in every Linux run.



* pthread_mutex_init.c:

This test case measures the time required for the pthread_mutex_init() operation, depending on the number of mutexes already initialized in the system.

2xi686 + latest glibc nptl-latest.mutex_init.png
2xi686 + SLES9 nptl-sles9.mutex_init.png
2xi686 + FC3T2 nptl-fc3t2.mutex_init.png
8xi686 + FC3T2 fastboot-fc2t2.mutex_init.png
8xi686 + SLES9 fastboot-sles9.mutex_init.png
4xPPC + SLES9 verone-sles9.mutex_init.png
24xItanium2 + BAS3 stlinux12-bas3.mutex_init.png
24xItanium2 + latest glibc stlinux12-latest.mutex_init.png

Comment:
We can see that the mutex initialization duration does not depend on the number of mutexes already initialized in the system.
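
This report does not describe the statistical analysis the tests apply to their measurements. One simple way to quantify a statement such as "the duration does not depend on the number of objects" is to fit a straight line to the (count, duration) samples and look at its slope. The sketch below is an assumption made for illustration, not the method used by the tests, and least_squares_slope is a hypothetical name.

```python
def least_squares_slope(xs, ys):
    """Slope of the least-squares line through the points (xs[i], ys[i]).

    A slope close to zero suggests that ys does not depend on xs.
    Illustrative helper, not taken from the test suite.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    counts = [0, 1000, 2000, 3000]
    flat = [5.0, 5.0, 5.0, 5.0]       # made-up durations, independent of count
    growing = [1.0, 3.0, 5.0, 7.0]    # made-up durations, growing with count
    print(least_squares_slope(counts, flat))
    print(least_squares_slope(counts, growing))
```

The sample values in the usage part are invented for the demonstration and are not measurements from this campaign.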




Stress Tests - Results

The stress tests are meant to push the implementation to its limits. Each test is designed to run alone on the system, so when all the tests are run together, some failures are to be expected. Nevertheless, some of the stress tests run to completion and give useful data.

During test execution, vmstat was run to monitor the free memory in the system.
A complete monitoring tool, perfware, was also used with the last configuration (IA64-latest). Its results are in the next section.
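
The free-memory status reported in the table below (DEC for decreasing, CONST for constant) was derived from the vmstat records; the exact criterion is not stated in this report. The following sketch shows one possible way to obtain such a label from captured vmstat output. It relies on the "free" column of vmstat's default output; the 5% tolerance and both function names are assumptions made for this example.

```python
def free_memory_samples(vmstat_text):
    """Extract the values of the 'free' column from captured vmstat output."""
    column = None
    samples = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "free" in fields:
            # Column-name header line (vmstat repeats it periodically).
            column = fields.index("free")
        elif column is not None and len(fields) > column and fields[column].isdigit():
            samples.append(int(fields[column]))
    return samples


def classify_trend(samples, tolerance=0.05):
    """Return 'DEC' if free memory dropped by more than `tolerance` (as a
    fraction of the first sample) between the first and last samples;
    anything else, including an increase, is reported as 'CONST'.

    The threshold is an assumption, not the criterion used in the report.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    first, last = samples[0], samples[-1]
    return "DEC" if first - last > tolerance * first else "CONST"
```

A typical use would be to redirect the vmstat output to a file during the run, read the file back as a string, and pass it through both functions.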

Configuration:           NPTL-latest       NPTL-SLES9        NPTL-FC3T2        FASTBOOT-FC3T2    FASTBOOT-SLES9     VERONE-SLES9          IA64-BAS3         IA64-latest
CPU:                     2xIA32            2xIA32            2xIA32            8xIA32            8xIA32             4xPPC                 24xItanium2       24xItanium2
RAM:                     2GB               2GB               2GB               32GB              32GB               4GB                   64GB              64GB
kernel:                  2.6.8.1           2.6.5-7.97-smp    2.6.8-1.541smp    2.6.8-1.541smp    2.6.5-7.97-bigsmp  2.6.5-7.97-pseries64  2.6.7-BAS3V14     2.6.8.1
libc:                    2004-10-11        NPTL 0.61         NPTL 2.3.3        NPTL 2.3.3        NPTL 0.61          NPTL 0.61             NPTL 0.60         2004-10-11

Wait duration:           6h00'21''         6h00'19''         6h00'30''         6h00'27''         6h00'16''          6h00'15''             6h00'05''         6h00'23''
Free memory:             DEC               CONST             DEC               DEC               CONST              CONST                 DEC               CONST
pthread_cond_init        PASS (6h00'01'')  COMPIL            COMPIL            COMPIL            COMPIL             COMPIL                COMPIL            PASS (6h00'00'')
pthread_cond_timedwait1  FAIL (10'40'')    COMPIL            COMPIL            COMPIL            COMPIL             COMPIL                COMPIL            PASS (6h00'00'')
pthread_cond_timedwait2  FAIL (1'40'')     COMPIL            COMPIL            COMPIL            COMPIL             COMPIL                COMPIL            SEGV
pthread_cond_wait1       FAIL (4'03'')     COMPIL            COMPIL            COMPIL            COMPIL             COMPIL                COMPIL            FAIL (12'21'')
pthread_cond_wait2       FAIL (1'50'')     COMPIL            COMPIL            COMPIL            COMPIL             COMPIL                COMPIL            FAIL (4'42'')
pthread_cond_wait3       FAIL (6'11'')     COMPIL            COMPIL            COMPIL            COMPIL             COMPIL                COMPIL            PASS (6h00'01'')
pthread_exit             PASS (6h00'47'')  PASS (6h00'01'')  PASS (6h00'06'')  PASS (6h00'01'')  PASS (6h00'00'')   PASS (6h00'03'')      PASS (6h00'00'')  PASS (6h00'01'')
pthread_mutex_init       PASS (6h00'01'')  PASS (6h00'02'')  PASS (6h00'02'')  PASS (6h00'01'')  PASS (6h00'01'')   PASS (6h00'04'')      PASS (6h00'00'')  PASS (6h00'00'')
pthread_mutex_lock       PASS (6h00'09'')  PASS (6h00'12'')  PASS (6h00'04'')  PASS (6h00'01'')  PASS (6h00'03'')   PASS (6h00'06'')      PASS (6h00'00'')  PASS (6h00'00'')
pthread_mutex_trylock    FAIL (4'55'')     PASS (6h00'00'')  PASS (6h00'01'')  PASS (6h00'00'')  PASS (6h00'00'')   PASS (6h00'03'')      PASS (6h00'00'')  PASS (6h00'02'')

Note:
On the Itanium 2 machine, the stress factor (S.F., specified at compile time) was raised from 1 to 5 to match the machine's capabilities.
Number of threads created ~= (1 + S.F.) * 1000 * S.F.
S.F.=1 => about 2000 threads
S.F.=5 => about 30000 threads
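
The approximation above can be written as a one-line function, which reproduces the two figures quoted in the note. The function name approx_threads is hypothetical; only the formula comes from this report.

```python
def approx_threads(stress_factor):
    """Approximate number of threads created for a given stress factor,
    using the formula from the note: (1 + S.F.) * 1000 * S.F."""
    if stress_factor < 1:
        raise ValueError("the stress factor is expected to be at least 1")
    return (1 + stress_factor) * 1000 * stress_factor


if __name__ == "__main__":
    for sf in (1, 5):
        print(f"S.F.={sf} => about {approx_threads(sf)} threads")
```

Because the formula is quadratic in S.F., raising the factor from 1 to 5 multiplies the thread count by 15 rather than by 5.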



Stress Tests - Graphics

The following graphs were generated with perfware (a Bull internal tool) during a stress test on the IA64 machine, with the latest NPTL.
The execution script was:
$ for i in 1 2 3 4 5; do
      (compile all the stress tests with scalability factor = $i)
      (run for 1 hour)
      (sleep 2 minutes)
  done

The scalability factor controls the number of concurrent threads and processes that a stress test generates.
With the value 1, about 2000 threads are created.
With the value 5, about 30000 threads are created.

Please note that some of the stress tests failed in the first minutes of the run.
With SF=1, the machine can handle more of the tests than with the other values, and therefore the load appears higher.

Load on the CPUs
stlinux12-latest.cpu_0.png
Load average (over 1, 5 and 15 min)
stlinux12-latest.loadavg_0.png
Number of tasks
stlinux12-latest.nbproc_0.png


Conclusions

Ensure that the tests are bug-free and can execute on different hardware and software.

This first goal has been partly fulfilled. In each run, at least 5 testcases are killed with a segmentation fault. This may be due to bugs in the system or in the tests themselves. Further analysis is needed here.

Nevertheless, no unexpected problem was found in the 15 functions from our priority-1 list.

Check for NPTL bugs in several situations:
 - bleeding-edge context, with vanilla kernel and glibc; or standard OS distributions
 - with large machines (many CPUs, a lot of I/O, a lot of memory)
 - with several CPU architectures (i686, Power-PC, Itanium 2)

The second goal was hampered by the fact that many POSIX features (message queues, for example) were only recently added to the Linux kernel and to glibc, and most of the tested distributions predate these additions. As a result, many header and execution problems appeared with those routines.

It is also worth noting that it was not possible to compile the library for Power-PC Linux (the binutils package failed to install), and that the process was far less straightforward on Itanium 2 than on i686 (some links had to be added, etc.).

On the other hand, the NPTL behaved very well on large machines, showing no scalability weakness.

Establish the status of NPTL on the main distributions, on the same machine (i686).
 - SUSE Linux Enterprise Server 9 (SLES9)
 - Red Hat Fedora Core 3 (test 2)
 - RHEL3 (updated with NPTL support and recent kernel)

The conclusion for the third goal is that the more recent the distribution components (kernel, glibc), the better the results. This is because the NPTL is still under heavy development, with changes almost every day, so a two-month gap between two Linux distributions leads to a large difference in results.

It is worth noting, however, that all the tested distributions have the header problem that prevented most of the scalability tests from being run, even though their NPTL binary contains the functionality. This is because NPTL-only routines are unusable in the default environment: special linker flags are needed to use them (the tests are actually linked against the linuxthreads library at compilation time).

 


~ End ~


Author: Sébastien DECUGIS