NPTL Scalability Limits

Abstract

This document covers scalability issues in NPTL. It goes through the NPTL functions one by one and describes the limits each can run into. It is not exhaustive, but it can be a great help if you encounter scalability problems.

Revision

Summary

The following functions are treated:

Threads

pthread_create()

I have run into three limits.

The first one was due to the stack: with the default stack size I couldn't create more than 256 threads. This is due to the default stack size of 64 MB, since every thread stack is reserved in the address space of the process. When you request the minimum stack size (64 KB) with pthread_attr_setstacksize(), you can create up to 262000 threads before the stack becomes the limitation.
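The stack experiment above can be reproduced with a small helper. This is only a sketch, not the program used for the measurements: the names count_threads(), worker() and the gate_* variables are made up for illustration, and the cap argument is there so that a test run cannot exhaust the machine.

```c
#include <pthread.h>
#include <stdlib.h>

/* Shared gate: every worker blocks here, so all threads stay alive
 * while the main thread keeps creating more. */
static pthread_mutex_t gate_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  gate_cond = PTHREAD_COND_INITIALIZER;
static int gate_open = 0;

static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&gate_lock);
    while (!gate_open)
        pthread_cond_wait(&gate_cond, &gate_lock);
    pthread_mutex_unlock(&gate_lock);
    return NULL;
}

/* Try to create up to `cap` concurrent threads with the given stack size.
 * Returns the number of threads created, or -1 if the setup failed
 * (for example when stack_size is below PTHREAD_STACK_MIN).
 * *last_err receives the pthread_create() error that stopped the loop,
 * or 0 if the cap was reached. */
long count_threads(size_t stack_size, long cap, int *last_err)
{
    pthread_attr_t attr;
    pthread_t *tids;
    long n = 0, i;

    *last_err = 0;
    if (cap <= 0)
        return -1;
    tids = malloc((size_t)cap * sizeof *tids);
    if (tids == NULL)
        return -1;
    if (pthread_attr_init(&attr) != 0) {
        free(tids);
        return -1;
    }
    if (pthread_attr_setstacksize(&attr, stack_size) != 0) {
        pthread_attr_destroy(&attr);
        free(tids);
        return -1;
    }

    gate_open = 0;              /* no worker exists yet, no lock needed */
    while (n < cap) {
        int rc = pthread_create(&tids[n], &attr, worker, NULL);
        if (rc != 0) {          /* typically EAGAIN or ENOMEM */
            *last_err = rc;
            break;
        }
        n++;
    }

    /* Release all workers and wait for them to finish. */
    pthread_mutex_lock(&gate_lock);
    gate_open = 1;
    pthread_cond_broadcast(&gate_cond);
    pthread_mutex_unlock(&gate_lock);
    for (i = 0; i < n; i++)
        pthread_join(tids[i], NULL);

    pthread_attr_destroy(&attr);
    free(tids);
    return n;
}
```

Calling count_threads(64 * 1024, cap, &err) with a large cap and comparing the result with a run using a bigger stack size should show which limit is hit first on a given box. The error code in err tells you why creation stopped, but not which of the limits described here caused it.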

The second limit I ran into was given by 'ulimit -u', which defaults to 2039, so I could only create around 2037 threads. The bash info page says that this parameter is the maximum number of processes a user can run concurrently, but the count includes threads as well. To change it, either run as root, for whom the limit is ignored, or edit /etc/security/limits.conf and add something like:

thedoc      hard    nproc    20000
thedoc      soft    nproc    20000
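The value reported by 'ulimit -u' is the RLIMIT_NPROC resource limit, so a program can check it before it starts creating threads. The following is a minimal sketch for Linux; the function names get_nproc_limit(), raise_nproc_soft_limit() and print_limit() are made up for illustration.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Fetch the soft and hard RLIMIT_NPROC values of the calling process.
 * Returns 0 on success, -1 on failure. */
int get_nproc_limit(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NPROC, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}

/* Raise the soft limit up to the hard limit. An unprivileged process
 * may do this; going beyond the hard limit requires privileges or a
 * change in limits.conf. Returns 0 on success, -1 on failure. */
int raise_nproc_soft_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NPROC, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_NPROC, &rl);
}

/* Print one limit value, showing RLIM_INFINITY as "unlimited". */
void print_limit(const char *label, rlim_t value)
{
    if (value == RLIM_INFINITY)
        printf("%s: unlimited\n", label);
    else
        printf("%s: %llu\n", label, (unsigned long long)value);
}
```

If the soft value is lower than the hard one, raise_nproc_soft_limit() is enough and no change in limits.conf is needed.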

The last limit I am running into (4079 threads) is directly linked to the amount of memory I have (256 MB). In fact, Linux does not allow more than (num_physpages / 16) concurrent threads. I don't know what the limit is on multiprocessor machines, but this holds for a single-CPU box. It is defined in the Linux sources, in the fork_init() function of kernel/fork.c. num_physpages is the page count printed at boot time (you can find it in '/var/log/dmesg'). For example, I get:

------------------------------------------------------------
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000000fef0000 (usable)
 BIOS-e820: 000000000fef0000 - 000000000fef8000 (ACPI data)
 BIOS-e820: 000000000fef8000 - 000000000ff00000 (ACPI NVS)
 BIOS-e820: 000000000ff00000 - 0000000010000000 (reserved)
 BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
254MB LOWMEM available.
On node 0 totalpages: 65264
-------------------------------------------------------------

So my box can run at most 65264 / 16 = 4079 concurrent threads and processes, all users combined. I have read that there are parameters you can pass to the kernel at boot time to override the value of num_physpages, but they are ignored when no real RAM backs them: you can only decrease the amount of memory the kernel uses.
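The arithmetic above is easy to redo for another machine. The sketch below assumes the divisor of 16 observed on this kernel; other kernel versions or architectures may use a different formula, so treat the result as an estimate only. The function names are made up for illustration. sysconf(_SC_PHYS_PAGES) is a glibc extension and may report a slightly different page count than the boot messages, and /proc/sys/kernel/threads-max, where available, holds the value the running kernel actually uses.

```c
#include <stdio.h>
#include <unistd.h>

/* Rule described in this document: max threads = num_physpages / 16.
 * The divisor is an assumption tied to the kernel discussed here. */
unsigned long estimate_max_threads(unsigned long phys_pages)
{
    return phys_pages / 16;
}

/* Number of physical pages as seen by glibc, or -1 if unknown. */
long current_phys_pages(void)
{
    return sysconf(_SC_PHYS_PAGES);
}

/* System-wide thread limit of the running kernel,
 * or -1 if the proc file cannot be read. */
long read_threads_max(void)
{
    FILE *f = fopen("/proc/sys/kernel/threads-max", "r");
    long value = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

With the 65264 pages reported above, estimate_max_threads(65264) returns 4079. Comparing estimate_max_threads(current_phys_pages()) with read_threads_max() shows whether the rule still applies to the kernel you are running.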

Additionally, I just upgraded my box to 512 MB of RAM and I can now create more than 8000 threads, which confirms the previous assertion. As NPTL is a 1:1 model (1 kernel thread for 1 user thread), the limitation is actually on the kernel threads side. In my opinion (but I did not verify this), kernel thread data is never swapped out because it is constantly in use. As a result, if you create too many threads, the whole physical memory will be used and the system will become unusable.