引文:线程模型(Threading Model)默认从进程域 (M:N 模型 ) 改为系统全局域 (1:1 模型 )
在 AIX 5L 中,pthread 线程的默认模型是 m:n 方式,而从 AIX 6.1 开始,默认改为了 1:1 方式。这两种方式在系统中通过 AIXTHREAD_SCOPE 环境变量来进行控制。如果设置 AIXTHREAD_SCOPE=P,则线程模型为进程域(M:N 模型),设置 AIXTHREAD_SCOPE=S 则为系统域(1:1 模型)。
1:1 模型下,每个用户空间的线程都对应于内核中的一个线程,线程的调度由内核在系统全局范围进行;而 M:N 模型下,多个用户线程对应于内核中的多个内核线程,用户线程调度仅限于在本进程范围内进行,而对应的内核线程则交由内核进行调度。许多应用程序例如数据库 和 Java 应用要求设置为 1:1 方式以提供更好的性能,在 AIX 5L 中这些应用程序会要求配置 AIXTHREAD_SCOPE 环境变量,而在 AIX 6.1 中默认即为为 1:1 方式,不再需要进行配置。
原文:
Thread tuning
User threads provide independent flow of control within a process.
If the user threads need to access kernel services (such as system calls), the user threads will be serviced by associated kernel threads. User threads are provided in various software packages with the most notable being the pthreads shared library (libpthreads.a). With the libpthreads implementation, user threads sit on top of virtual processors (VP) which are themselves on top of kernel threads. A multithreaded user process can use one of two models, as follows: 1:1 Thread Model The 1:1 model indicates that each user thread will have exactly one kernel thread mapped to it. This is the default model on AIX® 4.1, AIX 4.2, and AIX 4.3. In this model, each user thread is bound to a VP and linked to exactly one kernel thread. The VP is not necessarily bound to a real CPU (unless binding to a processor was done). A thread which is bound to a VP is said to have system scope because it is directly scheduled with all the other user threads by the kernel scheduler. M:N Thread Model The M:N model was implemented in AIX 4.3.1 and is also now the default model. In this model, several user threads can share the same virtual processor or the same pool of VPs. Each VP can be thought of as a virtual CPU available for executing user code and system calls. A thread which is not bound to a VP is said to be a local or process scope because it is not directly scheduled with all the other threads by the kernel scheduler. The pthreads library will handle the scheduling of user threads to the VP and then the kernel will schedule the associated kernel thread. As of AIX 4.3.2, the default is to have one kernel thread mapped to eight user threads. This is tunable from within the application or through an environment variable.
Depending on the type of application, the administrator can choose to use a different thread model. Tests on AIX 4.3.2 have shown that certain applications can perform much better with the 1:1 model. This is an important point because the default as of AIX 4.3.1 is M:N. By simply setting the environment variable AIXTHREAD_SCOPE=S for that process, we can set the thread model to 1:1 and then compare the performance to its previous performance when the thread model was M:N.
If you see an application creating and deleting threads, it could be the kernel threads are being harvested because of the 8:1 default ratio of user threads to kernel threads. This harvesting along with the overhead of the library scheduling can affect the performance. On the other hand, when thousands of user threads exist, there may be less overhead to schedule them in user space in the library rather than manage thousands of kernel threads. You should always try changing the scope if you encounter a performance problem when using pthreads; in many cases, the system scope can provide better performance.
If an application is running on an SMP system, then if a user thread cannot acquire a mutex, it will attempt to spin for up to 40 times. It could easily be the case that the mutex was available within a short amount of time, so it may be worthwhile to spin for a longer period of time. As you add more CPUs, if the performance goes down, this usually indicates a locking problem. You might want to increase the spin time by setting the environment variable SPINLOOPTIME=n, where n is the number of spins. It is not unusual to set the value as high as in the thousands depending on the speed of the CPUs and the number of CPUs. Once the spin count has been exhausted, the thread can go to sleep waiting for the mutex to become available or it can issue the yield() system call and simply give up the CPU but stay in an executable state rather than going to sleep. By default, it will go to sleep, but by setting the YIELDLOOPTIME environment variable to a number, it will yield up to that many times before going to sleep. Each time it gets the CPU after it yields, it can try to acquire the mutex.
Certain multithreaded user processes that use the malloc subsystem heavily may obtain better performance by exporting the environment variable MALLOCMULTIHEAP=1 before starting the application. The potential performance improvement is particularly likely for multithreaded C++ programs, because these may make use of the malloc subsystem whenever a constructor or destructor is called. Any available performance improvement will be most evident when the multithreaded user process is running on an SMP system, and particularly when system scope threads are used (M:N ratio of 1:1). However, in some cases, improvement may also be evident under other conditions, and on uniprocessors.