CAS Atomic Operations: Implementing Lock-Free Code and Analyzing Its Performance

 

  • Author: Echo Chen (陈斌)

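While studying nginx's spin lock recently, I ran into GCC's CAS atomic operations again, so I decided to measure how well lock-free code built on CAS actually performs. There are plenty of articles online about implementing lock-free code with CAS, but few that examine how much performance it gains. This article works through the question step by step, based on experimental results and my own understanding.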


1. What is a CAS atomic operation?

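Before looking at lock-free programming, we first need to understand the CAS atomic operation: Compare & Set, also called Compare & Swap. Almost all modern CPU instruction sets support an atomic CAS; on x86 the corresponding assembly instruction is CMPXCHG.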

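Recall the concept of an "atomic operation" from operating systems courses: an operation is atomic if the layers above the layer it lives in cannot observe its internal implementation or structure. An atomic operation may consist of a single step or of several steps, but their order cannot be disturbed, and the operation cannot be cut short so that only part of it executes. With this guarantee in hand, we can implement lock-free code.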

Wikipedia describes the CAS atomic operation with the following code:

int compare_and_swap(int* reg, int oldval, int newval)
{
  ATOMIC();
  int old_reg_val = *reg;
  if (old_reg_val == oldval)
     *reg = newval;
  END_ATOMIC();
  return old_reg_val;
}


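In other words, it checks whether the value in memory at *reg is oldval and, if so, assigns newval to it. The code above always returns old_reg_val, so a caller that needs to know whether the update succeeded has to compare the returned value with oldval itself. For convenience, a variant returns whether the update succeeded directly: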


// Pseudocode: the whole function body is assumed to execute atomically.
bool compare_and_swap (int *accum, int *dest, int newval)
{
  if ( *accum == *dest ) {
      *dest = newval;
      return true;
  }
  return false;
}
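The usual way to use CAS is a retry loop: read the value, compute the new one, and attempt the swap until no other thread has interfered. The following is a minimal sketch of that pattern using GCC's `__sync_val_compare_and_swap` builtin. The helper name `cas_increment` is hypothetical and does not come from the article.

```cpp
#include <cassert>

// Hypothetical helper (not from the article): lock-free increment built on a
// CAS retry loop. Returns the value of *counter after our increment.
int cas_increment(volatile int *counter)
{
    assert(counter != nullptr);
    int observed = *counter;                  // read the current value
    for (;;) {
        // Try to replace `observed` with `observed + 1`. The builtin returns
        // the value that was actually stored in *counter before the attempt.
        int previous = __sync_val_compare_and_swap(counter, observed, observed + 1);
        if (previous == observed)
            return observed + 1;              // no interference: update applied
        observed = previous;                  // another thread won: retry
    }
}
```

Each thread may call `cas_increment(&shared)` concurrently; a failed attempt costs only another trip around the loop, and no thread ever blocks while holding a lock.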

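Besides CAS, there are the following atomic operations:

  • Fetch-and-Add, which adds a value to a memory location and returns the previous value.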

<< atomic >>
function FetchAndAdd(address location, int inc) {
    int value := *location
    *location := value + inc
    return value
}
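With GCC, the pseudocode above probably maps most directly onto the `__sync_fetch_and_add` builtin. As a small sketch, here is a ticket dispenser that hands out consecutive numbers; the name `take_ticket` is an assumption made for illustration.

```cpp
#include <cassert>

// Hypothetical helper (not from the article): returns the next ticket number.
// Every caller receives a distinct value, even when called from many threads.
int take_ticket(volatile int *next_ticket)
{
    assert(next_ticket != nullptr);
    // Atomically adds 1 and returns the value held *before* the addition.
    return __sync_fetch_and_add(next_ticket, 1);
}
```

Because the builtin returns the old value, the first caller gets 0, the second gets 1, and so on, without any retry loop.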

 

  • Test-and-Set, which writes a value to a memory location and returns its old value. The x86 assembly instruction is BTS.

#define LOCKED 1

int TestAndSet(int* lockPtr) {
    int oldValue;

    // Start of atomic segment
    // The following statements should be interpreted as pseudocode for
    // illustrative purposes only.
    // Traditional compilation of this code will not guarantee atomicity, the
    // use of shared memory (i.e. not-cached values), protection from compiler
    // optimization, or other required properties.
    oldValue = *lockPtr;
    *lockPtr = LOCKED;
    // End of atomic segment

    return oldValue;
}
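For a version that really is atomic, GCC offers `__sync_lock_test_and_set` and `__sync_lock_release`. The sketch below wraps them in a tiny spin lock; the struct name `TasSpinLock` and its members are hypothetical, and the layout (0 means free, 1 means held) is an assumption of this example.

```cpp
#include <cassert>

// Hypothetical minimal spin lock (not from the article). 0 = free, 1 = held.
struct TasSpinLock {
    volatile int state = 0;

    // One acquisition attempt: atomically write 1 and inspect the old value.
    // The lock was free (and is now ours) only if the old value was 0.
    bool try_lock() { return __sync_lock_test_and_set(&state, 1) == 0; }

    // Spin until an attempt succeeds.
    void lock() { while (!try_lock()) { /* busy-wait */ } }

    // Release the lock by atomically writing 0 back.
    void unlock()
    {
        assert(state == 1);   // unlocking a free lock would be a bug
        __sync_lock_release(&state);
    }
};
```

A thread brackets its critical section with `lock()` and `unlock()`; `try_lock()` is useful when the caller would rather do other work than spin.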

 

  • Test and Test-and-Set, used to implement a mutual exclusion lock in multi-core environments.

boolean locked := false // shared lock variable
procedure EnterCritical() {
  do {
    while (locked == true) skip // spin until lock seems free
  } while TestAndSet(locked) // actual atomic locking
}
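The pseudocode above can be sketched with GCC builtins as follows. The point of the inner loop is that waiting threads spin on ordinary reads and attempt the more expensive atomic write only when the lock looks free. The function names `ttas_enter` and `ttas_leave` are hypothetical.

```cpp
#include <cassert>

// Hypothetical helpers (not from the article). *locked: 0 = free, 1 = held.
void ttas_enter(volatile int *locked)
{
    assert(locked != nullptr);
    do {
        // "Test": spin on plain reads until the lock seems free.
        while (*locked != 0) { /* busy-wait */ }
        // "Test-and-set": the actual atomic acquisition. If another thread
        // got in first, the old value is 1 and we go back to spinning.
    } while (__sync_lock_test_and_set(locked, 1) != 0);
}

void ttas_leave(volatile int *locked)
{
    assert(locked != nullptr);
    __sync_lock_release(locked);   // atomically write 0
}
```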


2. CAS implementations on various platforms

 

2.1 CAS support in GCC on Linux


GCC 4.1 and later support atomic CAS operations (for the full set of atomic operations, see GCC Atomic Builtins):

bool __sync_bool_compare_and_swap (type *ptr, type oldval, type newval, ...)
type __sync_val_compare_and_swap (type *ptr, type oldval, type newval, ...)
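The two builtins differ only in what they return: the `bool` form reports success, while the `val` form returns the value found in memory, which is handy as the starting point for a retry. The sketch below shows one plausible use of each; the helper names `claim_once` and `atomic_store_max` are assumptions for illustration.

```cpp
#include <cassert>

// Hypothetical helper: returns true for exactly one caller, the one that
// flips *flag from 0 to 1. The bool variant is enough here.
bool claim_once(volatile int *flag)
{
    assert(flag != nullptr);
    return __sync_bool_compare_and_swap(flag, 0, 1);
}

// Hypothetical helper: raises *maximum to `candidate` if candidate is larger.
// Returns true if this call stored the new maximum.
bool atomic_store_max(volatile int *maximum, int candidate)
{
    assert(maximum != nullptr);
    int current = *maximum;
    while (candidate > current) {
        // The val variant hands back what was really in memory, so a failed
        // attempt already gives us the fresh value for the next comparison.
        int seen = __sync_val_compare_and_swap(maximum, current, candidate);
        if (seen == current)
            return true;
        current = seen;
    }
    return false;   // an equal or larger value is already stored
}
```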


2.2  CAS support on Windows

On Windows, you can use the following Windows API to perform CAS (for the full set of Windows atomic operations, see Interlocked Functions on MSDN):


LONG InterlockedCompareExchange ( __inout LONG volatile *Target,
                                  __in LONG Exchange,
                                  __in LONG Comperand);


2.3  CAS support in C++11


The atomic class templates in the C++11 standard library give you a cross-platform option (for the full set of C++11 atomic operations, see Atomic Operation Library):

template< class T >
bool atomic_compare_exchange_weak( std::atomic<T>* obj,
                                   T* expected, T desired );
template< class T >
bool atomic_compare_exchange_weak( volatile std::atomic<T>* obj,
                                   T* expected, T desired );
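As a sketch of how the first overload is typically used, the function below atomically doubles a value, an operation for which `std::atomic` has no dedicated member. On failure, `atomic_compare_exchange_weak` writes the value it found into `*expected`, so the loop recomputes from fresh data. The name `atomic_double` is hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper (not from the article): atomically doubles *value and
// returns the new value.
int atomic_double(std::atomic<int> *value)
{
    assert(value != nullptr);
    int expected = value->load();
    // The weak form may fail spuriously, which is why it sits in a loop.
    // On any failure `expected` is refreshed with the current value, and the
    // desired value `expected * 2` is recomputed on the next iteration.
    while (!std::atomic_compare_exchange_weak(value, &expected, expected * 2)) {
        /* retry */
    }
    return expected * 2;   // on success `expected` still holds the old value
}
```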


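3. Performance analysis of lock-free code built on CAS

3.1 Test method

Since the goal is only to compare performance, the test is deliberately simple: create 10 threads that run concurrently, each looping 2,000,000 times and incrementing the global variable count (count++) on every iteration. This necessarily involves concurrent mutual exclusion. On the same machine, we measure the time taken with an ordinary mutex, with the CAS-based lock-free version, and with the Fetch-and-Add lock-free version, and then analyze the results.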

3.2 Code using an ordinary mutex

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <time.h>
#include "timer.h"

pthread_mutex_t mutex_lock;
static volatile int count = 0;
void *test_func(void *arg)
{
        int i = 0;
        for(i = 0; i < 2000000; i++)
        {
                pthread_mutex_lock(&mutex_lock);
                count++;
                pthread_mutex_unlock(&mutex_lock);
        }
        return NULL;
}

int main(int argc, const char *argv[])
{
    Timer timer; // a small Timer class written just for timing
    timer.Start();    // start timing
    pthread_mutex_init(&mutex_lock, NULL);
    pthread_t thread_ids[10];
    int i = 0;
    for(i = 0; i < sizeof(thread_ids)/sizeof(pthread_t); i++)
    {
        pthread_create(&thread_ids[i], NULL, test_func, NULL);
    }

    for(i = 0; i < sizeof(thread_ids)/sizeof(pthread_t); i++)
    {
        pthread_join(thread_ids[i], NULL);
    }

    timer.Stop();// stop timing
    timer.Cost_time();// print the elapsed time
    printf("Result: count = %d\n",count);

    return 0;
}

Note: the Timer class is used only for measuring time; its implementation is given at the end of the article.

3.3 Lock-free version built on CAS

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <time.h>
#include "timer.h"

int mutex = 0;   // shared flag: 0 = free, 1 = held
int lock = 0;    // value expected in mutex when acquiring
int unlock = 1;  // value expected in mutex when releasing

static volatile int count = 0;
void *test_func(void *arg)
{
        int i = 0;
        for(i = 0; i < 2000000; i++)
    {
        // acquire: atomically change mutex from 0 to 1; sleep and retry on failure
        while (!(__sync_bool_compare_and_swap (&mutex,lock, 1) ))usleep(100000);
         count++;
         // release: atomically change mutex from 1 back to 0
         __sync_bool_compare_and_swap (&mutex, unlock, 0);
        }
        return NULL;
}

int main(int argc, const char *argv[])
{
    Timer timer;
    timer.Start();
    pthread_t thread_ids[10];
    int i = 0;

    for(i = 0; i < sizeof(thread_ids)/sizeof(pthread_t); i++)
    {
            pthread_create(&thread_ids[i], NULL, test_func, NULL);
    }

    for(i = 0; i < sizeof(thread_ids)/sizeof(pthread_t); i++)
    {
            pthread_join(thread_ids[i], NULL);
    }

    timer.Stop();
    timer.Cost_time();
    printf("Result: count = %d\n",count);

    return 0;
}

3.4 Fetch-and-Add atomic operation

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <time.h>
#include "timer.h"

static volatile int count = 0;
void *test_func(void *arg)
{
        int i = 0;
        for(i = 0; i < 2000000; i++)
        {
            // atomic count += 1; no lock is needed
            __sync_fetch_and_add(&count, 1);
        }
        return NULL;
}

int main(int argc, const char *argv[])
{
    Timer timer;
    timer.Start();
    pthread_t thread_ids[10];
    int i = 0;

    for(i = 0; i < sizeof(thread_ids)/sizeof(pthread_t); i++){
            pthread_create(&thread_ids[i], NULL, test_func, NULL);
    }

    for(i = 0; i < sizeof(thread_ids)/sizeof(pthread_t); i++){
            pthread_join(thread_ids[i], NULL);
    }

    timer.Stop();
    timer.Cost_time();
    printf("Result: count = %d\n",count);
    return 0;
}

4. Experimental results and analysis

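On the same machine, each of the three programs above was run 10 times and the average was computed. The results are as follows (unit: microseconds):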

[Figure: table of average run times for the three programs; image not available]

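As the results show, the lock-free versions far outperform the locked version: they take only about 1/3 of the time, which makes them roughly 3 times faster in this test. Lock-free programming can indeed be more efficient than traditional locking, so I strongly recommend adopting it in highly concurrent programs to further improve efficiency.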

5. The Timer timing class

timer.h

#ifndef TIMER_H
#define TIMER_H

#include <sys/time.h>
class Timer
{
public:
    Timer();
    // record the start time
    void Start();
    // record the stop time
    void Stop();
    // reset the timer
    void Reset();
    // print the elapsed time
    void Cost_time();
private:
    struct timeval t1;   // start time
    struct timeval t2;   // stop time
    bool b1,b2;          // whether Start() / Stop() have been called
};
#endif



timer.cpp

#include "timer.h"
#include <stdio.h>

Timer::Timer()
{
    b1 = false;
    b2 = false;
}
void Timer::Start()
{
    gettimeofday(&t1,NULL);
    b1 = true;
    b2 = false;
}

void Timer::Stop()
{
    if (b1 == true)
    {
        gettimeofday(&t2,NULL);
        b2 = true;
    }
}

void Timer::Reset()
{
    b1 = false;
    b2 = false;
}

void Timer::Cost_time()
{
    if (b1 == false)
    {
        printf("Timing error: call Start() first, then Stop(), then Cost_time()\n");
        return ;
    }
    else if (b2 == false)
    {
        printf("Timing error: call Stop() before calling Cost_time()\n");
        return ;
    }
    else
    {
        int usec,sec;
        bool borrow = false;
        // >= so that equal microsecond fields do not trigger a borrow
        if (t2.tv_usec >= t1.tv_usec)
        {
            usec = t2.tv_usec - t1.tv_usec;
        }
        else
        {
            // borrow one second to keep the microsecond part non-negative
            borrow = true;
            usec = t2.tv_usec+1000000 - t1.tv_usec;
        }

        if (borrow)
        {
            sec = t2.tv_sec-1 - t1.tv_sec;
        }
        else
        {
            sec = t2.tv_sec - t1.tv_sec;
        }
        printf("Elapsed time: %d s %d us\n",sec,usec);
    }
}
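The borrow logic in Cost_time() is easier to check against a single-number form: convert both samples to microseconds and subtract. The sketch below does this; the helper `elapsed_usec` is hypothetical and is not part of the Timer class. It assumes the stop sample is not earlier than the start sample.

```cpp
#include <sys/time.h>
#include <cassert>

// Hypothetical helper (not part of the article's Timer class): total elapsed
// microseconds between two gettimeofday() samples.
long long elapsed_usec(const timeval &start, const timeval &stop)
{
    long long sec  = (long long)stop.tv_sec  - (long long)start.tv_sec;
    long long usec = (long long)stop.tv_usec - (long long)start.tv_usec;
    // usec may be negative here; adding it to sec * 1000000 performs the
    // same "borrow" that Cost_time() spells out by hand.
    long long total = sec * 1000000LL + usec;
    assert(total >= 0);   // assumption: stop is not earlier than start
    return total;
}
```

For example, a start of 1 s 900000 us and a stop of 3 s 100000 us gives 2 * 1000000 - 800000 = 1200000 us, which matches the 1 s 200000 us that the borrow branch of Cost_time() would print.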