Operating Systems (5): CPU Process and Thread Scheduling

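Long-term scheduling: selects a process and adds it to the ready queue to wait for execution.
Short-term scheduling: selects the process to run next and allocates the CPU to it.
Medium-term scheduling (swapping): a process that has not finished, for example because it is waiting or has been preempted at the end of its RR time quantum, may be suspended and moved out of memory to disk so that other processes can run in memory. When the swapped-out process is ready again, it is moved back to the ready queue to continue.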

Process types

• I/O-bound process – spends more time doing I/O than computations; many short CPU bursts

• CPU-bound process – spends more time doing computations; few very long CPU bursts

Long-term scheduling aims to select a good mix of these two types of process.

Process execution

Execution alternates between two kinds of phase: CPU bursts and I/O bursts.

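Scheduling algorithms

Algorithms are classified as preemptive or non-preemptive.

A preemptive scheduler relies on interrupts: before a process has finished, the CPU can be taken away from it and given to another process. The process that was running then goes from running to ready.

A non-preemptive scheduler reassigns the CPU only when the running process terminates or enters the waiting state, for example because of I/O.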

Scheduling criteria

[Figure: scheduling criteria]

First-Come, First-Served (FCFS)

Non-preemptive scheduling algorithm.
Whichever process arrives first runs first, and it runs to completion before the next process starts.

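Shortest-Job-First (SJF)

Runs the process with the shortest CPU burst first.
When a process finishes, the scheduler looks at the ready queue, picks the process with the shortest burst, and runs it to completion before choosing the next one.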

Predicting the CPU burst time

Assumption: the next CPU burst is similar to the previous ones.

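Shortest-Remaining-Time-First (SRTF)

Preemptive algorithm (the non-preemptive version is simply SJF).
Preemptive scheduling (a scheduling decision at each clock interrupt).
At every scheduling point, the scheduler checks the ready processes and runs whichever one has the least remaining execution time.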

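Priority Scheduling

Preemptive algorithm.
Preemptive scheduling (a scheduling decision at each clock interrupt).
Each process is assigned an integer that represents its priority. At every scheduling point, the process with the highest priority runs.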

Starvation Problem

Low-priority processes may never execute.
Solution (aging): the longer a process waits, the higher its priority becomes.

Round Robin (RR)

Each process gets a small unit of CPU time (time quantum q), usually 10-100 milliseconds.

▪ After this time has elapsed, the process is preempted and added to the end of the ready queue.

▪ That is, a scheduling decision is made every quantum.

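Quantum: usually 10 ms to 100 ms; context switch: less than 10 microseconds.
About 80% of CPU bursts should be shorter than q. In other words, about 80% of bursts finish within a single quantum. If q is too small, there are too many context switches, which hurts performance.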

Multilevel Queue

Processes belong to different queues. Each queue has its own priority and can use its own scheduling algorithm.

Multilevel Feedback Queue

Processes can be promoted to a higher-priority queue or demoted to a lower-priority one.

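Evaluating scheduling algorithms

Deterministic Evaluation

Construct a fixed set of process data, run each algorithm on it, and compare the results, for example to find which algorithm gives the minimum average waiting time.

Simulations

Abstract a workload from processes that actually ran in the past, and run the algorithms on it.

Implementation

Run the algorithm on a real system.
This is rarely feasible because the cost is too high.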

Tutorial

What is shared within a process

[Figure: what is shared within a process]

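Linux processes and threads

In systems that distinguish between processes and threads, such as Windows, the data structure that stores a process contains pointers to its threads. Linux does not draw the line in the same way: it represents both as tasks.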

Modifying a global variable after fork()

After running:
• Output at LINE C is 5, as the thread executes in the same context as the child process.
• Output at LINE P is 0.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>

int value = 0;

void *runner(void *param)
{
    value = 5;              /* changes the child process's copy of value */
    pthread_exit(0);
}

int main(void)
{
    pid_t pid;
    pthread_t tid;
    pthread_attr_t attr;

    pid = fork();
    if (pid == 0) {         /* child process */
        pthread_attr_init(&attr);
        pthread_create(&tid, &attr, runner, NULL);
        pthread_join(tid, NULL);
        printf("CHILD: v = %d\n", value);   /* LINE C */
    } else if (pid > 0) {   /* parent process */
        wait(NULL);
        printf("PARENT: v = %d\n", value);  /* LINE P */
    }
    return EXIT_SUCCESS;
}

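This is because fork() gives the child its own copy of the parent's address space, so the change the child makes to the global variable is visible only inside the child. The thread created in the child shares the child's memory, which is why LINE C prints 5. The parent's copy is untouched, so LINE P prints 0.
Put another way, separate processes share far less than the threads of one process do.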

Google Chrome runs each site in its own process

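Firefox, in its older single-process design, is multithreaded: if one web page crashes, the whole application has to be reopened.

This is because many operating-system operations act on whole processes: when one thread crashes, the entire process is killed.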

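User threads, kernel threads, and CPU cores

Choose option 3: number of user threads > number of kernel threads > number of CPU cores. When a user thread blocks, its kernel thread can switch to another user thread, so the kernel threads are not left idle.
Likewise, having more kernel threads than CPU cores keeps the cores from going idle.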

Lab

Thread count and workload analysis with pthread

Command to compile the code:

$ gcc -o pt thread-1.c -pthread -Wall

C code:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>     /* sleep(), used when the call below is enabled */

#define NUM_THREADS 20

int perload = 10;       /* units of work per thread */
struct timeval t1;      /* program start time, set once in main() */

void *PrintHello(void *threadid)
{
    long tid = (long)threadid;
    struct timeval t2;      /* local, so threads do not overwrite each other */
    double elapsedTime;
    /* volatile keeps the busy loop from being optimized away;
       unsigned long avoids signed overflow for large workloads */
    volatile unsigned long sum = 0;
    int i, j;

    for (i = 1; i <= perload; i++) {
        for (j = 0; j <= 100000000; j++) {
            sum = sum + 1;
        }
        //sum = sum + 1;
        //sleep(1);
    }

    // compute and print the elapsed time in millisec
    gettimeofday(&t2, NULL);
    elapsedTime = (t2.tv_sec - t1.tv_sec) * 1000.0;      // sec to ms
    elapsedTime += (t2.tv_usec - t1.tv_usec) / 1000.0;   // us to ms
    printf("time = %.2f, thread #%ld!\n", elapsedTime, tid);
    pthread_exit(NULL);
    //exit(EXIT_SUCCESS);
}

int main(void)
{
    gettimeofday(&t1, NULL);

    pthread_t threads[NUM_THREADS];
    int rc;
    long t;
    for (t = 0; t < NUM_THREADS; t++) {
        printf("In main: creating thread %ld\n", t);
        rc = pthread_create(&threads[t], NULL, PrintHello, (void *)t);
        if (rc) {
            printf("ERROR; return code from pthread_create() is %d\n", rc);
            exit(EXIT_FAILURE);
        }
    }

    /* Last thing that main() should do */
    pthread_exit(NULL);
    //exit(0);
}

What I did and what I learned:
Task 1. Create threads using pthread
In this lab, the source code thread.c is given. Read the source code to understand how threads can be programmed, then compile and run it. If you encounter problems when compiling, this may be caused by a missing -pthread.
• Always use the -Wall option when compiling and fix all warnings.
• Use the command "lscpu" to show how many CPUs are available on the server.
• Note that "NUM_THREADS x perload" is the total workload. Given the same total workload, test the impact of the number of threads on performance at the following settings:
o NUM_THREADS = 1 and perload = 40
o NUM_THREADS = 2 and perload = 20
o NUM_THREADS = 5 and perload = 8
o NUM_THREADS = 40 and perload = 1
What did you observe? Share your observations with classmates.
• The last line of the main function is pthread_exit(NULL). Can you change it to exit(EXIT_SUCCESS)? Why or why not?
• From the test above, can you infer whether sleep(1) stops a particular thread or all threads? If a thread calls sleep(1), will it stop the calling thread or all threads?

One CPU
14/week4$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              1
On-line CPU(s) list: 0
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               165
Model name:          Intel(R) Core(TM) i5-10400F CPU @ 2.90GHz
Stepping:            3
CPU MHz:             2904.002
BogoMIPS:            5808.00
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            12288K
NUMA node0 CPU(s):   0

Results after changing the parameters:

o NUM_THREADS = 1 and perload = 40

In main: creating thread 0
time = 6984.46, thread #0!


o NUM_THREADS = 2 and perload = 20

In main: creating thread 0
In main: creating thread 1
time = 6894.99, thread #1!
time = 6907.52, thread #0!


o NUM_THREADS = 5 and perload = 8

In main: creating thread 0
In main: creating thread 1
In main: creating thread 2
In main: creating thread 3
In main: creating thread 4
time = 6864.62, thread #3!
time = 6881.98, thread #1!
time = 6869.16, thread #4!
time = 6914.61, thread #2!
time = 6927.19, thread #0!


o NUM_THREADS = 20 and perload = 2


In main: creating thread 0
In main: creating thread 1
In main: creating thread 2
In main: creating thread 3
In main: creating thread 4
In main: creating thread 5
In main: creating thread 6
In main: creating thread 7
In main: creating thread 8
In main: creating thread 9
In main: creating thread 10
In main: creating thread 11
In main: creating thread 12
In main: creating thread 13
In main: creating thread 14
In main: creating thread 15
In main: creating thread 16
In main: creating thread 17
In main: creating thread 18
In main: creating thread 19
time = 6822.61, thread #15!
time = 6830.33, thread #1!
time = 6838.26, thread #3!
time = 6842.09, thread #14!
time = 6846.23, thread #16!
time = 6857.05, thread #17!
time = 6858.85, thread #5!
time = 6861.14, thread #8!
time = 6865.64, thread #0!
time = 6876.42, thread #18!
time = 6877.49, thread #4!
time = 6878.35, thread #11!
time = 6879.14, thread #10!
time = 6918.97, thread #19!
time = 6925.51, thread #9!
time = 6925.63, thread #13!
time = 6930.14, thread #6!
time = 6936.85, thread #12!
time = 6948.82, thread #7!
time = 6951.15, thread #2!

o NUM_THREADS = 40 and perload = 1

In main: creating thread 0
In main: creating thread 1
In main: creating thread 2
In main: creating thread 3
In main: creating thread 4
In main: creating thread 5
In main: creating thread 6
In main: creating thread 7
In main: creating thread 8
In main: creating thread 9
In main: creating thread 10
In main: creating thread 11
In main: creating thread 12
In main: creating thread 13
In main: creating thread 14
In main: creating thread 15
In main: creating thread 16
In main: creating thread 17
In main: creating thread 18
In main: creating thread 19
In main: creating thread 20
In main: creating thread 21
In main: creating thread 22
In main: creating thread 23
In main: creating thread 24
In main: creating thread 25
In main: creating thread 26
In main: creating thread 27
In main: creating thread 28
In main: creating thread 29
In main: creating thread 30
In main: creating thread 31
In main: creating thread 32
In main: creating thread 33
In main: creating thread 34
In main: creating thread 35
In main: creating thread 36
In main: creating thread 37
In main: creating thread 38
In main: creating thread 39
time = 6819.54, thread #13!
time = 6823.27, thread #12!
time = 6827.77, thread #6!
time = 6843.04, thread #31!
time = 6851.18, thread #2!
time = 6854.83, thread #3!
time = 6866.48, thread #28!
time = 6870.44, thread #39!
time = 6878.47, thread #7!
time = 6889.58, thread #10!
time = 6893.31, thread #11!
time = 6895.02, thread #21!
time = 6896.41, thread #9!
time = 6897.87, thread #26!
time = 6899.33, thread #38!
time = 6900.50, thread #18!
time = 6901.51, thread #20!
time = 6905.11, thread #23!
time = 6906.15, thread #33!
time = 6907.17, thread #34!
time = 6918.27, thread #24!
time = 6918.98, thread #4!
time = 6919.59, thread #27!
time = 6920.43, thread #19!
time = 6920.83, thread #22!
time = 6921.87, thread #0!
time = 6922.31, thread #30!
time = 6922.78, thread #25!
time = 6923.20, thread #29!
time = 6924.10, thread #8!
time = 6950.16, thread #5!
time = 6981.98, thread #17!
time = 6969.62, thread #35!
time = 6985.82, thread #15!
time = 6962.67, thread #37!
time = 6986.24, thread #14!
time = 6991.06, thread #1!
time = 6993.15, thread #16!
time = 6998.68, thread #32!
time = 7003.77, thread #36!

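First, lscpu shows that my machine has a single core with one hardware thread per core. For the same total workload of 40 units, giving all 40 units to one thread and creating 40 threads with 1 unit each take roughly the same total time, about 7 seconds in both cases.
Because there is only one core, the threads do not run in parallel; they only run concurrently.

In addition, main creates all the threads in one go, and after a wait all the threads finish at almost the same moment. This suggests round-robin-style time slicing: thread 1 runs for a time slice (say 10 ms), then the next thread runs, and so on around the ring. So although each of the 40 threads has only 1 unit of work, each one still takes about as long to finish as a single thread with 40 units, because each thread also spends a long time waiting for the CPU.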

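exit(EXIT_SUCCESS) vs. pthread_exit(NULL)

I replaced pthread_exit(NULL) in main with exit(EXIT_SUCCESS).
The program then ended immediately after creating the threads, with no output from any of them. So exit() terminates the whole process, including all of its threads, while pthread_exit() terminates only the calling thread and lets the others keep running.
Taking 5 x 8 as an example:
I enabled (uncommented) the sleep(1) call in each thread and found that every thread's time increased, from about 6.9 s to about 14.7 s. The increase is roughly the 8 one-second sleeps that each thread performs itself, not the 40 seconds of all the sleeps added together, which suggests that sleep(1) suspends only the calling thread.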

In main: creating thread 0
In main: creating thread 1
In main: creating thread 2
In main: creating thread 3
In main: creating thread 4
time = 14594.83, thread #2!
time = 14735.15, thread #0!
time = 14743.94, thread #4!
time = 14744.71, thread #1!
time = 14744.83, thread #3!

OpenMP

Compile command:

gcc -Wall -g -o omp1 -fopenmp omp_1.c

114/week4$ ./omp2 
I am in

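The #pragma omp parallel directive runs the code in the block in parallel, creating the threads automatically. Because I have only one core, there is only one line of output, which shows that the parallel region was run by just one thread.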

gcc -Wall -g -o omp2 -fopenmp omp_2.c

114/week4$ ./omp2 
I am not in parallel region.
I am not in parallel region.
I am not in parallel region.
I am not in parallel region.
I am not in parallel region.
I am not in parallel region.
I am not in parallel region.
I am not in parallel region.
I am not in parallel region.
I am not in

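The #pragma omp parallel for directive parallelizes a for loop: the iterations of i are distributed across the threads and run in parallel, while the code inside each iteration runs sequentially. Again, with only one core everything still effectively runs sequentially.
This needs to be tried on a multi-core machine.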

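#pragma omp parallel [clauses]
{
   code_block      // this block is executed in parallel by every thread in the team
}
#pragma omp [parallel] for [clauses]
   for_statement   // the loop iterations are divided among the threads
#pragma omp [parallel] sections [clauses]
{
   #pragma omp section
   {
      code_block   // each section is executed once by one thread; different sections can run in parallel
   }
}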