设置线程池的大小

线程池的理想大小取决于将要提交的任务类型和所部署系统的特性。

为了正确的定制线程池的大小,你需要理解你的计算环境、资源预算和任务的自身特性。部署系统中安装了多少个CPU?多少内存?任务是计算密集型、I/O密集型还是二者皆可?它们是否需要像JDBC Connection这样的稀缺资源?如果你有不同类别的任务,它们拥有差别很大的行为,那么应该考虑使用多个不同的线程池,这样每个线程池可以根据不同任务的工作负载进行调节。

对于计算密集型的任务,一个有N(cpu) 个处理器的系统通常通过使用一个N(cpu) +1个线程的线程池来获得最优的利用率(计算密集型的线程恰好在某时因为发生一个页错误或者因为其他原因而暂停,刚好有一个“额外”的线程,可以确保在这样的情况下CPU周期不会中断工作)。对于包含了I/O和其他阻塞操作的任务,不是所有的线程都会在所有的时间被调度,因此你需要一个更大的池。为了正确地设置线程池的长度,你必须估算出任务花在等待的时间与用来计算的时间的比率;这个估算值不必十分精确,而且可以通过一些监控工具获得。你还可以选择另一种方法来调节线程池的大小,在一个基准负载下,使用不同大小的线程池运行你的应用程序,并观察CPU利用率的水平。

给定下列定义:

Ncpu = CPU的数量
Ucpu = 目标CPU的使用率,0 ≤ Ucpu≤ 1
W/C = 等待时间与计算时间的比率

为保持处理器达到期望的使用率,最优的池的大小等于:

Nthreads = Ncpu * Ucpu * ( 1 + W/C )

可以使用Runtime来获得CPU的数目:

int N_CPUS = Runtime.getRuntime().availableProcessors();

当然,CPU周期并不是唯一影响线程池大小的资源,还包括内存、文件句柄、套接字句柄和数据库连接等。计算这些资源对线程池的约束条件是更容易的:计算每个任务对该资源的需求量,然后用该资源的可用总量除以每个任务的需求量,所得结果就是线程池大小的上限。

最后来一个“Dark Magic”估算方法(因为我暂时还没有搞懂它的原理),使用下面的类:

import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.BlockingQueue;

/**
 * A class that calculates the optimal thread pool boundaries. It takes the
 * desired target utilization and the desired work queue memory consumption as
 * input and retuns thread count and work queue capacity.
 *
 * @author Niklas Schlimm
 * @param <T>
 *
 */
public abstract class PoolSizeCalculator {

    /**
     * The sample queue size to calculate the size of a single {@link Runnable}
     * element.
     */
    private final int SAMPLE_QUEUE_SIZE = 1000;

    /**
     * Accuracy of test run. It must finish within 20ms of the testTime
     * otherwise we retry the test. This could be configurable.
     */
    private final int EPSYLON = 20;

    /**
     * Control variable for the CPU time investigation.
     */
    private volatile boolean expired;

    /**
     * Time (millis) of the test run in the CPU time calculation.
     */
    private final long testtime = 3000;

    /**
     * Calculates the boundaries of a thread pool for a given {@link Runnable}.
     *
     * @param targetUtilization
     *            the desired utilization of the CPUs (0 <= targetUtilization <=
     *            * 1) * @param targetQueueSizeBytes * the desired maximum work
     *            queue size of the thread pool (bytes)
     */
    protected void calculateBoundaries(BigDecimal targetUtilization,
            BigDecimal targetQueueSizeBytes) {
        calculateOptimalCapacity(targetQueueSizeBytes);
        Runnable task = creatTask();
        start(task);
        start(task); // warm up phase
        long cputime = getCurrentThreadCPUTime();
        start(task); // test intervall
        cputime = getCurrentThreadCPUTime() - cputime;
        long waittime = (testtime * 1000000) - cputime;
        calculateOptimalThreadCount(cputime, waittime, targetUtilization);
    }

    private void calculateOptimalCapacity(BigDecimal targetQueueSizeBytes) {
        long mem = calculateMemoryUsage();
        BigDecimal queueCapacity = targetQueueSizeBytes.divide(new BigDecimal(
                mem), RoundingMode.HALF_UP);
        System.out.println("Target queue memory usage (bytes): "
                + targetQueueSizeBytes);
        System.out.println("createTask() produced "
                + creatTask().getClass().getName() + " which took " + mem
                + " bytes in a queue");
        System.out.println("Formula: " + targetQueueSizeBytes + " / " + mem);
        System.out.println("* Recommended queue capacity (bytes): "
                + queueCapacity);
    }

    /**
     * * Brian Goetz' optimal thread count formula, see 'Java Concurrency in *
     * Practice' (chapter 8.2) * * @param cpu * cpu time consumed by considered
     * task * @param wait * wait time of considered task * @param
     * targetUtilization * target utilization of the system
     */
    private void calculateOptimalThreadCount(long cpu, long wait,
            BigDecimal targetUtilization) {
        BigDecimal waitTime = new BigDecimal(wait);
        BigDecimal computeTime = new BigDecimal(cpu);
        BigDecimal numberOfCPU = new BigDecimal(Runtime.getRuntime()
                .availableProcessors());
        BigDecimal optimalthreadcount = numberOfCPU.multiply(targetUtilization)
                .multiply(
                        new BigDecimal(1).add(waitTime.divide(computeTime,
                                RoundingMode.HALF_UP)));
        System.out.println("Number of CPU: " + numberOfCPU);
        System.out.println("Target utilization: " + targetUtilization);
        System.out.println("Elapsed time (nanos): " + (testtime * 1000000));
        System.out.println("Compute time (nanos): " + cpu);
        System.out.println("Wait time (nanos): " + wait);
        System.out.println("Formula: " + numberOfCPU + " * "
                + targetUtilization + " * (1 + " + waitTime + " / "
                + computeTime + ")");
        System.out.println("* Optimal thread count: " + optimalthreadcount);
    }

    /**
     * * Runs the {@link Runnable} over a period defined in {@link #testtime}. *
     * Based on Heinz Kabbutz' ideas *
     * (http://www.javaspecialists.eu/archive/Issue124.html). * * @param task *
     * the runnable under investigation
     */
    public void start(Runnable task) {
        long start = 0;
        int runs = 0;
        do {
            if (++runs > 5) {
                throw new IllegalStateException("Test not accurate");
            }
            expired = false;
            start = System.currentTimeMillis();
            Timer timer = new Timer();
            timer.schedule(new TimerTask() {
                public void run() {
                    expired = true;
                }
            }, testtime);
            while (!expired) {
                task.run();
            }
            start = System.currentTimeMillis() - start;
            timer.cancel();
        } while (Math.abs(start - testtime) > EPSYLON);
        collectGarbage(3);
    }

    private void collectGarbage(int times) {
        for (int i = 0; i < times; i++) {
            System.gc();
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
    }

    /**
     * Calculates the memory usage of a single element in a work queue. Based on
     * Heinz Kabbutz' ideas
     * (http://www.javaspecialists.eu/archive/Issue029.html).
     *
     * @return memory usage of a single {@link Runnable} element in the thread
     *         pools work queue
     */
    public long calculateMemoryUsage() {
        BlockingQueue<Runnable> queue = createWorkQueue();
        for (int i = 0; i < SAMPLE_QUEUE_SIZE; i++) {
            queue.add(creatTask());
        }
        long mem0 = Runtime.getRuntime().totalMemory()
                - Runtime.getRuntime().freeMemory();
        long mem1 = Runtime.getRuntime().totalMemory()
                - Runtime.getRuntime().freeMemory();
        queue = null;
        collectGarbage(15);
        mem0 = Runtime.getRuntime().totalMemory()
                - Runtime.getRuntime().freeMemory();
        queue = createWorkQueue();
        for (int i = 0; i < SAMPLE_QUEUE_SIZE; i++) {
            queue.add(creatTask());
        }
        collectGarbage(15);
        mem1 = Runtime.getRuntime().totalMemory()
                - Runtime.getRuntime().freeMemory();
        return (mem1 - mem0) / SAMPLE_QUEUE_SIZE;
    }

    /**
     * Create your runnable task here.
     *
     * @return an instance of your runnable task under investigation
     */
    protected abstract Runnable creatTask();

    /**
     * Return an instance of the queue used in the thread pool.
     *
     * @return queue instance
     */
    protected abstract BlockingQueue<Runnable> createWorkQueue();

    /**
     * Calculate current cpu time. Various frameworks may be used here,
     * depending on the operating system in use. (e.g.
     * http://www.hyperic.com/products/sigar). The more accurate the CPU time
     * measurement, the more accurate the results for thread count boundaries.
     *
     * @return current cpu time of current thread
     */
    protected abstract long getCurrentThreadCPUTime();

}

然后自己继承这个抽象类并实现它的三个抽象方法,比如下面是我写的一个示例(任务是请求网络数据),其中我指定期望CPU利用率为1.0(即100%),任务队列总大小不超过100,000字节:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.lang.management.ManagementFactory;
import java.math.BigDecimal;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SimplePoolSizeCaculatorImpl extends PoolSizeCalculator {

    @Override
    protected Runnable creatTask() {
        return new AsyncIOTask();
    }

    @Override
    protected BlockingQueue<Runnable> createWorkQueue() {
        return new LinkedBlockingQueue<Runnable>(1000);
    }

    @Override
    protected long getCurrentThreadCPUTime() {
        return ManagementFactory.getThreadMXBean().getCurrentThreadCpuTime();
    }

    public static void main(String[] args) {
        PoolSizeCalculator poolSizeCalculator = new SimplePoolSizeCaculatorImpl();
        poolSizeCalculator.calculateBoundaries(new BigDecimal(1.0),
                new BigDecimal(100000));
    }
}

/**
 * 自定义的异步IO任务
 * 
 * @author Will
 *
 */
class AsyncIOTask implements Runnable {

    @Override
    public void run() {
        HttpURLConnection connection = null;
        BufferedReader reader = null;
        try {
            String getURL = "http://www.baidu.com";
            URL getUrl = new URL(getURL);

            connection = (HttpURLConnection) getUrl.openConnection();
            connection.connect();
            reader = new BufferedReader(new InputStreamReader(
                    connection.getInputStream()));

            String line;
            while ((line = reader.readLine()) != null) {
                // empty loop
            }
        }

        catch (IOException e) {

        } finally {
            if (connection != null) {
                connection.disconnect();
            }
            if (reader != null) {
                try {
                    reader.close();
                } catch (Exception e) {

                }
            }
        }
    }
}

得到的输出如下:

Target queue memory usage (bytes): 100000
createTask() produced concurrent.AsyncIOTask which took 40 bytes in a queue
Formula: 100000 / 40
* Recommended queue capacity (bytes): 2500
Number of CPU: 4
Target utilization: 1
Elapsed time (nanos): 3000000000
Compute time (nanos): 374402400
Wait time (nanos): 2625597600
Formula: 4 * 1 * (1 + 2625597600 / 374402400)
* Optimal thread count: 32

推荐的任务队列大小为2500,线程数为32,有点出乎意料之外。我可以如下构造一个线程池:

ThreadPoolExecutor pool = new ThreadPoolExecutor(32, 32, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue(2500));