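A colleague recently asked me about the difference between client-side and server-side timeouts in Dubbo. I gave him a quick explanation from memory, but realized along the way that I had forgotten many of the details, so I went back to read this part of the Dubbo source and compared how versions 2.5.x and 2.7.3 differ.
Timeout mechanism in 2.5.x
Request entry point: com.alibaba.dubbo.rpc.protocol.dubbo.DubboInvoker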
@Override
protected Result doInvoke(final Invocation invocation) throws Throwable {
RpcInvocation inv = (RpcInvocation) invocation;
final String methodName = RpcUtils.getMethodName(invocation);
inv.setAttachment(Constants.PATH_KEY, getUrl().getPath());
inv.setAttachment(Constants.VERSION_KEY, version);
ExchangeClient currentClient;
if (clients.length == 1) {
currentClient = clients[0];
} else {
currentClient = clients[index.getAndIncrement() % clients.length];
}
try {
boolean isAsync = RpcUtils.isAsync(getUrl(), invocation);
boolean isOneway = RpcUtils.isOneway(getUrl(), invocation);
int timeout = getUrl().getMethodParameter(methodName, Constants.TIMEOUT_KEY, Constants.DEFAULT_TIMEOUT);
if (isOneway) {
boolean isSent = getUrl().getMethodParameter(methodName, Constants.SENT_KEY, false);
currentClient.send(inv, isSent);
RpcContext.getContext().setFuture(null);
return new RpcResult();
} else if (isAsync) {
ResponseFuture future = currentClient.request(inv, timeout);
RpcContext.getContext().setFuture(new FutureAdapter<Object>(future));
return new RpcResult();
} else {
RpcContext.getContext().setFuture(null);
//Synchronous call: send the request, then block in get() until the response arrives
return (Result) currentClient.request(inv, timeout).get();
}
} catch (TimeoutException e) {
throw new RpcException(RpcException.TIMEOUT_EXCEPTION, "Invoke remote method timeout. method: " + invocation.getMethodName() + ", provider: " + getUrl() + ", cause: " + e.getMessage(), e);
} catch (RemotingException e) {
throw new RpcException(RpcException.NETWORK_EXCEPTION, "Failed to invoke remote method: " + invocation.getMethodName() + ", provider: " + getUrl() + ", cause: " + e.getMessage(), e);
}
}
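The timeout handed to currentClient.request in doInvoke comes from getUrl().getMethodParameter(methodName, Constants.TIMEOUT_KEY, Constants.DEFAULT_TIMEOUT). Below is a rough sketch of that kind of lookup, assuming a method-level key takes precedence over the interface-level key and that the default is 1000 ms. The map-based TimeoutLookup class and resolveTimeout are made up for illustration and are not Dubbo's URL API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a URL parameter lookup; not Dubbo's URL class.
class TimeoutLookup {
    // Assumed default timeout in milliseconds.
    static final int DEFAULT_TIMEOUT = 1000;

    // Try "<method>.timeout" first, then "timeout", then fall back to the default.
    static int resolveTimeout(Map<String, String> params, String method) {
        String value = params.get(method + ".timeout");
        if (value == null || value.isEmpty()) {
            value = params.get("timeout");
        }
        if (value == null || value.isEmpty()) {
            return DEFAULT_TIMEOUT;
        }
        return Integer.parseInt(value);
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("timeout", "3000");          // interface-level setting
        params.put("sayHello.timeout", "500");  // method-level override
        System.out.println(resolveTimeout(params, "sayHello"));
        System.out.println(resolveTimeout(params, "sayBye"));
        System.out.println(resolveTimeout(new HashMap<>(), "sayBye"));
    }
}
```

The point is only that the value is resolved on the consumer side before the request is sent, which is why the waiting logic that follows lives entirely in the client.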
Let's set asynchronous calls aside; this article focuses on the most common case, the synchronous call.
com.alibaba.dubbo.remoting.exchange.support.DefaultFuture
public Object get() throws RemotingException {
return get(timeout);
}
public Object get(int timeout) throws RemotingException {
if (timeout <= 0) {
timeout = Constants.DEFAULT_TIMEOUT;
}
if (!isDone()) {
long start = System.currentTimeMillis();
lock.lock();
try {
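//Question ☆: is there anything wrong with this while loop?
while (!isDone()) {
//The request has not completed yet: wait to be woken by the response, for at most the timeout
done.await(timeout, TimeUnit.MILLISECONDS);
//Note ☆: System.currentTimeMillis() - start > timeout means the call timed out; isDone() is analyzed below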
if (isDone() || System.currentTimeMillis() - start > timeout) {
break;
}
}
} catch (InterruptedException e) {
throw new RuntimeException(e);
} finally {
lock.unlock();
}
if (!isDone()) {
throw new TimeoutException(sent > 0, channel, getTimeoutMessage(false));
}
}
return returnFromResponse();
}
private Object returnFromResponse() throws RemotingException {
Response res = response;
if (res == null) {
throw new IllegalStateException("response cannot be null");
}
if (res.getStatus() == Response.OK) {
return res.getResult();
}
if (res.getStatus() == Response.CLIENT_TIMEOUT || res.getStatus() == Response.SERVER_TIMEOUT) {
throw new TimeoutException(res.getStatus() == Response.SERVER_TIMEOUT, channel, res.getErrorMessage());
}
throw new RemotingException(channel, res.getErrorMessage());
}
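To try the blocking wait in isolation, here is a stripped-down sketch that keeps the same lock, condition and loop shape as get(int timeout). SimpleFuture and RequestTimeoutException are hypothetical names of mine, the exception is unchecked only to keep the example short, and the response handling is reduced to a single field.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified future for illustration; not Dubbo's DefaultFuture.
class SimpleFuture {
    // Unchecked stand-in for a timeout exception.
    static class RequestTimeoutException extends RuntimeException {
        RequestTimeoutException(String message) {
            super(message);
        }
    }

    private final ReentrantLock lock = new ReentrantLock();
    private final Condition done = lock.newCondition();
    private volatile Object response;

    boolean isDone() {
        return response != null;
    }

    // Called by the receiving thread: store the response and wake the waiter.
    void received(Object res) {
        lock.lock();
        try {
            response = res;
            done.signal();
        } finally {
            lock.unlock();
        }
    }

    // Block until a response is stored or the timeout has elapsed.
    Object get(long timeoutMs) {
        if (!isDone()) {
            long start = System.currentTimeMillis();
            lock.lock();
            try {
                while (!isDone()) {
                    done.await(timeoutMs, TimeUnit.MILLISECONDS);
                    if (isDone() || System.currentTimeMillis() - start > timeoutMs) {
                        break;
                    }
                }
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            } finally {
                lock.unlock();
            }
            if (!isDone()) {
                throw new RequestTimeoutException("no response within " + timeoutMs + " ms");
            }
        }
        return response;
    }

    public static void main(String[] args) {
        SimpleFuture answered = new SimpleFuture();
        new Thread(() -> answered.received("pong")).start();
        System.out.println(answered.get(1000));

        SimpleFuture silent = new SimpleFuture();
        try {
            silent.get(100);
        } catch (RequestTimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }
    }
}
```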
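Note: isDone == true does not mean the request did not time out. As you can see, returnFromResponse still contains logic for handling a timeout Response. That timeout response comes from the code below: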
com.alibaba.dubbo.remoting.exchange.support.DefaultFuture
static {
Thread th = new Thread(new RemotingInvocationTimeoutScan(), "DubboResponseTimeoutScanTimer");
th.setDaemon(true);
th.start();
}
......
private static class RemotingInvocationTimeoutScan implements Runnable {
public void run() {
while (true) {
try {
for (DefaultFuture future : FUTURES.values()) {
if (future == null || future.isDone()) {
continue;
}
if (System.currentTimeMillis() - future.getStartTimestamp() > future.getTimeout()) {
// create exception response.
Response timeoutResponse = new Response(future.getId());
// set timeout status.
timeoutResponse.setStatus(future.isSent() ? Response.SERVER_TIMEOUT : Response.CLIENT_TIMEOUT);
timeoutResponse.setErrorMessage(future.getTimeoutMessage(true));
// handle response.
DefaultFuture.received(future.getChannel(), timeoutResponse);
}
}
Thread.sleep(30);
} catch (Throwable e) {
logger.error("Exception when scan the timeout invocation of remoting.", e);
}
}
}
}
public static void received(Channel channel, Response response) {
try {
DefaultFuture future = FUTURES.remove(response.getId());
if (future != null) {
future.doReceived(response);
} else {
//Note ☆: when is this branch taken?
logger.warn("The timeout response finally returned at "
+ (new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS").format(new Date()))
+ ", response " + response
+ (channel == null ? "" : ", channel: " + channel.getLocalAddress()
+ " -> " + channel.getRemoteAddress()));
}
} finally {
CHANNELS.remove(response.getId());
}
}
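When the class is loaded, a daemon thread is started that keeps polling the FUTURES collection. Whenever it finds a request that has timed out, it builds a timeoutResponse and calls DefaultFuture.received. Now for the note in the code: why would the else branch ever be taken? Dubbo's timeout is, for the most part, a unilateral decision by the client. The client decides that the request has timed out and handles the timeout, while the server keeps executing the request as usual and even sends its response back to the client. By then the client finds no matching request, because that request has already been treated as timed out and removed from FUTURES, so it ends up in the else branch. The reason I say "for the most part" rather than "entirely" handled by the client is the following class: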
@Activate(group = Constants.PROVIDER)
public class TimeoutFilter implements Filter {
private static final Logger logger = LoggerFactory.getLogger(TimeoutFilter.class);
public Result invoke(Invoker<?> invoker, Invocation invocation) throws RpcException {
long start = System.currentTimeMillis();
Result result = invoker.invoke(invocation);
long elapsed = System.currentTimeMillis() - start;
if (invoker.getUrl() != null
&& elapsed > invoker.getUrl().getMethodParameter(invocation.getMethodName(),
"timeout", Integer.MAX_VALUE)) {
if (logger.isWarnEnabled()) {
logger.warn("invoke time out. method: " + invocation.getMethodName()
+ " arguments: " + Arrays.toString(invocation.getArguments()) + " , url is "
+ invoker.getUrl() + ", invoke elapsed " + elapsed + " ms.");
}
}
return result;
}
}
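You can't quite say the server does nothing: it logs a warning for any request whose execution took longer than the timeout configured on the provider URL (if no timeout is present there, the default of Integer.MAX_VALUE means nothing is logged). Yes, you read that right, it only writes a log entry, and the rest of the flow carries on exactly as before.
Back to the question raised earlier: what is wrong with this while loop?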
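Putting the client and server sides together, the sketch below shows how a response that arrives after the client has given up finds nothing to complete. It is deliberately single-threaded and uses CompletableFuture as a stand-in for the 2.5.x lock and condition; PendingRequests, register, expire and received are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry of in-flight requests, keyed by request id.
class PendingRequests {
    private final Map<Long, CompletableFuture<String>> futures = new ConcurrentHashMap<>();

    CompletableFuture<String> register(long id) {
        CompletableFuture<String> future = new CompletableFuture<>();
        futures.put(id, future);
        return future;
    }

    // Client-side timeout: forget the request and fail its future.
    void expire(long id) {
        CompletableFuture<String> future = futures.remove(id);
        if (future != null) {
            future.completeExceptionally(new RuntimeException("client-side timeout, id=" + id));
        }
    }

    // Returns false when no pending request matches, i.e. the response came too late.
    boolean received(long id, String value) {
        CompletableFuture<String> future = futures.remove(id);
        if (future == null) {
            System.out.println("late response ignored, id=" + id);
            return false;
        }
        future.complete(value);
        return true;
    }

    public static void main(String[] args) {
        PendingRequests pending = new PendingRequests();

        CompletableFuture<String> fast = pending.register(1L);
        pending.received(1L, "pong");
        System.out.println(fast.join());

        CompletableFuture<String> slow = pending.register(2L);
        pending.expire(2L);               // the client gives up first
        pending.received(2L, "too late"); // the server still answers
        System.out.println(slow.isCompletedExceptionally());
    }
}
```

The server never learns that the client stopped waiting, which matches the behaviour described above: it finishes the work, at most logs a warning, and sends a response nobody is waiting for.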
while (!isDone()) {
done.await(timeout, TimeUnit.MILLISECONDS);
if (isDone() || System.currentTimeMillis() - start > timeout) {
break;
}
}
//My take: a single await is enough
done.await(timeout, TimeUnit.MILLISECONDS);
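Personally I think a single await is enough here, and a second iteration of the while loop should essentially never happen. The loop is left on one of two conditions: (1) the thread is woken up, or (2) the timeout elapses. Since await is called with the same timeout, System.currentTimeMillis() - start > timeout is a redundant condition. It is only evaluated when isDone == false; isDone == false means no response woke the thread; and without a wake-up, await must have waited the full timeout, so System.currentTimeMillis() - start > timeout is expected to be true. One caveat to this reasoning: the Condition documentation allows await to return spuriously, and in that case the loop does run again, although it then waits a full timeout once more rather than only the remaining time.
That covers the timeout mechanism in Dubbo 2.5.x. Next up is 2.7.3, the latest Dubbo release at the time of writing, and how the timeout mechanism has changed there.
Timeout mechanism in 2.7.3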
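For comparison, the general idiom for a timed wait on a java.util.concurrent.locks.Condition is to loop on the predicate and carry the remaining time forward with awaitNanos, so an early return from the wait never extends the overall deadline. A minimal sketch of that idiom follows; DeadlineWait, finish and awaitDone are hypothetical names, and this is not what either Dubbo version does.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper showing a deadline-aware wait loop.
class DeadlineWait {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition done = lock.newCondition();
    private boolean finished; // guarded by lock

    void finish() {
        lock.lock();
        try {
            finished = true;
            done.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Returns true if finish() was called within timeoutMs, false otherwise.
    boolean awaitDone(long timeoutMs) {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        lock.lock();
        try {
            while (!finished) {
                if (nanos <= 0L) {
                    return false; // deadline reached
                }
                // awaitNanos returns an estimate of the time still left to wait.
                nanos = done.awaitNanos(nanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        DeadlineWait completed = new DeadlineWait();
        completed.finish();
        System.out.println(completed.awaitDone(100));

        DeadlineWait pending = new DeadlineWait();
        System.out.println(pending.awaitDone(50));
    }
}
```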
org.apache.dubbo.remoting.exchange.support.DefaultFuture
//dubbo 2.7.3 DefaultFuture
public class DefaultFuture extends CompletableFuture<Object> {
.......
public static final Timer TIME_OUT_TIMER = new HashedWheelTimer(
new NamedThreadFactory("dubbo-future-timeout", true),
30,
TimeUnit.MILLISECONDS);
......
/**
* check time out of the future
*/
private static void timeoutCheck(DefaultFuture future) {
TimeoutCheckTask task = new TimeoutCheckTask(future.getId());
future.timeoutCheckTask = TIME_OUT_TIMER.newTimeout(task, future.getTimeout(), TimeUnit.MILLISECONDS);
}
......
/**
* init a DefaultFuture
* 1.init a DefaultFuture
* 2.timeout check
*
* @param channel channel
* @param request the request
* @param timeout timeout
* @return a new DefaultFuture
*/
public static DefaultFuture newFuture(Channel channel, Request request, int timeout) {
final DefaultFuture future = new DefaultFuture(channel, request, timeout);
// timeout check
timeoutCheck(future);
return future;
}
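2.7.3 handles request timeouts with a new class, HashedWheelTimer, which is a delayed-task queue based on the timing-wheel algorithm.
The 2.5.x design starts a daemon thread that keeps polling all pending requests, sleeping 30 ms between passes. Compared with the HashedWheelTimer in 2.7.3, every pass runs a timeout check against every outstanding request, whereas a timing wheel only touches the tasks in the slot that is currently due. My view is that the 2.5.x implementation therefore consumes more CPU than HashedWheelTimer, and I believe that is why HashedWheelTimer replaced the 2.5.x approach.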
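The shape of newFuture plus timeoutCheck is "one delayed task per request". To play with that idea without pulling in Dubbo, here is a sketch that swaps the wheel timer for a plain ScheduledExecutorService from the JDK. The class and method names are hypothetical, and cancelling the task when the response arrives is my assumption about how such a design is usually completed, not something shown in the excerpt above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo: one scheduled timeout task per pending call.
class ScheduledTimeoutDemo {
    // Daemon timer thread, so it never keeps the JVM alive.
    static final ScheduledExecutorService TIMER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "timeout-demo");
                t.setDaemon(true);
                return t;
            });

    static class PendingCall {
        final CompletableFuture<String> result = new CompletableFuture<>();
        ScheduledFuture<?> timeoutTask;
    }

    // Create the call and schedule its timeout check in one step.
    static PendingCall newCall(long timeoutMs) {
        PendingCall call = new PendingCall();
        call.timeoutTask = TIMER.schedule(() -> {
            call.result.completeExceptionally(
                    new TimeoutException("no response within " + timeoutMs + " ms"));
        }, timeoutMs, TimeUnit.MILLISECONDS);
        return call;
    }

    // A response arrived in time: the timeout task is no longer needed.
    static void received(PendingCall call, String value) {
        call.timeoutTask.cancel(false);
        call.result.complete(value);
    }

    public static void main(String[] args) {
        PendingCall answered = newCall(1000);
        received(answered, "pong");
        System.out.println(answered.result.join());

        PendingCall silent = newCall(50);
        try {
            silent.result.join();
        } catch (RuntimeException e) {
            System.out.println("failed with: " + e.getCause());
        }
    }
}
```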
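To make the cost difference concrete, here is a toy timing wheel. It is single-threaded and advanced by hand, so it only illustrates the bucket and rounds bookkeeping; ToyWheel and its methods are hypothetical and far simpler than the real HashedWheelTimer.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy timing wheel for illustration only; not Dubbo's or Netty's HashedWheelTimer.
class ToyWheel {
    private static final class Entry {
        long remainingRounds;
        final Runnable task;

        Entry(long remainingRounds, Runnable task) {
            this.remainingRounds = remainingRounds;
            this.task = task;
        }
    }

    private final List<List<Entry>> buckets = new ArrayList<>();
    private long tick; // number of ticks processed so far

    ToyWheel(int wheelSize) {
        for (int i = 0; i < wheelSize; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Schedule a task to run after delayTicks calls to advance() (at least one).
    void schedule(long delayTicks, Runnable task) {
        long deadline = tick + Math.max(1L, delayTicks);
        long rounds = (deadline - tick - 1) / buckets.size();
        int slot = (int) (deadline % buckets.size());
        buckets.get(slot).add(new Entry(rounds, task));
    }

    // Advance one tick. Only the bucket for this tick is inspected.
    void advance() {
        tick++;
        List<Runnable> due = new ArrayList<>();
        Iterator<Entry> it = buckets.get((int) (tick % buckets.size())).iterator();
        while (it.hasNext()) {
            Entry entry = it.next();
            if (entry.remainingRounds == 0) {
                it.remove();
                due.add(entry.task);
            } else {
                entry.remainingRounds--; // the wheel must turn again before this fires
            }
        }
        // Run tasks after the scan so they may schedule new entries safely.
        for (Runnable task : due) {
            task.run();
        }
    }

    public static void main(String[] args) {
        ToyWheel wheel = new ToyWheel(4);
        wheel.schedule(3, () -> System.out.println("request A timed out"));
        wheel.schedule(6, () -> System.out.println("request B timed out"));
        for (int i = 1; i <= 6; i++) {
            wheel.advance();
        }
    }
}
```

Each call to advance() looks at a single bucket, while the 2.5.x scanner walks every entry in FUTURES on every pass. That is the difference I have in mind when comparing the CPU cost of the two designs.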