I. Enabling Eureka Server
To enable Eureka Server, first add the @EnableEurekaServer annotation to the application's startup class:
@Target(ElementType.TYPE)
@Retention(RetentionPolicy.RUNTIME)
@Documented
@Import(EurekaServerMarkerConfiguration.class)
public @interface EnableEurekaServer {
}
As shown above, the sole purpose of @EnableEurekaServer is to import the EurekaServerMarkerConfiguration class. Here is that class:
@Configuration(proxyBeanMethods = false)
public class EurekaServerMarkerConfiguration {
@Bean
public Marker eurekaServerMarkerBean() {
return new Marker();
}
class Marker {
}
}
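EurekaServerMarkerConfiguration does nothing more than register a marker bean, Marker, in the Spring container. Keep this in mind, because it is used shortly. Next, look at the META-INF/spring.factories file in the Eureka Server package: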
org.springframework.boot.autoconfigure.EnableAutoConfiguration=\
org.springframework.cloud.netflix.eureka.server.EurekaServerAutoConfiguration
At startup, Spring Boot automatically loads EurekaServerAutoConfiguration, which registers the beans that implement Eureka Server functionality in the Spring container.
@Configuration(proxyBeanMethods = false)
@Import(EurekaServerInitializerConfiguration.class)
@ConditionalOnBean(EurekaServerMarkerConfiguration.Marker.class)
@EnableConfigurationProperties({ EurekaDashboardProperties.class,
InstanceRegistryProperties.class })
@PropertySource("classpath:/eureka/server.properties")
public class EurekaServerAutoConfiguration implements WebMvcConfigurer {
// part of the code omitted
// Registers EurekaController; Spring Cloud provides some extra endpoints for retrieving information about the Eureka Server
@Bean
@ConditionalOnProperty(prefix = "eureka.dashboard", name = "enabled",
matchIfMissing = true)
public EurekaController eurekaController() {
return new EurekaController(this.applicationInfoManager);
}
// InstanceRegistry handles registration and other requests from clients; it is the class that does the real work
@Bean
public PeerAwareInstanceRegistry peerAwareInstanceRegistry(
ServerCodecs serverCodecs) {
this.eurekaClient.getApplications(); // force initialization
return new InstanceRegistry(this.eurekaServerConfig, this.eurekaClientConfig,
serverCodecs, this.eurekaClient,
this.instanceRegistryProperties.getExpectedNumberOfClientsSendingRenews(),
this.instanceRegistryProperties.getDefaultOpenForTrafficCount());
}
// Configures the peer nodes of this Eureka Server, i.e. which nodes must be notified when an instance registers here
@Bean
@ConditionalOnMissingBean
public PeerEurekaNodes peerEurekaNodes(PeerAwareInstanceRegistry registry,
ServerCodecs serverCodecs,
ReplicationClientAdditionalFilters replicationClientAdditionalFilters) {
return new RefreshablePeerEurekaNodes(registry, this.eurekaServerConfig,
this.eurekaClientConfig, serverCodecs, this.applicationInfoManager,
replicationClientAdditionalFilters);
}
// The Eureka Server context
@Bean
@ConditionalOnMissingBean
public EurekaServerContext eurekaServerContext(ServerCodecs serverCodecs,
PeerAwareInstanceRegistry registry, PeerEurekaNodes peerEurekaNodes) {
return new DefaultEurekaServerContext(this.eurekaServerConfig, serverCodecs,
registry, peerEurekaNodes, this.applicationInfoManager);
}
// Bootstraps the Eureka Server, which includes syncing registry data from the other registries into this one
@Bean
public EurekaServerBootstrap eurekaServerBootstrap(PeerAwareInstanceRegistry registry,
EurekaServerContext serverContext) {
return new EurekaServerBootstrap(this.applicationInfoManager,
this.eurekaClientConfig, this.eurekaServerConfig, registry,
serverContext);
}
// Eureka Server uses Jersey to expose its RESTful API
@Bean
public FilterRegistrationBean<?> jerseyFilterRegistration(
javax.ws.rs.core.Application eurekaJerseyApp) {
FilterRegistrationBean<Filter> bean = new FilterRegistrationBean<Filter>();
bean.setFilter(new ServletContainer(eurekaJerseyApp));
bean.setOrder(Ordered.LOWEST_PRECEDENCE);
bean.setUrlPatterns(
Collections.singletonList(EurekaConstants.DEFAULT_PREFIX + "/*"));
return bean;
}
// Adds include filters that select which classes are scanned; @Path is roughly analogous to @RequestMapping, and @Provider to @Controller
@Bean
public javax.ws.rs.core.Application jerseyApplication(Environment environment,
ResourceLoader resourceLoader) {
ClassPathScanningCandidateComponentProvider provider = new ClassPathScanningCandidateComponentProvider(
false, environment);
// Filter to include only classes that have a particular annotation.
//
provider.addIncludeFilter(new AnnotationTypeFilter(Path.class));
provider.addIncludeFilter(new AnnotationTypeFilter(Provider.class));
// part of the code omitted
}
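// part of the code omitted
The @ConditionalOnBean(EurekaServerMarkerConfiguration.Marker.class) annotation shows that EurekaServerAutoConfiguration is registered as a Spring bean only if the Spring container already holds a bean of type EurekaServerMarkerConfiguration.Marker. That bean is exactly the one imported earlier through the @EnableEurekaServer annotation.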
II. Starting Eureka Server
EurekaServerAutoConfiguration carries the annotation @Import(EurekaServerInitializerConfiguration.class), which imports the EurekaServerInitializerConfiguration class:
@Configuration(proxyBeanMethods = false)
public class EurekaServerInitializerConfiguration
implements ServletContextAware, SmartLifecycle, Ordered {
// part of the code omitted
@Override
public void start() {
new Thread(() -> {
try {
// Initialize and start the Eureka Server; this call is examined in detail below
eurekaServerBootstrap.contextInitialized(
EurekaServerInitializerConfiguration.this.servletContext);
log.info("Started Eureka Server");
// Announce that the registry is available, so clients can register
publish(new EurekaRegistryAvailableEvent(getEurekaServerConfig()));
// Mark the server as running
EurekaServerInitializerConfiguration.this.running = true;
publish(new EurekaServerStartedEvent(getEurekaServerConfig()));
}
catch (Exception ex) {
// Help!
log.error("Could not initialize Eureka servlet context", ex);
}
}).start();
}
// part of the code omitted
}
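This class implements the SmartLifecycle interface, so once the Spring container has finished starting, its start() method is called back, and that method spawns a thread to start the Eureka Server. Next, step into eurekaServerBootstrap.contextInitialized(EurekaServerInitializerConfiguration.this.servletContext):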
public class EurekaServerBootstrap {
// part of the code omitted
public void contextInitialized(ServletContext context) {
try {
// Initialize the Eureka environment settings
initEurekaEnvironment();
// Initialize the Eureka Server context
initEurekaServerContext();
context.setAttribute(EurekaServerContext.class.getName(), this.serverContext);
}
catch (Throwable e) {
log.error("Cannot bootstrap eureka server :", e);
throw new RuntimeException("Cannot bootstrap eureka server :", e);
}
}
protected void initEurekaServerContext() throws Exception {
// For backward compatibility
JsonXStream.getInstance().registerConverter(new V1AwareInstanceInfoConverter(),
XStream.PRIORITY_VERY_HIGH);
XmlXStream.getInstance().registerConverter(new V1AwareInstanceInfoConverter(),
XStream.PRIORITY_VERY_HIGH);
if (isAws(this.applicationInfoManager.getInfo())) {
this.awsBinder = new AwsBinderDelegate(this.eurekaServerConfig,
this.eurekaClientConfig, this.registry, this.applicationInfoManager);
this.awsBinder.start();
}
EurekaServerContextHolder.initialize(this.serverContext);
log.info("Initialized server context");
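// Copy the registry from neighboring Eureka nodes (cluster sync)
int registryCount = this.registry.syncUp();
// Clients send a heartbeat every 30 seconds by default, i.e. 2 per minute
// Changes this Eureka Server's status to UP
// Also starts a scheduled task (every 60 seconds by default) that evicts instances whose leases have expired for lack of heartbeats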
this.registry.openForTraffic(this.applicationInfoManager, registryCount);
// Register all monitoring statistics.
EurekaMonitors.registerAllStats();
}
// part of the code omitted
}
The two core steps above, cluster synchronization and service eviction, are analyzed in detail later.
III. The Service Instance Registry
Eureka Server is built around registry management. There are two types named InstanceRegistry:
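- com.netflix.eureka.registry.InstanceRegistry is the core registry-management interface in Eureka Server. It is responsible for managing, in memory, the service instances registered with Eureka Server. Its implementation is PeerAwareInstanceRegistryImpl.
- org.springframework.cloud.netflix.eureka.server.InstanceRegistry extends PeerAwareInstanceRegistryImpl and adapts it to the Spring Cloud environment; most of the behavior is still provided by PeerAwareInstanceRegistryImpl.
- com.netflix.eureka.registry.InstanceRegistry extends LeaseManager<InstanceInfo> and LookupService<String>. LeaseManager<InstanceInfo> manages the leases of the service instances registered with the server, and LookupService provides lookup and query operations for service instances.
- The LeaseManager<InstanceInfo> interface manages the leases of the instances registered with Eureka Server. Its methods cover registration, cancellation, renewal, and eviction. The type it currently manages is InstanceInfo, which represents the information about a service instance.
- PeerAwareInstanceRegistryImpl adds replication to peer nodes, which keeps the registry consistent across the Eureka Server cluster.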
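IV. Accepting Service Registrations
When a Eureka Client registers, it wraps its own instance metadata in an InstanceInfo and sends it to Eureka Server. On receiving the InstanceInfo, Eureka Server tries to store it in its local registry so that other Eureka Clients can discover the service.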
EurekaServerAutoConfiguration defines public FilterRegistrationBean jerseyFilterRegistration, which shows that Eureka Server uses Jersey to expose its RESTful API. The method registers a Jersey filter and configures the Filter along with its URL mapping.
The com.netflix.eureka.resources package defines how Eureka Server handles REST requests from Eureka Clients. Look at the ApplicationResource class, which handles application-level requests:
@Produces({"application/xml", "application/json"})
public class ApplicationResource {
// part of the code omitted
private final PeerAwareInstanceRegistry registry;
@POST
@Consumes({"application/json", "application/xml"})
public Response addInstance(InstanceInfo info,
@HeaderParam(PeerEurekaNode.HEADER_REPLICATION) String isReplication) {
logger.debug("Registering instance {} (replication={})", info.getId(), isReplication);
// validate that the instanceinfo contains all the necessary required fields
if (isBlank(info.getId())) {
return Response.status(400).entity("Missing instanceId").build();
} else if (isBlank(info.getHostName())) {
return Response.status(400).entity("Missing hostname").build();
} else if (isBlank(info.getIPAddr())) {
return Response.status(400).entity("Missing ip address").build();
} else if (isBlank(info.getAppName())) {
return Response.status(400).entity("Missing appName").build();
} else if (!appName.equals(info.getAppName())) {
return Response.status(400).entity("Mismatched appName, expecting " + appName + " but was " + info.getAppName()).build();
} else if (info.getDataCenterInfo() == null) {
return Response.status(400).entity("Missing dataCenterInfo").build();
} else if (info.getDataCenterInfo().getName() == null) {
return Response.status(400).entity("Missing dataCenterInfo Name").build();
}
// handle cases where clients may be registering with bad DataCenterInfo with missing data
DataCenterInfo dataCenterInfo = info.getDataCenterInfo();
if (dataCenterInfo instanceof UniqueIdentifier) {
String dataCenterInfoId = ((UniqueIdentifier) dataCenterInfo).getId();
if (isBlank(dataCenterInfoId)) {
boolean experimental = "true".equalsIgnoreCase(serverConfig.getExperimental("registration.validation.dataCenterInfoId"));
if (experimental) {
String entity = "DataCenterInfo of type " + dataCenterInfo.getClass() + " must contain a valid id";
return Response.status(400).entity(entity).build();
} else if (dataCenterInfo instanceof AmazonInfo) {
AmazonInfo amazonInfo = (AmazonInfo) dataCenterInfo;
String effectiveId = amazonInfo.get(AmazonInfo.MetaDataKey.instanceId);
if (effectiveId == null) {
amazonInfo.getMetadata().put(AmazonInfo.MetaDataKey.instanceId.getName(), info.getId());
}
} else {
logger.warn("Registering DataCenterInfo of type {} without an appropriate id", dataCenterInfo.getClass());
}
}
}
registry.register(info, "true".equals(isReplication));
return Response.status(204).build(); // 204 to be backwards compatible
}
// part of the code omitted
}
The addInstance() method accepts service registrations. Step into the register method of PeerAwareInstanceRegistry:
@Singleton
public class PeerAwareInstanceRegistryImpl extends AbstractInstanceRegistry implements PeerAwareInstanceRegistry {
// part of the code omitted
@Override
public void register(final InstanceInfo info, final boolean isReplication) {
int leaseDuration = Lease.DEFAULT_DURATION_IN_SECS;
if (info.getLeaseInfo() != null && info.getLeaseInfo().getDurationInSecs() > 0) {
leaseDuration = info.getLeaseInfo().getDurationInSecs();
}
super.register(info, leaseDuration, isReplication);
replicateToPeers(Action.Register, info.getAppName(), info.getId(), info, null, isReplication);
}
// part of the code omitted
}
This calls the register method of the parent class AbstractInstanceRegistry. Step into it:
public abstract class AbstractInstanceRegistry implements InstanceRegistry {
// part of the code omitted
private final ConcurrentHashMap<String, Map<String, Lease<InstanceInfo>>> registry
= new ConcurrentHashMap<String, Map<String, Lease<InstanceInfo>>>();
public void register(InstanceInfo registrant, int leaseDuration, boolean isReplication) {
try {
read.lock();
Map<String, Lease<InstanceInfo>> gMap = registry.get(registrant.getAppName());
REGISTER.increment(isReplication);
if (gMap == null) {
final ConcurrentHashMap<String, Lease<InstanceInfo>> gNewMap = new ConcurrentHashMap<String, Lease<InstanceInfo>>();
gMap = registry.putIfAbsent(registrant.getAppName(), gNewMap);
if (gMap == null) {
gMap = gNewMap;
}
}
// part of the code omitted
} finally {
read.unlock();
}
}
}
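In register, the InstanceInfo of a service instance is wrapped in a Lease, and AbstractInstanceRegistry keeps all leases in memory in a ConcurrentHashMap. Registration starts by acquiring the read lock. This is the shared side of a read-write lock, so it does not block other registrations; it keeps the update from overlapping with code that holds the write lock, which avoids inconsistent data. The method then checks whether the registry already holds a lease for this InstanceInfo; appName groups instances into a service cluster, and InstanceId uniquely identifies an instance. If a lease exists, the lastDirtyTimestamp (last update time) of the two InstanceInfo objects is compared and the one with the larger timestamp is kept. If no lease exists, this is a brand-new registration: the self-preservation statistics are updated and a new lease is created to hold the InstanceInfo. The lease is then put into the registry.
After that, a series of cache operations run, and the instance status is set according to the overridden-status rules. The cache operations add the InstanceInfo to recentlyChangedQueue, which backs incremental registry fetches by Eureka Clients, and invalidate the matching entries in responseCache. Finally, the lease's service-up timestamp is set, which is used to compute how long the lease is valid, the read lock is released, and registration is complete.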
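The registration steps above can be pictured with a much smaller model. The sketch below is not the Eureka implementation: the Instance class is a hypothetical stand-in for InstanceInfo, leases and locking are left out, and only the two-level map and the "larger lastDirtyTimestamp wins" rule are kept.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RegistrySketch {
    /** Hypothetical, trimmed-down stand-in for InstanceInfo. */
    public static final class Instance {
        public final String appName;
        public final String id;
        public final long lastDirtyTimestamp;

        public Instance(String appName, String id, long lastDirtyTimestamp) {
            this.appName = appName;
            this.id = id;
            this.lastDirtyTimestamp = lastDirtyTimestamp;
        }
    }

    // appName -> (instanceId -> instance), mirroring the two-level map in the article
    private final ConcurrentHashMap<String, Map<String, Instance>> registry =
            new ConcurrentHashMap<>();

    /** Stores the registrant unless the registry already holds a newer copy; returns what is stored. */
    public Instance register(Instance registrant) {
        Map<String, Instance> byId =
                registry.computeIfAbsent(registrant.appName, k -> new ConcurrentHashMap<>());
        return byId.merge(registrant.id, registrant, (existing, incoming) ->
                existing.lastDirtyTimestamp > incoming.lastDirtyTimestamp ? existing : incoming);
    }

    public static void main(String[] args) {
        RegistrySketch registry = new RegistrySketch();
        registry.register(new Instance("ORDERS", "host-1", 200L));
        // A stale copy (older timestamp) arrives later and must not replace the newer one.
        Instance kept = registry.register(new Instance("ORDERS", "host-1", 100L));
        System.out.println(kept.lastDirtyTimestamp); // prints 200
    }
}
```

Grouping by appName first keeps all instances of one service in a single inner map, which is what makes per-application lookups cheap.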
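V. Accepting Heartbeats: Lease Renewal (renew)
After a Eureka Client has registered, it must send heartbeat requests to Eureka Server periodically (every 30 seconds by default) to keep its lease valid.
Look at another kind of request, handled by com.netflix.eureka.resources.InstanceResource.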
@Produces({"application/xml", "application/json"})
public class InstanceResource {
// part of the code omitted
private final PeerAwareInstanceRegistry registry;
@PUT
public Response renewLease(
@HeaderParam(PeerEurekaNode.HEADER_REPLICATION) String isReplication,
@QueryParam("overriddenstatus") String overriddenStatus,
@QueryParam("status") String status,
@QueryParam("lastDirtyTimestamp") String lastDirtyTimestamp) {
boolean isFromReplicaNode = "true".equals(isReplication);
boolean isSuccess = registry.renew(app.getName(), id, isFromReplicaNode);
// Not found in the registry, immediately ask for a register
if (!isSuccess) {
logger.warn("Not Found (Renew): {} - {}", app.getName(), id);
return Response.status(Status.NOT_FOUND).build();
}
// Check if we need to sync based on dirty time stamp, the client
// instance might have changed some value
Response response;
if (lastDirtyTimestamp != null && serverConfig.shouldSyncWhenTimestampDiffers()) {
response = this.validateDirtyTimestamp(Long.valueOf(lastDirtyTimestamp), isFromReplicaNode);
// Store the overridden status since the validation found out the node that replicates wins
if (response.getStatus() == Response.Status.NOT_FOUND.getStatusCode()
&& (overriddenStatus != null)
&& !(InstanceStatus.UNKNOWN.name().equals(overriddenStatus))
&& isFromReplicaNode) {
registry.storeOverriddenStatusIfRequired(app.getAppName(), id, InstanceStatus.valueOf(overriddenStatus));
}
} else {
response = Response.ok().build();
}
logger.debug("Found (Renew): {} - {}; reply status={}", app.getName(), id, response.getStatus());
return response;
}
// part of the code omitted
}
In the renewLease() method, note the line boolean isSuccess = registry.renew(app.getName(), id, isFromReplicaNode); and follow it to the implementation of renew:
public abstract class AbstractInstanceRegistry implements InstanceRegistry {
// part of the code omitted
private final ConcurrentHashMap<String, Map<String, Lease<InstanceInfo>>> registry
= new ConcurrentHashMap<String, Map<String, Lease<InstanceInfo>>>();
public boolean renew(String appName, String id, boolean isReplication) {
RENEW.increment(isReplication);
// Get the lease map of the service cluster by appName
Map<String, Lease<InstanceInfo>> gMap = registry.get(appName);
Lease<InstanceInfo> leaseToRenew = null;
if (gMap != null) {
leaseToRenew = gMap.get(id);
}
if (leaseToRenew == null) {
RENEW_NOT_FOUND.increment(isReplication);
logger.warn("DS: Registry: lease doesn't exist, registering resource: {} - {}", appName, id);
return false;
} else {
InstanceInfo instanceInfo = leaseToRenew.getHolder();
if (instanceInfo != null) {
// Check the instance status
InstanceStatus overriddenInstanceStatus = this.getOverriddenInstanceStatus(
instanceInfo, leaseToRenew, isReplication);
if (overriddenInstanceStatus == InstanceStatus.UNKNOWN) {
logger.info("Instance status UNKNOWN possibly due to deleted override for instance {}"
+ "; re-register required", instanceInfo.getId());
RENEW_NOT_FOUND.increment(isReplication);
return false;
}
if (!instanceInfo.getStatus().equals(overriddenInstanceStatus)) {
logger.info(
"The instance status {} is different from overridden instance status {} for instance {}. "
+ "Hence setting the status to overridden status", instanceInfo.getStatus().name(),
instanceInfo.getOverriddenStatus().name(),
instanceInfo.getId());
instanceInfo.setStatusWithoutDirty(overriddenInstanceStatus);
}
}
// Count renewals per minute
renewsLastMin.increment();
// Renew the lease
leaseToRenew.renew();
return true;
}
}
// part of the code omitted
}
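This method is not concerned with the contents of InstanceInfo; it only cares about the lease itself and the status of the instance that holds it. If a lease is found for the given appName and instanceInfoId, and the instanceStatus returned by #getOverriddenInstanceStatus is not InstanceStatus.UNKNOWN, the lease's validity is extended by updating lastUpdateTimestamp in the Lease, which achieves the renewal. If the lease does not exist, or the status is UNKNOWN, the method reports failure and the client has to register again.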
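To make the idea of "renewing by moving a timestamp" concrete, here is a deliberately simplified lease model. It is only a sketch: LeaseSketch is a hypothetical class, time is passed in explicitly so the behavior is easy to follow, and the bookkeeping in Eureka's own Lease class differs in detail.

```java
public class LeaseSketch {
    private final long durationMs;
    private long lastUpdateTimestamp;

    public LeaseSketch(long durationMs, long nowMs) {
        this.durationMs = durationMs;
        this.lastUpdateTimestamp = nowMs;
    }

    /** Heartbeat: remember when the holder was last heard from. */
    public void renew(long nowMs) {
        this.lastUpdateTimestamp = nowMs;
    }

    /** The lease is expired once more than durationMs has passed since the last renewal. */
    public boolean isExpired(long nowMs) {
        return nowMs > lastUpdateTimestamp + durationMs;
    }

    public static void main(String[] args) {
        LeaseSketch lease = new LeaseSketch(90_000L, 0L);  // 90-second lease created at t=0
        System.out.println(lease.isExpired(60_000L));      // false: still within 90 s
        lease.renew(60_000L);                              // heartbeat at t=60 s
        System.out.println(lease.isExpired(120_000L));     // false: valid until t=150 s
        System.out.println(lease.isExpired(150_001L));     // true: no heartbeat for over 90 s
    }
}
```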
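VI. Service Eviction
If a Eureka Client neither renews its lease nor cancels it after registering (because the service crashed, the network failed, and so on), the state of the service becomes unknown and there is no guarantee that the instance will respond. Eureka Server therefore needs eviction, which periodically cleans up such unreliable instances.
In the startup analysis above, the call chain was: start() in EurekaServerInitializerConfiguration -> initEurekaServerContext() in EurekaServerBootstrap -> openForTraffic() in PeerAwareInstanceRegistryImpl -> postInit() in AbstractInstanceRegistry. So the path ends up in AbstractInstanceRegistry once again.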
public abstract class AbstractInstanceRegistry implements InstanceRegistry {
protected void postInit() {
renewsLastMin.start();
if (evictionTaskRef.get() != null) {
evictionTaskRef.get().cancel();
}
// Eviction is a scheduled task that runs every 60 seconds by default: initial delay 60 seconds, period 60 seconds
evictionTaskRef.set(new EvictionTask());
evictionTimer.schedule(evictionTaskRef.get(),
serverConfig.getEvictionIntervalTimerInMs(),
serverConfig.getEvictionIntervalTimerInMs());
}
// The scheduled task
class EvictionTask extends TimerTask {
private final AtomicLong lastExecutionNanosRef = new AtomicLong(0l);
@Override
public void run() {
try {
long compensationTimeMs = getCompensationTimeMs();
logger.info("Running the evict task with compensationTime {}ms", compensationTimeMs);
evict(compensationTimeMs);
} catch (Throwable e) {
logger.error("Could not run the evict task", e);
}
}
}
// Evict expired instances
public void evict(long additionalLeaseMs) {
logger.debug("Running the evict task");
// If self-preservation is currently in effect, do not evict anything
if (!isLeaseExpirationEnabled()) {
logger.debug("DS: lease expiration is currently disabled.");
return;
}
// Loop over the whole registry, checking each lease for expiry, to collect all expired leases in one pass
List<Lease<InstanceInfo>> expiredLeases = new ArrayList<>();
for (Entry<String, Map<String, Lease<InstanceInfo>>> groupEntry : registry.entrySet()) {
Map<String, Lease<InstanceInfo>> leaseMap = groupEntry.getValue();
if (leaseMap != null) {
for (Entry<String, Lease<InstanceInfo>> leaseEntry : leaseMap.entrySet()) {
Lease<InstanceInfo> lease = leaseEntry.getValue();
if (lease.isExpired(additionalLeaseMs) && lease.getHolder() != null) {
expiredLeases.add(lease);
}
}
}
}
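// Total number of leases in the registry
int registrySize = (int) getLocalRegistrySize();
// Renewal threshold: total multiplied by the renewal percentage, i.e. the number of leases expected to keep renewing
int registrySizeThreshold = (int) (registrySize * serverConfig.getRenewalPercentThreshold());
// Total minus that threshold is the maximum number that may be evicted
int evictionLimit = registrySize - registrySizeThreshold;
// The smaller of that limit and the number of expired leases is the number actually evicted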
int toEvict = Math.min(expiredLeases.size(), evictionLimit);
if (toEvict > 0) {
logger.info("Evicting {} items (expired={}, evictionLimit={})", toEvict, expiredLeases.size(), evictionLimit);
Random random = new Random(System.currentTimeMillis());
for (int i = 0; i < toEvict; i++) {
// Pick a random item (Knuth shuffle algorithm)
int next = i + random.nextInt(expiredLeases.size() - i);
Collections.swap(expiredLeases, i, next);
Lease<InstanceInfo> lease = expiredLeases.get(i);
String appName = lease.getHolder().getAppName();
String id = lease.getHolder().getId();
EXPIRED.increment();
logger.warn("DS: Registry: expired lease for {}/{}", appName, id);
// Perform the eviction
internalCancel(appName, id, false);
}
}
}
}
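- Nothing is evicted while self-preservation is in effect.
- Eviction happens in batches.
- Instances are evicted one at a time in random order, so evictions are spread evenly across all applications. This prevents every instance of one service cluster from being evicted at the same moment, which could bring the system down when a large eviction coincides with self-preservation.
- Eviction is a scheduled task run by EvictionTask. By default it first runs after a 60-second delay and then every 60 seconds, removing expired instances.
- Eviction traverses the registry to find all expired leases. It then computes the maximum number of leases it may evict from the renewal percentage threshold in the configuration and the current total number of leases (total leases minus the lease threshold), and evicts expired leases in batches. For each expired lease it calls AbstractInstanceRegistry#internalCancel, the cancellation method, to remove it from the registry.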
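A small worked example may help with the arithmetic. The sketch below reproduces the limit calculation and the random selection from evict() in isolation; EvictionSketch and its method names are made up for illustration, and 0.85 is used as the renewal percentage, which is assumed here to be the default.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class EvictionSketch {
    /** Maximum number of leases one eviction run may remove. */
    public static int evictionLimit(int registrySize, double renewalPercentThreshold) {
        int registrySizeThreshold = (int) (registrySize * renewalPercentThreshold);
        return registrySize - registrySizeThreshold;
    }

    /** Picks min(expired.size(), limit) leases uniformly at random (partial Fisher-Yates shuffle). */
    public static <T> List<T> pickToEvict(List<T> expired, int limit, Random random) {
        List<T> pool = new ArrayList<>(expired);
        int toEvict = Math.max(0, Math.min(pool.size(), limit));
        for (int i = 0; i < toEvict; i++) {
            // Swap a random not-yet-chosen element into position i.
            int next = i + random.nextInt(pool.size() - i);
            Collections.swap(pool, i, next);
        }
        return new ArrayList<>(pool.subList(0, toEvict));
    }

    public static void main(String[] args) {
        int limit = evictionLimit(20, 0.85);
        System.out.println(limit); // prints 3: 20 leases, 17 must remain, at most 3 may go

        List<String> expired = Arrays.asList("a", "b", "c", "d", "e");
        List<String> evicted = pickToEvict(expired, limit, new Random());
        System.out.println(evicted.size()); // prints 3: five leases expired, only three are evicted this run
    }
}
```

The two expired leases that are left over stay in the registry until a later run, which is what "evicting in batches" means in practice.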
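Self-preservation:
- Self-preservation mainly protects the system when a network partition separates Eureka Clients from Eureka Server, and it has a counterpart on both the server and the client. Suppose that, under some specific condition such as a network failure, Eureka Client and Eureka Server cannot communicate. The client cannot send registration or renewal requests, so a large number of leases in the server's registry may expire and face eviction. Yet the clients may still be healthy and able to serve requests, so evicting that many leases outright is clearly unreasonable.
- Eureka's answer is the "self-preservation mechanism". On the Eureka Server side, when a large number of instances expire and are about to be evicted, the server node enters self-preservation mode, stops evicting entries from its registry, and leaves the mode once communication is stable again. On the Eureka Client side, if registering with one Eureka Server fails, the client times out quickly and tries another Eureka Server. This mechanism greatly improves Eureka's availability.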
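The server-side trigger can be illustrated with numbers. The threshold formula below is the one from updateRenewsPerMinThreshold() quoted later in this article; the comparison in leaseExpirationEnabled is an assumption about what isLeaseExpirationEnabled() checks, since its body is not shown here, and the class and parameter names are illustrative.

```java
public class SelfPreservationSketch {
    /** Renewals per minute the server expects at minimum, as in updateRenewsPerMinThreshold(). */
    public static int renewsPerMinThreshold(int expectedClients,
                                            int renewalIntervalSeconds,
                                            double renewalPercentThreshold) {
        return (int) (expectedClients
                * (60.0 / renewalIntervalSeconds)
                * renewalPercentThreshold);
    }

    /** Assumed check: eviction is allowed only while last minute's renewals exceed the threshold. */
    public static boolean leaseExpirationEnabled(int renewsLastMin, int threshold) {
        return threshold > 0 && renewsLastMin > threshold;
    }

    public static void main(String[] args) {
        // 10 clients, one heartbeat every 30 s, 85% of them expected to keep renewing
        int threshold = renewsPerMinThreshold(10, 30, 0.85);
        System.out.println(threshold);                             // prints 17
        System.out.println(leaseExpirationEnabled(20, threshold)); // true: healthy, eviction runs
        System.out.println(leaseExpirationEnabled(12, threshold)); // false: self-preservation, no eviction
    }
}
```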
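VII. Service Cancellation
When an application shuts down, its Eureka Client sends a cancellation request to Eureka Server, which removes the application's lease from the registry and prevents calls to a dead instance. Eviction also uses this cancellation logic to remove each expired lease.
In the InstanceResource class: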
@Produces({"application/xml", "application/json"})
public class InstanceResource {
@DELETE
public Response cancelLease(
@HeaderParam(PeerEurekaNode.HEADER_REPLICATION) String isReplication) {
try {
boolean isSuccess = registry.cancel(app.getName(), id,
"true".equals(isReplication));
if (isSuccess) {
logger.debug("Found (Cancel): {} - {}", app.getName(), id);
return Response.ok().build();
} else {
logger.info("Not Found (Cancel): {} - {}", app.getName(), id);
return Response.status(Status.NOT_FOUND).build();
}
} catch (Throwable e) {
logger.error("Error (cancel): {} - {}", app.getName(), id, e);
return Response.serverError().build();
}
}
}
Follow boolean isSuccess = registry.cancel(app.getName(), id, "true".equals(isReplication)); and you land in AbstractInstanceRegistry yet again:
public abstract class AbstractInstanceRegistry implements InstanceRegistry {
// part of the code omitted
@Override
public boolean cancel(String appName, String id, boolean isReplication) {
return internalCancel(appName, id, isReplication);
}
protected boolean internalCancel(String appName, String id, boolean isReplication) {
try {
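// Acquire the read lock first (it excludes holders of the write lock)
read.lock();
CANCEL.increment(isReplication);
// Get the service instance cluster by appName
Map<String, Lease<InstanceInfo>> gMap = registry.get(appName);
Lease<InstanceInfo> leaseToCancel = null;
// Remove the lease for this instance id from memory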
if (gMap != null) {
leaseToCancel = gMap.remove(id);
}
// Add to the queue of recently cancelled instances (used for statistics)
recentCanceledQueue.add(new Pair<Long, String>(System.currentTimeMillis(), appName + "(" + id + ")"));
InstanceStatus instanceStatus = overriddenInstanceStatusMap.remove(id);
if (instanceStatus != null) {
logger.debug("Removed instance id {} from the overridden map which has value {}", id, instanceStatus.name());
}
// If leaseToCancel is null the lease does not exist, so return false
if (leaseToCancel == null) {
CANCEL_NOT_FOUND.increment(isReplication);
logger.warn("DS: Registry: cancel failed because Lease is not registered for: {}/{}", appName, id);
return false;
} else {
// The lease exists
// Record the cancellation time on the lease
leaseToCancel.cancel();
// Get the instance information held by the lease
InstanceInfo instanceInfo = leaseToCancel.getHolder();
String vip = null;
String svip = null;
if (instanceInfo != null) {
// Mark the instance as deleted
instanceInfo.setActionType(ActionType.DELETED);
// Add to the recent-changes queue, which backs incremental registry fetches by Eureka Clients
recentlyChangedQueue.add(new RecentlyChangedItem(leaseToCancel));
instanceInfo.setLastUpdatedTimestamp();
vip = instanceInfo.getVIPAddress();
svip = instanceInfo.getSecureVipAddress();
}
invalidateCache(appName, vip, svip);
logger.info("Cancelled instance {}/{} (replication={})", appName, id, isReplication);
}
} finally {
read.unlock();
}
synchronized (lock) {
if (this.expectedNumberOfClientsSendingRenews > 0) {
// Since the client wants to cancel it, reduce the number of clients to send renews.
this.expectedNumberOfClientsSendingRenews = this.expectedNumberOfClientsSendingRenews - 1;
updateRenewsPerMinThreshold();
}
}
return true;
}
// part of the code omitted
}
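The method first looks up the Lease in registry by service name and instance id, and records the instance in the list of recently cancelled instances shown on the Eureka Server home page. If the lease does not exist, cancellation fails. If it exists, the lease is removed from the registry, its cancellation time is set, and a cancellation record is added to the recent-changes queue so that Eureka Clients can pick it up in incremental registry fetches.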
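VIII. Cluster Synchronization
If Eureka Server is deployed as a cluster, keeping the registry data consistent across all Eureka Server nodes requires a mechanism that synchronizes the registry between them.
Eureka Server cluster synchronization has two parts:
- First, during startup a Eureka Server pulls the registry from its peer nodes and registers those instances in its local registry.
- Second, every time a Eureka Server changes its local registry, it also replicates the operation to its peer nodes, so that the registry data stays the same across the cluster.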
1. Pulling from peers at startup
As described above, startup goes through the start() method of EurekaServerInitializerConfiguration -> eurekaServerBootstrap.contextInitialized() -> initEurekaServerContext():
public class EurekaServerBootstrap {
// part of the code omitted
public void contextInitialized(ServletContext context) {
try {
// Initialize the Eureka environment settings
initEurekaEnvironment();
// Initialize the Eureka Server context
initEurekaServerContext();
context.setAttribute(EurekaServerContext.class.getName(), this.serverContext);
}
catch (Throwable e) {
log.error("Cannot bootstrap eureka server :", e);
throw new RuntimeException("Cannot bootstrap eureka server :", e);
}
}
protected void initEurekaServerContext() throws Exception {
// For backward compatibility
JsonXStream.getInstance().registerConverter(new V1AwareInstanceInfoConverter(),
XStream.PRIORITY_VERY_HIGH);
XmlXStream.getInstance().registerConverter(new V1AwareInstanceInfoConverter(),
XStream.PRIORITY_VERY_HIGH);
if (isAws(this.applicationInfoManager.getInfo())) {
this.awsBinder = new AwsBinderDelegate(this.eurekaServerConfig,
this.eurekaClientConfig, this.registry, this.applicationInfoManager);
this.awsBinder.start();
}
EurekaServerContextHolder.initialize(this.serverContext);
log.info("Initialized server context");
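// Copy the registry from neighboring Eureka nodes (cluster sync)
int registryCount = this.registry.syncUp();
// Clients send a heartbeat every 30 seconds by default, i.e. 2 per minute
// Changes this Eureka Server's status to UP
// Also starts a scheduled task (every 60 seconds by default) that evicts instances whose leases have expired for lack of heartbeats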
this.registry.openForTraffic(this.applicationInfoManager, registryCount);
// Register all monitoring statistics.
EurekaMonitors.registerAllStats();
}
// part of the code omitted
}
int registryCount = this.registry.syncUp(); performs the cluster synchronization. Step into it:
public class PeerAwareInstanceRegistryImpl extends AbstractInstanceRegistry implements PeerAwareInstanceRegistry{
@Override
public int syncUp() {
// Copy entire entry from neighboring DS node
int count = 0;
// On the first pass i is 0, so the wait below is skipped and the registry is fetched right away
for (int i = 0; ((i < serverConfig.getRegistrySyncRetries()) && (count == 0)); i++) {
if (i > 0) {
try {
Thread.sleep(serverConfig.getRegistrySyncRetryWaitMs());
} catch (InterruptedException e) {
logger.warn("Interrupted during registry transfer..");
break;
}
}
// Act as a Eureka Client and fetch the registry
Applications apps = eurekaClient.getApplications();
for (Application app : apps.getRegisteredApplications()) {
for (InstanceInfo instance : app.getInstances()) {
try {
if (isRegisterable(instance)) {
// Register the instance in this server's own registry
register(instance, instance.getLeaseInfo().getDurationInSecs(), true);
count++;
}
} catch (Throwable t) {
logger.error("During DS init copy", t);
}
}
}
}
return count;
}
}
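Eureka Server is also a Eureka Client. At startup it initializes a DiscoveryClient as well, which pulls the full registry from the Eureka Server it is configured to use. In a cluster deployment, after pulling the registry from its peer nodes, Eureka Server iterates over the resulting Applications and registers every instance in its own registry through AbstractInstanceRegistry#register.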
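The retry behavior of syncUp() is easy to miss in the full listing, so here it is on its own. This is a sketch under simplifying assumptions: the fetch-and-register step is reduced to a hypothetical IntSupplier that returns how many instances were registered, and the retry count and wait time are plain parameters instead of server configuration.

```java
import java.util.function.IntSupplier;

public class SyncRetrySketch {
    /** Retries until at least one instance was synced or the attempts are used up. */
    public static int syncUp(int maxRetries, long retryWaitMs, IntSupplier fetchAndRegister) {
        int count = 0;
        for (int i = 0; i < maxRetries && count == 0; i++) {
            if (i > 0) {
                // Only wait before a retry, never before the first attempt.
                try {
                    Thread.sleep(retryWaitMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            count = fetchAndRegister.getAsInt();
        }
        return count;
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Hypothetical peer that has nothing to offer on the first two attempts.
        int count = syncUp(5, 10L, () -> ++attempts[0] < 3 ? 0 : 5);
        System.out.println(count + " instances after " + attempts[0] + " attempts");
        // prints "5 instances after 3 attempts"
    }
}
```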
int registryCount = this.registry.syncUp(); // cluster sync
this.registry.openForTraffic(this.applicationInfoManager, registryCount); // starts the scheduled eviction task
@Override
public void openForTraffic(ApplicationInfoManager applicationInfoManager, int count) {
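// Initialize the number of clients expected to send renewals to the instance count obtained above
this.expectedNumberOfClientsSendingRenews = count;
// Compute the threshold used by self-preservation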
updateRenewsPerMinThreshold();
logger.info("Got {} instances from neighboring DS node", count);
logger.info("Renew threshold is: {}", numberOfRenewsPerMinThreshold);
this.startupTime = System.currentTimeMillis();
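// The flag starts out true. If count == 0, no registry data was pulled, so it stays true: peers transferred an empty registry, and clients are not allowed to fetch the registry from this server. If count > 0, it is set to false and clients may fetch the registry.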
if (count > 0) {
this.peerInstancesTransferEmptyOnStartup = false;
}
DataCenterInfo.Name selfName = applicationInfoManager.getInfo().getDataCenterInfo().getName();
boolean isAws = Name.Amazon == selfName;
if (isAws && serverConfig.shouldPrimeAwsReplicaConnections()) {
logger.info("Priming AWS connections for all replicas..");
primeAwsReplicas(applicationInfoManager);
}
logger.info("Changing status to UP");
// Mark this instance as UP
applicationInfoManager.setInstanceStatus(InstanceStatus.UP);
// Start the scheduled eviction task
super.postInit();
}
protected void updateRenewsPerMinThreshold() {
this.numberOfRenewsPerMinThreshold = (int) (this.expectedNumberOfClientsSendingRenews
* (60.0 / serverConfig.getExpectedClientRenewalIntervalSeconds())
* serverConfig.getRenewalPercentThreshold());
}
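Once the syncUp logic has finished, openForTraffic opens this server to registrations, registry fetches, and other client operations. While the server is doing its first pull from the peer nodes, client requests are not allowed.
While the server's status is not UP, it rejects all requests. When a client asks for the registry, the server checks whether access to the registry is currently allowed. This guards against the case where, in a cluster deployment, Eureka Server obtained no instance information in #syncUp, and its empty registry would affect the registries cached by Eureka Clients. Because clients fetch the full registry, a server that synchronized nothing would cause clients to clear their cached registry, and service calls would start to fail.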
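The access check can be pictured as a small gate. The sketch below is an assumption about how such a check could look, not code from the listing above: it supposes that registry reads are refused only for a limited wait period after an empty sync, and every name in it is illustrative.

```java
public class AccessGateSketch {
    /**
     * Returns false while the server started with an empty sync and the
     * wait period (an assumed, configurable grace time) has not yet elapsed.
     */
    public static boolean shouldAllowAccess(boolean syncedEmptyOnStartup,
                                            long startupTimeMs,
                                            long waitWhenSyncEmptyMs,
                                            long nowMs) {
        if (syncedEmptyOnStartup && nowMs <= startupTimeMs + waitWhenSyncEmptyMs) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        long startup = 1_000L;
        long wait = 300_000L; // hypothetical 5-minute wait
        System.out.println(shouldAllowAccess(true, startup, wait, 2_000L));   // false: empty sync, still waiting
        System.out.println(shouldAllowAccess(true, startup, wait, 400_000L)); // true: wait period is over
        System.out.println(shouldAllowAccess(false, startup, wait, 2_000L));  // true: sync returned instances
    }
}
```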
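2. Replicating registry changes between servers
To keep the registry consistent while the Eureka Server cluster is running, every Eureka Server replicates each management operation on its local registry to all of its peer nodes.
External calls to the server's RESTful endpoints arrive at the resources in the com.netflix.eureka.resources package, such as ApplicationResource, where each service operation can be traced. Take registration, public Response addInstance(), as an example: it ends up in PeerAwareInstanceRegistryImpl, and the other operations there, such as cancel and renew, all call replicateToPeers():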
public class PeerAwareInstanceRegistryImpl extends AbstractInstanceRegistry implements PeerAwareInstanceRegistry {
// part of the code omitted
// Cancellation
@Override
public boolean cancel(final String appName, final String id,
final boolean isReplication) {
if (super.cancel(appName, id, isReplication)) {
replicateToPeers(Action.Cancel, appName, id, null, null, isReplication);
return true;
}
return false;
}
// Registration
@Override
public void register(final InstanceInfo info, final boolean isReplication) {
int leaseDuration = Lease.DEFAULT_DURATION_IN_SECS;
if (info.getLeaseInfo() != null && info.getLeaseInfo().getDurationInSecs() > 0) {
leaseDuration = info.getLeaseInfo().getDurationInSecs();
}
super.register(info, leaseDuration, isReplication);
replicateToPeers(Action.Register, info.getAppName(), info.getId(), info, null, isReplication);
}
// Renewal
public boolean renew(final String appName, final String id, final boolean isReplication) {
if (super.renew(appName, id, isReplication)) {
replicateToPeers(Action.Heartbeat, appName, id, null, null, isReplication);
return true;
}
return false;
}
// part of the code omitted
}
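They all call replicateToPeers(), which iterates over the peer nodes of this Eureka Server and sends a replication request to each one.
// Iterates over the peer nodes of this Eureka Server and sends a replication request to each one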
private void replicateToPeers(Action action, String appName, String id,
InstanceInfo info /* optional */,
InstanceStatus newStatus /* optional */, boolean isReplication) {
Stopwatch tracer = action.getTimer().start();
try {
if (isReplication) {
numberOfReplicationsLastMin.increment();
}
// If it is a replication already, do not replicate again as this will create a poison replication
if (peerEurekaNodes == Collections.EMPTY_LIST || isReplication) {
return;
}
for (final PeerEurekaNode node : peerEurekaNodes.getPeerEurekaNodes()) {
// If the url represents this host, do not replicate to yourself.
if (peerEurekaNodes.isThisMyUrl(node.getServiceUrl())) {
continue;
}
replicateInstanceActionsToPeers(action, appName, id, info, newStatus, node);
}
} finally {
tracer.stop();
}
}
private void replicateInstanceActionsToPeers(Action action, String appName,
String id, InstanceInfo info, InstanceStatus newStatus,
PeerEurekaNode node) {
try {
InstanceInfo infoFromRegistry;
CurrentRequestVersion.set(Version.V2);
switch (action) {
case Cancel:
node.cancel(appName, id);
break;
case Heartbeat:
InstanceStatus overriddenStatus = overriddenInstanceStatusMap.get(id);
infoFromRegistry = getInstanceByAppAndId(appName, id, false);
node.heartbeat(appName, id, infoFromRegistry, overriddenStatus, false);
break;
case Register:
node.register(info);
break;
case StatusUpdate:
infoFromRegistry = getInstanceByAppAndId(appName, id, false);
node.statusUpdate(appName, id, newStatus, infoFromRegistry);
break;
case DeleteStatusOverride:
infoFromRegistry = getInstanceByAppAndId(appName, id, false);
node.deleteStatusOverride(appName, id, infoFromRegistry);
break;
}
} catch (Throwable t) {
logger.error("Cannot replicate information to {} for action {}", node.getServiceUrl(), action.name(), t);
} finally {
CurrentRequestVersion.remove();
}
}
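In replicateInstanceActionsToPeers, the methods called on node, an instance of PeerEurekaNode, such as cancel and register, go through batchingDispatcher.process(). Within a given time window, identical operations on the same instance share one task ID, and replication merges operations by task ID, which reduces the number of replication calls and the network cost. The price is that replication is delayed, so Eureka does not provide the C (strong consistency) of CAP. Eureka therefore satisfies only AP.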
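The merging by task ID can be sketched with a plain map. This is an illustration of the idea only, not the dispatcher used by Eureka: the task ID format, the class, and the String payloads are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReplicationBatchSketch {
    // taskId -> latest payload; a later task with the same ID overwrites the earlier one
    private final Map<String, String> pending = new LinkedHashMap<>();

    /** Hypothetical ID: same action on the same instance yields the same ID. */
    public static String taskId(String action, String appName, String instanceId) {
        return action + "#" + appName + "/" + instanceId;
    }

    public void submit(String action, String appName, String instanceId, String payload) {
        pending.put(taskId(action, appName, instanceId), payload);
    }

    /** Returns the merged batch that would be sent to a peer and clears the buffer. */
    public List<String> drain() {
        List<String> batch = new ArrayList<>(pending.values());
        pending.clear();
        return batch;
    }

    public static void main(String[] args) {
        ReplicationBatchSketch batcher = new ReplicationBatchSketch();
        batcher.submit("Heartbeat", "ORDERS", "host-1", "hb-1");
        batcher.submit("Heartbeat", "ORDERS", "host-1", "hb-2"); // merged with the previous heartbeat
        batcher.submit("Register", "ORDERS", "host-2", "reg-1");
        System.out.println(batcher.drain()); // prints [hb-2, reg-1]
    }
}
```

Two heartbeats became one replication task, which is the saving; the peer only learns about it when the batch is drained, which is the delay.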
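Through the initialization of the local registry at startup and the replication between nodes at runtime, the registries of the Eureka Servers in the cluster eventually converge to the same content.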
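IX. Retrieving Service Instances from the Registry
Eureka Server exposes the instances in its registry mainly through two methods:
- AbstractInstanceRegistry#getApplicationsFromMultipleRegions fetches the full registry from multiple regions.
- AbstractInstanceRegistry#getApplicationDeltasFromMultipleRegions fetches the incremental (delta) registry from multiple regions.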
public abstract class AbstractInstanceRegistry implements InstanceRegistry {
// part of the source omitted
private final ConcurrentHashMap<String, Map<String, Lease<InstanceInfo>>> registry
= new ConcurrentHashMap<String, Map<String, Lease<InstanceInfo>>>();
// Fetch the full registry data from multiple regions
public Applications getApplicationsFromMultipleRegions(String[] remoteRegions) {
boolean includeRemoteRegion = null != remoteRegions && remoteRegions.length != 0;
logger.debug("Fetching applications registry with remote regions: {}, Regions argument {}",
includeRemoteRegion, remoteRegions);
if (includeRemoteRegion) {
GET_ALL_WITH_REMOTE_REGIONS_CACHE_MISS.increment();
} else {
GET_ALL_CACHE_MISS.increment();
}
Applications apps = new Applications();
apps.setVersion(1L);
for (Entry<String, Map<String, Lease<InstanceInfo>>> entry : registry.entrySet()) {
Application app = null;
if (entry.getValue() != null) {
for (Entry<String, Lease<InstanceInfo>> stringLeaseEntry : entry.getValue().entrySet()) {
Lease<InstanceInfo> lease = stringLeaseEntry.getValue();
if (app == null) {
app = new Application(lease.getHolder().getAppName());
}
app.addInstance(decorateInstanceInfo(lease));
}
}
if (app != null) {
apps.addApplication(app);
}
}
if (includeRemoteRegion) {
for (String remoteRegion : remoteRegions) {
RemoteRegionRegistry remoteRegistry = regionNameVSRemoteRegistry.get(remoteRegion);
if (null != remoteRegistry) {
Applications remoteApps = remoteRegistry.getApplications();
for (Application application : remoteApps.getRegisteredApplications()) {
if (shouldFetchFromRemoteRegistry(application.getName(), remoteRegion)) {
logger.info("Application {} fetched from the remote region {}",
application.getName(), remoteRegion);
Application appInstanceTillNow = apps.getRegisteredApplications(application.getName());
if (appInstanceTillNow == null) {
appInstanceTillNow = new Application(application.getName());
apps.addApplication(appInstanceTillNow);
}
for (InstanceInfo instanceInfo : application.getInstances()) {
appInstanceTillNow.addInstance(instanceInfo);
}
} else {
logger.debug("Application {} not fetched from the remote region {} as there exists a "
+ "whitelist and this app is not in the whitelist.",
application.getName(), remoteRegion);
}
}
} else {
logger.warn("No remote registry available for the remote region {}", remoteRegion);
}
}
}
apps.setAppsHashCode(apps.getReconcileHashCode());
return apps;
}
// Fetch the incremental (delta) registry data from multiple regions
public Applications getApplicationDeltasFromMultipleRegions(String[] remoteRegions) {
if (null == remoteRegions) {
remoteRegions = allKnownRemoteRegions; // null means all remote regions.
}
boolean includeRemoteRegion = remoteRegions.length != 0;
if (includeRemoteRegion) {
GET_ALL_WITH_REMOTE_REGIONS_CACHE_MISS_DELTA.increment();
} else {
GET_ALL_CACHE_MISS_DELTA.increment();
}
Applications apps = new Applications();
apps.setVersion(responseCache.getVersionDeltaWithRegions().get());
Map<String, Application> applicationInstancesMap = new HashMap<String, Application>();
try {
write.lock();
Iterator<RecentlyChangedItem> iter = this.recentlyChangedQueue.iterator();
logger.debug("The number of elements in the delta queue is :{}", this.recentlyChangedQueue.size());
while (iter.hasNext()) {
Lease<InstanceInfo> lease = iter.next().getLeaseInfo();
InstanceInfo instanceInfo = lease.getHolder();
logger.debug("The instance id {} is found with status {} and actiontype {}",
instanceInfo.getId(), instanceInfo.getStatus().name(), instanceInfo.getActionType().name());
Application app = applicationInstancesMap.get(instanceInfo.getAppName());
if (app == null) {
app = new Application(instanceInfo.getAppName());
applicationInstancesMap.put(instanceInfo.getAppName(), app);
apps.addApplication(app);
}
app.addInstance(new InstanceInfo(decorateInstanceInfo(lease)));
}
if (includeRemoteRegion) {
for (String remoteRegion : remoteRegions) {
RemoteRegionRegistry remoteRegistry = regionNameVSRemoteRegistry.get(remoteRegion);
if (null != remoteRegistry) {
Applications remoteAppsDelta = remoteRegistry.getApplicationDeltas();
if (null != remoteAppsDelta) {
for (Application application : remoteAppsDelta.getRegisteredApplications()) {
if (shouldFetchFromRemoteRegistry(application.getName(), remoteRegion)) {
Application appInstanceTillNow =
apps.getRegisteredApplications(application.getName());
if (appInstanceTillNow == null) {
appInstanceTillNow = new Application(application.getName());
apps.addApplication(appInstanceTillNow);
}
for (InstanceInfo instanceInfo : application.getInstances()) {
appInstanceTillNow.addInstance(new InstanceInfo(instanceInfo));
}
}
}
}
}
}
}
Applications allApps = getApplicationsFromMultipleRegions(remoteRegions);
apps.setAppsHashCode(allApps.getReconcileHashCode());
return apps;
} finally {
write.unlock();
}
}
// part of the source omitted
}
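- Full: the method public int syncUp(), described earlier for copying registry information from peer nodes, contains the line Applications apps = eurekaClient.getApplications();. Stepping into the implementation class, there is a line getApplicationsFromAllRemoteRegions();, and below it is getApplicationsFromMultipleRegions. This method fetches the full registry information from multiple regions and returns it wrapped in an Applications object. It first extracts every service instance from the local registry (the registry field) into the Applications object and then, depending on whether remote regions have to be pulled, adds the Application entries fetched from the remote regions to the same object. The result is a full Applications object.
- Incremental: the methods covered earlier that change the registry, such as accepting a service registration, all contain the line recentlyChangedQueue.add(new RecentlyChangedItem(lease));. It puts the newly changed service instance into the queue of recently changed instances, which records the incremental registry information. getApplicationDeltasFromMultipleRegions provides the ability to fetch incremental registry information from remote Eureka Servers.
Among the REST interfaces that Eureka Server exposes, under the package com.netflix.eureka.resources, there is this method: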
@GET
public Response getApplication(@PathParam("version") String version,
@HeaderParam("Accept") final String acceptHeader,
@HeaderParam(EurekaAccept.HTTP_X_EUREKA_ACCEPT) String eurekaAccept) {
// part of the source omitted
}
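It contains the line String payLoad = responseCache.get(cacheKey);. When responseCache is initialized, its constructor ResponseCacheImpl(EurekaServerConfig serverConfig, ServerCodecs serverCodecs, AbstractInstanceRegistry registry) contains the line Value value = generatePayload(key);, and inside generatePayload there is the call registry.getApplicationDeltasFromMultipleRegions(key.getRegions()), which fetches the incremental (delta) registry information, including that of remote regions. This is only served to clients, not to servers, because servers are already kept up to date by replicating every change to their peers.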
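The "generate the payload on a cache miss" pattern described above can be sketched as follows. This is a simplified illustration under assumptions, not the real ResponseCacheImpl: the class PayloadCache, its KeyType enum, and the generator function are hypothetical, and the payloads are plain strings instead of encoded registry data.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch (not Eureka code): a cache that builds its payload from the registry on a miss.
final class PayloadCache {
    // Assumed key kinds: full registry vs. incremental registry.
    enum KeyType { ALL_APPS, ALL_APPS_DELTA }

    private final Map<KeyType, String> cache = new ConcurrentHashMap<>();
    // Stands in for the step that reads the registry and serializes the result.
    private final Function<KeyType, String> generator;

    PayloadCache(Function<KeyType, String> generator) {
        this.generator = generator;
    }

    // Returns the cached payload, generating it only when the key is missing.
    String get(KeyType key) {
        return cache.computeIfAbsent(key, generator);
    }

    // Called when the registry changes, so that the next read regenerates the payload.
    void invalidate(KeyType key) {
        cache.remove(key);
    }

    public static void main(String[] args) {
        AtomicInteger registryReads = new AtomicInteger();
        PayloadCache cache = new PayloadCache(key -> {
            registryReads.incrementAndGet();
            return key == KeyType.ALL_APPS_DELTA ? "delta-payload" : "full-payload";
        });
        cache.get(KeyType.ALL_APPS_DELTA);
        cache.get(KeyType.ALL_APPS_DELTA); // served from the cache, no registry read
        System.out.println(registryReads.get()); // prints 1
    }
}
```

With a cache like this, repeated client requests between two registry changes do not hit the registry again.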
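Incremental registry information is read from recentlyChangedQueue, which holds the service instances that changed recently. recentlyChangedQueue tracks the instances that were registered, modified, or removed within roughly the last 3 minutes. Methods such as service registration (AbstractInstanceRegistry#register) and service cancellation (AbstractInstanceRegistry#internalCancel) add the affected instance to recentlyChangedQueue, which is how the incremental registry information is recorded. getApplicationDeltasFromMultipleRegions also provides the ability to fetch incremental registry information from the Eureka Servers of remote regions.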
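A queue of recent changes with a retention window can be sketched like this. It is a minimal illustration under assumptions rather than Eureka's implementation: the class RecentChangeLog, its method names, and the use of explicit timestamps (so that the example is deterministic) are all hypothetical, and only instance IDs are stored instead of leases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch (not Eureka code): keeps recently changed instances for a fixed retention window.
final class RecentChangeLog {
    private static final class Item {
        final String instanceId;
        final long changedAtMs;

        Item(String instanceId, long changedAtMs) {
            this.instanceId = instanceId;
            this.changedAtMs = changedAtMs;
        }
    }

    private final ConcurrentLinkedQueue<Item> queue = new ConcurrentLinkedQueue<>();
    private final long retentionMs;

    RecentChangeLog(long retentionMs) {
        this.retentionMs = retentionMs;
    }

    // Called by every operation that changes an instance (register, cancel, ...).
    void record(String instanceId, long nowMs) {
        queue.add(new Item(instanceId, nowMs));
    }

    // Drops entries older than the retention window. Entries are in time order,
    // so the loop can stop at the first entry that is still fresh.
    void evictExpired(long nowMs) {
        Item head;
        while ((head = queue.peek()) != null && head.changedAtMs < nowMs - retentionMs) {
            queue.poll();
        }
    }

    // The delta is simply whatever is still in the queue.
    List<String> delta() {
        List<String> ids = new ArrayList<>();
        for (Item item : queue) {
            ids.add(item.instanceId);
        }
        return ids;
    }

    public static void main(String[] args) {
        RecentChangeLog log = new RecentChangeLog(180_000L); // 3 minutes
        log.record("host-1", 0L);
        log.record("host-2", 120_000L);
        log.evictExpired(200_000L); // host-1 is now older than 3 minutes
        System.out.println(log.delta()); // prints [host-2]
    }
}
```

A client that polls less often than the retention window can miss changes, which is why the delta response in the source above also carries a hash code computed from the full registry (apps.setAppsHashCode(allApps.getReconcileHashCode())).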