Table of Contents
- flume-ng core code walkthrough
- org.apache.flume.node.Application#main
- PollingPropertiesFileConfigurationProvider constructor
- Application constructor
- eventBus.register(application)
- Next, application.start(): unsurprisingly, this is the real entry point.
- Next, the org.apache.flume.lifecycle.LifecycleSupervisor#supervise method
- The MonitorRunnable class
- Continuing with org.apache.flume.node.PollingPropertiesFileConfigurationProvider#start
- FileWatcherRunnable execution
- org.apache.flume.node.Application#handleConfigurationEvent: handling the MaterializedConfiguration
- org.apache.flume.node.Application#startAllComponents: starting the components
- org.apache.flume.channel.MemoryChannel#start: starting the channel component
- LoggerSink, started through SinkRunner: org.apache.flume.SinkRunner#start
- NetcatSource, run by EventDrivenSourceRunner: org.apache.flume.source.EventDrivenSourceRunner#start
- That concludes the analysis of the core logic. Many details remain unexplored, such as the full hot-reload implementation and lifecycle management; interested readers can dig into them on their own.
flume-ng core code walkthrough
org.apache.flume.node.Application#main
This method is long, but its core logic is clear, so only the essential part is shown here.
List<LifecycleAware> components = Lists.newArrayList();
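// reload is true unless hot reloading was disabled on the command line (--no-reload-conf); when true, the configuration file is polled for changes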
if (reload) {
EventBus eventBus = new EventBus(agentName + "-event-bus");
// The provider is added to components, so the Application manages its lifecycle
PollingPropertiesFileConfigurationProvider configurationProvider =
new PollingPropertiesFileConfigurationProvider(
agentName, configurationFile, eventBus, 30);
components.add(configurationProvider);
application = new Application(components);
eventBus.register(application);
} else {
PropertiesFileConfigurationProvider configurationProvider =
new PropertiesFileConfigurationProvider(agentName, configurationFile);
application = new Application();
application.handleConfigurationEvent(configurationProvider.getConfiguration());
}
// Starts the supervised components; their threads keep running in the background
application.start();
final Application appReference = application;
// Register a shutdown hook with the Runtime; when the JVM shuts down, appReference is stopped
Runtime.getRuntime().addShutdownHook(new Thread("agent-shutdown-hook") {
@Override
public void run() {
logger.info("Application service stopping!");
appReference.stop();
}
});
As this shows, flume-ng uses Application to manage the application's lifecycle, while the actual work of loading the configuration is delegated to PollingPropertiesFileConfigurationProvider.
PollingPropertiesFileConfigurationProvider constructor
public PollingPropertiesFileConfigurationProvider(String agentName,
File file, EventBus eventBus, int interval) {
super(agentName, file);
this.eventBus = eventBus;
this.file = file;
this.interval = interval;
counterGroup = new CounterGroup();
lifecycleState = LifecycleState.IDLE;
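}
This records the configuration file, the eventBus, the polling interval (used by the scheduled task), a counter group, and the lifecycle state. Next, the parent class PropertiesFileConfigurationProvider: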
public PropertiesFileConfigurationProvider(String agentName, File file) {
super(agentName);
this.file = file;
}
Nothing much happens here, so on to its parent class AbstractConfigurationProvider:
public AbstractConfigurationProvider(String agentName) {
super();
this.agentName = agentName;
this.sourceFactory = new DefaultSourceFactory();
this.sinkFactory = new DefaultSinkFactory();
this.channelFactory = new DefaultChannelFactory();
channelCache = new HashMap<Class<? extends Channel>, Map<String, Channel>>();
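}
This records the agent name from the configuration file (a name you choose yourself) and initializes three factories, DefaultSourceFactory, DefaultSinkFactory and DefaultChannelFactory, plus a channelCache.
We will not dig into the exact responsibilities of PollingPropertiesFileConfigurationProvider yet; let's follow the main flow first.
Application constructor
Next, the Application constructor: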
public Application(List<LifecycleAware> components) {
this.components = components;
supervisor = new LifecycleSupervisor();
}
The only component here is our PollingPropertiesFileConfigurationProvider. The constructor also creates a lifecycle manager, LifecycleSupervisor.
Here is the LifecycleSupervisor constructor:
public LifecycleSupervisor() {
lifecycleState = LifecycleState.IDLE;
supervisedProcesses = new HashMap<LifecycleAware, Supervisoree>();
monitorFutures = new HashMap<LifecycleAware, ScheduledFuture<?>>();
monitorService = new ScheduledThreadPoolExecutor(10,
new ThreadFactoryBuilder().setNameFormat(
"lifecycleSupervisor-" + Thread.currentThread().getId() + "-%d")
.build());
monitorService.setMaximumPoolSize(20);
monitorService.setKeepAliveTime(30, TimeUnit.SECONDS);
purger = new Purger();
needToPurge = false;
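}
Roughly speaking, it maintains the lifecycle state, the monitoring bookkeeping (supervisedProcesses and monitorFutures), and the scheduled executor (monitorService) that runs the monitors. We skip the details for now and return to the main flow.
eventBus.register(application)
An EventBus is used to publish events, and application is its subscriber. The same eventBus was also handed to PollingPropertiesFileConfigurationProvider, so it presumably posts events through it later on.
If you are not familiar with EventBus, see my wiki article "eventbus源码解析【小明同学】" (EventBus source code analysis).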
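The publish/subscribe relationship described here can be sketched in plain Java. This is a simplified stand-in and not Guava's EventBus: the class MiniEventBus and its register/post methods are hypothetical names chosen for illustration, and subscribers are plain callbacks instead of @Subscribe-annotated methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Simplified stand-in for an event bus; NOT Guava's EventBus implementation.
class MiniEventBus<E> {
    private final List<Consumer<E>> subscribers = new ArrayList<>();

    // Hypothetical counterpart of eventBus.register(application).
    void register(Consumer<E> subscriber) {
        subscribers.add(subscriber);
    }

    // Hypothetical counterpart of eventBus.post(configuration):
    // delivers the event synchronously to every registered subscriber.
    int post(E event) {
        for (Consumer<E> subscriber : subscribers) {
            subscriber.accept(event);
        }
        return subscribers.size();
    }
}

class MiniEventBusDemo {
    // Registers one subscriber, posts one event, and returns what the subscriber saw.
    static List<String> run() {
        MiniEventBus<String> bus = new MiniEventBus<>();
        List<String> received = new ArrayList<>();
        bus.register(conf -> received.add("handled:" + conf));
        bus.post("agent-config-v1");
        return received;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The point of the design is that the publisher only knows the bus, not the subscriber, which is how the configuration provider can stay unaware of Application.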
Next, application.start(): unsurprisingly, this is the real entry point.
// Startup entry point
public void start() {
lifecycleLock.lock();
try {
for (LifecycleAware component : components) {
// Hand each service component to the lifecycle supervisor.
// There is only one component here: new PollingPropertiesFileConfigurationProvider(agentName, configurationFile, eventBus, 30);
supervisor.supervise(component,
new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
}
} finally {
lifecycleLock.unlock();
}
}
This mainly hands every component over to the lifecycle supervisor.
Next, the org.apache.flume.lifecycle.LifecycleSupervisor#supervise method
public synchronized void supervise(LifecycleAware lifecycleAware,
SupervisorPolicy policy, LifecycleState desiredState) {
if (this.monitorService.isShutdown()
|| this.monitorService.isTerminated()
|| this.monitorService.isTerminating()) {
throw new FlumeException("Supervise called on " + lifecycleAware + " " +
"after shutdown has been initiated. " + lifecycleAware + " will not" +
" be started");
}
Preconditions.checkState(!supervisedProcesses.containsKey(lifecycleAware),
"Refusing to supervise " + lifecycleAware + " more than once");
if (logger.isDebugEnabled()) {
logger.debug("Supervising service:{} policy:{} desiredState:{}",
new Object[] { lifecycleAware, policy, desiredState });
}
// Create a Supervisoree that holds the policy and status of the supervised component
Supervisoree process = new Supervisoree();
process.status = new Status();
process.policy = policy;
process.status.desiredState = desiredState;
process.status.error = false;
MonitorRunnable monitorRunnable = new MonitorRunnable();
monitorRunnable.lifecycleAware = lifecycleAware;
monitorRunnable.supervisoree = process;
monitorRunnable.monitorService = monitorService;
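// Record the mapping from the component to its supervision state
supervisedProcesses.put(lifecycleAware, process);
// Wrap the component in a monitorRunnable and hand it to the scheduled executor
ScheduledFuture<?> future = monitorService.scheduleWithFixedDelay(
monitorRunnable, 0, 3, TimeUnit.SECONDS);
// Keep the future so the scheduled task can be managed when the service shuts down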
monitorFutures.put(lifecycleAware, future);
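}
As this shows, the component ends up wrapped in a monitorRunnable that a scheduled task runs in the background. Because scheduleWithFixedDelay is called with a delay of 3 seconds, each run starts 3 s after the previous one finishes.
The MonitorRunnable class
public static class MonitorRunnable implements Runnable {
public ScheduledExecutorService monitorService;
// Initially, lifecycleAware is new PollingPropertiesFileConfigurationProvider(agentName, configurationFile, eventBus, 30);
public LifecycleAware lifecycleAware;
// supervisoree is initialized with new SupervisorPolicy.AlwaysRestartPolicy() and LifecycleState.START
public Supervisoree supervisoree;
// Runs on a background thread of the scheduled executor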
@Override
public void run() {
logger.debug("checking process:{} supervisoree:{}", lifecycleAware,
supervisoree);
long now = System.currentTimeMillis();
try {
if (supervisoree.status.firstSeen == null) {
logger.debug("first time seeing {}", lifecycleAware);
supervisoree.status.firstSeen = now;
}
supervisoree.status.lastSeen = now;
synchronized (lifecycleAware) {
if (supervisoree.status.discard) {
// Unsupervise has already been called on this.
logger.info("Component has already been stopped {}", lifecycleAware);
return;
} else if (supervisoree.status.error) {
logger.info("Component {} is in error state, and Flume will not"
+ "attempt to change its state", lifecycleAware);
return;
}
supervisoree.status.lastSeenState = lifecycleAware.getLifecycleState();
if (!lifecycleAware.getLifecycleState().equals(
supervisoree.status.desiredState)) {
logger.debug("Want to transition {} from {} to {} (failures:{})",
new Object[] { lifecycleAware, supervisoree.status.lastSeenState,
supervisoree.status.desiredState,
supervisoree.status.failures });
switch (supervisoree.status.desiredState) {
case START:
try {
// If the desired state is START, start the component
lifecycleAware.start();
} catch (Throwable e) {
logger.error("Unable to start " + lifecycleAware
+ " - Exception follows.", e);
if (e instanceof Error) {
// This component can never recover, shut it down.
supervisoree.status.desiredState = LifecycleState.STOP;
try {
lifecycleAware.stop();
logger.warn("Component {} stopped, since it could not be"
+ "successfully started due to missing dependencies",
lifecycleAware);
} catch (Throwable e1) {
logger.error("Unsuccessful attempt to "
+ "shutdown component: {} due to missing dependencies."
+ " Please shutdown the agent"
+ "or disable this component, or the agent will be"
+ "in an undefined state.", e1);
supervisoree.status.error = true;
if (e1 instanceof Error) {
throw (Error) e1;
}
// Set the state to stop, so that the conf poller can
// proceed.
}
}
supervisoree.status.failures++;
}
break;
case STOP:
try {
lifecycleAware.stop();
} catch (Throwable e) {
logger.error("Unable to stop " + lifecycleAware
+ " - Exception follows.", e);
if (e instanceof Error) {
throw (Error) e;
}
supervisoree.status.failures++;
}
break;
default:
logger.warn("I refuse to acknowledge {} as a desired state",
supervisoree.status.desiredState);
}
if (!supervisoree.policy.isValid(lifecycleAware, supervisoree.status)) {
logger.error(
"Policy {} of {} has been violated - supervisor should exit!",
supervisoree.policy, lifecycleAware);
}
}
}
} catch (Throwable t) {
logger.error("Unexpected error", t);
}
logger.debug("Status check complete");
}
}
For all its length, the core of this code is the call to lifecycleAware.start(), where lifecycleAware is our PollingPropertiesFileConfigurationProvider.
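To make the supervision loop easier to picture, here is a minimal sketch of the same idea: a scheduled task repeatedly compares a component's observed state with the desired one and calls start() when they differ. ToyComponent and ToySupervisor are hypothetical names, the state is reduced to a boolean, and the 50 ms delay is an assumption chosen so the example finishes quickly (the Flume code above uses 3 seconds).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical, pared-down component; Flume's LifecycleAware has more states.
class ToyComponent {
    volatile boolean started = false;
    final CountDownLatch startedLatch = new CountDownLatch(1);

    void start() {
        started = true;
        startedLatch.countDown();
    }
}

class ToySupervisor {
    // Periodically checks the component and starts it if it is not running yet,
    // mirroring the START branch of MonitorRunnable in a much reduced form.
    static boolean superviseUntilStarted(ToyComponent component, long timeoutMs) {
        ScheduledExecutorService monitor = Executors.newSingleThreadScheduledExecutor();
        try {
            monitor.scheduleWithFixedDelay(() -> {
                if (!component.started) {
                    component.start();
                }
            }, 0, 50, TimeUnit.MILLISECONDS); // assumed delay, for illustration only
            // Wait until the monitor thread has started the component.
            return component.startedLatch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            monitor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(superviseUntilStarted(new ToyComponent(), 2000));
    }
}
```

Note that the caller never invokes start() itself; it only declares the desired state and waits, which is also how Application treats its components.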
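Continuing with org.apache.flume.node.PollingPropertiesFileConfigurationProvider#start
/**
* In the end this class does the work itself: it parses the configuration file. Responsibilities are split with Application: Application manages the application (start, stop),
* while PollingPropertiesFileConfigurationProvider does the actual file parsing and watches the file for changes.
*/
@Override
public void start() {
LOGGER.info("Configuration provider starting");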
Preconditions.checkState(file != null,
"The parameter file must not be null");
executorService = Executors.newSingleThreadScheduledExecutor(
new ThreadFactoryBuilder().setNameFormat("conf-file-poller-%d")
.build());
FileWatcherRunnable fileWatcherRunnable =
new FileWatcherRunnable(file, counterGroup);
// This scheduled task is presumably what does the real work
executorService.scheduleWithFixedDelay(fileWatcherRunnable, 0, interval,
TimeUnit.SECONDS);
lifecycleState = LifecycleState.START;
LOGGER.debug("Configuration provider started");
}
Here yet another scheduled task is started, this time to run FileWatcherRunnable, so that is what we look at next.
FileWatcherRunnable execution
@Override
public void run() {
LOGGER.debug("Checking file:{} for changes", file);
counterGroup.incrementAndGet("file.checks");
long lastModified = file.lastModified();
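// The modification time acts as a version check, which is what enables hot reloading
if (lastModified > lastChange) {
LOGGER.info("Reloading configuration file:{}", file);
counterGroup.incrementAndGet("file.loads");
lastChange = lastModified;
try {
// The eventBus from earlier is finally used: the event is published here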
eventBus.post(getConfiguration());
} catch (Exception e) {
LOGGER.error("Failed to load configuration data. Exception follows.",
e);
} catch (NoClassDefFoundError e) {
LOGGER.error("Failed to start agent because dependencies were not " +
"found in classpath. Error follows.", e);
} catch (Throwable t) {
// caught because the caller does not handle or log Throwables
LOGGER.error("Unhandled error", t);
}
}
}
}
Next comes getConfiguration(), which, as one might guess, parses the configuration file.
// Parse the configuration file and wire the components together
public MaterializedConfiguration getConfiguration() {
MaterializedConfiguration conf = new SimpleMaterializedConfiguration();
FlumeConfiguration fconfig = getFlumeConfiguration();
AgentConfiguration agentConf = fconfig.getConfigurationFor(getAgentName());
if (agentConf != null) {
Map<String, ChannelComponent> channelComponentMap = Maps.newHashMap();
Map<String, SourceRunner> sourceRunnerMap = Maps.newHashMap();
Map<String, SinkRunner> sinkRunnerMap = Maps.newHashMap();
try {
// These are presumably the three core methods: load the channels, sources and sinks and maintain the relationships between them
loadChannels(agentConf, channelComponentMap);
loadSources(agentConf, channelComponentMap, sourceRunnerMap);
loadSinks(agentConf, channelComponentMap, sinkRunnerMap);
Set<String> channelNames = new HashSet<String>(channelComponentMap.keySet());
for (String channelName : channelNames) {
ChannelComponent channelComponent = channelComponentMap.get(channelName);
if (channelComponent.components.isEmpty()) {
LOGGER.warn(String.format("Channel %s has no components connected" +
" and has been removed.", channelName));
channelComponentMap.remove(channelName);
Map<String, Channel> nameChannelMap =
channelCache.get(channelComponent.channel.getClass());
if (nameChannelMap != null) {
nameChannelMap.remove(channelName);
}
} else {
LOGGER.info(String.format("Channel %s connected to %s",
channelName, channelComponent.components.toString()));
conf.addChannel(channelName, channelComponent.channel);
}
}
for (Map.Entry<String, SourceRunner> entry : sourceRunnerMap.entrySet()) {
conf.addSourceRunner(entry.getKey(), entry.getValue());
}
for (Map.Entry<String, SinkRunner> entry : sinkRunnerMap.entrySet()) {
conf.addSinkRunner(entry.getKey(), entry.getValue());
}
} catch (InstantiationException ex) {
LOGGER.error("Failed to instantiate component", ex);
} finally {
channelComponentMap.clear();
sourceRunnerMap.clear();
sinkRunnerMap.clear();
}
} else {
LOGGER.warn("No configuration found for this host:{}", getAgentName());
}
return conf;
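}
As this shows, channels are what link sources and sinks together, and the constructed source, channel and sink components are all packed into a MaterializedConfiguration.
Finally, the MaterializedConfiguration is published as an event through the eventBus for the subscriber to run, which decouples parsing the configuration from running the components.
As established earlier, the subscriber of this eventBus is the Application class, so we go back to Application and look at its method annotated with @Subscribe.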
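The change detection used by FileWatcherRunnable boils down to comparing timestamps. Below is a minimal sketch of that check under some assumptions: ToyFileWatcher is a hypothetical class, the timestamp comes from a LongSupplier instead of File.lastModified() so the behaviour can be simulated, and the onChange callback stands in for eventBus.post(getConfiguration()).

```java
import java.util.function.LongSupplier;

// Hypothetical stand-in for FileWatcherRunnable: trigger a reload only when
// the modification timestamp moves forward.
class ToyFileWatcher implements Runnable {
    private final LongSupplier lastModifiedSource; // e.g. file::lastModified
    private final Runnable onChange;               // e.g. post the new configuration
    private long lastChange = 0L;

    ToyFileWatcher(LongSupplier lastModifiedSource, Runnable onChange) {
        this.lastModifiedSource = lastModifiedSource;
        this.onChange = onChange;
    }

    @Override
    public void run() {
        long lastModified = lastModifiedSource.getAsLong();
        if (lastModified > lastChange) {
            lastChange = lastModified;
            onChange.run();
        }
    }
}

class ToyFileWatcherDemo {
    // Simulates four polls that observe the timestamps 100, 100, 200, 200
    // and returns how many reloads were triggered.
    static int run() {
        long[] timestamps = {100L, 100L, 200L, 200L};
        int[] index = {0};
        int[] reloads = {0};
        ToyFileWatcher watcher =
            new ToyFileWatcher(() -> timestamps[index[0]], () -> reloads[0]++);
        for (int i = 0; i < timestamps.length; i++) {
            index[0] = i;
            watcher.run();
        }
        return reloads[0];
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because lastChange starts at 0, the very first poll always counts as a change, which is why the initial configuration load and later hot reloads can share one code path.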
org.apache.flume.node.Application#handleConfigurationEvent: handling the MaterializedConfiguration
@Subscribe
public void handleConfigurationEvent(MaterializedConfiguration conf) {
try {
lifecycleLock.lockInterruptibly();
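// Stop all components; on a hot reload there are components already running
stopAllComponents();
// Start the components defined by the new configuration
startAllComponents(conf);
} catch (InterruptedException e) {
logger.info("Interrupted while trying to handle configuration event");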
return;
} finally {
// If interrupted while trying to lock, we don't own the lock, so must not attempt to unlock
if (lifecycleLock.isHeldByCurrentThread()) {
lifecycleLock.unlock();
}
}
}
Stopping everything before starting again is what supports hot reloading.
org.apache.flume.node.Application#startAllComponents: starting the components
private void startAllComponents(MaterializedConfiguration materializedConfiguration) {
logger.info("Starting new configuration:{}", materializedConfiguration);
this.materializedConfiguration = materializedConfiguration;
for (Entry<String, Channel> entry :
materializedConfiguration.getChannels().entrySet()) {
try {
logger.info("Starting Channel " + entry.getKey());
// Start the channel components first, so the pipes are laid
supervisor.supervise(entry.getValue(),
new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
} catch (Exception e) {
logger.error("Error while starting {}", entry.getValue(), e);
}
}
/*
* Wait for all channels to start.
* Components are started on the supervisor's own threads, so we have to wait for them here.
*/
for (Channel ch : materializedConfiguration.getChannels().values()) {
while (ch.getLifecycleState() != LifecycleState.START
&& !supervisor.isComponentInErrorState(ch)) {
try {
logger.info("Waiting for channel: " + ch.getName() +
" to start. Sleeping for 500 ms");
Thread.sleep(500);
} catch (InterruptedException e) {
logger.error("Interrupted while waiting for channel to start.", e);
Throwables.propagate(e);
}
}
}
for (Entry<String, SinkRunner> entry : materializedConfiguration.getSinkRunners().entrySet()) {
try {
logger.info("Starting Sink " + entry.getKey());
// Then start the sink components, so the data in the channels can be consumed
supervisor.supervise(entry.getValue(),
new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
} catch (Exception e) {
logger.error("Error while starting {}", entry.getValue(), e);
}
}
for (Entry<String, SourceRunner> entry :
materializedConfiguration.getSourceRunners().entrySet()) {
try {
logger.info("Starting Source " + entry.getKey());
// Start the source components last; this opens the data entry point so data can be received
supervisor.supervise(entry.getValue(),
new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
} catch (Exception e) {
logger.error("Error while starting {}", entry.getValue(), e);
}
}
this.loadMonitoring();
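}
This clearly shows the components being started one after another through the lifecycle supervisor, in the order channel -> sink -> source, which guarantees that by the time data can enter, it can already be consumed.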
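The ordering rule can be written down in a few lines. The sketch below is only an illustration of the order, with hypothetical names (StartupOrderDemo, startAll) and plain strings in place of real components; it does not model supervision or waiting for channels to reach the START state.

```java
import java.util.ArrayList;
import java.util.List;

class StartupOrderDemo {
    // Returns the order in which the given (hypothetical) components would be
    // started: every channel first, then every sink, then every source.
    static List<String> startAll(List<String> channels, List<String> sinks, List<String> sources) {
        List<String> started = new ArrayList<>();
        for (String channel : channels) {
            started.add("channel:" + channel);
        }
        for (String sink : sinks) {
            started.add("sink:" + sink);
        }
        for (String source : sources) {
            started.add("source:" + source);
        }
        return started;
    }

    public static void main(String[] args) {
        List<String> one = new ArrayList<>();
        one.add("a1");
        System.out.println(startAll(one, one, one));
    }
}
```

Starting consumers before producers means no event can arrive at a point where nothing downstream is ready to accept it.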
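At this point the core framework is more or less covered; what remains is how each component starts up and processes data.
Below we look at the startup logic of each of the three components we configured.
org.apache.flume.channel.MemoryChannel#start: starting the channel component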
@Override
public synchronized void start() {
channelCounter.start();
channelCounter.setChannelSize(queue.size());
channelCounter.setChannelCapacity(Long.valueOf(
queue.size() + queue.remainingCapacity()));
super.start();
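}
This mainly starts the channel counter and records the current size and capacity of the in-memory queue that holds the events; after all, a channel exists to buffer data and pass it along.
Since it is a queue, it must provide enqueue and dequeue methods; reading on in this class (in its MemoryTransaction), we find: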
@Override
protected void doPut(Event event) throws InterruptedException {
channelCounter.incrementEventPutAttemptCount();
int eventByteSize = (int) Math.ceil(estimateEventSize(event) / byteCapacitySlotSize);
if (!putList.offer(event)) {
throw new ChannelException(
"Put queue for MemoryTransaction of capacity " +
putList.size() + " full, consider committing more frequently, " +
"increasing capacity or increasing thread count");
}
putByteCounter += eventByteSize;
}
@Override
protected Event doTake() throws InterruptedException {
channelCounter.incrementEventTakeAttemptCount();
if (takeList.remainingCapacity() == 0) {
throw new ChannelException("Take list for MemoryTransaction, capacity " +
takeList.size() + " full, consider committing more frequently, " +
"increasing capacity, or increasing thread count");
}
if (!queueStored.tryAcquire(keepAlive, TimeUnit.SECONDS)) {
return null;
}
Event event;
synchronized (queueLock) {
event = queue.poll();
}
Preconditions.checkNotNull(event, "Queue.poll returned NULL despite semaphore " +
"signalling existence of entry");
takeList.put(event);
int eventByteSize = (int) Math.ceil(estimateEventSize(event) / byteCapacitySlotSize);
takeByteCounter += eventByteSize;
return event;
}
LoggerSink, started through SinkRunner: org.apache.flume.SinkRunner#start
@Override
public void start() {
SinkProcessor policy = getPolicy();
// Start the sink processor; nothing substantial happens here, it only starts lifecycle handling
policy.start();
runner = new PollingRunner();
// Hand the policy to the runner, which will drive it
runner.policy = policy;
runner.counterGroup = counterGroup;
runner.shouldStop = new AtomicBoolean();
runnerThread = new Thread(runner);
runnerThread.setName("SinkRunner-PollingRunner-" +
policy.getClass().getSimpleName());
// Start the thread; the SinkRunner is now running
runnerThread.start();
lifecycleState = LifecycleState.START;
}
As we can see, the LoggerSink component does nothing substantial on start; the work is delegated to PollingRunner.
Next, how PollingRunner runs:
@Override
public void run() {
logger.debug("Polling sink runner starting");
while (!shouldStop.get()) {
try {
// Call the sink's process method to fetch data
if (policy.process().equals(Sink.Status.BACKOFF)) {
counterGroup.incrementAndGet("runner.backoffs");
Thread.sleep(Math.min(
counterGroup.incrementAndGet("runner.backoffs.consecutive")
* backoffSleepIncrement, maxBackoffSleep));
} else {
counterGroup.set("runner.backoffs.consecutive", 0L);
}
} catch (InterruptedException e) {
logger.debug("Interrupted while processing an event. Exiting.");
counterGroup.incrementAndGet("runner.interruptions");
} catch (Exception e) {
logger.error("Unable to deliver event. Exception follows.", e);
if (e instanceof EventDeliveryException) {
counterGroup.incrementAndGet("runner.deliveryErrors");
} else {
counterGroup.incrementAndGet("runner.errors");
}
try {
Thread.sleep(maxBackoffSleep);
} catch (InterruptedException ex) {
Thread.currentThread().interrupt();
}
}
}
logger.debug("Polling runner exiting. Metrics:{}", counterGroup);
}
The loop takes data from the channel through policy.process(), which in turn calls LoggerSink's process method.
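The backoff arithmetic in the loop above is worth isolating: each consecutive BACKOFF result lengthens the sleep linearly until a cap is reached. In the sketch below, BackoffDemo is a hypothetical class, and the values 1000 ms and 5000 ms are assumptions for illustration, since the excerpt does not show how backoffSleepIncrement and maxBackoffSleep are defined.

```java
class BackoffDemo {
    // Assumed values; the excerpt above does not show these constants.
    static final long BACKOFF_SLEEP_INCREMENT_MS = 1000L;
    static final long MAX_BACKOFF_SLEEP_MS = 5000L;

    // Sleep time after the n-th consecutive BACKOFF result (n starts at 1):
    // grows linearly with n and is capped at the maximum.
    static long sleepMillis(long consecutiveBackoffs) {
        return Math.min(consecutiveBackoffs * BACKOFF_SLEEP_INCREMENT_MS, MAX_BACKOFF_SLEEP_MS);
    }

    public static void main(String[] args) {
        for (long n = 1; n <= 7; n++) {
            System.out.println(n + " -> " + sleepMillis(n) + " ms");
        }
    }
}
```

As soon as process() returns something other than BACKOFF, the consecutive counter is reset to 0, so the next idle period starts again from the shortest sleep.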
Here is org.apache.flume.sink.LoggerSink#process:
@Override
public Status process() throws EventDeliveryException {
Status result = Status.READY;
Channel channel = getChannel();
Transaction transaction = channel.getTransaction();
Event event = null;
try {
transaction.begin();
// Take an event from the channel
event = channel.take();
if (event != null) {
if (logger.isInfoEnabled()) {
logger.info("Event: " + EventHelper.dumpEvent(event, maxBytesToLog));
}
} else {
// No event found, request back-off semantics from the sink runner
result = Status.BACKOFF;
}
transaction.commit();
} catch (Exception ex) {
transaction.rollback();
throw new EventDeliveryException("Failed to log event: " + event, ex);
} finally {
transaction.close();
}
return result;
}
This calls the channel's take method, that is, it takes an event from the channel's queue and logs it, and with that LoggerSink's job is done.
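The begin/take/commit/rollback sequence in process() only makes sense if a rollback gives the event back to the channel. The sketch below illustrates that idea with a hypothetical, single-threaded ToyChannel: taken events are parked in an in-flight list, commit discards them, and rollback returns them to the head of the queue. It is a simplified model and not the MemoryChannel implementation (no semaphores, capacity limits or per-thread transactions).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical single-threaded channel that models take-side transactions only.
class ToyChannel {
    private final Deque<String> queue = new ArrayDeque<>();
    private final Deque<String> takeList = new ArrayDeque<>();

    void put(String event) {
        queue.addLast(event);
    }

    // Moves one event into the in-flight list; returns null when the queue is empty.
    String take() {
        String event = queue.pollFirst();
        if (event != null) {
            takeList.addLast(event);
        }
        return event;
    }

    // Commit: the taken events are gone for good.
    void commit() {
        takeList.clear();
    }

    // Rollback: put in-flight events back at the head, preserving their order.
    void rollback() {
        while (!takeList.isEmpty()) {
            queue.addFirst(takeList.pollLast());
        }
    }

    int size() {
        return queue.size();
    }
}

class ToyChannelDemo {
    static String run() {
        ToyChannel channel = new ToyChannel();
        channel.put("a");
        channel.put("b");
        channel.take();                 // takes "a"
        channel.rollback();             // "a" returns to the head of the queue
        String first = channel.take();  // "a" is delivered again
        channel.commit();               // now "a" is really consumed
        return first + ":" + channel.size();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

This is why a sink that fails after take() does not lose the event: the rollback in the catch block makes it available for the next attempt.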
NetcatSource, run by EventDrivenSourceRunner: org.apache.flume.source.EventDrivenSourceRunner#start
@Override
public void start() {
// Get our NetcatSource
Source source = getSource();
// Get the ChannelProcessor bound while parsing the configuration file; it mainly holds a selector and an interceptor chain
ChannelProcessor cp = source.getChannelProcessor();
// Initialize the ChannelProcessor, which also initializes the interceptor chain
cp.initialize();
// Start the NetcatSource
source.start();
lifecycleState = LifecycleState.START;
}
This mainly initializes the channelProcessor and starts the NetcatSource.
Here is what NetcatSource#start does:
public void start() {
logger.info("Source starting");
counterGroup.incrementAndGet("open.attempts");
try {
SocketAddress bindPoint = new InetSocketAddress(hostName, port);
// Open the server socket and bind the port
serverSocket = ServerSocketChannel.open();
serverSocket.socket().setReuseAddress(true);
serverSocket.socket().bind(bindPoint);
logger.info("Created serverSocket:{}", serverSocket);
} catch (IOException e) {
counterGroup.incrementAndGet("open.errors");
logger.error("Unable to bind to socket. Exception follows.", e);
stop();
throw new FlumeException(e);
}
// Create a thread pool that handles the data of accepted socket connections
handlerService = Executors.newCachedThreadPool(new ThreadFactoryBuilder()
.setNameFormat("netcat-handler-%d").build());
// AcceptHandler accepts the incoming connections
AcceptHandler acceptRunnable = new AcceptHandler(maxLineLength);
acceptThreadShouldStop.set(false);
acceptRunnable.counterGroup = counterGroup;
acceptRunnable.handlerService = handlerService;
acceptRunnable.shouldStop = acceptThreadShouldStop;
acceptRunnable.ackEveryEvent = ackEveryEvent;
acceptRunnable.source = this;
acceptRunnable.serverSocket = serverSocket;
acceptRunnable.sourceEncoding = sourceEncoding;
acceptThread = new Thread(acceptRunnable);
// Start the accept thread so netcat data can be received
acceptThread.start();
logger.debug("Source started");
super.start();
}
This mainly starts the socket server and the accept thread.
Now a closer look at org.apache.flume.source.NetcatSource.NetcatSocketHandler#run:
public void run() {
logger.debug("Starting connection handler");
Event event = null;
try {
Reader reader = Channels.newReader(socketChannel, sourceEncoding);
Writer writer = Channels.newWriter(socketChannel, sourceEncoding);
CharBuffer buffer = CharBuffer.allocate(maxLineLength);
buffer.flip(); // flip() so fill() sees buffer as initially empty
while (true) {
// this method blocks until new data is available in the socket
int charsRead = fill(buffer, reader);
logger.debug("Chars read = {}", charsRead);
// attempt to process all the events in the buffer
// This is where our channelProcessor is called, i.e. where the data is written to the channel
int eventsProcessed = processEvents(buffer, writer);
logger.debug("Events processed = {}", eventsProcessed);
if (charsRead == -1) {
// if we received EOF before last event processing attempt, then we
// have done everything we can
break;
} else if (charsRead == 0 && eventsProcessed == 0) {
if (buffer.remaining() == buffer.capacity()) {
// If we get here it means:
// 1. Last time we called fill(), no new chars were buffered
// 2. After that, we failed to process any events => no newlines
// 3. The unread data in the buffer == the size of the buffer
// Therefore, we are stuck because the client sent a line longer
// than the size of the buffer. Response: Drop the connection.
logger.warn("Client sent event exceeding the maximum length");
counterGroup.incrementAndGet("events.failed");
writer.write("FAILED: Event exceeds the maximum length (" +
buffer.capacity() + " chars, including newline)\n");
writer.flush();
break;
}
}
}
socketChannel.close();
counterGroup.incrementAndGet("sessions.completed");
} catch (IOException e) {
counterGroup.incrementAndGet("sessions.broken");
try {
socketChannel.close();
} catch (IOException ex) {
logger.error("Unable to close socket channel. Exception follows.", ex);
}
}
logger.debug("Connection handler exiting");
}
This calls processEvents to move the socket data into the channel:
private int processEvents(CharBuffer buffer, Writer writer)
throws IOException {
int numProcessed = 0;
boolean foundNewLine = true;
while (foundNewLine) {
foundNewLine = false;
int limit = buffer.limit();
for (int pos = buffer.position(); pos < limit; pos++) {
if (buffer.get(pos) == '\n') {
// parse event body bytes out of CharBuffer
buffer.limit(pos); // temporary limit
ByteBuffer bytes = Charsets.UTF_8.encode(buffer);
buffer.limit(limit); // restore limit
// build event object
byte[] body = new byte[bytes.remaining()];
bytes.get(body);
Event event = EventBuilder.withBody(body);
// process event
ChannelException ex = null;
try {
// This is where the interceptor chain runs and the event is written to the channel's queue
source.getChannelProcessor().processEvent(event);
} catch (ChannelException chEx) {
ex = chEx;
}
if (ex == null) {
counterGroup.incrementAndGet("events.processed");
numProcessed++;
if (true == ackEveryEvent) {
writer.write("OK\n");
}
} else {
counterGroup.incrementAndGet("events.failed");
logger.warn("Error processing event. Exception follows.", ex);
writer.write("FAILED: " + ex.getMessage() + "\n");
}
writer.flush();
// advance position after data is consumed
buffer.position(pos + 1); // skip newline
foundNewLine = true;
break;
}
}
}
return numProcessed;
}
Continuing with org.apache.flume.channel.ChannelProcessor#processEvent, we find exactly what we guessed:
public void processEvent(Event event) {
event = interceptorChain.intercept(event);
if (event == null) {
return;
}
// Process required channels
List<Channel> requiredChannels = selector.getRequiredChannels(event);
for (Channel reqChannel : requiredChannels) {
Transaction tx = reqChannel.getTransaction();
Preconditions.checkNotNull(tx, "Transaction object must not be null");
try {
tx.begin();
// Write the event to the channel
reqChannel.put(event);
tx.commit();
} catch (Throwable t) {
tx.rollback();
if (t instanceof Error) {
LOG.error("Error while writing to required channel: " + reqChannel, t);
throw (Error) t;
} else if (t instanceof ChannelException) {
throw (ChannelException) t;
} else {
throw new ChannelException("Unable to put event on required " +
"channel: " + reqChannel, t);
}
} finally {
if (tx != null) {
tx.close();
}
}
}
// Process optional channels
List<Channel> optionalChannels = selector.getOptionalChannels(event);
for (Channel optChannel : optionalChannels) {
Transaction tx = null;
try {
tx = optChannel.getTransaction();
tx.begin();
optChannel.put(event);
tx.commit();
} catch (Throwable t) {
tx.rollback();
LOG.error("Unable to put event on optional channel: " + optChannel, t);
if (t instanceof Error) {
throw (Error) t;
}
} finally {
if (tx != null) {
tx.close();
}
}
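}
}
That concludes the analysis of the core logic. Many details remain unexplored, such as the full hot-reload implementation and lifecycle management; interested readers can dig into them on their own.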
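As a closing illustration of processEvent above, the different treatment of required and optional channels can be sketched as follows. ToyChannelProcessor is a hypothetical class in which channels are plain callbacks; a failure on a required channel propagates to the caller, while a failure on an optional channel is only recorded. Transactions and the rethrowing of Error for optional channels are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ToyChannelProcessor {
    // Delivers an event to the required channels first, then to the optional ones.
    // Returns the messages of the failures that were swallowed on optional channels.
    static List<String> process(String event,
                                List<Consumer<String>> required,
                                List<Consumer<String>> optional) {
        List<String> optionalErrors = new ArrayList<>();
        for (Consumer<String> channel : required) {
            channel.accept(event); // a RuntimeException here propagates to the caller
        }
        for (Consumer<String> channel : optional) {
            try {
                channel.accept(event);
            } catch (RuntimeException e) {
                optionalErrors.add(e.getMessage()); // recorded, not rethrown
            }
        }
        return optionalErrors;
    }
}

class ToyChannelProcessorDemo {
    static String run() {
        List<String> stored = new ArrayList<>();
        List<Consumer<String>> required = new ArrayList<>();
        required.add(stored::add);
        List<Consumer<String>> optional = new ArrayList<>();
        optional.add(e -> { throw new IllegalStateException("optional channel full"); });
        List<String> errors = ToyChannelProcessor.process("hello", required, optional);
        return stored + "|" + errors;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The source therefore only sees an error (and reports FAILED to the client) when a required channel rejects the event.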
















