Table of contents

  • flume-ng Core Code Walkthrough
  • Walking through org.apache.flume.node.Application#main
  • PollingPropertiesFileConfigurationProvider constructor
  • Application constructor
  • eventBus.register(application)
  • Next, let's look at application.start(). Unsurprisingly, this start method is the core entry point.
  • Next, the org.apache.flume.lifecycle.LifecycleSupervisor#supervise method
  • The MonitorRunnable class
  • Continuing with org.apache.flume.node.PollingPropertiesFileConfigurationProvider#start
  • FileWatcherRunnable execution
  • org.apache.flume.node.Application#handleConfigurationEvent: handling the MaterializedConfiguration
  • org.apache.flume.node.Application#startAllComponents: starting the components
  • org.apache.flume.channel.MemoryChannel#start: starting the channel component
  • LoggerSink, started through SinkRunner: org.apache.flume.SinkRunner#start
  • NetcatSource, managed and run by EventDrivenSourceRunner: org.apache.flume.source.EventDrivenSourceRunner#start
  • That concludes the analysis of the core logic. Many details remain unexplored, such as how hot reloading is implemented and how lifecycles are managed; interested readers can dig into those on their own.


flume-ng Core Code Walkthrough

Walking through org.apache.flume.node.Application#main

The method is long, but its core logic is clear, so only the essential part is shown here.

List<LifecycleAware> components = Lists.newArrayList();

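                // reload decides whether the config file is polled for changes (hot reload);
                // it is on unless the agent was started with --no-reload-conf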
                if (reload) {
                    EventBus eventBus = new EventBus(agentName + "-event-bus");
                    // The polling provider is handed to Application as a managed component
                    PollingPropertiesFileConfigurationProvider configurationProvider =
                            new PollingPropertiesFileConfigurationProvider(
                                    agentName, configurationFile, eventBus, 30);
                    components.add(configurationProvider);
                    application = new Application(components);
                    eventBus.register(application);
                } else {
                    PropertiesFileConfigurationProvider configurationProvider =
                            new PropertiesFileConfigurationProvider(agentName, configurationFile);
                    application = new Application();
                    application.handleConfigurationEvent(configurationProvider.getConfiguration());
                }
            // Starts the supervised components; their threads keep running in the background
            application.start();

            final Application appReference = application;

            // Register a JVM shutdown hook so that appReference is stopped when the process shuts down
            Runtime.getRuntime().addShutdownHook(new Thread("agent-shutdown-hook") {
                @Override
                public void run() {
                    logger.info("Stopping application service");
                    appReference.stop();
                }
            });

As you can see, flume-ng uses Application to manage the agent's lifecycle, while loading the configuration is delegated to PollingPropertiesFileConfigurationProvider.

PollingPropertiesFileConfigurationProvider constructor

public PollingPropertiesFileConfigurationProvider(String agentName,
                                                      File file, EventBus eventBus, int interval) {
        super(agentName, file);
        this.eventBus = eventBus;
        this.file = file;
        this.interval = interval;
        counterGroup = new CounterGroup();
        lifecycleState = LifecycleState.IDLE;
    }

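The constructor stores the configuration file, the eventBus, the polling interval (used by a scheduled task; main passes 30), a counter group, and the lifecycle state. Next, the parent class PropertiesFileConfigurationProvider: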

public PropertiesFileConfigurationProvider(String agentName, File file) {
        super(agentName);
        this.file = file;
    }

Nothing much happens here, so on to its parent, AbstractConfigurationProvider:

public AbstractConfigurationProvider(String agentName) {
        super();
        this.agentName = agentName;
        this.sourceFactory = new DefaultSourceFactory();
        this.sinkFactory = new DefaultSinkFactory();
        this.channelFactory = new DefaultChannelFactory();

        channelCache = new HashMap<Class<? extends Channel>, Map<String, Channel>>();
    }

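It records the agent name from the configuration (the name we chose ourselves) and initializes three factories, DefaultSourceFactory, DefaultSinkFactory, and DefaultChannelFactory, plus a channelCache.

We won't dig into the exact responsibilities of PollingPropertiesFileConfigurationProvider yet; let's follow the main flow first.

Application constructor

Now for the Application constructor: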

public Application(List<LifecycleAware> components) {
        this.components = components;
        supervisor = new LifecycleSupervisor();
    }

The components list here contains just our PollingPropertiesFileConfigurationProvider. The constructor also creates a lifecycle manager, LifecycleSupervisor.

Here is the LifecycleSupervisor constructor:

public LifecycleSupervisor() {
        lifecycleState = LifecycleState.IDLE;
        supervisedProcesses = new HashMap<LifecycleAware, Supervisoree>();
        monitorFutures = new HashMap<LifecycleAware, ScheduledFuture<?>>();
        monitorService = new ScheduledThreadPoolExecutor(10,
                new ThreadFactoryBuilder().setNameFormat(
                        "lifecycleSupervisor-" + Thread.currentThread().getId() + "-%d")
                        .build());
        monitorService.setMaximumPoolSize(20);
        monitorService.setKeepAliveTime(30, TimeUnit.SECONDS);
        purger = new Purger();
        needToPurge = false;
    }

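Roughly, it keeps a lifecycle state, the supervised processes with their monitor futures, and a scheduled thread pool that runs the monitors. We won't go into detail yet; back to the main flow.

eventBus.register(application)

EventBus is used to publish events, and application is its subscriber. The same eventBus was also handed to PollingPropertiesFileConfigurationProvider, so presumably that is where the events get posted.

If you are not familiar with EventBus, see my wiki: eventbus源码解析 (EventBus source code analysis)【小明同学】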
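
To make the register/post relationship concrete, here is a minimal sketch of an annotation-driven event bus written with only the JDK. It is not Guava's EventBus: MiniEventBus, MiniSubscribe, and ConfigListener are hypothetical names invented for this illustration, and the dispatch rules are deliberately simplified.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a subscribe annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface MiniSubscribe {}

// Hypothetical, simplified bus: post() invokes every annotated
// one-argument method whose parameter type accepts the event.
class MiniEventBus {
    private final List<Object> subscribers = new ArrayList<>();

    void register(Object subscriber) {
        subscribers.add(subscriber);
    }

    void post(Object event) {
        for (Object subscriber : subscribers) {
            for (Method m : subscriber.getClass().getDeclaredMethods()) {
                if (m.isAnnotationPresent(MiniSubscribe.class)
                        && m.getParameterCount() == 1
                        && m.getParameterTypes()[0].isInstance(event)) {
                    try {
                        m.setAccessible(true);
                        m.invoke(subscriber, event);
                    } catch (ReflectiveOperationException e) {
                        throw new IllegalStateException(e);
                    }
                }
            }
        }
    }
}

// Plays the role of Application: it reacts to whatever the provider posts.
class ConfigListener {
    final List<String> received = new ArrayList<>();

    @MiniSubscribe
    public void handleConfigurationEvent(String conf) {
        received.add(conf);
    }
}

class MiniEventBusDemo {
    public static void main(String[] args) {
        MiniEventBus bus = new MiniEventBus();
        ConfigListener listener = new ConfigListener();
        bus.register(listener);
        bus.post("agent.sources = r1"); // delivered: parameter type matches
        bus.post(42);                   // ignored: no handler takes an Integer
        System.out.println(listener.received);
    }
}
```

The poster never references the listener directly, which is the property the article relies on: the provider only knows the bus, and Application only knows the event type.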

Next, let's look at application.start(). Unsurprisingly, this start method is the core entry point.

// Entry point for starting the agent
    public void start() {
        lifecycleLock.lock();
        try {
            for (LifecycleAware component : components) {
                // Hand each component to the lifecycle supervisor.
                // Here there is only one: new PollingPropertiesFileConfigurationProvider(agentName, configurationFile, eventBus, 30)
                supervisor.supervise(component,
                        new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
            }
        } finally {
            lifecycleLock.unlock();
        }
    }

This simply hands every component to the lifecycle supervisor.

Next, the org.apache.flume.lifecycle.LifecycleSupervisor#supervise method

public synchronized void supervise(LifecycleAware lifecycleAware,
                                       SupervisorPolicy policy, LifecycleState desiredState) {
        if (this.monitorService.isShutdown()
                || this.monitorService.isTerminated()
                || this.monitorService.isTerminating()) {
            throw new FlumeException("Supervise called on " + lifecycleAware + " " +
                    "after shutdown has been initiated. " + lifecycleAware + " will not" +
                    " be started");
        }

        Preconditions.checkState(!supervisedProcesses.containsKey(lifecycleAware),
                "Refusing to supervise " + lifecycleAware + " more than once");

        if (logger.isDebugEnabled()) {
            logger.debug("Supervising service:{} policy:{} desiredState:{}",
                    new Object[] { lifecycleAware, policy, desiredState });
        }

        // Create a Supervisoree that holds the policy and status of the supervised component
        Supervisoree process = new Supervisoree();
        process.status = new Status();

        process.policy = policy;
        process.status.desiredState = desiredState;
        process.status.error = false;

        MonitorRunnable monitorRunnable = new MonitorRunnable();
        monitorRunnable.lifecycleAware = lifecycleAware;
        monitorRunnable.supervisoree = process;
        monitorRunnable.monitorService = monitorService;

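        // Record the mapping from component to its supervision state
        supervisedProcesses.put(lifecycleAware, process);

        // Wrap the component in a monitorRunnable and hand it to the scheduler
        ScheduledFuture<?> future = monitorService.scheduleWithFixedDelay(
                monitorRunnable, 0, 3, TimeUnit.SECONDS);

        // Keep the future so the monitor can be cancelled when the service shuts down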
        monitorFutures.put(lifecycleAware, future);
    }

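So the component ends up wrapped in a monitorRunnable that a scheduled task runs in the background: once immediately, then again 3 seconds after each run finishes (both values are hard-coded in supervise).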
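
The scheduling pattern can be tried in isolation. The sketch below is an assumption-laden illustration, not Flume code: FixedDelayMonitorDemo and runMonitor are hypothetical names, and the delay is given in milliseconds so the demo finishes quickly.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

class FixedDelayMonitorDemo {
    // Schedules a monitor task, waits until it has run `runs` times, then
    // cancels it. Returns true if all runs happened and the cancel took effect.
    static boolean runMonitor(int runs, long delayMillis) {
        ScheduledExecutorService monitorService = Executors.newScheduledThreadPool(1);
        CountDownLatch latch = new CountDownLatch(runs);
        try {
            // Initial delay 0, then delayMillis between the end of one run
            // and the start of the next.
            ScheduledFuture<?> future = monitorService.scheduleWithFixedDelay(
                    latch::countDown, 0, delayMillis, TimeUnit.MILLISECONDS);
            boolean done = latch.await(5, TimeUnit.SECONDS);
            // Holding on to the future is what allows cancelling the monitor later.
            future.cancel(false);
            return done && future.isCancelled();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            monitorService.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(runMonitor(3, 50));
    }
}
```

Because the delay is measured from the end of the previous run, a slow check never overlaps with the next one.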

The MonitorRunnable class

public static class MonitorRunnable implements Runnable {

        public ScheduledExecutorService monitorService;
        // For the first component, lifecycleAware is new PollingPropertiesFileConfigurationProvider(agentName, configurationFile, eventBus, 30)
        public LifecycleAware lifecycleAware;
        // supervisoree carries new SupervisorPolicy.AlwaysRestartPolicy() and the desired state LifecycleState.START
        public Supervisoree supervisoree;

        // Executed on a monitorService thread in the background
        @Override
        public void run() {
            logger.debug("checking process:{} supervisoree:{}", lifecycleAware,
                    supervisoree);

            long now = System.currentTimeMillis();

            try {
                if (supervisoree.status.firstSeen == null) {
                    logger.debug("first time seeing {}", lifecycleAware);

                    supervisoree.status.firstSeen = now;
                }

                supervisoree.status.lastSeen = now;
                synchronized (lifecycleAware) {
                    if (supervisoree.status.discard) {
                        // Unsupervise has already been called on this.
                        logger.info("Component has already been stopped {}", lifecycleAware);
                        return;
                    } else if (supervisoree.status.error) {
                        logger.info("Component {} is in error state, and Flume will not"
                                + "attempt to change its state", lifecycleAware);
                        return;
                    }

                    supervisoree.status.lastSeenState = lifecycleAware.getLifecycleState();

                    if (!lifecycleAware.getLifecycleState().equals(
                            supervisoree.status.desiredState)) {

                        logger.debug("Want to transition {} from {} to {} (failures:{})",
                                new Object[] { lifecycleAware, supervisoree.status.lastSeenState,
                                        supervisoree.status.desiredState,
                                        supervisoree.status.failures });

                        switch (supervisoree.status.desiredState) {
                            case START:
                                try {
                                    // The desired state is START, so start the component
                                    lifecycleAware.start();

                                } catch (Throwable e) {
                                    logger.error("Unable to start " + lifecycleAware
                                            + " - Exception follows.", e);
                                    if (e instanceof Error) {
                                        // This component can never recover, shut it down.
                                        supervisoree.status.desiredState = LifecycleState.STOP;
                                        try {
                                            lifecycleAware.stop();
                                            logger.warn("Component {} stopped, since it could not be"
                                                            + "successfully started due to missing dependencies",
                                                    lifecycleAware);
                                        } catch (Throwable e1) {
                                            logger.error("Unsuccessful attempt to "
                                                    + "shutdown component: {} due to missing dependencies."
                                                    + " Please shutdown the agent"
                                                    + "or disable this component, or the agent will be"
                                                    + "in an undefined state.", e1);
                                            supervisoree.status.error = true;
                                            if (e1 instanceof Error) {
                                                throw (Error) e1;
                                            }
                                            // Set the state to stop, so that the conf poller can
                                            // proceed.
                                        }
                                    }
                                    supervisoree.status.failures++;
                                }
                                break;
                            case STOP:
                                try {
                                    lifecycleAware.stop();
                                } catch (Throwable e) {
                                    logger.error("Unable to stop " + lifecycleAware
                                            + " - Exception follows.", e);
                                    if (e instanceof Error) {
                                        throw (Error) e;
                                    }
                                    supervisoree.status.failures++;
                                }
                                break;
                            default:
                                logger.warn("I refuse to acknowledge {} as a desired state",
                                        supervisoree.status.desiredState);
                        }

                        if (!supervisoree.policy.isValid(lifecycleAware, supervisoree.status)) {
                            logger.error(
                                    "Policy {} of {} has been violated - supervisor should exit!",
                                    supervisoree.policy, lifecycleAware);
                        }
                    }
                }
            } catch (Throwable t) {
                logger.error("Unexpected error", t);
            }
            logger.debug("Status check complete");
        }
    }

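For all its length, the core of this method is the call to lifecycleAware.start(), made whenever the component's current state differs from the desired one. Here lifecycleAware is our PollingPropertiesFileConfigurationProvider.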
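
Stripped of logging and error classification, one monitor pass boils down to "compare current state with desired state and act on the difference". The following is a minimal sketch of that idea under stated assumptions: ReconcileDemo, FlakyComponent, and check are hypothetical names, and the component is made to fail its first start purely to show the retry behavior.

```java
class ReconcileDemo {
    enum State { IDLE, START, STOP }

    // Hypothetical component whose first start attempt fails.
    static class FlakyComponent {
        State state = State.IDLE;
        int startCalls = 0;

        void start() {
            startCalls++;
            if (startCalls == 1) {
                throw new IllegalStateException("not ready yet");
            }
            state = State.START;
        }

        void stop() {
            state = State.STOP;
        }
    }

    // One monitor pass: act only when the current state differs from the
    // desired one. Returns the updated failure count.
    static int check(FlakyComponent c, State desired, int failures) {
        if (c.state == desired) {
            return failures; // already where we want it
        }
        try {
            if (desired == State.START) {
                c.start();
            } else if (desired == State.STOP) {
                c.stop();
            }
        } catch (RuntimeException e) {
            failures++; // state is unchanged, so the next pass retries
        }
        return failures;
    }

    public static void main(String[] args) {
        FlakyComponent c = new FlakyComponent();
        int failures = 0;
        for (int pass = 0; pass < 3; pass++) {
            failures = check(c, State.START, failures);
        }
        System.out.println(c.state + " after " + c.startCalls
                + " start calls, failures=" + failures);
    }
}
```

The third pass does nothing because the states already match, which is why a periodic monitor is cheap once the component is up.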

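Continuing with org.apache.flume.node.PollingPropertiesFileConfigurationProvider#start

/**
     * In the end the provider does the real work itself: it parses the configuration file.
     * Responsibilities are split: Application manages the application (start, stop), while
     * PollingPropertiesFileConfigurationProvider polls and parses the configuration file.
     */
    @Override
    public void start() {
        LOGGER.info("Configuration provider starting");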

        Preconditions.checkState(file != null,
                "The parameter file must not be null");

        executorService = Executors.newSingleThreadScheduledExecutor(
                new ThreadFactoryBuilder().setNameFormat("conf-file-poller-%d")
                        .build());

        FileWatcherRunnable fileWatcherRunnable =
                new FileWatcherRunnable(file, counterGroup);

        // This scheduled task is what runs the file watcher
        executorService.scheduleWithFixedDelay(fileWatcherRunnable, 0, interval,
                TimeUnit.SECONDS);

        lifecycleState = LifecycleState.START;

        LOGGER.debug("Configuration provider started");
    }

So yet another scheduled task is started, this time to run FileWatcherRunnable. That is what we need to look at next.

FileWatcherRunnable execution

@Override
        public void run() {
            LOGGER.debug("Checking file:{} for changes", file);

            counterGroup.incrementAndGet("file.checks");

            long lastModified = file.lastModified();

            // The modification time acts as a version check, which is what enables hot reloading
            if (lastModified > lastChange) {
                LOGGER.info("Reloading configuration file:{}", file);

                counterGroup.incrementAndGet("file.loads");

                lastChange = lastModified;

                try {
                    // The familiar eventBus finally comes into play: the event is posted here
                    eventBus.post(getConfiguration());
                } catch (Exception e) {
                    LOGGER.error("Failed to load configuration data. Exception follows.",
                            e);
                } catch (NoClassDefFoundError e) {
                    LOGGER.error("Failed to start agent because dependencies were not " +
                            "found in classpath. Error follows.", e);
                } catch (Throwable t) {
                    // caught because the caller does not handle or log Throwables
                    LOGGER.error("Unhandled error", t);
                }
            }
        }
    }
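
The modification-time check above can be sketched on its own. This is an illustrative stand-in, not the Flume class: FilePollDemo, checkOnce, and demo are hypothetical names, and the callback takes the place of eventBus.post(getConfiguration()).

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;

class FilePollDemo {
    private final File file;
    private final Runnable onChange;
    private long lastChange = 0L;

    FilePollDemo(File file, Runnable onChange) {
        this.file = file;
        this.onChange = onChange;
    }

    // One polling pass: fire the callback only if the file is newer than
    // the last version we reacted to.
    void checkOnce() {
        long lastModified = file.lastModified();
        if (lastModified > lastChange) {
            lastChange = lastModified;
            onChange.run();
        }
    }

    // Polls a temp file three times and returns how many reloads fired.
    static int demo() {
        try {
            File f = File.createTempFile("agent", ".properties");
            f.deleteOnExit();
            AtomicInteger reloads = new AtomicInteger();
            FilePollDemo poller = new FilePollDemo(f, reloads::incrementAndGet);
            poller.checkOnce(); // first pass: file is newer than 0, so it loads
            poller.checkOnce(); // unchanged: nothing happens
            f.setLastModified(f.lastModified() + 10_000); // simulate an edit
            poller.checkOnce(); // changed: reloads
            return reloads.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("reloads: " + demo());
    }
}
```

Starting lastChange at 0 is what makes the very first pass load the configuration, so the initial load and later reloads share one code path.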

Next, getConfiguration(), which as you might guess parses the configuration file:

// Parses the configuration file and wires the components together
    public MaterializedConfiguration getConfiguration() {
        MaterializedConfiguration conf = new SimpleMaterializedConfiguration();
        FlumeConfiguration fconfig = getFlumeConfiguration();
        AgentConfiguration agentConf = fconfig.getConfigurationFor(getAgentName());
        if (agentConf != null) {
            Map<String, ChannelComponent> channelComponentMap = Maps.newHashMap();
            Map<String, SourceRunner> sourceRunnerMap = Maps.newHashMap();
            Map<String, SinkRunner> sinkRunnerMap = Maps.newHashMap();
            try {
                // Presumably the three core methods: load the channels, sources, and sinks and wire them together
                loadChannels(agentConf, channelComponentMap);
                loadSources(agentConf, channelComponentMap, sourceRunnerMap);
                loadSinks(agentConf, channelComponentMap, sinkRunnerMap);
                Set<String> channelNames = new HashSet<String>(channelComponentMap.keySet());
                for (String channelName : channelNames) {
                    ChannelComponent channelComponent = channelComponentMap.get(channelName);
                    if (channelComponent.components.isEmpty()) {
                        LOGGER.warn(String.format("Channel %s has no components connected" +
                                " and has been removed.", channelName));
                        channelComponentMap.remove(channelName);
                        Map<String, Channel> nameChannelMap =
                                channelCache.get(channelComponent.channel.getClass());
                        if (nameChannelMap != null) {
                            nameChannelMap.remove(channelName);
                        }
                    } else {
                        LOGGER.info(String.format("Channel %s connected to %s",
                                channelName, channelComponent.components.toString()));
                        conf.addChannel(channelName, channelComponent.channel);
                    }
                }
                for (Map.Entry<String, SourceRunner> entry : sourceRunnerMap.entrySet()) {
                    conf.addSourceRunner(entry.getKey(), entry.getValue());
                }
                for (Map.Entry<String, SinkRunner> entry : sinkRunnerMap.entrySet()) {
                    conf.addSinkRunner(entry.getKey(), entry.getValue());
                }
            } catch (InstantiationException ex) {
                LOGGER.error("Failed to instantiate component", ex);
            } finally {
                channelComponentMap.clear();
                sourceRunnerMap.clear();
                sinkRunnerMap.clear();
            }
        } else {
            LOGGER.warn("No configuration found for this host:{}", getAgentName());
        }
        return conf;
    }

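As you can see, channels are what connect sources and sinks: a channel with nothing attached is dropped, and the remaining channels, source runners, and sink runners are all packed into a MaterializedConfiguration.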
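
The pruning step is worth a small sketch, because it shows why the code iterates over a copy of the key set. The names below (WiringDemo, pruneUnused) and the use of plain strings for components are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

class WiringDemo {
    // channel name -> names of the sources and sinks attached to it.
    // Removes every channel that nothing is attached to.
    static void pruneUnused(Map<String, List<String>> channelComponents) {
        // Iterate over a snapshot of the keys: removing from the map while
        // iterating its live key set would throw ConcurrentModificationException.
        for (String name : new HashSet<>(channelComponents.keySet())) {
            if (channelComponents.get(name).isEmpty()) {
                channelComponents.remove(name);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> wiring = new HashMap<>();
        wiring.put("c1", new ArrayList<>(List.of("r1", "k1"))); // source + sink
        wiring.put("c2", new ArrayList<>());                    // orphan channel
        pruneUnused(wiring);
        System.out.println(wiring.keySet());
    }
}
```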

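Finally the MaterializedConfiguration is published through the eventBus so that a subscriber can run it. This decouples parsing the configuration from running the components.

As noted earlier, the subscriber on this eventBus is the Application class, so we go back to Application and find the method annotated with @Subscribe.

org.apache.flume.node.Application#handleConfigurationEvent: handling the MaterializedConfiguration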

@Subscribe
    public void handleConfigurationEvent(MaterializedConfiguration conf) {
        try {
            lifecycleLock.lockInterruptibly();
            // Stop all components; on a hot reload some are already running
            stopAllComponents();
            // Start the components defined by the new configuration
            startAllComponents(conf);
        } catch (InterruptedException e) {
            logger.info("Interrupted while trying to handle configuration event");
            return;
        } finally {
            // If interrupted while trying to lock, we don't own the lock, so must not attempt to unlock
            if (lifecycleLock.isHeldByCurrentThread()) {
                lifecycleLock.unlock();
            }
        }
    }

Stopping everything before starting the new configuration is what makes hot reloading work.
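
The locking idiom in this handler (lockInterruptibly plus an isHeldByCurrentThread guard in finally) is easy to get wrong, so here is a minimal sketch of it in isolation. LockGuardDemo and its members are hypothetical names; only the ReentrantLock calls are real JDK API.

```java
import java.util.concurrent.locks.ReentrantLock;

class LockGuardDemo {
    private final ReentrantLock lifecycleLock = new ReentrantLock();
    int handled = 0;

    // Returns true if the event was handled, false if the thread was
    // interrupted before it could acquire the lock.
    boolean handle() {
        try {
            lifecycleLock.lockInterruptibly();
            handled++; // stands in for stop-all / start-all
            return true;
        } catch (InterruptedException e) {
            return false;
        } finally {
            // If we were interrupted while acquiring, we do not own the lock,
            // and unlock() would throw IllegalMonitorStateException.
            if (lifecycleLock.isHeldByCurrentThread()) {
                lifecycleLock.unlock();
            }
        }
    }

    boolean isLocked() {
        return lifecycleLock.isLocked();
    }

    public static void main(String[] args) {
        LockGuardDemo demo = new LockGuardDemo();
        System.out.println(demo.handle());      // normal path
        Thread.currentThread().interrupt();     // simulate an interrupt
        System.out.println(demo.handle());      // interrupted before locking
        System.out.println(demo.isLocked());    // lock is never leaked
    }
}
```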

org.apache.flume.node.Application#startAllComponents: starting the components

private void startAllComponents(MaterializedConfiguration materializedConfiguration) {
        logger.info("Starting new configuration:{}", materializedConfiguration);

        this.materializedConfiguration = materializedConfiguration;

        for (Entry<String, Channel> entry :
                materializedConfiguration.getChannels().entrySet()) {
            try {
                logger.info("Starting Channel " + entry.getKey());
                // Start the channels first so the pipes are in place
                supervisor.supervise(entry.getValue(),
                        new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
            } catch (Exception e) {
                logger.error("Error while starting {}", entry.getValue(), e);
            }
        }

        /*
         * Wait for all channels to start.
         * Components are started on separate supervisor threads, so we have to wait here.
         */
        for (Channel ch : materializedConfiguration.getChannels().values()) {
            while (ch.getLifecycleState() != LifecycleState.START
                    && !supervisor.isComponentInErrorState(ch)) {
                try {
                    logger.info("Waiting for channel: " + ch.getName() +
                            " to start. Sleeping for 500 ms");
                    Thread.sleep(500);
                } catch (InterruptedException e) {
                    logger.error("Interrupted while waiting for channel to start.", e);
                    Throwables.propagate(e);
                }
            }
        }

        for (Entry<String, SinkRunner> entry : materializedConfiguration.getSinkRunners().entrySet()) {
            try {
                logger.info("Starting Sink " + entry.getKey());
                // Then start the sinks so that data in the channels can be consumed
                supervisor.supervise(entry.getValue(),
                        new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
            } catch (Exception e) {
                logger.error("Error while starting {}", entry.getValue(), e);
            }
        }

        for (Entry<String, SourceRunner> entry :
                materializedConfiguration.getSourceRunners().entrySet()) {
            try {
                logger.info("Starting Source " + entry.getKey());
                // Start the sources last; this opens the data entry point so events can be received
                supervisor.supervise(entry.getValue(),
                        new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
            } catch (Exception e) {
                logger.error("Error while starting {}", entry.getValue(), e);
            }
        }

        this.loadMonitoring();
    }

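The components are clearly started one after another through the lifecycle supervisor, in the order channel -> sink -> source. By the time a source starts accepting data, its channels are up and the sinks are already draining them.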
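
The ordering constraint can be sketched as follows. This is a simplification under stated assumptions: StartOrderDemo and startAll are hypothetical names, a CountDownLatch replaces the 500 ms sleep loop, and sinks and sources are "started" synchronously just to keep the recorded order deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class StartOrderDemo {
    // Starts channels asynchronously, waits for all of them, then starts
    // sinks and finally sources. Returns the order in which things started.
    static List<String> startAll(List<String> channels, List<String> sinks,
                                 List<String> sources) {
        List<String> started = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newCachedThreadPool();
        CountDownLatch channelsUp = new CountDownLatch(channels.size());
        try {
            for (String ch : channels) {
                pool.submit(() -> {
                    started.add("channel:" + ch);
                    channelsUp.countDown();
                });
            }
            // Nothing downstream may start until every channel is up.
            if (!channelsUp.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("channels did not start in time");
            }
            for (String sink : sinks) {
                started.add("sink:" + sink);
            }
            for (String source : sources) {
                started.add("source:" + source);
            }
            return new ArrayList<>(started);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(startAll(List.of("c1"), List.of("k1"), List.of("r1")));
    }
}
```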

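At this point the core framework is mostly covered. What remains is how each component starts and the details of how it processes data.

Below we look at the start logic of each of the three components in our configuration.

org.apache.flume.channel.MemoryChannel#start: starting the channel component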

@Override
  public synchronized void start() {
    channelCounter.start();
    channelCounter.setChannelSize(queue.size());
    channelCounter.setChannelCapacity(Long.valueOf(
            queue.size() + queue.remainingCapacity()));
    super.start();
  }

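start() mainly starts the channel counter and reports the size and capacity of the in-memory queue that holds the events. After all, a channel exists to buffer events on their way from source to sink.

Since it is a queue, it must offer enqueue and dequeue operations. Reading on in this class, we find them in its MemoryTransaction: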

@Override
    protected void doPut(Event event) throws InterruptedException {
      channelCounter.incrementEventPutAttemptCount();
      int eventByteSize = (int) Math.ceil(estimateEventSize(event) / byteCapacitySlotSize);

      if (!putList.offer(event)) {
        throw new ChannelException(
            "Put queue for MemoryTransaction of capacity " +
            putList.size() + " full, consider committing more frequently, " +
            "increasing capacity or increasing thread count");
      }
      putByteCounter += eventByteSize;
    }

    @Override
    protected Event doTake() throws InterruptedException {
      channelCounter.incrementEventTakeAttemptCount();
      if (takeList.remainingCapacity() == 0) {
        throw new ChannelException("Take list for MemoryTransaction, capacity " +
            takeList.size() + " full, consider committing more frequently, " +
            "increasing capacity, or increasing thread count");
      }
      if (!queueStored.tryAcquire(keepAlive, TimeUnit.SECONDS)) {
        return null;
      }
      Event event;
      synchronized (queueLock) {
        event = queue.poll();
      }
      Preconditions.checkNotNull(event, "Queue.poll returned NULL despite semaphore " +
          "signalling existence of entry");
      takeList.put(event);

      int eventByteSize = (int) Math.ceil(estimateEventSize(event) / byteCapacitySlotSize);
      takeByteCounter += eventByteSize;

      return event;
    }
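
The put and take paths above can be condensed into a small sketch: puts are staged in a per-transaction list, and a semaphore counts how many events are actually available in the shared queue. MiniMemoryChannel is a hypothetical, single-transaction simplification; in particular its commit and rollback methods are assumptions about how staged events reach the queue, not a copy of Flume's code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class MiniMemoryChannel {
    private final Queue<String> queue = new ArrayDeque<>();
    private final Object queueLock = new Object();
    // Counts events that are committed and ready to be taken.
    private final Semaphore queueStored = new Semaphore(0);
    // Events staged by the current (single) transaction.
    private final List<String> putList = new ArrayList<>();

    void put(String event) {
        putList.add(event); // not visible to takers until commit
    }

    void commit() {
        int n = putList.size();
        synchronized (queueLock) {
            queue.addAll(putList);
        }
        putList.clear();
        queueStored.release(n); // wake up takers, one permit per event
    }

    void rollback() {
        putList.clear(); // staged events are simply dropped
    }

    // Returns null if nothing becomes available within the timeout.
    String take(long timeoutMillis) {
        try {
            if (!queueStored.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return null;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
        synchronized (queueLock) {
            return queue.poll();
        }
    }

    public static void main(String[] args) {
        MiniMemoryChannel ch = new MiniMemoryChannel();
        ch.put("hello");
        System.out.println(ch.take(10)); // staged only, so nothing to take
        ch.commit();
        System.out.println(ch.take(10)); // now the event is available
    }
}
```

Acquiring the semaphore before touching the queue is what lets a taker wait with a timeout instead of spinning on an empty queue.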

LoggerSink, started through SinkRunner: org.apache.flume.SinkRunner#start

@Override
  public void start() {
    SinkProcessor policy = getPolicy();
    // Start the sink through its processor; nothing substantial happens here, it only begins lifecycle tracking
    policy.start();

    runner = new PollingRunner();
    // Hand the policy to the runner, which will drive it
    runner.policy = policy;
    runner.counterGroup = counterGroup;
    runner.shouldStop = new AtomicBoolean();

    runnerThread = new Thread(runner);
    runnerThread.setName("SinkRunner-PollingRunner-" +
        policy.getClass().getSimpleName());
    // Start the thread; the sink runner is now running
    runnerThread.start();

    lifecycleState = LifecycleState.START;
  }

So starting the LoggerSink component does nothing substantial by itself; the work is delegated to a PollingRunner.

Now for PollingRunner's run method:

@Override
    public void run() {
      logger.debug("Polling sink runner starting");

      while (!shouldStop.get()) {
        try {
          // Call the sink's process method to take data from the channel
          if (policy.process().equals(Sink.Status.BACKOFF)) {
            counterGroup.incrementAndGet("runner.backoffs");

            Thread.sleep(Math.min(
                counterGroup.incrementAndGet("runner.backoffs.consecutive")
                * backoffSleepIncrement, maxBackoffSleep));
          } else {
            counterGroup.set("runner.backoffs.consecutive", 0L);
          }
        } catch (InterruptedException e) {
          logger.debug("Interrupted while processing an event. Exiting.");
          counterGroup.incrementAndGet("runner.interruptions");
        } catch (Exception e) {
          logger.error("Unable to deliver event. Exception follows.", e);
          if (e instanceof EventDeliveryException) {
            counterGroup.incrementAndGet("runner.deliveryErrors");
          } else {
            counterGroup.incrementAndGet("runner.errors");
          }
          try {
            Thread.sleep(maxBackoffSleep);
          } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
          }
        }
      }
      logger.debug("Polling runner exiting. Metrics:{}", counterGroup);
    }
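
The back-off arithmetic in the loop above is a linear ramp with a cap. The sketch below isolates it; BackoffDemo and sleepMillis are hypothetical names, and the increment of 1000 ms and cap of 5000 ms used in main are illustrative values chosen for this example rather than quoted from the source.

```java
class BackoffDemo {
    // Sleep time grows by `increment` per consecutive back-off, capped at `max`.
    static long sleepMillis(long consecutiveBackoffs, long increment, long max) {
        return Math.min(consecutiveBackoffs * increment, max);
    }

    public static void main(String[] args) {
        for (long n = 1; n <= 7; n++) {
            System.out.println(n + " consecutive back-offs -> sleep "
                    + sleepMillis(n, 1000, 5000) + " ms");
        }
    }
}
```

Resetting the consecutive counter as soon as an event is processed means a busy sink goes back to polling without delay.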

The loop pulls data from the channel through policy.process(), which in turn calls LoggerSink's process method.

Here is org.apache.flume.sink.LoggerSink#process:

@Override
  public Status process() throws EventDeliveryException {
    Status result = Status.READY;
    Channel channel = getChannel();
    Transaction transaction = channel.getTransaction();
    Event event = null;

    try {
      transaction.begin();
      // Take an event from the channel
      event = channel.take();

      if (event != null) {
        if (logger.isInfoEnabled()) {
          logger.info("Event: " + EventHelper.dumpEvent(event, maxBytesToLog));
        }
      } else {
        // No event found, request back-off semantics from the sink runner
        result = Status.BACKOFF;
      }
      transaction.commit();
    } catch (Exception ex) {
      transaction.rollback();
      throw new EventDeliveryException("Failed to log event: " + event, ex);
    } finally {
      transaction.close();
    }

    return result;
  }

This calls the channel's take method, that is, it takes an event from the channel's queue. After taking it, the sink logs the event, and with that LoggerSink's job is done.
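
The begin / commit / rollback / close shape of process() is the part every sink has to get right. Here is a minimal sketch of that shape with a recording transaction so the call order can be inspected. TxPatternDemo, RecordingTx, and the string statuses are hypothetical stand-ins, not Flume types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class TxPatternDemo {
    // Hypothetical transaction that just records which methods were called.
    static class RecordingTx {
        final List<String> calls = new ArrayList<>();
        void begin()    { calls.add("begin"); }
        void commit()   { calls.add("commit"); }
        void rollback() { calls.add("rollback"); }
        void close()    { calls.add("close"); }
    }

    // Mirrors the control flow of a sink's process(): commit on success
    // (even when there was nothing to take), roll back on failure, and
    // always close.
    static String process(RecordingTx tx, Supplier<String> take) {
        String status = "READY";
        try {
            tx.begin();
            String event = take.get();
            if (event == null) {
                status = "BACKOFF"; // nothing available, ask the runner to back off
            }
            tx.commit();
        } catch (RuntimeException ex) {
            tx.rollback();
            throw new IllegalStateException("Failed to process event", ex);
        } finally {
            tx.close();
        }
        return status;
    }

    public static void main(String[] args) {
        RecordingTx tx = new RecordingTx();
        System.out.println(process(tx, () -> "event-1") + " " + tx.calls);
    }
}
```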

NetcatSource, managed and run by EventDrivenSourceRunner: org.apache.flume.source.EventDrivenSourceRunner#start

@Override
  public void start() {
    // Get our NetcatSource
    Source source = getSource();
    // Get the ChannelProcessor bound while parsing the configuration; it mainly holds a selector and an interceptor chain
    ChannelProcessor cp = source.getChannelProcessor();
    // Initialize the ChannelProcessor, which also initializes the interceptor chain
    cp.initialize();
    // Start the NetcatSource
    source.start();
    lifecycleState = LifecycleState.START;
  }

This mainly initializes the channelProcessor and starts the NetcatSource.

Let's see what NetcatSource#start does:

public void start() {

    logger.info("Source starting");

    counterGroup.incrementAndGet("open.attempts");

    try {
      SocketAddress bindPoint = new InetSocketAddress(hostName, port);
      // Open the server socket and bind it to the port
      serverSocket = ServerSocketChannel.open();
      serverSocket.socket().setReuseAddress(true);
      serverSocket.socket().bind(bindPoint);

      logger.info("Created serverSocket:{}", serverSocket);
    } catch (IOException e) {
      counterGroup.incrementAndGet("open.errors");
      logger.error("Unable to bind to socket. Exception follows.", e);
      stop();
      throw new FlumeException(e);
    }

    // Create a thread pool whose threads read data from the accepted sockets
    handlerService = Executors.newCachedThreadPool(new ThreadFactoryBuilder()
        .setNameFormat("netcat-handler-%d").build());
    // AcceptHandler accepts the incoming connections
    AcceptHandler acceptRunnable = new AcceptHandler(maxLineLength);
    acceptThreadShouldStop.set(false);
    acceptRunnable.counterGroup = counterGroup;
    acceptRunnable.handlerService = handlerService;
    acceptRunnable.shouldStop = acceptThreadShouldStop;
    acceptRunnable.ackEveryEvent = ackEveryEvent;
    acceptRunnable.source = this;
    acceptRunnable.serverSocket = serverSocket;
    acceptRunnable.sourceEncoding = sourceEncoding;

    acceptThread = new Thread(acceptRunnable);
    // Start the accept thread so the netcat source can receive data
    acceptThread.start();

    logger.debug("Source started");
    super.start();
  }

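This mainly starts the socket server and the thread that accepts connections.

Each accepted connection is then handled by a NetcatSocketHandler, so let's look at org.apache.flume.source.NetcatSource.NetcatSocketHandler#run: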

public void run() {
      logger.debug("Starting connection handler");
      Event event = null;

      try {
        Reader reader = Channels.newReader(socketChannel, sourceEncoding);
        Writer writer = Channels.newWriter(socketChannel, sourceEncoding);
        CharBuffer buffer = CharBuffer.allocate(maxLineLength);
        buffer.flip(); // flip() so fill() sees buffer as initially empty

        while (true) {
          // this method blocks until new data is available in the socket
          int charsRead = fill(buffer, reader);
          logger.debug("Chars read = {}", charsRead);

          // attempt to process all the events in the buffer
          // This is where our channelProcessor is called, i.e. where the data is written to the channel
          int eventsProcessed = processEvents(buffer, writer);
          logger.debug("Events processed = {}", eventsProcessed);

          if (charsRead == -1) {
            // if we received EOF before last event processing attempt, then we
            // have done everything we can
            break;
          } else if (charsRead == 0 && eventsProcessed == 0) {
            if (buffer.remaining() == buffer.capacity()) {
              // If we get here it means:
              // 1. Last time we called fill(), no new chars were buffered
              // 2. After that, we failed to process any events => no newlines
              // 3. The unread data in the buffer == the size of the buffer
              // Therefore, we are stuck because the client sent a line longer
              // than the size of the buffer. Response: Drop the connection.
              logger.warn("Client sent event exceeding the maximum length");
              counterGroup.incrementAndGet("events.failed");
              writer.write("FAILED: Event exceeds the maximum length (" +
                  buffer.capacity() + " chars, including newline)\n");
              writer.flush();
              break;
            }
          }
        }

        socketChannel.close();

        counterGroup.incrementAndGet("sessions.completed");
      } catch (IOException e) {
        counterGroup.incrementAndGet("sessions.broken");
        try {
          socketChannel.close();
        } catch (IOException ex) {
          logger.error("Unable to close socket channel. Exception follows.", ex);
        }
      }

      logger.debug("Connection handler exiting");
    }

It calls processEvents to move the data read from the socket into the channel:

private int processEvents(CharBuffer buffer, Writer writer)
        throws IOException {

      int numProcessed = 0;

      boolean foundNewLine = true;
      while (foundNewLine) {
        foundNewLine = false;

        int limit = buffer.limit();
        for (int pos = buffer.position(); pos < limit; pos++) {
          if (buffer.get(pos) == '\n') {

            // parse event body bytes out of CharBuffer
            buffer.limit(pos); // temporary limit
            ByteBuffer bytes = Charsets.UTF_8.encode(buffer);
            buffer.limit(limit); // restore limit

            // build event object
            byte[] body = new byte[bytes.remaining()];
            bytes.get(body);
            Event event = EventBuilder.withBody(body);

            // process event
            ChannelException ex = null;
            try {
              // This is where the interceptor chain runs and the event is written into the channel's queue
              source.getChannelProcessor().processEvent(event);
            } catch (ChannelException chEx) {
              ex = chEx;
            }

            if (ex == null) {
              counterGroup.incrementAndGet("events.processed");
              numProcessed++;
              if (true == ackEveryEvent) {
                writer.write("OK\n");
              }
            } else {
              counterGroup.incrementAndGet("events.failed");
              logger.warn("Error processing event. Exception follows.", ex);
              writer.write("FAILED: " + ex.getMessage() + "\n");
            }
            writer.flush();

            // advance position after data is consumed
            buffer.position(pos + 1); // skip newline
            foundNewLine = true;

            break;
          }
        }

      }

      return numProcessed;
    }
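
The buffer handling in processEvents is the subtle part: each complete line is cut out by temporarily lowering the limit, and a trailing partial line is left in the buffer for the next read. The sketch below reproduces just that scanning logic; LineSplitDemo and drainLines are hypothetical names, and lines are collected as strings instead of being turned into events.

```java
import java.nio.CharBuffer;
import java.util.ArrayList;
import java.util.List;

class LineSplitDemo {
    // Extracts every complete line from the buffer and advances its position
    // past them. Any trailing text without a newline stays unread.
    static List<String> drainLines(CharBuffer buffer) {
        List<String> lines = new ArrayList<>();
        boolean foundNewLine = true;
        while (foundNewLine) {
            foundNewLine = false;
            int limit = buffer.limit();
            for (int pos = buffer.position(); pos < limit; pos++) {
                if (buffer.get(pos) == '\n') {
                    buffer.limit(pos);            // temporary limit: end of this line
                    lines.add(buffer.toString()); // chars from position to limit
                    buffer.limit(limit);          // restore the real limit
                    buffer.position(pos + 1);     // skip the newline
                    foundNewLine = true;
                    break;
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        CharBuffer buffer = CharBuffer.wrap("first\nsecond\npartial");
        System.out.println(drainLines(buffer)); // the two complete lines
        System.out.println(buffer.toString());  // what is left for the next read
    }
}
```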

Moving on to org.apache.flume.channel.ChannelProcessor#processEvent, we find exactly what we guessed:

public void processEvent(Event event) {

    event = interceptorChain.intercept(event);
    if (event == null) {
      return;
    }

    // Process required channels
    List<Channel> requiredChannels = selector.getRequiredChannels(event);
    for (Channel reqChannel : requiredChannels) {
      Transaction tx = reqChannel.getTransaction();
      Preconditions.checkNotNull(tx, "Transaction object must not be null");
      try {
        tx.begin();
        // Write the event to the channel
        reqChannel.put(event);

        tx.commit();
      } catch (Throwable t) {
        tx.rollback();
        if (t instanceof Error) {
          LOG.error("Error while writing to required channel: " + reqChannel, t);
          throw (Error) t;
        } else if (t instanceof ChannelException) {
          throw (ChannelException) t;
        } else {
          throw new ChannelException("Unable to put event on required " +
              "channel: " + reqChannel, t);
        }
      } finally {
        if (tx != null) {
          tx.close();
        }
      }
    }

    // Process optional channels
    List<Channel> optionalChannels = selector.getOptionalChannels(event);
    for (Channel optChannel : optionalChannels) {
      Transaction tx = null;
      try {
        tx = optChannel.getTransaction();
        tx.begin();

        optChannel.put(event);

        tx.commit();
      } catch (Throwable t) {
        tx.rollback();
        LOG.error("Unable to put event on optional channel: " + optChannel, t);
        if (t instanceof Error) {
          throw (Error) t;
        }
      } finally {
        if (tx != null) {
          tx.close();
        }
      }
    }
  }
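
The key difference between the two loops above is how failures are treated: a failed put on a required channel propagates to the source, while a failed put on an optional channel is only logged. A minimal sketch of that rule, with channels modeled as plain Consumer callbacks (ChannelFanOutDemo and its method are hypothetical names, and transactions are omitted for brevity):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ChannelFanOutDemo {
    // Delivers the event to all required channels, then to all optional ones.
    // Returns how many optional deliveries failed and were swallowed.
    static int processEvent(String event,
                            List<Consumer<String>> required,
                            List<Consumer<String>> optional) {
        for (Consumer<String> channel : required) {
            try {
                channel.accept(event);
            } catch (RuntimeException e) {
                // A required channel failing is the caller's problem.
                throw new IllegalStateException(
                        "Unable to put event on required channel", e);
            }
        }
        int swallowed = 0;
        for (Consumer<String> channel : optional) {
            try {
                channel.accept(event);
            } catch (RuntimeException e) {
                swallowed++; // optional channel: note the failure and move on
            }
        }
        return swallowed;
    }

    public static void main(String[] args) {
        List<String> stored = new ArrayList<>();
        Consumer<String> healthy = stored::add;
        Consumer<String> full = e -> { throw new IllegalStateException("channel full"); };
        int swallowed = processEvent("event-1", List.of(healthy), List.of(full));
        System.out.println(stored + ", optional failures swallowed: " + swallowed);
    }
}
```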

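That concludes the analysis of the core logic. Many details remain unexplored, such as how hot reloading is implemented and how lifecycles are managed; interested readers can dig into those on their own.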