1. Kettle transformation execution flow
The execution flow of a Kettle transformation is captured in the execute() method of the Trans class, shown below:
public void execute( String[] arguments ) throws KettleException {
  prepareExecution( arguments );
  startThreads();
}
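1.1 Analysis of the prepareExecution flow
The main work done by the prepareExecution method is:
- Set the transformation's parameters and variables
- Decide whether rows passed between steps are distributed or copied
- Initialize the transformation log table, the step log table, and the performance log table, and add a series of TransListener instances
- Build the collection of StepMetaDataCombi objects, each of which carries the Step, StepMeta, and StepData objects to RunThread
- Call the init method of every step and make sure each initialization succeeds; if any one fails, all steps are ended early
- Initialize step monitoring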
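The "initialize every step, and end all of them early on any failure" rule from the list above can be sketched as follows. This is a minimal, JDK-only illustration and not Kettle's implementation: SketchStep, RecordingStep, and prepare are hypothetical names, and initialization runs sequentially here for simplicity.

```java
import java.util.Arrays;
import java.util.List;

public class PrepareSketch {
    /** Hypothetical stand-in for a Kettle step; only init and dispose are modeled. */
    interface SketchStep {
        boolean init();
        void dispose();
    }

    /** Records what happened to it so the outcome can be inspected afterwards. */
    static class RecordingStep implements SketchStep {
        final boolean initOk;
        boolean initialized;
        boolean disposed;

        RecordingStep(boolean initOk) {
            this.initOk = initOk;
        }

        public boolean init() {
            initialized = true;
            return initOk;
        }

        public void dispose() {
            disposed = true;
        }
    }

    /**
     * Initializes every step. If any init fails, disposes all steps and
     * returns false so that the caller never starts the step threads.
     */
    static boolean prepare(List<? extends SketchStep> steps) {
        boolean allOk = true;
        for (SketchStep step : steps) {
            if (!step.init()) {
                allOk = false; // keep going so every step gets its init call
            }
        }
        if (!allOk) {
            for (SketchStep step : steps) {
                step.dispose(); // release whatever init may have opened
            }
        }
        return allOk;
    }

    public static void main(String[] args) {
        List<RecordingStep> steps =
            Arrays.asList(new RecordingStep(true), new RecordingStep(false));
        System.out.println("ready to start threads: " + prepare(steps));
    }
}
```

Separating preparation from thread start-up means a configuration error surfaces before any row is processed.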
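1.2 Analysis of the startThreads flow
The main work done by the startThreads method is:
- Construct the TransListener and StepListener
- Once the listeners are ready, start threads according to the transformation type to run the steps of the transformation; each step is executed by constructing a RunThread object
- When all steps have completed, fireTransFinishedListeners() is called from the stepFinished method of the StepListener to notify the transformation that every step has finished and the transformation can end; the relevant flags are then set and resources are cleaned up
The core StepListener construction snippet is shown below: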
StepListener stepListener = new StepListener() {
  .....
  public void stepFinished( Trans trans, StepMeta stepMeta, StepInterface step ) {
    synchronized ( Trans.this ) {
      nrOfFinishedSteps++;
      if ( nrOfFinishedSteps >= steps.size() ) {
        // Set the finished flag
        //
        setFinished( true );
        // Grab the performance statistics one last time (if enabled)
        //
        addStepPerformanceSnapShot();
        try {
          fireTransFinishedListeners();
        } catch ( Exception e ) {
          step.setErrors( step.getErrors() + 1L );
          log.logError( getName()
            + " : " + BaseMessages.getString( PKG, "Trans.Log.UnexpectedErrorAtTransformationEnd" ), e );
        }
      }
      // If a step fails with an error, we want to kill/stop the others
      // too...
      //
      if ( step.getErrors() > 0 ) {
        log.logMinimal( BaseMessages.getString( PKG, "Trans.Log.TransformationDetectedErrors" ) );
        log.logMinimal( BaseMessages.getString(
          PKG, "Trans.Log.TransformationIsKillingTheOtherSteps" ) );
        killAllNoWait();
      }
    }
  }
};
// Make sure this is called first!
//
if ( sid.step instanceof BaseStep ) {
  ( (BaseStep) sid.step ).getStepListeners().add( 0, stepListener );
} else {
  sid.step.addStepListener( stepListener );
}
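The counting logic in the listener above can be isolated in a small, JDK-only sketch. It assumes nothing about Kettle's classes: FinishCounterSketch and its members are hypothetical names, and the kill request is reduced to a flag instead of actually stopping other threads.

```java
public class FinishCounterSketch {
    private final int totalSteps;
    private int finishedSteps;
    private boolean finished;
    private boolean killRequested;

    FinishCounterSketch(int totalSteps) {
        this.totalSteps = totalSteps;
    }

    /**
     * Called once by each step thread when its step stops.
     * The lock makes the increment and the comparison one atomic unit,
     * so exactly one caller observes the count reaching totalSteps.
     */
    synchronized void stepFinished(long errors) {
        finishedSteps++;
        if (finishedSteps >= totalSteps) {
            finished = true; // the real listener also notifies the TransListeners here
        }
        if (errors > 0) {
            killRequested = true; // the real listener stops the remaining steps here
        }
    }

    synchronized boolean isFinished() {
        return finished;
    }

    synchronized boolean isKillRequested() {
        return killRequested;
    }

    public static void main(String[] args) throws InterruptedException {
        FinishCounterSketch counter = new FinishCounterSketch(3);
        Thread[] threads = new Thread[3];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> counter.stepFinished(0));
            threads[i].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        System.out.println("finished: " + counter.isFinished());
    }
}
```

Because the steps run in separate threads and may finish in any order, the counter has to be updated under a lock, which is the role of synchronized ( Trans.this ) in the original snippet.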
The code of the fireTransFinishedListeners method is shown below:
/**
 * Make attempt to fire all registered listeners if possible.
 *
 * @throws KettleException
 *           if any errors occur during notification
 */
protected void fireTransFinishedListeners() throws KettleException {
  // PDI-5229 sync added
  synchronized ( transListeners ) {
    if ( transListeners.size() == 0 ) {
      return;
    }
    //prevent Exception from one listener to block others execution
    List<KettleException> badGuys = new ArrayList<KettleException>( transListeners.size() );
    for ( TransListener transListener : transListeners ) {
      try {
        transListener.transFinished( this );
      } catch ( KettleException e ) {
        badGuys.add( e );
      }
    }
    // Signal for the the waitUntilFinished blocker...
    transFinishedBlockingQueue.add( new Object() );
    if ( !badGuys.isEmpty() ) {
      //FIFO
      throw new KettleException( badGuys.get( 0 ) );
    }
  }
}
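The fireTransFinishedListeners method notifies every TransListener that the transformation has finished, by calling
transListener.transFinished(this)
on each of them. It also releases any caller blocked in waitUntilFinished. When waitUntilFinished is called after execute, it returns only once the transformation has finished or failed. This works because waitUntilFinished blocks on the poll method of a blocking queue (BlockingQueue) named transFinishedBlockingQueue, and only when the fireTransFinishedListeners method described above fires does it execute
// Signal for the the waitUntilFinished blocker...
transFinishedBlockingQueue.add( new Object() );
which releases the thread blocked on the queue.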
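The interplay between notification and the blocking queue can be sketched with the JDK alone. The code below is a simplified illustration under stated assumptions and not Kettle's code: FinishSignalSketch, FinishListener, and fireFinished are hypothetical names, listeners throw unchecked exceptions instead of KettleException, and the wait takes an explicit timeout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class FinishSignalSketch {
    /** Hypothetical listener type; it may throw only unchecked exceptions. */
    interface FinishListener {
        void finished();
    }

    private final List<FinishListener> listeners = new ArrayList<>();
    private final BlockingQueue<Object> finishedQueue = new LinkedBlockingQueue<>();

    void addListener(FinishListener listener) {
        synchronized (listeners) {
            listeners.add(listener);
        }
    }

    /** Notifies every listener, then signals waiters, then rethrows the first failure. */
    void fireFinished() {
        List<RuntimeException> failures = new ArrayList<>();
        synchronized (listeners) {
            for (FinishListener listener : listeners) {
                try {
                    listener.finished();
                } catch (RuntimeException e) {
                    failures.add(e); // keep going so one bad listener cannot block the rest
                }
            }
        }
        finishedQueue.add(new Object()); // wakes up a thread blocked in waitUntilFinished
        if (!failures.isEmpty()) {
            throw failures.get(0);
        }
    }

    /** Blocks until fireFinished has signaled or the timeout elapses; true means signaled. */
    boolean waitUntilFinished(long timeoutMillis) {
        try {
            return finishedQueue.poll(timeoutMillis, TimeUnit.MILLISECONDS) != null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        FinishSignalSketch sketch = new FinishSignalSketch();
        sketch.addListener(() -> System.out.println("listener notified"));
        new Thread(sketch::fireFinished).start(); // stands in for the last step thread
        System.out.println("finished in time: " + sketch.waitUntilFinished(5000));
    }
}
```

Putting the signal after the listener loop but before the rethrow means a failing listener still cannot leave the waiting thread blocked, which mirrors the ordering in the method quoted above.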
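1.3 Analysis of the RunThread.run() flow
Normally each step of a transformation runs in its own thread. The execution logic lives in the run method of the RunThread class (which implements the Runnable interface). The core logic is simple: call the step's processRow method repeatedly until there is no more input to process, which means the step is done. At that point step.dispose is called to release resources, and finally step.markStop() marks the step as stopped. The markStop method (implemented in the BaseStep class) calls back the stepFinished method of every StepListener, so the core StepListener constructed in the startThreads method of the Trans class gets invoked.
The code of the run method of RunThread is shown below: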
public void run() {
  try {
    step.setRunning( true );
    step.getLogChannel().snap( Metrics.METRIC_STEP_EXECUTION_START );
    if ( log.isDetailed() ) {
      log.logDetailed( BaseMessages.getString( "System.Log.StartingToRun" ) );
    }
    // Wait
    while ( step.processRow( meta, data ) ) {
      if ( step.isStopped() ) {
        break;
      }
    }
  } catch ( Throwable t ) {
    try {
      // check for OOME
      if ( t instanceof OutOfMemoryError ) {
        // Handle this different with as less overhead as possible to get an error message in the log.
        // Otherwise it crashes likely with another OOME in Me$$ages.getString() and does not log
        // nor call the setErrors() and stopAll() below.
        log.logError( "UnexpectedError: ", t );
      } else {
        t.printStackTrace();
        log.logError( BaseMessages.getString( "System.Log.UnexpectedError" ), t );
      }
      String logChannelId = log.getLogChannelId();
      LoggingObjectInterface loggingObject = LoggingRegistry.getInstance().getLoggingObject( logChannelId );
      String parentLogChannelId = loggingObject.getParent().getLogChannelId();
      List<String> logChannelChildren = LoggingRegistry.getInstance().getLogChannelChildren( parentLogChannelId );
      int childIndex = Const.indexOfString( log.getLogChannelId(), logChannelChildren );
      System.out.println( "child index = "
        + childIndex + ", logging object : " + loggingObject.toString() + " parent=" + parentLogChannelId );
      KettleLogStore.getAppender().getBuffer( "2bcc6b3f-c660-4a8b-8b17-89e8cbd5b29b", false );
      // baseStep.logError(Const.getStackTracker(t));
    } catch ( OutOfMemoryError e ) {
      e.printStackTrace();
    } finally {
      step.setErrors( 1 );
      step.stopAll();
    }
  } finally {
    step.dispose( meta, data );
    step.getLogChannel().snap( Metrics.METRIC_STEP_EXECUTION_STOP );
    try {
      long li = step.getLinesInput();
      long lo = step.getLinesOutput();
      long lr = step.getLinesRead();
      long lw = step.getLinesWritten();
      long lu = step.getLinesUpdated();
      long lj = step.getLinesRejected();
      long e = step.getErrors();
      if ( li > 0 || lo > 0 || lr > 0 || lw > 0 || lu > 0 || lj > 0 || e > 0 ) {
        log.logBasic( BaseMessages.getString( PKG, "BaseStep.Log.SummaryInfo", String.valueOf( li ),
          String.valueOf( lo ), String.valueOf( lr ), String.valueOf( lw ),
          String.valueOf( lu ), String.valueOf( e + lj ) ) );
      } else {
        log.logDetailed( BaseMessages.getString( PKG, "BaseStep.Log.SummaryInfo", String.valueOf( li ),
          String.valueOf( lo ), String.valueOf( lr ), String.valueOf( lw ),
          String.valueOf( lu ), String.valueOf( e + lj ) ) );
      }
    } catch ( Throwable t ) {
      //
      // it's likely an OOME, so we don't want to introduce overhead by using BaseMessages.getString(), see above
      //
      log.logError( "UnexpectedError: " + Const.getStackTracker( t ) );
    } finally {
      step.markStop();
    }
  }
}
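Stripped of logging and metrics, the lifecycle in the run method above comes down to: loop on processRow, stop everything on failure, then always dispose and mark the step as stopped. The sketch below reproduces that ordering with the JDK alone. It is an illustration under stated assumptions: RunLoopSketch, SketchStep, and CountingStep are hypothetical names, and the error handling is reduced to a single stopAll call.

```java
import java.util.ArrayList;
import java.util.List;

public class RunLoopSketch implements Runnable {
    /** Hypothetical stand-in for the parts of a step that the run loop touches. */
    interface SketchStep {
        boolean processRow();
        boolean isStopped();
        void stopAll();
        void dispose();
        void markStop();
    }

    /** Processes a fixed number of rows, optionally failing at a given row index. */
    static class CountingStep implements SketchStep {
        final List<String> events = new ArrayList<>();
        private final int rows;
        private final int failAt; // -1 means never fail
        int processed;
        private boolean stopped;

        CountingStep(int rows, int failAt) {
            this.rows = rows;
            this.failAt = failAt;
        }

        public boolean processRow() {
            if (processed == failAt) {
                throw new IllegalStateException("row " + processed + " failed");
            }
            if (processed >= rows) {
                return false; // no more input: the step is done
            }
            processed++;
            return true;
        }

        public boolean isStopped() { return stopped; }

        public void stopAll() { stopped = true; events.add("stopAll"); }

        public void dispose() { events.add("dispose"); }

        public void markStop() { events.add("markStop"); }
    }

    private final SketchStep step;

    RunLoopSketch(SketchStep step) {
        this.step = step;
    }

    @Override
    public void run() {
        try {
            while (step.processRow()) {
                if (step.isStopped()) {
                    break;
                }
            }
        } catch (Throwable t) {
            step.stopAll(); // a failing step asks the whole transformation to stop
        } finally {
            step.dispose();  // always release resources
            step.markStop(); // always report that this step has stopped
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CountingStep step = new CountingStep(3, -1);
        Thread thread = new Thread(new RunLoopSketch(step));
        thread.start();
        thread.join();
        System.out.println(step.processed + " rows, events: " + step.events);
    }
}
```

Keeping dispose and markStop in the finally block is what lets the finished-step count in the Trans listener reach the total even when a step throws.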