The previous post covered the clean function of ClosureCleaner. This post continues from there: that clean function calls another overload, clean(func, level, checkSerializable, Collections.newSetFromMap(new IdentityHashMap<>())), whose code is as follows:
private static void clean(Object func, ExecutionConfig.ClosureCleanerLevel level, boolean checkSerializable, Set<Object> visited) {
    if (func == null) {
        return;
    }
    // Record func in the visited set; if it was already seen, stop here
    // so that cyclic references cannot cause endless recursion
    if (!visited.add(func)) {
        return;
    }
    // Get the Class object of func
    final Class<?> cls = func.getClass();
    /*
     * Check whether cls is a primitive type or a wrapper type. Primitive types are the
     * non-object types such as int, float and long; wrapper types are the corresponding
     * object types such as Integer, Float and Long.
     * Neither kind can hold a reference to an enclosing object, so there is nothing to clean.
     */
    if (ClassUtils.isPrimitiveOrWrapper(cls)) {
        return;
    }
    // Use reflection to check whether cls declares the custom serialization methods
    // writeReplace or writeObject (their usage is not covered in detail here)
    if (usesCustomSerialization(cls)) {
        return;
    }
    // First find the field name of the "this$0" field, this can
    // be "this$x" depending on the nesting
    boolean closureAccessed = false;
    // Iterate over all fields declared in cls
    for (Field f: cls.getDeclaredFields()) {
        // This branch is explained in detail further down
        if (f.getName().startsWith("this$")) {
            // found a closure referencing field - now try to clean
            closureAccessed |= cleanThis0(func, cls, f.getName());
        } else {
            Object fieldObject;
            try {
                // Make the field accessible
                f.setAccessible(true);
                // Read the object stored in this field of func
                fieldObject = f.get(func);
            } catch (IllegalAccessException e) {
                throw new RuntimeException(String.format("Can not access to the %s field in Class %s", f.getName(), func.getClass()));
            }
            /*
             * we should do a deep clean when we encounter an anonymous class, inner class and local class, but should
             * skip the class with custom serialize method.
             *
             * There are five kinds of classes (or interfaces):
             * a) Top level classes
             * b) Nested classes (static member classes)
             * c) Inner classes (non-static member classes)
             * d) Local classes (named classes declared within a method)
             * e) Anonymous classes
             */
            if (level == ExecutionConfig.ClosureCleanerLevel.RECURSIVE && needsRecursion(f, fieldObject)) {
                if (LOG.isDebugEnabled()) {
                    LOG.debug("Dig to clean the {}", fieldObject.getClass().getName());
                }
                clean(fieldObject, ExecutionConfig.ClosureCleanerLevel.RECURSIVE, true, visited);
            }
        }
    }
    // If the serializability of func has to be checked
    if (checkSerializable) {
        try {
            // Try to serialize func; an exception is thrown if it cannot be serialized
            InstantiationUtil.serializeObject(func);
        }
        catch (Exception e) {
            String functionType = getSuperClassOrInterfaceName(func.getClass());
            String msg = functionType == null ?
                (func + " is not serializable.") :
                ("The implementation of the " + functionType + " is not serializable.");
            if (closureAccessed) {
                msg += " The implementation accesses fields of its enclosing class, which is " +
                    "a common reason for non-serializability. " +
                    "A common solution is to make the function a proper (non-inner) class, or " +
                    "a static inner class.";
            } else {
                msg += " The object probably contains or references non serializable fields.";
            }
            throw new InvalidProgramException(msg, e);
        }
    }
}
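Two of the early exits in clean can be tried out in isolation. The sketch below is not Flink's code: declaresCustomSerialization is a hypothetical stand-in that only checks for the two methods named in the comment above (the real usesCustomSerialization may check more), and the demo classes are made up for illustration. It also shows why the visited set is built on an IdentityHashMap: membership is decided by reference identity, not by equals.

```java
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class EarlyExitSketch {

    // Hypothetical demo class that customizes its serialized form.
    static class WithWriteReplace implements Serializable {
        private Object writeReplace() {
            return "replacement";
        }
    }

    // Hypothetical demo class that relies on default serialization.
    static class Plain implements Serializable {
    }

    // Assumed stand-in for usesCustomSerialization: true if cls itself
    // declares writeObject(ObjectOutputStream) or writeReplace().
    static boolean declaresCustomSerialization(Class<?> cls) {
        try {
            cls.getDeclaredMethod("writeObject", ObjectOutputStream.class);
            return true;
        } catch (NoSuchMethodException ignored) {
            // fall through to the second check
        }
        try {
            cls.getDeclaredMethod("writeReplace");
            return true;
        } catch (NoSuchMethodException ignored) {
            return false;
        }
    }

    // Adds two equal-but-distinct lists to the given set and returns its size.
    static int countAfterAdding(Set<Object> set) {
        List<String> a = new ArrayList<>(Arrays.asList("x"));
        List<String> b = new ArrayList<>(Arrays.asList("x")); // equals(a), different object
        set.add(a);
        set.add(b);
        set.add(a); // same reference again, rejected by both kinds of set
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(declaresCustomSerialization(WithWriteReplace.class));
        System.out.println(declaresCustomSerialization(Plain.class));
        // Identity-based set, built the same way as the visited set in clean
        System.out.println(countAfterAdding(Collections.newSetFromMap(new IdentityHashMap<>())));
        // equals-based set, for comparison
        System.out.println(countAfterAdding(new HashSet<>()));
    }
}
```

With an identity-based set, two different objects that happen to be equal are both visited, while an object reached twice through different paths is processed only once.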
Let's look at the function needsRecursion. Its code is as follows:
private static boolean needsRecursion(Field f, Object fo) {
    return (fo != null &&
        // f must not be a static field
        !Modifier.isStatic(f.getModifiers()) &&
        // f must not be a transient field
        !Modifier.isTransient(f.getModifiers()));
}
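This function decides whether the object held in field f will be serialized, and therefore whether it has to be cleaned recursively. It does not in the following cases:
1. The object is null
2. The field is static
3. The field is marked transient, which is the marker for "do not serialize"
Java's default serialization skips both static and transient fields, so there is no point in cleaning the objects they hold.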
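The three cases can be checked against a small class. This is a minimal sketch: Holder and fieldsToVisit are hypothetical names introduced for illustration, while the condition inside needsRecursion is copied from the listing above.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

class NeedsRecursionSketch {

    // Hypothetical holder with one field per case.
    static class Holder {
        static Object shared = new Object();    // static: skipped
        transient Object cache = new Object();  // transient: skipped
        Object missing = null;                  // null value: skipped
        Object payload = new Object();          // the only field worth visiting
    }

    // Same condition as needsRecursion in the listing above.
    static boolean needsRecursion(Field f, Object fo) {
        return fo != null
            && !Modifier.isStatic(f.getModifiers())
            && !Modifier.isTransient(f.getModifiers());
    }

    // Returns the names of the declared fields that pass the check.
    static List<String> fieldsToVisit(Object target) {
        List<String> names = new ArrayList<>();
        for (Field f : target.getClass().getDeclaredFields()) {
            try {
                f.setAccessible(true);
                if (needsRecursion(f, f.get(target))) {
                    names.add(f.getName());
                }
            } catch (IllegalAccessException e) {
                throw new RuntimeException("Cannot read field " + f.getName(), e);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(fieldsToVisit(new Holder()));
    }
}
```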
Next comes the most important part, the following code:
if (f.getName().startsWith("this$")) {
    // found a closure referencing field - now try to clean
    closureAccessed |= cleanThis0(func, cls, f.getName());
}
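If the name of the Field object f starts with this$, the cleaning step is triggered. What does that mean? It comes down to inner and outer classes: an inner class object automatically carries a reference to its outer class object. So if a field whose name starts with this$ is found in the class of func, that class is an inner class with an enclosing outer class, and some cleaning is needed.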
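The hidden outer reference can be made visible with plain reflection. The sketch below uses hypothetical class names; Inner deliberately reads a field of the outer class so that the compiler has to keep the reference (depending on the compiler version, an outer reference that is never used may not be generated at all).

```java
import java.lang.reflect.Field;

class OuterReferenceSketch {
    private int counter = 7;

    // Inner (non-static) class: it reads the outer field, so it needs the outer instance.
    class Inner {
        int read() {
            return counter;
        }
    }

    // Static nested class: it has no enclosing instance.
    static class Nested {
        int value = 1;
    }

    // Same name test as in clean: is there a declared field starting with "this$"?
    static boolean hasOuterReference(Class<?> cls) {
        for (Field f : cls.getDeclaredFields()) {
            if (f.getName().startsWith("this$")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasOuterReference(Inner.class));
        System.out.println(hasOuterReference(Nested.class));
    }
}
```

This is also why the error message in clean suggests a proper (non-inner) class or a static inner class: such a class has no outer reference to drag along.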
Let's step into the cleanThis0 function. Its code is as follows:
private static boolean cleanThis0(Object func, Class<?> cls, String this0Name) {
    This0AccessFinder this0Finder = new This0AccessFinder(this0Name);
    getClassReader(cls).accept(this0Finder, 0);
    final boolean accessesClosure = this0Finder.isThis0Accessed();
    if (LOG.isDebugEnabled()) {
        LOG.debug(this0Name + " is accessed: " + accessesClosure);
    }
    if (!accessesClosure) {
        Field this0;
        try {
            this0 = func.getClass().getDeclaredField(this0Name);
        } catch (NoSuchFieldException e) {
            // has no this$0, just return
            throw new RuntimeException("Could not set " + this0Name + ": " + e);
        }
        try {
            this0.setAccessible(true);
            // Set the inner class's reference to its outer class to null, so that a
            // non-serializable outer class cannot make the final serialization fail
            this0.set(func, null);
        }
        catch (Exception e) {
            // should not happen, since we use setAccessible
            throw new RuntimeException("Could not set " + this0Name + " to null. " + e.getMessage(), e);
        }
    }
    return accessesClosure;
}
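The effect of nulling the outer reference can be reproduced with the JDK alone. This is a simplified sketch under stated assumptions, not Flink's implementation: all names are hypothetical, and unlike cleanThis0 it does not scan the bytecode first to confirm that the reference is unused. It simply clears every this$ field of an inner class that is known not to touch its outer instance. Depending on the compiler version, that unused field may not exist in the first place, in which case the loop finds nothing to clear.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Field;

class ClearOuterSketch { // deliberately NOT Serializable

    // Hypothetical function-like inner class that never touches the outer instance.
    class Doubler implements Serializable {
        int apply(int x) {
            return 2 * x;
        }
    }

    // Plain JDK serialization attempt; returns false if it fails.
    static boolean isSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Sets every field whose name starts with "this$" to null.
    // Assumption: the caller already knows the outer reference is unused.
    static void clearOuterReferences(Object func) {
        for (Field f : func.getClass().getDeclaredFields()) {
            if (f.getName().startsWith("this$")) {
                try {
                    f.setAccessible(true);
                    f.set(func, null);
                } catch (IllegalAccessException e) {
                    throw new RuntimeException("Could not set " + f.getName() + " to null", e);
                }
            }
        }
    }

    // Cleans a Doubler, then checks that it serializes and still works.
    static boolean cleanThenCheck() {
        Doubler d = new ClearOuterSketch().new Doubler();
        clearOuterReferences(d);
        return isSerializable(d) && d.apply(21) == 42;
    }

    public static void main(String[] args) {
        System.out.println(cleanThenCheck());
    }
}
```

If Doubler did read a field of the outer class, clearing the reference would trade a serialization error for a NullPointerException at runtime, which is why cleanThis0 only nulls the field when This0AccessFinder reports that it is not accessed.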
That completes the clean function of ClosureCleaner. Let's return to the map function, whose code is as follows:
/**
 * Applies a Map transformation on a {@link DataStream}. The transformation
 * calls a {@link MapFunction} for each element of the DataStream. Each
 * MapFunction call returns exactly one element. The user can also extend
 * {@link RichMapFunction} to gain access to other features provided by the
 * {@link org.apache.flink.api.common.functions.RichFunction} interface.
 *
 * @param mapper
 *            The MapFunction that is called for each element of the
 *            DataStream.
 * @param <R>
 *            output type
 * @return The transformed {@link DataStream}.
 */
public <R> SingleOutputStreamOperator<R> map(MapFunction<T, R> mapper) {
    // Extract the return type of mapper via Java reflection
    TypeInformation<R> outType = TypeExtractor.getMapReturnTypes(clean(mapper), getType(),
        Utils.getCallLocationName(), true);
    // Return a new DataStream; StreamMap is an implementation of StreamOperator
    return transform("Map", outType, new StreamMap<>(clean(mapper)));
}
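To summarize the clean function: it sets the outer-class reference inside mapper to null (when This0AccessFinder reports that the reference is never accessed), so that a non-serializable outer class cannot make the final serialization fail, and it then verifies that the cleaned mapper can be serialized. Continuing the analysis: map is called on the text object (the data source), and text is created like this: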
DataStreamSource<String> text = env.socketTextStream("localhost", port, "\n");
Next, we first look at how text is created, that is, the creation flow of the data source.