The previous post covered the clean function of ClosureCleaner. This post continues from there: that function calls a second clean overload, clean(func, level, checkSerializable, Collections.newSetFromMap(new IdentityHashMap<>()));. The code is as follows:

private static void clean(Object func, ExecutionConfig.ClosureCleanerLevel level, boolean checkSerializable, Set<Object> visited) {
		if (func == null) {
			return;
		}
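        // Record func as visited; if it has been seen before, stop here so it is not cleaned twice (this also prevents looping on cyclic references)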
		if (!visited.add(func)) {
			return;
		}
        // Get the runtime Class of func
		final Class<?> cls = func.getClass();
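        /* Check whether cls is a primitive type or a wrapper type. Primitive types are
          int, float, long and so on, which are not class types; wrapper types are the
          corresponding classes Integer, Float, Long and so on.
          Neither kind holds a reference to an enclosing instance, so there is nothing to clean.
        */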
		if (ClassUtils.isPrimitiveOrWrapper(cls)) {
			return;
		}
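        // Use reflection to check whether cls declares the custom serialization methods writeReplace or writeObject; if so, leave it alone (how these methods work is not covered here)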
		if (usesCustomSerialization(cls)) {
			return;
		}

		// First find the field name of the "this$0" field, this can
		// be "this$x" depending on the nesting
		boolean closureAccessed = false;
        // Iterate over all fields declared by cls
		for (Field f: cls.getDeclaredFields()) {
            // This branch is explained in detail further down
			if (f.getName().startsWith("this$")) {
				// found a closure referencing field - now try to clean
				closureAccessed |= cleanThis0(func, cls, f.getName());
			} else {
				Object fieldObject;
				try {
                    // Make the field accessible, even if it is private
					f.setAccessible(true);
                    // Read the value this field holds on func
					fieldObject = f.get(func);
				} catch (IllegalAccessException e) {
					throw new RuntimeException(String.format("Can not access to the %s field in Class %s", f.getName(), func.getClass()));
				}

				/*
				 * we should do a deep clean when we encounter an anonymous class, inner class and local class, but should
				 * skip the class with custom serialize method.
				 *
				 * There are five kinds of classes (or interfaces):
				 * a) Top level classes
				 * b) Nested classes (static member classes)
				 * c) Inner classes (non-static member classes)
				 * d) Local classes (named classes declared within a method)
				 * e) Anonymous classes
				 */
				if (level == ExecutionConfig.ClosureCleanerLevel.RECURSIVE && needsRecursion(f, fieldObject)) {
					if (LOG.isDebugEnabled()) {
						LOG.debug("Dig to clean the {}", fieldObject.getClass().getName());
					}

					clean(fieldObject, ExecutionConfig.ClosureCleanerLevel.RECURSIVE, true, visited);
				}
			}
		}
        // If func should be checked for serializability
		if (checkSerializable) {
			try {
                // Try to serialize func; an exception is thrown if it cannot be serialized
				InstantiationUtil.serializeObject(func);
			}
			catch (Exception e) {
				String functionType = getSuperClassOrInterfaceName(func.getClass());

				String msg = functionType == null ?
						(func + " is not serializable.") :
						("The implementation of the " + functionType + " is not serializable.");

				if (closureAccessed) {
					msg += " The implementation accesses fields of its enclosing class, which is " +
							"a common reason for non-serializability. " +
							"A common solution is to make the function a proper (non-inner) class, or " +
							"a static inner class.";
				} else {
					msg += " The object probably contains or references non serializable fields.";
				}

				throw new InvalidProgramException(msg, e);
			}
		}
	}
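The visited set is created with Collections.newSetFromMap(new IdentityHashMap<>()), so membership is decided by reference identity rather than by equals(). The following is a minimal sketch of why that matters for tracking visited objects; the Key class and the countAccepted helper are hypothetical names introduced only for this illustration and are not part of Flink.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Set;

class VisitedSetSketch {

    /** Hypothetical type whose equals() treats all instances with the same id as equal. */
    static final class Key {
        final int id;

        Key(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }

        @Override
        public int hashCode() {
            return id;
        }
    }

    /** Adds each object to the set and returns how many were accepted as new. */
    static int countAccepted(Set<Object> visited, Object... objects) {
        int accepted = 0;
        for (Object o : objects) {
            if (visited.add(o)) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        Key a = new Key(1);
        Key b = new Key(1); // equal to a, but a different instance

        // Same construction as in the clean(...) call: compares by reference
        Set<Object> identitySet = Collections.newSetFromMap(new IdentityHashMap<>());
        // Ordinary set: compares by equals()/hashCode()
        Set<Object> equalsSet = new HashSet<>();

        System.out.println(countAccepted(identitySet, a, b, a)); // prints 2
        System.out.println(countAccepted(equalsSet, a, b, a));   // prints 1
    }
}
```

With an identity-based set, two distinct objects that happen to be equal are both visited, while revisiting the very same instance is rejected.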

Now let's look at the function needsRecursion. Its code is as follows:

private static boolean needsRecursion(Field f, Object fo) {
		return (fo != null &&
                // f must not be a static field
				!Modifier.isStatic(f.getModifiers()) &&
                 // f must not be a transient field
				!Modifier.isTransient(f.getModifiers()));
	}

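This function decides whether the value of field f needs to be cleaned recursively. Only fields that take part in serialization matter, so recursion is skipped in the following cases:

1. The field value is null

2. The field is static

3. The field is marked transient, which is the flag for excluding a field from serialization

Java's default serialization skips both static and transient fields, so neither can cause a serialization failure.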
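To make the cases concrete, here is a small sketch that applies the same predicate to one field of each kind. The Holder class and the check helper are hypothetical and exist only for this illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

class NeedsRecursionSketch {

    /** Hypothetical holder with one field for each case. */
    static class Holder {
        static Object shared = new Object();    // static: skipped
        transient Object cache = new Object();  // transient: skipped
        Object payload = new Object();          // ordinary non-null field: recursed into
        Object missing = null;                  // null value: skipped
    }

    /** Same predicate as needsRecursion in the listing above. */
    static boolean needsRecursion(Field f, Object fo) {
        return fo != null
                && !Modifier.isStatic(f.getModifiers())
                && !Modifier.isTransient(f.getModifiers());
    }

    /** Looks up a declared field by name and evaluates the predicate on its current value. */
    static boolean check(Object target, String fieldName) throws ReflectiveOperationException {
        Field f = target.getClass().getDeclaredField(fieldName);
        f.setAccessible(true);
        return needsRecursion(f, f.get(target));
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Holder h = new Holder();
        System.out.println("shared  -> " + check(h, "shared"));  // false
        System.out.println("cache   -> " + check(h, "cache"));   // false
        System.out.println("payload -> " + check(h, "payload")); // true
        System.out.println("missing -> " + check(h, "missing")); // false
    }
}
```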

Next comes the most important part. The code is as follows:

if (f.getName().startsWith("this$")) {
				// found a closure referencing field - now try to clean
				closureAccessed |= cleanThis0(func, cls, f.getName());
			}

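If the name of the Field object f starts with this$, the cleanup step begins. What does that mean? It has to do with inner and outer classes: an instance of an inner class automatically holds a reference to the instance of its outer class. So if a field whose name starts with this$ is found on func, the class of func has an enclosing outer class, and some cleanup is needed.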
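The hidden outer reference can be observed with plain reflection. The sketch below is only an illustration with hypothetical class names. It assumes compilation with javac, which typically names the field this$0 for the first nesting level. The inner class deliberately reads a non-constant field of the outer instance, so the compiler has to keep the reference.

```java
import java.lang.reflect.Field;

class OuterReferenceSketch {
    // Not final on purpose: a final String constant would be inlined by the compiler
    private String prefix = "outer:";

    /** Inner (non-static) class that reads a field of the enclosing instance. */
    class Inner {
        String tag(String s) {
            return prefix + s;
        }
    }

    /** Static nested class: it has no enclosing instance, hence no this$ field. */
    static class Nested {
        String tag(String s) {
            return s;
        }
    }

    /** Returns the name of the first declared field starting with "this$", or null if there is none. */
    static String findOuterReference(Class<?> cls) {
        for (Field f : cls.getDeclaredFields()) {
            if (f.getName().startsWith("this$")) {
                return f.getName();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The inner class carries a synthetic field pointing at its outer instance
        System.out.println("Inner:  " + findOuterReference(Inner.class));
        // The static nested class does not
        System.out.println("Nested: " + findOuterReference(Nested.class));
    }
}
```

This is also why the error message in clean suggests turning the function into a top-level or static nested class: such classes carry no outer reference in the first place.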

Let's step into the cleanThis0 function. The code is as follows:

private static boolean cleanThis0(Object func, Class<?> cls, String this0Name) {

		This0AccessFinder this0Finder = new This0AccessFinder(this0Name);
		getClassReader(cls).accept(this0Finder, 0);

		final boolean accessesClosure = this0Finder.isThis0Accessed();

		if (LOG.isDebugEnabled()) {
			LOG.debug(this0Name + " is accessed: " + accessesClosure);
		}

		if (!accessesClosure) {
			Field this0;
			try {
				this0 = func.getClass().getDeclaredField(this0Name);
			} catch (NoSuchFieldException e) {
				// the name was just read from this class's declared fields, so this should not happen; fail loudly
				throw new RuntimeException("Could not set " + this0Name + ": " + e);
			}

			try {
				this0.setAccessible(true);
                // Set the inner class's reference to its outer instance to null, so that a non-serializable outer class cannot make the final serialization fail
				this0.set(func, null);
			}
			catch (Exception e) {
				// should not happen, since we use setAccessible
				throw new RuntimeException("Could not set " + this0Name + " to null. " + e.getMessage(), e);
			}
		}

		return accessesClosure;
	}
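The effect of nulling the outer reference can be reproduced in isolation. The following is a simplified sketch, not Flink's implementation: it skips the bytecode analysis done by This0AccessFinder and simply assumes the function never touches its outer instance. Doubler, makeDoubler, trySerialize and nullOuterReferences are hypothetical names used only here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Field;

// The enclosing class is deliberately NOT Serializable
class NullOuterReferenceSketch {

    /** Hypothetical stand-in for a serializable user function. */
    interface Doubler extends Serializable {
        int apply(int x);
    }

    /** Creates an anonymous class inside an instance method, so it captures the outer instance. */
    Doubler makeDoubler() {
        return new Doubler() {
            @Override
            public int apply(int x) {
                return 2 * x; // does not use the outer instance
            }
        };
    }

    /** Returns true if Java serialization of o succeeds. */
    static boolean trySerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) {
            // NotSerializableException is a subclass of IOException
            return false;
        }
    }

    /** Sets every declared field whose name starts with "this$" to null. */
    static void nullOuterReferences(Object func) throws IllegalAccessException {
        for (Field f : func.getClass().getDeclaredFields()) {
            if (f.getName().startsWith("this$")) {
                f.setAccessible(true);
                f.set(func, null);
            }
        }
    }

    public static void main(String[] args) throws IllegalAccessException {
        Doubler d = new NullOuterReferenceSketch().makeDoubler();
        System.out.println("before cleaning: " + trySerialize(d));
        nullOuterReferences(d);
        System.out.println("after cleaning:  " + trySerialize(d));
        System.out.println("still works:     " + d.apply(21));
    }
}
```

With javac, serialization before cleaning is expected to fail because the captured outer instance is not serializable, and to succeed afterwards. Nulling is only safe because the function never uses the outer instance, which is exactly what cleanThis0 verifies before touching the field.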

That completes the walkthrough of ClosureCleaner's clean function. Let's return to the map function. The code is as follows:

/**
	 * Applies a Map transformation on a {@link DataStream}. The transformation
	 * calls a {@link MapFunction} for each element of the DataStream. Each
	 * MapFunction call returns exactly one element. The user can also extend
	 * {@link RichMapFunction} to gain access to other features provided by the
	 * {@link org.apache.flink.api.common.functions.RichFunction} interface.
	 *
	 * @param mapper
	 *            The MapFunction that is called for each element of the
	 *            DataStream.
	 * @param <R>
	 *            output type
	 * @return The transformed {@link DataStream}.
	 */
	public <R> SingleOutputStreamOperator<R> map(MapFunction<T, R> mapper) {
        // Extract the return type of mapper via Java reflection
		TypeInformation<R> outType = TypeExtractor.getMapReturnTypes(clean(mapper), getType(),
				Utils.getCallLocationName(), true);
        // Return a new DataStream; StreamMap is an implementation of StreamOperator
		return transform("Map", outType, new StreamMap<>(clean(mapper)));
	}

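To summarize the clean function: if mapper does not actually use its outer class reference, that reference is set to null, so that a non-serializable outer class cannot make serialization fail; the cleaned mapper is then serialized as a check. Moving on: map is called on the text object (the data source), and text is created like this: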

DataStreamSource<String> text = env.socketTextStream("localhost", port, "\n");

Next, we will first look at how text is created, that is, the creation flow of the data source.