Python Source Code Analysis

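The previous article analyzed function parameters; this one analyzes how function closures are implemented. A closure arises when a function definition or function expression sits inside the body of another function. Such inner functions can access all the local variables, parameters, and other inner functions declared in the enclosing function.
Let's look at how closures are implemented in Python.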

Analysis

First, look at the script:

def get_func():
    value = 'inner'
    def inner_func():
        print(value)
    return inner_func

show_func = get_func()
show_func()

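Compiling this script produces three PyCodeObject instances: one for the module-level script, one for get_func, and one for inner_func. The bytecode for the module-level script is: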

  1           0 LOAD_CONST               0 (<code object get_func at 0x106f78430, file "test_closure.py", line 1>)
              3 MAKE_FUNCTION            0
              6 STORE_NAME               0 (get_func)

  7           9 LOAD_NAME                0 (get_func)
             12 CALL_FUNCTION            0
             15 STORE_NAME               1 (show_func)

  8          18 LOAD_NAME                1 (show_func)
             21 CALL_FUNCTION            0
             24 POP_TOP             
             25 LOAD_CONST               1 (None)
             28 RETURN_VALUE

The bytecode for get_func is:

  2           0 LOAD_CONST               1 ('inner')
              3 STORE_DEREF              0 (value)

  3           6 LOAD_CLOSURE             0 (value)
              9 BUILD_TUPLE              1
             12 LOAD_CONST               2 (<code object inner_func at 0x102de3d30, file "<stdin>", line 3>)
             15 MAKE_CLOSURE             0
             18 STORE_FAST               0 (inner_func)

  5          21 LOAD_FAST                0 (inner_func)
             24 RETURN_VALUE

The bytecode for inner_func is:

  4           0 LOAD_GLOBAL              0 (print)
              3 LOAD_DEREF               0 (value)
              6 CALL_FUNCTION            1
              9 POP_TOP             
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE
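The count of three code objects can be checked without reading the disassembly. The following is a minimal sketch, assuming a reasonably recent CPython; collect_code_objects is a hypothetical helper name, not part of any library, and the script is only compiled, never executed. Nested code objects live in the co_consts of the code object that defines them, which is why LOAD_CONST is the opcode that pushes them.

```python
import types

SOURCE = """
def get_func():
    value = 'inner'
    def inner_func():
        print(value)
    return inner_func

show_func = get_func()
show_func()
"""


def collect_code_objects(code):
    """Return `code` plus every code object nested in its co_consts."""
    found = [code]
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            found.extend(collect_code_objects(const))
    return found


# compile() builds the code objects without running the script.
module_code = compile(SOURCE, "test_closure.py", "exec")
code_names = [c.co_name for c in collect_code_objects(module_code)]
print(code_names)  # ['<module>', 'get_func', 'inner_func']
```

Passing any of these code objects to dis.dis prints its bytecode, although the opcode names on Python 3 differ from the Python 2 listings shown here.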

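In Python, closures are built on nested functions, so the closure-related information is stored in the PyCodeObject. The two key fields are co_cellvars and co_freevars, both tuples of names. co_cellvars lists the local variables of this code object that are referenced by functions nested inside it; co_freevars lists the variables this code object uses but that are defined in an enclosing scope.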
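Both fields can be read from Python through a function's __code__ attribute. A small sketch using the same example; inner_func is only created here, never called, so nothing is printed by it:

```python
def get_func():
    value = 'inner'
    def inner_func():
        print(value)
    return inner_func


outer_code = get_func.__code__
inner_code = get_func().__code__

# 'value' is a cell variable of get_func (a nested function uses it)
# and a free variable of inner_func (it comes from the enclosing scope).
print(outer_code.co_cellvars)  # ('value',)
print(outer_code.co_freevars)  # ()
print(inner_code.co_cellvars)  # ()
print(inner_code.co_freevars)  # ('value',)
```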
Now look again at the script's bytecode:

  7           9 LOAD_NAME                0 (get_func)
             12 CALL_FUNCTION            0

This calls get_func. The bytecode interpreter is invoked to execute get_func's bytecode, and the call goes through PyEval_EvalCodeEx:

PyObject *
PyEval_EvalCodeEx(PyCodeObject *co, PyObject *globals, PyObject *locals,
       PyObject **args, int argcount, PyObject **kws, int kwcount,
       PyObject **defs, int defcount, PyObject *closure)
{
    register PyFrameObject *f;
    register PyObject *retval = NULL;
    register PyObject **fastlocals, **freevars;
    PyThreadState *tstate = PyThreadState_GET();
    PyObject *x, *u;
    ...
    if (PyTuple_GET_SIZE(co->co_cellvars)) {   // does this code object have any cell variables?
        int i, j, nargs, found;
        char *cellname, *argname;
        PyObject *c;

        nargs = co->co_argcount;              // number of named positional parameters
        if (co->co_flags & CO_VARARGS)        // *args occupies one more local slot
            nargs++;
        if (co->co_flags & CO_VARKEYWORDS)    // **kwargs occupies one more local slot
            nargs++;

        /* Initialize each cell var, taking into account
           cell vars that are initialized from arguments.

           Should arrange for the compiler to put cellvars
           that are arguments at the beginning of the cellvars
           list so that we can march over it more efficiently?
        */
        for (i = 0; i < PyTuple_GET_SIZE(co->co_cellvars); ++i) {   // for each cell variable, get its name
            cellname = PyString_AS_STRING(
                PyTuple_GET_ITEM(co->co_cellvars, i));
            found = 0;
            ...
            if (found == 0) {         // not initialized from an argument: create an empty PyCell
                c = PyCell_New(NULL);
                if (c == NULL)
                    goto fail;
                SETLOCAL(co->co_nlocals + i, c);  // store the cell in the frame, right after the local variables
            }
        }
    }
    ...
}

As shown, every variable used by a nested function gets a PyCell object stored in the frame. PyCellObject is defined as:

typedef struct {
    PyObject_HEAD
    PyObject *ob_ref;   /* Content of the cell or NULL when empty */
} PyCellObject;

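A cell holds a single ob_ref pointer, which can refer to an object of any type. With the PyCell stored in the frame, the interpreter starts executing get_func's bytecode: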

  2           0 LOAD_CONST               1 ('inner')
              3 STORE_DEREF              0 (value)

  3           6 LOAD_CLOSURE             0 (value)
              9 BUILD_TUPLE              1
             12 LOAD_CONST               2 (<code object inner_func at 0x102de3d30, file "<stdin>", line 3>)
             15 MAKE_CLOSURE             0
             18 STORE_FAST               0 (inner_func)

  5          21 LOAD_FAST                0 (inner_func)
             24 RETURN_VALUE

LOAD_CONST runs first and pushes the string 'inner' onto the stack. Then comes STORE_DEREF:

case STORE_DEREF:
            w = POP();
            x = freevars[oparg];
            PyCell_Set(x, w);
            Py_DECREF(w);
            continue;

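Here freevars points to the frame's array of cells: the cells for co_cellvars come first, followed by those for co_freevars. Index 0 is the cell created for value, and STORE_DEREF fills it by calling PyCell_Set: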

#define PyCell_SET(op, v) (((PyCellObject *)(op))->ob_ref = v)
int
PyCell_Set(PyObject *op, PyObject *obj)
{
    if (!PyCell_Check(op)) {
        PyErr_BadInternalCall();
        return -1;
    }
    Py_XDECREF(((PyCellObject*)op)->ob_ref);
    Py_XINCREF(obj);
    PyCell_SET(op, obj);
    return 0;
}
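Because PyCell_Set only swaps the cell's ob_ref, a later assignment to the same variable is visible to any inner function that shares the cell. A minimal sketch of that consequence; make_reader and read are hypothetical names chosen for illustration:

```python
def make_reader():
    value = 'first'      # STORE_DEREF: the cell now refers to 'first'
    def read():
        return value     # LOAD_DEREF: returns whatever the cell refers to now
    value = 'second'     # STORE_DEREF again: same cell, new ob_ref
    return read


read = make_reader()
result = read()
print(result)  # second
```

The inner function captures the cell rather than a copy of the value, so it sees the last value stored before the call.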

After this, the ob_ref of the PyCell that was created on entry to get_func points to w, the string 'inner'. Next comes LOAD_CLOSURE:

case LOAD_CLOSURE:
            x = freevars[oparg];
            Py_INCREF(x);
            PUSH(x);
            if (x != NULL) continue;
            break;

This pushes the PyCell itself, not its contents, onto the stack. Next comes BUILD_TUPLE 1:

case BUILD_TUPLE:
            x = PyTuple_New(oparg);
            if (x != NULL) {
                for (; --oparg >= 0;) {
                    w = POP();
                    PyTuple_SET_ITEM(x, oparg, w);
                }
                PUSH(x);
                continue;
            }
            break;

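The PyCell just pushed is popped into a newly created one-element tuple, and the tuple is pushed onto the stack. LOAD_CONST 2 then pushes the code object for inner_func, and MAKE_CLOSURE 0 runs: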

case MAKE_CLOSURE:
        {
            v = POP(); /* code object */
            x = PyFunction_New(v, f->f_globals);     // build a PyFunctionObject from the code object
            Py_DECREF(v);
            if (x != NULL) {
                v = POP();                           // pop the tuple whose elements are PyCell objects
                err = PyFunction_SetClosure(x, v);    // store the tuple as the function's closure
                Py_DECREF(v);
            }
            if (x != NULL && oparg > 0) {
                v = PyTuple_New(oparg);
                if (v == NULL) {
                    Py_DECREF(x);
                    x = NULL;
                    break;
                }
                while (--oparg >= 0) {
                    w = POP();
                    PyTuple_SET_ITEM(v, oparg, w);
                }
                err = PyFunction_SetDefaults(x, v);
                Py_DECREF(v);
            }
            PUSH(x);
            break;
        }

MAKE_CLOSURE first pops the code object and builds the function, then pops the tuple of PyCell objects and hands it to PyFunction_SetClosure:

int
PyFunction_SetClosure(PyObject *op, PyObject *closure)
{
    if (!PyFunction_Check(op)) {
        PyErr_BadInternalCall();
        return -1;
    }
    if (closure == Py_None)
        closure = NULL;
    else if (PyTuple_Check(closure)) {
        Py_INCREF(closure);
    }
    else {
        PyErr_Format(PyExc_SystemError, 
                 "expected tuple for closure, got '%.100s'",
                 closure->ob_type->tp_name);
        return -1;
    }
    Py_XDECREF(((PyFunctionObject *) op) -> func_closure);
    ((PyFunctionObject *) op) -> func_closure = closure;    // store the tuple as the function's closure
    return 0;
}
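The func_closure field is exposed to Python code as the function's __closure__ attribute, so the tuple of cells can be inspected directly. A short sketch; make_pair, first and second are hypothetical names added to show that LOAD_CLOSURE pushes the same cell object each time:

```python
def get_func():
    value = 'inner'
    def inner_func():
        return value
    return inner_func


show_func = get_func()
closure = show_func.__closure__          # the tuple set by PyFunction_SetClosure
print(type(closure).__name__, len(closure))  # tuple 1
print(closure[0].cell_contents)              # inner


def make_pair():
    value = 'shared'
    def first():
        return value
    def second():
        return value
    return first, second


first, second = make_pair()
# Both closures hold the very same cell, not two copies of the value.
same_cell = first.__closure__[0] is second.__closure__[0]
print(same_cell)  # True
```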

get_func then returns the newly created function, and the script's bytecode continues with:

  8          18 LOAD_NAME                1 (show_func)
             21 CALL_FUNCTION            0

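This calls the function that carries the closure, so inner_func's bytecode is about to run. On the way in, PyEval_EvalCodeEx handles co_freevars: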

PyObject *
PyEval_EvalCodeEx(PyCodeObject *co, PyObject *globals, PyObject *locals,
       PyObject **args, int argcount, PyObject **kws, int kwcount,
       PyObject **defs, int defcount, PyObject *closure)
{
    ...
    if (PyTuple_GET_SIZE(co->co_freevars)) {   // does this code object use variables from an enclosing scope?
        int i;
        for (i = 0; i < PyTuple_GET_SIZE(co->co_freevars); ++i) {   // copy each closure cell into the frame, after the cell-variable slots
            PyObject *o = PyTuple_GET_ITEM(closure, i);
            Py_INCREF(o);
            freevars[PyTuple_GET_SIZE(co->co_cellvars) + i] = o;
        }
    }
    ...
}
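The index co_cellvars + i matters only when a function has both cell variables and free variables. The following sketch uses a hypothetical three-level example (outer, middle, inner) in which middle has one of each. Concatenating the two tuples mirrors the slot order in the Python 2 source quoted above; the internal frame layout of newer CPython versions may differ, so treat this as an illustration of the naming, not of memory layout.

```python
def outer():
    a = 1
    def middle():
        b = 2
        def inner():
            return a + b
        return inner
    return middle


middle = outer()
inner = middle()

middle_code = middle.__code__
# Cell variables come first, free variables after them.
slots = middle_code.co_cellvars + middle_code.co_freevars
print(slots)                       # ('b', 'a')
print(inner.__code__.co_freevars)  # ('a', 'b')
print(inner())                     # 3
```

In middle, b is a cell variable because inner uses it, while a is a free variable that middle merely passes through from outer.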

Then inner_func's bytecode is executed:

  4           0 LOAD_GLOBAL              0 (print)
              3 LOAD_DEREF               0 (value)
              6 CALL_FUNCTION            1
              9 POP_TOP             
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE

Execution of inner_func reaches LOAD_DEREF 0:

case LOAD_DEREF:
            x = freevars[oparg];   // fetch the cell from the frame's freevars array
            w = PyCell_Get(x);     // read the contents of the PyCell
            if (w != NULL) {
                PUSH(w);
                continue;
            }
            ...

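Because freevars is an array inside the frame, cells are accessed by index. Here oparg is 0, which selects the cell holding 'inner'; the remaining bytecode then calls print and outputs the value. That completes the basic implementation of function closures; more complex closure cases are left for the reader to analyze.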
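The whole flow can be replayed in a few lines of plain Python. This is a toy model under stated assumptions, not the interpreter's code: Cell, run_get_func and run_inner_func are hypothetical names, the stack is an ordinary list, and reference counting is ignored.

```python
class Cell(object):
    """Hypothetical stand-in for PyCellObject: a single ob_ref slot."""
    def __init__(self, ob_ref=None):
        self.ob_ref = ob_ref


def run_get_func():
    """Replay get_func's closure-related opcodes; return the closure tuple."""
    freevars = [Cell()]                 # PyEval_EvalCodeEx: empty cell for 'value'
    stack = []

    stack.append('inner')               # LOAD_CONST 'inner'
    freevars[0].ob_ref = stack.pop()    # STORE_DEREF 0
    stack.append(freevars[0])           # LOAD_CLOSURE 0 (pushes the cell itself)
    stack.append((stack.pop(),))        # BUILD_TUPLE 1
    return stack.pop()                  # MAKE_CLOSURE keeps this as func_closure


def run_inner_func(closure):
    """Replay inner_func's entry and its LOAD_DEREF 0."""
    freevars = list(closure)            # PyEval_EvalCodeEx copies the closure cells
    return freevars[0].ob_ref           # LOAD_DEREF 0


closure = run_get_func()
result = run_inner_func(closure)
print(result)  # inner
```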