Python Closure Referenced Before Assignment


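This post grew out of a question on the Tornado mailing list, "how to use outer variable in inner method". The answers in that thread are well worth reading and cover most of the key points. Here I use a few interesting examples to analyze the issue and briefly explain Python's closure rules. Let's start with a classic example: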

    def foo():
        a = 1
        def bar():
            a = a + 1
            # print a + 1
            # b = a + 1
            # a = 1
            print id(a)
        bar()
        print a, id(a)

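Running this function on Python 2.x raises UnboundLocalError: local variable 'a' referenced before assignment, meaning the local variable was referenced before it was defined. How should we understand this error? PEP 227 explains that when Python looks up the definition of a name, it searches according to the following three-level rule: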

The Python 2.0 definition specifies exactly three namespaces to check for each name -- the local namespace, the global namespace, and the builtin namespace.

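The "local" namespace here may actually have several levels, and the code above is one example. Let's make a few simple changes to the code to understand the rules step by step: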

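  • If you replace a = a + 1 with print a + 1 or b = a + 1, there is no problem: inside the inner function bar, the a of the outer function foo is visible and can be read.
  • Replacing a = a + 1 with a = 1 also works, but the two a's are then not the same variable. Inside bar, a = 1 defines a new local variable in bar's scope; because it has the same name as the variable a in foo, foo's a is no longer visible inside bar. Printing id(a) in both places is meant to show this, but note that id identifies the object rather than the variable, and CPython caches small integers, so with a = 1 in both places the two ids can be equal. Assigning a different value in bar, such as a = 2, and then printing a in foo shows more reliably that foo's a is untouched.
  • Now for why a = a + 1 fails. When Python sees an assignment of the form a = xxx, it decides that a new local variable a is to be created inside bar, and foo's a is no longer visible in bar. The right-hand side a + 1 is therefore evaluated using the local a, but at that point the local a on the left has not been created yet (it is still waiting for the right-hand side). This chicken-and-egg situation is what produces the UnboundLocalError above.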
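The three cases above can be reproduced side by side. The following is a minimal sketch written for Python 3 (so it uses return values and print as a function rather than the Python 2 print statement); the function names read_only, shadow and rebind are hypothetical and only serve to keep the cases apart.

```python
def read_only():
    a = 1
    def bar():
        return a + 1          # only reads a, so the outer a is visible
    return bar()

def shadow():
    a = 1
    def bar():
        a = 5                 # creates a new local a inside bar
        return a
    inner = bar()
    return a, inner           # the outer a is untouched

def rebind():
    a = 1
    def bar():
        a = a + 1             # a is local to bar, but is read before it is bound
        return a
    try:
        return bar()
    except UnboundLocalError as exc:
        return type(exc).__name__

print(read_only())   # 2
print(shadow())      # (1, 5)
print(rebind())      # UnboundLocalError
```

The wording of the error message differs between Python versions, which is why the sketch only reports the exception type.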

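There are two main ways to solve this problem in Python 2.x:

  1. Use a different name, for example b = a + 1, so that the inner function bar only reads the a of the outer function foo.

  2. Make the a in foo a container, such as a list:

    def foo():
        a = [1, ]
        def bar():
            a[0] = a[0] + 1
        bar()
        print a[0]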

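Of course, this is still inconvenient at times, so Python 3.x introduced the nonlocal keyword to solve the problem. Just add nonlocal a before a = a + 1 to declare explicitly that a is not a local variable of the inner function bar; bar can then use and rebind the variable a of the outer function foo normally.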
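As an illustration of where nonlocal is useful in practice, here is a minimal Python 3 sketch of a counter closure. The names make_counter, increment and count are hypothetical; the structure mirrors foo and bar above, except that the inner function is returned and called several times.

```python
def make_counter():
    count = 0
    def increment():
        nonlocal count        # rebind the variable owned by make_counter
        count = count + 1
        return count
    return increment

counter = make_counter()
print(counter(), counter(), counter())   # 1 2 3
```

Without the nonlocal line, the first call to increment would raise UnboundLocalError for the same reason as a = a + 1 in bar.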

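While searching for material on Python closures, I found an interesting question about them on StackOverflow. If you're interested, think it over and try it: what should the result be? What result do you expect? If the two differ, how would you change the code to get the result you expect?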

    flist = []
    for i in xrange(3):
        def func(x):
            return x * i
        flist.append(func)

    for f in flist:
        print f(2)
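For readers who want to check their answer, here is a minimal Python 3 sketch (range instead of xrange, results collected in lists). It shows the behavior of the puzzle as written and one possible fix, binding the current value of i as a default argument; other fixes exist, and the names late, fixed and early are hypothetical.

```python
flist = []
for i in range(3):
    def func(x):
        return x * i          # i is looked up when func is called, not when defined
    flist.append(func)

late = [f(2) for f in flist]  # every func sees the final value of i

fixed = []
for i in range(3):
    def func(x, i=i):         # the default argument captures the current i
        return x * i
    fixed.append(func)

early = [f(2) for f in fixed]

print(late)    # [4, 4, 4]
print(early)   # [0, 2, 4]
```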

Further reading:

  1. PEP 227 -- Statically Nested Scopes
  2. PEP 3104 -- Access to Names in Outer Scopes
  3. Lexical closures in Python

If you're closely following the Python tag on StackOverflow, you'll notice that the same question comes up at least once a week. The question goes on like this:

    x = 10
    def foo():
        x += 1
        print x
    foo()

Why, when run, does this result in the following error?

    Traceback (most recent call last):
      File "unboundlocalerror.py", line 8, in <module>
        foo()
      File "unboundlocalerror.py", line 4, in foo
        x += 1
    UnboundLocalError: local variable 'x' referenced before assignment

There are a few variations on this question, with the same core hiding underneath. Here's one:

    lst = [1, 2, 3]

    def foo():
        lst.append(5)      # OK
        #lst += [5]        # ERROR here

    foo()
    print lst

Running the lst.append(5) statement successfully appends 5 to the list. However, substitute lst += [5] for it, and it raises UnboundLocalError, although at first sight it should accomplish the same.
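Both variants can be run in one script. This is a minimal sketch for Python 3; the function names append_item and augmented_add are hypothetical stand-ins for the two versions of foo.

```python
lst = [1, 2, 3]

def append_item():
    lst.append(5)        # only reads the name lst, then mutates the list object

def augmented_add():
    lst += [5]           # an assignment, so lst becomes local to this function

append_item()
print(lst)               # [1, 2, 3, 5]

try:
    augmented_add()
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)

print(lst)               # [1, 2, 3, 5]
```

The global list is unchanged by the failed call, because the exception is raised while reading the local lst, before any list operation takes place.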

Although this exact question is answered in Python's official FAQ (right here), I decided to write this article with the intent of giving a deeper explanation. It will start with a basic FAQ-level answer, which should satisfy one only wanting to know how to "solve the damn problem and move on". Then, I will dive deeper, looking at the formal definition of Python to understand what's going on. Finally, I'll take a look at what happens behind the scenes in the implementation of CPython to cause this behavior.

The simple answer

As mentioned above, this problem is covered in the Python FAQ. For completeness, I want to explain it here as well, quoting the FAQ when necessary.

Let's take the first code snippet again:

    x = 10
    def foo():
        x += 1
        print x
    foo()

So where does the exception come from? Quoting the FAQ:

This is because when you make an assignment to a variable in a scope, that variable becomes local to that scope and shadows any similarly named variable in the outer scope.

But x += 1 is similar to x = x + 1, so it should first read x, perform the addition and then assign back to x. As mentioned in the quote above, Python considers x a variable local to foo, so we have a problem - a variable is read (referenced) before it's been assigned. Python raises the UnboundLocalError exception in this case [1].

So what do we do about this? The solution is very simple - Python has the global statement just for this purpose:

    x = 10
    def foo():
        global x
        x += 1
        print x
    foo()

This prints 11, without any errors. The global statement tells Python that inside foo, x refers to the global variable x, even if it's assigned in foo.

Actually, there is another variation on the question, for which the answer is a bit different. Consider this code:

    def external():
        x = 10
        def internal():
            x += 1
            print(x)
        internal()

    external()

This kind of code may come up if you're into closures and other techniques that use Python's lexical scoping rules. The error this generates is the familiar UnboundLocalError. However, applying the "global fix":

    def external():
        x = 10
        def internal():
            global x
            x += 1
            print(x)
        internal()

    external()

Doesn't help - another error is generated: NameError. Python is right here - after all, there's no global variable named x, there's only an x in external. It may be not local to internal, but it's not global. So what can you do in this situation? If you're using Python 3, you have the nonlocal keyword. Replacing global by nonlocal in the last snippet makes everything work as expected. nonlocal is a new statement in Python 3, and there is no equivalent in Python 2 [2].
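The contrast between the two declarations can be sketched as follows for Python 3. The names external_global and external_nonlocal are hypothetical variants of external, and the functions return the value instead of printing it. The sketch assumes that no module-level variable named x exists.

```python
def external_global():
    x = 10
    def internal():
        global x          # looks for a module-level x, which does not exist here
        x += 1
        return x
    return internal()

def external_nonlocal():
    x = 10
    def internal():
        nonlocal x        # refers to the x defined in external_nonlocal
        x += 1
        return x
    return internal()

try:
    external_global()
except NameError as exc:
    print("NameError:", exc)

print(external_nonlocal())   # 11
```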

The formal answer

Assignments in Python are used to bind names to values and to modify attributes or items of mutable objects. I could find two places in the Python (2.x) documentation where it's defined how an assignment to a local variable works.

One is section 6.2 "Assignment statements" in the Simple Statements chapter of the language reference:

Assignment of an object to a single target is recursively defined as follows. If the target is an identifier (name):

  • If the name does not occur in a global statement in the current code block: the name is bound to the object in the current local namespace.
  • Otherwise: the name is bound to the object in the current global namespace.

Another is section 4.1 "Naming and binding" of the Execution model chapter:

If a name is bound in a block, it is a local variable of that block.

[...]

When a name is used in a code block, it is resolved using the nearest enclosing scope. [...] If the name refers to a local variable that has not been bound, a UnboundLocalError exception is raised.

This is all clear, but still, another small doubt remains. All these rules apply to assignments of the form var = value, which clearly bind var to value. But the code snippets we're having a problem with here have the += assignment. Shouldn't that just modify the bound value, without re-binding it?

Well, no. += and its cousins (-=, *=, etc.) are what Python calls "augmented assignment statements" [emphasis mine]:

An augmented assignment evaluates the target (which, unlike normal assignment statements, cannot be an unpacking) and the expression list, performs the binary operation specific to the type of assignment on the two operands, and assigns the result to the original target. The target is only evaluated once.

An augmented assignment expression like x += 1 can be rewritten as x = x + 1 to achieve a similar, but not exactly equal effect. In the augmented version, x is only evaluated once. Also, when possible, the actual operation is performed in-place, meaning that rather than creating a new object and assigning that to the target, the old object is modified instead.

With the exception of assigning to tuples and multiple targets in a single statement, the assignment done by augmented assignment statements is handled the same way as normal assignments. Similarly, with the exception of the possible in-place behavior, the binary operation performed by augmented assignment is the same as the normal binary operations.

So when earlier I said that x += 1 is similar to x = x + 1, I wasn't telling all the truth, but it was accurate with respect to binding. Apart from possible optimization, += counts exactly as = when binding is considered. If you think carefully about it, it's unavoidable, because some types Python works with are immutable. Consider strings, for example:

    x = "abc"
    x += "def"

The first line binds x to the value "abc". The second line doesn't modify the value "abc" to be "abcdef". Strings are immutable in Python. Rather, it creates the new value "abcdef" somewhere in memory, and re-binds x to it. This can be seen clearly when examining the object ID for x before and after the +=:

    >>> x = "abc"
    >>> id(x)
    11173824
    >>> x += "def"
    >>> id(x)
    32831648
    >>> x
    'abcdef'

Note that some types in Python are mutable. For example, lists can actually be modified in-place:

    >>> y = [1, 2]
    >>> id(y)
    32413376
    >>> y += [2, 3]
    >>> id(y)
    32413376
    >>> y
    [1, 2, 2, 3]

id(y) didn't change after +=, because the object y referenced was just modified in place. Still, Python re-bound y to the same object [3].
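One way to observe that the assignment step still happens after an in-place modification is to apply += to a list stored inside a tuple. This is a minimal Python 3 sketch that is not taken from the original article, and the name t is hypothetical. The list is extended in place, and then the store back into the tuple fails because tuples do not support item assignment.

```python
t = ([1, 2],)

try:
    t[0] += [3]          # the list is extended in place, then t[0] = ... is attempted
except TypeError as exc:
    print("TypeError:", exc)

print(t)                 # ([1, 2, 3],)
```

The exception shows that += is not only a mutation: the result is always assigned back to the target, which is exactly the step that makes a name local inside a function.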

The "too much information" answer

This section is of interest only to those curious about the implementation internals of Python itself.

One of the stages in the compilation of Python into bytecode is building the symbol table [4]. An important goal of building the symbol table is for Python to be able to mark the scope of variables it encounters - which variables are local to functions, which are global, which are free (lexically bound) and so on.

When the symbol table code sees a variable is assigned in a function, it marks it as local. Note that it doesn't matter if the assignment was done before usage, after usage, or maybe not actually executed due to a condition in code like this:

    x = 10
    def foo():
        if something_false_at_runtime:
            x = 20
        print(x)
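The effect of this marking can be checked without looking at the symbol table itself, because the compiled function records its local variable names in the co_varnames attribute of its code object. The following is a minimal Python 3 sketch; it assumes something_false_at_runtime is simply a module-level flag set to False, and it returns x instead of printing it.

```python
x = 10
something_false_at_runtime = False

def foo():
    if something_false_at_runtime:
        x = 20           # never executed, but still makes x local to foo
    return x

print(foo.__code__.co_varnames)   # ('x',)

try:
    foo()
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)
```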

We can use the symtable module to examine the symbol table information gathered on some Python code during compilation:

    import symtable

    code = '''
    x = 10
    def foo():
        x += 1
        print(x)
    '''

    table = symtable.symtable(code, '<string>', 'exec')

    foo_namespace = table.lookup('foo').get_namespace()
    sym_x = foo_namespace.lookup('x')

    print(sym_x.get_name())
    print(sym_x.is_local())

This prints:

    x
    True

So we see that x was marked as local in foo. Marking variables as local turns out to be important for optimization in the bytecode, since the compiler can generate a special instruction for it that's very fast to execute. There's an excellent article here explaining this topic in depth; I'll just focus on the outcome.
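The same symtable calls can be used to see the other scope kinds mentioned above (local, global and free) in a nested function. This is a minimal Python 3 sketch; the sample source, the names outer, inner, a, b and g, and the helper classify are all hypothetical.

```python
import symtable

source = '''
g = 0
def outer():
    a = 1
    def inner():
        b = a + g
        return b
    return inner
'''

top = symtable.symtable(source, '<string>', 'exec')
outer_table = top.lookup('outer').get_namespace()
inner_table = outer_table.lookup('inner').get_namespace()

def classify(table, name):
    """Report how the symbol table of `table` classifies `name`."""
    sym = table.lookup(name)
    if sym.is_free():
        return 'free'        # bound in an enclosing function scope
    if sym.is_global():
        return 'global'      # resolved in the module namespace
    if sym.is_local():
        return 'local'       # bound inside this function
    return 'unknown'

for name in ('a', 'b', 'g'):
    print(name, classify(inner_table, name))
```

Inside inner, a is reported as free, b as local and g as global, which matches the lexical scoping rules described earlier.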

The compiler function that handles variable name references queries the symbol table for the scope of each name in order to generate the correct opcode. For our x, the symbol table reports that the name is local, and having seen that, the compiler generates a LOAD_FAST. We can see this in the disassembly of foo:

     35           0 LOAD_FAST                0 (x)
                  3 LOAD_CONST               1 (1)
                  6 INPLACE_ADD
                  7 STORE_FAST               0 (x)

     36          10 LOAD_GLOBAL              0 (print)
                 13 LOAD_FAST                0 (x)
                 16 CALL_FUNCTION            1
                 19 POP_TOP
                 20 LOAD_CONST               0 (None)
                 23 RETURN_VALUE

The first block of instructions shows what x += 1 was compiled to. You will note that already here (before it's actually assigned), LOAD_FAST is used to retrieve the value of x.

This LOAD_FAST is the instruction that will cause the UnboundLocalError exception to be raised at runtime, because it is actually executed before any STORE_FAST is done for x. The gory details are in the bytecode interpreter code of CPython:

    TARGET(LOAD_FAST)
        x = GETLOCAL(oparg);
        if (x != NULL) {
            Py_INCREF(x);
            PUSH(x);
            FAST_DISPATCH();
        }
        format_exc_check_arg(PyExc_UnboundLocalError,
            UNBOUNDLOCAL_ERROR_MSG,
            PyTuple_GetItem(co->co_varnames, oparg));
        break;

Ignoring the macro-fu for the moment, what this basically says is that once LOAD_FAST is seen, the value of x is obtained from an indexed array of objects [5]. If no STORE_FAST was done before, this value is still NULL, the if branch is not taken [6] and the exception is raised.
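The bytecode for a function like foo can be inspected on any CPython with the standard dis module. The sketch below is written for Python 3 and lists only the instructions that operate on x; foo returns x instead of printing it to keep the listing short. Opcode names and offsets vary between interpreter versions (recent CPython releases may report a variant such as LOAD_FAST_CHECK for the first load), so no fixed output is shown here.

```python
import dis

def foo():
    x += 1
    return x

# Collect the names of all instructions whose argument is the variable x.
ops_on_x = [ins.opname for ins in dis.get_instructions(foo) if ins.argval == 'x']
print(ops_on_x)

# Print the full disassembly for comparison with the listing in the article.
dis.dis(foo)
```

Whatever the exact opcode name, the first instruction touching x is a fast local load rather than a global lookup, which is the point made by the disassembly above.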

You may wonder why Python waits until runtime to raise this exception, instead of detecting it in the compiler. The reason is this code:

    x = 10
    def foo():
        if something_true():
            x = 1
        x += 1
        print(x)

Suppose something_true is a function that returns True, possibly due to some user input. In this case, x = 1 binds x locally, so the reference to it in x += 1 is no longer unbound. This code will then run without exceptions. Of course if something_true actually turns out to return False, the exception will be raised. Python has no way to resolve this at compile time, so the error detection is postponed to runtime.
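Both outcomes can be triggered from the same function. In this minimal Python 3 sketch, the call to something_true is replaced by a hypothetical parameter bind_first so that the caller decides which path runs, and the function returns x instead of printing it.

```python
x = 10

def foo(bind_first):
    if bind_first:
        x = 1            # binds the local x before it is read
    x += 1
    return x

print(foo(True))         # 2

try:
    foo(False)
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)
```

The same compiled code succeeds or fails depending only on a runtime value, which is why the check cannot be moved into the compiler.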

