How Pony (ORM) does its tricks?

Pony ORM does the nice trick of converting a generator expression into SQL. Example:

>>> select(p for p in Person if p.name.startswith('Paul'))
.order_by(Person.name)[:2]


SELECT "p"."id", "p"."name", "p"."age"
FROM "Person" "p"
WHERE "p"."name" LIKE "Paul%"
ORDER BY "p"."name"
LIMIT 2


[Person[3], Person[1]]
>>>

I know Python has wonderful introspection and metaprogramming builtin, but how this library is able to translate the generator expression without preprocessing? It looks like magic.

[update]

Blender wrote:

Here is the file that you're after. It seems to reconstruct the generator using some introspection wizardry. I'm not sure if it supports 100% of Python's syntax, but this is pretty cool. – Blender

I was thinking they were exploring some feature from the generator expression protocol, but looking this file, and seeing the ast module involved... No, they are not inspecting the program source on the fly, are they? Mind-blowing...

@BrenBarn: If I try to call the generator outside the select function call, the result is:

>>> x = (p for p in Person if p.age > 20)
>>> x.next()
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
File "<interactive input>", line 1, in <genexpr>
File "C:\Python27\lib\site-packages\pony\orm\core.py", line 1822, in next
% self.entity.__name__)
File "C:\Python27\lib\site-packages\pony\utils.py", line 92, in throw
raise exc
TypeError: Use select(...) function or Person.select(...) method for iteration
>>>

Seems like they are doing more arcane incantations like inspecting the select function call and processing the Python abstract syntax grammar tree on the fly.

I still would like to see someone explaining it, the source is way beyond my wizardry level.

16086 次浏览

小马 ORM 的作者在这里。

Pony 通过三个步骤将 Python 生成器转换为 SQL 查询:

  1. Decompiling of generator bytecode and rebuilding generator AST (抽象语法树)
  2. 将 PythonAST 翻译成“抽象 SQL”——通用 SQL 查询的基于列表的表示形式
  3. Converting abstract SQL representation into specific 依赖数据库的 SQL 方言

最复杂的部分是第二步,小马必须 理解 Python 表达式的“意义”。似乎你是最 对第一步感兴趣,那么让我来解释一下反编译是如何工作的。

Let's consider this query:

>>> from pony.orm.examples.estore import *
>>> select(c for c in Customer if c.country == 'USA').show()

它将被翻译成以下 SQL:

SELECT "c"."id", "c"."email", "c"."password", "c"."name", "c"."country", "c"."address"
FROM "Customer" "c"
WHERE "c"."country" = 'USA'

下面是这个查询的结果,将被打印出来:

id|email              |password|name          |country|address
--+-------------------+--------+--------------+-------+---------
1 |john@example.com   |***     |John Smith    |USA    |address 1
2 |matthew@example.com|***     |Matthew Reed  |USA    |address 2
4 |rebecca@example.com|***     |Rebecca Lawson|USA    |address 4

The select() function accepts a python generator as argument, and then analyzes its bytecode. 我们可以使用标准 python dis模块获得这个生成器的字节码指令:

>>> gen = (c for c in Customer if c.country == 'USA')
>>> import dis
>>> dis.dis(gen.gi_frame.f_code)
1           0 LOAD_FAST                0 (.0)
>>    3 FOR_ITER                26 (to 32)
6 STORE_FAST               1 (c)
9 LOAD_FAST                1 (c)
12 LOAD_ATTR                0 (country)
15 LOAD_CONST               0 ('USA')
18 COMPARE_OP               2 (==)
21 POP_JUMP_IF_FALSE        3
24 LOAD_FAST                1 (c)
27 YIELD_VALUE
28 POP_TOP
29 JUMP_ABSOLUTE            3
>>   32 LOAD_CONST               1 (None)
35 RETURN_VALUE

Pony ORM 在模块 pony.orm.decompiling中具有函数 decompile(),它可以 restore an AST from the bytecode:

>>> from pony.orm.decompiling import decompile
>>> ast, external_names = decompile(gen)

Here, we can see the textual representation of the AST nodes:

>>> ast
GenExpr(GenExprInner(Name('c'), [GenExprFor(AssName('c', 'OP_ASSIGN'), Name('.0'),
[GenExprIf(Compare(Getattr(Name('c'), 'country'), [('==', Const('USA'))]))])]))

现在让我们看看 decompile()函数是如何工作的。

decompile()函数创建一个实现 Visitor 模式的 Decompiler对象。 The decompiler instance gets bytecode instructions one-by-one. 对于每条指令,反编译器对象调用自己的方法。 此方法的名称等于当前字节码指令的名称。

当 Python 计算一个表达式时,它使用栈,其中存储一个中间 反编译器对象也有自己的堆栈, 但是这个堆栈不存储表达式计算的结果, 而是表达式的 AST 节点。

当调用下一个字节码指令的反编译器方法时, 它从堆栈中获取 AST 节点,并将它们组合在一起 然后将这个节点放在堆栈的顶部。

例如,让我们看看如何计算子表达式 c.country == 'USA' 相应的字节码片段是:

              9 LOAD_FAST                1 (c)
12 LOAD_ATTR                0 (country)
15 LOAD_CONST               0 ('USA')
18 COMPARE_OP               2 (==)

因此,反编译器对象执行以下操作:

  1. Calls decompiler.LOAD_FAST('c'). 此方法将 Name('c')节点放在反编译器堆栈的顶部。
  2. 呼叫 decompiler.LOAD_ATTR('country')。 此方法从堆栈获取 Name('c')节点, 创建 Geattr(Name('c'), 'country')节点并将其放在堆栈的顶部。
  3. Calls decompiler.LOAD_CONST('USA'). 此方法将 Const('USA')节点放在堆栈的顶部。
  4. 呼叫 decompiler.COMPARE_OP('==')。 此方法从堆栈获取两个节点(Getattr 和 Const) , 然后把 Compare(Getattr(Name('c'), 'country'), [('==', Const('USA'))]) on the top of the stack.

处理完所有字节码指令后,反编译器堆栈包含 a single AST node which corresponds to the whole generator expression.

因为小马 ORM 需要反编译生成器 而且只有 Lambda 这没那么复杂,因为 生成器的指令流相对简单 它只是一堆嵌套的循环。

目前,Pony ORM 覆盖了整个生成器指令集,除了两件事:

  1. Inline if expressions: a if b else c
  2. 复合比较: a < b < c

如果小马遇到这样的表达式,它提出了 NotImplementedError异常。但即使在 在这种情况下,您可以通过将生成器表达式作为字符串传递来使其工作。 当以字符串形式传递生成器时,Pony 不使用反编译器模块 它使用标准的 Python compiler.parse函数获取 AST。

希望这能回答你的问题。