Understanding __init_subclass__

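I finally upgraded my Python version and have been discovering the newly added features. Among other things, I was scratching my head over the new __init_subclass__ method. From the docs: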

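This method is called whenever the containing class is subclassed. cls is then the new subclass. If defined as a normal instance method, this method is implicitly converted to a class method.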

So I started to play around with it a little, following the example in the docs:

class Philosopher:
    def __init_subclass__(cls, default_name, **kwargs):
        super().__init_subclass__(**kwargs)
        print(f"Called __init_subclass({cls}, {default_name})")
        cls.default_name = default_name


class AustralianPhilosopher(Philosopher, default_name="Bruce"):
    pass


class GermanPhilosopher(Philosopher, default_name="Nietzsche"):
    default_name = "Hegel"
    print("Set name to Hegel")


Bruce = AustralianPhilosopher()
Mistery = GermanPhilosopher()
print(Bruce.default_name)
print(Mistery.default_name)

This produces the following output:

Called __init_subclass(<class '__main__.AustralianPhilosopher'>, Bruce)
Set name to Hegel
Called __init_subclass(<class '__main__.GermanPhilosopher'>, Nietzsche)
Bruce
Nietzsche

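I understand that this method is called after the subclass definition, but my questions are mainly about how this feature is meant to be used. I read PEP 487 as well, but it didn't help me much. Where would this method be helpful? Is it for: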

  • the superclass to register its subclasses upon creation?
  • forcing the subclass to set a field at definition time?

Also, do I need to understand __set_name__ to fully comprehend its usage?


__init_subclass__ and __set_name__ are orthogonal mechanisms - they're not tied to each other, just described in the same PEP. Both are features that previously required a full-featured metaclass. PEP 487 addresses two of the most common uses of metaclasses:

  • how to let the parent know when it is being subclassed (__init_subclass__)
  • how to let a descriptor class know the name of the property it is used for (__set_name__)

As PEP 487 says:

While there are many possible ways to use a metaclass, the vast majority of use cases falls into just three categories: some initialization code running after class creation, the initialization of descriptors and keeping the order in which class attributes were defined.

The first two categories can easily be achieved by having simple hooks into the class creation:

  • An __init_subclass__ hook that initializes all subclasses of a given class.
  • upon class creation, a __set_name__ hook is called on all the attribute (descriptors) defined in the class, and

The third category is the topic of another PEP, PEP 520.

Notice also, that while __init_subclass__ is a replacement for using a metaclass in this class's inheritance tree, __set_name__ in a descriptor class is a replacement for using a metaclass for the class that has an instance of the descriptor as an attribute.
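To make the __set_name__ half concrete, here is a minimal sketch of a descriptor that learns the name of the attribute it is assigned to. The names Typed and Point are hypothetical, chosen only for illustration:

```python
class Typed:
    """Hypothetical descriptor that type-checks one attribute."""

    def __init__(self, expected_type):
        self.expected_type = expected_type
        self.name = None  # filled in by __set_name__

    def __set_name__(self, owner, name):
        # Called once, when the owning class is created.
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if not isinstance(value, self.expected_type):
            raise TypeError(f"{self.name} must be {self.expected_type.__name__}")
        instance.__dict__[self.name] = value


class Point:
    x = Typed(int)
    y = Typed(int)

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
```

Without __set_name__, each descriptor would need its name passed in by hand (x = Typed("x", int)) or a metaclass on Point to fill it in.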

PEP 487 sets out to take two common metaclass use cases and make them more accessible without having to understand all the ins and outs of metaclasses. The two new features, __init_subclass__ and __set_name__, are otherwise independent; they don't rely on one another.

__init_subclass__ is just a hook method. You can use it for anything you want. It is useful for both registering subclasses in some way, and for setting default attribute values on those subclasses.

We recently used this to provide 'adapters' for different version control systems, for example:

from enum import Enum, auto


class RepositoryType(Enum):
    HG = auto()
    GIT = auto()
    SVN = auto()
    PERFORCE = auto()


class Repository:
    # Maps each SCM type to a {name: handler class} dict.
    _registry = {t: {} for t in RepositoryType}

    def __init_subclass__(cls, scm_type=None, name=None, **kwargs):
        super().__init_subclass__(**kwargs)
        # Register only subclasses that declare an SCM type.
        if scm_type is not None:
            cls._registry[scm_type][name] = cls


class MainHgRepository(Repository, scm_type=RepositoryType.HG, name='main'):
    pass


# No name is given, so this class is registered under the key None.
class GenericGitRepository(Repository, scm_type=RepositoryType.GIT):
    pass

This trivially let us define handler classes for specific repositories without having to resort to using a metaclass or decorators.
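The same hook can also cover the other use raised in the question: forcing subclasses to set a field at definition time. This is only a sketch; Plugin and plugin_name are hypothetical names, not part of the code above:

```python
class Plugin:
    """Hypothetical base class that requires subclasses to define plugin_name."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Check the subclass's own namespace, so an inherited value is not enough.
        if "plugin_name" not in cls.__dict__:
            raise TypeError(f"{cls.__name__} must define plugin_name")


class GoodPlugin(Plugin):
    plugin_name = "good"


try:
    class BadPlugin(Plugin):
        pass
except TypeError as exc:
    error = str(exc)
```

The failure happens when the class statement is executed, so a missing field is caught at import time rather than when an instance is first used.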

The main point of __init_subclass__ was, as the title of the PEP suggests, to offer a simpler form of customization for classes.

It's a hook that allows you to tinker with classes w/o the need to know about metaclasses, keep track of all aspects of class construction or worry about metaclass conflicts down the line. As a message by Nick Coghlan on the early phase of this PEP states:

The main intended readability/maintainability benefit is from the perspective of more clearly distinguishing the "customises subclass initialisation" case from the "customises runtime behaviour of subclasses" case.

A full custom metaclass doesn't provide any indication of the scope of impact, while __init_subclass__ more clearly indicates that there's no persistent effects on behaviour post-subclass creation.

Metaclasses are considered magic for a reason: you don't know what their effects will be after the class has been created. __init_subclass__, on the other hand, is just another class method; it runs once per subclass and then it's done. (See its documentation for exact functionality.)
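A small sketch of that "runs once" behaviour, using hypothetical class names: the hook fires when each subclass is defined, not when the base class is defined and not when instances are created.

```python
calls = []


class Base:
    # Written as a plain function; Python converts it to a classmethod.
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        calls.append(cls.__name__)


class Child(Base):
    pass


class GrandChild(Child):
    pass


# Creating instances does not trigger the hook again.
Child()
Child()
GrandChild()

print(calls)
```

Here calls ends up holding one entry per subclass definition, in definition order, and nothing for Base itself.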


The whole point of PEP 487 is simplifying (i.e., removing the need to use) metaclasses for some common uses.

__init_subclass__ takes care of post-class initialization while __set_name__ (which makes sense only for descriptor classes) was added to simplify initializing descriptors. Beyond that, they aren't related.

The third common case for metaclasses (keeping definition order) which is mentioned, was also simplified. This was addressed w/o a hook, by using an ordered mapping for the namespace (which in Python 3.6 is a dict, but that's an implementation detail :-)
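As a rough illustration of that third case on Python 3.6+, the class namespace already remembers definition order, so no hook is needed. Form and its attributes are hypothetical names:

```python
class Form:
    name = "n"
    email = "e"
    age = 0


# vars(Form) iterates in definition order; skip the dunder entries Python adds.
fields = [key for key in vars(Form) if not key.startswith("__")]
print(fields)
```

Before 3.6 this required a metaclass whose __prepare__ returned an ordered mapping.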

I would like to add some references related to metaclasses and __init_subclass__ that may be helpful.

Background

__init_subclass__ was introduced as an alternative to creating metaclasses. Here is a 2-minute summary of PEP 487 in a talk by one of the core developers, Brett Cannon.

Recommended References

  • Guido van Rossum's blog post on the early history of metaclasses in Python
  • Jake Vanderplas's blog post looking more deeply on implementing metaclasses

You can also use it to perform once-only expensive initializations on a class.

For example, I want to replace paths starting with my user with the standard tilde shorthand for home.

/Users/myuser/.profile -> ~/.profile.

Simple, I can write this:


from pathlib import Path


class Replacer:
    def __init__(self):
        self.home = str(Path("~").expanduser())

    def replace(self, value):
        if isinstance(value, str) and value.startswith(self.home):
            value = value.replace(self.home, "~")
        return value


replacer = Replacer()
print(replacer.replace("/Users/myuser/.profile"))


But, for any run the home path is constant and there is no need to compute it every time a Replacer gets created.

Using __init_subclass__, I can do it only once for the class. Yes, I could also assign the variable to the class at module initialization time:

class Replacer:
    home = str(Path("~").expanduser())
    ...


(Retracted) but there may be reasons to want to defer that computation until the class actually gets used for the first time. For example, when working with Django, under some conditions when you import models.py, Django may not have fully initialized itself yet.

The above is incorrect. __init_subclass__ is executed at class definition time, not on first use.

However, unlike the home = assignment in the class body, the Replacer class actually exists when __init_subclass__ is called and is passed to that method as the cls argument.

from pathlib import Path


class UselessAncestorNeededToHouseInitSubclass:
    "do-nothing"

    def __init_subclass__(cls, /, **kwargs):
        print("__init_subclass__")
        super().__init_subclass__(**kwargs)
        # Computed once, when the subclass is defined.
        cls.home = str(Path("~").expanduser())


class Replacer(UselessAncestorNeededToHouseInitSubclass):
    """__init_subclass__ won't work if defined here.  It has to be on
    an ancestor.
    """

    def replace(self, value):
        if isinstance(value, str) and value.startswith(self.home):
            value = value.replace(self.home, "~")
        return value


for ix in range(0, 10):
    replacer = Replacer()
    print(replacer.replace("/Users/myuser/.profile"))

Output (notice how __init_subclass__ only gets called once):

__init_subclass__
~/.profile
~/.profile
~/.profile
~/.profile
~/.profile
~/.profile
~/.profile
~/.profile
~/.profile
~/.profile