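Importing modules in Python - best practice

I am new to Python; I want to build on the skills I learned using R. In R I tend to load a lot of libraries, which sometimes results in function name conflicts.

What is best practice in Python? I have seen a few variations, but I do not see the difference between them:

import pandas, from pandas import *, and from pandas import DataFrame

What is the difference between the first two? Should I import only what I need? Also, what are the worst consequences for someone writing small programs to process data and compute simple statistics?

UPDATE

I found this excellent guide. It explains everything.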

import pandas imports the pandas module under the pandas namespace, so you would need to call objects within pandas using pandas.foo.

from pandas import * imports all public objects from the pandas module into your current namespace, so you would call objects within pandas using only foo. Keep in mind this could have unexpected consequences if there are any naming conflicts between your current namespace and the pandas namespace.

from pandas import DataFrame is the same as above, but only imports DataFrame (instead of everything) into your current namespace.

In my opinion the first is generally best practice, as it keeps the different modules nicely compartmentalized in your code.
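The difference between the three forms can be checked with a standard-library module. This is a minimal sketch, assuming math stands in for pandas so that it runs without third-party packages. The wildcard import is executed in a scratch dictionary (scratch is an illustrative name) so that the names it binds can be inspected:

```python
import math            # form 1: names stay behind the module prefix
from math import sqrt  # form 3: only the requested name is bound

area = math.pi * 2 ** 2
root = sqrt(16)

# Form 2: run the wildcard import in a scratch namespace to see what it binds.
scratch = {}
exec("from math import *", scratch)
star_names = sorted(name for name in scratch if not name.startswith("_"))

# The wildcard shadows the built-in pow with math.pow, which always
# returns a float.
shadowed = scratch["pow"] is math.pow
print(len(star_names), "names bound; pow shadowed:", shadowed)
print(pow(2, 3), scratch["pow"](2, 3))  # prints: 8 8.0
```

In a real script the wildcard would bind all of those names directly in your module, which is where the silent shadowing comes from.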

In general it is better to use explicit imports, as in:

import pandas
frame = pandas.DataFrame()

Or:

from pandas import DataFrame
frame = DataFrame()

Another option in Python, when you have conflicting names, is import x as y:

from pandas import DataFrame as PDataFrame
from bears import DataFrame as BDataFrame
frame1 = PDataFrame()
frame2 = BDataFrame()
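The same aliasing works for any pair of clashing names. Here is a small sketch using two standard-library functions that are both called join (shlex.join needs Python 3.8 or later); the aliases path_join and shell_join are illustrative names, not a convention:

```python
from os.path import join as path_join
from shlex import join as shell_join

# Two unrelated functions named join now coexist in one namespace.
data_file = path_join("data", "raw.csv")
command = shell_join(["cat", "my file.txt"])

print(data_file)
print(command)
```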

They are all suitable in different contexts (which is why they are all available). There's no deep guiding principle, other than generic motherhood statements around clarity, maintainability and simplicity. Some examples from my own code:

  1. import sys, os, re, itertools avoids name collisions and provides a very succinct way to import a bunch of standard modules.
  2. from math import * lets me write sin(x) instead of math.sin(x) in math-heavy code. This gets a bit dicey when I also import numpy, which doubles up on some of these, but it doesn't overly concern me, since they are generally the same functions anyway. Also, I tend to follow the numpy documentation — import numpy as np — which sidesteps the issue entirely.
  3. I favour from PIL import Image, ImageDraw just because that's the way the PIL documentation presents its examples.

from A import B

is essentially equivalent to the following three statements:

import A
B = A.B
del A

That is all there is to it.
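The equivalence can be checked directly. This is a rough sketch, assuming a hypothetical helper from_import that mirrors the first two statements using importlib. The real statement also falls back to importing a submodule when the attribute is missing, which this helper does not attempt:

```python
import importlib
from json import dumps


def from_import(module_name, attribute):
    """Return what `from module_name import attribute` would bind.

    Hypothetical helper for illustration only.
    """
    module = importlib.import_module(module_name)  # import A
    # The local name `module` vanishes when the function returns (del A).
    return getattr(module, attribute)              # B = A.B


# Both routes yield the very same function object.
same_object = from_import("json", "dumps") is dumps
print(same_object)
```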

Disadvantages of each form

When reading other people's code (and those people use very different importing styles), I noticed the following problems with each of the styles:

import modulewithaverylongname will clutter the code further down with the long module name (e.g. concurrent.futures or django.contrib.auth.backends) and decrease readability in those places.

from module import * gives me no chance to see syntactically that, for instance, classA and classB come from the same module and have a lot to do with each other. It makes reading the code hard. (That names from such an import may shadow names from an earlier import is the least part of that problem.)

from module import classA, classB, functionC, constantD, functionE overloads my short-term memory with too many names that I mentally need to assign to module in order to coherently understand the code.

import modulewithaverylongname as mwvln is sometimes insufficiently mnemonic to me.

A suitable compromise

Based on the above observations, I have developed the following style in my own code:

import module is the preferred style if the module name is short, as is the case for most of the packages in the standard library. It is also the preferred style if I need to use names from the module in only two or three places in my own module; clarity trumps brevity then ("Readability counts").

import longername as ln is the preferred style in almost every other case. For instance, I might import django.contrib.auth.backends as djcab. By definition of criterion 1 above, the abbreviation will be used frequently and is therefore sufficiently easy to memorize.

Only these two styles are fully pythonic as per the "Explicit is better than implicit." rule.

from module import xx still occurs sometimes in my code. I use it in cases where even the as form seems excessive, the most famous example being from datetime import datetime (but if I need more elements, I will import datetime as dt).
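Put together, the three styles might look like this in a small module. This is only an illustrative sketch using standard-library modules; the alias cf and the sample values are assumptions chosen for the example:

```python
import os                        # short name: plain import
import concurrent.futures as cf  # long name used often: abbreviated import
from datetime import datetime    # the one case where the prefix adds nothing

# cf is used frequently enough in such a module to be easy to remember.
with cf.ThreadPoolExecutor(max_workers=2) as pool:
    lengths = list(pool.map(len, ["alpha", "be"]))

stamp = datetime(2020, 1, 2).isoformat()
base = os.path.basename("/tmp/report.csv")
print(lengths, stamp, base)
```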

Here are some recommendations from the PEP 8 style guide.

  1. Imports should usually be on separate lines, e.g.:

    Yes: import os
         import sys

    No:  import sys, os
    

    but it is okay to write:

    from subprocess import Popen, PIPE
    
  2. Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.

    • Imports should be grouped in the following order:
      1. standard library imports
      2. related third party imports
      3. local application/library specific imports
    • You should put a blank line between each group of imports.
  3. Absolute imports are recommended.
    They are usually more readable and tend to give better error messages if the import system is misconfigured.

    import mypkg.sibling
    from mypkg import sibling
    from mypkg.sibling import example
    

    Explicit relative imports are an acceptable alternative:

    from . import sibling
    from .sibling import example
    
  4. Implicit relative imports should never be used and have been removed in Python 3.

    No:  import sibling  (inside mypkg, relying on it resolving to mypkg.sibling)
    
  5. Wildcard imports ( from <module> import * ) should be avoided, as they make it unclear which names are present in the namespace, confusing both readers and many automated tools.
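Taken together, these recommendations suggest a module header along the following lines. It is a sketch only: the third-party and local groups are shown as comments because numpy and mypkg may not be installed, and DEFAULT_SUFFIX and script_stem are made-up names for illustration:

```python
"""Example module: the docstring comes before any import."""

# Group 1: standard library imports, one module per line.
import os
import sys
from os.path import basename, splitext  # several names from one module are fine

# Group 2: related third-party imports would go here, e.g.
# import numpy as np

# Group 3: local application imports would go here, e.g.
# from mypkg import sibling

# Module globals and constants follow the imports.
DEFAULT_SUFFIX = ".py"


def script_stem(path):
    """Return the file name without DEFAULT_SUFFIX, or the bare file name."""
    stem, suffix = splitext(basename(path))
    return stem if suffix == DEFAULT_SUFFIX else basename(path)


print(script_stem(os.path.join("tools", "report.py")), "on", sys.platform)
```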


Some recommendations about lazy imports from the Python Speed Performance Tips page.

Import Statement Overhead

import statements can be executed just about anywhere. It's often useful to place them inside functions to restrict their visibility and/or reduce initial startup time. Although Python's interpreter is optimized to not import the same module multiple times, repeatedly executing an import statement can seriously affect performance in some circumstances.

The following scenario, a Python 2 transcript, is described on that page:

>>> def doit1():
...     import string
...     string.lower('Python')
...
>>> import string
>>> def doit2():
...     string.lower('Python')
...
>>> import timeit
>>> t = timeit.Timer(setup='from __main__ import doit1', stmt='doit1()')
>>> t.timeit()
11.479144930839539
>>> t = timeit.Timer(setup='from __main__ import doit2', stmt='doit2()')
>>> t.timeit()
4.6661689281463623
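Because string.lower was removed in Python 3, the transcript above does not run there as written. A comparable measurement for Python 3 might look like the sketch below. It substitutes string.capwords (an assumption made only to keep the example runnable), and it states no timings because they depend on the machine and interpreter version:

```python
import string
import timeit


def import_inside():
    # The import statement runs on every call: a lookup in the module
    # cache plus a local name binding.
    import string
    return string.capwords("python")


def import_outside():
    # Uses the module-level import that was executed once.
    return string.capwords("python")


inside = timeit.timeit(import_inside, number=100_000)
outside = timeit.timeit(import_outside, number=100_000)
print(f"import inside: {inside:.3f}s, import outside: {outside:.3f}s")
```

The repeated import does not reload the module from disk; it only consults sys.modules and rebinds the name, so the gap tends to matter only in very hot code paths.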