Setuptools: package data folder location

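I use setuptools to distribute my Python package.

From what I have gathered from the setuptools documentation, I need to put my data files inside the package directory. However, I would prefer to keep my data files in a subdirectory of the root directory.

What I want to avoid: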

/ #root
|- src/
|  |- mypackage/
|  |  |- data/
|  |  |  |- resource1
|  |  |  |- [...]
|  |  |- __init__.py
|  |  |- [...]
|- setup.py

What I want:

/ #root
|- data/
|  |- resource1
|  |- [...]
|- src/
|  |- mypackage/
|  |  |- __init__.py
|  |  |- [...]
|- setup.py

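I just don't like having so many subdirectories when they aren't essential. I haven't found a reason why I /have/ to put the files inside the package directory. In my view, working with so many nested subdirectories is also cumbersome. Or is there a good reason that justifies this restriction?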


I think you can pass essentially any location to setup() through the *data_files* argument.

Option 1: Install as package data

The main advantage of placing data files inside the root of your Python package is that you don't have to worry about where the files will live on a user's system, which may be Windows, Mac, Linux, some mobile platform, or inside an Egg. You can always find the data directory relative to your Python package root, no matter where or how it is installed.

For example, if I have a project layout like so:

project/
    foo/
        __init__.py
        data/
            resource1/
                foo.txt

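You can add a function to __init__.py that returns the absolute path to a data file:

import os

# Directory that contains this __init__.py, i.e. the package root.
_ROOT = os.path.abspath(os.path.dirname(__file__))


def get_data(path):
    # Build an absolute path to a file under the package's data/ directory.
    return os.path.join(_ROOT, 'data', path)


print(get_data('resource1/foo.txt'))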

Outputs:

/Users/pat/project/foo/data/resource1/foo.txt

After the project is installed as an Egg the path to data will change, but the code doesn't need to change:

/Users/pat/virtenv/foo/lib/python2.6/site-packages/foo-0.0.0-py2.6.egg/foo/data/resource1/foo.txt

Option 2: Install to fixed location

The alternative would be to place your data outside the Python package and then either:

  1. Have the location of data passed in via a configuration file, command line arguments or
  2. Embed the location into your Python code.

This is far less desirable if you plan to distribute your project. If you really want to do this, you can install your data wherever you like on the target system by passing data_files a list of (destination directory, [source files]) tuples, one tuple per group of files:

from setuptools import setup

setup(
    # ... other setup() arguments ...
    data_files=[
        # (destination directory, [source files])
        ('/var/data1', ['data/foo.txt']),
        ('/var/data2', ['data/bar.txt']),
    ],
)
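For the first alternative above (telling the code at run time where the data lives), a minimal sketch could look like the following. The --data-dir flag, the MYPACKAGE_DATA environment variable, and the resolve_data_dir name are all hypothetical; the environment variable stands in here for a configuration file to keep the example short.

```python
import argparse
import os

# Assumption: matches one of the data_files destinations used in setup().
DEFAULT_DATA_DIR = "/var/data1"


def resolve_data_dir(argv=None, environ=None):
    """Pick the data directory: CLI flag first, then environment, then default."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    # Hypothetical flag name, chosen for illustration only.
    parser.add_argument("--data-dir", default=None)
    # parse_known_args ignores unrelated arguments instead of exiting.
    args, _unknown = parser.parse_known_args(argv)
    if args.data_dir:
        return os.path.abspath(args.data_dir)
    # Hypothetical variable name; a config file lookup could go here instead.
    return os.path.abspath(environ.get("MYPACKAGE_DATA", DEFAULT_DATA_DIR))
```

Every caller then has to go through something like resolve_data_dir(), which is the extra bookkeeping that Option 1 avoids.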

Updated: Example of a shell function to recursively grep Python files:

atlas% function grep_py { find . -name '*.py' -exec grep -Hn $* {} \; }
atlas% grep_py ": \["
./setup.py:9:    package_data={'foo': ['data/resource1/foo.txt']}

I think I found a good compromise that lets you maintain the following structure:

/ #root
|- data/
|  |- resource1
|  |- [...]
|- src/
|  |- mypackage/
|  |  |- __init__.py
|  |  |- [...]
|- setup.py

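You should install the data as package_data, to avoid the problems described in samplebias's answer, but in order to maintain the file structure you should add this to your setup.py:

import os
from setuptools import setup

# Temporarily link the top-level data/ directory into the package.
os.symlink('../../data', 'src/mypackage/data')
try:
    setup(
        # ...
        package_data={'mypackage': ['data/*']},
        # ...
    )
finally:
    # Remove the link again so the source tree stays clean.
    os.unlink('src/mypackage/data')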

This way we create the appropriate structure "just in time" and keep our source tree organized.
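The same "just in time" idea can also be written as a small context manager, so that creating and removing the link are kept together. This is only a sketch: temporary_symlink is a hypothetical helper, not part of setuptools, and it assumes a platform where os.symlink is permitted.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_symlink(target, link_path):
    """Create link_path pointing at target and remove it again on exit."""
    # If creating the link fails, nothing is left behind to clean up.
    os.symlink(target, link_path)
    try:
        yield link_path
    finally:
        # Removes only the link itself, never the directory it points to.
        os.unlink(link_path)


# Intended use inside setup.py, with the paths from the layout above:
#
#     with temporary_symlink('../../data', 'src/mypackage/data'):
#         setup(..., package_data={'mypackage': ['data/*']})
```

A relative target such as '../../data' is resolved relative to the directory that contains the link, which is why it points back to the top-level data/ directory.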

To access such data files within your code, you 'simply' use:

from pkg_resources import Requirement, resource_filename
data = resource_filename(Requirement.parse("main_package"), 'mypackage/data')

I still don't like having to specify 'mypackage' in the code, as the data does not necessarily have anything to do with this module, but I guess it's a good compromise.

You could also use importlib_resources or importlib.resources (depending on your Python version).

https://importlib-resources.readthedocs.io/en/latest/using.html
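As a rough illustration of that approach, the sketch below reads a data file through importlib.resources.files(), which is available from Python 3.9 (the importlib_resources backport exposes the same function for older versions). The read_resource helper is a hypothetical name, and the sketch assumes the data directory is shipped inside the importable package, as in Option 1.

```python
from importlib.resources import files  # Python 3.9+


def read_resource(package, *parts):
    """Return the text of a data file shipped inside an importable package."""
    resource = files(package)
    # Descend one path segment at a time, e.g. "data" then "resource1".
    for part in parts:
        resource = resource / part
    return resource.read_text(encoding="utf-8")


# With the layout from the question this would be called as:
#
#     text = read_resource("mypackage", "data", "resource1")
```

Because the lookup goes through the import system, the calling code never needs to know where the package was installed.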