为开发快照只转储部分表的 Postgres 转储

在生产中,我们的数据库只有几百 GB 大小。对于开发和测试,我们需要创建这个数据库的快照,这些快照在功能上是等价的,但是只有10或20GB 大小。

挑战在于,我们的业务实体的数据分散在许多表中。我们希望创建某种筛选快照,以便转储中只包含实体的 一些。这样我们就可以每个月左右为开发和测试获得新的快照。

例如,假设我们有一些具有多对多关系的实体:

  • 公司有 N 个部门
  • 组织有 N 个员工
  • 员工有 N 次出勤记录

大约有1000家公司,2500个部门,175000名员工,以及数千万的出勤记录。我们想要一种可复制的方式来拉动,比如说,前100家公司 及其所有组成部门、雇员和出勤记录

目前,我们对模式使用 pg _ dump,然后使用—— able-touch 和—— data-运行 pg _ dump,以便从较小的表中获取所有数据。我们不希望必须编写自定义脚本来提取部分数据,因为我们有一个快速的开发周期,并且担心自定义脚本会很脆弱,可能会过期。

我们要怎么做?是否有第三方工具可以帮助从数据库中提取逻辑分区?这些工具叫什么?

任何一般性的建议也非常感谢!

44640 次浏览

I don't know about any software which already does this, but I can think of 3 alternative solutions. Unfortunately, they all require some custom coding.

  1. Re-create all the tables in a separate schema, then copy into those tables only the subset of data you would like to dump, using INSERT INTO copy.tablename SELECT * FROM tablename WHERE ... and dump that.

  2. Write your own script for dumping data as SQL statements. I have used this approach in the past and it only took something like 20-30 lines of PHP.

  3. Modify pg_dump so it accepts a condition along with the -t switch when dumping a single table.

On your larger tables you can use the COPY command to pull out subsets...

COPY (SELECT * FROM mytable WHERE ...) TO '/tmp/myfile.tsv'


COPY mytable FROM 'myfile.tsv'

https://www.postgresql.org/docs/current/static/sql-copy.html

You should consider maintaining a set of development data rather than just pulling a subset of your production. In the case that you're writing unit tests, you could use the same data that is required for the tests, trying to hit all of the possible use cases.