PostgreSQL temporary tables

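I need to run 2.5 million queries. Each query generates some rows for which I need the AVG(column); I then use this AVG to filter the table down to all values below the average. Finally, I need to INSERT these filtered results into a table.

The only way to do this with reasonable efficiency seems to be to create a TEMPORARY TABLE for each query-postmaster python-thread. I am just hoping that these TEMPORARY TABLEs will not be persisted to the hard drive (at all) and will remain in memory (RAM), unless they run out of working memory, of course.

I would like to know whether a TEMPORARY TABLE will incur disk writes (which would interfere with the INSERTs, i.e. slow down the whole process).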

Temporary tables provide only one guarantee - they are dropped at the end of the session. For a small table you'll probably have most of your data in the backing store. For a large table I guarantee that data will be flushed to disk periodically as the database engine needs more working space for other requests.

EDIT: If you absolutely need RAM-only temporary tables, you can create a tablespace for your database on a RAM disk (/dev/shm works). This reduces the amount of disk IO, but beware that it is currently not possible to avoid physical disk writes entirely; the DB engine will flush the table list to stable storage when you create the temporary table.
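As a rough sketch of the RAM-disk idea, the statements involved might look like the following. The tablespace name ram_ts, the directory /dev/shm/pg_ram_ts, and both helper functions are hypothetical; only CREATE TABLESPACE and the temp_tablespaces setting are PostgreSQL's own. The connection is assumed to be any DB-API connection (for example from psycopg2) in autocommit mode. Note that the PostgreSQL documentation warns against placing tablespaces on ephemeral storage, so treat this as an experiment rather than a recommendation.

```python
# Hypothetical names: "ram_ts" and the directory below are illustrative only.
# The directory must already exist, be empty, and be owned by the postgres OS user.
RAM_TABLESPACE_DIR = "/dev/shm/pg_ram_ts"


def ram_tablespace_statements(name="ram_ts", location=RAM_TABLESPACE_DIR):
    """Return the SQL that routes temporary tables to a RAM-backed tablespace."""
    return [
        # One-time setup: needs superuser rights and cannot run inside
        # a transaction block.
        f"CREATE TABLESPACE {name} LOCATION '{location}'",
        # Per session: new temporary tables and temp files go to this tablespace.
        f"SET temp_tablespaces = '{name}'",
    ]


def use_ram_tablespace(conn, create=False):
    """Run the statements on a DB-API connection that is in autocommit mode."""
    statements = ram_tablespace_statements()
    if not create:
        # The tablespace already exists; only the session-level SET is needed.
        statements = statements[1:]
    cur = conn.cursor()
    try:
        for sql in statements:
            cur.execute(sql)
    finally:
        cur.close()
```

Each worker session would call use_ram_tablespace(conn) once after connecting; the one-time creation (create=True) belongs in a setup step that has to be repeated whenever the RAM disk is wiped, for instance after a reboot.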

Please note that, in Postgres, the default behaviour for temporary tables is that they are not dropped automatically at commit, and their rows are preserved on commit (ON COMMIT PRESERVE ROWS). See ON COMMIT.

Temporary tables are, however, dropped at the end of a database session:

Temporary tables are automatically dropped at the end of a session, or optionally at the end of the current transaction.
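To make the three ON COMMIT behaviours concrete, here is a small sketch that builds the corresponding CREATE TEMPORARY TABLE statements. The helper create_temp_table_sql and the table name scratch are hypothetical; the three clause values are the ones PostgreSQL accepts. The helper does no identifier quoting, so it assumes trusted, hard-coded names.

```python
# The three ON COMMIT behaviours PostgreSQL accepts for temporary tables.
ON_COMMIT_OPTIONS = {
    "PRESERVE ROWS": "default: table and rows survive COMMIT; dropped at session end",
    "DELETE ROWS": "table survives COMMIT, but its rows are removed",
    "DROP": "table itself is dropped at COMMIT",
}


def create_temp_table_sql(name, columns, on_commit="PRESERVE ROWS"):
    """Build a CREATE TEMPORARY TABLE statement with an explicit ON COMMIT clause.

    name and columns are inserted verbatim, so pass trusted values only.
    """
    if on_commit not in ON_COMMIT_OPTIONS:
        raise ValueError(f"unknown ON COMMIT option: {on_commit!r}")
    return f"CREATE TEMPORARY TABLE {name} ({columns}) ON COMMIT {on_commit}"


# Hypothetical table used purely for illustration.
print(create_temp_table_sql("scratch", "id integer, value double precision", "DROP"))
```

Whichever option is chosen, the table is gone once the session ends; the clause only controls what happens at each COMMIT before that.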

There are multiple considerations you have to take into account:

  • If you want a temporary table to be dropped automatically at the end of a transaction, create it with the CREATE TEMPORARY TABLE ... ON COMMIT DROP syntax.
  • In the presence of connection pooling, a database session may span multiple client sessions; to avoid clashes in CREATE, you should drop your temporary tables -- either prior to returning a connection to the pool (e.g. by doing everything inside a transaction and using the ON COMMIT DROP creation syntax), or on an as-needed basis (by preceding any CREATE TEMPORARY TABLE statement with a corresponding DROP TABLE IF EXISTS, which has the advantage of also working outside transactions e.g. if the connection is used in auto-commit mode.)
  • While the temporary table is in use, how much of it will fit in memory before overflowing on to disk? See the temp_buffers option in postgresql.conf
  • Anything else I should worry about when working often with temp tables? A vacuum is recommended after you have DROPped temporary tables, to clean up any dead tuples from the system catalogs. With the default settings, the autovacuum daemon does this for you: it wakes up about once a minute (autovacuum_naptime) and vacuums tables that have accumulated enough dead tuples.
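The two cleanup strategies for pooled connections, plus the temp_buffers setting, could be sketched roughly as follows. The function names, the table name scratch, its columns, and the 256MB value are all hypothetical; conn is assumed to be a DB-API connection (for example from psycopg2) and work is a caller-supplied function that receives the open cursor.

```python
SCRATCH_COLUMNS = "id integer, value double precision"  # hypothetical schema


def set_temp_buffers(conn, size="256MB"):
    """Raise temp_buffers for this session (the size is an illustrative value).

    This only takes effect before the session first uses a temporary table.
    """
    cur = conn.cursor()
    try:
        cur.execute(f"SET temp_buffers = '{size}'")
    finally:
        cur.close()


def with_on_commit_drop(conn, work):
    """Transactional pattern: the table vanishes when the transaction commits."""
    cur = conn.cursor()
    try:
        cur.execute(f"CREATE TEMPORARY TABLE scratch ({SCRATCH_COLUMNS}) ON COMMIT DROP")
        work(cur)
        conn.commit()    # scratch is dropped here
    except Exception:
        conn.rollback()  # a rollback undoes the CREATE as well
        raise
    finally:
        cur.close()


def with_drop_if_exists(conn, work):
    """As-needed pattern: also usable when the connection is in autocommit mode.

    Assumes no permanent table named scratch is on the search path, because
    DROP TABLE IF EXISTS would remove that one if no temporary table exists.
    Outside autocommit mode, committing is left to the caller.
    """
    cur = conn.cursor()
    try:
        cur.execute("DROP TABLE IF EXISTS scratch")
        cur.execute(f"CREATE TEMPORARY TABLE scratch ({SCRATCH_COLUMNS})")
        work(cur)
    finally:
        cur.close()
```

With the first pattern nothing is left behind when the connection goes back to the pool; with the second, a leftover table from a previous client is removed just before the new one is created.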

Also, unrelated to your question (but possibly related to your project): keep in mind that, if you have to run queries against a temp table after you have populated it, then it is a good idea to create appropriate indices and issue an ANALYZE on the temp table in question after you're done inserting into it. By default, the cost-based optimizer will assume that a newly created temp table has ~1000 rows, and this may result in poor performance should the temp table actually contain millions of rows.
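Applied to the workflow in the question, the populate / index / ANALYZE / query sequence might look like the sketch below. The tables measurements(id, value) and results(id, value), the temp table scratch, and the function load_and_filter are hypothetical stand-ins; conn is assumed to be a DB-API connection that is not in autocommit mode, so everything runs in one transaction.

```python
# Hypothetical tables: measurements(id, value) is the input,
# results(id, value) receives the rows below the average.
STATEMENTS = [
    # Populate the temporary table in one statement.
    "CREATE TEMPORARY TABLE scratch ON COMMIT DROP AS "
    "SELECT id, value FROM measurements",
    # Index the column the later query filters on.
    "CREATE INDEX scratch_value_idx ON scratch (value)",
    # Refresh planner statistics now that the table has its real contents.
    "ANALYZE scratch",
    # Keep only the rows below the average and store them.
    "INSERT INTO results (id, value) "
    "SELECT id, value FROM scratch "
    "WHERE value < (SELECT avg(value) FROM scratch)",
]


def load_and_filter(conn):
    """Run the whole sequence in one transaction on a DB-API connection."""
    cur = conn.cursor()
    try:
        for sql in STATEMENTS:
            cur.execute(sql)
        conn.commit()    # ON COMMIT DROP removes scratch here
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

Whether the index pays off depends on how selective the filter is; for a cut at the average it may well go unused, so it is worth checking with EXPLAIN before keeping it.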