Postgreql 错误 PANIC: 找不到有效的检查点记录

当我加载 postgres 服务器(v9.0.1)时,我得到了一个阻止它启动的恐慌:

恐慌: 无法找到有效的检查点记录

我该怎么补救?

116557 次浏览

It's looking for a checkpoint record in the transaction log that probably doesn't exist or is corrupted. You can determine if this is the case by running:

# Postgres >= 10
pg_resetwal DATADIR


# Postgres < 10
pg_resetxlog DATADIR

If the transaction log is corrupt, you'll see a message like:

The database server was not shut down cleanly.
Resetting the transaction log might cause data to be lost.
If you want to proceed anyway, use `-f` to force reset.

You can then follow the instructions and run with -f to force the update:

# Postgres >= 10
pg_resetwal -f DATADIR


# Postgres < 10
pg_resetxlog -f DATADIR

That should reset the transaction log. However, it could leave your database in an indeterminate state as explained in the PostgreSQL documentation on pg_resetwal:

If pg_resetwal complains that it cannot determine valid data for pg_control, you can force it to proceed anyway by specifying the -f (force) option. In this case plausible values will be substituted for the missing data. Most of the fields can be expected to match, but manual assistance might be needed for the next OID, next transaction ID and epoch, next multitransaction ID and offset, and WAL starting location fields. These fields can be set using the options discussed below. If you are not able to determine correct values for all these fields, -f can still be used, but the recovered database must be treated with even more suspicion than usual: an immediate dump and reload is imperative. Do not execute any data-modifying operations in the database before you dump, as any such action is likely to make the corruption worse.

Do you do continuous archiving? If you are backing up at the time, you may find it more prudent to remove backup_label. pg_resetxlog is a severe thing.

just like the log saying : could not locate a valid checkpoint record.Postgres can't find a properly WAL under the $PGDATA/pg_xlog/ directory. Try to use pg_resetxlog

I'm running 9.1.7 and i find ran the following successfully:

/usr/lib/postgresql/9.1/bin/pg_resetxlog -f /var/lib/postgresql/9.1/main

Your final argument to the pg_resetxlog command should be the location on disk where postgres stores your database data.

As indicated here pg_resetxlog should not be run. The answers that refer to this is bad advice. Assuming the error occured in a context of copy/replication instance, the link provides a more succinct way of doing copy/replication with pg_basebackup

I came across here with a Docker Postgresql-13 which did not start again. I fixed it by finding the volume (for the data) and running

Being in the volume data folder, e.g., /var/lib/docker/volumes/c4c8d637d9eee086265d732b2974690b731abcb23f47ca61bf75fe28526e31ce/_data

run as owner of the directory (for me it was the systemd-coredump user)

sudo -u systemd-coredump /usr/lib/postgresql/13/bin/pg_resetwal -f .

For sure you need the same Postgresql version installed (if the pg_resetwal is not part of the volume)

worked

This answer is for Postgres 14. I was getting the same error in the standby logs after the following steps:

  1. Login to stand-by.
  2. Create back up with the below command:

pg_basebackup -D $APP_BACKUP_PATH -F t -P -v -U replicator -w --no-password -h 10.29.51.98

  1. Extract generated $APP_BACKUP_PATH/base.tar to standby data directory.

  2. Restart the standby. The start up fails with : PANIC: could not locate a valid checkpoint record.

  3. So, it turns out that the backup was not generated right. It needs to be generated with an additional option -X stream.

  4. After regenerating and applying the updated back up to the stand by data directory, the stand-by came up without this error.

In case of docker,

This error will result in the container getting constantly killed and restarted. The first step is to get the container up and running so that we can exec into the container and run pg_resetwal or pg_resetxlog . In the postgres docker layer info, we can see that

ENTRYPOINT is ["docker-entrypoint.sh"] and CMD is ["postgres"]

docker-entrypoint.sh script will run any linux command passed as argument.

If you are on docker then passing /bin/bash will override default CMD and give you access to container shell,

docker run -it -v /my_data:/var/lib/postgresql/data postgres:9.6.22 /bin/bash

here /var/lib/postgresql/data is the postgres data directory inside the container.

Once inside the continer, run below commands based on your postgres version. This will reset the transaction logs (WAL)

On postgres >= 10

pg_resetwal /var/lib/postgresql/data

On postgres < 10

pg_resetxlog /var/lib/postgresql/data

Top rated answer on this thread explains more on the pg_resetwal commands.

Finally, you can exit this container and start the postgres DB container with its original CMD.


Some Additional Info

If you see below error, it might be because the data directory you have specified above might be incorrect.

pg_resetxlog: could not open file "PG_VERSION" for reading: No such file or directory

You can check the PGDATA env variable for the right path.

root@4650984c476b:/# printenv | grep PGDATA
PGDATA=/var/lib/postgresql/data

With older vesion of postgres below error might occur.

pg_resetxlog: cannot be executed by "root"

This can be resolved by running below command.

gosu postgres pg_resetxlog /var/lib/postgresql/data

In case of any container orchestrator like kubernetes, rancher v1 (since we cant run docker commands directly), we will have to start the container with a process like sleep. Pass the below as cmd or args in your orchestrator manifest.

sleep infinity

or

sh -c 'while sleep 3600; do :; done'

Then enter the container using tool like kubectl exec. Once you are inside, pg_resetwal/pg_resetxlog commands can be run.