How does two-phase commit prevent last-second failure?

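I am studying how two-phase commit works across a distributed transaction. My understanding is that in the last part of the first phase, the transaction coordinator asks each node whether it is ready to commit. If every node agrees, the coordinator tells them all to go ahead and commit.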

What prevents the following failure?

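  1. All nodes respond that they are ready to commit
  2. The transaction coordinator tells them to "go ahead and commit", but one of the nodes crashes before receiving this message
  3. All other nodes commit successfully, but now the distributed transaction is corrupt
  4. My understanding is that when the crashed node comes back, its transaction will have been rolled back (since it never got the commit message)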

I'm assuming each node is running a normal database that knows nothing about distributed transactions. What am I missing?


No. Point 4 is incorrect. Each node records in stable storage that it was able to commit or roll back the transaction, so that it will be able to do as commanded even across crashes. When the crashed node comes back up, it must realize that it has a transaction in pre-commit state, reinstate any relevant locks or other controls, and then attempt to contact the coordinator site to collect the status of the transaction.
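To make the recovery path concrete, here is a minimal sketch of a participant that writes its vote to a log before replying, then uses that log after a restart to find in-doubt transactions. The `Participant` class, the JSON-lines log format, and the `ask_coordinator` callback are all assumptions made up for illustration; a real database would also reinstate locks and use its own write-ahead log.

```python
import json
import os
import tempfile


class Participant:
    """Hypothetical participant that logs its vote before replying."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.state = {}  # txid -> "PREPARED" | "COMMITTED" | "ABORTED"
        self._replay_log()

    def _append(self, txid, status):
        # Force the record to disk before acting on it (write-ahead logging).
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"txid": txid, "status": status}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.state[txid] = status

    def _replay_log(self):
        # After a restart, rebuild the in-memory state from the log.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                self.state[record["txid"]] = record["status"]

    def prepare(self, txid):
        # Vote yes only after the PREPARED record is durable.
        self._append(txid, "PREPARED")
        return True

    def commit(self, txid):
        self._append(txid, "COMMITTED")

    def abort(self, txid):
        self._append(txid, "ABORTED")

    def in_doubt(self):
        # Transactions that voted yes but never learned the outcome.
        return [t for t, s in self.state.items() if s == "PREPARED"]

    def recover(self, ask_coordinator):
        # ask_coordinator(txid) -> "COMMIT" or "ABORT" (assumed callback).
        for txid in self.in_doubt():
            if ask_coordinator(txid) == "COMMIT":
                self.commit(txid)
            else:
                self.abort(txid)


def demo():
    log_path = os.path.join(tempfile.mkdtemp(), "participant.log")
    node = Participant(log_path)
    node.prepare("tx1")
    del node  # simulate a crash before the commit message arrives
    restarted = Participant(log_path)
    in_doubt = restarted.in_doubt()
    restarted.recover(lambda txid: "COMMIT")
    return in_doubt, restarted.state["tx1"]
```

The point of the sketch is the ordering: the PREPARED record reaches disk before the node votes yes, so a restart cannot silently forget the promise.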

Problems only occur if the crashed node never comes back up. Until then, every other node believes the transaction is fine, or will be once the crashed node returns.

Two-phase commit isn't foolproof; it is designed to work in the roughly 99% of cases where the following assumptions hold.

"The protocol assumes that there is stable storage at each node with a write-ahead log, that no node crashes forever, that the data in the write-ahead log is never lost or corrupted in a crash, and that any two nodes can communicate with each other."

http://en.wikipedia.org/wiki/Two-phase_commit_protocol

No, they are not instructed to roll back because in the original poster's scenario, some of the nodes have already committed. What happens is when the crashed node becomes available, the transaction coordinator tells it to commit again.

Because the node responded positively in the "prepare" phase, it is required to be able to "commit", even when it comes back from a crash.
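As a rough illustration of the coordinator side, the sketch below keeps re-sending the commit decision until every participant has acknowledged it. `FlakyNode` and `deliver_commit` are hypothetical names, the outage is simulated with a counter, and the sketch assumes the coordinator has already recorded its decision durably and that `commit` is idempotent.

```python
class FlakyNode:
    """Hypothetical participant that is unreachable for its first few calls."""

    def __init__(self, name, down_for=0):
        self.name = name
        self.down_for = down_for  # number of calls that will fail
        self.committed = set()

    def commit(self, txid):
        if self.down_for > 0:
            self.down_for -= 1
            raise ConnectionError(f"{self.name} is unreachable")
        # Adding to a set makes a repeated commit harmless (idempotent).
        self.committed.add(txid)


def deliver_commit(txid, nodes, max_rounds=10):
    """Re-send the commit decision until every node acknowledges it.

    Returns the number of rounds used and the names of nodes still pending.
    """
    pending = list(nodes)
    rounds = 0
    while pending and rounds < max_rounds:
        rounds += 1
        still_pending = []
        for node in pending:
            try:
                node.commit(txid)
            except ConnectionError:
                # The decision is already made; just try this node again later.
                still_pending.append(node)
        pending = still_pending
    return rounds, [node.name for node in pending]
```

The coordinator never reconsiders the decision here. A node that voted yes and then crashed simply stays on the pending list until it is reachable again.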

There are many ways to attack the problems with two-phase commit. Almost all of them wind up as some variant of the Paxos consensus algorithm. Mike Burrows, who designed Google's Chubby lock service (which is based on Paxos), said in a lecture I saw that there are two types of distributed commit algorithms: "Paxos, and incorrect ones".

One thing the crashed node could do when it comes back up is ask the coordinator, "I never heard the outcome of this transaction; should it have been committed?" The coordinator will then tell it what the outcome of the vote was.

Bear in mind that this is an example of a more general problem: the crashed node could miss many transactions before it recovers. Therefore it's terribly important that upon recovery it should talk either to the coordinator or another replica before making itself available. If the node itself can't tell whether or not it has crashed, then things get more involved but still tractable.

If you use a quorum system for database reads, the inconsistency will be masked (and made known to the database itself).
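One simple way to picture this is a majority-vote read, sketched below. The `quorum_read` function and the replica names are assumptions for illustration only; real quorum systems usually compare version numbers or timestamps rather than counting identical values.

```python
from collections import Counter


def quorum_read(replies, quorum):
    """Return the value seen by at least `quorum` replicas, plus the dissenters.

    replies: dict mapping a replica name to the value it returned.
    """
    if len(replies) < quorum:
        raise RuntimeError("not enough replicas answered")
    counts = Counter(replies.values())
    value, votes = counts.most_common(1)[0]
    if votes < quorum:
        raise RuntimeError("no value reached quorum")
    # Replicas that disagree are the ones that need repair.
    stale = sorted(name for name, v in replies.items() if v != value)
    return value, stale
```

With replies `{"a": "new", "b": "new", "c": "old"}` and a quorum of 2, the reader gets `"new"` and learns that replica `c` is behind, which is the "masked and made known" behaviour described above.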

Summarizing everyone's answers:

  1. One cannot use normal databases with distributed transactions. The database must explicitly support a transaction coordinator.

  2. The nodes are not instructed to roll back because some of the nodes have already committed. What happens is that when the crashed node comes back, the transaction coordinator tells it to finish the commit.