Notes on Two-phase Commit

June 9, 2017 Tech 本文总阅读量次

I recently came across a good description of two-phase commit from actordb’s document. I decide to borrow it as a note. The following is copied from actordb’s document:

3.2.3 Multi-actor transactions
Multi-actor transactions need to be ACID compliant. They are executed by a transaction manager. The manager is itself an actor. It has name and a transaction number that is incremented for every transaction.

Sequence of events from the transaction manager point of view:

Start transaction by writing the number and state (uncommitted) to transaction table of transaction manager actor.
Go through all actors in the transaction and execute their belonging SQL to check if it can execute, but do not commit it. If actor successfully executes SQL it will lock itself (queue all reads and writes).
All actors returned success. Change state in transaction table for transaction to committed.
Inform all actors that they should commit.

Sequence of events from an actors point of view:

Actor receives SQL with a transaction ID, transaction number and which node transaction manager belongs to.
Store the actual SQL statement with transaction info to a transaction table (not execute it).
Once it is stored, the SQL will be executed but not committed. If there was no error, return success.
Actor waits for confirm or abort from transaction manager. It will also periodically check back with the transaction manager in case the node where it was running from went down and confirmation message is lost.
Once it has a confirmation or abort message it executes it and unlocks itself.

Problem scenarios:

Node where transaction manager lives goes down before committing transaction: Actors will be checking back to see what state a transaction is in. If transaction manager actor resumes on another node and sees an uncommitted transaction, it will mark it as aborted. Actors will in turn abort the transaction as well.
Node where transaction manager lives goes down after committing transaction to local state, but before informing actors that transaction was confirmed. Actors checking back will detect a confirmed transaction and commit it.
Node where one or more actors live goes down after confirming that they can execute transaction. The actual SQL statements are stored in their databases. The next time actors start up, they will notice that transaction. Check back with the transaction manager and either commit or abort it.