单个 SQLServer 语句是原子的和一致的吗?

SQLServerACID中是否有语句?

我的意思是

给定一个没有包装在 BEGIN TRANSACTION/COMMIT TRANSACTION中的 T-SQL 语句,就是该语句的操作:

  • 原子 : 要么执行它的所有数据修改,要么不执行任何修改。
  • 一致 : 完成时,事务必须使所有数据保持一致状态。
  • 隔离 : 并发事务所做的修改必须与任何其他并发事务所做的修改隔离开来。
  • 持久 : 事务完成后,其效果在系统中永久存在。

我这么问的原因

在一个活动系统中,我有一条似乎违反查询规则的语句。

实际上,我的 T-SQL 语句是:

--If there are any slots available,
--then find the earliest unbooked transaction and mark it booked
UPDATE Transactions
SET Booked = 1
WHERE TransactionID = (
SELECT TOP 1 TransactionID
FROM Slots
INNER JOIN Transactions t2
ON Slots.SlotDate = t2.TransactionDate
WHERE t2.Booked = 0 --only book it if it's currently unbooked
AND Slots.Available > 0 --only book it if there's empty slots
ORDER BY t2.CreatedDate)

注意 : 但是一个更简单的概念变体可能是:

--Give away one gift, as long as we haven't given away five
UPDATE Gifts
SET GivenAway = 1
WHERE GiftID = (
SELECT TOP 1 GiftID
FROM Gifts
WHERE g2.GivenAway = 0
AND (SELECT COUNT(*) FROM Gifts g2 WHERE g2.GivenAway = 1) < 5
ORDER BY g2.GiftValue DESC
)

在这两个语句中,请注意它们都是单个语句(UPDATE...SET...WHERE)。

在某些情况下,错误的事务是 “预定”; 它实际上是选择 回见事务。盯着这个看了16个小时,我被难住了。似乎 SQLServer 只是违反了规则。

我想知道如果在更新发生之前 Slots视图的结果发生了变化会怎么样?如果 SQLServer 没有在该 约会上的 交易上持有 SHARED锁,该怎么办?是否有可能单个语句不一致?

所以我决定测试一下

我决定检查子查询或内部操作的结果是否不一致。我用一个 int列创建了一个简单的表:

CREATE TABLE CountingNumbers (
Value int PRIMARY KEY NOT NULL
)

在一个紧凑的循环中,我从多个连接调用 单个 T-SQL 语句:

INSERT INTO CountingNumbers (Value)
SELECT ISNULL(MAX(Value), 0)+1 FROM CountingNumbers

换句话说,伪代码是:

while (true)
{
ADOConnection.Execute(sql);
}

几秒钟之内我就明白了:

Violation of PRIMARY KEY constraint 'PK__Counting__07D9BBC343D61337'.
Cannot insert duplicate key in object 'dbo.CountingNumbers'.
The duplicate value is (1332)

语句是原子的吗?

单个语句不是原子的事实让我怀疑单个语句是否是原子的?

或者 声明还有更多的 真狡猾定义,它不同于(例如) SQLServer 认为的语句:

enter image description here

这是否从根本上意味着在单个 T-SQL 语句的范围内,SQLServer 语句不是原子的?

如果单个语句是原子的,那么如何解释键冲突呢?

从存储过程内部

我没有使用远程客户端打开 N连接,而是使用一个存储过程来尝试:

CREATE procedure [dbo].[DoCountNumbers] AS


SET NOCOUNT ON;


DECLARE @bumpedCount int
SET @bumpedCount = 0


WHILE (@bumpedCount < 500) --safety valve
BEGIN
SET @bumpedCount = @bumpedCount+1;


PRINT 'Running bump '+CAST(@bumpedCount AS varchar(50))


INSERT INTO CountingNumbers (Value)
SELECT ISNULL(MAX(Value), 0)+1 FROM CountingNumbers


IF (@bumpedCount >= 500)
BEGIN
PRINT 'WARNING: Bumping safety limit of 500 bumps reached'
END
END


PRINT 'Done bumping process'

在 SSMS 中打开5个标签,按下每个标签中的 F5,然后观察它们是否也违反了 ACID:

Running bump 414
Msg 2627, Level 14, State 1, Procedure DoCountNumbers, Line 14
Violation of PRIMARY KEY constraint 'PK_CountingNumbers'.
Cannot insert duplicate key in object 'dbo.CountingNumbers'.
The duplicate key value is (4414).
The statement has been terminated.

所以故障与 ADO、 ADO.net 无关,或者与上述任何一种都无关。

15年来,我一直假设 SQLServer 中的一条语句是一致的;

那么事务隔离的 xxx 层呢?

对于要执行的 SQL 批处理的不同变体:

  • 默认值(读提交) : 键违规

    INSERT INTO CountingNumbers (Value)
    SELECT ISNULL(MAX(Value), 0)+1 FROM CountingNumbers
    
  • default (read committed), explicit transaction: no error key violation

    BEGIN TRANSACTION
    INSERT INTO CountingNumbers (Value)
    SELECT ISNULL(MAX(Value), 0)+1 FROM CountingNumbers
    COMMIT TRANSACTION
    
  • serializable: deadlock

    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
    BEGIN TRANSACTION
    INSERT INTO CountingNumbers (Value)
    SELECT ISNULL(MAX(Value), 0)+1 FROM CountingNumbers
    COMMIT TRANSACTION
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED
    
  • snapshot (after altering database to enable snapshot isolation): key violation

    SET TRANSACTION ISOLATION LEVEL SNAPSHOT
    BEGIN TRANSACTION
    INSERT INTO CountingNumbers (Value)
    SELECT ISNULL(MAX(Value), 0)+1 FROM CountingNumbers
    COMMIT TRANSACTION
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED
    

Bonus

  • Microsoft SQL Server 2008 R2 (SP2) - 10.50.4000.0 (X64)
  • Default transaction isolation level (READ COMMITTED)

Turns out every query I've ever written is broken

This certainly changes things. Every update statement I've ever written is fundamentally broken. E.g.:

--Update the user with their last invoice date
UPDATE Users
SET LastInvoiceDate = (SELECT MAX(InvoiceDate) FROM Invoices WHERE Invoices.uid = Users.uid)

错误的值; 因为另一个发票可以插入到 MAX之后和 UPDATE之前。或者一个来自 BOL 的例子:

UPDATE Sales.SalesPerson
SET SalesYTD = SalesYTD +
(SELECT SUM(so.SubTotal)
FROM Sales.SalesOrderHeader AS so
WHERE so.OrderDate = (SELECT MAX(OrderDate)
FROM Sales.SalesOrderHeader AS so2
WHERE so2.SalesPersonID = so.SalesPersonID)
AND Sales.SalesPerson.BusinessEntityID = so.SalesPersonID
GROUP BY so.SalesPersonID);

如果没有独家锁定,SalesYTD就是错误的。

这么多年来我是怎么做到的。

26246 次浏览

I've been operating under the assumption that a single statement in SQL Server is consistent

That assumption is wrong. The following two transactions have identical locking semantics:

STATEMENT


BEGIN TRAN; STATEMENT; COMMIT

No difference at all. Single statements and auto-commits do not change anything.

So merging all logic into one statement does not help (if it does, it was by accident because the plan changed).

Let's fix the problem at hand. SERIALIZABLE will fix the inconsistency you are seeing because it guarantees that your transactions behave as if they executed single-threadedly. Equivalently, they behave as if they executed instantly.

You will be getting deadlocks. If you are ok with a retry loop, you're done at this point.

If you want to invest more time, apply locking hints to force exclusive access to the relevant data:

UPDATE Gifts  -- U-locked anyway
SET GivenAway = 1
WHERE GiftID = (
SELECT TOP 1 GiftID
FROM Gifts WITH (UPDLOCK, HOLDLOCK) --this normally just S-locks.
WHERE g2.GivenAway = 0
AND (SELECT COUNT(*) FROM Gifts g2 WITH (UPDLOCK, HOLDLOCK) WHERE g2.GivenAway = 1) < 5
ORDER BY g2.GiftValue DESC
)

You will now see reduced concurrency. That might be totally fine depending on your load.

The very nature of your problem makes achieving concurrency hard. If you require a solution for that we'd need to apply more invasive techniques.

You can simplify the UPDATE a bit:

WITH g AS (
SELECT TOP 1 Gifts.*
FROM Gifts
WHERE g2.GivenAway = 0
AND (SELECT COUNT(*) FROM Gifts g2 WITH (UPDLOCK, HOLDLOCK) WHERE g2.GivenAway = 1) < 5
ORDER BY g2.GiftValue DESC
)
UPDATE g  -- U-locked anyway
SET GivenAway = 1

This gets rid of one unnecessary join.

Below is an example of an UPDATE statement that does increment a counter value atomically

-- Do this once for test setup
CREATE TABLE CountingNumbers (Value int PRIMARY KEY NOT NULL)
INSERT INTO CountingNumbers VALUES(1)


-- Run this in parallel: start it in two tabs on SQL Server Management Studio
-- You will see each connection generating new numbers without duplicates and without timeouts
while (1=1)
BEGIN
declare @nextNumber int
-- Taking the Update lock is only relevant in case this statement is part of a larger transaction
-- to prevent deadlock
-- When executing without a transaction, the statement will itself be atomic
UPDATE CountingNumbers WITH (UPDLOCK, ROWLOCK) SET @nextNumber=Value=Value+1
print @nextNumber
END

Select does not lock exclusively, even serializable does, but only for the time the select is executed! Once the select is over, the select lock is gone. Then, update locks take on as they now know what to lock as Select has return results. Meanwhile, anyone else can Select again!

The only sure way to safely read and lock a row is:

begin transaction


--lock what i need to read
update mytable set col1=col1 where mykey=@key


--now read what i need
select @d1=col1,@d2=col2 from mytable where mykey=@key


--now do here calculations checks whatever i need from the row i read to decide my update
if @d1<@d2 set @d1=@d2 else set @d1=@d2 * 2 --just an example calc


--now do the actual update on what i read and the logic
update mytable set col1=@d1,col2=@d2 where mykey=@key


commit transaction

This way any other connection running the same statement for the same data it will surely wait at the first (fake) update statement until the previous is done. This ensures that when lock is released only one connection will granted permission to lock request to 'update' and this one will surely read committed finalized data to make calculations and decide if and what to actually update at the second 'real' update.

In other words, when you need to select information to decide if/how to update, you need a begin/commit transaction block plus you need to start with a fake update of what you need to select - before you select it(update output will also do).