Friday, October 9, 2009

Missing Committed Rows, In The Read Committed Isolation Level

Today, I am going to talk about the possibility of a SELECT statement that misses committed rows, in the read committed isolation level.  You may be asking your self, what!!!! How can committed rows be missed in the READ COMMITTTED isolation level?  This is the perfect example of an oxy-moron in my opinion, but it can and does happen.  Perhaps you have noticed this behavior or perhaps has not been that apparent to you or your users.  Granted this only occurs under certain circumstances, but it can really confuse a lot of people.  The problem exists because of the method SQL Server uses to scan a table.  When SQL Server scans a given table it takes a shared lock one row at a time.  The problem occurs when a transaction obtains an  exclusive lock, which prohibits the table from proceeding further with the scan.  What do you think will happen if the inserted value occurs before the currently scanned row?  You got it… the row does not appear in the result set.  Take a look at the graphic below to see what is actually happening.

Drawing1

Now it is time to see this behavior in action.

In a new query window, run the below code. I will be referring to this window as Query1.

create table t (a int primary key, b int)
insert t values (1, 1)
insert t values (2, 2)
insert t values (3, 3)
 
begin tran
update t set b = 2 where a = 3

Next, create a new query window, which I will refer to as Query2, and paste and execute the following code.

select * from t ORDER BY a ASC

Note: I use an order by clause to guarantee the sort.  Otherwise, the data will be returned as it is found on disk

Now open a third window.  Paste and execute the code below.

INSERT INTO t VALUES (0,10)

Return to Query1 (The query with the UPDATE statement), and execute the code below.

INSERT INTO t VALUES (4,10)
select * from t
commit tran

After jumping through all these hoops, here are the results:

Query1:

image

Query 2:

image

Query 3:

image

Query 3 does not really tell us much other than our insert was committed to the database.  The query that tells us the story is Query 2.  As you can see, Query 2 is missing the row we inserted and committed in Query3.  If you repeat the steps above but change the sort order to DESC you will see that all rows are returned, unless of course you decide to insert a row after the current MAX a value.  Another thing of note is the clustered index.  If you change the clustered index to NONCLUSTERED, you will get all rows.  Why does this happen?  The answer is when a CLUSTERED index is scanned thus it returns the data in the order of the index sort; however, when a heap is scanned it returns data as it finds it on disk, unless an order by is specified.  Please realize that no sort is ever guaranteed, without an ORDER BY clause.  As you can see the direction of the scan impacts which data will be missing and which data is displayed.  How do you resolve this issue?  The only answer is to choose an isolation level with more consistency and less concurrency, like serializable or snapshot.  Here is the BOL entry for transactional isolation level, http://msdn.microsoft.com/en-us/library/ms173763.aspx.

There you have it.  I have demonstrated how committed rows can be skipped, in the read committed isolation level.  I wonder what else can happen in the read committed isolation level?  If you want a hint, try updating a row that has not been scanned yet and an row that has already been scanned, in the final commit part.  I think you will be surprised that your query will return invalid data and will be missing data. 

Happy coding.

2 comments:

Unknown said...

I believe you may also know about the way of fix-pdf

Z said...

Thank you. Nice Demo.
Verified that this works as explained in Microsoft SQL Server 2014 (SP2-CU3) as well.
Could you do a similar article with showing what can be missed when using WITH NOLOCK?

Also, how do other RDBMS like Oracle handle this scenario?