I remember a job interview where I was asked to rate my SQL Server experience on a scale of 1 to 10. Not wanting to put my foot in my mouth, I responded with a modest 6. So, the interviewer asked me what the different SQL Server 2008 isolation levels were. Yikes! That was not an easy question. Luckily, it just so happened that I had recently been reading High Performance MySQL, and even though it is an entirely different database, the concept of isolation levels remains mostly the same. I was able to wing it and get the job.
Although this question seems like just a “vocabulary” quiz that tests somebody’s ability to memorize, it can open up a good conversation and test a candidate’s understanding of databases, transactions, and concurrency. After the interview, I decided to brush up. After all, I don’t deal much with isolation levels on a daily basis at work. The Wikipedia article is a great place to start, but, being Wikipedia, it is not the easiest to understand. Here’s my attempt to dumb it down.
A database isolation level essentially dictates what data different transactions running concurrently on a database can see. For example, let’s say you have one transaction A running that does a SELECT * FROM dbo.Customers (never mind the fact that this is a horrible query to run). Let’s say transaction B is running simultaneously, and inserts a new customer into the database. Neither transaction has committed yet. Does transaction A return the newly inserted customer from transaction B? Well, the answer depends on the isolation level. Here are the ones defined in the ANSI SQL standard:
- Serializable – All transactions are isolated from each other. The system appears to behave as if all transactions execute serially. This doesn’t necessarily mean that they run one after the other. If two transactions don’t impact one another, some database systems run them concurrently. In our example, transaction A and transaction B cannot execute at the same time. So if A runs first, it does not see the insert. If B runs first, A sees the insert.
Needless to say, such a draconian isolation level is not good for performance. Isolation levels such as this one would probably need to be used in financial applications such as ATMs that require information to be as up to date and accurate as possible. When dealing with money, you don’t want race conditions.
- Repeatable read – Consider a transaction that runs the same query multiple times. In the repeatable read isolation level, the query will return the same rows every time. Essentially, this means that a read lock is acquired for every row touched by the transaction. Other transactions can’t update or delete those rows until the current transaction is done. Ironically enough, despite being called the “repeatable read” isolation level, it can’t quite guarantee the same result set every time.
This isolation level is susceptible to a phenomenon known as phantom reads. Let’s say that the query being executed in transaction A is “SELECT * FROM dbo.Customers WHERE FirstName BETWEEN 'A' AND 'C'”. Transaction B inserts a customer with a first name of “Bob”. Now transaction A runs the same query. The newly inserted “Bob” row shows up. This happens because no range locks are acquired. So although the individual rows themselves are locked, the range in the WHERE clause is not. If another transaction inserts a row that satisfies the WHERE clause, or updates a row so that it now does, the first transaction sees it. This is the main difference between the serializable and repeatable read isolation levels. With serializable, you never get phantom reads. But with repeatable read, you get slightly more flexibility and better performance.
- Read committed – This is the default isolation level of most databases. It does exactly what it sounds like. Any given transaction can see committed changes from other transactions. Write locks are held until the transaction commits, but read locks are released immediately (these read locks are acquired to ensure only committed data is read). Needless to say, nonrepeatable reads can occur, but the results will always be “valid”, since they contain the latest rows from committed transactions. This is a good default level, as it prevents dirty reads and dirty writes (two transactions overwriting each other’s uncommitted changes to the same row), while not stifling performance.
- Read uncommitted – Transactions can read uncommitted changes from other transactions. Everything is a dirty read. This is great for performance, as this results in far fewer locks, and works well with things where race conditions don’t really matter, like tweets or Facebook updates. For example, let’s say John decides to break up with Mary on some social networking website that implements stalker feeds. Transaction A inserts this into the database. Mary happens to log on at the same time, and Transaction B selects this event from the database and displays John’s updated status of being single on Mary’s home page. For whatever reason, transaction A fails and rolls back. Well, Mary sees the breakup, and not believing her eyes, refreshes her home page, and sees that the update is gone. Well, it’s not the end of the world, she’ll get news of the breakup sooner or later, and move on to someone better 🙂
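If you want to poke at the original question yourself (does transaction A see transaction B's uncommitted insert?), here is a minimal sketch using Python's built-in sqlite3 module. To be clear about the assumptions: SQLite is only a stand-in here, not SQL Server, and it does not offer the four ANSI levels as a setting. As far as I understand it, separate SQLite connections never see each other's uncommitted changes by default, so for this particular scenario it behaves the way read committed or stricter would. The function name and the single-column Customers table are made up for the illustration.

```python
import os
import sqlite3
import tempfile


def uncommitted_insert_visibility():
    """Return how many customers A counts before and after B commits."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None means the driver never opens transactions
        # implicitly, so every BEGIN and COMMIT below is explicit.
        conn_a = sqlite3.connect(path, isolation_level=None)
        conn_b = sqlite3.connect(path, isolation_level=None)
        try:
            conn_a.execute("CREATE TABLE Customers (FirstName TEXT)")
            conn_a.execute("INSERT INTO Customers VALUES ('Alice')")

            # Transaction B inserts a customer but does not commit yet.
            conn_b.execute("BEGIN")
            conn_b.execute("INSERT INTO Customers VALUES ('Bob')")

            count = "SELECT COUNT(*) FROM Customers"
            # fetchall() drains the cursor so A's read lock is released.
            before = conn_a.execute(count).fetchall()[0][0]
            conn_b.execute("COMMIT")
            after = conn_a.execute(count).fetchall()[0][0]
            return before, after
        finally:
            conn_a.close()
            conn_b.close()
```

Calling uncommitted_insert_visibility() gives two counts: the first is taken while Bob is still uncommitted and should not include him, the second is taken after the commit and should. Under read uncommitted on a database that supports it, the first count could already include Bob.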
To sum up, the higher the isolation level, the more locking overhead and risk of deadlocks. The lower the level, the better the performance, but the higher the risk of reading “invalid” data. Typically, the default level is good enough.
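As a cheat sheet, the four levels can be written down as a lookup of which read anomalies each one still allows. This is a small sketch based on the ANSI definitions described above, not on any particular product. Real databases sometimes give stronger guarantees than the standard requires at a given level. The names ALLOWED_ANOMALIES, LEVELS, and weakest_level_preventing are my own.

```python
# Read anomalies each ANSI isolation level still permits.
ALLOWED_ANOMALIES = {
    "read uncommitted": {"dirty read", "nonrepeatable read", "phantom read"},
    "read committed": {"nonrepeatable read", "phantom read"},
    "repeatable read": {"phantom read"},
    "serializable": set(),
}

# Ordered from weakest (least locking) to strongest (most locking).
LEVELS = ["read uncommitted", "read committed", "repeatable read", "serializable"]

KNOWN_ANOMALIES = ALLOWED_ANOMALIES["read uncommitted"]


def weakest_level_preventing(unwanted):
    """Return the lowest level that rules out every anomaly in `unwanted`."""
    unwanted = set(unwanted)
    unknown = unwanted - KNOWN_ANOMALIES
    if unknown:
        raise ValueError(f"unknown anomalies: {sorted(unknown)}")
    for level in LEVELS:
        # A level qualifies if it permits none of the unwanted anomalies.
        if not (ALLOWED_ANOMALIES[level] & unwanted):
            return level
    # Unreachable in practice: serializable permits no anomalies at all.
    raise AssertionError("no level found")
```

For example, weakest_level_preventing({"dirty read"}) returns "read committed", while asking it to rule out phantom reads pushes you all the way up to "serializable".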