Recently, I had a database for a customer that has the usual plethora of several hundred stored procedures, nicely normlized schema, and a gob of indexes and statistics to boot for optimized searching. 

The customer wanted a small change to a stored procedure that would return a new field to the UI. Dutifully, I went about modifying the usual things - UI, Biz Logic Layer, Model updates, stored proc, table alter, etc. Well, after making this and about 30 other small changes in the development environment, we push to the QA server for more testing. After banging away and doing a quick performance check, everyone gives a thumbs up. So, over the weekend,  we do the hotfix push to production.

So, on 8:30AM on Monday, the performance on SQL Server went totally south! Oh the horror! I mean, way south, to the point that the server was completely sluggish.

At this point, I am totally baffled. Did Windows update do something nasty? No. Is there some crazy memory issue on the server? 16GBs, 9 free. Dang. That's not it. Did SQL Server's priority get bumped down accidentally by the Admin? Nope. I'm screwed and will have to back out the build, I think.

Then I look at the process manager and see that SQL Server is just spinning wildly, using all CPU cores. But it's I/O activity shows that it is doing virtually nothing- nada. Wait. Could I have possibly screwed this up by adding just one little dinky field? In short, yes. I suspect that I have a slew of deadlocks or at least a bunch of backed up locks. So, I whip out sp_lock2 (find it here: http://support.microsoft.com/kb/255596 ).

sp_lock2 is an extended version of sp_lock. Essentially, it tells you what the heck Sql Server is doing in terms of traffic for execution requests. Here is a sample output for sp_lock2:

It turns out that the very table I modified got a 120 million row import over the weekend too. Well, that's not really that big a deal, it's basically a write / audit table for details in a complex transaction, computing profit margins, targets, and such. Not rocket science. Except, yours truly didn't realize that the stored proc using this table was being hit concurrently by 100+ users and a new SQL job!

So, the additional stress caused by the long running job surfaced the weaknesses in the stored proc. It turns out that the scan to delete the existing computations and then insert updated ones was causing a gazillion table lock requests. This effectively crippled the server. The problem was solved by adding a statistical index to the new column I added and voila! Problem solved.

Normally, you will only see a few hundred rows or even a couple of thousand requests at a time, even with a busy SQL server instance. Sp_lock2 returned the app's SPID along with a backlog of 10,000 locks in my case! By looking up the object_id(objid) returned by sp_lock, I quickly zeroed in on the offending functionality. It was the stored procedure's attempt to delete based on a missing constraint (causing  a full table scan of 120+ million rows) and the lack of statistics on my secondary join on the new column. Short summary - performance was abysmal.

Without sp_lock2, it would have taken me forever to find the problem. Next time, I'll know what to do right away! This is what I like to see on a SQL Server!

 

 

 

 

 


Posted by: ezekiel.brooks
Posted on: 9/3/2009 at 11:36 PM
Tags:
Categories: Tips And Tricks
Actions: E-mail | Kick it! | DZone it! | del.icio.us
Post Information: Permalink | Comments (0) | Post RSSRSS comment feed
 
 

Just a tip. The next time you have a very sluggish ASP.NET application to improve, you can pick up an easy win just by convincing your customer to use a fast rendering web browser. 

I have an older ASP.NET application that I am slowly migrating to Silverlight. There were a couple of terribly slow pages where it has a WYSIWYG render of a large electronic audit for a customer's store(s). For typical audits, there might be as many as 10,000 items grouped by categories and type. This would have been a great job for a disconnected fat client, but not good for the rest of the application. A web based solution was chosen for the usual reasons, like troublesome click-once deployment issues that dog corporate networks, etc. A great fit for Silverlight, but we're not there yet in terms of upfitting the entire portal.

So, the project manager tells me that a few of the pages are horrendous on some of the large store audits. Paging is a big negative because of the users' work flows. Best solution for this scenario, Silverlight, is not for a few months. What do I do?

I decide to give Chrome a shot to see if it can do better than IE7. Shazaam! Page rendering and loading goes from 5 minutes to 60 seconds on a 5,000 item audit report. Unbelievably easy speed improvement. 

Yes, there is some tweaking for frames and layout, but it's an easy win. I don't say this often, but kudos to Google Chrome! Download your copy and buy yourself some time to avoid major refactoring when there isn't time for it!

 

 


Posted by: ezekiel.brooks
Posted on: 8/11/2009 at 8:43 PM
Tags:
Categories: Tips And Tricks
Actions: E-mail | Kick it! | DZone it! | del.icio.us
Post Information: Permalink | Comments (0) | Post RSSRSS comment feed