entia non sunt multiplicanda praeter necessitatem ("entities should not be multiplied beyond necessity")

All things being equal, the simple solution is probably right.  Or, if it looks like a duck, walks like a duck, and sounds like a duck, it ain't an emu.  Someone else put it something like "don't look for zebras when it's likely there's horses about".

Last week I posted, exasperated, about Reporting Services performance.  What was going on, in a nutshell, was that we were at a standstill in application performance, and though it had been getting progressively worse over time, it had made a giant leap into "shitting the bed" almost overnight.  On Tuesday, we thought it might be SSRS, so we turned it off.  Things got better, but quickly degraded over the course of the day, and the system was again unusable by Wednesday morning.  I went hunting for zebras.  I churned through a ton of code looking for places to refactor data access.  I didn't find much.  By Wednesday night, I was desperate, and I turned to what I should have been looking at all along.  The database.  That led me to study and implement indexing, as discussed here.  This led to significant and immediate performance gains.  Problem solved.

What does this have to do with Occam's razor?  Sunday we upgraded to a new database server.  We went from a dual core to a quad core with almost double the RAM.  We went from SQL 2000 to SQL 2005.  These things should have made for a marked performance improvement, or, at the very least, stasis.  Not degradation.  Nothing else changed.  But I spent 2 days trying to convince myself that it was some unknown force killing my software.  The long and short of it is that after restoring the database to a different SQL engine on a different box (not sure which caused it, tbh), all, and I mean pretty much all, of our clustered indexes were 90% fragmented.  Rebuilding those indexes on the 10 most heavily trafficked tables on Thursday made things as good as, or slightly better than, they had been the prior week.  The new Reporting Services setup hadn't been configured to set timeouts, and timeouts were happening at 30 minutes.  This was the problem we were seeing with SSRS.  Resetting those timeouts and the caching of the reports fixed that problem.  In essence, the 2 things that changed were the causes of our issue.  So why waste time looking elsewhere?  I don't know.
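
For anyone facing the same index cleanup, here is a rough sketch of how the rebuild list could be put together.  It is not the script we ran.  It assumes you have already pulled fragmentation percentages per index (on SQL 2005 the sys.dm_db_index_physical_stats function reports avg_fragmentation_in_percent), and it only generates ALTER INDEX statements for you to review.  The 5% and 30% cutoffs are a commonly cited rule of thumb, not something we measured, and the table and index names are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexStat:
    table: str                # hypothetical table name
    index: str                # hypothetical index name
    fragmentation_pct: float  # e.g. avg_fragmentation_in_percent


def maintenance_statement(stat: IndexStat,
                          reorganize_at: float = 5.0,
                          rebuild_at: float = 30.0) -> Optional[str]:
    """Return an ALTER INDEX statement, or None if the index is fine.

    The thresholds are assumptions (a common rule of thumb); tune them.
    """
    if stat.fragmentation_pct >= rebuild_at:
        action = "REBUILD"
    elif stat.fragmentation_pct >= reorganize_at:
        action = "REORGANIZE"
    else:
        return None
    return f"ALTER INDEX [{stat.index}] ON [{stat.table}] {action};"


def plan(stats: List[IndexStat]) -> List[str]:
    """List maintenance statements, most fragmented index first."""
    worst_first = sorted(stats, key=lambda s: s.fragmentation_pct, reverse=True)
    statements = []
    for stat in worst_first:
        sql = maintenance_statement(stat)
        if sql is not None:
            statements.append(sql)
    return statements


if __name__ == "__main__":
    sample = [
        IndexStat("Customers", "PK_Customers", 12.5),
        IndexStat("Orders", "PK_Orders", 90.0),
        IndexStat("Lookup", "PK_Lookup", 1.0),
    ]
    for line in plan(sample):
        print(line)
```

Sorting worst-first matches what we did by hand: hit the most fragmented, most heavily used tables first and see whether that alone gets you back to normal.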

How many times have you or someone on your team come up against a bug that just came into existence out of nowhere?  The first question is always "what changed?".  Logical enough.  How many times have you or that other developer convinced yourself that the change you made was totally unrelated and couldn't have been the cause of the new bug, only to find out after a few hours of looking for zebras that it was the changed code all along?  This happens all the time to new developers, but it happens to us veterans as well, probably way more than it should.

So, this is a memo to me as much as any of you.  The simplest solution is probably the right one.  Start there and save yourself some hassle.
