Harin Sandhoo's Blog

July 19, 2009

SharePoint Workflow Deadlocks

Filed under: SharePoint — Tags: , , — Harin @ 11:51 pm

I’ve been working on an application built on MOSS that is extremely workflow intensive.  The application has a series of infopath forms that progress through a custom approval process workflow we developed using Visual Studio 2005.  After a couple weeks, when we were getting about 1000 forms going through this workflow a day, we noticed that we were getting about 10 or so that would just fail inexplicably.  When we terminated and restarted the workflows, they went through the approval process with no issues.

We then started to really dig into our diagnostics.  We checked our custom logging, the SharePoint logs, Event Logs, the workflow trace logs, SQL Server logs, and all the performance counters we could think of.  We noticed that under load we were seeing a number of lock escalations and periodic deadlock errors on our SQL profiler trace and sure enough, if the WF runtime process hosted inside MOSS was the one terminated by SQL we would see a corresponding entry in our workflow runtime trace file. [1

We also saw quite a few different processes involved in our deadlock graphs, but they all could be traced back to the proc_UpdateListItemWorkflowInstanceData stored procedure in the content db, which is called when the workflows start and at each persist point.

So, we did the next logical thing and tried to reproduce the error in our testing environment so we could troubleshoot and deploy a fix.  The only thing was, we couldn’t reproduce it.  It turns out, because we were performing functional testing on a virtualized environment there was never enough load put on the SQL box to see the problem.  We had to pave a machine and install a native 64 bit SQL Server 2005 Enterprise Edition to see it.  Our infrastructure was set up in a large farm topology with 2 WFEs, 1 Index, 1 Search, and a 2 Node active-passive clustered SQL Server 2005 instance.  All the MOSS boxes in production were 32 bit and the SQL nodes were 64 bit, and our testing environment was virtualized so all 32 bit.

I’ll spare you the details of the trial and error process we went through to track these down, but to get rid of most of our deadlocks we ultimately had to:

1. Ensure our lists do not contain a large number of items.
2. Set the maximum degree of parallelism on our SQL box to 1 (MAXDOP=1)
3. Periodically run DBCC FREEPROCCACHE, especially after the weekly SharePoint update statistics timer job is run.

[1] Error in the WF Trace File:
System.Workflow.Runtime.Hosting Error: 0 : DefaultWorkflowCommitWorkBatchService caught exception from commitWorkBatchCallback: System.Data.SqlClient.SqlException: Transaction (Process ID 152) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection)
   at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj)
   at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
   at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
   at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async)
   at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, DbAsyncResult result)
   at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(DbAsyncResult result, String methodName, Boolean sendToPipe)
   at System.Data.SqlClient.SqlCommand.ExecuteNonQuery()
   at Microsoft.SharePoint.Utilities.SqlSession.ExecuteNonQuery(SqlCommand command)
   at Microsoft.SharePoint.Workflow.SPWorkflowManager.SaveInstanceData(Guid trackingId, Stream instanceStream, DateTime wakeupTime, Boolean workflowCompleted, Boolean workflowSuspended, Boolean workflowFaulting, Boolean workflowTerminated, Boolean workflowCanceled, Boolean unlockInstance)
   at Microsoft.SharePoint.Workflow.SPWorkflowHostServiceBase.SaveInstanceData(Guid instanceId, Stream instanceStream, DateTime wakeupTime, Boolean bWorkflowCompleted, Boolean bWorkflowSuspended, Boolean bWorkflowFaulting, Boolean bWorkflowTerminated, Boolean bWorkflowCanceled, Boolean unlockInstance)
   at Microsoft.SharePoint.Workflow.SPWinOePersistenceService.Commit(Transaction transaction, ICollection items)
   at System.Workflow.Runtime.WorkBatch.PendingWorkCollection.Commit(Transaction transaction)
   at System.Workflow.Runtime.WorkBatch.Commit(Transaction transaction)
   at System.Workflow.Runtime.VolatileResourceManager.Commit()
   at System.Workflow.Runtime.WorkflowExecutor.DoResourceManagerCommit()
   at System.Workflow.Runtime.Hosting.WorkflowCommitWorkBatchService.CommitWorkBatch(CommitWorkBatchCallback commitWorkBatchCallback)
   at System.Workflow.Runtime.Hosting.DefaultWorkflowCommitWorkBatchService.CommitWorkBatch(CommitWorkBatchCallback commitWorkBatchCallback)

System.Workflow.Runtime Error: 0 : Workflow Runtime: WorkflowExecutor: Persist attempt on instance ‘1d973253-8b24-4270-9065-4f3923e87374’ threw an exception ‘Transaction (Process ID 152) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.’ 
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection)
   at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj)
   at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
   at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
   at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async)
   at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, DbAsyncResult result)
   at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(DbAsyncResult result, String methodName, Boolean sendToPipe)
   at System.Data.SqlClient.SqlCommand.ExecuteNonQuery()
   at Microsoft.SharePoint.Utilities.SqlSession.ExecuteNonQuery(SqlCommand command)
   at Microsoft.SharePoint.Workflow.SPWorkflowManager.SaveInstanceData(Guid trackingId, Stream instanceStream, DateTime wakeupTime, Boolean workflowCompleted, Boolean workflowSuspended, Boolean workflowFaulting, Boolean workflowTerminated, Boolean workflowCanceled, Boolean unlockInstance)
   at Microsoft.SharePoint.Workflow.SPWorkflowHostServiceBase.SaveInstanceData(Guid instanceId, Stream instanceStream, DateTime wakeupTime, Boolean bWorkflowCompleted, Boolean bWorkflowSuspended, Boolean bWorkflowFaulting, Boolean bWorkflowTerminated, Boolean bWorkflowCanceled, Boolean unlockInstance)
   at Microsoft.SharePoint.Workflow.SPWinOePersistenceService.Commit(Transaction transaction, ICollection items)
   at System.Workflow.Runtime.WorkBatch.PendingWorkCollection.Commit(Transaction transaction)
   at System.Workflow.Runtime.WorkBatch.Commit(Transaction transaction)
   at System.Workflow.Runtime.VolatileResourceManager.Commit()
   at System.Workflow.Runtime.WorkflowExecutor.DoResourceManagerCommit()
   at System.Workflow.Runtime.Hosting.WorkflowCommitWorkBatchService.CommitWorkBatch(CommitWorkBatchCallback commitWorkBatchCallback)
   at System.Workflow.Runtime.Hosting.DefaultWorkflowCommitWorkBatchService.CommitWorkBatch(CommitWorkBatchCallback commitWorkBatchCallback)
   at System.Workflow.Runtime.WorkflowExecutor.CommitTransaction(Activity activityContext)
   at System.Workflow.Runtime.WorkflowExecutor.Persist(Activity dynamicActivity, Boolean unlock, Boolean needsCompensation)

Advertisements

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: