SQL Replication Fun!

May 9, 2008

We use Microsoft SQL Server as the back end for our Retail Management & EPOS software. On most systems we use merge replication configured with pull subscriptions. Recently we’ve had a few instances of the replication mysteriously hanging on a merge - the first sign is usually the client calling to say their system is running very slowly. A quick look will show CPU usage at 100% with the replmerg process appearing to do a lot but actually doing very little! This could happen to one subscription or several at the one time - while remaining subscriptions operate fine.

 

Symptoms (you may get some or all of these)

  1. Subscriber machine slow & unresponsive
  2. ReplMerg.exe process takes 100% CPU time
  3. Agent process takes considerable CPU time on Publisher
  4. Replication “hangs” for long period
  5. Pretty much zero network traffic while it’s “working” away
  6. Query timeout messages at the subscriber repeating ever so often
  7. The following errors are found in replmerg.log on the subscriber :-
  8. Replprov.dll, 2008/05/07 09:32:57.479, 3964, 16890,  S1, ERROR: ErrNo = 0×8004501e, ErrSrc = <null>, ErrType = 8, ErrStr = The merge process was unable to create a new generation at the ‘Subscriber’. Troubleshoot by restarting the synchronization with verbose history logging and specify an output file to which to write.

    Replprov.dll, 2008/05/07 09:32:57.760, 3964, 16890,  S1, ERROR: ErrNo = 0×80045607, ErrSrc = <null>, ErrType = 8, ErrStr = The Merge Agent was unable to update information about the last synchronization at the Subscriber. Ensure that the subscription exists at the Subscriber, and restart the Merge Agent.

  9. Errors at the server include :-

The replication agent has not logged a progress message in 10 minutes. This might indicate an unresponsive agent or high system activity. Verify that records are being replicated to the destination and that connections to the Subscriber, Publisher, and Distributor are still active.

 

Cause

The cause is a known bug in SQL Server where it gets into an infinite loop when trying to run sp_MSmakegeneration. An update is supposedly due for release in July/August 2008 to resolve this problem - in the meantime a recommended solution is to set generation_leveling_threshold to 0 using the following SQL Code.

update sysmergepublications set generation_leveling_threshold = 0

This has worked for us most times - although sometimes it does seem to get reset back to the default 1000 and eventually will die again.

 

Other detail

Here’s a quote from the MSDN Forums which explains things a little :-

This basically tells the merge agent how many commands to put into a given generation.  This setting coupled with DownloadGenerationsPerBatch will tell the merge agent how much to download at a given time.  For example, if DownloadGenerationsPerBatch is set to 1, and generation_leveling_threshold is set to 100, and there are 10 generations to process (totalling 1000 changes), merge agent will sync 1 genreation at a time, and if it fails anywhere along the line, it will restart from the failed generation.  This is highly useful in scenarios with unreliable network as it gives the agent a chance to make progress amonst the many retries.  By setting it to 0, it will try to sync 1000 changes all at once, and if by the 999th change the network cuts out, merge agent will retry from the beginning all over again.
http://forums.microsoft.com/msdn/showpost.aspx?postid=1655111&siteid=1&sb=0&d=1&at=7&ft=11&tf=0&pageid=3

 

And some other references on the forums

http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1619326&SiteID=1

http://forums.microsoft.com/msdn/showpost.aspx?postid=1655111&siteid=1&sb=0&d=1&at=7&ft=11&tf=0&pageid=0

Leave a Reply