Snapshot getting applied repeatedly

Recently I was faced with an interesting issue where upon rerunning distributor agent, complete snapshot was being applied even though we had got “snapshot has been delivered successfully” message. The activity was about adding newly created tables into the replication.

Below were high level steps followed for the activity.

  1. Set allow_anonymous and immediate_sync to FALSE for publication  and stop log reader and distribution agent jobs.
  2. Add newly created tables in replication using “sp_addarticle” commands.
  3. Refresh the subscription
  4. Confirm through script that snapshot will be generated for newly created tables only.
  5. Start Snapshot Agent using the Replication monitor. You will notice that bulk-insert statements are created only for the specific article instead of all articles.
  6. Start log reader and distribution agent jobs.
  7. Re-enable the disabled properties, first, immediate_sync and then Allow_anonymous options

After step 5, once we got the message for successful snapshot delivery, DBA started the log reader and distributer agent jobs and noticed that snapshot started to get applied again. I was bewildered on this beahviour and wanted to know the reason behind this to prevent this in future.  When I checked the currently running processes on the subscriber server, I found that  replication process was creating non-clustered indexes under a  SPID different from  SPID of distribution agent job which was in sleeping state waiting for nonclusted index creation to be completed.

On further research I found that as per article https://repltalk.com/2019/02/24/replicating-non-clustered-indexes-improves-subscriber-query-performance/?unapproved=128814&moderation-hash=d5c8f70a771a2bffe87543511005be8e#comment-128814 , by default “Copy nonclustered index” setting remains off  but for all newly added articles it was set to TRUE which caused ~1500 non clustered indexes to be created at subscriber in ~120 minutes.
 
Snapshot take cares of cluster index creation during snapshot delivery and same was mentioned in messages of distribution agent logs but it did mention about creation of non-cluster indexes. Although non clustered index creation was  part of snapshot application but they were being created in background after message of snapshot delivery completion.  As we got the message for successful snapshot delivery completion, we were under impression that now undistributed commands should come to zero as replication has been synchronized but it was not the case and we got below error logged in distribution agent.

The replication agent has not logged a progress message in 10 minutes. This might indicate an unresponsive agent or high system activity. Verify that records are being replicated to the destination and that connections to the Subscriber, Publisher, and Distributor are still active.


Hence DBA decided to rerun the distribution agent which resulted into reapplication of snapshot.

When we discovered that non-clustered index creation was going on subscriber database after snapshot delivery, we waited for process to complete for approx 2  hours and undistributed commands became zero only when all non-clustered indexes were created successfully on the subscriber database. 

I would encourage to visit https://wordsontech.com/understanding-files-created-by-replication-snapshot-agent/ for more understanding regarding snapshot agent.

In summary, we should ensure that SPID running against snaphot agent is complete and should not rely solely on snapshot delivery completion message to start/rerun the distribution agent.

Hope this will help you!

Happy learning!!