SQL Server Replication, Push vs. Pull. What is the difference?

Slalom Consultant David Van De Somple

Slalom Consultant David Van De Sompele's expertise includes performance tuning, production DBA activities, data modelling, ETL processes and reporting. He's based in Slalom's Seattle office.

Replication is a very useful method of copying data from production systems to standby servers, reporting servers, and downstream data relay points.  Unfortunately, it is an often misunderstood and/or overlooked technology, even among experienced DBAs.  In my professional career, I have often seen replication misused in an attempt to achieve a goal for which it was not intended.  When it is determined that replication is the correct solution (making that determination will be the subject of a future article), you must install and configure three basic components (these are extremely simplified definitions):

  1. Publisher:  This is the source database or databases containing the information you want to replicate.
  2. Distributor:  This is the database and set of jobs responsible for queuing replicated data from the publisher.
  3. Subscriber:  This is the destination database or databases for data coming from the publisher.

Push and Pull, as named in the title of this article,are the two methods available for moving data from the Distributor to the Subscriber(s).  Under the Push method, the Distributor is responsible for queuing data from the Publisher,then propagating it to the Subscriber(s).  Under the Pull method, the Distributor is responsible for queuing data from the Publisher, and it is the job of each Subscriber to connect to the Distributor and grab all queued data ready for replication.  Selecting the incorrect method can lead to serious performance issues, especially at peak times of database use.

In my years of experience troubleshooting replication problems in production database environments I have discovered a common denominator:  No thought, consideration or understanding of Push and Pull was given when the replication topology was designed and implemented.  This leads to a highly disorganized and chaotic environment, featuring a random blend of Push and Pull methods and much consternation.  Interviewing the designers of such systems about why some subscriptions are Pushed, and others Pulled, has yielded responses such as: “I don’t know.” “What is Push/Pull?” and, “Does it really matter?”  Choosing whether to use Push or Pull methods for your replication topology matters a great deal. There are two key factors to consider when making this determination:

  1. What kind of load is expected on the Publisher? For example, are the published databases serving as the foundation for a high-traffic website?  My current client provides online web advertising engines which serve ads to web sites and capture anywhere from thousand to millions of mouse clicks and ad impressions per hour; all loaded into a set of very busy databases that are published for replication.
  2. On which server will the Distributor reside? Distribution can be configured to run on the Publisher, or it can be configured to run remotely on its own server. Determining the amount of load on your Publisher will often answer the question of where your Distributor should reside.

Consider the following example scenarios based upon the key factors I just mentioned; and these are by no means exhaustive:

1. Your Publisher receives millions of records per day in a high-transaction environment. Running Distribution remotely would be ideal, but budgetary constraints have forced you to run Distribution on the Publisher.

Solution: Strongly consider using the Pull method.  The server running Publishing and Distribution will be under heavy load so place the burden of receiving replicated data on the Subscriber.

2. Your Publisher is moderately busy. Distribution is running remotely.  Your Subscriber is a highly-used reporting system with heavy disk I/O and periods of high CPU usage.

Solution: Strongly consider using the Push method.  A heavy reporting load on the Subscriber need not be further complicated by forcing the Subscriber to Pull replicated data from the Distributor.  Let the Distributor handle that task.

Selecting the correct method (Push vs. Pull) to move data from the Distributor to the Subscriber(s) is a key factor in deploying a successful replication topology.  It is worth spending some extra time in the planning stages to work out the method you will use.  Your client, end users and DBAs will thank you in the end.

subscribe by emailSubscribe to be emailed about new Enterprise Platforms posts.

About David Van De Sompele
Slalom Consultant David Van De Sompele's expertise includes performance tuning, production DBA activities, data modelling, ETL processes and reporting. He's based in Slalom's Seattle office.

15 Responses to SQL Server Replication, Push vs. Pull. What is the difference?

  1. jayant. dass says:

    Hi David,

    It’s a very goog article for difference between pull and push replication.

    Thanks a lot

    Regards
    Jayant dass

  2. Maruf says:

    Hi David,

    Very nice article, but there are some confusion regarding the same as u said for Pull (and it is the job of each Subscriber to connect to the Distributor and grab all queued data ready for replication. ) how the performance will increase in the Pull (Your Subscriber is a highly-used reporting system with heavy disk I/O and periods of high CPU usage.) as the agent will run from the sub more I/O as the agents are also running from sub to fectch the data.

  3. d41k says:

    Nice article! Fairly clear!
    Thanks you very much.

  4. r2 says:

    Hi David,
    Thanks for the helpful article. Could another reason for considering Push rather than Pull be firewall rules between the Publisher and the Subscriber. Suppose the firewall allows only inbound connections to the subscriber, would Push work? Does the Subscriber need to make an outbound connection to the Publisher?

  5. Kesav says:

    Nice article…. :)

  6. Shadi says:

    Hi David
    Thank you so much.your article is very helpful for difference between pull and push.

  7. Scott says:

    Good article for someone who is deciding how to build there first replicated environment

  8. kirit says:

    How to know data transfer from one server to another server while using replication?
    Many people ask me this question?

  9. Pingback: #DailyDigital What’s the difference between a Push Pixel and a Pull Pixel? | Trey Peden

  10. Sapen says:

    what if due to budgetary constraints distributor, publisher and subscriber are on the same server and the load on publisher is a minimum of million records inserted into 10 tables atleast
    ?

  11. Victor says:

    Bit of a over-simplified and naïve article. Fairly useless….

    For example:
    1. It doesn’t discuss security implications for pull versus push models
    2. It doesn’t discuss impact on High-Availability architecture
    3. It doesn’t discuss impact of where the Agents run

    With your Scenario 2 I would rather impact my Reporting environment rather than my Line-of-Business environment.

    Plus, you still have to deliver the transactions to the Subscriber in any case in real-time in Pull versus Push. (Which will potentially produce contention, etc.)

    Performance “as you describe it” is hardly a reason to choose Pull versus Push model in most cases. Just buy the extra RAM and be done with it.

    The “bottom-line” is that “performance” is probably the least important in a list of 3-5 considerations when choosing Pull versus Push replication models…

    • Victor, Thank you for your comments. This blog post was never intended to be a comprehensive guide to designing and implementing replication topology; there are plenty of well-written and detailed guides for that.

      With your Scenario 2 I would rather impact my Reporting environment rather than my Line-of-Business environment. And that is your prerogative. I have worked with several replication environments in which the Line-of-Business spans the Publisher and the Subscriber-not my design. And in those cases performance matters a great deal and is absolutely a reason for ensuring you select the correct model (Push or Pull) for the application. Extensive testing against both models has revealed this.

      Just by the extra RAM and be done with it. This is assuming that: 1. The server(s) have capacity to do so. 2. Your client has budget for additional hardware purchases. 3. Adding RAM will actually solve the problem. For example, does Performance Monitor show that memory pressure exists? Could it be that an under-specd disk subsystem is a more likely culprit for performance issues?

      The bottom-line is that performance is probably the least important in a list of 3-5 considerations when choosing Pull versus Push replication models I disagree. Due to budgetary constraints you may have to work with older hardware that is not ideally suited for the demands of SQL Server. Application(s) may be poorly-written and inefficient in their use of database resources. Data consumers will have certain performance requirements. In these instances, to get the best performance with the situation you have been given, choosing the correct replication topology is highly important. Again, extensive testing has often revealed that one model (Push or Pull) is better suited for a specific environment. But as you mentioned, and Ill agree with you on this, there are other factors, like security and high-availability, to consider. A discussion of those topics was simply beyond the scope of this particular blog post.

      ________________________________________

  12. Hey there just wanted to give you a quick heads up and let you know a few of the
    images aren’t loading correctly. I’m not sure why but I think
    its a linking issue. I’ve tried it in two different web browsers and both show the same outcome.

  13. Pingback: Replication | nagarjunarajus

  14. Amith says:

    Hi, Its very useful and best explanation between push and pull. Thanks a lot.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 130 other followers

%d bloggers like this: