Monday, June 9, 2008

Multi-site Development and the Write-Thru Proxy

Subversion 1.5 will introduce, along with other notable new features such as built-in merge tracking, the WebDAV write-thru proxy, which simplifies the use of svnsync in Subversion deployments built around Apache 2.2.x.

Prior to Subversion 1.5, users had to manually repoint their working copy at the master server with the “svn switch --relocate” command whenever they executed a commit or other write transaction. The WebDAV write-thru proxy will now detect when a commit or other write command has been issued by a client connected to a slave repository, and will automatically pass that request through to the master server. This should make life somewhat easier for end users, and help prevent unintended writes against slave repositories from leading to split-brain scenarios that can be difficult to recover from.
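To make that routing decision concrete, here is a minimal Python sketch of the idea. It assumes a simple split between read-only and write-oriented WebDAV methods. The two method sets and the `pick_backend` function are illustrative assumptions, not the actual logic inside Apache or mod_dav_svn.

```python
# Illustrative only: a toy model of how a write-thru proxy might decide
# which server handles a request. The method sets below are assumptions.

READ_METHODS = {"GET", "HEAD", "OPTIONS", "PROPFIND", "REPORT"}
WRITE_METHODS = {"PUT", "DELETE", "MKCOL", "MKACTIVITY", "CHECKOUT",
                 "PROPPATCH", "MERGE", "COPY", "MOVE", "LOCK", "UNLOCK"}


def pick_backend(method: str) -> str:
    """Return "slave" for read requests and "master" for write requests."""
    method = method.upper()
    if method in READ_METHODS:
        return "slave"
    if method in WRITE_METHODS:
        return "master"
    # Unknown methods go to the master so the slave is never modified directly.
    return "master"


if __name__ == "__main__":
    for m in ("PROPFIND", "REPORT", "MKACTIVITY", "PUT", "MERGE"):
        print(f"{m:<10} -> {pick_backend(m)}")
```

The point of the sketch is that the client keeps a single URL: reads stay local, and only write traffic crosses the WAN to the master.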

However, the WebDAV write-thru proxy leaves svnsync’s master-slave architecture unchanged. While svnsync does offer the advantage of local reads, eliminating WAN traffic that would otherwise take place between a remote client and a central Subversion server, writes only happen on the master. Thus, the master repository can become a single point of failure for write transactions. In addition, the lag between successive replications of the master repository can result in users at remote sites checking out stale copies of source code files from their local slave. This in turn can lead to out-of-date conflicts when changes based on those stale copies are committed against the master. If the replication process fails due to network outages or server crashes, there are no built-in recovery capabilities.
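The stale-copy problem is easier to see in a small simulation. The following Python sketch is a toy model only: the `Master` and `Slave` classes, their methods, and the file path are hypothetical, and the out-of-date check is a simplified stand-in for what a Subversion server does at commit time.

```python
# Toy model of a master repository with a lagging read-only mirror.
# All class and method names here are hypothetical.

class Master:
    def __init__(self):
        self.head = 0
        self.last_changed = {}  # path -> revision that last changed it

    def commit(self, path: str, base_rev: int) -> int:
        """Accept the commit only if path is unchanged since base_rev."""
        if self.last_changed.get(path, 0) > base_rev:
            raise RuntimeError(f"{path} is out of date (base r{base_rev})")
        self.head += 1
        self.last_changed[path] = self.head
        return self.head


class Slave:
    def __init__(self, master: Master):
        self.master = master
        self.head = 0

    def sync(self) -> None:
        """Catch up with the master, as a periodic replication run would."""
        self.head = self.master.head

    def checkout(self) -> int:
        """Return the revision a local checkout would be based on."""
        return self.head


if __name__ == "__main__":
    master = Master()
    slave = Slave(master)
    master.commit("trunk/main.c", base_rev=0)  # creates r1
    slave.sync()                               # slave is now at r1
    master.commit("trunk/main.c", base_rev=1)  # creates r2, not yet replicated
    stale_base = slave.checkout()              # remote user still sees r1
    try:
        master.commit("trunk/main.c", base_rev=stale_base)
    except RuntimeError as err:
        print("commit rejected:", err)
```

The remote user did nothing wrong here. The working copy was simply based on a slave that had not yet received the latest revision, so the master rejects the commit until the user updates.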

In contrast, WANdisco’s Subversion MultiSite turns every Subversion repository into a peer of every other, and every repository is readable as well as writeable for the entire code base. Replication is triggered automatically when a write operation is done at any location, and transactional consistency is guaranteed across all of the repositories. Self-healing capabilities are provided to automate the recovery process after a network outage or server crash, and prevent any data loss.

Although both svnsync and Subversion MultiSite support the WebDAV HTTP protocol, Subversion MultiSite only uses this protocol over a LAN. Over a WAN, WANdisco’s own optimized protocol is used on top of TCP/IP. The result is that commits consisting of hundreds of files are sent in a single pass during replication with Subversion MultiSite, rather than one-by-one as a series of HTTP PUT requests, as is the case with svnsync. This enables Subversion MultiSite to deliver a significant performance boost over a wide area network.
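A back-of-the-envelope latency model shows why the number of round trips matters over a WAN. The Python sketch below is only arithmetic under stated assumptions: it treats each per-file request as one full round trip, ignores bandwidth, pipelining and connection reuse, and uses a made-up round-trip time and file count. It is not a benchmark of either product.

```python
# Rough latency model: total wait time = round trips x round-trip time.
# The numbers used in the example are hypothetical, not measurements.

def latency_ms(round_trips: int, rtt_ms: float) -> float:
    """Time spent waiting on the network for a given number of round trips."""
    return round_trips * rtt_ms


def per_file_latency_ms(file_count: int, rtt_ms: float) -> float:
    """Assumes one request/response exchange per file in the commit."""
    return latency_ms(file_count, rtt_ms)


def single_pass_latency_ms(rtt_ms: float) -> float:
    """Assumes the whole commit is shipped in a single exchange."""
    return latency_ms(1, rtt_ms)


if __name__ == "__main__":
    rtt = 150.0  # assumed WAN round-trip time in milliseconds
    files = 300  # assumed number of files in one commit
    print("per-file   :", per_file_latency_ms(files, rtt), "ms")
    print("single pass:", single_pass_latency_ms(rtt), "ms")
```

Under these assumptions the gap grows linearly with the number of files in the commit, which is why large commits are where a single-pass transfer would be expected to help most.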

With svnsync, user information, including access privileges, must be maintained consistently across all of the servers, and there are no built-in features to support this. When WANdisco’s Subversion Access Control solution is implemented with Subversion MultiSite, the security configuration is replicated automatically when it’s initially set up, as are any changes, ensuring consistency across all of the servers.
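For teams maintaining access files by hand across several svnsync servers, one simple safeguard is to compare a digest of each server's configuration against a reference copy. The Python sketch below assumes the file contents have already been collected into a dictionary; the `find_drift` function, the server names and the sample rules are all hypothetical.

```python
import hashlib
from typing import Dict, List

# Illustrative drift check: flag servers whose access configuration differs
# from a chosen reference server. Names and sample data are hypothetical.


def digest(content: str) -> str:
    """Return a SHA-256 hex digest of the configuration text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def find_drift(configs: Dict[str, str], reference: str) -> List[str]:
    """Return the sorted names of servers that differ from the reference."""
    expected = digest(configs[reference])
    return sorted(name for name, content in configs.items()
                  if digest(content) != expected)


if __name__ == "__main__":
    configs = {
        "master": "[/trunk]\nalice = rw\nbob = r\n",
        "slave-a": "[/trunk]\nalice = rw\nbob = r\n",
        "slave-b": "[/trunk]\nalice = rw\nbob = rw\n",  # edited by hand
    }
    print("out of sync:", find_drift(configs, reference="master"))
```

A check like this only detects drift after the fact. It does not replicate changes, which is the gap the paragraph above describes.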


To learn more, check out Subversion MultiSite.

4 comments:

Jnatividad said...

Hhmmm... svnsync over HTTP will indeed send a lot of PUTs over TCP/IP. But is that the case with svnsync using the SVN protocol via svnserve?

How exactly is Wandisco's "optimized" protocol different from the SVN protocol?

SubversionMan said...

Svnsync with the SVN RA protocol (the SVN protocol using svnserve) still requires up to six WAN round trips for each write operation between a remote developer’s Subversion client and the master Subversion server, versus just one WAN round trip with WANdisco. These WAN round trips using the SVN RA protocol are required to open the connection, authenticate the user, execute the write command on the master server, and close the connection. Once the write is done on the master, svnsync will then re-execute the write on each of the read-only slaves, including the slave repository at the site where the write request originated. In WANdisco’s case, because of its write-anywhere, active-active replication approach, the commit is already written to the repository at the site where the write request originated. This eliminates an additional set of WAN round trips that would be required by svnsync.

Over a LAN between a Subversion client and the local server, WANdisco uses the standard Subversion protocols (SVN RA or WebDAV). WANdisco’s protocol over the WAN is unique in that it is designed to support WANdisco’s write-anywhere, active-active replication approach. WANdisco’s protocol works over TCP/IP, so nothing needs to change from a network configuration perspective.

Jnatividad said...

From internal benchmarking, I know firsthand that SVN RA is about 2x faster than HTTP (WebDAV). This is because unlike HTTP, which is stateless, SVN RA is tailor-made for svnserve.

How much faster is Wandisco's protocol?

It would REALLY go a long way if Wandisco can give some benchmarks... I actually think it will make your jobs a lot easier, especially since you're marketing to a tech-savvy audience.

Regardless, even if the Wandisco protocol is not that much faster than SVN RA, the peer-to-peer stuff and transparent replication is still hard to beat.

Maybe in these same comparison benchmarks, you can even go thru some failure scenarios vis-a-vis a pure write-thru proxy setup.

And then, not only show Wandisco SVN Multisite is faster, but show how much more robust and fault-tolerant it is.

SubversionMan said...

JV


We're certainly not trying to hide anything. In fact, quite the opposite. We are in the process of setting up a benchmark with the Subversion community now that 1.5 is out, and we'll be happy to publish the results when they become available.

Also, as I noted in my original response to you, svnsync with the SVN RA protocol (the SVN protocol using svnserve) still requires up to six WAN round trips for each write operation between a remote developer’s Subversion client and the master Subversion server, versus just one WAN round trip with WANdisco. These WAN round trips using the SVN RA protocol are required to open the connection, authenticate the user, execute the write command on the master server, and close the connection. Once the write is done on the master, svnsync will then re-execute the write on each of the read-only slaves, including the slave repository at the site where the write request originated. In WANdisco’s case, because of its write-anywhere, active-active replication approach, the commit is already written to the repository at the site where the write request originated. This eliminates an additional set of WAN round trips that would be required by svnsync.

Thanks for your comments and interest.