Sunday, August 5, 2007

Get the WAN Out of the Way

Virtually every approach to globally distributed, multi-site development using Subversion leaves the WAN in the way of developer productivity. In this post, I’ll explain why this is the case and how it can be dealt with.

The WAN performance that developers experience results from a combination of two factors: (1) the number of WAN round trips required to complete a write operation between a Subversion client and a master server, and (2) the available throughput on the network.

I’ll focus on write operations over the WAN, such as commits, because master-slave solutions like svnsync already let developers perform checkouts, updates, and other read operations locally without generating WAN traffic. Even so, read operations over the WAN may still be required with master-slave solutions like svnsync. To understand why, see my earlier post, “Keeping Multiple Subversion Repositories in Sync.”

Let’s examine the first factor, the number of WAN round trips. With each Subversion commit over the native svn protocol (ra_svn, i.e. svnserve without Apache), up to six WAN round trips take place between a remote developer’s Subversion client and the master Subversion server. These round trips are needed to open the connection, authenticate the user, and write the commit to the master server. This adds relatively modest latency, typically on the order of 2 to 3 seconds over a WAN. If Subversion is deployed with Apache using the WebDAV/HTTP protocol, four WAN round trips are incurred for each file in the commit, because each file is transferred in its own separate HTTP PUT request. With a large number of files in a commit, this can add up to several minutes of wait time over a WAN.
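To see how these round-trip counts translate into waiting time, here is a minimal Python sketch. The per-commit and per-file round-trip counts are the ones quoted above; the 400 ms India-US round-trip time and the function name round_trip_wait_seconds are hypothetical, chosen only for illustration. The model ignores bandwidth, server processing time, and any request pipelining a client might do.

```python
# Hypothetical back-of-the-envelope model of round-trip wait per commit.
# Round-trip counts come from the text: up to six per commit with the native
# svn protocol, four per file with WebDAV over HTTP. The RTT value used below
# is an illustrative assumption, not a measurement.

SVN_ROUND_TRIPS_PER_COMMIT = 6   # upper bound quoted for svn:// commits
DAV_ROUND_TRIPS_PER_FILE = 4     # quoted per-file cost for WebDAV commits


def round_trip_wait_seconds(protocol: str, num_files: int, rtt_seconds: float) -> float:
    """Estimate latency-only wait for one commit (no bandwidth or server time)."""
    if protocol == "svn":
        trips = SVN_ROUND_TRIPS_PER_COMMIT          # fixed cost per commit
    elif protocol == "webdav":
        trips = DAV_ROUND_TRIPS_PER_FILE * num_files  # grows with file count
    else:
        raise ValueError(f"unknown protocol: {protocol!r}")
    return trips * rtt_seconds


if __name__ == "__main__":
    rtt = 0.4  # assumed India-US round-trip time in seconds (hypothetical)
    for files in (1, 100, 1000):
        svn = round_trip_wait_seconds("svn", files, rtt)
        dav = round_trip_wait_seconds("webdav", files, rtt)
        print(f"{files:5d} files: svn {svn:7.1f} s, webdav {dav:7.1f} s")
```

With the assumed 400 ms RTT, six round trips come to about 2.4 seconds, consistent with the 2 to 3 seconds quoted above, while a 100-file WebDAV commit spends roughly 160 seconds on round trips alone.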

However, the real impact on developer productivity comes from the second factor: the time required to transmit data at WAN speed, given the available network throughput. For example, consider a commit of 500 MB of data sent over a WAN from India to the US. The E-1 lines typically used between the US and India run at approximately 2 megabits per second, so transferring 500 MB (4,000 megabits) takes about 2,000 seconds, or a little over 33 minutes. That is an absolute best case: it assumes no competing network traffic and no connection loss or communication errors between the client and the remote master server.
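The transfer-time figure is simple arithmetic: megabytes times eight gives megabits, which divided by the link rate in megabits per second gives seconds. A minimal sketch, assuming decimal units and an idle, error-free link (the helper name transfer_seconds is hypothetical):

```python
# Transfer-time arithmetic for the India-US example.
# Assumes decimal units (1 MB = 8 megabits) and an idle, error-free link;
# protocol overhead and TCP behaviour are ignored.

BITS_PER_BYTE = 8


def transfer_seconds(size_megabytes: float, link_megabits_per_s: float) -> float:
    """Best-case time to push size_megabytes through a link of the given speed."""
    return size_megabytes * BITS_PER_BYTE / link_megabits_per_s


if __name__ == "__main__":
    e1_mbps = 2.0        # approximate E-1 capacity quoted in the text
    commit_mb = 500.0    # size of the example commit
    secs = transfer_seconds(commit_mb, e1_mbps)
    print(f"{secs:.0f} s = {secs / 60:.1f} min")  # prints "2000 s = 33.3 min"
```

The same helper gives the LAN figure as well: transfer_seconds(500, 1000) returns 4.0 seconds for a gigabit link.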

What if a remote developer’s commits could be processed at LAN speed instead of WAN speed? Given that most LANs operate at one gigabit per second, transferring the same 500 MB between the Subversion client and the server over a LAN takes about four seconds. Instead of waiting over 33 minutes in the best case, remote developers would see their commits complete in about four seconds. Developers would then check in their changes more frequently, rather than waiting until the end of the day or the end of the week to avoid the pain of poor network performance.
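Putting the two factors from the start of this post together gives a rough model of commit time: round trips multiplied by round-trip time, plus data size divided by throughput. The sketch below assumes six round trips per commit, a 400 ms WAN round-trip time and a 1 ms LAN round-trip time; those RTTs and the Link and commit_seconds names are illustrative assumptions, not measurements or part of any product.

```python
# Sketch of the two-factor model:
#   commit time = round trips * RTT + size / throughput
# Hypothetical parameters: the RTT values are assumptions for illustration;
# the link speeds and the six-round-trip count match the text.

from dataclasses import dataclass


@dataclass
class Link:
    name: str
    rtt_seconds: float        # assumed round-trip time
    megabits_per_s: float     # available throughput


def commit_seconds(link: Link, size_megabytes: float, round_trips: int = 6) -> float:
    """Best-case commit time: latency term plus transfer term (no contention, no retries)."""
    latency = round_trips * link.rtt_seconds
    transfer = size_megabytes * 8 / link.megabits_per_s
    return latency + transfer


if __name__ == "__main__":
    wan = Link("India-US E-1", rtt_seconds=0.4, megabits_per_s=2.0)
    lan = Link("Gigabit LAN", rtt_seconds=0.001, megabits_per_s=1000.0)
    for link in (wan, lan):
        print(f"{link.name}: {commit_seconds(link, 500.0):.1f} s")
```

Under these assumptions the 500 MB commit takes about 2,002 seconds over the WAN and about 4 seconds over the LAN, which shows that for large commits the transfer term, not the round trips, dominates.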

In addition, what if the distributed Subversion servers were kept in sync in real time over the WAN? The 33 minutes saved per commit would be just the tip of the iceberg. If developers at every site had access to the latest source code without waiting for changes on the master server to be copied to their local read-only servers, update conflicts and other problems could be fixed as soon as they appeared. Distributed development teams could also collaborate in real time instead of working in silos. As a result, less time would be spent on QA and rework, and a significant amount of time and cost would be squeezed out of the development cycle.

WANdisco, with its unique active-active replication capabilities, makes all of this possible. WANdisco delivers LAN-speed performance for both read and write operations while keeping distributed Subversion repositories in sync in real time. WANdisco gets the WAN out of the way, so that IT organizations can achieve the productivity improvements and cost savings they are seeking from globally distributed development.
