Wednesday, May 6, 2009

Can Globally Distributed Development Really be Supported with a Central Subversion Server?

Based on the feedback we’ve received from our customers and prospects, the answer to this question is a resounding no, at least not for a distributed development organization of any significant size with many users at remote sites.

There are essentially three major obstacles that distributed development organizations will run into with a central server approach:

1. WAN Latency.

WAN latency becomes a problem not only because network traffic increases as the number of remote users grows, but also because every remote request incurs a WAN penalty. Subversion clients send only the changes to the central server when modifications to existing source code files are committed. However, when a new source code file is committed or an existing file is checked out, the entire file is sent over the WAN.
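
To make the WAN penalty concrete, here is a rough back-of-the-envelope model in Python. The round-trip times, bandwidth figures, file sizes, and the number of round trips per operation are all illustrative assumptions, not measurements of Subversion's wire protocol.

```python
def transfer_seconds(payload_bytes, round_trips, rtt_seconds, bandwidth_bytes_per_s):
    """Rough time for one request: latency cost plus time to move the payload."""
    return round_trips * rtt_seconds + payload_bytes / bandwidth_bytes_per_s


# Assumed link characteristics, for illustration only.
LAN = {"rtt_seconds": 0.001, "bandwidth_bytes_per_s": 100_000_000}
WAN = {"rtt_seconds": 0.250, "bandwidth_bytes_per_s": 1_000_000}

# Committing a small change to an existing file: only the delta is sent.
delta_commit_wan = transfer_seconds(2_000, round_trips=10, **WAN)

# Checking out a 5 MB file: the whole file crosses the link.
checkout_wan = transfer_seconds(5_000_000, round_trips=10, **WAN)
checkout_lan = transfer_seconds(5_000_000, round_trips=10, **LAN)

print(f"delta commit over WAN:  {delta_commit_wan:.2f} s")
print(f"5 MB checkout over WAN: {checkout_wan:.2f} s")
print(f"5 MB checkout over LAN: {checkout_lan:.2f} s")
```

Even in this simplified model, the small commit is dominated by latency rather than payload, which is why the per-request penalty matters independently of how much data is sent.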

2. Degradation of the Central Server’s Performance.

This results not only from the extra load generated by increasing numbers of remote users, but also from read transactions that would otherwise be unnecessary if those users had access to a local copy of the Subversion repository. A frequent pattern that arises with a central Subversion server configuration is that multiple developers at the same remote locations repeatedly perform checkouts, updates and other read operations against the same files.

These repeated, unnecessary reads use up central server memory and processor capacity, as well as network bandwidth.
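
The cost of those repeated reads can be sketched with simple arithmetic. The team size, read counts, and file size below are hypothetical, and the local-copy case is simplified to one full transfer per revision (real replication may send less).

```python
def wan_bytes_central(developers, reads_per_developer, file_bytes):
    """Central server only: every read by every remote developer crosses the WAN."""
    return developers * reads_per_developer * file_bytes


def wan_bytes_local_copy(file_bytes, revisions_replicated=1):
    """Local copy at the remote site: each revision crosses the WAN once,
    and all developer reads are served over the LAN."""
    return revisions_replicated * file_bytes


# Hypothetical remote site: 20 developers each read the same 2 MB file 3 times.
central = wan_bytes_central(developers=20, reads_per_developer=3, file_bytes=2_000_000)
local = wan_bytes_local_copy(file_bytes=2_000_000)

print(f"central server: {central / 1_000_000:.0f} MB over the WAN")
print(f"local copy:     {local / 1_000_000:.0f} MB over the WAN")
```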

3. Availability.

The ultimate weakness of a central server approach is its single-point-of-failure architecture and the impact this has on repository availability. When the network connection is lost, remote users have no repository access at all. Even a transient WAN connection failure between a remote Subversion client and the central server can slow down developers at remote sites if it takes place during a large commit, since the entire commit will have to be resubmitted.

Obviously, if the server hosting the repository is down for any reason, all users will be impacted, not just those at remote locations, unless a backup is available. Even if a backup server is available, the time involved in bringing it into service can be significant, and there’s always the risk of data loss and extended periods of downtime due to human error during the recovery process.

The arguments typically used in favor of a central server approach are that everyone works from a single consistent copy of the repository, maintenance and administration are only required at one location, and overall control can be implemented more effectively from both a project management and data security perspective. The bottom line for advocates of supporting globally distributed development with a central Subversion server is that these perceived benefits greatly outweigh any gains that might be achieved by distributing Subversion repositories.

Let’s take each of these arguments and examine their validity in light of the available alternatives.


Single Consistent Copy of the Subversion Repository.

The first argument that advocates of a central server approach make is that it’s the only way to maintain a single consistent copy of the repository.

There are a number of master-slave solutions available that provide a partial response to this argument. The most commonly used is svnsync, introduced with Subversion 1.4. While svnsync and the other solutions do allow remote site developers to access a local read-only slave mirror of a master Subversion repository, the slaves are only as current as the last replication from the master. The lag between replications often leaves developers at remote sites checking out stale versions of source code files. When those developers later commit against the master, the stale working copies lead to update conflicts, and they must update over the WAN against the master to get the latest revision and resolve any conflicts before reattempting the commit. Because these read operations still have to be performed over the WAN, some of the expected improvements in network performance and developer productivity are negated. Finally, the master repository remains a single point of failure for write operations.
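
The staleness problem can be illustrated with a small simulation. This is only a sketch: it assumes the mirror is synchronized on a fixed schedule, and the commit times and the 30-minute interval are made-up values.

```python
from bisect import bisect_right


def revision_at(commit_times, t):
    """Number of commits made at or before time t (the master's revision at t).
    commit_times must be sorted in ascending order."""
    return bisect_right(commit_times, t)


def mirror_revision_at(commit_times, sync_interval, t):
    """Revision visible on a mirror that syncs every sync_interval minutes."""
    last_sync = (t // sync_interval) * sync_interval
    return revision_at(commit_times, last_sync)


# Hypothetical times (in minutes) at which commits reach the master.
commits = [5, 12, 31, 44, 58, 61, 75]
checkout_time = 59

master_rev = revision_at(commits, checkout_time)
mirror_rev = mirror_revision_at(commits, sync_interval=30, t=checkout_time)
print(f"master r{master_rev}, mirror r{mirror_rev}, behind by {master_rev - mirror_rev}")
```

A developer checking out from the mirror at minute 59 in this scenario works against a revision that predates every commit made since the last sync, and only finds out when committing to the master.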

In contrast to a master-slave approach, Subversion MultiSite relies on a peer-to-peer architecture with no single point of failure. All of the repository replicas are readable and writeable for the entire code base, and consistency across the repositories is guaranteed. In addition, WANdisco’s active-active replication capability allows developers at all locations to work at LAN speed over a WAN for both read and write operations, while at the same time keeping all of the repository replicas continually in sync. In effect, Subversion MultiSite delivers one-copy equivalence across a system of distributed Subversion repositories, and provides the same user experience that would be achieved if all of the developers worked at one location over a LAN against a single repository, instead of thousands of miles apart.

Maintenance and Administration.

The second major argument in favor of a central repository is that maintenance and administration only have to be performed at one location.

While this may sound like a major benefit at first glance, for remote developers there can be a significant negative impact if they are separated from the central server site by large time zone differences, and either the network connection or the server goes down during their location’s normal working hours. From the remote site’s perspective, it can take until the next business day to restore access.

In addition, in a typical Subversion implementation, not only Subversion but also Apache and the operating system have to be maintained. It is true that if Subversion repositories are distributed, maintenance, particularly applying patches and performing upgrades, becomes more of a challenge. If it’s handled inconsistently, problem resolution can become incredibly complex, resulting in data loss and extended periods of downtime.

WANdisco addresses these challenges by making it possible to monitor and administer servers at all sites from a single location. In addition, Subversion MultiSite is now available as a virtual software appliance, with access to an update server. Patches and upgrades can be applied automatically for all of the components of the implementation, including Subversion, Apache, and the operating system at each location, eliminating the risks inherent in performing these tasks manually.

Backup and Recovery

Since backup and recovery is such an important aspect of Subversion repository maintenance and administration, I’d like to discuss it briefly here. In an upcoming post, I’ll cover this topic in more detail.

With a central server approach, backup and recovery solutions typically rely either on disk mirroring or on svnadmin scripts used to copy the repository to a standby backup server. In any event, even if a backup server is available, the lag time involved in bringing it into service can be significant. In addition, there’s always the risk of data loss and extended downtime resulting from human error during the failover and recovery process.

Although master-slave solutions like svnsync can be used for backup and recovery, if the master goes down, the slaves are likely to be missing data that may be unrecoverable depending upon the nature of the master server failure. The extent of data loss will depend upon the lag time and size of the changes to the master since the last instance of replication.

The other issue to be aware of is what actually gets replicated to the mirror slave repositories by the tool you’re using. For example, with svnsync, only the versioned repository data gets synchronized. Repository configuration files, user-specified repository path locks, and other items that might live in the physical repository directory but not inside the repository's virtual versioned filesystem are not replicated.
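
One way to cover part of that gap is to copy the unversioned settings separately from the revision data. The sketch below assumes the standard on-disk repository layout, in which configuration lives in a `conf` directory and hook scripts in a `hooks` directory. The function name and paths are hypothetical, and it does not attempt to handle path locks.

```python
import shutil
from pathlib import Path

# Directories in a standard repository layout that hold unversioned settings.
UNVERSIONED_DIRS = ("conf", "hooks")


def copy_unversioned_settings(repo_path, backup_path):
    """Copy configuration and hook scripts that a revision-only sync would skip.

    Returns the names of the directories that were actually copied.
    """
    repo, backup = Path(repo_path), Path(backup_path)
    copied = []
    for name in UNVERSIONED_DIRS:
        source = repo / name
        if source.is_dir():
            # dirs_exist_ok lets the copy be repeated into the same backup location.
            shutil.copytree(source, backup / name, dirs_exist_ok=True)
            copied.append(name)
    return copied
```

A call such as `copy_unversioned_settings("/srv/svn/project", "/backup/project-settings")` (both paths are placeholders) could run alongside whatever mirroring tool is in use, so that the mirror's settings do not drift from the master's.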

With Subversion MultiSite, continuous hot backup is achieved by default as a byproduct of active-active replication, and all repository data is replicated. After an outage, recovery from any other site’s server is automatic.

Overall Control.

The third major argument used by advocates of a central Subversion server approach is that development projects can be managed more tightly. What happens in practice is often just the opposite.

Many of our customers report that prior to implementing Subversion MultiSite, remote developers often held back large commits until the end of the day or end of the week, using WAN latency as an excuse. This made it harder to monitor what everyone was doing on a day-to-day basis, and meant that it took longer to find out that developers didn’t understand the specs they were given. As a result, code was delivered that had to be rewritten and project deadlines were frequently missed.

In terms of maintaining control from a data security perspective, the goal is to achieve consistent enforcement of security policy across all development sites. When Subversion MultiSite is implemented with Subversion Access Control, the security policy configuration is automatically replicated to all sites when it’s initially set up, as are any future changes. This guarantees that access control is enforced consistently at every location. Subversion Access Control also provides audit capabilities that track every user access to the repository and alert administrators whenever access violations occur.

Data security as it relates to the contents of source code repositories has become a greater concern in recent years as IT organizations began to outsource development work to countries where enforcement of intellectual property rights is relatively weak. In addition, Sarbanes-Oxley and other regulations have begun to reach into the IT organization, imposing requirements of their own. In an upcoming post, I’ll cover this topic in more detail and describe what’s really required to secure the intellectual property stored in source code repositories in a globally distributed environment.
