1. BitKeeper in classified environments
Defense contractors and other customers with similar constraints may need to maintain multiple versions of a source base without any network connectivity. BitKeeper is well suited to this task because of its distributed nature. We describe one possible solution: a straightforward application of the BitKeeper system. The result is full functionality both inside and outside the classified area. The only associated cost is the latency introduced by having outgoing changes reviewed and approved.
1.2. BitKeeper background
BitKeeper is a distributed configuration management system that supports full functionality with or without a network connection to other instances of the data. It does this by replicating, and subsequently merging, all of the data, including metadata. Other systems claim to be distributed, but all they really mean is that there is a local copy of some version of the data; to update that copy, roll it backward or forward, or perform any other revision control operation, there must be a network connection to a centralized server.
Because BitKeeper replicates all of the data and allows any user to update any repository at any time, any problem that requires disconnected operation is an ideal fit for BitKeeper. (If these feature claims seem a bit pedantic, note that ClearCase's MultiSite product grants write access at each site, but only on a site-specific branch; i.e., if there are three MultiSite repositories, each must work on a different branch or it is read-only. BitKeeper does not have this restriction.) The lack of a connection between two repositories in no way reduces the abilities of either one. Each repository may be cloned (copied) to create another repository for a new development effort, each repository may be rolled back to any earlier state, and each repository may be updated in parallel with the others.
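To make the clone, rollback, and parallel-update claims concrete, here is a minimal Python sketch, assuming a repository can be modeled as a DAG of changesets keyed by unique identifiers. The `Repo` and `Changeset` classes and the `cs1`-style keys are hypothetical illustrations, not BitKeeper's on-disk format or commands; the point is only that each operation consults nothing but the local history.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Changeset:
    key: str                   # unique changeset identifier (hypothetical format)
    parents: tuple[str, ...]   # keys of the parent changeset(s)


@dataclass
class Repo:
    # Every repository holds the complete history, not one checked-out version.
    changesets: dict[str, Changeset] = field(default_factory=dict)

    def commit(self, key: str, *parents: str) -> None:
        # Parents are checked against the local history only; nothing is fetched.
        missing = [p for p in parents if p not in self.changesets]
        if missing:
            raise KeyError(f"unknown parent(s): {missing}")
        self.changesets[key] = Changeset(key, tuple(parents))

    def ancestors(self, key: str) -> set[str]:
        # Walk the DAG from `key` back to the root(s).
        seen: set[str] = set()
        stack = [key]
        while stack:
            k = stack.pop()
            if k not in seen:
                seen.add(k)
                stack.extend(self.changesets[k].parents)
        return seen

    def clone(self, rev: str | None = None) -> Repo:
        # A full clone copies all history; a clone at `rev` keeps only the
        # history up to that changeset, which is how a rollback is modeled here.
        keep = set(self.changesets) if rev is None else self.ancestors(rev)
        return Repo({k: cs for k, cs in self.changesets.items() if k in keep})


master = Repo()
master.commit("cs1")
master.commit("cs2", "cs1")
master.commit("cs3", "cs2")

old = master.clone(rev="cs2")          # rolled back to cs2
alice, bob = master.clone(), master.clone()
alice.commit("a1", "cs3")              # parallel, disconnected updates
bob.commit("b1", "cs3")
```

Because every clone carries the full history, neither `alice` nor `bob` needs the other (or `master`) to commit or roll back.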
1.3. Problem statement
In a classified environment, it is typically much easier to get information into the environment than to get it back out, and source code is no exception. The classified area is frequently the only realistic test environment: for many projects, replicating the test environment outside the classified area is too expensive, and security issues may preclude such replication even when cost does not.
The lack of two-way data flow makes software development across the boundary difficult. Limitations in most SCM systems force one environment or the other to operate without SCM, since the air gap makes it impossible for traditional systems to communicate.
1.4. Proposed solution
With BitKeeper, none of these problems are difficult. The solution may be summarized as follows:
1.4.1. Solution diagram
A flow chart of the proposed solution is above. The following sections provide more detailed information about how this would work.
1.4.2. External environment
A master repository is created. Each developer gets their own repository in which to work on their tasks; those repositories are labeled "workspaces" in the diagram. Until the work is ready to be tested, all development occurs outside the classified area and is aggregated in the master. So far, this is a typical, fairly basic BitKeeper usage model.
1.4.3. Creating the internal environment
Data is brought into the classified area on some easily transported medium; since CD writers are cheap and ubiquitous, we assume CDs.
On the outside, someone burns a CD containing a copy of the master repository. This CD is hand carried into the classified environment and loaded onto a disk. The resulting repository is a copy of the external master, labeled "BK master 1 copy" in the diagram. Before any internal work starts, the copy of the master is cloned to an internal master, labeled "BK master 2" in the diagram. All internal work is coordinated in the internal master. The copy of the external master is not strictly required, but, as we will see shortly, it makes updating the external master trivial.
1.4.4. Internal development
At this point testing may begin in the classified area. The SCM system places no restrictions on internal use. If multiple tests are required, each engineer can clone a copy of the internal master and go to work. If old versions are needed for a demonstration, for comparison, or for debugging, rolling back to any version takes a single clone command. Other than updating, there is no difference in functionality inside versus outside. This is easy to verify: clone a repository onto a laptop, unplug the laptop from the network, and exercise the repository. It quickly becomes apparent that no operation other than updating depends on the network.
1.4.5. Updating the internal site
At any time, if it is known or suspected that the external site has advanced, the internal site is updated as follows. Burn another CD of BK master 1 and hand carry it into the classified area. Before loading it onto disk, remove the (possibly) out-of-date "BK master 1 copy" repository; nothing is lost, because the new CD is a superset of the old internal copy. After loading the CD, run bk pull from the master copy into "BK master 2." BK master 2 then contains everything from the outside merged with all work done on the inside.
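The update procedure can be checked with a small set-based model. This is a sketch under simplifying assumptions: each repository is reduced to the set of changeset keys it contains, `update_internal_site` is a hypothetical helper, and the pull is modeled as a plain union rather than BitKeeper's actual merge.

```python
from __future__ import annotations


def update_internal_site(
    cd_image: set[str], stale_copy: set[str], internal_master: set[str]
) -> tuple[set[str], set[str]]:
    """Return the new "BK master 1 copy" and the updated "BK master 2"."""
    # Per the procedure, the new CD is a superset of the stale copy, so
    # removing the stale copy before loading the CD loses nothing.
    if not stale_copy <= cd_image:
        raise ValueError("CD is older than the copy it replaces")
    new_copy = set(cd_image)
    # `bk pull` from the copy into BK master 2, modeled as a set union. Real
    # BitKeeper also records a merge changeset when both sides have new work.
    updated_master = internal_master | new_copy
    return new_copy, updated_master


external = {"cs1", "cs2", "cs3"}      # BK master 1 today (burned to CD)
old_copy = {"cs1", "cs2"}             # stale BK master 1 copy
internal = {"cs1", "cs2", "fix1"}     # BK master 2 with an internal fix
copy, internal = update_internal_site(external, old_copy, internal)
```

After the call, the internal master holds the outside work (`cs3`) alongside the inside work (`fix1`).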
1.4.6. Moving changes back out
At some regular interval, any internal bug fixes need to be moved to the unclassified area. This poses two problems:
The usefulness of the BK master 1 copy repository becomes apparent once we examine how BitKeeper moves data. A pull operation is actually made up of several sub-operations. Pull starts by comparing the changeset identifiers in the two repositories to determine which changes are in the source but not in the destination. The resulting list, if any, is passed to a program called makepatch, which generates a special sort of patch containing both data and metadata. These patches are small: their size is proportional to the size of the changes. The patch is sent to the receiving side, which unpacks it using a program called takepatch, and then resolves any conflicts and applies the changes using a program called resolve.
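The four stages of a pull can be sketched as a pipeline of small functions. The function names mirror the programs named above, but they are hypothetical Python stand-ins: they do not reproduce BitKeeper's real patch format, command-line interfaces, or conflict resolution, which this sketch omits entirely.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Changeset:
    key: str    # changeset identifier compared in step 1
    data: str   # stands in for file contents
    meta: str   # stands in for metadata (author, comments, ...)


Repo = dict[str, Changeset]


def missing_keys(src: Repo, dst: Repo) -> list[str]:
    # Step 1: compare changeset identifiers; keep those in src but not in dst.
    return sorted(src.keys() - dst.keys())


def makepatch(src: Repo, keys: list[str]) -> list[Changeset]:
    # Step 2: bundle data *and* metadata for only the missing changesets,
    # so the patch grows with the change, not with the repository.
    return [src[k] for k in keys]


def takepatch(patch: list[Changeset]) -> Repo:
    # Step 3: unpack the patch on the receiving side into a staging area.
    return {cs.key: cs for cs in patch}


def resolve(dst: Repo, staged: Repo) -> None:
    # Step 4: apply the staged changesets (conflict handling omitted here).
    dst.update(staged)


def pull(src: Repo, dst: Repo) -> int:
    keys = missing_keys(src, dst)
    if keys:
        resolve(dst, takepatch(makepatch(src, keys)))
    return len(keys)


a = Changeset("cs1", "v1", "initial import")
b = Changeset("cs2", "v2", "bug fix")
c = Changeset("cs3", "v3", "new feature")
master = {cs.key: cs for cs in (a, b, c)}
workspace = {"cs1": a}
first = pull(master, workspace)   # 2
```

A second pull finds nothing missing and transfers nothing, which is why repeated pulls are cheap.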
BitKeeper can be instructed to perform only part of this process, and that is exactly what is needed here. If a user runs bk pull -r from the BK master 2 repository into the BK master 1 copy repository, the patch is left in the PENDING directory of the destination repository. This patch is precisely what is needed to update the external repository. Instead of updating the copy repository, we have the patch reviewed, sent out, and then applied to the external master.
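The partial pull can be modeled as computing and writing the patch but stopping before the apply step. In this sketch, `pull_no_resolve`, `apply_patch`, the JSON file format, and the `patch-0001.json` naming are all hypothetical; BitKeeper's real PENDING files use their own format. Only the directory name PENDING comes from the text above.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path

Repo = dict[str, str]   # changeset key -> serialized data and metadata (stand-in)


def pull_no_resolve(src: Repo, dst: Repo, dst_root: Path) -> Path | None:
    """Stage the patch in dst_root/PENDING without applying it to dst."""
    keys = sorted(src.keys() - dst.keys())
    if not keys:
        return None
    pending = dst_root / "PENDING"
    pending.mkdir(parents=True, exist_ok=True)
    patch = pending / f"patch-{len(list(pending.iterdir())) + 1:04d}.json"
    patch.write_text(json.dumps({k: src[k] for k in keys}))
    return patch   # dst is deliberately left untouched


def apply_patch(patch: Path, dst: Repo) -> None:
    # After review, the approved patch is carried out and applied to the
    # external master; changesets it already has are skipped.
    for key, payload in json.loads(patch.read_text()).items():
        dst.setdefault(key, payload)


master2 = {"cs1": "...", "cs2": "...", "fix1": "internal bug fix"}
master1_copy = {"cs1": "...", "cs2": "..."}
external_master = {"cs1": "...", "cs2": "...", "cs3": "new outside work"}

with tempfile.TemporaryDirectory() as tmp:
    patch = pull_no_resolve(master2, master1_copy, Path(tmp))
    staged = json.loads(patch.read_text())   # what the reviewer inspects
    apply_patch(patch, external_master)
```

Note that the copy repository is never modified; it serves only as the baseline that determines what goes into the patch.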
Note that none of these operations need be synchronous. While it is annoying to have to wait for a review, BitKeeper will be perfectly happy to take the patch 2 or 3 weeks later, with 2 or 3 weeks of additional development in the external tree, and apply it. The review adds delay to the update process, but the process can be pipelined so that there is a fairly constant stream of outgoing changes.
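The pipelining argument is simple arithmetic: if a patch is sent out every week and each review takes a fixed number of weeks, several patches are in review at once, and after the initial delay one patch lands per week. The sketch below assumes (hypothetically) a weekly cadence and a constant review latency; real review times will vary.

```python
from __future__ import annotations

from collections import deque


def applied_per_week(weeks: int, review_latency: int) -> list[int]:
    """Count patches reaching the external master in each week.

    Assumes one outgoing patch per week and a fixed review delay; several
    patches can be in review at the same time.
    """
    in_review: deque[tuple[int, int]] = deque()   # (week approved, patch id)
    counts = []
    for week in range(weeks):
        in_review.append((week + review_latency, week))   # this week's patch
        done = 0
        while in_review and in_review[0][0] == week:
            in_review.popleft()
            done += 1
        counts.append(done)
    return counts


print(applied_per_week(6, review_latency=3))  # [0, 0, 0, 1, 1, 1]
```

The latency only shifts the stream; it does not reduce the steady-state rate at which changes leave the classified area.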
Also note that it is not necessary to wait for the changes to be applied to the external repository before updating the internal one again. One or more updates of the internal repository, either from the external repository or from internal bug fixes, may happen while the patch is with the reviewer. Some care is needed to avoid sending the reviewer the same changes twice, but this is easy to automate.
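One way to automate this, sketched here under the assumption that the copy repository is not refreshed while reviews are pending, is to keep a record of changeset keys already sent and subtract it from each new batch. `OutgoingLog` and `next_batch` are hypothetical names, and repositories are again reduced to sets of keys.

```python
from __future__ import annotations


class OutgoingLog:
    """Hypothetical helper remembering which changesets are already out for review."""

    def __init__(self) -> None:
        self.sent: set[str] = set()

    def next_batch(self, internal_master: set[str], master1_copy: set[str]) -> set[str]:
        # Everything a partial pull into the copy would capture...
        candidates = internal_master - master1_copy
        # ...minus what the reviewer already has in the queue.
        batch = candidates - self.sent
        self.sent |= batch
        return batch


log = OutgoingLog()
copy = {"cs1", "cs2"}                      # not refreshed while reviews are pending
internal = {"cs1", "cs2", "fix1"}
first = log.next_batch(internal, copy)     # {"fix1"}
internal |= {"fix2"}                       # more internal work meanwhile
second = log.next_batch(internal, copy)    # {"fix2"}, not fix1 again
```

Once a refreshed CD brings the applied changes back inside, they drop out of `candidates` on their own, so the log only has to cover changes still in flight.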