VCP::Process - How vcp works
"vcp" is designed to be a general purpose repository import/export
tool. This document describes some of the techniques used to keep
"vcp" general purpose.
"vcp" works in several phases:
- 1. Metadata Scanning
- Before anything else can happen, "vcp" must take the source
repository spec, something like "cvs:module/dir/...", and use the
appropriate repository interface ("cvs log" in this case) to
extract the metadata.
The metadata is currently kept entirely in memory; if you run into a
repository so big that this is troublesome, do the transfer in phases or
pester us to provide a swap file capability for this data.
In the case of a RevML source, it is not practical to scan the input for
metadata alone (the RevML may be coming from the standard input, for
instance), so all of the files in a RevML source file are extracted during
the scanning phase, as mentioned in VCP::Source::revml.
- 1a. Base revisions and backfilling
- When sourcing from incremental RevML transfers, an additional step must be
taken for each text file in the transfer. An incremental RevML file does
not usually contain the entire body of any revision of a text file; it
only contains deltas between revisions. The exceptions are binary files,
which are currently always shipped in their entirety, and extractions made
with the --bootstrap option.
"vcp" therefore needs to be able to recreate the first revision of
a text file in an incremental transfer when RevML is in use. This is
addressed by a process called "backfilling the base revision".
The "base" revision of a file is the revision that immediately
precedes the first revision being transferred. It is also the last revision
in the previous transfer and must be the most recent revision (on the
appropriate branch) in the destination repository.
"vcp" "backfills" the base revision by checking it out
of the destination repository, then reconstitutes the first revision by
applying the (base revision => first revision) delta to the base
revision. Each revision in a RevML file contains an MD5 checksum so that
the result of backfilling and patching can be verified.
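The backfill-and-patch step can be sketched as follows. This is a minimal illustration, not vcp's implementation: the delta format here is a hypothetical list of opcodes built with Python's difflib (not RevML's own delta encoding), and the function names are made up for the example. What it does show is the shape of the process: the delta alone is not enough, the base revision must be supplied, and the MD5 digest catches a wrong base.

```python
import difflib
import hashlib


def make_delta(base_lines, new_lines):
    """Record only what changed between base and new (hypothetical format)."""
    matcher = difflib.SequenceMatcher(a=base_lines, b=new_lines, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Unchanged and deleted ranges refer to the base by index only;
        # new text is stored just for 'replace' and 'insert'.
        added = [] if tag in ("equal", "delete") else new_lines[j1:j2]
        delta.append((tag, i1, i2, added))
    return delta


def apply_delta(base_lines, delta):
    """Rebuild the newer revision from the base revision plus the delta."""
    result = []
    for tag, i1, i2, added in delta:
        if tag == "equal":
            result.extend(base_lines[i1:i2])  # copied from the base
        else:
            result.extend(added)  # empty for 'delete'
    return result


def md5_of(lines):
    """Hex MD5 digest of the concatenated lines."""
    return hashlib.md5("".join(lines).encode("utf-8")).hexdigest()


def reconstitute(base_lines, delta, expected_md5):
    """Patch the backfilled base and verify the result against its checksum."""
    rebuilt = apply_delta(base_lines, delta)
    if md5_of(rebuilt) != expected_md5:
        raise ValueError("checksum mismatch: wrong base revision or bad delta")
    return rebuilt
```

In this sketch, base_lines stands in for the revision checked out of the destination repository, while delta and expected_md5 stand in for what the incremental RevML file carries.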
- 1b. Selecting
- In the case of VCP::Source::cvs, the initial scan often nets too much
data, so the data scanned is winnowed down to the desired set (see
"Files that aren't tagged" in VCP::Source::cvs for
- 2. Sorting and Change Aggregation
- The order that the source repository presents revisions in is often not
the order they need to be inserted in, so the destination driver
(VCP::Dest::p4, for example) is given the opportunity to sort the revisions.
This is primarily used to do change number aggregation when converting from
a repository that does not provide change set metadata (like CVS) to one
that does (like p4).
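One common way to aggregate changes is sketched below. The rule used here is an assumption, since this document does not spell out vcp's own rule: revisions by the same user, with the same comment, close together in time, and touching different files are treated as one change. The Rev record, the aggregate_changes name and the 60 second window are all hypothetical.

```python
from collections import namedtuple

# Hypothetical revision record; 'time' is seconds since some epoch.
Rev = namedtuple("Rev", "path rev_id user comment time")


def aggregate_changes(revs, window=60):
    """Map (path, rev_id) to a change number, starting at 1."""
    # Bring candidate members of the same change next to each other.
    ordered = sorted(revs, key=lambda r: (r.user, r.comment, r.time, r.path))
    groups = []
    for rev in ordered:
        last = groups[-1] if groups else None
        if (
            last
            and last[-1].user == rev.user
            and last[-1].comment == rev.comment
            and rev.time - last[-1].time <= window
            # A change may hold only one revision of any given file.
            and all(member.path != rev.path for member in last)
        ):
            last.append(rev)
        else:
            groups.append([rev])
    # Number the changes in the order in which they began.
    groups.sort(key=lambda g: (g[0].time, g[0].path))
    return {
        (rev.path, rev.rev_id): number
        for number, group in enumerate(groups, start=1)
        for rev in group
    }
```

For example, two files committed five seconds apart by the same user with the same comment end up in one change, while a later commit by someone else gets the next change number.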
This is also important when generating RevML files because the order of
appearance of files in a log file may hinge on exactly when the files were
inserted along with their names, at least in the case of CVS. Sorting the
revisions provides for consistent RevML files, which is important in
- 3. File Transfer
- The final stage is to do the file transfer. When the entire source file is
available, it is simply added to the result repository in the correct
For incremental transfers an extra step is taken to ensure that incremental
transfers leave no gaps. The base revision is backfilled from the
destination repository (using the process for backfilling described in
phase 1 above) and compared to the base revision from the source
Currently, "vcp" shells out to command line tools like "cvs"
and "p4". This is a "least common denominator" approach
that allows VCP to operate at a safe distance from the underlying
implementations. It is also the primary bottleneck in transferring files. We
will gladly accept donations of drivers that use direct library interfaces or
remote procedure call (SOAP, RMI, etc.) techniques to speed this process.
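The shell-out approach amounts to something like the sketch below: start the command line tool as a child process, capture its output, and parse that text. The run_tool wrapper is a hypothetical name for this example and is not part of vcp.

```python
import subprocess


def run_tool(argv, input_text=None):
    """Run an external command and return its standard output as text.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    completed = subprocess.run(
        argv,
        input=input_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

A call such as run_tool(["cvs", "log", "module/dir"]) would return the log text for parsing, assuming "cvs" is installed and on the PATH. Each call pays the cost of starting a new process, which is why this approach is simple and safe but slow.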
Barrie Slaymaker <firstname.lastname@example.org>
Copyright (c) 2000, 2001, 2002 Perforce Software, Inc. All rights reserved.