
Reporting and Data Mining


BitKeeper is essentially a database of everything that has ever happened to your source base, which makes the source base an invaluable record of what was done, when, and by whom. There are many ways to mine this data. This chapter explains how to get at that information, whether through a quick one-time command or a regularly generated report that shows a particular group's activity over a period of time.

File Level

File State Information

File State Information is general information about how many files are in a particular state. This includes information such as how many files are checked out; how many files are not under BitKeeper control; and how many files have checked-in deltas, but have not been committed to a changeset.

To find out how many files in a repository are under revision control, not under revision control, modified (not checked in), or checked in (not committed), use:

bk status

To find which files are modified, but not yet checked in:

bk -r sfiles -gc

To find which files are pending (checked in, but not yet committed):

bk pending

To find all files that are not under revision control:

bk -r sfiles -gx

There are many options to bk sfiles that give specific file information in the output. For a more detailed list, see the help page:

bk helptool sfiles
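The listing commands above print one file per line, so they can be counted in a script. The following is a minimal sketch under that assumption; file_state_summary is a hypothetical helper name, not a bk command, and it only uses the two sfiles invocations shown above.

```shell
#!/bin/sh
# Sketch: summarize file state by counting the lines printed by the
# sfiles commands shown above (assumes one file name per output line).
# file_state_summary is a hypothetical helper, not part of BitKeeper.
file_state_summary() {
    # Files that are modified but not yet checked in.
    modified=$(bk -r sfiles -gc | wc -l | tr -d ' ')
    # Files that are not under revision control.
    extras=$(bk -r sfiles -gx | wc -l | tr -d ' ')
    echo "modified: $modified"
    echo "not under revision control: $extras"
}
```

Run file_state_summary from inside a repository. For the full set of counts, bk status remains the simpler choice.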

File revision history

File revision history is all the information collected about a file as it changes over time.

GUI-tools

File revision history is probably best viewed graphically by bk revtool.

Upon startup, the bottom window displays the last month's revision history for the file or project.

To view the comments for just one revision, left click once on that revision in the graph.

To see the differences between two revisions, left click the older revision and right click on the newer revision. The differences will be displayed in the lower text window. Right click on a different revision to diff against that one instead. The default diff format is -u (unified diffs).

To see the contents of a file, double click the left mouse button on the revision node in the graph. The text shown for the file is annotated with the user name and the latest revision that modified the line.

Once the annotated file listing is shown, you can then click on the text to view the checkin comments associated with the chosen line. Double clicking on an annotated line brings up csettool and shows all of the other files that were modified in the same changeset as the selected line.

To get a side-by-side view of the differences, select the two revisions and click on the "Diff tool" button.

Command Line Interface

BitKeeper has commands that will do the equivalent of bk revtool described above. The information must be extracted by various bk commands, as follows:

To get a list of revisions of a file with revision comments:

bk sccslog file

The above could also be done with:

bk prs file

prs is a superset of sccslog and, used with options, is a powerful reporting tool for scripting. For more information on prs and scripting reports, see Reporting with Scripting.

To display a line-by-line listing of a file showing who modified each line and in what version number of the file it was modified, use:

bk annotate file

There are useful options to bk annotate to give more detailed information in the annotations including prefixing each line with the date it was modified, line number, and filename. Please see Command Summary for details.

To restrict the set of lines displayed to those lines added in a specified revision or date range use:

bk sccscat -rrev file

or

bk sccscat -cdate file

In this case the whole file is not displayed; only the lines that were added in that range of changes are shown.

To display all lines in all versions of a file use:

bk sccscat file

This is useful for determining when a particular feature was added.

File Contents

Contents of a single revision

To view the contents of a single revision of a file, use bk get, which can retrieve any revision of a file. If the revision wanted is not known, bk prs can be used to list each file revision with its revision comments.

bk prs file
bk get -rrev file

File content comparisons (one file)

There are two methods to do file content comparisons: via the GUI tools or via the command line.

To do file content comparisons via the GUI tools, use bk difftool. Differences between any two revisions of a file can be viewed, including the latest modified but not yet checked in version. It's also possible to compare files in different repositories with difftool. To see the differences between the latest modified version of a file and the latest checked in version of the file, use:

bk difftool file

To see the differences between the modified version of a file and another revision:

bk difftool -rrev file

To see the differences between any two specific revisions of a file:

bk difftool -rrev1 -rrev2 file

To see the differences between a file in one repository and a file in a different repository:

bk difftool /somepath/file1 /someotherpath/file2

To do file content comparisons via the command line, use bk diffs. It can display the differences between the modified version of a file and the last checked-in revision of that file, between any two revisions of a file in a repository, or between a specific revision and its preceding revision. To view the differences between a modified file and the last checked-in version of that file, use:

bk diffs file

To view the differences between a modified file and any previous version of that file:

bk diffs -rrev file

To view the differences between any two revisions of a file:

bk diffs -rrev1..rev2 file

To view the differences between a specific revision and its preceding revision:

bk diffs -Rrev file

bk diffs can also produce side-by-side diffs and annotate lines with author, date, and/or revision. Use the options shown below individually or in combination to produce the desired output.

To do side-by-side diffs:

bk diffs -s

To prefix lines with author annotations:

bk diffs -U

To prefix lines with the revision date annotations:

bk diffs -D

To prefix lines with revision numbers:

bk diffs -M

As an example, suppose you want a side-by-side view of the differences between revisions 1.5 and 1.14 of file foo.c, with author and date annotations. The following command would do it:

bk diffs -sDU -r1.5..1.14 foo.c

File content comparisons (all files in a directory or repository)

Sometimes you want to see all the file changes made in a directory or repository since the last time the files were checked out.

To view the differences between every modified file in one directory and its last checked-in version (the output can be very long, so piping to more or less may be useful), use:

bk diffs | more

To view the differences between every modified file in a repository and its last checked-in version (again, piping to more or less may be useful), use:

bk -r diffs | more

Project Level

At the project, or repository, level there are many items which are tracked and can be used to get an overall view of the evolution of the project. You can:

  • view the history of the project over time
  • get a detailed view of particular changesets
  • get a listing of all tagged changesets

All of these help give a high-level picture of what is going on in a project.

Viewing Project History

To see an overall picture of what's been happening in a repository, you would view the history of the ChangeSets in that repository as follows:

bk revtool

When launched the tool shows all ChangeSets for the repository. To view a specific ChangeSet, click on that ChangeSet box in the graph that's displayed in the upper window.

To find which files are associated with a ChangeSet, double click on the ChangeSet box.

To view changes made over a range of ChangeSets, left click on the ChangeSet with the lower revision number then right click on the ChangeSet with the higher revision number, and the cumulative list will be displayed in the lower window.

Viewing ChangeSet Contents

To get a detailed view of the contents of the last applied ChangeSet, use:

bk csettool

Csettool will show:

  • associated changeset comments
  • a list of changed files
  • revision comments for files associated with the changeset
  • the differences between each file and the revision from the previous changeset

Csettool can also be launched from revtool by selecting a ChangeSet or range of ChangeSets and clicking on the View ChangeSets button in the top of the window.

Viewing Tagged ChangeSets

To get a list of all changesets which have been tagged, use:

bk changes -t

If finding this information is done as part of a script, using prs may be more useful:

bk -R prs -hr1.0.. -nd'$if(:TAG:){:DEFAULT:}' ChangeSet

Command History

Each BitKeeper repository keeps logs of the commands run in that repository. To get a history of the repository-level commands (clone, commit, export, pull, and push), use:

bk cmdlog

To get a history of all commands run in a repository use:

bk cmdlog -a

Debugging with BitKeeper

BitKeeper can speed up debugging because it gathers information at several levels: the file level, the changeset level, and the repository level.

This section describes a couple of different types of bugs and demonstrates a process for debugging each kind.

Build Bugs

Build bugs are failures that occur during the build of a product. These can be anything from a missing semicolon at the end of a line of C code to a missing include file. Most often, the information needed is which file broke the build and who made the change that broke it.

If the offending file is known, use bk revtool to find all the changes made to the file between the last build and the present build. To see the differences between any two versions of the file, left click on the last known good version and right click on the buggy version. Scanning the differences may make clear the line that broke the build. The differences are annotated with the user who last changed each line, so if the line that introduced the bug is found, so is the author.

Functionality Bugs

Functionality bugs are unwanted and unexpected behavioral problems with the software while it is running. Most often the process of tracking down such a bug is: first, find the file and revision that contain the problem; second, find the changeset containing that revision of the file; and third, have the engineer who caused the bug fix it.

If a particular string is known, an error message for example, bk grep can help find the file and revision of the file that contains the particular string (and hopefully, the bug). The default output is annotated with file name and revision number. If you have a general idea in which directory to check, use:

bk grep string

If not, search all files in the repository with:

bk -r grep string

If the file has already been found (an assertion or core file can sometimes identify it), run revtool on the file:

bk revtool file

With revtool running on the file, you can find out what was introduced in each revision of the file. When the revision that caused the bug is found, click on the View Changesets button to see all the other files changed in the same changeset. This gives a picture of what happened at that point in time, which may help illuminate the problem.

When All Else Fails - Binary Search

When other approaches fail, a binary search over changesets can track down an elusive bug. You need a range of changesets within which the bug was introduced. This can be the range between the first changeset and the last changeset, or the range between the last known good version and the first known bad version. Any range works as long as the lower bound (the earlier cset) does not have the bug and the upper bound (the later cset) does.

Choose the changeset midway between the lower and upper bounds. Clone the repository as of that changeset to a new repository and build the product.

bk clone -rmiddle_cset repository /tmp/repository
make product

Check if the resulting product has the bug. If so, the middle changeset is now the new upper bound. If not, the middle changeset is now the lower bound. Repeat.
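The repeat step above can be scripted. The following is a minimal sketch, not a BitKeeper feature: find_first_bad and has_bug are hypothetical names, and the script assumes you have already written the candidate changeset revisions to a text file, one per line, oldest first, with a known-good changeset on the first line and a known-bad changeset on the last line.

```shell
#!/bin/sh
# Sketch of the binary search described above.
# Assumptions (none of this is built into bk):
#   - $1 is a file listing changeset revisions, one per line, oldest first
#   - the first line is known good, the last line is known bad
#   - has_bug is a function YOU supply; it receives one revision and
#     returns 0 if the bug is present. It would typically run
#     "bk clone -r<rev> repository /tmp/repository", build the product,
#     and test the result.
find_first_bad() {
    revs_file=$1
    lo=1                                     # line of the known-good cset
    hi=$(wc -l < "$revs_file" | tr -d ' ')   # line of the known-bad cset
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        rev=$(sed -n "${mid}p" "$revs_file")
        if has_bug "$rev"; then
            hi=$mid    # bug present: middle becomes the new upper bound
        else
            lo=$mid    # bug absent: middle becomes the new lower bound
        fi
    done
    # The bounds are now adjacent; print the earliest cset with the bug.
    sed -n "${hi}p" "$revs_file"
}
```

Each pass halves the range, so a range of about a thousand changesets needs roughly ten clone-and-build cycles.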

Code Reviews

Code reviews are the checking of a developer's code by one or more reviewers to determine whether or not that code is suitable for integration with the main source base.

Code reviews can be done using the Request To Integrate (RTI) process. This is done by "queuing" changes against a "gate".

Queue

A gate is the integration tree; it holds changesets that are believed to be good. Things normally don't go into the gate unless they have passed peer review and regression tests on all platforms.

Have a gate for each active release. For example, for BitKeeper we have 3.2.x and 3.3.x plus other shorter-lived gates coming and going as needed. For each gate there is a gate queue. You can do this all on a file server, like so:


	/home/bk
	    bk-3.2.x - 3.2.x gate
	    bk-3.2.x-queue - changes which are ready, done, rejected
	    bk-3.3.x - 3.3.x gate
	    bk-3.3.x-queue - changes which are ready, done, rejected
	    etc.

In each release's -queue directory there are directories called:

	ready/
	done/
	rejected/

The ready directory is where most of the activity is concentrated. In the ready directory there are three entries for each feature or bugfix:

	ready/feature - symlink to the repository containing the change
	ready/feature.RTI - text file containing the reason why this is needed
	ready/feature.REVIEWED - text file containing review comments

Process

The process to go through is:

  • Engineer creates a repository which is a clone of the gate and then develops the new feature. Use hardlinked clones, if possible, as they take up less space.
  • Engineer feels the feature is ready and creates a symlink to the repository in the ready directory of the appropriate queue.
  • Engineer sends email to the reviewers that the feature is queued.
  • Reviewers review and add comments to feature.REVIEWED.
  • If the feature is accepted, the "gatekeeper" can pull the change into the gate and move all feature-related repositories and files to the done directory.
  • If the review comments are negative then the engineer does a

    bk fix -c

    on the repository, fixes the problem, and starts the review process again.
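The queuing step in the process above can be sketched as a small shell helper. This is only an illustration of the directory layout described earlier: queue_feature and its argument order are hypothetical, and creating an empty feature.REVIEWED file up front is an assumption made here for convenience, since the text only says that reviewers add their comments to it.

```shell
#!/bin/sh
# Sketch: queue a repository for review using the ready/ layout above.
# queue_feature is a hypothetical helper, not a bk command.
#   $1 = queue directory (for example /home/bk/bk-3.3.x-queue)
#   $2 = feature name
#   $3 = path to the repository containing the change
#   $4 = reason why the change is needed
queue_feature() {
    queue=$1
    feature=$2
    repo=$3
    reason=$4
    mkdir -p "$queue/ready" "$queue/done" "$queue/rejected"
    # Symlink to the repository containing the change.
    ln -s "$repo" "$queue/ready/$feature"
    # Text file containing the reason why this is needed.
    printf '%s\n' "$reason" > "$queue/ready/$feature.RTI"
    # Assumption: start an empty file for the reviewers' comments.
    : > "$queue/ready/$feature.REVIEWED"
}
```

Sending the notification email to reviewers is left out because it depends on the local mail setup.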

Reporting with Scripting

Information about files or sets of files can be found using bk prs, a command used to extract revision history and/or metadata from a file or set of files. There are many options to prs, making it a powerful query language for the BitKeeper database which is your source base. The default behavior is to print a summary of each revision of each of the specified files. There are options to restrict the set of revisions printed; a very commonly used one is -r+, which restricts the set to the most recent revision.

With no options specified, prs reports on all revisions of all files under BitKeeper control in the present directory. The filename and range of revisions are listed first, set off by === characters. Below that line, for each revision, prs lists: the revision number, the revision date and time, the user who made that revision, the path of the file relative to the root of the repository, the comments that go with that revision, and any renames that have occurred, if appropriate. Revision information is listed from most recent to oldest, separated by a line of --- characters. Once the oldest revision has been listed, the sequence repeats with the next file in the directory.

Output Format

The bk prs command has a default output format which can be overridden. There are many different pieces of information in a BitKeeper file and bk prs can extract most of them. To extract specific information, a dspec (data specification) string must be provided and should contain keywords, surrounded by colons. bk prs will expand each of these keywords in the output it produces.

To specify a TAB character in the output, use \t; to specify a NEWLINE in the output, use \n. An example dspec which prints the file name (s.file) and the revision number is:

bk prs -d':SFILE: :REV:\n' file

In almost all cases, a variable does not provide a trailing newline, so one should be added as needed. The variables which currently do provide one are: COMMENTS, PATH, DEFAULT, SYMBOLS.

If a multi-line variable is printed as one line, i.e., without $each() (see below) providing a prefix and/or a suffix, then the lines are separated by spaces. The list of variables with this behavior is: C, GB, FD.

Conditional Output

The dspec can produce output conditionally. The following will print the default output format for each revision made by the user lm:

bk prs -d'$if(:P:=lm){:DEFAULT:}' file
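Because a dspec can reduce each revision to a single field, prs output combines well with standard text tools. The following is a minimal sketch that counts revisions per user for one file. revs_per_user is a hypothetical helper name. The -h and -n options are taken from the tagged-changeset example earlier in this chapter; the sketch assumes that -h suppresses the per-file header and -n ends each revision's output with a newline.

```shell
#!/bin/sh
# Sketch: count how many revisions of a file each user made.
# revs_per_user is a hypothetical helper, not a bk command.
# Assumes -h drops the per-file header and -n appends a newline after
# each expanded dspec, so the output is one user name per revision.
revs_per_user() {
    bk prs -h -nd':P:' "$1" | sort | uniq -c | sort -rn
}
```

For example, revs_per_user foo.c prints one line per user with that user's revision count in front, highest count first.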

Conditional Statements

BK/Web

BK/Web is an interface for browsing repositories using the web. A project's ChangeSets, user statistics, the source tree and the tags in a tree can all be browsed using the BK/Web interface.



© 1997-2005, BitMover, Inc.