Tip of the day: Unfuddle DMP format and lesser-known git commands

Submitted by Frederic Marand on

While exporting a project from Unfuddle in order to import its issues to Jira, I took a look at the other files beyond backup.xml and the media/ directory. Turns out that when Unfuddle provides you with a project backup, it includes the repositories in an undocumented (on their site, at least) format, under the dmp file extension. Let's find out how to actually use these.

A quick check shows them NOT to be produced by git archive or git bundle:

$ file ../unfuddle/foo.git.dmp
../unfuddle/foo.git.dmp: data
$ git bundle verify ../unfuddle/foo.git.dmp
error: '../unfuddle/foo.git.dmp' does not look like a v2 bundle file

Looking into the files showed them to be some sort of stream-oriented TLV format. A bit of searching showed it was actually the format used by the little-known commands git fast-export and git fast-import. As their documentation says: "This program is usually not what the end user wants to run directly."

Turns out that in the case of Unfuddle backups, this is exactly what we want. It just takes a few steps to go from these dumps to usable repositories. Although the git fast-import documentation mentions that the fast-import backend can import into an empty repository (one that has already been initialized by git init), in practice, on a just-created repository, the commits are read into the repository but the branches do not appear, so we have to do a bit more setup.
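Before importing, it can be handy to check that a dump really looks like a fast-import stream. The following is only a rough sketch: looks_like_fast_import is a hypothetical helper name of mine, and the list of opening commands it accepts is an assumption based on what such streams commonly start with, not an exhaustive check of the format.

```shell
# Hypothetical helper, not part of git: succeeds if FILE starts with a line
# that a fast-import stream commonly begins with. This is a heuristic only.
looks_like_fast_import() {
    # Look at the first line only, capped at 64 bytes, and strip NUL bytes
    # so that binary files do not upset the shell.
    first_line=$(head -c 64 "$1" | head -n 1 | tr -d '\000')
    case $first_line in
        blob|'commit '*|'reset '*|'tag '*|'feature '*|'option '*|'progress '*)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Usage, with the path from the example above:
#   looks_like_fast_import ../unfuddle/foo.git.dmp && echo "fast-import stream"
```

A git bundle starts with a "# v2 git bundle" line, so it is rejected by this check, which is the distinction that matters here.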

$ mkdir import
$ cd import
$ git init
Initialized empty Git repository in /<snip>/import/.git/
$ git fast-import < ../unfuddle/foo.git.dmp
git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:       5000
Total objects:         2190 (         0 duplicates                  )
      blobs  :         1949 (         0 duplicates        699 deltas of       1940 attempts)
      trees  :          216 (         0 duplicates         50 deltas of        213 attempts)
      commits:           25 (         0 duplicates          0 deltas of          0 attempts)
      tags   :            0 (         0 duplicates          0 deltas of          0 attempts)
Total branches:           1 (         1 loads     )
      marks:        1048576 (      1974 unique    )
      atoms:           2279
Memory total:          2469 KiB
       pools:          2235 KiB
     objects:           234 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit      = 8589934592
pack_report: pack_used_ctr            =          2
pack_report: pack_mmap_calls          =          1
pack_report: pack_open_windows        =          1 /          1
pack_report: pack_mapped              =    9611593 /    9611593
---------------------------------------------------------------------
$

Actual numbers will vary, obviously. At this point the repo contains the history from the original, but it may still have a problem if, as in my case, it did not contain a master branch. The usual gitk will show nothing, and git log will show nothing either. Is that a problem?

$ git status
On branch master

Initial commit

nothing to commit (create/copy files and use "git add" to track)
$ git log
fatal: your current branch 'master' does not have any commits yet
$ git fsck --full --strict
notice: HEAD points to an unborn branch (master)
Checking object directories: 100% (256/256), done.
Checking objects: 100% (2190/2190), done.

As shown by git fsck, the data is indeed there; it's just a limitation of many porcelain tools, which don't work as expected if there is neither a default branch nor an explicit one checked out. The fix is to tell them which branch to use by default with the more or less arcane git symbolic-ref command, after finding which branches exist:

$ ls .git/refs/heads
4.x
$ git symbolic-ref HEAD refs/heads/4.x
$ git reset --hard
$
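Listing refs/heads on disk only sees loose refs, so it can miss branches that git has stored elsewhere (in packed-refs, for instance). A sketch of the same discovery step using git itself follows. The function names list_branches and head_is_unborn are hypothetical helpers of mine, while git for-each-ref and git rev-parse are standard commands.

```shell
# Hypothetical helper: print the short name of every local branch in the
# repository at directory $1, however the refs are stored.
list_branches() {
    git -C "$1" for-each-ref --format='%(refname:short)' refs/heads
}

# Hypothetical helper: succeeds if HEAD in the repository at directory $1
# does not resolve to a commit, i.e. it points to an unborn branch.
head_is_unborn() {
    ! git -C "$1" rev-parse --verify --quiet HEAD >/dev/null 2>&1
}

# Usage, from the parent of the import directory:
#   list_branches import
#   head_is_unborn import && echo "HEAD points to a branch with no commits"
```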

In this case, the exported repo only had a single 4.x branch, which was not set as the default branch (this is an Unfuddle Stack limitation: it does not support default branches not being called master). After using git symbolic-ref, HEAD points to the right branch, but the checkout still contains no files, hence the final git reset --hard to have the repository be clean on the chosen branch.
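Put together, the whole procedure can be sketched as a small shell function. This is my own wrapper under stated assumptions, not something Unfuddle or git provides: import_dump is a hypothetical name, and when HEAD is unborn after the import it simply picks the first branch git lists, which is an arbitrary choice you may want to override for dumps with several branches.

```shell
# Hypothetical helper: import_dump DUMP_FILE TARGET_DIR
# Creates TARGET_DIR as a new repository, imports the fast-import stream,
# and makes sure HEAD ends up on an existing branch with files checked out.
import_dump() {
    dump=$1
    target=$2

    git init --quiet "$target" || return 1
    git -C "$target" fast-import --quiet < "$dump" || return 1

    # If HEAD does not resolve to a commit, the dump had no branch matching
    # the default one, so point HEAD at a branch that does exist.
    if ! git -C "$target" rev-parse --verify --quiet HEAD >/dev/null 2>&1; then
        # Assumption: take the first branch in git's listing.
        branch=$(git -C "$target" for-each-ref --count=1 \
            --format='%(refname)' refs/heads)
        if [ -z "$branch" ]; then
            echo "import_dump: no branches found in $dump" >&2
            return 1
        fi
        git -C "$target" symbolic-ref HEAD "$branch" || return 1
    fi

    # Populate the index and the working tree from the chosen branch.
    git -C "$target" reset --quiet --hard
}

# Usage, with the paths from the example above:
#   import_dump ../unfuddle/foo.git.dmp import
```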

A shortcut for this would have been a simple git checkout 4.x, which provides an active checkout branch for porcelain tools, but it does not set the repository default branch. Using git symbolic-ref is therefore safer.