Extract a subdirectory or single file from a Git repository

The git-filter-repo tool supports many kinds of file filtering and history rewriting1, but extracting directories and extracting files are the two I’ve needed in the past.

Cheat sheet

Extracting a subdirectory from a Git repository

To extract the directory named path/to/dir, move all files in path/to/dir to the repository root:

git filter-repo --subdirectory-filter path/to/dir

Extracting a single file from a Git repository

To extract a single file named path/to/file.txt, move all files in path/to to the repository root:

git filter-repo --subdirectory-filter path/to

Then, remove all files except file.txt:

git filter-repo --path file.txt
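
The two steps above can be wrapped in a small shell function. This is a minimal sketch, not something git-filter-repo ships: the name extract_file is made up for illustration, and it assumes you run it inside a fresh clone with git-filter-repo installed.

```shell
# Hypothetical helper (not part of git-filter-repo): keep a single file and
# its history, moved to the repository root. Run it inside a fresh clone,
# because git filter-repo rewrites history in place.
extract_file() {
  local file_path="$1"              # e.g. path/to/file.txt
  local dir name
  dir="$(dirname "$file_path")"     # e.g. path/to
  name="$(basename "$file_path")"   # e.g. file.txt

  # Step 1: make the containing directory the new repository root.
  # Skipped when the file already lives in the root.
  if [ "$dir" != "." ]; then
    git filter-repo --subdirectory-filter "$dir" || return 1
  fi

  # Step 2: drop every path except the file itself.
  git filter-repo --path "$name"
}
```

Calling extract_file path/to/file.txt then runs the same two commands as the cheat sheet, in the same order.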

Extracting files

Git comes with a file named git-prompt.sh, which exposes __git_ps1—a function that returns the current Git branch name—to be used in a terminal command prompt2.

The instructions in the file encourage copying the file somewhere like ~/, and then using it in your prompt. However, I prefer being able to pull in changes by keeping it in a Git repository. Keeping a local checkout of Git’s source proved a bit too bulky, so I decided to extract the file I needed into a separate repository3.

Extracting a single file from a repository requires extracting a directory and then removing any unwanted files. First, we clone Git’s repository:

git clone https://github.com/git/git.git
Cloning into 'git'...
[...]
Updating files: 100% (3956/3956), done.

Then, change into the cloned directory:

cd git

A note on git-filter-branch

An often-recommended approach is using git filter-branch. To extract the contrib/completion directory, we pass the directory name through the --subdirectory-filter option:

git filter-branch --subdirectory-filter contrib/completion

The command pauses and shows us the following warning:

WARNING: git-filter-branch has a glut of gotchas generating mangled history
         rewrites.  Hit Ctrl-C before proceeding to abort, then use an
         alternative filtering tool such as 'git filter-repo'
         (https://github.com/newren/git-filter-repo/) instead.  See the
         filter-branch manual page for more details; to squelch this warning,
         set FILTER_BRANCH_SQUELCH_WARNING=1.

Aside from the risk of corrupting your repository, git-filter-branch is slow: with the FILTER_BRANCH_SQUELCH_WARNING environment variable set, running it on this repository takes well over a minute4.

It’s generally a bad idea to use git-filter-branch, so we’ll have to find another way.

Using git-filter-repo

As instructed, we’ll use git-filter-repo instead. The git-filter-repo subcommand isn’t bundled with Git, but it’s installable through most package managers5. On a Mac, use Homebrew:

brew install git-filter-repo
==> Downloading https://ghcr.io/v2/homebrew/core/git-filter-repo/manifests/2.32.0-1
[...]
/usr/local/Cellar/git-filter-repo/2.32.0: 8 files, 278.2KB
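
Before rewriting anything, it can help to confirm that the subcommand is actually available. The check below is a small sketch that assumes the package manager placed an executable named git-filter-repo on the PATH, which is how Git discovers external subcommands; the variable name filter_repo_status is only for illustration.

```shell
# Git runs any executable named git-<name> on the PATH as `git <name>`,
# so looking up git-filter-repo tells us whether the subcommand is available.
if command -v git-filter-repo >/dev/null 2>&1; then
  filter_repo_status="installed"
else
  filter_repo_status="missing"
fi
echo "git-filter-repo: $filter_repo_status"
```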

Extracting a subdirectory

The command to extract a subdirectory is a direct translation of the one for git-filter-branch. Again, we pass the directory we’d like to extract as the --subdirectory-filter:

git filter-repo --subdirectory-filter contrib/completion
Parsed 65913 commits
HEAD is now at 5a63a42a9e Merge branch 'fw/complete-cmd-idx-fix'

New history written in 13.23 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
Completely finished after 15.47 seconds.

Voilà! That was at least five times as fast. We’ve extracted all files in the contrib/completion directory, while retaining their history:

ls
git-completion.bash
git-completion.tcsh
git-completion.zsh
git-prompt.sh
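
To see the "retaining their history" part without cloning all of Git, the same filter can be tried on a throwaway repository. This is a sketch under assumptions: the file names are made up, the demo is skipped when git or git-filter-repo is missing, and --force is passed only because a repository created with git init is not a fresh clone.

```shell
# Throwaway demo; every file and directory name below is made up.
demo="$(mktemp -d)"
kept_commits="skipped"
kept_files="skipped"

if command -v git >/dev/null 2>&1 && command -v git-filter-repo >/dev/null 2>&1; then
  (
    cd "$demo" || exit 1
    git init -q .
    git config user.name "Demo"
    git config user.email "demo@example.com"

    mkdir -p contrib/completion
    echo "v1" > contrib/completion/prompt.sh
    git add . && git commit -q -m "Add prompt"
    echo "readme" > README
    git add . && git commit -q -m "Add readme"
    echo "v2" > contrib/completion/prompt.sh
    git add . && git commit -q -m "Update prompt"

    # --force is needed because this repository was created with git init
    # and is therefore not a fresh clone.
    git filter-repo --force --subdirectory-filter contrib/completion
  ) >/dev/null 2>&1

  # Count the commits that are left and list the files in the new root.
  kept_commits="$(git -C "$demo" rev-list --count HEAD)"
  kept_files="$(git -C "$demo" ls-files)"
  # Show the surviving history of the extracted file.
  git -C "$demo" log --oneline -- prompt.sh
fi

rm -rf "$demo"
echo "commits kept: $kept_commits, files: $kept_files"
```

In this demo, the two commits that touched prompt.sh should survive, while the commit that only added README should be dropped because it becomes empty after filtering.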

Extracting a single file

The previous example removed most of Git’s source code from our checkout, but we’re still left with all the completion files in the directory. In this specific case, we only really need contrib/completion/git-prompt.sh.

To extract a single file from a Git repository, first extract the subdirectory like we’ve just done, then use the --path option to filter out all files except the selected one:

git filter-repo --path 'git-prompt.sh'
Parsed 1240 commits
HEAD is now at 9db4940 git-prompt: work under set -u

New history written in 1.86 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
Completely finished after 2.62 seconds.

Now we’re left with a repository holding a single file.

ls
git-prompt.sh

You can pass the --path option multiple times to keep more than one file, or use --path-glob or --path-regex to match multiple files with a pattern. Combine any of these with the --invert-paths option to invert the selection, removing the matching files instead of keeping them.
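
The path options can be compared side by side on a throwaway repository. The sketch below is illustrative only: the helper names make_demo_repo and files_after and the three files are made up, the run is skipped when git or git-filter-repo is missing, and --force is used because the demo repository is created with git init instead of being cloned.

```shell
# Build a throwaway repository with a few made-up files and print its path.
make_demo_repo() {
  local dir
  dir="$(mktemp -d)"
  (
    cd "$dir" || exit 1
    git init -q .
    git config user.name "Demo"
    git config user.email "demo@example.com"
    echo a > a.sh; echo b > b.sh; echo n > notes.txt
    git add . && git commit -q -m "Add files"
  ) >/dev/null 2>&1
  echo "$dir"
}

# Run git filter-repo with the given options in a new demo repository
# and print the files that survive, separated by spaces.
files_after() {
  local dir
  dir="$(make_demo_repo)"
  (
    cd "$dir" || exit 1
    # --force: the demo repository was created with git init, not cloned.
    git filter-repo --force "$@" >/dev/null 2>&1 &&
      git ls-files | paste -sd ' ' -
  )
  rm -rf "$dir"
}

if command -v git >/dev/null 2>&1 && command -v git-filter-repo >/dev/null 2>&1; then
  keep_two="$(files_after --path a.sh --path notes.txt)"      # several --path options
  keep_glob="$(files_after --path-glob '*.sh')"               # pattern match
  drop_one="$(files_after --invert-paths --path notes.txt)"   # inverted selection
else
  keep_two="skipped"; keep_glob="skipped"; drop_one="skipped"
fi

echo "--path twice:   $keep_two"
echo "--path-glob:    $keep_glob"
echo "--invert-paths: $drop_one"
```

With these three files, the first run should keep a.sh and notes.txt, while the glob and the inverted selection should both end up with only the two .sh files.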


  1. Outside of extracting subdirectories and files, git-filter-repo can move files between directories, remove files from repositories, move a whole repository into a subdirectory, rewrite commit messages, change author names, and more.

  2. I extracted the git-prompt.sh file out of Git’s repository back in 2016 as git-prompt.sh, and I’ve been using this prompt ever since:

    source ~/.config/git-prompt.sh/git-prompt.sh
    
    export PROMPT='%~ $(__git_ps1 "(%s) ")$ '
    

    While in a Git repository, my prompt shows the current branch name:

    $ ~/.config/git-prompt.sh (main)
    
  3. Assuming I’m the only one using this extraction, I must admit there’s a problem with this approach. While it seems like it would save some time because it’s quick to pull a new version of the file, updating it involves extracting the file, then rebasing the license and readme onto Git’s upstream main branch before I get the ease of pulling in changes through Git. If no such extraction existed (I’m committed to it now), it would have been quicker to download the file directly through cURL:

    curl https://raw.githubusercontent.com/git/git/master/contrib/completion/git-prompt.sh --output ~/.config/git-prompt
    

    A mirror that automatically updates and extracts the file would solve this. Until then, it’s a nice example to show how to extract files from Git repositories.

  4. The documentation page for git-filter-branch reiterates that using this command is “glacially slow”, that it “easily corrupts repos”, and urges us once more to use git-filter-repo.

  5. Check out the documentation for installation instructions.
