The git-filter-repo tool supports many ways of filtering files and rewriting history¹, but extracting directories and extracting single files are the two I’ve needed in the past.
Cheat sheet
- Extracting a subdirectory from a Git repository
To extract the directory named path/to/dir, move all files in path/to/dir to the repository root:
git filter-repo --subdirectory-filter path/to/dir
- Extracting a single file from a Git repository
To extract a single file named path/to/file.txt, first move all files in path/to to the repository root:
git filter-repo --subdirectory-filter path/to
Then, remove all files except file.txt:
git filter-repo --path file.txt
Extracting files
Git comes with a file named git-prompt.sh, which exposes __git_ps1, a function that returns the current Git branch name for use in a terminal command prompt².
The instructions in the file suggest copying it somewhere like ~/ and then using it in your prompt.
However, I prefer being able to pull in changes by keeping it in a Git repository.
Keeping a local checkout of Git’s source proved a bit too bulky, so I decided to extract the file I needed into a separate repository³.
Extracting a single file from a repository takes two steps here: extracting its directory, then removing any unwanted files. First, we clone Git’s repository:
git clone https://github.com/git/git.git
Cloning into 'git'...
[...]
Updating files: 100% (3956/3956), done.
Then, change into the new directory:
cd git
A note on git-filter-branch
An often-recommended approach is using git filter-branch.
To extract the contrib/completion directory, we pass the directory name to the --subdirectory-filter option:
git filter-branch --subdirectory-filter contrib/completion
The command pauses and displays the following warning:
WARNING: git-filter-branch has a glut of gotchas generating mangled history rewrites. Hit Ctrl-C before proceeding to abort, then use an alternative filtering tool such as 'git filter-repo' (https://github.com/newren/git-filter-repo/) instead. See the filter-branch manual page for more details; to squelch this warning, set FILTER_BRANCH_SQUELCH_WARNING=1.
Aside from the risk of corrupting your repository, running git-filter-branch on this repository with the FILTER_BRANCH_SQUELCH_WARNING environment variable set takes well over a minute⁴.
It’s generally a bad idea to use git-filter-branch, so we’ll have to find another way.
Using git-filter-repo
As instructed, we’ll use git-filter-repo instead.
The git-filter-repo subcommand isn’t bundled with Git, but it’s installable through most package managers⁵.
On a Mac, use Homebrew:
brew install git-filter-repo
==> Downloading https://ghcr.io/v2/homebrew/core/git-filter-repo/manifests/2.32.0-1
[...]
/usr/local/Cellar/git-filter-repo/2.32.0: 8 files, 278.2KB
Extracting a subdirectory
The command to extract a subdirectory is a direct translation of the git-filter-branch one.
Again, we pass the directory we’d like to extract to the --subdirectory-filter option:
git filter-repo --subdirectory-filter contrib/completion
Parsed 65913 commits
HEAD is now at 5a63a42a9e Merge branch 'fw/complete-cmd-idx-fix'
New history written in 13.23 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
Completely finished after 15.47 seconds.
Voilà! That was at least five times as fast.
We’ve extracted all files in the contrib/completion directory while retaining their history:
ls
git-completion.bash git-completion.tcsh git-completion.zsh git-prompt.sh
Extracting a single file
The previous example removed most of Git’s source code from our checkout, but we’re still left with all the completion files from the directory.
In this specific case, we only need contrib/completion/git-prompt.sh.
To extract a single file from a Git repository, first extract its subdirectory like we’ve just done, then use the --path option to filter out all files except the selected one:
git filter-repo --path 'git-prompt.sh'
Parsed 1240 commits
HEAD is now at 9db4940 git-prompt: work under set -u
New history written in 1.86 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
Completely finished after 2.62 seconds.
Now we’re left with a repository holding a single file.
ls
git-prompt.sh
You can pass the --path option multiple times to keep more than one file, use --path-glob or --path-regex to match multiple files with a pattern, or combine any of these with the --invert-paths option to remove the matching files instead of keeping them.
1. Outside of extracting subdirectories and files, git-filter-repo can move files between directories, remove files from repositories, move a whole repository into a subdirectory, rewrite commit messages, change author names, and more.
2. I extracted the git-prompt.sh file out of Git’s repository back in 2016 into its own repository, git-prompt.sh, and I’ve been using this prompt ever since:
source ~/.config/git-prompt.sh/git-prompt.sh
setopt PROMPT_SUBST  # lets zsh run the $(...) command substitution in PROMPT
export PROMPT='%~ $(__git_ps1 "(%s) ")$ '
While in a Git repository, my prompt shows the current branch name:
~/.config/git-prompt.sh (main) $
3. Assuming I’m the only one using this extraction, I must admit there’s a problem with this approach. Pulling a new version of the file is quick, so it seems like it would save time. However, updating the extraction means extracting the file again and then rebasing the license and readme onto Git’s upstream main branch before I can pull in changes through Git. If no such extraction existed (I’m committed to it now), it would have been quicker to download the file directly with cURL:
curl https://raw.githubusercontent.com/git/git/master/contrib/completion/git-prompt.sh --output ~/.config/git-prompt
A mirror that automatically updates and extracts the file would solve this. Until then, it’s a nice example of how to extract files from Git repositories.
4. The documentation page for git-filter-branch reiterates that using this command is “glacially slow”, that it “easily corrupts repos”, and urges us once more to use git-filter-repo.
5. Check out the documentation for installation instructions.