The git-filter-repo library supports many ways of filtering files and rewriting history¹, but extracting directories and files are the two I’ve needed in the past.
- Extracting a subdirectory from a Git repository
To extract the directory named path/to/dir, move all files in path/to/dir to the repository root:
git filter-repo --subdirectory-filter path/to/dir
- Extracting a single file from a Git repository
To extract a single file named path/to/file.txt, first move all files in path/to to the repository root:
git filter-repo --subdirectory-filter path/to
Then, remove all files except file.txt:
git filter-repo --path file.txt
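The two steps above follow directly from the file's path: the directory part goes to --subdirectory-filter and the file name goes to --path. As a minimal sketch, the hypothetical helper below (extract_file_commands is a name invented here, not part of git-filter-repo) only prints the commands for a given path, so it can be tried without touching a repository:

```shell
#!/bin/sh
# Print the git filter-repo commands needed to extract a single file.
# extract_file_commands is a hypothetical helper for illustration only;
# it prints the commands and never runs them.
extract_file_commands() {
  file_path=$1
  dir=$(dirname "$file_path")    # directory part, e.g. path/to
  name=$(basename "$file_path")  # file name, e.g. file.txt

  # A file that already lives in the repository root needs no
  # subdirectory step.
  if [ "$dir" != "." ]; then
    printf 'git filter-repo --subdirectory-filter %s\n' "$dir"
  fi
  printf 'git filter-repo --path %s\n' "$name"
}

extract_file_commands path/to/file.txt
```

Printing instead of executing keeps the sketch safe: history rewriting is destructive, so it's worth reading the commands before running them in a fresh clone.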
Git comes with a file named git-prompt.sh, which exposes __git_ps1, a function that returns the current Git branch name, to be used in a terminal command prompt².
The instructions in the file encourage copying it somewhere like ~/ and then using it in your prompt. However, I prefer being able to pull in changes by keeping it in a Git repository. Keeping a local checkout of Git’s source proved a bit too bulky, so I decided to extract the file I needed into a separate repository³.
Extracting a single file from a repository requires extracting a directory and then removing any unwanted files. First, we clone Git’s repository:
git clone https://github.com/git/git.git
Cloning into 'git'...
[...]
Updating files: 100% (3956/3956), done.
Then, switch directories to end up inside the checkout:
cd git
A note on git-filter-branch
An often-recommended approach is using git filter-branch. To extract the contrib/completion directory, we pass the directory name through the --subdirectory-filter option:
git filter-branch --subdirectory-filter contrib/completion
The command pauses and displays the following warning:
WARNING: git-filter-branch has a glut of gotchas generating mangled history rewrites. Hit Ctrl-C before proceeding to abort, then use an alternative filtering tool such as 'git filter-repo' (https://github.com/newren/git-filter-repo/) instead. See the filter-branch manual page for more details; to squelch this warning, set FILTER_BRANCH_SQUELCH_WARNING=1.
Aside from the risk of corrupting your repository, running git-filter-branch on this repository takes well over a minute, even when run with the FILTER_BRANCH_SQUELCH_WARNING environment variable set⁴.
It’s generally a bad idea to use git-filter-branch, so we’ll have to find another way.
As instructed, we’ll use git-filter-repo instead. The git-filter-repo subcommand isn’t bundled with Git, but it’s installable through most package managers⁵. On a Mac, use Homebrew:
brew install git-filter-repo
==> Downloading https://ghcr.io/v2/homebrew/core/git-filter-repo/manifests/2.32.0-1
[...]
/usr/local/Cellar/git-filter-repo/2.32.0: 8 files, 278.2KB
Extracting a subdirectory
The command to extract a subdirectory is a direct translation of the one for git-filter-branch. Again, we pass the directory we’d like to extract as the --subdirectory-filter option:
git filter-repo --subdirectory-filter contrib/completion
Parsed 65913 commits
HEAD is now at 5a63a42a9e Merge branch 'fw/complete-cmd-idx-fix'
New history written in 13.23 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
Completely finished after 15.47 seconds.
Voilà! That was at least five times as fast. We’ve extracted all files in the contrib/completion directory, while retaining their history:
git-completion.bash git-completion.tcsh git-completion.zsh git-prompt.sh
Extracting a single file
The previous example removed most of Git’s source code from our checkout, but we’re still left with all the completion files in the directory. In this specific case, we only really need git-prompt.sh.
To extract a single file from a Git repository, first extract the subdirectory like we’ve just done, then use the --path option to filter out all files except the selected one:
git filter-repo --path 'git-prompt.sh'
Parsed 1240 commits
HEAD is now at 9db4940 git-prompt: work under set -u
New history written in 1.86 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
Completely finished after 2.62 seconds.
Now we’re left with a repository holding a single file.
You can pass the --path option multiple times to keep more than one file, use --path-regex to match multiple files with a pattern, or combine any of these with the --invert-paths option to remove the matching files instead of keeping them.
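To see --invert-paths in action without risking a real repository, here is a rough sketch that builds a throwaway repository with two files and removes one of them from history. The helper name demo_invert_paths, the file names, and the demo identity are all made up for this example. The --force flag is there only because a freshly initialized sandbox doesn't look like a fresh clone to git-filter-repo; the sketch prints "skipped" when git or git-filter-repo isn't available.

```shell
#!/bin/sh
# Sandbox demo: drop one file from history with --invert-paths.
# demo_invert_paths is a hypothetical helper used only for this sketch.
demo_invert_paths() {
  # Skip quietly when git or git-filter-repo is not installed.
  command -v git >/dev/null 2>&1 || { echo skipped; return 0; }
  git filter-repo --version >/dev/null 2>&1 || { echo skipped; return 0; }

  sandbox=$(mktemp -d) || return 1
  (
    cd "$sandbox" || exit 1
    git init -q .
    echo keep > keep.txt
    echo drop > drop.txt
    git add keep.txt drop.txt
    git -c user.name=Demo -c user.email=demo@example.com \
      commit -q -m 'Add two files'

    # --path selects drop.txt; --invert-paths turns "keep only this"
    # into "remove only this". --force is needed because the sandbox
    # is not a fresh clone.
    git filter-repo --force --invert-paths --path drop.txt >/dev/null 2>&1

    # List the files that survived the rewrite.
    git ls-files
  )
  rc=$?
  rm -rf "$sandbox"
  return "$rc"
}

demo_invert_paths
```

Dropping --invert-paths from the same command would do the opposite and keep only drop.txt.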
Outside of extracting subdirectories and files, git-filter-repo can move files between directories, remove files from repositories, move a whole repository into a subdirectory, rewrite commit messages, change author names, and more.
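As one illustration of these other operations, git-filter-repo's --to-subdirectory-filter option moves a whole repository into a subdirectory. The sketch below tries it in a throwaway repository; the helper name demo_to_subdirectory, the target directory vendor/demo, and the demo identity are assumptions made for this example, and the function prints "skipped" when git or git-filter-repo isn't available.

```shell
#!/bin/sh
# Sandbox demo: move a whole repository into a subdirectory.
# demo_to_subdirectory is a hypothetical helper used only for this sketch.
demo_to_subdirectory() {
  # Skip quietly when git or git-filter-repo is not installed.
  command -v git >/dev/null 2>&1 || { echo skipped; return 0; }
  git filter-repo --version >/dev/null 2>&1 || { echo skipped; return 0; }

  sandbox=$(mktemp -d) || return 1
  (
    cd "$sandbox" || exit 1
    git init -q .
    echo hello > README.txt
    git add README.txt
    git -c user.name=Demo -c user.email=demo@example.com \
      commit -q -m 'Add README'

    # Rewrite history so every path lives under vendor/demo/.
    # --force is needed because the sandbox is not a fresh clone.
    git filter-repo --force --to-subdirectory-filter vendor/demo >/dev/null 2>&1

    # List the files after the rewrite.
    git ls-files
  )
  rc=$?
  rm -rf "$sandbox"
  return "$rc"
}

demo_to_subdirectory
```

This is the inverse of --subdirectory-filter, which is handy when preparing a repository to be merged into a larger one.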
I extracted the git-prompt.sh file out of Git’s repository back in 2016 as git-prompt.sh, and I’ve been using this prompt ever since:
source ~/.config/git-prompt.sh/git-prompt.sh
export PROMPT='%~ $(__git_ps1 "(%s) ")$ '
While in a Git repository, my prompt shows the current branch name:
~/.config/git-prompt.sh (main) $
Assuming I’m the only one using this extraction, I must admit there’s a problem with this approach. While it seems like it would save time because pulling a new version of the file is quick, updating it involves extracting the file again, then rebasing the license and readme onto Git’s upstream main branch before I get the ease of pulling in changes through Git. If no such extraction existed (I’m committed to it now), it would have been quicker to download the file directly through cURL:
curl https://raw.githubusercontent.com/git/git/master/contrib/completion/git-prompt.sh --output ~/.config/git-prompt
A mirror that automatically updates and extracts the file would solve this. Until then, it’s a nice example to show how to extract files from Git repositories.