You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
git-filter-repo/Documentation/converting-from-filter-bran...

347 lines
11 KiB
Markdown

# Cheat Sheet: Converting from filter-branch
This document is aimed at folks who are familiar with filter-branch and want
to learn how to convert over to using filter-repo.
## Table of Contents
* [Half-hearted conversions](#half-hearted-conversions)
* [Intention of "equivalent" commands](#intention-of-equivalent-commands)
* [Basic Differences](#basic-differences)
* [Cheat Sheet: Conversion of Examples from the filter-branch manpage](#cheat-sheet-conversion-of-examples-from-the-filter-branch-manpage)
* [Cheat Sheet: Additional conversion examples](#cheat-sheet-additional-conversion-examples)
## Half-hearted conversions
You can switch nearly any `git filter-branch` command to use
filter-repo under the covers by just replacing the `git filter-branch`
part of the command with
[`filter-lamely`](../contrib/filter-repo-demos/filter-lamely). The
git.git regression testsuite passes when I swap out the filter-branch
script with filter-lamely, for example. (However, the filter-branch
tests are not very comprehensive, so don't rely on that too much.)
Doing a half-hearted conversion has nearly all of the drawbacks of
filter-branch and nearly none of the benefits of filter-repo, but it
will make your command run a few times faster and makes for a very
simple conversion.
You'll get a lot more performance, safety, and features by just
switching to direct filter-repo commands.
## Intention of "equivalent" commands
filter-branch and filter-repo have different defaults, as highlighted
in the Basic Differences section below. As such, getting a command
which behaves identically is not possible. Also, sometimes the
filter-branch manpage lies, e.g. it says "suppose you want to...from
all commits" and then uses a command line like "git filter-branch
... HEAD", which only operates on commits in the current branch rather
than on all commits.
Rather than focusing on matching filter-branch output as exactly as
possible, I treat the filter-branch examples as idiomatic ways to
solve a certain type of problem with filter-branch, and express how
one would idiomatically solve the same problem in filter-repo.
Sometimes that means the results are not identical, but they are
largely the same in each case.
## Basic Differences
With `git filter-branch`, you have a git repository where every single
commit (within the branches or revisions you specify) is checked out
and then you run one or more shell commands to transform the working
copy into your desired end state.
With `git filter-repo`, you are essentially given an editing tool to
operate on the [fast-export](https://git-scm.com/docs/git-fast-export)
serialization of a repo. That means there is an input stream of all
the contents of the repository, and rather than specifying filters in
the form of commands to run, you usually employ a number of common
pre-defined filters that provide various ways to slice, dice, or
modify the repo based on its components (such as pathnames, file
content, user names or emails, etc.) That makes common operations
easier, even if it's not as versatile as shell callbacks. For cases
where more complexity or special casing is needed, filter-repo
provides python callbacks that can operate on the data structures
populated from the fast-export stream to do just about anything you
want.
filter-branch defaults to working on a subset of the repository, and
requires you to specify a branch or branches, meaning you need to
specify `-- --all` to modify all commits. filter-repo by contrast
defaults to rewriting everything, and you need to specify `--refs
<rev-list-args>` if you want to limit to just a certain set of
branches or range of commits. (Though any `<rev-list-args>` that
begin with a hyphen are not accepted by filter-repo as they look like
the start of different options.)
filter-repo also takes care of additional concerns automatically, like
rewriting commit messages that reference old commit IDs to instead
reference the rewritten commit IDs, pruning commits which do not start
empty but become empty due to the specified filters, and automatically
shrinking and gc'ing the repo at the end of the filtering operation.
## Cheat Sheet: Conversion of Examples from the filter-branch manpage
### Removing a file
The filter-branch manual provided three different examples of removing
a single file, based on different levels of ease vs. carefulness and
performance:
```shell
git filter-branch --tree-filter 'rm filename' HEAD
```
```shell
git filter-branch --tree-filter 'rm -f filename' HEAD
```
```shell
git filter-branch --index-filter 'git rm --cached --ignore-unmatch filename' HEAD
```
All of these just become
```shell
git filter-repo --invert-paths --path filename
```
### Extracting a subdirectory
Extracting a subdirectory via
```shell
git filter-branch --subdirectory-filter foodir -- --all
```
is one of the easiest commands to convert; it just becomes
```shell
git filter-repo --subdirectory-filter foodir
```
### Moving the whole tree into a subdirectory
Keeping all files but placing them in a new subdirectory via
```shell
git filter-branch --index-filter \
'git ls-files -s | sed "s-\t\"*-&newsubdir/-" |
GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
git update-index --index-info &&
mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD
```
(which happens to be GNU-specific and will fail with BSD userland in
very subtle ways) becomes
```shell
git filter-repo --to-subdirectory-filter newsubdir
```
(which works fine regardless of GNU vs BSD userland differences.)
### Re-grafting history
The filter-branch manual provided one example with three different
commands that could be used to achieve it, though the first of them
had limited applicability (only when the repo had a single initial
commit). These three examples were:
```shell
git filter-branch --parent-filter 'sed "s/^\$/-p <graft-id>/"' HEAD
```
```shell
git filter-branch --parent-filter \
'test $GIT_COMMIT = <commit-id> && echo "-p <graft-id>" || cat' HEAD
```
```shell
git replace --graft $commit-id $graft-id
git filter-branch $graft-id..HEAD
```
git-replace did not exist when the original two examples were written,
but it is clear that the last example is far easier to understand. As
such, filter-repo just uses the same mechanism:
```shell
git replace --graft $commit-id $graft-id
git filter-repo --force
```
NOTE: --force should usually be avoided unless you have taken care to
make sure you have a backup (or are running on a fresh clone of) your
repo. It is needed in this case because filter-repo errors out when
no arguments are specified, and because it usually first checks
whether you are in a fresh clone before irrecoverably rewriting your
repository (git-replace created a new graft and thus added something
to your previously fresh clone).
### Removing commits by a certain author
WARNING: This is a BAD example for BOTH filter-branch and filter-repo.
It does not remove the changes the user made from the repo, it just
removes the commit in question while smashing the changes from it into
any subsequent commits as though the subsequent authors had been
responsible for those changes as well. `git rebase` is likely to be a
better fit for what you really want if you are looking at this
example. (See also [this explanation of the differences between
rebase and
filter-repo](https://github.com/newren/git-filter-repo/issues/62#issuecomment-597725502))
This filter-branch example
```shell
git filter-branch --commit-filter '
if [ "$GIT_AUTHOR_NAME" = "Darl McBribe" ];
then
skip_commit "$@";
else
git commit-tree "$@";
fi' HEAD
```
becomes
```shell
git filter-repo --commit-callback '
if commit.author_name == b"Darl McBribe":
commit.skip()
'
```
### Rewriting commit messages -- removing text
Removing git-svn-id: lines from commit messages via
```shell
git filter-branch --msg-filter '
sed -e "/^git-svn-id:/d"
'
```
becomes
```shell
git filter-repo --message-callback '
return re.sub(b"^git-svn-id:.*\n", b"", message, flags=re.MULTILINE)
'
```
### Rewriting commit messages -- adding text
Adding Acked-by lines to the last ten commits via
```shell
git filter-branch --msg-filter '
cat &&
echo "Acked-by: Bugs Bunny <bunny@bugzilla.org>"
' master~10..master
```
becomes
```shell
git filter-repo --message-callback '
return message + b"Acked-by: Bugs Bunny <bunny@bugzilla.org>\n"
' --refs master~10..master
```
### Changing author/committer(/tagger?) information
```shell
git filter-branch --env-filter '
if test "$GIT_AUTHOR_EMAIL" = "root@localhost"
then
GIT_AUTHOR_EMAIL=john@example.com
fi
if test "$GIT_COMMITTER_EMAIL" = "root@localhost"
then
GIT_COMMITTER_EMAIL=john@example.com
fi
' -- --all
```
becomes either
```shell
# Ensure '<john@example.com> <root@localhost>' is a line in .mailmap, then:
git filter-repo --use-mailmap
```
or
```shell
git filter-repo --email-callback '
return email if email != b"root@localhost" else b"john@example.com"
'
```
(and as a bonus both filter-repo alternatives will fix tagger emails
too, unlike the filter-branch example)
### Restricting to a range
The partial examples
```shell
git filter-branch ... C..H
```
```shell
git filter-branch ... C..H ^D
```
```shell
git filter-branch ... D..H ^C
```
become
```shell
git filter-repo ... --refs C..H
```
```shell
git filter-repo ... --refs C..H ^D
```
```shell
git filter-repo ... --refs D..H ^C
```
Note that filter-branch accepts `--not` among the revision specifiers,
but that appears to python to be a flag name which breaks parsing.
So, instead of e.g. `--not C` as we might use with filter-branch, we
can specify `^C` to filter-repo.
## Cheat Sheet: Additional conversion examples
### Running a code formatter or linter on each file with some extension
Running some program on a subset of files is relatively natural in
filter-branch:
```shell
git filter-branch --tree-filter '
git ls-files -z "*.c" \
| xargs -0 -n 1 clang-format -style=file -i
'
```
filter-repo decided not to provide a way to run an external program to
do filtering, because most filter-branch uses of this ability are
riddled with [safety
problems](https://git-scm.com/docs/git-filter-branch#SAFETY) and
[performance
issues](https://git-scm.com/docs/git-filter-branch#PERFORMANCE).
However, in special cases like this it's fairly safe. One can write a
script that uses filter-repo as a library to achieve this, while also
gaining filter-repo's automatic handling of other concerns like
rewriting commit IDs in commit messages or pruning commits that become
empty. In fact, one of the [contrib
demos](../contrib/filter-repo-demos),
[lint-history](../contrib/filter-repo-demos/lint-history), handles
this exact type of situation already:
```shell
lint-history --relevant 'return filename.endswith(b".c")' \
clang-format -style=file -i
```