pull/494/merge
NAHO 10 months ago committed by GitHub
commit c6ae44d242
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -1,17 +1,17 @@
git filter-repo is a versatile tool for rewriting history, which includes
`git filter-repo` is a versatile tool for rewriting history, which includes
[capabilities I have not found anywhere
else](#design-rationale-behind-filter-repo). It roughly falls into the
same space of tool as [git
filter-branch](https://git-scm.com/docs/git-filter-branch) but without the
same space of tool as [`git
filter-branch`](https://git-scm.com/docs/git-filter-branch) but without the
capitulation-inducing poor
[performance](https://public-inbox.org/git/CABPp-BGOz8nks0+Tdw5GyGqxeYR-3FF6FT5JcgVqZDYVRQ6qog@mail.gmail.com/),
with far more capabilities, and with a design that scales usability-wise
beyond trivial rewriting cases. [git filter-repo is now recommended by the
beyond trivial rewriting cases. [`git filter-repo` is now recommended by the
git project](https://git-scm.com/docs/git-filter-branch#_warning) instead
of git filter-branch.
of `git filter-branch`.
While most users will probably just use filter-repo as a simple command
line tool (and likely only use a few of its flags), at its core filter-repo
While most users will probably just use `filter-repo` as a simple command
line tool (and likely only use a few of its flags), at its core `filter-repo`
contains a library for creating history rewriting tools. As such, users
with specialized needs can leverage it to quickly create [entirely new
history rewriting tools](contrib/filter-repo-demos).
@ -21,130 +21,140 @@ history rewriting tools](contrib/filter-repo-demos).
* [Prerequisites](#prerequisites)
* [How do I install it?](#how-do-i-install-it)
* [How do I use it?](#how-do-i-use-it)
* [Why filter-repo instead of other alternatives?](#why-filter-repo-instead-of-other-alternatives)
* [filter-branch](#filter-branch)
* [Why `filter-repo` instead of other alternatives?](#why-filter-repo-instead-of-other-alternatives)
* [`filter-branch`](#filter-branch)
* [BFG Repo Cleaner](#bfg-repo-cleaner)
* [Simple example, with comparisons](#simple-example-with-comparisons)
* [Solving this with filter-repo](#solving-this-with-filter-repo)
* [Solving this with `filter-repo`](#solving-this-with-filter-repo)
* [Solving this with BFG Repo Cleaner](#solving-this-with-bfg-repo-cleaner)
* [Solving this with filter-branch](#solving-this-with-filter-branch)
* [Solving this with `filter-branch`](#solving-this-with-filter-branch)
* [Solving this with fast-export/fast-import](#solving-this-with-fast-exportfast-import)
* [Design rationale behind filter-repo](#design-rationale-behind-filter-repo)
* [Design rationale behind `filter-repo`](#design-rationale-behind-filter-repo)
* [How do I contribute?](#how-do-i-contribute)
* [Is there a Code of Conduct?](#is-there-a-code-of-conduct)
* [Upstream Improvements](#upstream-improvements)
# Prerequisites
filter-repo requires:
`filter-repo` requires:
* git >= 2.22.0 at a minimum; [some features](#upstream-improvements)
* `git` >= 2.22.0 at a minimum; [some features](#upstream-improvements)
require `git` >= 2.24.0
* python3 >= 3.5
* `python3` >= 3.5
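
The version requirements above can be checked with a few lines of shell. This is a minimal sketch, not part of `filter-repo`: the helper name `version_ge` is hypothetical, it compares only the first three dot-separated fields, and it assumes `git --version` and `python3 --version` print the version as the third and second word respectively.

```bash
#!/usr/bin/env bash
# version_ge A B: succeed when dotted version A >= B.
# Hypothetical helper; compares the first three numeric fields only.
version_ge() {
    local IFS=.
    local -a a=($1) b=($2)
    local i x y
    for i in 0 1 2; do
        x=${a[i]:-0}; y=${b[i]:-0}
        # Drop any non-numeric suffix (e.g. "0rc1" -> "0")
        x=${x%%[!0-9]*}; y=${y%%[!0-9]*}
        x=${x:-0}; y=${y:-0}
        # 10# forces base 10 so a field like "08" is not read as octal
        if (( 10#$x > 10#$y )); then return 0; fi
        if (( 10#$x < 10#$y )); then return 1; fi
    done
    return 0
}

# Report on git (minimum 2.22.0), if it is installed
if command -v git >/dev/null 2>&1; then
    git_version=$(git --version | awk '{print $3}')
    if version_ge "$git_version" 2.22.0; then
        echo "git $git_version: ok"
    else
        echo "git $git_version: too old (need >= 2.22.0)"
    fi
fi

# Report on python3 (minimum 3.5), if it is installed
if command -v python3 >/dev/null 2>&1; then
    python_version=$(python3 --version | awk '{print $2}')
    if version_ge "$python_version" 3.5; then
        echo "python3 $python_version: ok"
    else
        echo "python3 $python_version: too old (need >= 3.5)"
    fi
fi
```

To check for the features that need the newer git, the same helper can be called with `2.24.0` as the minimum.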
# How do I install it?
`git-filter-repo` is a single-file python script, which was done to make
installation for basic use on many systems trivial: just place that
file into your $PATH.
file into your `$PATH`.
See [INSTALL.md](INSTALL.md) for things beyond basic usage or special
cases. The more involved instructions are only needed if one of the
following apply:
* you do not find the above comment about trivial installation intuitively
obvious
* you are working with a python3 executable named something other than
"python3"
* you want to install documentation (beyond the builtin docs shown with -h)
* you want to run some of the [contrib](contrib/filter-repo-demos/) examples
* you want to create your own python filtering scripts using filter-repo as
* You do not find the above comment about trivial installation intuitively
obvious.
* You are working with a `python3` executable named something other than
`python3`.
* You want to install documentation (beyond the builtin docs shown with `-h`).
* You want to run some of the [contrib](contrib/filter-repo-demos/) examples.
* You want to create your own python filtering scripts using `filter-repo` as
a module/library.
# How do I use it?
For comprehensive documentation:
* see the [user manual](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html)
* alternative formating of the user manual is available on various
* See the [user manual](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html).
* Alternative formatting of the user manual is available on various
external sites
([example](https://www.mankier.com/1/git-filter-repo)), for those
that don't like the htmlpreview.github.io layout, though it may
only be up-to-date as of the latest release
only be up-to-date as of the latest release.
If you prefer learning from examples:
* there is a [cheat sheet for converting filter-branch
* There is a [cheat sheet for converting `filter-branch`
commands](Documentation/converting-from-filter-branch.md#cheat-sheet-conversion-of-examples-from-the-filter-branch-manpage),
which covers every example from the filter-branch manual
* there is a [cheat sheet for converting BFG Repo Cleaner
which covers every example from the `filter-branch` manual.
* There is a [cheat sheet for converting BFG Repo Cleaner
commands](Documentation/converting-from-bfg-repo-cleaner.md#cheat-sheet-conversion-of-examples-from-bfg),
which covers every example from the BFG website
* the [simple example](#simple-example-with-comparisons) below may
be of interest
* the user manual has an extensive [examples
section](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#EXAMPLES)
which covers every example from the BFG website.
* The [simple example](#simple-example-with-comparisons) below may
be of interest.
# Why filter-repo instead of other alternatives?
* The user manual has an extensive [examples
section](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#EXAMPLES).
# Why `filter-repo` instead of other alternatives?
This was covered in more detail in a [Git Rev News article on
filter-repo](https://git.github.io/rev_news/2019/08/21/edition-54/#an-introduction-to-git-filter-repo--written-by-elijah-newren),
`filter-repo`](https://git.github.io/rev_news/2019/08/21/edition-54/#an-introduction-to-git-filter-repo--written-by-elijah-newren),
but some highlights for the main competitors:
## filter-branch
## `filter-branch`
* filter-branch is [extremely to unusably
* `filter-branch` is [extremely to unusably
slow](https://public-inbox.org/git/CABPp-BGOz8nks0+Tdw5GyGqxeYR-3FF6FT5JcgVqZDYVRQ6qog@mail.gmail.com/)
([multiple orders of magnitude slower than it should
be](https://git-scm.com/docs/git-filter-branch#PERFORMANCE))
for non-trivial repositories.
* [filter-branch is riddled with
* [`filter-branch` is riddled with
gotchas](https://git-scm.com/docs/git-filter-branch#SAFETY) that can
silently corrupt your rewrite or at least thwart your "cleanup"
efforts by giving you something more problematic and messy than what
you started with.
* filter-branch is [very onerous](#simple-example-with-comparisons)
* `filter-branch` is [very onerous](#simple-example-with-comparisons)
[to
use](https://github.com/newren/git-filter-repo/blob/a6a6a1b0f62d365bbe2e76f823e1621857ec4dbd/contrib/filter-repo-demos/filter-lamely#L9-L61)
for any rewrite which is even slightly non-trivial.
* the git project has stated that the above issues with filter-branch
* The git project has stated that the above issues with `filter-branch`
cannot be backward compatibly fixed; they recommend that you [stop
using
filter-branch](https://git-scm.com/docs/git-filter-branch#_warning)
`filter-branch`](https://git-scm.com/docs/git-filter-branch#_warning).
* die-hard fans of filter-branch may be interested in
[filter-lamely](contrib/filter-repo-demos/filter-lamely)
(a.k.a. [filter-branch-ish](contrib/filter-repo-demos/filter-branch-ish)),
a reimplementation of filter-branch based on filter-repo which is
* Die-hard fans of `filter-branch` may be interested in
[`filter-lamely`](contrib/filter-repo-demos/filter-lamely)
(a.k.a. [`filter-branch-ish`](contrib/filter-repo-demos/filter-branch-ish)),
a reimplementation of `filter-branch` based on `filter-repo` which is
more performant (though not nearly as fast or safe as
filter-repo).
`filter-repo`).
* a [cheat
* A [cheat
sheet](Documentation/converting-from-filter-branch.md#cheat-sheet-conversion-of-examples-from-the-filter-branch-manpage)
is available showing how to convert example commands from the manual of
filter-branch into filter-repo commands.
`filter-branch` into `filter-repo` commands.
## BFG Repo Cleaner
* great tool for its time, but while it makes some things simple, it
* Great tool for its time, but while it makes some things simple, it
is limited to a few kinds of rewrites.
* its architecture is not amenable to handling more types of
* Its architecture is not amenable to handling more types of
rewrites.
* its architecture presents some shortcomings and bugs even for its
* Its architecture presents some shortcomings and bugs even for its
intended use case.
* fans of bfg may be interested in
[bfg-ish](contrib/filter-repo-demos/bfg-ish), a reimplementation of bfg
based on filter-repo which includes several new features and bugfixes
* Fans of bfg may be interested in
[`bfg-ish`](contrib/filter-repo-demos/bfg-ish), a reimplementation of bfg
based on `filter-repo` which includes several new features and bugfixes
relative to bfg.
* a [cheat
* A [cheat
sheet](Documentation/converting-from-bfg-repo-cleaner.md#cheat-sheet-conversion-of-examples-from-bfg)
is available showing how to convert example commands from the manual of
BFG Repo Cleaner into filter-repo commands.
BFG Repo Cleaner into `filter-repo` commands.
# Simple example, with comparisons
@ -152,21 +162,25 @@ Let's say that we want to extract a piece of a repository, with the intent
of merging just that piece into some other bigger repo. For extraction, we
want to:
* extract the history of a single directory, src/. This means that only
paths under src/ remain in the repo, and any commits that only touched
* Extract the history of a single directory, `src/`. This means that only
paths under `src/` remain in the repo, and any commits that only touched
paths outside this directory will be removed.
* rename all files to have a new leading directory, my-module/ (e.g. so that
src/foo.c becomes my-module/src/foo.c)
* rename any tags in the extracted repository to have a 'my-module-'
* Rename all files to have a new leading directory, `my-module/` (e.g. so that
`src/foo.c` becomes `my-module/src/foo.c`).
* Rename any tags in the extracted repository to have a `my-module-`
prefix (to avoid any conflicts when we later merge this repo into
something else)
something else).
## Solving this with filter-repo
## Solving this with `filter-repo`
Doing this with filter-repo is as simple as the following command:
```shell
git filter-repo --path src/ --to-subdirectory-filter my-module --tag-rename '':'my-module-'
Doing this with `filter-repo` is as simple as the following command:
```bash
git filter-repo --path src/ --to-subdirectory-filter my-module --tag-rename '':'my-module-'
```
(the single quotes are unnecessary, but make it clearer to a human that we
are replacing the empty string as a prefix with `my-module-`)
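
Since `filter-repo` expects to run in a fresh clone (see the design rationale), the command is typically wrapped in a clone-then-filter workflow. The following is a rough sketch under assumptions: the function name `extract_src_module` and its two arguments are hypothetical, and `--no-local` is passed to `git clone` on the assumption that the source may be a local path.

```bash
#!/usr/bin/env bash
# extract_src_module SOURCE_URL WORKDIR
# Hypothetical wrapper: clone, then run the filter-repo command shown above.
extract_src_module() {
    if [ "$#" -ne 2 ]; then
        echo "usage: extract_src_module SOURCE_URL WORKDIR" >&2
        return 2
    fi
    local source_url=$1 workdir=$2

    # Rewrite a fresh clone so the original repository stays untouched.
    git clone --no-local "$source_url" "$workdir" || return 1
    (
        cd "$workdir" || exit 1
        git filter-repo --path src/ --to-subdirectory-filter my-module \
            --tag-rename '':'my-module-' &&
        # Spot-check the result: recent paths and the renamed tags
        git log --oneline --name-only -n 5 &&
        git tag --list 'my-module-*'
    )
}
```

A call might look like `extract_src_module https://example.com/big-repo.git extracted`, where both the URL and the directory name are placeholders.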
@ -175,124 +189,132 @@ are replacing the empty string as a prefix with `my-module-`)
BFG Repo Cleaner is not capable of this kind of rewrite; in fact, all
three types of wanted changes are outside of its capabilities.
## Solving this with filter-branch
## Solving this with `filter-branch`
filter-branch comes with a pile of caveats (more on that below) even
`filter-branch` comes with a pile of caveats (more on that below) even
once you figure out the necessary invocation(s):
```shell
git filter-branch \
--tree-filter 'mkdir -p my-module && \
git ls-files \
| grep -v ^src/ \
| xargs git rm -f -q && \
ls -d * \
| grep -v my-module \
| xargs -I files mv files my-module/' \
--tag-name-filter 'echo "my-module-$(cat)"' \
--prune-empty -- --all
git clone file://$(pwd) newcopy
cd newcopy
git for-each-ref --format="delete %(refname)" refs/tags/ \
| grep -v refs/tags/my-module- \
| git update-ref --stdin
git gc --prune=now
```bash
git filter-branch \
--tree-filter 'mkdir -p my-module && \
git ls-files \
| grep -v ^src/ \
| xargs git rm -f -q && \
ls -d * \
| grep -v my-module \
| xargs -I files mv files my-module/' \
--tag-name-filter 'echo "my-module-$(cat)"' \
--prune-empty -- --all
git clone file://$(pwd) newcopy
cd newcopy
git for-each-ref --format="delete %(refname)" refs/tags/ \
| grep -v refs/tags/my-module- \
| git update-ref --stdin
git gc --prune=now
```
Some might notice that the above filter-branch invocation will be really
slow due to using --tree-filter; you could alternatively use the
--index-filter option of filter-branch, changing the above commands to:
```shell
git filter-branch \
--index-filter 'git ls-files \
| grep -v ^src/ \
| xargs git rm -q --cached;
git ls-files -s \
| sed "s%$(printf \\t)%&my-module/%" \
| git update-index --index-info;
git ls-files \
| grep -v ^my-module/ \
| xargs git rm -q --cached' \
--tag-name-filter 'echo "my-module-$(cat)"' \
--prune-empty -- --all
git clone file://$(pwd) newcopy
cd newcopy
git for-each-ref --format="delete %(refname)" refs/tags/ \
| grep -v refs/tags/my-module- \
| git update-ref --stdin
git gc --prune=now
Some might notice that the above `filter-branch` invocation will be really
slow due to using `--tree-filter`; you could alternatively use the
`--index-filter` option of `filter-branch`, changing the above commands to:
```bash
git filter-branch \
--index-filter 'git ls-files \
| grep -v ^src/ \
| xargs git rm -q --cached;
git ls-files -s \
| sed "s%$(printf \\t)%&my-module/%" \
| git update-index --index-info;
git ls-files \
| grep -v ^my-module/ \
| xargs git rm -q --cached' \
--tag-name-filter 'echo "my-module-$(cat)"' \
--prune-empty -- --all
git clone file://$(pwd) newcopy
cd newcopy
git for-each-ref --format="delete %(refname)" refs/tags/ \
| grep -v refs/tags/my-module- \
| git update-ref --stdin
git gc --prune=now
```
However, for either filter-branch command there are a pile of caveats.
However, for either `filter-branch` command there is a pile of caveats.
First, some may be wondering why I list five commands here for
filter-branch. Despite the use of --all and --tag-name-filter, and
filter-branch's manpage claiming that a clone is enough to get rid of
`filter-branch`. Despite the use of `--all` and `--tag-name-filter`, and
`filter-branch`'s manpage claiming that a clone is enough to get rid of
old objects, the extra steps to delete the other tags and do another
gc are still required to clean out the old objects and avoid mixing
new and old history before pushing somewhere. Other caveats:
* Commit messages are not rewritten; so if some of your commit
messages refer to prior commits by (abbreviated) sha1, after the
rewrite those messages will now refer to commits that are no longer
part of the history. It would be better to rewrite those
(abbreviated) sha1 references to refer to the new commit ids.
* The --prune-empty flag sometimes misses commits that should be
* The `--prune-empty` flag sometimes misses commits that should be
pruned, and it will also prune commits that *started* empty rather
than just ended empty due to filtering. For repositories that
intentionally use empty commits for versioning and publishing
related purposes, this can be detrimental.
* The commands above are OS-specific. GNU vs. BSD issues for sed,
xargs, and other commands often trip up users; I think I failed to
get most folks to use --index-filter since the only example in the
filter-branch manpage that both uses it and shows how to move
* The commands above are OS-specific. GNU vs. BSD issues for `sed`,
`xargs`, and other commands often trip up users; I think I failed to
get most folks to use `--index-filter` since the only example in the
`filter-branch` manpage that both uses it and shows how to move
everything into a subdirectory is linux-specific, and it is not
obvious to the reader that it has a portability issue since it
silently misbehaves rather than failing loudly.
* The --index-filter version of the filter-branch command may be two to
three times faster than the --tree-filter version, but both
filter-branch commands are going to be multiple orders of magnitude
slower than filter-repo.
* The `--index-filter` version of the `filter-branch` command may be two to
three times faster than the `--tree-filter` version, but both
`filter-branch` commands are going to be multiple orders of magnitude
slower than `filter-repo`.
* Both commands assume all filenames are composed entirely of ascii
characters (even special ascii characters such as tabs or double
quotes will wreak havoc and likely result in missing files or
misnamed files)
misnamed files).
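
The filename caveat is easy to reproduce outside of git. The sketch below is illustrative only and uses a throwaway temporary directory: it shows a newline-delimited `xargs` pipeline silently skipping a file whose name contains a space, and a NUL-delimited pipeline handling it. Inside a filter, the analogous NUL-delimited form would be `git ls-files -z | xargs -0 ...`.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)
touch "$workdir/plain.txt" "$workdir/has space.txt"

# Naive pipeline: xargs splits "has space.txt" into "has" and "space.txt".
# rm -f ignores the nonexistent names, so nothing fails loudly.
(cd "$workdir" && ls | xargs rm -f)
remaining_naive=$(ls "$workdir" | wc -l | tr -d ' ')
echo "left after naive pipeline: $remaining_naive"

# NUL-delimited pipeline: each filename is passed as one argument.
(cd "$workdir" && find . -type f -print0 | xargs -0 rm -f)
remaining_safe=$(ls "$workdir" | wc -l | tr -d ' ')
echo "left after NUL-delimited pipeline: $remaining_safe"

rmdir "$workdir"
```

The first pipeline exits successfully even though it left a file behind, which is the kind of silent misbehavior described above.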
## Solving this with fast-export/fast-import
One can kind of hack this together with something like:
```shell
git fast-export --no-data --reencode=yes --mark-tags --fake-missing-tagger \
--signed-tags=strip --tag-of-filtered-object=rewrite --all \
| grep -vP '^M [0-9]+ [0-9a-f]+ (?!src/)' \
| grep -vP '^D (?!src/)' \
| perl -pe 's%^(M [0-9]+ [0-9a-f]+ )(.*)$%\1my-module/\2%' \
| perl -pe 's%^(D )(.*)$%\1my-module/\2%' \
| perl -pe s%refs/tags/%refs/tags/my-module-% \
| git -c core.ignorecase=false fast-import --date-format=raw-permissive \
--force --quiet
git for-each-ref --format="delete %(refname)" refs/tags/ \
| grep -v refs/tags/my-module- \
| git update-ref --stdin
git reset --hard
git reflog expire --expire=now --all
git gc --prune=now
```bash
git fast-export --no-data --reencode=yes --mark-tags --fake-missing-tagger \
--signed-tags=strip --tag-of-filtered-object=rewrite --all \
| grep -vP '^M [0-9]+ [0-9a-f]+ (?!src/)' \
| grep -vP '^D (?!src/)' \
| perl -pe 's%^(M [0-9]+ [0-9a-f]+ )(.*)$%\1my-module/\2%' \
| perl -pe 's%^(D )(.*)$%\1my-module/\2%' \
| perl -pe s%refs/tags/%refs/tags/my-module-% \
| git -c core.ignorecase=false fast-import --date-format=raw-permissive \
--force --quiet
git for-each-ref --format="delete %(refname)" refs/tags/ \
| grep -v refs/tags/my-module- \
| git update-ref --stdin
git reset --hard
git reflog expire --expire=now --all
git gc --prune=now
```
But this comes with some nasty caveats and limitations:
* The various greps and regex replacements operate on the entire
fast-export stream and thus might accidentally corrupt unintended
portions of it, such as commit messages. If you needed to edit
file contents and thus dropped the --no-data flag, it could also
file contents and thus dropped the `--no-data` flag, it could also
end up corrupting file contents.
* This command assumes all filenames in the repository are composed
entirely of ascii characters, and also exclude special characters
such as tabs or double quotes. If such a special filename exists
within the old src/ directory, it will be pruned even though it
within the old `src/` directory, it will be pruned even though it
was intended to be kept. (In slightly different repository
rewrites, this type of editing also risks corrupting filenames
with special characters by adding extra double quotes near the end
of the filename and in some leading directory name.)
* This command will leave behind huge numbers of useless empty
commits, and has no realistic way of pruning them. (And if you
tried to combine this technique with another tool to prune the
@ -300,12 +322,13 @@ But this comes with some nasty caveats and limitations:
commits which were made empty by the filtering that you want to
remove, and commits which were empty before the filtering process
and which you thus may want to keep.)
* Commit messages which reference other commits by hash will now
reference old commits that no longer exist. Attempting to edit
the commit messages to update them is extraordinarily difficult to
add to this kind of direct rewrite.
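
The first caveat can be seen on a tiny example. This sketch uses a hand-written, simplified stand-in for a `git fast-export --no-data` stream (not real output), and an `awk` filter with the same intent as the `grep -vP '^M [0-9]+ [0-9a-f]+ (?!src/)'` step, chosen here only so the example does not depend on PCRE support in `grep`.

```bash
#!/usr/bin/env bash
# Simplified, hand-written stream: the commit message happens to contain
# a line that looks like a file-modification directive.
stream=$(cat <<'EOF'
commit refs/heads/main
data 43
Fix parser

M 100644 abc123 docs/old-notes
M 100644 0123abcd src/foo.c
M 100644 89abcdef docs/guide.md
EOF
)

# Drop every "M <mode> <hash> <path>" line whose path is not under src/.
filtered=$(printf '%s\n' "$stream" |
    awk '!(/^M [0-9]+ [0-9a-f]+ / && !/^M [0-9]+ [0-9a-f]+ src\//)')

printf '%s\n' "$filtered"
```

The filter removes `docs/guide.md` as intended, but it also removes the look-alike line from the commit message, while the `data 43` header still announces the original message length. A stream edited this way no longer describes the message it claims to contain.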
# Design rationale behind filter-repo
# Design rationale behind `filter-repo`
None of the existing repository filtering tools did what I wanted;
they all came up short for my needs. No tool provided any of the
@ -315,7 +338,7 @@ two of the last four traits either:
1. [Starting report] Provide user an analysis of their repo to help
them get started on what to prune or rename, instead of expecting
them to guess or find other tools to figure it out. (Triggered, e.g.
by running the first time with a special flag, such as --analyze.)
by running the first time with a special flag, such as `--analyze`.)
1. [Keep vs. remove] Instead of just providing a way for users to
easily remove selected paths, also provide flags for users to
@ -323,7 +346,7 @@ two of the last four traits either:
specifying to remove all paths other than the ones they want to
keep, but the need to specify all paths that *ever* existed in
**any** version of the repository could sometimes be quite
painful. For filter-branch, using pipelines like `git ls-files |
painful. For `filter-branch`, using pipelines like `git ls-files |
grep -v ... | xargs -r git rm` might be a reasonable workaround
but can get unwieldy and isn't as straightforward for users; plus
those commands are often operating-system specific (can you spot
@ -349,7 +372,7 @@ two of the last four traits either:
mechanism. Strongly encourage that workflow by [detecting and
bailing if we're not in a fresh
clone](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#FRESHCLONE),
unless the user overrides with --force.
unless the user overrides with `--force`.
1. [Auto shrink] Automatically remove old cruft and repack the
repository for the user after filtering (unless overridden); this
@ -357,12 +380,12 @@ two of the last four traits either:
history together, and avoids problems where the multi-step
process for shrinking the repo documented in the manpage doesn't
actually work in some cases. (I'm looking at you,
filter-branch.)
`filter-branch`.)
1. [Clean separation] Avoid confusing users (and prevent accidental
re-pushing of old stuff) due to mixing old repo and rewritten
repo together. (This is particularly a problem with filter-branch
when using the --tag-name-filter option, and sometimes also an
repo together. (This is particularly a problem with `filter-branch`
when using the `--tag-name-filter` option, and sometimes also an
issue when only filtering a subset of branches.)
1. [Versatility] Provide the user the ability to extend the tool or
@ -415,11 +438,11 @@ two of the last four traits either:
cases, if the merge has no file changes of its own, then the merge
commit can also be pruned. However, much as we do with empty
pruning we do not prune merge commits that started degenerate
(which indicates it may have been intentional, such as with --no-ff
(which indicates it may have been intentional, such as with `--no-ff`
merges) but only merge commits that become degenerate and have no
file changes of their own.
1. [Speed] Filtering should be reasonably fast
1. [Speed] Filtering should be reasonably fast.
# How do I contribute?
@ -427,18 +450,18 @@ See the [contributing guidelines](Documentation/Contributing.md).
# Is there a Code of Conduct?
Participants in the filter-repo community are expected to adhere to
Participants in the `filter-repo` community are expected to adhere to
the same standards as for the git project, so the [git Code of
Conduct](https://git.kernel.org/pub/scm/git/git.git/tree/CODE_OF_CONDUCT.md)
applies.
# Upstream Improvements
Work on filter-repo and [its
Work on `filter-repo` and [its
predecessor](https://public-inbox.org/git/51419b2c0904072035u1182b507o836a67ac308d32b9@mail.gmail.com/)
has also driven numerous improvements to fast-export and fast-import
(and occasionally other commands) in core git, based on things
filter-repo needs to do its work:
`filter-repo` needs to do its work:
* git-2.28.0
* [fast-import: add new --date-format=raw-permissive format](
