|
|
|
@@ -1,17 +1,17 @@
|
|
|
|
|
git filter-repo is a versatile tool for rewriting history, which includes
|
|
|
|
|
`git filter-repo` is a versatile tool for rewriting history, which includes
|
|
|
|
|
[capabilities I have not found anywhere
|
|
|
|
|
else](#design-rationale-behind-filter-repo). It roughly falls into the
|
|
|
|
|
same space of tool as [git
|
|
|
|
|
filter-branch](https://git-scm.com/docs/git-filter-branch) but without the
|
|
|
|
|
same space of tool as [`git
|
|
|
|
|
filter-branch`](https://git-scm.com/docs/git-filter-branch) but without the
|
|
|
|
|
capitulation-inducing poor
|
|
|
|
|
[performance](https://public-inbox.org/git/CABPp-BGOz8nks0+Tdw5GyGqxeYR-3FF6FT5JcgVqZDYVRQ6qog@mail.gmail.com/),
|
|
|
|
|
with far more capabilities, and with a design that scales usability-wise
|
|
|
|
|
beyond trivial rewriting cases. [git filter-repo is now recommended by the
|
|
|
|
|
beyond trivial rewriting cases. [`git filter-repo` is now recommended by the
|
|
|
|
|
git project](https://git-scm.com/docs/git-filter-branch#_warning) instead
|
|
|
|
|
of git filter-branch.
|
|
|
|
|
of `git filter-branch`.
|
|
|
|
|
|
|
|
|
|
While most users will probably just use filter-repo as a simple command
|
|
|
|
|
line tool (and likely only use a few of its flags), at its core filter-repo
|
|
|
|
|
While most users will probably just use `filter-repo` as a simple command
|
|
|
|
|
line tool (and likely only use a few of its flags), at its core `filter-repo`
|
|
|
|
|
contains a library for creating history rewriting tools. As such, users
|
|
|
|
|
with specialized needs can leverage it to quickly create [entirely new
|
|
|
|
|
history rewriting tools](contrib/filter-repo-demos).
|
|
|
|
@@ -21,130 +21,140 @@ history rewriting tools](contrib/filter-repo-demos).
|
|
|
|
|
* [Prerequisites](#prerequisites)
|
|
|
|
|
* [How do I install it?](#how-do-i-install-it)
|
|
|
|
|
* [How do I use it?](#how-do-i-use-it)
|
|
|
|
|
* [Why filter-repo instead of other alternatives?](#why-filter-repo-instead-of-other-alternatives)
|
|
|
|
|
* [filter-branch](#filter-branch)
|
|
|
|
|
* [Why `filter-repo` instead of other alternatives?](#why-filter-repo-instead-of-other-alternatives)
|
|
|
|
|
* [`filter-branch`](#filter-branch)
|
|
|
|
|
* [BFG Repo Cleaner](#bfg-repo-cleaner)
|
|
|
|
|
* [Simple example, with comparisons](#simple-example-with-comparisons)
|
|
|
|
|
* [Solving this with filter-repo](#solving-this-with-filter-repo)
|
|
|
|
|
* [Solving this with `filter-repo`](#solving-this-with-filter-repo)
|
|
|
|
|
* [Solving this with BFG Repo Cleaner](#solving-this-with-bfg-repo-cleaner)
|
|
|
|
|
* [Solving this with filter-branch](#solving-this-with-filter-branch)
|
|
|
|
|
* [Solving this with `filter-branch`](#solving-this-with-filter-branch)
|
|
|
|
|
* [Solving this with fast-export/fast-import](#solving-this-with-fast-exportfast-import)
|
|
|
|
|
* [Design rationale behind filter-repo](#design-rationale-behind-filter-repo)
|
|
|
|
|
* [Design rationale behind `filter-repo`](#design-rationale-behind-filter-repo)
|
|
|
|
|
* [How do I contribute?](#how-do-i-contribute)
|
|
|
|
|
* [Is there a Code of Conduct?](#is-there-a-code-of-conduct)
|
|
|
|
|
* [Upstream Improvements](#upstream-improvements)
|
|
|
|
|
|
|
|
|
|
# Prerequisites
|
|
|
|
|
|
|
|
|
|
filter-repo requires:
|
|
|
|
|
`filter-repo` requires:
|
|
|
|
|
|
|
|
|
|
* git >= 2.22.0 at a minimum; [some features](#upstream-improvements)
|
|
|
|
|
* `git` >= 2.22.0 at a minimum; [some features](#upstream-improvements)
|
|
|
|
|
require git >= 2.24.0
|
|
|
|
|
* python3 >= 3.5
|
|
|
|
|
* `python3` >= 3.5
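The version requirements above can be checked before installing. The following is a minimal sketch, not part of `filter-repo`: `version_ge` is a hypothetical helper, and the parsing assumes that `git --version` prints the version as its third word and `python3 --version` as its second.

```shell
# version_ge A B: succeed when dotted version A is >= B (first three fields).
# Hypothetical helper for illustration; not shipped with filter-repo.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, "."); split(b, y, ".")
    for (i = 1; i <= 3; i++) {
      # "+ 0" forces a numeric compare on the leading digits of each field;
      # missing fields count as 0
      if (x[i] + 0 > y[i] + 0) exit 0
      if (x[i] + 0 < y[i] + 0) exit 1
    }
    exit 0
  }'
}

# Only report on tools that are actually installed.
if command -v git >/dev/null 2>&1; then
  git_ver=$(git --version | awk '{print $3}')
  if version_ge "$git_ver" 2.22.0; then
    echo "git $git_ver: ok"
  else
    echo "git $git_ver: too old, need >= 2.22.0"
  fi
fi

if command -v python3 >/dev/null 2>&1; then
  py_ver=$(python3 --version 2>&1 | awk '{print $2}')
  if version_ge "$py_ver" 3.5; then
    echo "python3 $py_ver: ok"
  else
    echo "python3 $py_ver: too old, need >= 3.5"
  fi
fi
```

The comparison is done numerically per field, so `3.10` correctly counts as newer than `3.5`, which a plain string comparison would get wrong.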
|
|
|
|
|
|
|
|
|
|
# How do I install it?
|
|
|
|
|
|
|
|
|
|
`git-filter-repo` is a single-file python script, which was done to make
|
|
|
|
|
installation for basic use on many systems trivial: just place that
|
|
|
|
|
file into your $PATH.
|
|
|
|
|
file into your `$PATH`.
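As a rough illustration of "place that file into your `$PATH`", the sketch below checks whether a candidate directory is one of the `PATH` entries. The helper name `dir_on_path` and the directory `$HOME/.local/bin` are assumptions chosen for the example; any directory already on your `PATH` would do.

```shell
# dir_on_path DIR: succeed when DIR is one of the colon-separated PATH entries.
# Hypothetical helper for illustration only.
dir_on_path() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

target="$HOME/.local/bin"   # assumed install location; adjust to taste
if dir_on_path "$target"; then
  echo "$target is on PATH: copy git-filter-repo there and make it executable"
else
  echo "$target is not on PATH: pick another directory or extend PATH"
fi
```

Wrapping both `PATH` and the candidate in colons avoids false matches on partial directory names.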
|
|
|
|
|
|
|
|
|
|
See [INSTALL.md](INSTALL.md) for things beyond basic usage or special
|
|
|
|
|
cases. The more involved instructions are only needed if one of the
|
|
|
|
|
following apply:
|
|
|
|
|
|
|
|
|
|
* you do not find the above comment about trivial installation intuitively
|
|
|
|
|
obvious
|
|
|
|
|
* you are working with a python3 executable named something other than
|
|
|
|
|
"python3"
|
|
|
|
|
* you want to install documentation (beyond the builtin docs shown with -h)
|
|
|
|
|
* you want to run some of the [contrib](contrib/filter-repo-demos/) examples
|
|
|
|
|
* you want to create your own python filtering scripts using filter-repo as
|
|
|
|
|
* You do not find the above comment about trivial installation intuitively
|
|
|
|
|
obvious.
|
|
|
|
|
|
|
|
|
|
* You are working with a `python3` executable named something other than
|
|
|
|
|
`python3`.
|
|
|
|
|
|
|
|
|
|
* You want to install documentation (beyond the builtin docs shown with `-h`).
|
|
|
|
|
|
|
|
|
|
* You want to run some of the [contrib](contrib/filter-repo-demos/) examples.
|
|
|
|
|
|
|
|
|
|
* You want to create your own python filtering scripts using `filter-repo` as
|
|
|
|
|
a module/library
|
|
|
|
|
|
|
|
|
|
# How do I use it?
|
|
|
|
|
|
|
|
|
|
For comprehensive documentation:
|
|
|
|
|
* see the [user manual](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html)
|
|
|
|
|
* alternative formating of the user manual is available on various
|
|
|
|
|
|
|
|
|
|
* See the [user manual](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html).
|
|
|
|
|
|
|
|
|
|
* Alternative formatting of the user manual is available on various
|
|
|
|
|
external sites
|
|
|
|
|
([example](https://www.mankier.com/1/git-filter-repo)), for those
|
|
|
|
|
that don't like the htmlpreview.github.io layout, though it may
|
|
|
|
|
only be up-to-date as of the latest release
|
|
|
|
|
only be up-to-date as of the latest release.
|
|
|
|
|
|
|
|
|
|
If you prefer learning from examples:
|
|
|
|
|
* there is a [cheat sheet for converting filter-branch
|
|
|
|
|
|
|
|
|
|
* There is a [cheat sheet for converting `filter-branch`
|
|
|
|
|
commands](Documentation/converting-from-filter-branch.md#cheat-sheet-conversion-of-examples-from-the-filter-branch-manpage),
|
|
|
|
|
which covers every example from the filter-branch manual
|
|
|
|
|
* there is a [cheat sheet for converting BFG Repo Cleaner
|
|
|
|
|
which covers every example from the `filter-branch` manual.
|
|
|
|
|
|
|
|
|
|
* There is a [cheat sheet for converting BFG Repo Cleaner
|
|
|
|
|
commands](Documentation/converting-from-bfg-repo-cleaner.md#cheat-sheet-conversion-of-examples-from-bfg),
|
|
|
|
|
which covers every example from the BFG website
|
|
|
|
|
* the [simple example](#simple-example-with-comparisons) below may
|
|
|
|
|
be of interest
|
|
|
|
|
* the user manual has an extensive [examples
|
|
|
|
|
section](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#EXAMPLES)
|
|
|
|
|
which covers every example from the BFG website.
|
|
|
|
|
|
|
|
|
|
* The [simple example](#simple-example-with-comparisons) below may
|
|
|
|
|
be of interest.
|
|
|
|
|
|
|
|
|
|
# Why filter-repo instead of other alternatives?
|
|
|
|
|
* The user manual has an extensive [examples
|
|
|
|
|
section](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#EXAMPLES).
|
|
|
|
|
|
|
|
|
|
# Why `filter-repo` instead of other alternatives?
|
|
|
|
|
|
|
|
|
|
This was covered in more detail in a [Git Rev News article on
|
|
|
|
|
filter-repo](https://git.github.io/rev_news/2019/08/21/edition-54/#an-introduction-to-git-filter-repo--written-by-elijah-newren),
|
|
|
|
|
`filter-repo`](https://git.github.io/rev_news/2019/08/21/edition-54/#an-introduction-to-git-filter-repo--written-by-elijah-newren),
|
|
|
|
|
but some highlights for the main competitors:
|
|
|
|
|
|
|
|
|
|
## filter-branch
|
|
|
|
|
## `filter-branch`
|
|
|
|
|
|
|
|
|
|
* filter-branch is [extremely to unusably
|
|
|
|
|
* `filter-branch` is [extremely to unusably
|
|
|
|
|
slow](https://public-inbox.org/git/CABPp-BGOz8nks0+Tdw5GyGqxeYR-3FF6FT5JcgVqZDYVRQ6qog@mail.gmail.com/)
|
|
|
|
|
([multiple orders of magnitude slower than it should
|
|
|
|
|
be](https://git-scm.com/docs/git-filter-branch#PERFORMANCE))
|
|
|
|
|
for non-trivial repositories.
|
|
|
|
|
|
|
|
|
|
* [filter-branch is riddled with
|
|
|
|
|
* [`filter-branch` is riddled with
|
|
|
|
|
gotchas](https://git-scm.com/docs/git-filter-branch#SAFETY) that can
|
|
|
|
|
silently corrupt your rewrite or at least thwart your "cleanup"
|
|
|
|
|
efforts by giving you something more problematic and messy than what
|
|
|
|
|
you started with.
|
|
|
|
|
|
|
|
|
|
* filter-branch is [very onerous](#simple-example-with-comparisons)
|
|
|
|
|
* `filter-branch` is [very onerous](#simple-example-with-comparisons)
|
|
|
|
|
[to
|
|
|
|
|
use](https://github.com/newren/git-filter-repo/blob/a6a6a1b0f62d365bbe2e76f823e1621857ec4dbd/contrib/filter-repo-demos/filter-lamely#L9-L61)
|
|
|
|
|
for any rewrite which is even slightly non-trivial.
|
|
|
|
|
|
|
|
|
|
* the git project has stated that the above issues with filter-branch
|
|
|
|
|
* The git project has stated that the above issues with `filter-branch`
|
|
|
|
|
cannot be backward compatibly fixed; they recommend that you [stop
|
|
|
|
|
using
|
|
|
|
|
filter-branch](https://git-scm.com/docs/git-filter-branch#_warning)
|
|
|
|
|
`filter-branch`](https://git-scm.com/docs/git-filter-branch#_warning).
|
|
|
|
|
|
|
|
|
|
* die-hard fans of filter-branch may be interested in
|
|
|
|
|
[filter-lamely](contrib/filter-repo-demos/filter-lamely)
|
|
|
|
|
(a.k.a. [filter-branch-ish](contrib/filter-repo-demos/filter-branch-ish)),
|
|
|
|
|
a reimplementation of filter-branch based on filter-repo which is
|
|
|
|
|
* Die-hard fans of `filter-branch` may be interested in
|
|
|
|
|
[`filter-lamely`](contrib/filter-repo-demos/filter-lamely)
|
|
|
|
|
(a.k.a. [`filter-branch-ish`](contrib/filter-repo-demos/filter-branch-ish)),
|
|
|
|
|
a reimplementation of `filter-branch` based on `filter-repo` which is
|
|
|
|
|
more performant (though not nearly as fast or safe as
|
|
|
|
|
filter-repo).
|
|
|
|
|
`filter-repo`).
|
|
|
|
|
|
|
|
|
|
* a [cheat
|
|
|
|
|
* A [cheat
|
|
|
|
|
sheet](Documentation/converting-from-filter-branch.md#cheat-sheet-conversion-of-examples-from-the-filter-branch-manpage)
|
|
|
|
|
is available showing how to convert example commands from the manual of
|
|
|
|
|
filter-branch into filter-repo commands.
|
|
|
|
|
`filter-branch` into `filter-repo` commands.
|
|
|
|
|
|
|
|
|
|
## BFG Repo Cleaner
|
|
|
|
|
|
|
|
|
|
* great tool for its time, but while it makes some things simple, it
|
|
|
|
|
* Great tool for its time, but while it makes some things simple, it
|
|
|
|
|
is limited to a few kinds of rewrites.
|
|
|
|
|
|
|
|
|
|
* its architecture is not amenable to handling more types of
|
|
|
|
|
* Its architecture is not amenable to handling more types of
|
|
|
|
|
rewrites.
|
|
|
|
|
|
|
|
|
|
* its architecture presents some shortcomings and bugs even for its
|
|
|
|
|
* Its architecture presents some shortcomings and bugs even for its
|
|
|
|
|
intended use case.
|
|
|
|
|
|
|
|
|
|
* fans of bfg may be interested in
|
|
|
|
|
[bfg-ish](contrib/filter-repo-demos/bfg-ish), a reimplementation of bfg
|
|
|
|
|
based on filter-repo which includes several new features and bugfixes
|
|
|
|
|
* Fans of bfg may be interested in
|
|
|
|
|
[`bfg-ish`](contrib/filter-repo-demos/bfg-ish), a reimplementation of bfg
|
|
|
|
|
based on `filter-repo` which includes several new features and bugfixes
|
|
|
|
|
relative to bfg.
|
|
|
|
|
|
|
|
|
|
* a [cheat
|
|
|
|
|
* A [cheat
|
|
|
|
|
sheet](Documentation/converting-from-bfg-repo-cleaner.md#cheat-sheet-conversion-of-examples-from-bfg)
|
|
|
|
|
is available showing how to convert example commands from the manual of
|
|
|
|
|
BFG Repo Cleaner into filter-repo commands.
|
|
|
|
|
BFG Repo Cleaner into `filter-repo` commands.
|
|
|
|
|
|
|
|
|
|
# Simple example, with comparisons
|
|
|
|
|
|
|
|
|
@@ -152,21 +162,25 @@ Let's say that we want to extract a piece of a repository, with the intent
|
|
|
|
|
on merging just that piece into some other bigger repo. For extraction, we
|
|
|
|
|
want to:
|
|
|
|
|
|
|
|
|
|
* extract the history of a single directory, src/. This means that only
|
|
|
|
|
paths under src/ remain in the repo, and any commits that only touched
|
|
|
|
|
* Extract the history of a single directory, `src/`. This means that only
|
|
|
|
|
paths under `src/` remain in the repo, and any commits that only touched
|
|
|
|
|
paths outside this directory will be removed.
|
|
|
|
|
* rename all files to have a new leading directory, my-module/ (e.g. so that
|
|
|
|
|
src/foo.c becomes my-module/src/foo.c)
|
|
|
|
|
* rename any tags in the extracted repository to have a 'my-module-'
|
|
|
|
|
|
|
|
|
|
* Rename all files to have a new leading directory, `my-module/` (e.g. so that
|
|
|
|
|
`src/foo.c` becomes `my-module/src/foo.c`).
|
|
|
|
|
|
|
|
|
|
* Rename any tags in the extracted repository to have a `my-module-`
|
|
|
|
|
prefix (to avoid any conflicts when we later merge this repo into
|
|
|
|
|
something else)
|
|
|
|
|
something else).
|
|
|
|
|
|
|
|
|
|
## Solving this with filter-repo
|
|
|
|
|
## Solving this with `filter-repo`
|
|
|
|
|
|
|
|
|
|
Doing this with filter-repo is as simple as the following command:
|
|
|
|
|
```shell
|
|
|
|
|
git filter-repo --path src/ --to-subdirectory-filter my-module --tag-rename '':'my-module-'
|
|
|
|
|
Doing this with `filter-repo` is as simple as the following command:
|
|
|
|
|
|
|
|
|
|
```bash
|
|
|
|
|
git filter-repo --path src/ --to-subdirectory-filter my-module --tag-rename '':'my-module-'
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
(the single quotes are unnecessary, but make it clearer to a human that we
|
|
|
|
|
are replacing the empty string as a prefix with `my-module-`)
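The point about the quotes can be checked without touching a repository. This small sketch only exercises the shell's quote removal; it does not run `git filter-repo`, and the variable names are illustrative.

```shell
# The shell strips the quotes before any program sees the argument,
# so both spellings yield the same string.
quoted=$(printf '%s' '':'my-module-')
bare=$(printf '%s' :my-module-)

if [ "$quoted" = "$bare" ]; then
  echo "same argument: $quoted"
fi
```

Either way the tool receives a single argument made of an empty old prefix, a colon, and the new prefix `my-module-`.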
|
|
|
|
|
|
|
|
|
@@ -175,124 +189,132 @@ are replacing the empty string as a prefix with `my-module-`)
|
|
|
|
|
BFG Repo Cleaner is not capable of this kind of rewrite; in fact, all
|
|
|
|
|
three types of wanted changes are outside of its capabilities.
|
|
|
|
|
|
|
|
|
|
## Solving this with filter-branch
|
|
|
|
|
## Solving this with `filter-branch`
|
|
|
|
|
|
|
|
|
|
filter-branch comes with a pile of caveats (more on that below) even
|
|
|
|
|
`filter-branch` comes with a pile of caveats (more on that below) even
|
|
|
|
|
once you figure out the necessary invocation(s):
|
|
|
|
|
|
|
|
|
|
```shell
|
|
|
|
|
git filter-branch \
|
|
|
|
|
--tree-filter 'mkdir -p my-module && \
|
|
|
|
|
git ls-files \
|
|
|
|
|
| grep -v ^src/ \
|
|
|
|
|
| xargs git rm -f -q && \
|
|
|
|
|
ls -d * \
|
|
|
|
|
| grep -v my-module \
|
|
|
|
|
| xargs -I files mv files my-module/' \
|
|
|
|
|
--tag-name-filter 'echo "my-module-$(cat)"' \
|
|
|
|
|
--prune-empty -- --all
|
|
|
|
|
git clone file://$(pwd) newcopy
|
|
|
|
|
cd newcopy
|
|
|
|
|
git for-each-ref --format="delete %(refname)" refs/tags/ \
|
|
|
|
|
| grep -v refs/tags/my-module- \
|
|
|
|
|
| git update-ref --stdin
|
|
|
|
|
git gc --prune=now
|
|
|
|
|
```bash
|
|
|
|
|
git filter-branch \
|
|
|
|
|
--tree-filter 'mkdir -p my-module && \
|
|
|
|
|
git ls-files \
|
|
|
|
|
| grep -v ^src/ \
|
|
|
|
|
| xargs git rm -f -q && \
|
|
|
|
|
ls -d * \
|
|
|
|
|
| grep -v my-module \
|
|
|
|
|
| xargs -I files mv files my-module/' \
|
|
|
|
|
--tag-name-filter 'echo "my-module-$(cat)"' \
|
|
|
|
|
--prune-empty -- --all
|
|
|
|
|
git clone file://$(pwd) newcopy
|
|
|
|
|
cd newcopy
|
|
|
|
|
git for-each-ref --format="delete %(refname)" refs/tags/ \
|
|
|
|
|
| grep -v refs/tags/my-module- \
|
|
|
|
|
| git update-ref --stdin
|
|
|
|
|
git gc --prune=now
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
Some might notice that the above filter-branch invocation will be really
|
|
|
|
|
slow due to using --tree-filter; you could alternatively use the
|
|
|
|
|
--index-filter option of filter-branch, changing the above commands to:
|
|
|
|
|
|
|
|
|
|
```shell
|
|
|
|
|
git filter-branch \
|
|
|
|
|
--index-filter 'git ls-files \
|
|
|
|
|
| grep -v ^src/ \
|
|
|
|
|
| xargs git rm -q --cached;
|
|
|
|
|
git ls-files -s \
|
|
|
|
|
| sed "s%$(printf \\t)%&my-module/%" \
|
|
|
|
|
| git update-index --index-info;
|
|
|
|
|
git ls-files \
|
|
|
|
|
| grep -v ^my-module/ \
|
|
|
|
|
| xargs git rm -q --cached' \
|
|
|
|
|
--tag-name-filter 'echo "my-module-$(cat)"' \
|
|
|
|
|
--prune-empty -- --all
|
|
|
|
|
git clone file://$(pwd) newcopy
|
|
|
|
|
cd newcopy
|
|
|
|
|
git for-each-ref --format="delete %(refname)" refs/tags/ \
|
|
|
|
|
| grep -v refs/tags/my-module- \
|
|
|
|
|
| git update-ref --stdin
|
|
|
|
|
git gc --prune=now
|
|
|
|
|
Some might notice that the above `filter-branch` invocation will be really
|
|
|
|
|
slow due to using `--tree-filter`; you could alternatively use the
|
|
|
|
|
`--index-filter` option of `filter-branch`, changing the above commands to:
|
|
|
|
|
|
|
|
|
|
```bash
|
|
|
|
|
git filter-branch \
|
|
|
|
|
--index-filter 'git ls-files \
|
|
|
|
|
| grep -v ^src/ \
|
|
|
|
|
| xargs git rm -q --cached;
|
|
|
|
|
git ls-files -s \
|
|
|
|
|
| sed "s%$(printf \\t)%&my-module/%" \
|
|
|
|
|
| git update-index --index-info;
|
|
|
|
|
git ls-files \
|
|
|
|
|
| grep -v ^my-module/ \
|
|
|
|
|
| xargs git rm -q --cached' \
|
|
|
|
|
--tag-name-filter 'echo "my-module-$(cat)"' \
|
|
|
|
|
--prune-empty -- --all
|
|
|
|
|
git clone file://$(pwd) newcopy
|
|
|
|
|
cd newcopy
|
|
|
|
|
git for-each-ref --format="delete %(refname)" refs/tags/ \
|
|
|
|
|
| grep -v refs/tags/my-module- \
|
|
|
|
|
| git update-ref --stdin
|
|
|
|
|
git gc --prune=now
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
However, for either filter-branch command there are a pile of caveats.
|
|
|
|
|
However, for either `filter-branch` command there are a pile of caveats.
|
|
|
|
|
First, some may be wondering why I list five commands here for
|
|
|
|
|
filter-branch. Despite the use of --all and --tag-name-filter, and
|
|
|
|
|
filter-branch's manpage claiming that a clone is enough to get rid of
|
|
|
|
|
`filter-branch`. Despite the use of `--all` and `--tag-name-filter`, and
|
|
|
|
|
`filter-branch`'s manpage claiming that a clone is enough to get rid of
|
|
|
|
|
old objects, the extra steps to delete the other tags and do another
|
|
|
|
|
gc are still required to clean out the old objects and avoid mixing
|
|
|
|
|
new and old history before pushing somewhere. Other caveats:
|
|
|
|
|
|
|
|
|
|
* Commit messages are not rewritten; so if some of your commit
|
|
|
|
|
messages refer to prior commits by (abbreviated) sha1, after the
|
|
|
|
|
rewrite those messages will now refer to commits that are no longer
|
|
|
|
|
part of the history. It would be better to rewrite those
|
|
|
|
|
(abbreviated) sha1 references to refer to the new commit ids.
|
|
|
|
|
* The --prune-empty flag sometimes misses commits that should be
|
|
|
|
|
|
|
|
|
|
* The `--prune-empty` flag sometimes misses commits that should be
|
|
|
|
|
pruned, and it will also prune commits that *started* empty rather
|
|
|
|
|
than just ended empty due to filtering. For repositories that
|
|
|
|
|
intentionally use empty commits for versioning and publishing
|
|
|
|
|
related purposes, this can be detrimental.
|
|
|
|
|
* The commands above are OS-specific. GNU vs. BSD issues for sed,
|
|
|
|
|
xargs, and other commands often trip up users; I think I failed to
|
|
|
|
|
get most folks to use --index-filter since the only example in the
|
|
|
|
|
filter-branch manpage that both uses it and shows how to move
|
|
|
|
|
|
|
|
|
|
* The commands above are OS-specific. GNU vs. BSD issues for `sed`,
|
|
|
|
|
`xargs`, and other commands often trip up users; I think I failed to
|
|
|
|
|
get most folks to use `--index-filter` since the only example in the
|
|
|
|
|
`filter-branch` manpage that both uses it and shows how to move
|
|
|
|
|
everything into a subdirectory is linux-specific, and it is not
|
|
|
|
|
obvious to the reader that it has a portability issue since it
|
|
|
|
|
silently misbehaves rather than failing loudly.
|
|
|
|
|
* The --index-filter version of the filter-branch command may be two to
|
|
|
|
|
three times faster than the --tree-filter version, but both
|
|
|
|
|
filter-branch commands are going to be multiple orders of magnitude
|
|
|
|
|
slower than filter-repo.
|
|
|
|
|
|
|
|
|
|
* The `--index-filter` version of the `filter-branch` command may be two to
|
|
|
|
|
three times faster than the `--tree-filter` version, but both
|
|
|
|
|
`filter-branch` commands are going to be multiple orders of magnitude
|
|
|
|
|
slower than `filter-repo`.
|
|
|
|
|
|
|
|
|
|
* Both commands assume all filenames are composed entirely of ascii
|
|
|
|
|
characters (even special ascii characters such as tabs or double
|
|
|
|
|
quotes will wreak havoc and likely result in missing files or
|
|
|
|
|
misnamed files)
|
|
|
|
|
misnamed files).
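To make the last caveat concrete, the sketch below applies the `sed` step from the `--index-filter` variant to two hand-written lines shaped like `git ls-files -s` output. The object id and both paths are placeholders, and the second line assumes git's default behavior of printing paths with unusual characters in double quotes with C-style escapes.

```shell
tab=$(printf '\t')

# Same substitution as in the --index-filter example: insert my-module/
# right after the first tab, i.e. in front of the path field.
add_prefix() {
  sed "s%${tab}%&my-module/%"
}

# Simulated index entries: "<mode> <object> <stage><TAB><path>".
plain="100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0${tab}src/foo.c"
quoted="100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0${tab}\"src/odd\\tname.c\""

plain_out=$(printf '%s\n' "$plain" | add_prefix)
quoted_out=$(printf '%s\n' "$quoted" | add_prefix)

# The plain path becomes my-module/src/foo.c as intended; the quoted one
# becomes my-module/"src/odd\tname.c", with the quote stranded mid-path.
printf '%s\n' "$plain_out" "$quoted_out"
```

For the quoted entry the prefix lands in front of the opening quote, so the result no longer looks like a quoted path at all, which is one way such a file can end up misnamed.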
|
|
|
|
|
|
|
|
|
|
## Solving this with fast-export/fast-import
|
|
|
|
|
|
|
|
|
|
One can kind of hack this together with something like:
|
|
|
|
|
|
|
|
|
|
```shell
|
|
|
|
|
git fast-export --no-data --reencode=yes --mark-tags --fake-missing-tagger \
|
|
|
|
|
--signed-tags=strip --tag-of-filtered-object=rewrite --all \
|
|
|
|
|
| grep -vP '^M [0-9]+ [0-9a-f]+ (?!src/)' \
|
|
|
|
|
| grep -vP '^D (?!src/)' \
|
|
|
|
|
| perl -pe 's%^(M [0-9]+ [0-9a-f]+ )(.*)$%\1my-module/\2%' \
|
|
|
|
|
| perl -pe 's%^(D )(.*)$%\1my-module/\2%' \
|
|
|
|
|
| perl -pe s%refs/tags/%refs/tags/my-module-% \
|
|
|
|
|
| git -c core.ignorecase=false fast-import --date-format=raw-permissive \
|
|
|
|
|
--force --quiet
|
|
|
|
|
git for-each-ref --format="delete %(refname)" refs/tags/ \
|
|
|
|
|
| grep -v refs/tags/my-module- \
|
|
|
|
|
| git update-ref --stdin
|
|
|
|
|
git reset --hard
|
|
|
|
|
git reflog expire --expire=now --all
|
|
|
|
|
git gc --prune=now
|
|
|
|
|
```bash
|
|
|
|
|
git fast-export --no-data --reencode=yes --mark-tags --fake-missing-tagger \
|
|
|
|
|
--signed-tags=strip --tag-of-filtered-object=rewrite --all \
|
|
|
|
|
| grep -vP '^M [0-9]+ [0-9a-f]+ (?!src/)' \
|
|
|
|
|
| grep -vP '^D (?!src/)' \
|
|
|
|
|
| perl -pe 's%^(M [0-9]+ [0-9a-f]+ )(.*)$%\1my-module/\2%' \
|
|
|
|
|
| perl -pe 's%^(D )(.*)$%\1my-module/\2%' \
|
|
|
|
|
| perl -pe s%refs/tags/%refs/tags/my-module-% \
|
|
|
|
|
| git -c core.ignorecase=false fast-import --date-format=raw-permissive \
|
|
|
|
|
--force --quiet
|
|
|
|
|
git for-each-ref --format="delete %(refname)" refs/tags/ \
|
|
|
|
|
| grep -v refs/tags/my-module- \
|
|
|
|
|
| git update-ref --stdin
|
|
|
|
|
git reset --hard
|
|
|
|
|
git reflog expire --expire=now --all
|
|
|
|
|
git gc --prune=now
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
But this comes with some nasty caveats and limitations:
|
|
|
|
|
|
|
|
|
|
* The various greps and regex replacements operate on the entire
|
|
|
|
|
fast-export stream and thus might accidentally corrupt unintended
|
|
|
|
|
portions of it, such as commit messages. If you needed to edit
|
|
|
|
|
file contents and thus dropped the --no-data flag, it could also
|
|
|
|
|
file contents and thus dropped the `--no-data` flag, it could also
|
|
|
|
|
end up corrupting file contents.
|
|
|
|
|
|
|
|
|
|
* This command assumes all filenames in the repository are composed
|
|
|
|
|
entirely of ascii characters, and also exclude special characters
|
|
|
|
|
such as tabs or double quotes. If such a special filename exists
|
|
|
|
|
within the old src/ directory, it will be pruned even though it
|
|
|
|
|
within the old `src/` directory, it will be pruned even though it
|
|
|
|
|
was intended to be kept. (In slightly different repository
|
|
|
|
|
rewrites, this type of editing also risks corrupting filenames
|
|
|
|
|
with special characters by adding extra double quotes near the end
|
|
|
|
|
of the filename and in some leading directory name.)
|
|
|
|
|
|
|
|
|
|
* This command will leave behind huge numbers of useless empty
|
|
|
|
|
commits, and has no realistic way of pruning them. (And if you
|
|
|
|
|
tried to combine this technique with another tool to prune the
|
|
|
|
@@ -300,12 +322,13 @@ But this comes with some nasty caveats and limitations:
|
|
|
|
|
commits which were made empty by the filtering that you want to
|
|
|
|
|
remove, and commits which were empty before the filtering process
|
|
|
|
|
and which you thus may want to keep.)
|
|
|
|
|
|
|
|
|
|
* Commit messages which reference other commits by hash will now
|
|
|
|
|
reference old commits that no longer exist. Attempting to edit
|
|
|
|
|
the commit messages to update them is extraordinarily difficult to
|
|
|
|
|
add to this kind of direct rewrite.
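The first caveat in this list can be seen on a tiny example. The fragment below is hand-written and heavily abbreviated to resemble a fast-export stream (it is not real output), and `sed` is used as a stand-in for the `perl -pe` tag-renaming step so the sketch runs with basic tools only.

```shell
# Stand-in for: perl -pe s%refs/tags/%refs/tags/my-module-%
rename_tags() {
  sed 's%refs/tags/%refs/tags/my-module-%'
}

# Abbreviated, hand-written stream fragment: a tag ref, then a commit whose
# message happens to mention that ref.
rewritten=$(printf '%s\n' \
  'reset refs/tags/v1.0' \
  'from :2' \
  'commit refs/heads/main' \
  'data 35' \
  'See refs/tags/v1.0 for the release' \
  | rename_tags)

# Both the ref line and the commit message line get rewritten, while the
# "data 35" byte count that precedes the message is left unchanged.
printf '%s\n' "$rewritten"
```

The substitution cannot tell a ref name from message text, so the commit message is edited as well, and the byte count announced for it no longer matches its new length.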
|
|
|
|
|
|
|
|
|
|
# Design rationale behind filter-repo
|
|
|
|
|
# Design rationale behind `filter-repo`
|
|
|
|
|
|
|
|
|
|
None of the existing repository filtering tools did what I wanted;
|
|
|
|
|
they all came up short for my needs. No tool provided any of the
|
|
|
|
@@ -315,7 +338,7 @@ two of the last four traits either:
|
|
|
|
|
1. [Starting report] Provide user an analysis of their repo to help
|
|
|
|
|
them get started on what to prune or rename, instead of expecting
|
|
|
|
|
them to guess or find other tools to figure it out. (Triggered, e.g.
|
|
|
|
|
by running the first time with a special flag, such as --analyze.)
|
|
|
|
|
by running the first time with a special flag, such as `--analyze`.)
|
|
|
|
|
|
|
|
|
|
1. [Keep vs. remove] Instead of just providing a way for users to
|
|
|
|
|
easily remove selected paths, also provide flags for users to
|
|
|
|
@@ -323,7 +346,7 @@ two of the last four traits either:
|
|
|
|
|
specifying to remove all paths other than the ones they want to
|
|
|
|
|
keep, but the need to specify all paths that *ever* existed in
|
|
|
|
|
**any** version of the repository could sometimes be quite
|
|
|
|
|
painful. For filter-branch, using pipelines like `git ls-files |
|
|
|
|
|
painful. For `filter-branch`, using pipelines like `git ls-files |
|
|
|
|
|
grep -v ... | xargs -r git rm` might be a reasonable workaround
|
|
|
|
|
but can get unwieldy and isn't as straightforward for users; plus
|
|
|
|
|
those commands are often operating-system specific (can you spot
|
|
|
|
@@ -349,7 +372,7 @@ two of the last four traits either:
|
|
|
|
|
mechanism. Strongly encourage that workflow by [detecting and
|
|
|
|
|
bailing if we're not in a fresh
|
|
|
|
|
clone](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#FRESHCLONE),
|
|
|
|
|
unless the user overrides with --force.
|
|
|
|
|
unless the user overrides with `--force`.
|
|
|
|
|
|
|
|
|
|
1. [Auto shrink] Automatically remove old cruft and repack the
|
|
|
|
|
repository for the user after filtering (unless overridden); this
|
|
|
|
@@ -357,12 +380,12 @@ two of the last four traits either:
|
|
|
|
|
history together, and avoids problems where the multi-step
|
|
|
|
|
process for shrinking the repo documented in the manpage doesn't
|
|
|
|
|
actually work in some cases. (I'm looking at you,
|
|
|
|
|
filter-branch.)
|
|
|
|
|
`filter-branch`.)
|
|
|
|
|
|
|
|
|
|
1. [Clean separation] Avoid confusing users (and prevent accidental
|
|
|
|
|
re-pushing of old stuff) due to mixing old repo and rewritten
|
|
|
|
|
repo together. (This is particularly a problem with filter-branch
|
|
|
|
|
when using the --tag-name-filter option, and sometimes also an
|
|
|
|
|
repo together. (This is particularly a problem with `filter-branch`
|
|
|
|
|
when using the `--tag-name-filter` option, and sometimes also an
|
|
|
|
|
issue when only filtering a subset of branches.)
|
|
|
|
|
|
|
|
|
|
1. [Versatility] Provide the user the ability to extend the tool or
|
|
|
|
@@ -415,11 +438,11 @@ two of the last four traits either:
|
|
|
|
|
cases, if the merge has no file changes of its own, then the merge
|
|
|
|
|
commit can also be pruned. However, much as we do with empty
|
|
|
|
|
pruning we do not prune merge commits that started degenerate
|
|
|
|
|
(which indicates it may have been intentional, such as with --no-ff
|
|
|
|
|
(which indicates it may have been intentional, such as with `--no-ff`
|
|
|
|
|
merges) but only merge commits that become degenerate and have no
|
|
|
|
|
file changes of their own.
|
|
|
|
|
|
|
|
|
|
1. [Speed] Filtering should be reasonably fast
|
|
|
|
|
1. [Speed] Filtering should be reasonably fast.
|
|
|
|
|
|
|
|
|
|
# How do I contribute?
|
|
|
|
|
|
|
|
|
@@ -427,18 +450,18 @@ See the [contributing guidelines](Documentation/Contributing.md).
|
|
|
|
|
|
|
|
|
|
# Is there a Code of Conduct?
|
|
|
|
|
|
|
|
|
|
Participants in the filter-repo community are expected to adhere to
|
|
|
|
|
Participants in the `filter-repo` community are expected to adhere to
|
|
|
|
|
the same standards as for the git project, so the [git Code of
|
|
|
|
|
Conduct](https://git.kernel.org/pub/scm/git/git.git/tree/CODE_OF_CONDUCT.md)
|
|
|
|
|
applies.
|
|
|
|
|
|
|
|
|
|
# Upstream Improvements
|
|
|
|
|
|
|
|
|
|
Work on filter-repo and [its
|
|
|
|
|
Work on `filter-repo` and [its
|
|
|
|
|
predecessor](https://public-inbox.org/git/51419b2c0904072035u1182b507o836a67ac308d32b9@mail.gmail.com/)
|
|
|
|
|
has also driven numerous improvements to fast-export and fast-import
|
|
|
|
|
(and occasionally other commands) in core git, based on things
|
|
|
|
|
filter-repo needs to do its work:
|
|
|
|
|
`filter-repo` needs to do its work:
|
|
|
|
|
|
|
|
|
|
* git-2.28.0
|
|
|
|
|
* [fast-import: add new --date-format=raw-permissive format](
|
|
|
|
|