git-filter-repo

Commit Graph

Author	SHA1	Message	Date
Elijah Newren	d615d71411	filter-repo: simplify FastExportFilter and nuke unused code Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	5ea234153c	filter-repo: add RepoFilter.finish() function for code readability When we only have an output and no input of our own, filter.run() seems weird to call, especially since it'll only be closing a handle and waiting for fast-import to finish. Add a finish() synonym for such a case to make external code callers more legible. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	72b69b3dbe	filter-repo: support --source and --target options This will allow exporting from one repo into a different repo, and combined with chained RepoFilter instances from commit `81016821a1` (filter-repo: allow chaining of RepoFilter instances, 2019-01-07), will even allow things like splicing separate repositories together. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	d0640bad7a	filter-repo: perform sanity checks before setting up output processes We do not want to kill fast-import processes unused; it's better to abort before those processes are created when we know we need to. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	81016821a1	filter-repo: allow chaining of RepoFilter instances Allow each instance to be just input or just output so that we can splice repos together or split one into multiple different repos. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	59f3947857	filter-repo: allow importing into a bare repository If we are using --stdin, it should be okay to import into a bare repo, but the checks were enforcing that we were in a clone with a packfile. Relax the check to work within a bare repo as well. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	6fffed6bb1	filter-repo: handle blob callbacks without excessive empty-pruning checks If we have blob callbacks, we cannot pass --no-data to fast-export. Also, with blob callbacks, any file the callback modifies could match the modification done to the file by a subsequent commit, possibly making the later commit empty. As such, we keep a record of all filenames modified (by blob or commit callbacks), and then check all these filenames for all subsequent commits to see if it causes empty commits. In particular, if files other than these are modified in a non-merge commit, we know that the commit will not become empty so we can bypass the empty-pruning checks. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	dbdb18170b	filter-repo: perf hack -- avoid expensive empty pruning checks If a commit was a non-merge commit previously, then since we do not do any kind of blob modifications (or funny parent grafting), there is no way for a filemodify instruction to introduce the same version of the file that already existed in the parent, as such the only check we need to do to determine whether a commit becomes empty is whether file_changes is empty. Subsequent more expensive checks can be skipped. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	d0037275af	filter-repo: allow RepoFilter.run to be passed callbacks Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	55c2c32d7c	filter-repo: group high-level repo filtering functions into a class Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	4e2110136e	filter-repo: group repo analysis functions into a class Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	9887dd5cbe	filter-repo: move sanity_check to put analyze functions before filtering ones Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	fc90cf8ca9	filter-repo: collect various short functions into a GitUtils helper class Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	2f3a445875	filter-repo: restructure argument parsing for re-use Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	9bb4188e83	filter-repo: perf hack -- do minimal amount of quoting required by fast-import Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	da5895ecc3	filter-repo: restructure empty pruning Split a lot of the logic out into separate functions, and avoid flattening parents when the original commit history itself had redundant parents (such as --no-ff merges). Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	1c3bc2fa1e	filter-repo: track skipped/pruned commits Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	70e6f848ed	filter-repo: modify parse_optional_parent_ref to return original parent too Commits may not have any parents at all. As such, parse_optional_parent_ref() is used expecting that it will sometimes return None. Now, when commits are skipped, we have a scheme to translate anyone that depends on such commits to instead depend on the nearest ancestor of such commits. If the entire ancestry of a commit was skipped along with a commit, then that commit will be translated to None, which is indistinguishable from there having been no parent to begin with. Sometimes our scheme needs to distinguish between a commit that started with no parents and one which ended up with no parents, so we need a way to tell these apart. Also, not knowing the original parent makes it hard for us to determine if the original had the same weird topology that the current commit does. For example, it is possible for a merge commit to have one parent be the ancestor of another (particularly when --no-ff is passed to git merge), or even for a merge commit to have the same commit used as both parents (if you use low-level commands to create a crazy commit). There are cases where the pruning of some commits could cause either of these situations to arise, and it's useful to be able to distinguish between intentionally "weird" history and history that has been made weird due to other pruning, because the latter we may have reason to do additional pruning on. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	ab1b43f480	filter-repo: add a couple minor clarifications Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	69147fe120	filter-repo: fix crazy timezone issues Oh, boy, timezone +051800 exists in the wild. Is that 0518 hours and 00 minutes? Or 05 hours and 1800 minutes? Or 051 hours and 800 minutes? Attempt to do something sane with these broken commits that fast-import barfs on. Also, fix an old bug in the handling of ahead-of-UTC timezones. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	03507e57f5	filter-repo: buffer subprocess stdout to significantly improve performance Apparently, the default for subprocess stdout is unbuffered; switching it to buffered yields a huge 40% speedup. Doing this also exposes the need to add fi_input.flush() calls, highlighting another performance issue. We may be able to have fewer such calls with some refactoring, but that is a bigger separate change. Just having them highlighted to remind about them as a performance issue is good for now. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	9ebd3117ca	filter-repo: notify user when we start writing reports Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	554c7e39af	filter-repo: switch --analyze to use rev-list|diff-tree pipeline As suggested by Peff, use rev-list & diff-tree to get the information we need, instead of relying on fast-export (with some out-of-tree patches) to get that information. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	beff0b958f	filter-repo: be more thorough about path quoting, and handle non-ascii Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	becc29a9bd	filter-repo: show progress parsing blob sizes Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	aa7eebbc88	filter-repo: add ProgressWriter class and switch FastExportFilter to it Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	a2540f4087	filter-repo: add packed sizes to --analyze reports Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	7048be2849	filter-repo: split analysis reports into separate files Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	37c92d9352	filter-repo: handle tags pointing at commits pruned along with their history If a tag points at a commit whose changes are all filtered out and thus becomes empty and gets pruned, and all of its ancestors are likewise pruned, then there is no need for the tag; just nuke it. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	77d5e93135	filter-repo: add some preventative sanity checks Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	9b88f3f094	filter-repo: ensure we parse all merge parents, even if some became pruned Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	424faa3103	filter-repo: add optional newline to make --dry-run output easier to parse Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	c8a96d4684	filter-repo: add --subdirectory-filter and --to-subdirectory-filter Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	e36a62c2c7	filter-repo: add tag renaming Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	f813469ff8	filter-repo: start revamping the --help page Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	4f149daacc	filter-repo: aid debugging with a string representation of several classes Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	6ca3d7c1c7	filter-repo: add --analyze option This option walks through the repository history and creates a report with basic statistics, rename related information, and sizes of objects and when/if those have been deleted. It primarily looks at unpacked sizes (i.e. size of object ignoring delta-ing and compression), and sums the size of each version of the file for each path. Additionally, it aggregates these sums by extension and by directory, and tracks whether paths, extensions, and directories have been deleted. This can be very useful in determining what the big things are, and whether they might have been considered to have been mistakes to add to the repository in the first place. There are numerous caveats with the determination of "deleted" and "renamed", and can give both false positives and false negatives. But they are only meant as a helpful heuristic to give others a starting point for an investigation, and the information provided so far is useful. I do want to improve the equivalence classes (rename handling), but that is for a future commit. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	af3225be67	filter-repo: show progress while parsing fast-export stream Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	8cc889eb89	filter-repo: handle basic path renames Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	2bcf83aa7b	filter-repo: avoid dying on tags; strip/rewrite by default Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	af081d0fce	filter-repo: add automatic rewriting of commit hashes in commit messages Commit messages often refer to past commits; while rewriting commits we would also like to update these commit messages to refer to the new commit names. In the case that a commit message references another commit which was dropped by the filtering process, we have no way to rewrite the commit message to reference a valid commit hash. Instead of dying, note the suboptimal commit in the suboptimal-issues file. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	f95308c5eb	filter-repo: add handling of 'original-oid' directive This will be used later to help with commit message rewriting (so that commits can continue to refer to other commits in their history, using the new rewritten hashes for those commits), and perhaps also in removing blobs by id. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	057947f6ff	filter-repo: prune commits that started empty if they now have no parents If ancient history that pre-dated some subdirectory had a few empty commits, we would rather those all got pruned as well. Empty commits from the original repository should only be retained if they have at least one retained parent. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	e3fde7689c	filter-repo: record suboptimality notes about changing merges to non-merges When the pruning of empty commits causes a culling of parents of a merge commit, so that the merge commit drops to just one parent, the commit likely becomes misleading since the commit is no longer a merge commit but the message probably implies it is. (e.g. "Merge branch maint into master"). There's nothing we can do to automatically fix this, but we can note it as a suboptimal issue in the filtering process. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	9e02ac95e4	filter-repo: record metadata for remapping for refs and commits Our filtering process will rewrite (and drop) commits, causing refs to also get updated. A useful debugging aid for users is to write metadata showing the mapping from old commit IDs to new commit IDs, and from the hash that old refs pointed and the hash that the new ones do. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	04260a3aa4	filter-repo: parse `fast-export --reference-excluded-parents` output Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	d13f7e9178	filter-repo: fix detection of merge becoming empty commit In the previous commit, we detected when an entire line of history back to a common ancestor of the merge became empty commits, and avoided having a commit be merged with itself. This commit looks through the changes specified in the commit, which are always specified relative to the first parent, so that if the first parent side was the empty one we can still detect if the merge commit adds no extra changes relative to its remaining parent. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	70505e00f9	filter-repo: avoid merging a commit with one of its own ancestors Pruning of empty commits can cause an entire line of history to become empty and be pruned, resulting in a merge commit that merges some commit with one of its ancestors. In such a case, we should remove the unnecessary parent(s) -- which can and will often result in the merge commit being empty so we can remove it as well. Currently, if the side that becomes empty is the first parent side, then we do not detect if the commit becomes empty, due to the way that fast-export lists changes in a merge commit relative to first parent only. A subsequent commit will address this. Note that the callbacks could theoretically insert additional commits or reparent our commit on top of something else, meaning that the ancestry graph might need post-callback updates. However, in any extreme case where that mattered, we would more or less need full updates to the ancestry graph to be made for all the new commits from the callback as well, and once we expect the callback to handle any ancestry graph updates it can handle modifying it for the current commit. However, it is hard to come up with a case where it matters, since for the most part we just want to know whether our filtering causes commits to become empty and knowing the source repo we are exporting from is sufficient information without knowing any new commits inserted or reparenting that happens elsewhere. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	fa515c8d10	filter-repo: protect against truncated fast-import input Use the 'feature done' ability to mark when the fast-import stream is finished, so that an aborted run (due to running into some kind of bug while filtering, whether a bug in the code, or an error in the repo or flags specified for the case under consideration) won't cause the repo to be rewritten. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	e5a3a134b1	filter-repo: retain refs that happen to point at commits that become empty It may be that the only time a reference is shown in the fast-export stream is for a commit which will become empty due to the filtering. We do not want such refs to be left out and thus not be updated; we want them to instead be set to the nearest non-empty ancestor. Only if it has no non-empty ancestor would we want it to be stripped out. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	064e2c0ef4	filter-repo: add --quiet option Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	e6731225f8	filter-repo: allow importing into an empty repo Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	0ecfad479e	filter-repo: add parsing of more types of fast-export commands Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	a7531af120	filter-repo: filtering note with empty commits and --dry-run Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	3914c1377b	filter-repo: add --stdin option Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	cf460406d7	filter-repo: code cleanup Slightly re-order the code to make input, output, and filtering sections distinct. Also, avoid running `git fast-import` at all when we're in --dry-run mode. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	3f8ce81aa2	filter-repo: prune parents made redundant by filtering Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	f103735e01	filter-repo: implement --debug Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	9499c78b94	filter-repo: implement --dry-run Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	40f90c9cb8	filter-repo: move git_dir determination into function for future re-use Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	4fed91af18	filter-repo: allow FastExportFilter to take file-like objects Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	636a3cf575	filter-repo: add basic path filtering Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	a427a80322	filter-repo: skeleton of new tool Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	e2b8b68d3a	filter-repo: use subprocess explicitly; make it easier to wrap debug version Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	3457348c63	filter-repo: remove excessive hashes Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	badd03105b	filter-repo: fix really old typo Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	2977725634	filter-repo: enable usage with --no-data Regex is kinda sloppy, someone should slap me and fix that. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	a6b008eaac	filter-repo: don't default to creating bare repositories for importing into Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	7e0b7a9fc2	filter-repo: only require two arguments to record_id_rename We always called record_id_rename with handle_transitivity set to True, and I do not know of a use case that would do otherwise so let's just hardcode that value. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	a8b39e58b3	filter-repo: rename _TimeZone to FixedTimeZone for external usage Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	e7311b6db9	filter-repo: reinstate the id_offset My idea to use --export-marks and --import-marks to avoid the need for the id_offset was not tested and apparently a bad idea. When splicing together multiple repositories, the second will croak if we pass it --import-marks with a file having sha1sums that don't exist in that repository. I'm afraid this might conflict with the --import-marks stuff used in collab so I've only enabled it for streams beyond the first. So there might be an issue using --import-marks on a second or later fast-export output stream, but I can't think of a use case for that... Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	fbd3a04b7f	filter-repo: some wording and line-wrap cleanups Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
James Foucar	f74761e803	filter-repo: lots of documentation additions Signed-off-by: Elijah Newren <enewren@sandia.gov> Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	9a85c6a1ae	filter-repo: fix parsing of filechanges with skipped blobs Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	d2d6d79db0	filter-repo: fix __all__ declaration Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	adc3d52d26	filter-repo: add parsing of progress and checkpoint fast-export objects Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	48aaedfc32	filter-repo: add parsing of (annotated) tag objects Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	a644632a83	filter-repo: provide default args for get_commit_count() Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	86a86fc074	filter-repo: remove the id_offset Filtering input from multiple repositories can still be done; however, to avoid overloading of mark numbers, one should pass --export-marks=<file> to the first git fast-export and pass --import-marks=<file> to the second. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	bfbc07d3a7	filter-repo: encapsulate input line advancement Have all callers of input.readline() be done through _advance_nextline() Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	85b1980d17	filter-repo: avoid using mark ids referred to in an --import-marks file Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	a20bf1957f	filter-repo: cleanups to gathering the commit count Two things: * rename get_total_commits -> get_commit_count * accept rev-list arguments Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	d099d2628b	filter-repo: automatically drop commits whose changes are filtered out Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	4998de6751	filter-repo: handle ahead-of-UTC timezones too Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	307a31fd54	filter-repo: have author_date and committer_date be datetime objects Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	35fdb05c3c	filter-repo: streamline common/simple cases to require fewer calls and args Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	cb29d84f48	filter-repo: fix skipping of blob files Make sure commits don't reference skipped blob files. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	aba66f6d42	filter-repo: duct tape and bailing wire... Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	7437d62329	filter-repo: fix id renaming Splicing repositories and dropping commits require different id renaming. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	358e9826d4	filter-repo: better handling of passing --all to fast-export Make --all be a default argument for fast-export, not a mandatory argument. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	a594ea530a	filter-repo: ensure new files from spliced-in commits aren't dropped at merges git-fast-import requires that file changes listed in a merge commit be relative to the first parent. Thus, if I've added new files on a branch being merged in from the second or later parents, I need to manually modify the list of files in the merge commit as well. In order to do that, as soon as I splice in any commit, I have to record the list of new files for both that commit and every descendant it has. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	dd5665b7ec	filter-repo: handle adding interleaving commits from separate repositories Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	e4a4787393	filter-repo: make sure git-fast-import has really finished when we exit Also, provide an OutputStream class, to make it easy to still direct all output to some file rather than always sending to git fast-import. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	2581e7a0e6	filter-repo: silence verbose fast-import output Turn off fast-import stat output but do not squelch all error messages. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	5faec262d3	filter-repo: make skipping and later dumping easier Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	7371f8e3e4	filter-repo: add counting of objects, as well as commits Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	3d10238a47	filter-repo: make it easier to skip blobs & commits Automatically do renaming of references to commits that were skipped, and automatically remove skipped blobs from the output of commits that reference them. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	2c769de150	filter-repo: work around git-fast-export bug Explicitly specify --topo-order; git-fast-export fails on some topologies unless it traverses in topological order. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	bf5e92d02a	filter-repo: portability fixes Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	471e9d8684	filter-repo: rewrite to not use pyparsing in order to avoid memory madness pyparsing sucks a whole file into memory at a time and then parses, which is really bad in this case since the output from git-fast-export is huge. I entered disk swapping madness pretty easily. So, now I just do my own manual parsing. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	ae486e85b8	filter-repo: small restructurings for the big sierra import * Allow hooking up (and filtering) multiple git fast-export's to one import * Allow user callbacks to force dumping of object in order to reference it with subsequent inserted objects * Put the separate callbacks and global vars in the calling program into a combined class Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	69497ac6e6	filter-repo: add get_total_commits() function, finish transition to a module Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	28cc91054e	filter-repo: fix handling of ids of blobs and commits My prior handlings of marks would only work if there were not additions or removals from the fast-export stream. Further, I referred to these as marks even though I really only accept idnum values, not sha1s or anything else. So, now I refer to these as ids everywhere, and I am much more careful in my handling of ids. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	94f0ccfd80	filter-repo: call everything_callback as necessary, fix commit_callback The commit_callback call was trying to pass a Reset object, which was not defined. Copy-n-paste-n-forget-to-replace isn't good. Now it passes a Commit object. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	9cd296655a	filter-repo: rename functions a bit, make filter object creation explicit Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	207c6d0c16	filter-repo: pipe output to git-fast-import now to create a new repository Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	0d9568684c	filter-repo: match git-fast-export spacing after reset commands Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	003dd21714	filter-repo: add ability to handle deleted files in commits Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	c92a4e471e	filter-repo: fix parsing bug in Reset object creation Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	b029443a6f	filter-repo: fix indexing bug in Commit object creation Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	392d09d084	filter-repo: don't hardcode sys.stdout, I'll eventually want to pipe elsewhere Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	11057e874e	filter-repo: add a FileChanges object, for changes that are part of a commit Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	586d65270b	filter-repo: add parsing of commits Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	f6f4e5fbbf	filter-repo: match fast-import grammar slightly better Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	ff95c771d8	filter-repo: prevent pyparsing from expanding tabs to spaces We are not parsing simple text; we're parsing data and need to be able to print that data unmunged. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	de7aeb64bc	filter-repo: add parsing of branch resets Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	f990dda9ad	filter-repo: allow random blob insertion and creation without specifying marks Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	163e299ed7	filter-repo: handle multiple blobs, require all input to be parsed, nice errors Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	eb4afc4e78	filter-repo: add GitElement and Blob classes, and a FastExport Parser class We still only parse a single blob, but this should put the infrastructure in place for parsing more output from git-fast-export. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
Elijah Newren	2b34e5c25d	filter-repo: initial import This initial version can parse git-fast-export blobs in exact-data format, but not much else yet. Signed-off-by: Elijah Newren <newren@gmail.com>	5 years ago
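
Commit fa515c8d10 above describes using fast-import's 'feature done' ability so that a truncated stream does not rewrite the repository. The idea can be sketched roughly as follows. This is not filter-repo's actual code; the names write_guarded_stream and failing_commands are hypothetical and exist only for illustration. The stream opens with "feature done", and the closing "done" command is written only if every command was produced without error, so a stream cut short by a filtering bug ends without "done" and git fast-import reports an error instead of treating it as complete.

```python
import io


def write_guarded_stream(out, commands):
    """Write fast-import commands bracketed by 'feature done' ... 'done'.

    `out` is any binary file-like object (for example the stdin pipe of a
    `git fast-import` process). `commands` is an iterable of bytes, each
    one a complete fast-import command. Both names are hypothetical.
    """
    # Tell fast-import that the stream must end with an explicit `done`.
    out.write(b"feature done\n")
    for command in commands:
        # If producing a command raises, we never reach the final write,
        # so the stream is left without its `done` terminator.
        out.write(command)
    out.write(b"done\n")


def failing_commands():
    """Simulate a filter that crashes partway through the stream."""
    yield b"progress halfway\n"
    raise RuntimeError("simulated bug while filtering")


if __name__ == "__main__":
    ok = io.BytesIO()
    write_guarded_stream(ok, [b"progress ok\n"])
    print(ok.getvalue())

    broken = io.BytesIO()
    try:
        write_guarded_stream(broken, failing_commands())
    except RuntimeError:
        pass
    # The truncated stream has no trailing `done`.
    print(broken.getvalue())
```

In real use the output object would be the input pipe of a git fast-import process rather than an in-memory buffer; the buffer is used here only so the sketch runs without a repository.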

320 Commits (main)