Split a large Git repository into many smaller ones

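After successfully converting an SVN repository to Git, I now have a very large Git repository that I want to break down into multiple smaller repositories while preserving history.

So, can someone help with breaking up a repo that might look like this: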

MyHugeRepo/
.git/
DIR_A/
DIR_B/
DIR_1/
DIR_2/

into two repositories that look like this:

MyABRepo/
.git
DIR_A/
DIR_B/


My12Repo/
.git
DIR_1/
DIR_2/

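I've tried following the directions in a previous question (Detach (move) subdirectory into separate Git repository), but they don't really fit when trying to put multiple directories into a separate repo.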


This will set up MyABRepo; you can do My12Repo similarly, of course.

git clone MyHugeRepo/ MyABRepo.tmp/
cd MyABRepo.tmp
git filter-branch --prune-empty --index-filter 'git rm --cached --ignore-unmatch DIR_1/* DIR_2/*' HEAD

A reference to .git/refs/original/refs/heads/master remains. You can clean that up by cloning the rewritten repository:

cd ..
git clone MyABRepo.tmp MyABRepo

If all went well you can then remove MyABRepo.tmp.
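Before removing MyABRepo.tmp, it can be worth checking that the rewrite really dropped the unwanted directories from every commit. A minimal sketch, assuming a git new enough to support -C; the helper name history_is_clean is my own invention, not a git command, and it only wraps plain git log:

```shell
# history_is_clean REPO PATH...
# Exit status 0 if no commit reachable from any ref in REPO touches the
# given paths, 1 if some commit still does, 2 if git log itself failed.
# (Hypothetical helper name; nothing here is built into git.)
history_is_clean() {
    repo=$1
    shift
    # --all walks every ref; the pathspecs after -- limit the output to
    # commits that touched those paths.
    leftover=$(git -C "$repo" log --all --oneline -- "$@") || return 2
    test -z "$leftover"
}

# Usage, after the final clone:
#   history_is_clean MyABRepo DIR_1 DIR_2 && echo "DIR_1 and DIR_2 are gone"
```

If the check fails, running the same git log command by hand shows which commits still reference the directories.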


If for some reason you get an error regarding .git-rewrite, you can try this:

git clone MyHugeRepo/ MyABRepo.tmp/
cd MyABRepo.tmp
git filter-branch -d /tmp/git-rewrite.tmp --prune-empty --index-filter 'git rm --cached --ignore-unmatch DIR_1/* DIR_2/*' HEAD
cd ..
git clone MyABRepo.tmp MyABRepo

This will create and use /tmp/git-rewrite.tmp as a temporary directory, instead of .git-rewrite. Naturally, you can substitute any path you wish instead of /tmp/git-rewrite.tmp, so long as you have write permission, and the directory does not already exist.

You could use git filter-branch --index-filter with git rm --cached to delete the unwanted directories from clones/copies of your original repository.

For example:

trim_repo() { : trim_repo src dst dir-to-trim-out...
  : uses printf %q: needs bash, zsh, or maybe ksh
  git clone "$1" "$2" &&
  (
    cd "$2" &&
    shift 2 &&

    : mirror original branches &&
    git checkout HEAD~0 2>/dev/null &&
    d=$(printf ' %q' "$@") &&
    git for-each-ref --shell --format='
      o=%(refname:short) b=${o#origin/} &&
      if test -n "$b" && test "$b" != HEAD; then
        git branch --force --no-track "$b" "$o"
      fi
    ' refs/remotes/origin/ | sh -e &&
    git checkout - &&
    git remote rm origin &&

    : do the filtering &&
    git filter-branch \
      --index-filter 'git rm --ignore-unmatch --cached -r -- '"$d" \
      --tag-name-filter cat \
      --prune-empty \
      -- --all
  )
}
trim_repo MyHugeRepo MyABRepo DIR_1 DIR_2
trim_repo MyHugeRepo My12Repo DIR_A DIR_B

You will need to manually delete each repository’s unneeded branches or tags (e.g. if you had a feature-x-for-AB branch, then you probably want to delete that from the “12” repository).
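The manual cleanup can be sketched roughly like this. The helper name delete_refs is hypothetical, and feature-x-for-AB is just the example branch from the paragraph above; git branch -D refuses to delete the branch that is currently checked out, so switch to another branch first if needed:

```shell
# delete_refs REPO NAME...
# Deletes each NAME from REPO, whether it is a local branch or a tag.
# Names that match neither are reported on stderr and skipped.
# (Hypothetical helper name; it only wraps show-ref, branch -D and tag -d.)
delete_refs() {
    repo=$1
    shift
    for name in "$@"; do
        if git -C "$repo" show-ref --verify --quiet "refs/heads/$name"; then
            git -C "$repo" branch -D "$name"
        elif git -C "$repo" show-ref --verify --quiet "refs/tags/$name"; then
            git -C "$repo" tag -d "$name"
        else
            echo "no branch or tag named $name" >&2
        fi
    done
}

# Usage:
#   delete_refs My12Repo feature-x-for-AB
```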

Thanks for your answers but I ended up just copying the repository twice then deleting the files I didn't want from each. I am going to use the filter-branch at a later date to strip out all the commits for the deleted files since they are already version controlled elsewhere.

cp -R MyHugeRepo MyABRepo
cp -R MyHugeRepo My12Repo


cd MyABRepo/
rm -Rf DIR_1/ DIR_2/
git add -A
git commit -a

This worked for what I needed.

EDIT: Of course, the same thing was done in My12Repo against the A and B directories. This gave me two repos with identical history up to the point where I deleted the unwanted directories.

Here is a Ruby script that will do it: https://gist.github.com/4341033

The git_split project is a simple script that does exactly what you are looking for. https://github.com/vangorra/git_split

Turn git directories into their very own repositories in their own location. No subtree funny business. This script will take an existing directory in your git repository and turn that directory into an independent repository of its own. Along the way, it will copy over the entire change history for the directory you provided.

./git_split.sh <src_repo> <src_branch> <relative_dir_path> <dest_repo>
src_repo  - The source repo to pull from.
src_branch - The branch of the source repo to pull from. (usually master)
relative_dir_path   - Relative path of the directory in the source repo to split.
dest_repo - The repo to push to.

Although at the time of the question utunbu's answer was the best you could get, these days even Git itself recommends git-filter-repo: https://github.com/newren/git-filter-repo

It is orders of magnitude faster and comparatively very easy to use.

For example, here you would do:

git clone MyHugeRepo/ MyABRepo.tmp/
cd MyABRepo.tmp
git filter-repo --path DIR_A/ --path DIR_B/

You can see more examples at https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#EXAMPLES
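The same idea can be wrapped up for both halves of the split. This is only a sketch: split_paths is a name I made up, and it assumes git-filter-repo is installed separately and visible on PATH. As far as I know, filter-repo refuses to run on anything that does not look like a fresh clone, and a plain clone of a local path can trip that check, which is why the sketch passes --no-local to git clone:

```shell
# split_paths SRC DST PATH...
# Clones SRC into DST and keeps only the history of the given paths.
# (Hypothetical helper name; git-filter-repo is a separate install.)
split_paths() {
    src=$1 dst=$2
    shift 2
    # This only checks PATH; adjust if your install lives elsewhere.
    if ! command -v git-filter-repo >/dev/null 2>&1; then
        echo "git-filter-repo not found on PATH" >&2
        return 1
    fi
    # --no-local transfers objects as a real clone would instead of
    # hardlinking them, so the result looks like a fresh clone.
    git clone --no-local "$src" "$dst" || return 1
    # Turn "A B" into "--path A --path B".
    for p in "$@"; do
        set -- "$@" --path "$p"
        shift
    done
    git -C "$dst" filter-repo "$@"
}

# Usage:
#   split_paths MyHugeRepo MyABRepo DIR_A/ DIR_B/
#   split_paths MyHugeRepo My12Repo DIR_1/ DIR_2/
```

Unlike the git rm based filters earlier in the thread, --path lists what to keep, so each repository names its own directories rather than the ones to throw away.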