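Merge git repository in subdirectory

I'd like to merge a remote git repository into my working git repository as a subdirectory of it. I'd like the resulting repository to contain the merged history of the two repositories, and I'd like each file of the merged-in repository to retain its history as it was in the remote repository. I tried the subtree strategy described in "How to use the subtree merge strategy", but after following that procedure, although the resulting repository does contain the merged history of the two repositories, the individual files coming from the remote one have not retained their history (git log on any of them just shows a single message, "Merged branch...").

Also, I don't want to use submodules, because I do not want the two combined git repositories to be separate anymore.

Is it possible to merge a remote git repository into another one as a subdirectory, with the individual files coming from the remote repository retaining their history?

Thanks very much for any help.

EDIT: I'm currently trying a solution that uses git filter-branch to rewrite the history of the merged-in repository. It does seem to work, but I need to test it some more. I'll come back to report my findings.

EDIT 2: To make myself clearer, here are the exact commands I used with git's subtree strategy, which result in the apparent loss of the history of the remote repository's files. Let A be the git repo I'm currently working in and B the git repo I'd like to incorporate into A as a subdirectory of it. I did the following: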

git remote add -f B <url-of-B>
git merge -s ours --no-commit B/master
git read-tree --prefix=subdir/Iwant/to/put/B/in/ -u B/master
git commit -m "Merge B as subdirectory in subdir/Iwant/to/put/B/in."

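After these commands, going into the directory subdir/Iwant/to/put/B/in, I see all the files of B, but git log on any of them shows only the commit message "Merge B as subdirectory in subdir/Iwant/to/put/B/in." Their file history, as it is in B, is lost.

What seems to work (I'm a beginner with git, so I may be wrong) is the following: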

git remote add -f B <url-of-B>
git checkout -b B_branch B/master  # make a local branch following B's master
git filter-branch --index-filter \
'git ls-files -s | sed "s-\t\"*-&subdir/Iwant/to/put/B/in/-" |
GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
git update-index --index-info &&
mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD
git checkout master
git merge B_branch

The filter-branch command above is taken from git help filter-branch; I only changed the subdir path.


Have you tried adding the extra repository as a git submodule? It won't merge the history with the containing repository, in fact, it will be an independent repository.

I mention it, because you haven't.

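If you really want to stitch things together, look up grafting. You could also use git rebase --onto while keeping merge commits (--preserve-merges on older Git; that option was deprecated and later removed in favor of --rebase-merges). There is also an option, --committer-date-is-author-date, that reuses the author date as the committer date.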

After getting the fuller explanation of what is going on, I think I understand it and in any case at the bottom I have a workaround. Specifically, I believe what is happening is rename detection is being fooled by the subtree merge with --prefix. Here is my test case:

mkdir -p z/a z/b
cd z/a
git init
echo A>A
git add A
git commit -m A
echo AA>>A
git commit -a -m AA
cd ../b
git init
echo B>B
git add B
git commit -m B
echo BB>>B
git commit -a -m BB
cd ../a
git remote add -f B ../b
git merge -s ours --no-commit B/master
git read-tree --prefix=bdir -u B/master
git commit -m "subtree merge B into bdir"
cd bdir
echo BBB>>B
git commit -a -m BBB

We make git directories a and b with several commits each. We do a subtree merge, and then we do a final commit in the new subtree.

Running gitk (in z/a) shows that the history does appear, and so does git log. However, looking at a specific file has a problem: git log bdir/B shows only the subtree-merge commit and the commits made after it, not B's earlier history.

Well, there is a trick we can play: we can look at the pre-rename history of a specific file using --follow, as in git log --follow -- B. This is good but not great, since it fails to link the pre-merge history with the post-merge history.

I tried playing with -M and -C, but I wasn't able to get it to follow one specific file.
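To make the symptom concrete, here is a minimal sketch that rebuilds a similar test case in a throwaway directory and compares the number of commits in the whole history with the number reported for bdir/B. The temporary directory, the demo identity, and the use of FETCH_HEAD instead of a named remote are assumptions of this sketch, not part of the test case above; --allow-unrelated-histories assumes Git 2.9 or later.

```shell
set -eu

# Throwaway identity so commits work without any global git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
cd "$work"

# Repository a with one commit, repository b with two commits.
git init -q a
git init -q b
(cd a && echo A >A && git add A && git commit -q -m A)
(cd b && echo B >B && git add B && git commit -q -m B &&
  echo BB >>B && git commit -q -a -m BB)

# Subtree merge of b into a/bdir, following the read-tree recipe.
cd a
git fetch -q ../b HEAD
git merge -q -s ours --no-commit --allow-unrelated-histories FETCH_HEAD
git read-tree --prefix=bdir/ -u FETCH_HEAD
git commit -q -m "subtree merge B into bdir"

# Count commits overall and commits that git attributes to bdir/B.
all_commits=$(git rev-list --count HEAD)
path_commits=$(git rev-list --count HEAD -- bdir/B)
echo "commits in history: $all_commits"
echo "commits reported for bdir/B: $path_commits"
```

The second number should come out smaller than the first: b's commits are reachable, but they touched B at the top level, so a path-limited log for bdir/B does not pick them up.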

So, the solution, I feel, is to tell git about the rename that will be taking place as part of the subtree merge. Unfortunately git-read-tree is pretty fussy about subtree merges so we have to work through a temporary directory, but that can go away before we commit. Afterwards, we can see the full history.

First, create an "A" repository and make some commits:

mkdir -p z/a z/b
cd z/a
git init
echo A>A
git add A
git commit -m A
echo AA>>A
git commit -a -m AA

Second, create a "B" repository and make some commits:

cd ../b
git init
echo B>B
git add B
git commit -m B
echo BB>>B
git commit -a -m BB

And the trick to making this work: force Git to recognize the rename by creating a subdirectory and moving the contents into it.

mkdir bdir
git mv B bdir
git commit -a -m bdir-rename

Return to repository "A" and fetch and merge the contents of "B":

cd ../a
git remote add -f B ../b
git merge -s ours --no-commit B/master
# According to Alex Brown and pjvandehaar, newer versions of git need --allow-unrelated-histories
# git merge -s ours --allow-unrelated-histories --no-commit B/master
git read-tree --prefix= -u B/master
git commit -m "subtree merge B into bdir"

To show that they're now merged:

cd bdir
echo BBB>>B
git commit -a -m BBB

To prove the full history is preserved in a connected chain:

git log --follow B

This gives us the full history, but there is a catch. If you are actually keeping the old "b" repo around and occasionally merging from it (say it is a separately maintained third-party repo), you are in trouble, since that third party will not have done the rename. You would have to merge new changes into your renamed version of b, and I fear that will not go smoothly. But if b is going away, you win.

I found the following solution workable for me. First, in project B, I create a new branch in which all files are moved into the new subdirectory, and push this branch to origin. Next, in project A, I add and fetch B as a remote, check out the moved branch, then go back to master and merge:

# in local copy of project B
git checkout -b prepare_move
mkdir subdir
git mv <files_to_move> subdir/
git commit -m 'move files to subdir'
git push origin prepare_move


# in local copy of project A
git remote add -f B_origin <remote-url>
git checkout -b from_B B_origin/prepare_move
git checkout master
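# Git 2.9 and later refuse to merge unrelated histories by default;
# add --allow-unrelated-histories there.
git merge from_B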

If I go to sub directory subdir, I can use git log --follow and still have the history.

I'm not a git expert, so I cannot comment whether this is a particularly good solution or if it has caveats, but so far it seems all fine.
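One way to check the --follow claim is a small end-to-end run in a scratch directory. This is only a sketch under assumptions: the repository names, file names, commit messages, and demo identity are made up for illustration, the prepared branch is fetched directly from a local path rather than pushed to origin, and --allow-unrelated-histories assumes Git 2.9 or later.

```shell
set -eu

# Throwaway identity so commits work without any global git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
cd "$work"

# Hypothetical projects: A with one commit, B with two commits on b.txt.
git init -q projA
git init -q projB
(cd projA && echo a >a.txt && git add a.txt && git commit -q -m "A: first")
(cd projB && echo b >b.txt && git add b.txt && git commit -q -m "B: first" &&
  echo bb >>b.txt && git commit -q -a -m "B: second")

# In B: move everything into subdir/ on a dedicated branch.
(cd projB && git checkout -q -b prepare_move && mkdir subdir &&
  git mv b.txt subdir/ && git commit -q -m "move files to subdir")

# In A: add B as a remote, fetch it, and merge the prepared branch.
cd projA
git remote add B_origin ../projB
git fetch -q B_origin
git merge -q --no-edit --allow-unrelated-histories B_origin/prepare_move

# Compare the per-file history with and without --follow.
plain=$(git log --format=%s -- subdir/b.txt | wc -l | tr -d ' ')
followed=$(git log --follow --format=%s -- subdir/b.txt | wc -l | tr -d ' ')
echo "without --follow: $plain commit(s), with --follow: $followed commit(s)"
```

With --follow, git traces the file back through the move commit into B's earlier commits; without it, the log for the subdirectory path stops at the move.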

git-subtree is a script designed for exactly this use case of merging multiple repositories into one while preserving history (and/or splitting the history of subtrees, though that seems to be irrelevant to this question). It has been distributed as part of the git tree since release 1.7.11.

To merge a repository <repo> at revision <rev> as subdirectory <prefix>, use git subtree add as follows:

git subtree add -P <prefix> <repo> <rev>

git-subtree implements the subtree merge strategy in a more user friendly manner.

The downside is that in the merged history the files are unprefixed (not in a subdirectory). Say you merge repository a into b. As a result git log a/f1 will show you all the changes (if any) except those in the merged history. You can do:

git log --follow -- f1

but that won't show changes other than those in the merged history.

In other words, if you don't change a's files in repository b, then you need to specify --follow and an unprefixed path. If you change them in both repositories, then you have two commands, neither of which shows all the changes.

More on it here.

I wanted to

  1. keep a linear history without explicit merge, and
  2. make it look like the files of the merged repository had always existed in the subdirectory, and as a side effect make git log -- file work without --follow.

Step 1: Rewrite history in the source repository to make it look like all files always existed below the subdirectory.

Create a temporary branch for the rewritten history.

git checkout -b tmp_subdir

Then use git filter-branch as described in How can I rewrite history so that all files, except the ones I already moved, are in a subdirectory?:

git filter-branch --prune-empty --tree-filter '
if [ ! -e foo/bar ]; then
mkdir -p foo/bar
git ls-tree --name-only $GIT_COMMIT | xargs -I files mv files foo/bar
fi'

Step 2: Switch to the target repository. Add the source repository as remote in the target repository and fetch its contents.

git remote add sourcerepo .../path/to/sourcerepo
git fetch sourcerepo

Step 3: Use rebase --onto to add the commits of the rewritten source repository on top of the target repository.

# On recent Git, --preserve-merges has been removed; use --rebase-merges instead.
git rebase --preserve-merges --onto master --root sourcerepo/tmp_subdir

You can check the log to see that this really got you what you wanted.

git log --stat

Step 4: After the rebase you’re in “detached HEAD” state. You can fast-forward master to the new head.

git checkout -b tmp_merged
git checkout master
git merge tmp_merged
git branch -d tmp_merged

Step 5: Finally some cleanup: Remove the temporary remote.

git remote rm sourcerepo
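To sanity-check the outcome of these steps, the following sketch runs them on two tiny scratch repositories and then verifies that the result has no merge commits and that the file's history is visible without --follow. The repository names, file contents, and demo identity are assumptions for illustration. Because the toy history is linear, the rebase is run without --preserve-merges or --rebase-merges, and FILTER_BRANCH_SQUELCH_WARNING is set only to skip the filter-branch warning on newer Git.

```shell
set -eu

# Throwaway identity so commits work without any global git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)
cd "$work"

# Hypothetical target (one commit) and source (two commits) repositories.
git init -q target
git init -q source
(cd target && echo A >A && git add A && git commit -q -m A)
(cd source && echo B >B && git add B && git commit -q -m B &&
  echo BB >>B && git commit -q -a -m BB)

# Step 1: rewrite the source history so every file lives under foo/bar.
cd source
git checkout -q -b tmp_subdir
git filter-branch --prune-empty --tree-filter '
if [ ! -e foo/bar ]; then
  mkdir -p foo/bar
  git ls-tree --name-only $GIT_COMMIT | xargs -I files mv files foo/bar
fi' HEAD >/dev/null

# Steps 2 and 3: fetch the rewritten branch and replay it onto the target branch.
cd ../target
target_branch=$(git symbolic-ref --short HEAD)
git remote add sourcerepo ../source
git fetch -q sourcerepo
git rebase -q --onto "$target_branch" --root sourcerepo/tmp_subdir

# Checks: linear history, and per-file history without --follow.
merges=$(git rev-list --count --merges HEAD)
file_commits=$(git rev-list --count HEAD -- foo/bar/B)
echo "merge commits: $merges, commits touching foo/bar/B: $file_commits"
```

If the rewrite worked, both source commits show up for foo/bar/B directly, which is the side effect mentioned in goal 2.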

Say you want to merge repository a into b (I'm assuming they're located alongside one another):

cd a
git filter-repo --to-subdirectory-filter a
cd ..
cd b
git remote add a ../a
git fetch a
git merge --allow-unrelated-histories a/master
git remote remove a

For this you need git-filter-repo installed (filter-branch is discouraged).

An example of merging 2 big repositories, putting one of them into a subdirectory: https://gist.github.com/x-yuri/9890ab1079cf4357d6f269d073fd9731

More on it here.

Similar to hfs' answer I wanted to

  • keep a linear history without explicit merge and
  • make it look like the files of the merged repository had always existed in the subdirectory, and as a side effect make git log -- file work without --follow.

However, I chose the more modern filter-repo (assuming the new repo exists and is checked out):

git clone git@host/repo/old.git
cd old
git checkout -b tmp_subdir
git filter-repo --to-subdirectory-filter old


cd ../new
git remote add old ../old
git fetch old
git rebase --rebase-merges --onto main --root old/tmp_subdir --committer-date-is-author-date

You might need to fix conflicts manually, or change the rebase command to include --merge -s recursive -X theirs if you want to try resolving them with the "theirs" version:

git rebase --rebase-merges --onto main --root old/tmp_subdir --committer-date-is-author-date --merge -s recursive -X theirs

You end up on a detached HEAD, so create a new branch and merge it into main. (Note that modern repositories should use a "main" branch rather than "master", for more inclusive language.)

git checkout -b old_merge
git checkout main
git merge old_merge

Cleanup:

git branch -d old_merge
git remote rm old