How does "git merge" work in detail?

I want to know the exact algorithm (or something close to it) behind "git merge". Answers to at least these sub-questions would be helpful:

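  • How does git detect the context of a particular non-conflicting change?
  • How does git find out that there is a conflict in these exact lines?
  • Which things does git auto-merge?
  • How does git perform when there is no common base for merging branches?
  • How does git perform when there are multiple common bases for merging branches?
  • What happens when I merge multiple branches at once?
  • What is the difference between merge strategies?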

But a description of the whole algorithm would be much better.


How does git detect the context of a particular non-conflicting change?
How does git find out that there is a conflict in these exact lines?

If the same lines have changed on both sides of the merge, it's a conflict; if they haven't, the change from one side (if there is one) is accepted.

Which things does git auto-merge?

Changes that don't conflict (see above).

How does git perform when there are multiple common bases for merging branches?

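By the definition of git merge-base, there is usually only one (the latest common ancestor); in criss-cross histories, though, there can be several equally good ones, which git merge-base --all lists.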

What happens when I merge multiple branches at once?

That depends on the merge strategy (only the octopus and ours strategies support merging more than two branches).

What is a difference between merge strategies?

This is explained in the git merge man page.

You might be best off looking for a description of a 3-way merge algorithm. A high-level description would go something like this:

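  1. Find a suitable merge base B: a version of the file that is an ancestor of both of the new versions (X and Y), usually the most recent such base (although there are cases where it will have to go back further, which is one of the features of git's default recursive merge).
  2. Perform diffs of X with B and of Y with B.
  3. Walk through the change blocks identified in the two diffs. If both sides introduce the same change in the same spot, accept either one; if one side introduces a change and the other leaves that region alone, put the change in the final result; if both introduce changes in a spot but they don't match, mark a conflict to be resolved manually.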
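The three steps above can be sketched in Python for line-based text. This is a simplified diff3-style merge under my own assumptions, not git's code: Python's difflib.SequenceMatcher stands in for a real diff (git uses its own xdiff library), and merge3 and matches are hypothetical names. A base line that both diffs left untouched acts as an anchor; whatever sits between two anchors is a change block resolved by the rules in step 3.

```python
import difflib


def matches(base, other):
    """Map each base line index to the index of the same, unchanged line in other."""
    sm = difflib.SequenceMatcher(None, base, other, autojunk=False)
    return {i + t: j + t for i, j, n in sm.get_matching_blocks() for t in range(n)}


def merge3(base, ours, theirs):
    """Return (merged_lines, conflict_count) for three lists of lines."""
    ma, mb = matches(base, ours), matches(base, theirs)  # step 2: diff both sides with B
    out, conflicts = [], 0
    i = ja = jb = 0
    while True:
        # Next base line kept unchanged by both sides: a stable anchor (or None at the end).
        k = next((n for n in range(i, len(base)) if n in ma and n in mb), None)
        end = (len(base), len(ours), len(theirs)) if k is None else (k, ma[k], mb[k])
        o, a, b = base[i:end[0]], ours[ja:end[1]], theirs[jb:end[2]]
        # Step 3: resolve the change block between the previous anchor and this one.
        if a == b:
            out.extend(a)   # same change on both sides (or no change at all)
        elif a == o:
            out.extend(b)   # only theirs touched this region
        elif b == o:
            out.extend(a)   # only ours touched this region
        else:
            conflicts += 1  # both changed it differently: leave it to the user
            out += ["<<<<<<< ours", *a, "=======", *b, ">>>>>>> theirs"]
        if k is None:
            return out, conflicts
        out.append(base[k])  # copy the anchor line itself
        i, ja, jb = k + 1, ma[k] + 1, mb[k] + 1


base = ["a", "b", "c"]
print(merge3(base, ["a", "B", "c"], ["a", "b", "c", "d"]))  # (['a', 'B', 'c', 'd'], 0)
print(merge3(base, ["a", "X", "c"], ["a", "Y", "c"]))       # X vs Y becomes one conflict
```

Real implementations differ in how they pick the diff and how they present or shrink conflict blocks, so treat this only as a model of the three steps.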

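The full algorithm deals with this in a lot more detail, and even has some documentation (https://github.com/git/git/blob/master/Documentation/technical/trivial-merge.txt for one, along with the git help XXX pages, where XXX is one of merge-base, merge-file, merge, merge-one-file and possibly a few others). If that's not deep enough, there's always the source code...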

I'm interested too. I don't know the answer, but...

A complex system that works is invariably found to have evolved from a simple system that worked.

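I think git's merging is highly sophisticated and will be very difficult to understand, but one way to approach it is through its precursors, focusing on the heart of your concern. That is, given two files that don't have a common ancestor, how does git merge work out how to merge them, and where the conflicts are?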

Let's try to find some precursors. From git help merge-file:

git merge-file is designed to be a minimal clone of RCS merge; that is,
it implements all of RCS merge's functionality which is needed by
git(1).

From Wikipedia: http://en.wikipedia.org/wiki/Git_%28software%29 -> http://en.wikipedia.org/wiki/Three-way_merge#Three-way_merge -> http://en.wikipedia.org/wiki/Diff3 -> http://www.cis.upenn.edu/~bcpierce/papers/diff3-short.pdf

That last link is a PDF of a paper describing the diff3 algorithm in detail. Here's a Google PDF-viewer version. It's only 12 pages long, and the algorithm itself takes only a couple of pages, but it's a full-on mathematical treatment. That might seem a bit too formal, but if you want to understand git's merge, you'll need to understand the simpler version first. I haven't checked yet, but with a name like diff3, you'll probably also need to understand diff (which uses a longest common subsequence algorithm). However, there may be a more intuitive explanation of diff3 out there, if you have a google...
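To get a feel for the "longest common subsequence" part, here is a textbook LCS-based line diff in Python. It is only an illustration: git's own diff (xdiff) uses Myers' algorithm by default rather than this O(n·m) table, and lcs_table and line_diff are hypothetical names. Lines in the LCS are reported as unchanged; everything else is a deletion ("-") or an insertion ("+").

```python
def lcs_table(a, b):
    """t[i][j] = length of the longest common subsequence of a[i:] and b[j:]."""
    t = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                t[i][j] = t[i + 1][j + 1] + 1
            else:
                t[i][j] = max(t[i + 1][j], t[i][j + 1])
    return t


def line_diff(a, b):
    """Turn the LCS of two line lists into '  ', '- ', '+ ' diff lines."""
    t = lcs_table(a, b)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:                  # line belongs to the LCS: unchanged
            out.append("  " + a[i])
            i += 1
            j += 1
        elif t[i + 1][j] >= t[i][j + 1]:  # dropping a[i] keeps the LCS longest
            out.append("- " + a[i])
            i += 1
        else:                             # otherwise b[j] must be an insertion
            out.append("+ " + b[j])
            j += 1
    out += ["- " + x for x in a[i:]] + ["+ " + x for x in b[j:]]
    return out


print(line_diff(["a", "c"], ["a", "b", "c"]))  # ['  a', '+ b', '  c']
```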


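Now, I just did an experiment comparing diff3 and git merge-file. They take the same three input files, version1 oldversion version2, and mark conflicts the same way, with <<<<<<< version1, ======= and >>>>>>> version2 (diff3 also has ||||||| oldversion), which shows their common heritage.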

I used an empty file for oldversion, and near-identical files for version1 and version2, with just one extra line added to version2.

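Result: git merge-file identified the single changed line as the conflict, whereas diff3 treated the whole of both files as a conflict. So, sophisticated as diff3 is, git's merge is even more so, even for this simplest of cases.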

Here are the actual results (I used @twalberg's answer as the text).

$ git merge-file -p fun1.txt fun0.txt fun2.txt

You might be best off looking for a description of a 3-way merge algorithm. A
high-level description would go something like this:


Find a suitable merge base B - a version of the file that is an ancestor of
both of the new versions (X and Y), and usually the most recent such base
(although there are cases where it will have to go back further, which is one
of the features of gits default recursive merge) Perform diffs of X with B and
Y with B.  Walk through the change blocks identified in the two diffs. If both
sides introduce the same change in the same spot, accept either one; if one
introduces a change and the other leaves that region alone, introduce the
change in the final; if both introduce changes in a spot, but they don't match,
mark a conflict to be resolved manually.
<<<<<<< fun1.txt
=======
THIS IS A BIT DIFFERENT
>>>>>>> fun2.txt


The full algorithm deals with this in a lot more detail, and even has some
documentation (/usr/share/doc/git-doc/technical/trivial-merge.txt for one,
along with the git help XXX pages, where XXX is one of merge-base, merge-file,
merge, merge-one-file and possibly a few others). If that's not deep enough,
there's always source code...

$ diff3 -m fun1.txt fun0.txt fun2.txt

<<<<<<< fun1.txt
You might be best off looking for a description of a 3-way merge algorithm. A
high-level description would go something like this:


Find a suitable merge base B - a version of the file that is an ancestor of
both of the new versions (X and Y), and usually the most recent such base
(although there are cases where it will have to go back further, which is one
of the features of gits default recursive merge) Perform diffs of X with B and
Y with B.  Walk through the change blocks identified in the two diffs. If both
sides introduce the same change in the same spot, accept either one; if one
introduces a change and the other leaves that region alone, introduce the
change in the final; if both introduce changes in a spot, but they don't match,
mark a conflict to be resolved manually.


The full algorithm deals with this in a lot more detail, and even has some
documentation (/usr/share/doc/git-doc/technical/trivial-merge.txt for one,
along with the git help XXX pages, where XXX is one of merge-base, merge-file,
merge, merge-one-file and possibly a few others). If that's not deep enough,
there's always source code...
||||||| fun0.txt
=======
You might be best off looking for a description of a 3-way merge algorithm. A
high-level description would go something like this:


Find a suitable merge base B - a version of the file that is an ancestor of
both of the new versions (X and Y), and usually the most recent such base
(although there are cases where it will have to go back further, which is one
of the features of gits default recursive merge) Perform diffs of X with B and
Y with B.  Walk through the change blocks identified in the two diffs. If both
sides introduce the same change in the same spot, accept either one; if one
introduces a change and the other leaves that region alone, introduce the
change in the final; if both introduce changes in a spot, but they don't match,
mark a conflict to be resolved manually.
THIS IS A BIT DIFFERENT


The full algorithm deals with this in a lot more detail, and even has some
documentation (/usr/share/doc/git-doc/technical/trivial-merge.txt for one,
along with the git help XXX pages, where XXX is one of merge-base, merge-file,
merge, merge-one-file and possibly a few others). If that's not deep enough,
there's always source code...
>>>>>>> fun2.txt
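Why does git merge-file report only the one new line while diff3 reports everything? With an empty base, a plain diff3 has no anchor line at all, so the whole of both files ends up in a single conflict block. Git's merge code (in its xdiff library) can then shrink a conflict by moving lines that are identical on both sides out of it; I believe this is what its "zealous" merge level does, but treat that as an assumption. The sketch below shows the idea with a hypothetical trim_conflict function that peels off a common prefix and suffix; the same trimming would also produce the a / b / c result in the unrelated-histories experiment further down.

```python
def trim_conflict(ours, theirs):
    """Split a conflict block into (common prefix, ours-only, theirs-only, common suffix)."""
    shortest = min(len(ours), len(theirs))
    p = 0
    while p < shortest and ours[p] == theirs[p]:
        p += 1
    s = 0
    # Never let the suffix eat into lines already claimed by the prefix.
    while s < shortest - p and ours[-1 - s] == theirs[-1 - s]:
        s += 1
    return ours[:p], ours[p:len(ours) - s], theirs[p:len(theirs) - s], ours[len(ours) - s:]


# Shape of the experiment above: empty base, theirs = ours plus one extra line.
ours = ["...resolved manually.", "", "", "The full algorithm..."]
theirs = ["...resolved manually.", "THIS IS A BIT DIFFERENT", "", "", "The full algorithm..."]
prefix, mine, other, suffix = trim_conflict(ours, theirs)
print(mine, other)  # [] ['THIS IS A BIT DIFFERENT']
```

Only the lines that really differ stay between the markers; the prefix and suffix are emitted as ordinary text around them.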

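If you're really interested in this, it's a bit of a rabbit hole. To me, it seems as deep as regular expressions, the longest common subsequence algorithm behind diff, context-free grammars, or relational algebra. If you want to get to the bottom of it, I think you can, but it will take some determined study.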

Here's the original implementation:

http://git.kaarsemaker.net/git/blob/857f26d2f41e16170e48076758d974820af685ff/git-merge-recursive.py

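Basically, you create a list of the common ancestors of the two commits and then merge them recursively, either fast-forwarding them or creating virtual commits that are used as the base of a three-way merge over the files.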

How does git perform when there are multiple common bases for merging branches?

This article was very helpful: http://codicesoftware.blogspot.com/2011/09/merge-recursive-strategy.html (here is part 2).

The recursive strategy uses diff3 recursively to generate a virtual branch, which is then used as the ancestor.

For example:

(A)----(B)----(C)-----(F)
        |      |       |
        |      |   +---+
        |      |   |
        |      +-------+
        |          |   |
        |      +---+   |
        |      |       |
        +-----(D)-----(E)

Then:

git checkout E
git merge F

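There are 2 best common ancestors (common ancestors that are not ancestors of any other common ancestor): C and D. Git merges them into a new virtual branch V, and then uses V as the base.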
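The notion of a "best common ancestor" is easy to compute on a graph this small. The parent lists below are my reading of the diagram above (C and D both fork from B, E merges C into the D line, F merges D into the C line), an assumption about the picture rather than data from a real repository; ancestors and merge_bases are hypothetical helpers. In a real repository you would ask git merge-base --all E F instead.

```python
# Child -> parents, as read from the criss-cross diagram (assumption).
PARENTS = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["B"],
    "E": ["D", "C"],  # E merged C into the D line
    "F": ["C", "D"],  # F merged D into the C line
}


def ancestors(commit):
    """All commits reachable from commit, including itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(PARENTS[c])
    return seen


def merge_bases(x, y):
    """Common ancestors that are not a proper ancestor of another common ancestor."""
    common = ancestors(x) & ancestors(y)
    return sorted(c for c in common
                  if not any(c != d and c in ancestors(d) for d in common))


print(merge_bases("E", "F"))  # ['C', 'D']
```

A and B are common ancestors too, but each is an ancestor of C (and D), so neither counts as "best".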

(A)----(B)----(C)--------(F)
        |      |          |
        |      |      +---+
        |      |      |
        |      +----------+
        |      |      |   |
        |      +--(V) |   |
        |          |  |   |
        |      +---+  |   |
        |      |      |   |
        |      +------+   |
        |      |          |
        +-----(D)--------(E)

I suppose that if there were more best common ancestors, Git would just continue, merging V with the next one.

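The article says that if a merge conflict happens while generating the virtual branch, Git just leaves the conflict markers where they are and carries on.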
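The "merge the bases into a virtual commit, keep any conflict markers, then use it as the base" idea can be sketched as a recursion. This is a toy model under stated assumptions, not git's source: Commit is a hypothetical class whose tree is a {path: content} dict, files are merged whole rather than line by line, and the conflict text format is made up.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity hashing, so commits can live in sets
class Commit:
    tree: dict                                   # path -> file content (toy model)
    parents: list = field(default_factory=list)


def ancestors(commit):
    """Every commit reachable from commit, including commit itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(c.parents)
    return seen


def merge_bases(x, y):
    """Best common ancestors: common ancestors that are not ancestors of another one."""
    common = ancestors(x) & ancestors(y)
    return [c for c in common
            if not any(c is not d and c in ancestors(d) for d in common)]


def merge_file(base, ours, theirs):
    """Whole-file 3-way merge; None means 'file absent'. Returns (content, conflicted)."""
    if ours == theirs or theirs == base:
        return ours, False
    if ours == base:
        return theirs, False
    return f"<<<<<<<\n{ours or ''}=======\n{theirs or ''}>>>>>>>\n", True


def merge_recursive(x, y):
    """Return (tree, conflicted_paths) for merging commit y into commit x."""
    bases = merge_bases(x, y)
    base = bases[0] if bases else Commit({})     # no shared history: empty base
    for other in bases[1:]:
        # Fold each extra base into a virtual commit; conflict markers stay inside it.
        tree, _ = merge_recursive(base, other)
        base = Commit(tree, [base, other])
    merged, conflicts = {}, []
    for path in sorted(base.tree.keys() | x.tree.keys() | y.tree.keys()):
        content, conflicted = merge_file(base.tree.get(path), x.tree.get(path), y.tree.get(path))
        if content is not None:
            merged[path] = content
        if conflicted:
            conflicts.append(path)
    return merged, conflicts


# Criss-cross history: C and D fork from A, then each side merges the other.
A = Commit({"f": "base\n", "g": "base\n"})
C = Commit({"f": "c\n", "g": "base\n"}, [A])
D = Commit({"f": "base\n", "g": "d\n"}, [A])
E = Commit({"f": "c\n", "g": "d\n"}, [D, C])
F = Commit({"f": "c\n", "g": "d\n", "h": "new\n"}, [C, D])
print(merge_recursive(E, F))  # ({'f': 'c\n', 'g': 'd\n', 'h': 'new\n'}, [])
```

Here merging C and D yields a virtual base with both changes, so the final merge only has to pick up the new file h.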

What happens when I merge multiple branches at once?

As @Nevik Rehnel explained, it depends on the strategy; the MERGE STRATEGIES section of man git-merge explains this well.

Only octopus and ours support merging multiple branches at once; recursive, for example, does not.

octopus refuses to merge if there would be conflicts, and ours is a trivial merge, so there can be no conflicts.

These commands generate a new commit that has more than 2 parents.

I did a merge -s octopus without conflicts on Git 1.8.5 to see how it goes.

Initial state:

   +--B
   |
A--+--C
   |
   +--D

Actions:

git checkout B
git merge -s octopus C D

New state:

   +--B--+
   |     |
A--+--C--+--E
   |     |
   +--D--+

As expected, E has 3 parents.

TODO: how exactly does octopus operate on a single file modification? Recursive two-by-two 3-way merges?
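A partial answer to that TODO, hedged: my understanding (an assumption; git implements octopus in its git-merge-octopus script) is that octopus does not perform one big N-way merge. It merges the heads one at a time into the running result with ordinary 3-way merges, and gives up as soon as one of those steps needs manual resolution. A toy version, assuming whole-file merging and that all heads fork from one shared base (so that base serves every step); merge_file and octopus are hypothetical names:

```python
def merge_file(base, ours, theirs):
    """Whole-file 3-way merge; None means 'no file'. Returns (content, conflicted)."""
    if ours == theirs or theirs == base:
        return ours, False
    if ours == base:
        return theirs, False
    return None, True


def octopus(base, ours, *heads):
    """Fold each head into the running result; refuse on the first conflict."""
    result = dict(ours)
    for head in heads:
        merged = {}
        for path in sorted(base.keys() | result.keys() | head.keys()):
            content, conflicted = merge_file(base.get(path), result.get(path), head.get(path))
            if conflicted:
                raise RuntimeError(f"conflict in {path}: octopus refuses, merge by hand")
            if content is not None:
                merged[path] = content
        result = merged
    return result


A = {"a": "1"}
B, C, D = {"a": "1", "b": "2"}, {"a": "1", "c": "3"}, {"a": "1", "d": "4"}
print(octopus(A, B, C, D))  # {'a': '1', 'b': '2', 'c': '3', 'd': '4'}
```

If two heads changed the same file differently, the second fold raises instead of writing conflict markers, which matches the "octopus refuses to merge if there would be conflicts" behaviour described above.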

How does git perform when there is no common base for merging branches?

@Torek mentions that since Git 2.9, the merge fails without --allow-unrelated-histories.

I experimented with it on Git 1.8.5:

git init
printf 'a\nc\n' > a
git add .
git commit -m a


git checkout --orphan b
printf 'a\nb\nc\n' > a
git add .
git commit -m b
git merge master   # on Git >= 2.9: git merge --allow-unrelated-histories master

a contains:

a
<<<<<<< ours
b
=======
>>>>>>> theirs
c

Then:

git checkout --conflict=diff3 -- .

a contains:

<<<<<<< ours
a
b
c
||||||| base
=======
a
c
>>>>>>> theirs

Interpretation:

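  • The base is empty.
  • When the base is empty, it is not possible to resolve any modification to a single file; only things like adding a new file can be resolved. The above conflict would be resolved by a 3-way merge with base a\nc\n as a single-line addition.
  • I think that a 3-way merge without a base file is called a 2-way merge, which is just a diff.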
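The point about the missing base can be made concrete with a whole-file 3-way rule (a simplification, since git works per change block): a side that still equals the base "did nothing", so the other side wins. With base a\nc\n, theirs is unchanged and the addition of b goes through; with an empty base, both sides differ from it, so there is nothing to decide with and the result is a conflict. resolve is a hypothetical name.

```python
def resolve(base, ours, theirs):
    """Whole-file 3-way merge: returns the merged text, or None on conflict."""
    if ours == theirs:
        return ours
    if theirs == base:   # theirs did not touch the file: keep our change
        return ours
    if ours == base:     # we did not touch the file: take their change
        return theirs
    return None          # both differ from the base: conflict


ours, theirs = "a\nb\nc\n", "a\nc\n"             # branch b vs. master in the experiment
print(resolve("a\nc\n", ours, theirs) == ours)  # True: clean single-line addition
print(resolve("", ours, theirs))                # None: with an empty base it conflicts
```

Without a base, the two sides can only be diffed against each other, and a diff alone cannot tell whether b was added on one side or deleted on the other.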