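Retroactively convert a Git folder to a submodule?

Quite often you are coding up a project of some sort, and after a while it becomes clear that some component of the project is actually useful as a standalone component (a library, perhaps). If you had that idea early on, there is a fair chance that most of that code already lives in its own folder.

Is there a way to convert one of the subdirectories in a Git project into a submodule?

Ideally this would happen such that all of the code in that directory is removed from the parent project and the submodule project is added in its place, with all the appropriate history, so that every parent project commit points to the correct submodule commit.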


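It can be done, but it is not simple. If you search for "git filter-branch subdirectory submodule", there are some decent write-ups on the process. It essentially entails creating two clones of your project, using git filter-branch to remove everything except the one subdirectory in one clone, and to remove only that subdirectory in the other. The subdirectory-only repository can then be set up as a submodule of the other one.

To isolate a subdirectory into its own repository, use filter-branch on a clone of the original repository: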

git clone <your_project> <your_submodule>
cd <your_submodule>
git filter-branch --subdirectory-filter 'path/to/your/submodule' --prune-empty -- --all

Then it is just a matter of deleting the original directory and adding the submodule to your parent project.
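
Before touching a real project, it can help to try the extraction on a throwaway repository. The following is only a sketch under stated assumptions: the temporary directory, the repository names parent and lib-only, the lib folder, the file names, and the Demo identity are all made up for illustration.

```shell
#!/bin/sh
# Sketch: run the subdirectory extraction on a scratch repository.
# Every name below (parent, lib-only, lib, util.txt, Demo) is hypothetical.
set -eu

work="$(mktemp -d)"
export GIT_AUTHOR_NAME='Demo' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo' GIT_COMMITTER_EMAIL='demo@example.com'
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation pause

# A tiny parent project: one commit touches lib/, the next one does not.
git init -q "$work/parent"
cd "$work/parent"
mkdir lib
echo 'helper' > lib/util.txt
echo 'app' > main.txt
git add .
git commit -q -m 'add app and lib'
echo 'more app' >> main.txt
git commit -q -am 'app only change'

# Clone it, then keep only the history of lib/, which becomes the new root.
git clone -q "$work/parent" "$work/lib-only"
cd "$work/lib-only"
git filter-branch --subdirectory-filter 'lib' --prune-empty -- --all > /dev/null

extracted_files="$(git ls-files)"
extracted_commits="$(git rev-list --count HEAD)"
echo "files:   $extracted_files"
echo "commits: $extracted_commits"
```

In this scratch setup only the commit that touched lib/ should survive, and util.txt should sit at the top level of the new repository.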

First change directory to the folder that will become the submodule. Then:

git init
git remote add origin <repourl>
git add .
git commit -am 'first commit in submodule'
git push -u origin master
cd ..
rm -rf <folder> # the folder which will be a submodule
git commit -am 'deleting folder'
git submodule add <repourl> <folder> # add the submodule
git commit -am 'adding submodule'
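
To confirm that the parent really records a submodule afterwards, you can look for a mode 160000 entry (a gitlink) at the folder's path. Here is a rough sketch using two local scratch repositories; the names lib-repo, parent and lib are invented, git rm is used in place of rm -rf, and protocol.file.allow is only set because recent Git versions refuse to clone submodules from local paths by default.

```shell
#!/bin/sh
# Sketch: swap a plain directory for a submodule and inspect the result.
# lib-repo, parent and lib are hypothetical stand-ins for <repourl> and <folder>.
set -eu

work="$(mktemp -d)"
export GIT_AUTHOR_NAME='Demo' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo' GIT_COMMITTER_EMAIL='demo@example.com'

# Stand-in for the already pushed submodule repository.
git init -q "$work/lib-repo"
cd "$work/lib-repo"
echo 'helper' > util.txt
git add util.txt
git commit -q -m 'first commit in submodule'

# Stand-in parent that still tracks lib/ as ordinary files.
git init -q "$work/parent"
cd "$work/parent"
mkdir lib
echo 'helper' > lib/util.txt
git add lib
git commit -q -m 'parent with lib as plain directory'

# Remove the directory, then add the submodule in its place.
git rm -r -q lib
git commit -q -m 'deleting folder'
git -c protocol.file.allow=always submodule --quiet add "$work/lib-repo" lib
git commit -q -m 'adding submodule'

# The index entry for lib should now be a gitlink pointing at the submodule HEAD.
entry_mode="$(git ls-files -s lib | cut -d' ' -f1)"
recorded="$(git rev-parse HEAD:lib)"
expected="$(git -C "$work/lib-repo" rev-parse HEAD)"
echo "mode: $entry_mode, recorded commit: $recorded"
```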

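I know this is an old thread, but the answers here squash any related commits in other branches.

A simple way to clone and keep all those extra branches and commits:

1 - Make sure you have this git alias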

git config --global alias.clone-branches '! git branch -a | sed -n "/\/HEAD /d; /\/master$/d; /remotes/p;" | xargs -L1 git checkout -t'

2 - Clone the remote, pull all the branches, change the remote, filter your directory, push

git clone git@github.com:user/existing-repo.git new-repo
cd new-repo
git clone-branches
git remote rm origin
git remote add origin git@github.com:user/new-repo.git
git remote -v
git filter-branch --subdirectory-filter my_directory/ -- --all
git push --all
git push --tags
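
If you would rather not install an alias, the same idea (one local tracking branch per remote branch) can be sketched with git for-each-ref. This is not the alias above, just a possible alternative; the scratch repositories and the branch names feature and experiment are made up.

```shell
#!/bin/sh
# Sketch: create a local tracking branch for every branch on origin.
# origin-repo, clone, feature and experiment are hypothetical names.
set -eu

work="$(mktemp -d)"
export GIT_AUTHOR_NAME='Demo' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo' GIT_COMMITTER_EMAIL='demo@example.com'

# Scratch "remote" with a default branch plus two extra branches.
git init -q "$work/origin-repo"
cd "$work/origin-repo"
echo 'one' > a.txt
git add a.txt
git commit -q -m 'first'
git branch feature
git branch experiment

git clone -q "$work/origin-repo" "$work/clone"
cd "$work/clone"

# Walk the remote-tracking refs; skip origin/HEAD and branches that already exist.
git for-each-ref --format='%(refname)' refs/remotes/origin |
while read -r ref; do
    name="${ref#refs/remotes/origin/}"
    if [ "$name" = 'HEAD' ]; then
        continue
    fi
    if ! git show-ref --verify --quiet "refs/heads/$name"; then
        git branch --quiet --track "$name" "origin/$name"
    fi
done

branch_count="$(git for-each-ref --format='%(refname)' refs/heads | wc -l | tr -d ' ')"
echo "local branches: $branch_count"
```

With local branches in place, a later filter run with -- --all followed by git push --all carries every branch over, not just the default one.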

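Status quo

Let's assume we have a repository called repo-old which contains a subdirectory sub that we would like to convert into a submodule with its own repo repo-sub.

It is further intended that the original repo repo-old is converted into a modified repo repo-new, in which all commits touching the previously existing subdirectory sub now point to the corresponding commits of our extracted submodule repo repo-sub.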

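Let's change

With the help of git filter-branch it is possible to achieve this in a two-step process:

  1. Subdirectory extraction from repo-old to repo-sub (already mentioned in the accepted answer)
  2. Subdirectory replacement from repo-old to repo-new (with proper commit mapping)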

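Remarks: I know that this question is old and that it has already been mentioned that git filter-branch is more or less deprecated and might be dangerous. On the other hand, it may help others with personal repositories that are easy to validate after conversion. So be warned! And please let me know if there is any other tool that does the same thing without being deprecated and is safe to use!

I will explain below how I realized both steps on Linux with git version 2.26.2. Older versions might work to some extent, but that needs to be tested.

For the sake of simplicity I will restrict myself to the case where there is just a master branch and an origin remote in the original repo repo-old. Also be warned that I rely on temporary git tags with the prefix temp_, which are removed in the process. So if there are already tags named similarly, you may want to adjust the prefix below. Finally, please be aware that I have not tested this extensively and there may be corner cases where the recipe fails. So please back up everything before proceeding!

The following bash snippets can be concatenated into one big script, which should then be executed in the same folder where the repo repo-old lives. It is not recommended to copy and paste everything directly into a command window (even though I have tested this successfully)!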

0. Preparation

Variables

# Root directory where repo-old lives
# and a temporary location for git filter-branch
root="$PWD"
temp='/dev/shm/tmp'


# The old repository and the subdirectory we'd like to extract
repo_old="$root/repo-old"
repo_old_directory='sub'


# The new submodule repository, its url
# and a hash map folder which will be populated
# and later used in the filter script below
repo_sub="$root/repo-sub"
repo_sub_url='https://github.com/somewhere/repo-sub.git'
repo_sub_hashmap="$root/repo-sub.map"


# The new modified repository, its url
# and a filter script which is created as heredoc below
repo_new="$root/repo-new"
repo_new_url='https://github.com/somewhere/repo-new.git'
repo_new_filter="$root/repo-new.sh"

Filter script

# The index filter script which converts our subdirectory into a submodule
cat << EOF > "$repo_new_filter"
#!/bin/bash


# Submodule hash map function
sub ()
{
local old_commit=\$(git rev-list -1 \$1 -- '$repo_old_directory')


if [ ! -z "\$old_commit" ]
then
echo \$(cat "$repo_sub_hashmap/\$old_commit")
fi
}


# Submodule config
SUB_COMMIT=\$(sub \$GIT_COMMIT)
SUB_DIR='$repo_old_directory'
SUB_URL='$repo_sub_url'


# Submodule replacement
if [ ! -z "\$SUB_COMMIT" ]
then
touch '.gitmodules'
git config --file='.gitmodules' "submodule.\$SUB_DIR.path" "\$SUB_DIR"
git config --file='.gitmodules' "submodule.\$SUB_DIR.url" "\$SUB_URL"
git config --file='.gitmodules' "submodule.\$SUB_DIR.branch" 'master'
git add '.gitmodules'


git rm --cached -qrf "\$SUB_DIR"
git update-index --add --cacheinfo 160000 \$SUB_COMMIT "\$SUB_DIR"
fi
EOF
chmod +x "$repo_new_filter"
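
The core trick in the filter script is the last update-index call: an index entry with mode 160000 (a gitlink) is what makes a path a submodule pointer instead of a directory. A small sketch of just that step, outside of filter-branch; the scratch repository is made up, and the repository's own HEAD commit merely stands in for a real submodule commit id.

```shell
#!/bin/sh
# Sketch: stage a gitlink by hand, as the filter script does for each commit.
# The scratch repo and the path "sub" are hypothetical; the commit id used
# here is only a stand-in for an id taken from the submodule hash map.
set -eu

work="$(mktemp -d)"
export GIT_AUTHOR_NAME='Demo' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo' GIT_COMMITTER_EMAIL='demo@example.com'

git init -q "$work/scratch"
cd "$work/scratch"
echo 'app' > main.txt
git add main.txt
git commit -q -m 'initial'

# Any commit id works as the gitlink target; reuse HEAD for the demo.
target="$(git rev-parse HEAD)"
git update-index --add --cacheinfo 160000 "$target" sub

# The index now lists sub with mode 160000 ...
staged_mode="$(git ls-files -s sub | cut -d' ' -f1)"

# ... and the tree written from it records sub as type "commit".
tree="$(git write-tree)"
entry_type="$(git ls-tree "$tree" sub | awk '{print $2}')"
echo "mode: $staged_mode, type: $entry_type"
```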

1. Subdirectory extraction

cd "$root"


# Create a new clone for our new submodule repo
git clone "$repo_old" "$repo_sub"


# Enter the new submodule repo
cd "$repo_sub"


# Remove the old origin remote
git remote remove origin


# Loop over all commits and create temporary tags
for commit in $(git rev-list --all)
do
git tag "temp_$commit" $commit
done


# Extract the subdirectory and slice commits
mkdir -p "$temp"
git filter-branch --subdirectory-filter "$repo_old_directory" \
--tag-name-filter 'cat' \
--prune-empty --force -d "$temp" -- --all


# Populate hash map folder from our previously created tag names
mkdir -p "$repo_sub_hashmap"
for tag in $(git tag | grep "^temp_")
do
old_commit=${tag#'temp_'}
sub_commit=$(git rev-list -1 $tag)


echo $sub_commit > "$repo_sub_hashmap/$old_commit"
done
git tag | grep "^temp_" | xargs -d '\n' git tag -d > /dev/null 2>&1


# Add the new url for this repository (and e.g. push)
git remote add origin "$repo_sub_url"
# git push -u origin master
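
One way to gain some confidence in the extraction is to check that each rewritten commit's root tree is identical to the sub tree of the commit it came from. The sketch below reruns a cut-down version of step 1 on a scratch repository and makes that comparison before the temp_ tags are deleted; the file names and the Demo identity are assumptions, and both scratch commits deliberately touch sub.

```shell
#!/bin/sh
# Sketch: verify the old-commit -> sub-commit mapping by comparing trees.
# Scratch data only; repo-old/repo-sub here live in a temporary directory.
set -eu

work="$(mktemp -d)"
export GIT_AUTHOR_NAME='Demo' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo' GIT_COMMITTER_EMAIL='demo@example.com'
export FILTER_BRANCH_SQUELCH_WARNING=1

git init -q "$work/repo-old"
cd "$work/repo-old"
mkdir sub
echo 'v1' > sub/f.txt
echo 'app' > main.txt
git add .
git commit -q -m 'one'
echo 'v2' > sub/f.txt
git commit -q -am 'two'

git clone -q "$work/repo-old" "$work/repo-sub"
cd "$work/repo-sub"

# Tag every commit with its own id, then rewrite; the tags follow the rewrite.
for commit in $(git rev-list --all); do
    git tag "temp_$commit" "$commit"
done
git filter-branch --subdirectory-filter sub --tag-name-filter cat \
    --prune-empty -- --all > /dev/null

checked=0
mismatches=0
for tag in $(git tag --list 'temp_*'); do
    old_commit="${tag#temp_}"
    old_tree="$(git -C "$work/repo-old" rev-parse "$old_commit:sub")"
    new_tree="$(git rev-parse "$tag^{tree}")"
    checked=$((checked + 1))
    if [ "$old_tree" != "$new_tree" ]; then
        mismatches=$((mismatches + 1))
    fi
done
echo "checked $checked commits, $mismatches mismatches"
```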

2. Subdirectory replacement

cd "$root"


# Create a clone for our modified repo
git clone "$repo_old" "$repo_new"


# Enter the new modified repo
cd "$repo_new"


# Remove the old origin remote
git remote remove origin


# Replace the subdirectory and map all sliced submodule commits using
# the filter script from above
mkdir -p "$temp"
git filter-branch --index-filter "$repo_new_filter" \
--tag-name-filter 'cat' --force -d "$temp" -- --all


# Add the new url for this repository (and e.g. push)
git remote add origin "$repo_new_url"
# git push -u origin master


# Cleanup (commented for safety reasons)
# rm -rf "$repo_sub_hashmap"
# rm -f "$repo_new_filter"

Remark: If the newly created repo repo-new hangs during git submodule update --init, try re-cloning the repository recursively once instead:

cd "$root"


# Clone the new modified repo recursively
git clone --recursive "$repo_new" "$repo_new-tmp"


# Now use the newly cloned one
mv "$repo_new" "$repo_new-bak"
mv "$repo_new-tmp" "$repo_new"


# Cleanup (commented for safety reasons)
# rm -rf "$repo_new-bak"

This does the conversion in place; you can back it out as you would any filter-branch (I use git fetch . +refs/original/*:*).
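
Backing out relies on the copies that filter-branch keeps under refs/original/. As a rough illustration for a single checked-out branch (a different command from the fetch shown above, and with a made-up scratch repository and filter), resetting to the saved ref brings the old history back:

```shell
#!/bin/sh
# Sketch: undo an in-place filter-branch run using refs/original/.
# The scratch repo and the index filter are hypothetical examples.
set -eu

work="$(mktemp -d)"
export GIT_AUTHOR_NAME='Demo' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo' GIT_COMMITTER_EMAIL='demo@example.com'
export FILTER_BRANCH_SQUELCH_WARNING=1

git init -q "$work/scratch"
cd "$work/scratch"
mkdir sub
echo 'helper' > sub/f.txt
echo 'app' > main.txt
git add .
git commit -q -m 'initial'

branch="$(git symbolic-ref --short HEAD)"
before="$(git rev-parse HEAD)"

# Rewrite history in place: drop sub/ from every commit.
git filter-branch --index-filter \
    'git rm -rq --cached --ignore-unmatch sub' -- --all > /dev/null
rewritten="$(git rev-parse HEAD)"

# filter-branch saved the old tip under refs/original/; reset back to it.
git reset -q --hard "refs/original/refs/heads/$branch"
restored="$(git rev-parse HEAD)"
echo "before=$before rewritten=$rewritten restored=$restored"
```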

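I have a project with a utils library that has started to be useful in other projects, and I wanted to split its history off into a submodule. I did not think to look on SO first, so I wrote my own. It builds the history locally, so it is a good bit faster; afterwards, if you want, you can set up the helper command's .gitmodules file and such, and push the submodule histories themselves anywhere you like.

The stripped command itself is here; the documentation is in the comments of the unstripped version that follows. Run it as its own command with subdir set, e.g. subdir=utils git split-submodule if you are splitting the utils directory. It is hacky because it is a one-off, but I tested it on the Documentation subdirectory in the Git history.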

#!/bin/bash
# put this or the commented version below in e.g. ~/bin/git-split-submodule
${GIT_COMMIT-exec git filter-branch --index-filter "subdir=$subdir; ${debug+debug=$debug;} $(sed 1,/SNIP/d "$0")" "$@"}
${debug+set -x}
fam=(`git rev-list --no-walk --parents $GIT_COMMIT`)
pathcheck=(`printf "%s:$subdir\\n" ${fam[@]} \
| git cat-file --batch-check='%(objectname)' | uniq`)
[[ $pathcheck = *:* ]] || {
subfam=($( set -- ${fam[@]}; shift;
for par; do tpar=`map $par`; [[ $tpar != $par ]] &&
git rev-parse -q --verify $tpar:"$subdir"
done
))
git rm -rq --cached --ignore-unmatch  "$subdir"
if (( ${#pathcheck[@]} == 1 && ${#fam[@]} > 1 && ${#subfam[@]} > 0)); then
git update-index --add --cacheinfo 160000,$subfam,"$subdir"
else
subnew=`git cat-file -p $GIT_COMMIT | sed 1,/^$/d \
| git commit-tree $GIT_COMMIT:"$subdir" $(
${subfam:+printf ' -p %s' ${subfam[@]}}) 2>&-
` &&
git update-index --add --cacheinfo 160000,$subnew,"$subdir"
fi
}
${debug+set +x}

#!/bin/bash
# Git filter-branch to split a subdirectory into a submodule history.


# In each commit, the subdirectory tree is replaced in the index with an
# appropriate submodule commit.
# * If the subdirectory tree has changed from any parent, or there are
#   no parents, a new submodule commit is made for the subdirectory (with
#   the current commit's message, which should presumably say something
#   about the change). The new submodule commit's parents are the
#   submodule commits in any rewrites of the current commit's parents.
# * Otherwise, the submodule commit is copied from a parent.


# Since the new history includes references to the new submodule
# history, the new submodule history isn't dangling, it's incorporated.
# Branches for any part of it can be made casually and pushed into any
# other repo as desired, so hooking up the `git submodule` helper
# command's conveniences is easy, e.g.
#     subdir=utils git split-submodule master
#     git branch utils $(git rev-parse master:utils)
#     git clone -sb utils . ../utilsrepo
# and you can then submodule add from there in other repos, but really,
# for small utility libraries and such, just fetching the submodule
# histories into your own repo is easiest. Setup on cloning a
# project using "incorporated" submodules like this is:
#   setup:  utils/.git
#
#   utils/.git:
#       @if _=`git rev-parse -q --verify utils`; then \
#           git config submodule.utils.active true \
#           && git config submodule.utils.url "`pwd -P`" \
#           && git clone -s . utils -nb utils \
#           && git submodule absorbgitdirs utils \
#           && git -C utils checkout $$(git rev-parse :utils); \
#       fi
# with `git config -f .gitmodules submodule.utils.path utils` and
# `git config -f .gitmodules submodule.utils.url ./`; cloners don't
# have to do anything but `make setup`, and `setup` should be a prereq
# on most things anyway.


# You can test that a commit and its rewrite put the same tree in the
# same place with this function:
# testit ()
# {
#     tree=($(git rev-parse `git rev-parse $1`: refs/original/refs/heads/$1));
#     echo $tree `test $tree != ${tree[1]} && echo ${tree[1]}`
# }
# so e.g. `testit make~95^2:t` will print the `t` tree there and if
# the `t` tree at ~95^2 from the original differs it'll print that too.


# To run it, say `subdir=path/to/it git split-submodule` with whatever
# filter-branch args you want.


# $GIT_COMMIT is set if we're already in filter-branch, if not, get there:
${GIT_COMMIT-exec git filter-branch --index-filter "subdir=$subdir; ${debug+debug=$debug;} $(sed 1,/SNIP/d "$0")" "$@"}


${debug+set -x}
fam=(`git rev-list --no-walk --parents $GIT_COMMIT`)
pathcheck=(`printf "%s:$subdir\\n" ${fam[@]} \
| git cat-file --batch-check='%(objectname)' | uniq`)


[[ $pathcheck = *:* ]] || {
subfam=($( set -- ${fam[@]}; shift;
for par; do tpar=`map $par`; [[ $tpar != $par ]] &&
git rev-parse -q --verify $tpar:"$subdir"
done
))


git rm -rq --cached --ignore-unmatch  "$subdir"
if (( ${#pathcheck[@]} == 1 && ${#fam[@]} > 1 && ${#subfam[@]} > 0)); then
# one id same for all entries, copy mapped mom's submod commit
git update-index --add --cacheinfo 160000,$subfam,"$subdir"
else
# no mapped parents or something changed somewhere, make new
# submod commit for current subdir content.  The new submod
# commit has all mapped parents' submodule commits as parents:
subnew=`git cat-file -p $GIT_COMMIT | sed 1,/^$/d \
| git commit-tree $GIT_COMMIT:"$subdir" $(
${subfam:+printf ' -p %s' ${subfam[@]}}) 2>&-
` &&
git update-index --add --cacheinfo 160000,$subnew,"$subdir"
fi
}
${debug+set +x}

The current answer by @knitl using filter-branch gets us very close to the desired effect, but when I tried it, Git threw a warning at me:

WARNING: git-filter-branch has a glut of gotchas generating mangled history
rewrites.  Hit Ctrl-C before proceeding to abort, then use an
alternative filtering tool such as 'git filter-repo'
(https://github.com/newren/git-filter-repo/) instead.  See the
filter-branch manual page for more details; to squelch this warning,
set FILTER_BRANCH_SQUELCH_WARNING=1.

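Now, 9 years after this question was first asked and answered, filter-branch is deprecated in favor of git filter-repo. Indeed, when I looked at my git history using git log --all --oneline --graph, it was full of irrelevant commits.

So how do you use git filter-repo? GitHub has a pretty good article outlining the process. (Note that you need to install it independently of git. I used the Python version, installed with pip3 install git-filter-repo.)

In case they decide to move or delete the article, I will summarise and generalise their procedure below: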

git clone <your_old_project_remote> <your_submodule>
cd <your_submodule>
git filter-repo --path path/to/your/submodule
# filter-repo removes the origin remote after rewriting, so add it again
git remote add origin <your_new_submodule_remote>
git push -u origin <branch_name>

From here, you just need to register the new repository as a submodule wherever you want it to live:

cd <path/to/your/parent/module>
git submodule add <your_new_submodule_remote>
git submodule update
git commit

The official git project now recommends using git-filter-repo

# install git-filter-repo, see [1] for install via pip, or other OS's.
sudo apt-get install git-filter-repo


# copy your repo; everything EXCEPT the subdir will be deleted, and the subdir will become root.
# --no-local is required to prevent git from hard linking to files in the original, and is checked by `filter-repo`
git clone working-dir/.git working-dir-copy --no-local
cd working-dir-copy


# extract the desired subdirectory and its history.
git filter-repo --subdirectory-filter foodir


# foodir is now its own directory. Push it to github/gitlab etc
git remote add origin user@hosting/project.git
git push -u origin --all
git push -u origin --tags

Thanks also to this gist.

Edit: For users of LFS (you poor souls), git clone does not fetch the whole LFS history of images, which causes git push to fail.

# Original branch needs to get history of all images
git lfs fetch --all


# clone needs to copy the history
git lfs install --skip-smudge
git lfs pull working-dir --all

[1] https://github.com/newren/git-filter-repo/blob/main/install.md