用黄金代替 ld 有经验吗?

有人试过用 gold代替 ld吗?

gold 我保证ld快得多,因此它可能有助于加快大型 C + + 应用程序的测试周期,但是它可以用作 ld 的插入式替代品吗?

gcc/g++可以直接调用 gold吗?

是否有任何已知的错误或问题?

虽然 gold是 GNU binutils 的一部分,但是我在 Web 中几乎没有发现任何“成功故事”,甚至也没有发现任何“ Howtos”。

(更新: 添加链接到黄金和博客条目解释它)

56955 次浏览

You could link ld to gold (in a local binary directory if you have ld installed to avoid overwriting):

ln -s `which gold` ~/bin/ld

or

ln -s `which gold` /usr/local/bin/ld

At the moment it is compiling bigger projects on Ubuntu 10.04. Here you can install and integrate it easily with the binutils-gold package (if you remove that package, you get your old ld). Gcc will automatically use gold then.

Some experiences:

  • gold doesn't search in /usr/local/lib
  • gold doesn't assume libs like pthread or rt, had to add them by hand
  • it is faster and needs less memory (the later is important on big C++ projects with a lot of boost etc.)

What does not work: It cannot compile kernel stuff and therefore no kernel modules. Ubuntu does this automatically via DKMS if it updates proprietary drivers like fglrx. This fails with ld-gold (you have to remove gold, restart DKMS, reinstall ld-gold.

As it took me a little while to find out how to selectively use gold (i.e. not system-wide using a symlink), I'll post the solution here. It's based on http://code.google.com/p/chromium/wiki/LinuxFasterBuilds#Linking_using_gold .

  1. Make a directory where you can put a gold glue script. I am using ~/bin/gold/.
  2. Put the following glue script there and name it ~/bin/gold/ld:

    #!/bin/bash
    gold "$@"
    

    Obviously, make it executable, chmod a+x ~/bin/gold/ld.

  3. Change your calls to gcc to gcc -B$HOME/bin/gold which makes gcc look in the given directory for helper programs like ld and thus uses the glue script instead of the system-default ld.

As a Samba developer, I have been using the gold linker almost exclusively on Ubuntu, Debian, and Fedora since several years now. My assessment:

  • gold is many times (felt: 5-10 times) faster than the classical linker.
  • Initially, there were a few problems, but they have gone since roughly around Ubuntu 12.04.
  • The gold linker even found some dependency problems in our code, since it seems to be more correct than the classical one with respect to some details. See, e.g. this Samba commit.

I have not used gold selectively, but have been using symlinks or the alternatives mechanism if the distribution provides it.

Some projects seem to be incompatible with gold, because of some incompatible differences between ld and gold. Example: OpenFOAM, see http://www.openfoam.org/mantisbt/view.php?id=685 .

DragonFlyBSD switched over to gold as their default linker. So it seems to be ready for a variety of tools.
More details: http://phoronix.com/scan.php?page=news_item&px=DragonFlyBSD-Gold-Linker

Can gcc/g++ directly call gold.?

Just to complement the answers: there is a gcc's option -fuse-ld=gold (see gcc doc). Though, AFAIK, it is possible to configure gcc during the build in a way that the option will not have any effect.

Minimal synthetic benchmark: LD vs gold vs LLVM LLD

Outcome:

  • gold was about 3x to 4x faster for all values I've tried when using -Wl,--threads -Wl,--thread-count=$(nproc) to enable multithreading
  • LLD was about 2x faster than gold!

Tested on:

  • Ubuntu 20.04, GCC 9.3.0, binutils 2.34, sudo apt install lld LLD 10
  • Lenovo ThinkPad P51 laptop, Intel Core i7-7820HQ CPU (4 cores / 8 threads), 2x Samsung M471A2K43BB1-CRC RAM (2x 16GiB), Samsung MZVLB512HAJQ-000L7 SSD (3,000 MB/s).

Simplified description of the benchmark parameters:

  • 1: number of object files providing symbols
  • 2: number of symbols per symbol provider object file
  • 3: number of object files using all provided symbols symbols

Results for different benchmark parameters:

10000 10 10
nogold:  wall=4.35s user=3.45s system=0.88s 876820kB
gold:    wall=1.35s user=1.72s system=0.46s 739760kB
lld:     wall=0.73s user=1.20s system=0.24s 625208kB


1000 100 10
nogold:  wall=5.08s user=4.17s system=0.89s 924040kB
gold:    wall=1.57s user=2.18s system=0.54s 922712kB
lld:     wall=0.75s user=1.28s system=0.27s 664804kB


100 1000 10
nogold:  wall=5.53s user=4.53s system=0.95s 962440kB
gold:    wall=1.65s user=2.39s system=0.61s 987148kB
lld:     wall=0.75s user=1.30s system=0.25s 704820kB


10000 10 100
nogold:  wall=11.45s user=10.14s system=1.28s 1735224kB
gold:    wall=4.88s user=8.21s system=0.95s 2180432kB
lld:     wall=2.41s user=5.58s system=0.74s 2308672kB


1000 100 100
nogold:  wall=13.58s user=12.01s system=1.54s 1767832kB
gold:    wall=5.17s user=8.55s system=1.05s 2333432kB
lld:     wall=2.79s user=6.01s system=0.85s 2347664kB


100 1000 100
nogold:  wall=13.31s user=11.64s system=1.62s 1799664kB
gold:    wall=5.22s user=8.62s system=1.03s 2393516kB
lld:     wall=3.11s user=6.26s system=0.66s 2386392kB

This is the script that generates all the objects for the link tests:

generate-objects

#!/usr/bin/env bash
set -eu


# CLI args.


# Each of those files contains n_ints_per_file ints.
n_int_files="${1:-10}"
n_ints_per_file="${2:-10}"


# Each function adds all ints from all files.
# This leads to n_int_files x n_ints_per_file x n_funcs relocations.
n_funcs="${3:-10}"


# Do a debug build, since it is for debug builds that link time matters the most,
# as the user will be recompiling often.
cflags='-ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic'


# Cleanup previous generated files objects.
./clean


# Generate i_*.c, ints.h and int_sum.h
rm -f ints.h
echo 'return' > int_sum.h
int_file_i=0
while [ "$int_file_i" -lt "$n_int_files" ]; do
int_i=0
int_file="${int_file_i}.c"
rm -f "$int_file"
while [ "$int_i" -lt "$n_ints_per_file" ]; do
echo "${int_file_i} ${int_i}"
int_sym="i_${int_file_i}_${int_i}"
echo "unsigned int ${int_sym} = ${int_file_i};" >> "$int_file"
echo "extern unsigned int ${int_sym};" >> ints.h
echo "${int_sym} +" >> int_sum.h
int_i=$((int_i + 1))
done
int_file_i=$((int_file_i + 1))
done
echo '1;' >> int_sum.h


# Generate funcs.h and main.c.
rm -f funcs.h
cat <<EOF >main.c
#include "funcs.h"


int main(void) {
return
EOF
i=0
while [ "$i" -lt "$n_funcs" ]; do
func_sym="f_${i}"
echo "${func_sym}() +" >> main.c
echo "int ${func_sym}(void);" >> funcs.h
cat <<EOF >"${func_sym}.c"
#include "ints.h"


int ${func_sym}(void) {
#include "int_sum.h"
}
EOF
i=$((i + 1))
done
cat <<EOF >>main.c
1;
}
EOF


# Generate *.o
ls | grep -E '\.c$' | parallel --halt now,fail=1 -t --will-cite "gcc $cflags -c -o '{.}.o' '{}'"

GitHub upstream.

Note that the object file generation can be quite slow, since each C file can be quite large.

Given an input of type:

./generate-objects [n_int_files [n_ints_per_file [n_funcs]]]

it generates:

main.c

#include "funcs.h"


int main(void) {
return f_0() + f_1() + ... + f_<n_funcs>();
}

f_0.c, f_1.c, ..., f_<n_funcs>.c

extern unsigned int i_0_0;
extern unsigned int i_0_1;
...
extern unsigned int i_1_0;
extern unsigned int i_1_1;
...
extern unsigned int i_<n_int_files>_<n_ints_per_file>;


int f_0(void) {
return
i_0_0 +
i_0_1 +
...
i_1_0 +
i_1_1 +
...
i_<n_int_files>_<n_ints_per_file>
}

0.c, 1.c, ..., <n_int_files>.c

unsigned int i_0_0 = 0;
unsigned int i_0_1 = 0;
...
unsigned int i_0_<n_ints_per_file> = 0;

which leads to:

n_int_files x n_ints_per_file x n_funcs

relocations on the link.

Then I compared:

gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic               -o main *.o
gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -fuse-ld=gold -Wl,--threads -Wl,--thread-count=`nproc` -o main *.o
gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -fuse-ld=lld  -o main *.o

Some limits I've been trying to mitigate when selecting the test parameters:

  • at 100k C files, both methods get failed mallocs occasionally
  • GCC cannot compile a function with 1M additions

I have also observed a 2x in the debug build of gem5: https://gem5.googlesource.com/public/gem5/+/fafe4e80b76e93e3d0d05797904c19928587f5b5

Similar question: https://unix.stackexchange.com/questions/545699/what-is-the-gold-linker

Phoronix benchmarks

Phoronix did some benchmarking in 2017 for some real world projects, but for the projects they examined, the gold gains were not so significant: https://www.phoronix.com/scan.php?page=article&item=lld4-linux-tests&num=2 (archive).

Known incompatibilities

LLD benchmarks

At https://lld.llvm.org/ they give build times for a few well known projects. with similar results to my synthetic benchmarks. Project/linker versions are not given unfortunately. In their results:

  • gold was about 3x/4x faster than LD
  • LLD was 3x/4x faster than gold, so a greater speedup than in my synthetic benchmark

They comment:

This is a link time comparison on a 2-socket 20-core 40-thread Xeon E5-2680 2.80 GHz machine with an SSD drive. We ran gold and lld with or without multi-threading support. To disable multi-threading, we added -no-threads to the command lines.

and results look like:

Program      | Size     | GNU ld  | gold -j1 | gold    | lld -j1 |    lld
-------------|----------|---------|----------|---------|---------|-------
ffmpeg dbg |   92 MiB |   1.72s |   1.16s  |   1.01s |   0.60s |  0.35s
mysqld dbg |  154 MiB |   8.50s |   2.96s  |   2.68s |   1.06s |  0.68s
clang dbg | 1.67 GiB | 104.03s |  34.18s  |  23.49s |  14.82s |  5.28s
chromium dbg | 1.14 GiB | 209.05s |  64.70s  |  60.82s |  27.60s | 16.70s