How do you run multiple programs in parallel from a bash script?

I am trying to write a .sh file that runs many programs simultaneously.

I tried this:

prog1
prog2

But that runs prog1, waits until prog1 ends, and only then starts prog2...

So how can I run them in parallel?


To run multiple programs in parallel:

prog1 &
prog2 &

If you need your script to wait for the programs to finish, you can add:

wait

at the point where you want the script to wait for them.
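
As a rough illustration of the effect, the sketch below uses two sleep commands as hypothetical stand-ins for prog1 and prog2 and measures the elapsed time. Run in parallel, the total should be close to the longer of the two rather than their sum.

```shell
#!/bin/sh
# Hypothetical stand-ins for prog1 and prog2: each one just sleeps 2 seconds.
start=$(date +%s)

sleep 2 &    # "prog1", started in the background
sleep 2 &    # "prog2", started immediately afterwards

wait         # block until both background jobs have exited

end=$(date +%s)
elapsed=$((end - start))
echo "both finished after about ${elapsed}s"
```

Run one after the other, the two commands would need about 4 seconds; in parallel they need about 2.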

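#!/bin/bash
prog1 2> .errorprog1.log & prog2 2> .errorprog2.log &

This redirects each program's errors to a separate log. Note that the redirection has to come before the &; written after it, the redirection applies to an empty command instead of the program.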
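
A minimal sketch of the same idea, assuming two shell functions (hypothetical stand-ins for prog1 and prog2) that only write to stderr, and a temporary directory for the logs:

```shell
#!/bin/sh
# Hypothetical stand-ins for prog1 and prog2 that write to stderr.
prog1() { echo "prog1 failed" >&2; }
prog2() { echo "prog2 failed" >&2; }

logdir=$(mktemp -d)    # scratch directory for the example logs

# The redirection comes before the &, so it belongs to the command.
prog1 2> "$logdir/prog1.err" &
prog2 2> "$logdir/prog2.err" &
wait                   # make sure both logs are complete before reading them

cat "$logdir/prog1.err" "$logdir/prog2.err"
```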

There is a very useful program called nohup.

     nohup - run a command immune to hangups, with output to a non-tty
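
nohup does not put anything in the background by itself, so it is usually combined with & and explicit redirection. A small sketch, assuming two short sh -c commands as hypothetical stand-ins for long-running jobs:

```shell
#!/bin/sh
logdir=$(mktemp -d)    # scratch directory for the example logs

# Hypothetical jobs; stdin comes from /dev/null so nohup has no terminal input.
nohup sh -c 'echo "job 1 done"' < /dev/null > "$logdir/job1.log" 2>&1 &
nohup sh -c 'echo "job 2 done"' < /dev/null > "$logdir/job2.log" 2>&1 &

wait    # optional: only needed if the script should block until both finish
cat "$logdir/job1.log" "$logdir/job2.log"
```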

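You can try PPSS (abandoned). PPSS is rather powerful: you can even create a mini-cluster with it. xargs -P can also be useful if you have a batch of embarrassingly parallel processing to do.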

With GNU Parallel (http://www.gnu.org/software/parallel/) it is as easy as:

(echo prog1; echo prog2) | parallel

Or, if you prefer:

parallel ::: prog1 prog2
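
A hedged sketch of a slightly fuller invocation. It assumes GNU Parallel may or may not be installed (another tool named parallel ships with moreutils, so the version string is checked first), and uses echo with the placeholder names prog1 to prog3 instead of real programs:

```shell
#!/bin/sh
# Only use GNU Parallel if it is really the tool found on PATH.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # -j 2: run at most two jobs at once; -k: keep output in input order.
    out=$(parallel -j 2 -k 'echo finished {}' ::: prog1 prog2 prog3)
else
    out="GNU Parallel is not installed"
fi
echo "$out"
```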


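How about:

prog1 & prog2 && fg

This will:

  1. Start prog1.
  2. Send it to the background, but keep printing its output.
  3. Start prog2 and keep it in the foreground, so you can close it with Ctrl-C.
  4. When you close prog2, return prog1 to the foreground, so you can close it with Ctrl-C as well.

Note that fg only runs if prog2 exits with status 0 (because of the &&), and it needs job control, which is normally enabled only in interactive shells.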

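I had a similar situation recently: I needed to run multiple programs at the same time, redirect their output to separate log files, and wait for them to finish. I ended up with something like this: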

#!/bin/bash


# Add the full path processes to run to the array
PROCESSES_TO_RUN=("/home/joao/Code/test/prog_1/prog1" \
"/home/joao/Code/test/prog_2/prog2")
# You can keep adding processes to the array...


for i in "${PROCESSES_TO_RUN[@]}"; do
    "${i%/*}/./${i##*/}" > "${i}.log" 2>&1 &
    # ${i%/*}  -> directory part, up to the last /
    # ${i##*/} -> file name, after the last /
done


# Wait for the processes to finish
wait

Source: http://joaoperibeiro.com/execute-multiple-programs-and-redirect-their-outputs-linux/

You can use wait:

some_command &
P1=$!
other_command &
P2=$!
wait $P1 $P2

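It assigns the PIDs of the background programs to variables ($! is the PID of the most recently launched process), and then the wait command waits for them. This is nice because if you kill the script, it kills the processes too!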
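
Whether the background jobs really die with the script depends on how the script is stopped, so some scripts add an explicit trap. A minimal sketch, assuming two sleep commands as hypothetical stand-ins for some_command and other_command; the cleanup function and the exit codes are choices made for this example:

```shell
#!/bin/sh
# Hypothetical stand-ins for some_command and other_command.
sleep 30 &
P1=$!
sleep 30 &
P2=$!

# Stop both children and reap them; errors are ignored in case a child
# has already exited.
cleanup() {
    kill "$P1" "$P2" 2>/dev/null || true
    wait "$P1" "$P2" 2>/dev/null || true
}

# Run cleanup if the script itself is interrupted or terminated.
trap 'cleanup; exit 130' INT
trap 'cleanup; exit 143' TERM

# A real script would now block in: wait "$P1" "$P2"
# For the demonstration, call cleanup directly instead of waiting 30 seconds.
cleanup
```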

Here is a function I use to run at most n processes in parallel (n=4 in the example):

max_children=4


function parallel {
local time1=$(date +"%H:%M:%S")
local time2=""


# for the sake of the example, I'm using $2 as a description, you may be interested in other description
echo "starting $2 ($time1)..."
"$@" && time2=$(date +"%H:%M:%S") && echo "finishing $2 ($time1 -- $time2)..." &


local my_pid=$$
local children=$(ps -eo ppid | grep -w $my_pid | wc -w)
children=$((children-1))
if [[ $children -ge $max_children ]]; then
wait -n
fi
}


parallel sleep 5
parallel sleep 6
parallel sleep 7
parallel sleep 8
parallel sleep 9
wait

If max_children is set to the number of cores, this function will try to avoid idle cores.
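
One possible way to pick that value automatically is sketched below. It assumes getconf accepts _NPROCESSORS_ONLN, which is widely available on Linux and macOS but not mandated by POSIX; the fallback of 4 is an arbitrary default for this example.

```shell
#!/bin/sh
# Ask the system for the number of online CPUs; fall back to 4 if that fails.
max_children=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || max_children=4

# Guard against empty or non-numeric output such as "undefined".
case "$max_children" in
    ''|*[!0-9]*) max_children=4 ;;
esac

echo "max_children=$max_children"
```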

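xargs -P <n> allows you to run <n> commands in parallel.

While -P is a nonstandard option, both the GNU (Linux) and macOS/BSD implementations support it.

The following example:

  • runs at most 3 commands in parallel at a time,
  • with additional commands starting only when a previously launched process terminates.
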
time xargs -P 3 -I {} sh -c 'eval "$1"' - {} <<'EOF'
sleep 1; echo 1
sleep 2; echo 2
sleep 3; echo 3
echo 4
EOF

The output looks something like this:

1   # output from 1st command
4   # output from *last* command, which started as soon as the count dropped below 3
2   # output from 2nd command
3   # output from 3rd command


real    0m3.012s
user    0m0.011s
sys 0m0.008s

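The timing shows that the commands were run in parallel (the last command was launched only after the first of the original 3 terminated, but it executed very quickly).

The xargs command itself won't return until all commands have finished, but you can execute it in the background by terminating it with the control operator & and then use the wait builtin to wait for the entire xargs command to finish.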

{
xargs -P 3 -I {} sh -c 'eval "$1"' - {} <<'EOF'
sleep 1; echo 1
sleep 2; echo 2
sleep 3; echo 3
echo 4
EOF
} &


# Script execution continues here while `xargs` is running
# in the background.
echo "Waiting for commands to finish..."


# Wait for `xargs` to finish, via special variable $!, which contains
# the PID of the most recently started background process.
wait $!

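Note:

  • BSD/macOS xargs requires you to specify the number of commands to run in parallel explicitly, whereas GNU xargs allows you to specify -P 0 to run as many as possible in parallel.

  • Output from the processes run in parallel arrives as it is being generated, so it will be unpredictably interleaved.

    • GNU parallel, as mentioned in Ole's answer (it does not come standard with most platforms), conveniently serializes (groups) the output on a per-process basis and offers many more advanced features.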
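
If GNU parallel is not available, one workaround for the interleaving is to give every job its own output file and merge the files afterwards. A minimal sketch, assuming a temporary directory and three made-up job names (1, 2, 3):

```shell
#!/bin/sh
outdir=$(mktemp -d)    # scratch directory, one output file per job

# -n 1 appends one input item per invocation, so inside sh -c
# $1 is the output directory and $2 is the item.
printf '%s\n' 3 1 2 |
    xargs -P 3 -n 1 sh -c 'echo "job $2 done" > "$1/$2.out"' - "$outdir"

# xargs has returned, so every job is finished; merge in a fixed order.
merged=$(cat "$outdir/1.out" "$outdir/2.out" "$outdir/3.out")
echo "$merged"
```

The jobs may finish in any order, but the merged output is always in the order chosen by the final cat.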

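Process Spawning Manager

Sure, technically these are processes, and this program should really be called a process spawning manager. That is only because of how Bash works when it forks with the ampersand: it uses the fork() or perhaps clone() system call, which clones into a separate memory space, rather than something like pthread_create(), which would share memory. If Bash supported the latter, each "sequence of execution" would operate just the same and could be called a traditional thread, while gaining a more efficient memory footprint. Functionally it works the same, though it is a bit more difficult because global variables are not shared with each worker clone; hence the inter-process communication file and the rudimentary flock semaphore to manage critical sections.

Forking from Bash is of course the basic answer here, but I feel people know that and are really looking to manage what is spawned rather than just fork and forget. This demonstrates a way to manage up to 200 instances of forked processes all accessing a single resource. Clearly this is overkill, but I enjoyed writing it, so I kept going. Increase the size of your terminal accordingly. I hope you find this useful.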

ME=$(basename $0)
IPC="/tmp/$ME.ipc"      #interprocess communication file (global thread accounting stats)
DBG=/tmp/$ME.log
echo 0 > $IPC           #initialize counter
F1=thread
SPAWNED=0
COMPLETE=0
SPAWN=1000              #number of jobs to process
SPEEDFACTOR=1           #dynamically compensates for execution time
THREADLIMIT=50          #maximum concurrent threads
TPS=1                   #threads per second delay
THREADCOUNT=0           #number of running threads
SCALE="scale=5"         #controls bc's precision
START=$(date +%s)       #whence we began
MAXTHREADDUR=6         #maximum thread life span - demo mode


LOWER=$[$THREADLIMIT*100*90/10000]   #90% worker utilization threshold
UPPER=$[$THREADLIMIT*100*95/10000]   #95% worker utilization threshold
DELTA=10                             #initial percent speed change


threadspeed()        #dynamically adjust spawn rate based on worker utilization
{
#vaguely assumes thread execution average will be consistent
THREADCOUNT=$(threadcount)
if [ $THREADCOUNT -ge $LOWER ] && [ $THREADCOUNT -le $UPPER ] ;then
echo SPEED HOLD >> $DBG
return
elif [ $THREADCOUNT -lt $LOWER ] ;then
#if maxthread is free speed up
SPEEDFACTOR=$(echo "$SCALE;$SPEEDFACTOR*(1-($DELTA/100))"|bc)
echo SPEED UP $DELTA%>> $DBG
elif [ $THREADCOUNT -gt $UPPER ];then
#if maxthread is active then slow down
SPEEDFACTOR=$(echo "$SCALE;$SPEEDFACTOR*(1+($DELTA/100))"|bc)
DELTA=1                            #begin fine grain control
echo SLOW DOWN $DELTA%>> $DBG
fi


echo SPEEDFACTOR $SPEEDFACTOR >> $DBG


#average thread duration   (total elapsed time / number of threads completed)
#if threads completed is zero (less than 100), default to maxdelay/2  maxthreads


COMPLETE=$(cat $IPC)


if [ -z $COMPLETE ];then
echo BAD IPC READ ============================================== >> $DBG
return
fi


#echo Threads COMPLETE $COMPLETE >> $DBG
if [ $COMPLETE -lt 100 ];then
AVGTHREAD=$(echo "$SCALE;$MAXTHREADDUR/2"|bc)
else
ELAPSED=$[$(date +%s)-$START]
#echo Elapsed Time $ELAPSED >> $DBG
AVGTHREAD=$(echo "$SCALE;$ELAPSED/$COMPLETE*$THREADLIMIT"|bc)
fi
echo AVGTHREAD Duration is $AVGTHREAD >> $DBG


#calculate timing to achieve spawning each workers fast enough
# to utilize threadlimit - average time it takes to complete one thread / max number of threads
TPS=$(echo "$SCALE;($AVGTHREAD/$THREADLIMIT)*$SPEEDFACTOR"|bc)
#TPS=$(echo "$SCALE;$AVGTHREAD/$THREADLIMIT"|bc)  # maintains pretty good
#echo TPS $TPS >> $DBG


}
function plot()
{
echo -en \\033[${2}\;${1}H


if [ -n "$3" ];then
if [[ $4 = "good" ]];then
echo -en "\\033[1;32m"
elif [[ $4 = "warn" ]];then
echo -en "\\033[1;33m"
elif [[ $4 = "fail" ]];then
echo -en "\\033[1;31m"
elif [[ $4 = "crit" ]];then
echo -en "\\033[1;31;4m"
fi
fi
echo -n "$3"
echo -en "\\033[0;39m"
}


trackthread()   #displays thread status
{
WORKERID=$1
THREADID=$2
ACTION=$3    #setactive | setfree | update
AGE=$4


TS=$(date +%s)


COL=$[(($WORKERID-1)/50)*40]
ROW=$[(($WORKERID-1)%50)+1]


case $ACTION in
"setactive" )
touch /tmp/$ME.$F1$WORKERID  #redundant - see main loop
#echo created file $ME.$F1$WORKERID >> $DBG
plot $COL $ROW "Worker$WORKERID: ACTIVE-TID:$THREADID INIT    " good
;;
"update" )
plot $COL $ROW "Worker$WORKERID: ACTIVE-TID:$THREADID AGE:$AGE" warn
;;
"setfree" )
plot $COL $ROW "Worker$WORKERID: FREE                         " fail
rm /tmp/$ME.$F1$WORKERID
;;
* )


;;
esac
}


getfreeworkerid()
{
for i in $(seq 1 $[$THREADLIMIT+1])
do
if [ ! -e /tmp/$ME.$F1$i ];then
#echo "getfreeworkerid returned $i" >> $DBG
break
fi
done
if [ $i -eq $[$THREADLIMIT+1] ];then
#echo "no free threads" >> $DBG
echo 0
#exit
else
echo $i
fi
}


updateIPC()
{
COMPLETE=$(cat $IPC)        #read IPC
COMPLETE=$[$COMPLETE+1]     #increment IPC
echo $COMPLETE > $IPC       #write back to IPC
}




worker()
{
WORKERID=$1
THREADID=$2
#echo "new worker WORKERID:$WORKERID THREADID:$THREADID" >> $DBG


#accessing common terminal requires critical blocking section
(flock -x -w 10 201
trackthread $WORKERID $THREADID setactive
)201>/tmp/$ME.lock


let "RND = $RANDOM % $MAXTHREADDUR +1"


for s in $(seq 1 $RND)               #simulate random lifespan
do
sleep 1;
(flock -x -w 10 201
trackthread $WORKERID $THREADID update $s
)201>/tmp/$ME.lock
done


(flock -x -w 10 201
trackthread $WORKERID $THREADID setfree
)201>/tmp/$ME.lock


(flock -x -w 10 201
updateIPC
)201>/tmp/$ME.lock
}


threadcount()
{
TC=$(ls /tmp/$ME.$F1* 2> /dev/null | wc -l)
#echo threadcount is $TC >> $DBG
THREADCOUNT=$TC
echo $TC
}


status()
{
#summary status line
COMPLETE=$(cat $IPC)
plot 1 $[$THREADLIMIT+2] "WORKERS $(threadcount)/$THREADLIMIT  SPAWNED $SPAWNED/$SPAWN  COMPLETE $COMPLETE/$SPAWN SF=$SPEEDFACTOR TIMING=$TPS"
echo -en '\033[K'                   #clear to end of line
}


function main()
{
while [ $SPAWNED -lt $SPAWN ]
do
while [ $(threadcount) -lt $THREADLIMIT ] && [ $SPAWNED -lt $SPAWN ]
do
WID=$(getfreeworkerid)
worker $WID $SPAWNED &
touch /tmp/$ME.$F1$WID    #if this loops faster than file creation in the worker thread it steps on itself, thread tracking is best in main loop
SPAWNED=$[$SPAWNED+1]
(flock -x -w 10 201
status
)201>/tmp/$ME.lock
sleep $TPS
if ((! $[$SPAWNED%100]));then
#rethink thread timing every 100 threads
threadspeed
fi
done
sleep $TPS
done


while [ "$(threadcount)" -gt 0 ]
do
(flock -x -w 10 201
status
)201>/tmp/$ME.lock
sleep 1;
done


status
}


clear
threadspeed
main
wait
status
echo
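
The critical-section pattern used in the script above (a shared counter file updated under flock) can be isolated into a much smaller sketch. It assumes the flock utility from util-linux is installed; the temporary files and the increment function are names made up for this example.

```shell
#!/bin/sh
counter=$(mktemp)    # shared counter file (plays the role of $IPC)
lock=$(mktemp)       # lock file (plays the role of /tmp/$ME.lock)
echo 0 > "$counter"

increment() {
    (
        flock -x 9                      # exclusive lock on file descriptor 9
        n=$(cat "$counter")
        echo $((n + 1)) > "$counter"    # read-modify-write inside the lock
    ) 9> "$lock"
}

# Ten concurrent workers, each incrementing the counter once.
for i in 1 2 3 4 5 6 7 8 9 10; do
    increment &
done
wait

total=$(cat "$counter")
echo "counter is $total"
```

Without the lock, two workers could read the same value and one increment would be lost.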

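With bashj (https://sourceforge.net/projects/bashj/), you can run not only multiple processes (the way others have suggested) but also multiple threads in a single JVM controlled from your script. Of course, this requires a Java JDK. Threads consume fewer resources than processes.

Here is working code: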

#!/usr/bin/bashj


#!java


public static int cnt=0;


private static void loop() {u.p("java says cnt= "+(cnt++));u.sleep(1.0);}


public static void startThread()
{(new Thread(() ->  {while (true) {loop();}})).start();}


#!bashj


j.startThread()


while [ j.cnt -lt 4 ]
do
echo "bash views cnt=" j.cnt
sleep 0.5
done

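If you want to be able to easily run and kill multiple processes with Ctrl-C, this is my favorite method: spawn multiple background processes in a (…) subshell, and trap SIGINT to execute kill 0, which kills everything spawned in the subshell group:

(trap 'kill 0' SIGINT; prog1 & prog2 & prog3)

You can have complex process execution structures, and everything will close with a single Ctrl-C (just make sure the last process runs in the foreground, i.e. don't put a & after prog1.3):

(trap 'kill 0' SIGINT; prog1.1 && prog1.2 & (prog2.1 | prog2.2 || prog2.3) & prog1.3)

If there is a chance that the last command might exit early and you want to keep everything else running, add wait as the last command. In the following example, without the trailing wait, sleep 2 would exit first and sleep 4 would be killed before it finished; adding wait lets both run to completion: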

(trap 'kill 0' SIGINT; sleep 4 & sleep 2 & wait)

Your script should look like this:

prog1 &
prog2 &
.
.
progn &
wait
progn+1 &
progn+2 &
.
.

This assumes your system can handle n jobs at a time. Use wait so that only n jobs run at once.
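
The batching idea can be written as a loop instead of being spelled out by hand. A minimal sketch, assuming a limit of n=2 and a made-up task function that only records that it ran:

```shell
#!/bin/sh
n=2                      # hypothetical limit on simultaneous jobs
outdir=$(mktemp -d)      # scratch directory for the example output
running=0

# Hypothetical task: record that it ran.
task() { echo "task $1 done" > "$outdir/$1.out"; }

for i in 1 2 3 4 5; do
    task "$i" &
    running=$((running + 1))
    if [ "$running" -ge "$n" ]; then
        wait             # let the whole batch finish before starting more
        running=0
    fi
done
wait                     # wait for the final, possibly partial batch

finished=$(ls "$outdir" | wc -l | tr -d ' ')
echo "$finished tasks finished"
```

Batching like this leaves slots idle while the slowest job of each batch is still running; the wait -n function and xargs -P shown earlier keep the slots filled instead.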

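If you:

  • are on a Mac and use iTerm
  • want to start various processes that stay open long-term, until you press Ctrl+C
  • want to be able to easily see the output from each process
  • want to be able to easily stop a specific process with Ctrl+C

then one option, if your use case is more about app monitoring/management, is to script the terminal itself.

For example, I recently did the following. Granted, it is very Mac-specific, iTerm-specific, and relies on a deprecated AppleScript API (iTerm has a newer Python option). It doesn't win any elegance awards, but it gets the job done.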

#!/bin/sh
root_path="~/root-path"
auth_api_script="$root_path/auth-path/auth-script.sh"
admin_api_proj="$root_path/admin-path/admin.csproj"
agent_proj="$root_path/agent-path/agent.csproj"
dashboard_path="$root_path/dashboard-web"


osascript <<THEEND
tell application "iTerm"
set newWindow to (create window with default profile)


tell current session of newWindow
set name to "Auth API"
write text "pushd $root_path && $auth_api_script"
end tell


tell newWindow
set newTab to (create tab with default profile)
tell current session of newTab
set name to "Admin API"
write text "dotnet run --debug -p $admin_api_proj"
end tell
end tell


tell newWindow
set newTab to (create tab with default profile)
tell current session of newTab
set name to "Agent"
write text "dotnet run --debug -p $agent_proj"
end tell
end tell


tell newWindow
set newTab to (create tab with default profile)
tell current session of newTab
set name to "Dashboard"
write text "pushd $dashboard_path; ng serve -o"
end tell
end tell


end tell
THEEND

[Screenshot: iTerm 2 showing the resulting multiple tabs]

This works beautifully for me (found here):

sh -c 'command1 & command2 & command3 & wait'

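It prints the logs of all the commands intermingled (which is what I wanted), and Ctrl+C kills all of them.

If you have a GUI terminal, you can spawn a new tabbed terminal instance for each process you want to run in parallel.

The benefit is that each program runs in its own tab, where it can be interacted with and managed independently of the other running programs.

For example, on Ubuntu 20.04: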

gnome-terminal --tab -- bash -c 'prog1'
gnome-terminal --tab -- bash -c 'prog2'

To run certain programs or other commands sequentially, you can add ;

gnome-terminal --tab -- bash -c 'prog1_1; prog1_2'
gnome-terminal --tab -- bash -c 'prog2'

I have found that for some programs, the terminal closes before they start up. For these programs I append ; wait or ; sleep 1 to the terminal command:

gnome-terminal --tab -- bash -c 'prog1; wait'

For Mac OS, you will have to find an equivalent command for the terminal you are using. I haven't tested this on Mac OS, since I don't own a Mac.

Since for some reason I can't use wait, I came up with this solution:

# create a hashmap of the tasks name -> its command
declare -A tasks=(
["Sleep 3 seconds"]="sleep 3"
["Check network"]="ping imdb.com"
["List dir"]="ls -la"
)


# execute each task in the background, redirecting their output to a custom file descriptor
fd=10
for task in "${!tasks[@]}"; do
script="${tasks[${task}]}"
eval "exec $fd< <(${script} 2>&1 || (echo $task failed with exit code \${?}! && touch tasks_failed))"
((fd+=1))
done


# print the outputs of the tasks and wait for them to finish
fd=10
for task in "${!tasks[@]}"; do
cat <&$fd
((fd+=1))
done


# determine the exit status
#   by checking whether the file "tasks_failed" has been created
if [ -e tasks_failed ]; then
echo "Task(s) failed!"
exit 1
else
echo "All tasks finished without an error!"
exit 0
fi

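There are a lot of interesting answers here, but I took inspiration from this answer and put together a simple script that runs multiple processes in parallel and handles the results once they are done. You can find it in this gist, or below: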

#!/usr/bin/env bash


# inspired by https://stackoverflow.com/a/29535256/2860309


pids=""
failures=0


function my_process() {
seconds_to_sleep=$1
exit_code=$2
sleep "$seconds_to_sleep"
return "$exit_code"
}


(my_process 1 0) &
pid=$!
pids+=" ${pid}"
echo "${pid}: 1 second to success"


(my_process 1 1) &
pid=$!
pids+=" ${pid}"
echo "${pid}: 1 second to failure"


(my_process 2 0) &
pid=$!
pids+=" ${pid}"
echo "${pid}: 2 seconds to success"


(my_process 2 1) &
pid=$!
pids+=" ${pid}"
echo "${pid}: 2 seconds to failure"


echo "..."


for pid in $pids; do
if wait "$pid"; then
echo "Process $pid succeeded"
else
echo "Process $pid failed"
failures=$((failures+1))
fi
done


echo
echo "${failures} failures detected"

This results in:

86400: 1 second to success
86401: 1 second to failure
86402: 2 seconds to success
86404: 2 seconds to failure
...
Process 86400 succeeded
Process 86401 failed
Process 86402 succeeded
Process 86404 failed


2 failures detected