Get the exit code of a background process

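I have a command CMD, called from my main Bourne shell script, that takes a long time to run.

I want to modify the script as follows:

  1. Run the command CMD in parallel as a background process (CMD &).
  2. In the main script, have a loop that monitors the spawned command every few seconds. The loop also echoes some messages to stdout indicating the progress of the script.
  3. Exit the loop when the spawned command terminates.
  4. Capture and report the exit code of the spawned process.

Can someone give me pointers on how to accomplish this?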

#!/bin/bash

# pgm to monitor
tail -f /var/log/messages >> /tmp/log &
# background cmd pid
pid=$!
# loop to monitor running background cmd
while :
do
    # ps -p matches the exact pid; grepping ps output for $pid can also match longer pids
    ps -p "$pid" > /dev/null
    ret=$?
    if test "$ret" != "0"
    then
        echo "Monitored pid ended"
        break
    fi
    sleep 5
done

wait "$pid"
echo $?

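1: In bash, $! holds the PID of the last background process that was started. That tells you which process to monitor.

4: wait <n> waits until the process with PID <n> is complete (it blocks until the process finishes, so you might not want to call it until you are sure the process is done), and then returns the exit code of the completed process.

2, 3: ps or ps | grep " $! " can tell you whether the process is still running. It is up to you how to interpret the output and decide how close the process is to finishing. (ps | grep isn't idiot-proof. If you have time, you can come up with a more robust way to tell whether the process is still running.)

Here is a skeleton script: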

# simulate a long process that will have an identifiable exit code
(sleep 15 ; /bin/false) &
my_pid=$!

while   ps | grep " $my_pid "     # might also need  | grep -v grep  here
do
    echo "$my_pid is still in the ps output. Must still be running."
    sleep 3
done

echo "Oh, it looks like the process is done."
wait $my_pid
# The variable $? always holds the exit code of the last command to finish.
# Here it holds the exit code of $my_pid, since wait exits with that code.
my_status=$?
echo "The exit status of the process was $my_status"
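Since the answer above admits that ps | grep is not idiot-proof, here is a minimal sketch of one sturdier check, assuming bash: kill -0 sends no signal and only tests whether the PID still exists. The short sleep and the exit status 3 are stand-ins for the real CMD, not anything from the question. PID reuse on a busy system could still fool it in principle, so treat it as a sketch rather than a guarantee.

```shell
#!/bin/bash
# Stand-in for the long-running CMD (assumption): runs briefly, exits with status 3.
(sleep 2; exit 3) &
my_pid=$!

# kill -0 delivers no signal; it succeeds only while the PID still exists.
while kill -0 "$my_pid" 2>/dev/null; do
    echo "$my_pid is still running"
    sleep 1
done

# The child is gone, so wait returns immediately with its saved exit status.
wait "$my_pid"
my_status=$?
echo "The exit status of the process was $my_status"
```

The loop body is where the progress messages from step 2 of the question would go.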

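I would change your approach slightly. Rather than checking every few seconds whether the command is still alive and reporting a message, have another process that reports every few seconds that the command is still running, and then kill that process when the command finishes. For example: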

#!/bin/sh

cmd() { sleep 5; exit 24; }

cmd &   # Run the long running process
pid=$!  # Record the pid

# Spawn a process that continually reports that the command is still running
while echo "$(date): $pid is still running"; do sleep 1; done &
echoer=$!

# Set a trap to kill the reporter when the process finishes
trap 'kill $echoer' 0

# Wait for the process to finish
if wait $pid; then
    echo "cmd succeeded"
else
    echo "cmd FAILED!! (returned $?)"
fi

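A simple example, similar to the solutions above. This one doesn't require monitoring any process output. The next example uses tail to follow the output.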

$ echo '#!/bin/bash' > tmp.sh
$ echo 'sleep 30; exit 5' >> tmp.sh
$ chmod +x tmp.sh
$ ./tmp.sh &
[1] 7454
$ pid=$!
$ wait $pid
[1]+  Exit 5                  ./tmp.sh
$ echo $?
5

Use tail to follow the process output and quit when the process is complete.

$ echo '#!/bin/bash' > tmp.sh
$ echo 'i=0; while let "$i < 10"; do sleep 5; echo "$i"; let i=$i+1; done; exit 5;' >> tmp.sh
$ chmod +x tmp.sh
$ ./tmp.sh
0
1
2
^C
$ ./tmp.sh > /tmp/tmp.log 2>&1 &
[1] 7673
$ pid=$!
$ tail -f --pid $pid /tmp/tmp.log
0
1
2
3
4
5
6
7
8
9
[1]+  Exit 5                  ./tmp.sh > /tmp/tmp.log 2>&1
$ wait $pid
$ echo $?
5

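This may go beyond your question, but if you are concerned about how long your processes run, you may be interested in checking the status of running background processes after an interval of time. It is easy enough to check which child PIDs are still running using pgrep -P $$, but I came up with the following solution to check the exit status of the PIDs that have already finished: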

cmd1() { sleep 5; exit 24; }
cmd2() { sleep 10; exit 0; }

pids=()
cmd1 & pids+=("$!")
cmd2 & pids+=("$!")

lasttimeout=0
for timeout in 2 7 11; do
    echo -n "interval-$timeout: "
    sleep $((timeout-lasttimeout))

    # you can only wait on a pid once
    remainingpids=()
    for pid in "${pids[@]}"; do
        if ! ps -p "$pid" >/dev/null ; then
            wait "$pid"
            echo -n "pid-$pid:exited($?); "
        else
            echo -n "pid-$pid:running; "
            remainingpids+=("$pid")
        fi
    done
    pids=( "${remainingpids[@]}" )

    lasttimeout=$timeout
    echo
done

which outputs:

interval-2: pid-28083:running; pid-28084:running;
interval-7: pid-28083:exited(24); pid-28084:running;
interval-11: pid-28084:exited(0);

Note: you could change $pids to a string variable rather than an array to simplify things, if you like.
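As a rough illustration of that note, here is a sketch of the same idea with a plain space-separated string, assuming bash. The durations and exit codes of cmd1 and cmd2 are shortened stand-ins, and kill -0 is swapped in for ps -p as the liveness check; neither choice comes from the answer itself.

```shell
#!/bin/bash
# Hypothetical stand-ins for the real commands.
cmd1() { sleep 1; exit 24; }
cmd2() { sleep 3; exit 0; }

pids=""
cmd1 & pids="$pids $!"
cmd2 & pids="$pids $!"

report=""
while [ -n "$pids" ]; do
    sleep 2
    remaining=""
    for pid in $pids; do              # unquoted on purpose: split on whitespace
        if kill -0 "$pid" 2>/dev/null; then
            remaining="$remaining $pid"
        else
            wait "$pid"               # collect the status of the finished child
            report="$report exited($?)"
        fi
    done
    pids=$remaining
    echo "still running:${pids:- none}"
done
echo "report:$report"
```

The string version avoids array syntax, at the cost of relying on word splitting, which is safe here only because PIDs never contain whitespace.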

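Another solution is to monitor processes via the proc filesystem (safer than the ps/grep combo). When you start a process it has a corresponding folder in /proc/$pid, so the solution could be: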

#!/bin/bash
....
doSomething &
pid=$!
while [ -d "/proc/$pid" ]; do # While the directory exists, the process is running
    doSomethingElse
    ....
done
# Once the directory is removed from /proc, the process has ended
wait "$pid"
exit_status=$?
....

Now you can use the $exit_status variable however you like.

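As I see it, almost all the answers use external utilities (mostly ps) to poll the state of the background process. There is a more Unix-ish solution: catching the SIGCHLD signal. In the signal handler you have to check which child process has stopped. This can be done with the kill -0 <PID> built-in (universal), by checking for the existence of the /proc/<PID> directory (Linux specific), or by using the jobs built-in (bash specific). jobs -l also reports the PID; in that case the third field of the output can be Stopped | Running | Done | Exit.

Here is my example.

The launched process is called loop.sh. It accepts -x or a number as an argument. With -x it exits with exit code 1. With a number it waits num * 5 seconds. Every 5 seconds it prints its own PID.

The launcher process is called launch.sh: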

#!/bin/bash

handle_chld() {
    local tmp=()
    for ((i=0; i<${#pids[@]}; ++i)); do
        if [ ! -d "/proc/${pids[i]}" ]; then
            wait "${pids[i]}"
            echo "Stopped ${pids[i]}; exit code: $?"
        else
            tmp+=("${pids[i]}")
        fi
    done
    pids=("${tmp[@]}")
}

set -o monitor
trap "handle_chld" CHLD

# Start background processes
./loop.sh 3 &
pids+=($!)
./loop.sh 2 &
pids+=($!)
./loop.sh -x &
pids+=($!)

# Wait until all background processes are stopped
while [ ${#pids[@]} -gt 0 ]; do echo "WAITING FOR: ${pids[@]}"; sleep 2; done
echo STOPPED

For a more detailed explanation, see: Starting a process from bash script failed

This is how I solved it when I had a similar need:

# Some function that takes a long time to process
longprocess() {
    # Sleep up to 14 seconds
    sleep $((RANDOM % 15))
    # Randomly exit with 0 or 1
    exit $((RANDOM % 2))
}

pids=""
# Run five concurrent processes
for i in {1..5}; do
    ( longprocess ) &
    # store PID of process
    pids+=" $!"
done

# Wait for all processes to finish, will take max 14s
# as it waits in order of launch, not order of finishing
for p in $pids; do
    if wait $p; then
        echo "Process $p success"
    else
        echo "Process $p fail"
    fi
done

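With this method, your script doesn't have to wait for the background process; you only have to monitor a temporary file for the exit status.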

FUNCmyCmd() { sleep 3;return 6; };


export retFile=$(mktemp);
FUNCexecAndWait() { FUNCmyCmd;echo $? >$retFile; };
FUNCexecAndWait&

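Now your script can do anything else, while you just keep checking the contents of retFile (it can also contain any other information you want, such as the exit time).

PS: by the way, I wrote this with bash in mind.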
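The answer above shows the launching half only. The monitoring half might look roughly like this, assuming bash; the loop, the 2-second stand-in command, and the write-then-rename step are my additions for illustration, not part of the original answer.

```shell
#!/bin/bash
FUNCmyCmd() { sleep 2; return 6; }      # stand-in for the real command

retFile=$(mktemp)
# Write the status to a scratch file, then rename it into place, so the
# monitor never reads a half-written result.
FUNCexecAndWait() { FUNCmyCmd; echo $? >"$retFile.tmp"; mv "$retFile.tmp" "$retFile"; }
FUNCexecAndWait &

# mktemp creates an empty file; -s becomes true once it has content.
while [ ! -s "$retFile" ]; do
    echo "still working..."
    sleep 1
done
ret=$(cat "$retFile")
rm -f "$retFile"
echo "command returned $ret"
```

Unlike the wait-based answers, this works even if the reader is not the parent of the background process, since the status travels through the file.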

The PID of a background child process is stored in $!. You can store all child processes' PIDs in an array, e.g. PIDS[].

wait [-n] [jobspec or pid …]

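Wait until the child process specified by each process ID pid or job specification jobspec exits and return the exit status of the last command waited for. If a job spec is given, all processes in the job are waited for. If no arguments are given, all currently active child processes are waited for, and the return status is zero. If the -n option is supplied, wait waits for any job to terminate and returns its exit status. If neither jobspec nor pid specifies an active child process of the shell, the return status is 127.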
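To make the -n option in that excerpt concrete, here is a small sketch, assuming a bash recent enough to support wait -n (4.3 or later, as far as I know). The three jobs and their exit codes are made up for the demonstration. Note that wait -n reports a status but does not say which PID it belonged to.

```shell
#!/bin/bash
# Three hypothetical jobs that finish at different times with different statuses.
( sleep 3; exit 3 ) &
( sleep 1; exit 1 ) &
( sleep 2; exit 0 ) &

statuses=()
for _ in 1 2 3; do
    # wait -n returns as soon as any one job terminates.
    wait -n
    statuses+=("$?")
done
echo "exit statuses in completion order: ${statuses[*]}"
```

This collects results in the order the jobs finish rather than the order they were launched, which is the main difference from waiting on each PID in turn.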

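Using the wait command you can wait for all child processes to finish, and meanwhile get the exit status of each child process via $? and store it in STATUS[]. Then you can do something depending on the status.

I have tried the following two solutions and they run well. solution01 is more concise, while solution02 is a little more complicated.

solution01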

#!/bin/bash

# start 3 child processes concurrently, and store each pid into array PIDS[].
process=(a.sh b.sh c.sh)
for app in ${process[@]}; do
    ./${app} &
    PIDS+=($!)
done

# wait for all processes to finish, and store each process's exit code into array STATUS[].
for pid in ${PIDS[@]}; do
    echo "pid=${pid}"
    wait ${pid}
    STATUS+=($?)
done

# after all processes finish, check their exit codes in STATUS[].
i=0
for st in ${STATUS[@]}; do
    if [[ ${st} -ne 0 ]]; then
        echo "$i failed"
    else
        echo "$i finish"
    fi
    ((i+=1))
done

solution02

#!/bin/bash

# start 3 child processes concurrently, and store each pid into array PIDS[].
i=0
process=(a.sh b.sh c.sh)
for app in ${process[@]}; do
    ./${app} &
    pid=$!
    PIDS[$i]=${pid}
    ((i+=1))
done

# wait for all processes to finish, and store each process's exit code into array STATUS[].
i=0
for pid in ${PIDS[@]}; do
    echo "pid=${pid}"
    wait ${pid}
    STATUS[$i]=$?
    ((i+=1))
done

# after all processes finish, check their exit codes in STATUS[].
i=0
for st in ${STATUS[@]}; do
    if [[ ${st} -ne 0 ]]; then
        echo "$i failed"
    else
        echo "$i finish"
    fi
    ((i+=1))
done

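Our team had the same need with a remote SSH-executed script that was timing out after 25 minutes of inactivity. Here is a solution in which the monitoring loop checks the background process every second but prints only every 10 minutes, to keep the inactivity timeout from firing.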

long_running.sh &
pid=$!


# Wait on a background job completion. Query status every 10 minutes.
declare -i elapsed=0
# `ps -p ${pid}` works on macOS and CentOS. On both OSes `ps ${pid}` works as well.
while ps -p ${pid} >/dev/null; do
    sleep 1
    if ((++elapsed % 600 == 0)); then
        echo "Waiting for the completion of the main script. $((elapsed / 60))m and counting ..."
    fi
done


# Return the exit code of the terminated background process. This works in Bash 4.4 despite what Bash docs say:
# "If neither jobspec nor pid specifies an active child process of the shell, the return status is 127."
wait ${pid}
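The comment above about wait still working after the process has terminated can be checked with a tiny experiment. This is only a sketch, assuming bash; the exit code 7 and the one-second pause are arbitrary choices of mine.

```shell
#!/bin/bash
( exit 7 ) &            # hypothetical short-lived background job
pid=$!

sleep 1                 # by now the job has exited
if kill -0 "$pid" 2>/dev/null; then state=running; else state=gone; fi

wait "$pid"             # the shell still hands back the saved exit status
status=$?
echo "process is $state, wait returned $status"
```

In other words, the shell appears to remember the status of a child it started until it is asked for it, so polling first and calling wait afterwards does not lose the exit code.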

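My solution was to use an anonymous pipe to pass the status to a monitoring loop. No temporary files are used to exchange status, so there is nothing to clean up. If you are unsure about the number of background jobs, the break condition could be [ -z "$(jobs -p)" ].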

#!/bin/bash


exec 3<> <(:)


{ sleep 15 ; echo "sleep/exit $?" >&3 ; } &


while read -u 3 -t 1 -r STAT CODE || STAT="timeout" ; do
    echo "stat: ${STAT}; code: ${CODE}"
    if [ "${STAT}" = "sleep/exit" ] ; then
        break
    fi
done

How about ...

# run your stuff
unset PID
for process in one two three four
do
    # redirect stderr before the &, otherwise 2>&1 is a separate empty command
    ( sleep $((RANDOM%20)); echo hello from process $process; exit $((RANDOM%3)); ) 2>&1 &
    PID+=($!)
done


# (optional) report on the status of that stuff as it exits
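# caveat: wait only collects children of the shell that runs it, so inside
# this subshell bash may report 127 instead of the job's real status
for pid in "${PID[@]}"
do
    ( wait "$pid"; echo "process $pid completed with exit status $?") &
done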


# (optional) while we wait, monitor that stuff
while ps --pid "${PID[*]}" --ppid "${PID[*]}" --format pid,ppid,command,pcpu
do
sleep 5
done | xargs -i date '+%x %X {}'


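# return non-zero if any are non zero
SUCCESS=0
for pid in "${PID[@]}"
do
    # in the else branch, $? is the status returned by wait
    if wait "$pid"; then
        ((SUCCESS+=1))
        echo "$pid OK"
    else
        echo "$pid returned $?"
    fi
done

# ${#PID[@]} is the number of array elements (${#PID} would be the length of the first pid)
echo "success for $SUCCESS out of ${#PID[@]} jobs"
exit $(( ${#PID[@]} - SUCCESS ))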