
I have a bash script similar to:

NUM_PROCS=$1
NUM_ITERS=$2

for ((i=0; i<$NUM_ITERS; i++)); do
    python foo.py $i arg2 &
done

What's the most straightforward way to limit the number of parallel processes to NUM_PROCS? I'm looking for a solution that doesn't require packages/installations/modules (like GNU Parallel) if possible.

When I tried Charles Duffy's latest approach, I got the following output from bash -x:

+ python run.py args 1
+ python run.py ... 3
+ python run.py ... 4
+ python run.py ... 2
+ read -r line
+ python run.py ... 1
+ read -r line
+ python run.py ... 4
+ read -r line
+ python run.py ... 2
+ read -r line
+ python run.py ... 3
+ read -r line
+ python run.py ... 0
+ read -r line

...and so on, with other numbers between 0 and 5, until more processes had been started than the system could handle and the bash script was shut down.

  • Take a look at: GNU Parallel Commented Aug 4, 2016 at 18:04
  • See: Parallelize Bash Script with maximum number of processes or Bash: limit the number of concurrent jobs? Commented Aug 4, 2016 at 18:13
  • ...unfortunately, the accepted answer there (err, as-edited, on the first proposed duplicate) is pretty awful. Commented Aug 4, 2016 at 18:14
  • (btw, seq isn't a standardized command -- not part of bash, and not part of POSIX, so there's no reason to believe it'll be present or behave a particular way on any given operating system. And re: case for shell variables, keeping in mind that they share a namespace with environment variables, see fourth paragraph of pubs.opengroup.org/onlinepubs/009695399/basedefs/… for POSIX conventions). Commented Aug 4, 2016 at 18:42
  • wait -n was introduced in bash 4.3. Commented Aug 4, 2016 at 19:57
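Since wait -n only exists from bash 4.3 on, a script that relies on it can refuse to run on an older shell up front instead of misbehaving halfway through. A minimal sketch using bash's BASH_VERSINFO array; the function name version_at_least is hypothetical, not a builtin:

```shell
#!/bin/bash
# Hypothetical helper: succeed if the running bash is at least MAJOR.MINOR.
# BASH_VERSINFO[0] holds the major version, BASH_VERSINFO[1] the minor version.
version_at_least() {
  local want_major=$1 want_minor=$2
  (( BASH_VERSINFO[0] > want_major || (BASH_VERSINFO[0] == want_major && BASH_VERSINFO[1] >= want_minor) ))
}

# wait -n needs bash 4.3 or newer; bail out early on older shells.
if ! version_at_least 4 3; then
  echo "this script needs bash >= 4.3 for 'wait -n'" >&2
  exit 1
fi
```

On such a shell, the script could instead fall back to a polling loop rather than exiting.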

6 Answers


Bash 4.4 adds a new kind of parameter expansion, ${parameter@P}, which expands a value as if it were a prompt string. Combined with the \j prompt escape, it simplifies Charles Duffy's answer.

#!/bin/bash

num_procs=$1
num_iters=$2
num_jobs="\j"  # The prompt escape for number of jobs currently running
for ((i=0; i<num_iters; i++)); do
  while (( ${num_jobs@P} >= num_procs )); do
    wait -n
  done
  python foo.py "$i" arg2 &
done
wait # wait for the remaining jobs to finish
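To see what ${num_jobs@P} does: \j in a prompt string stands for the number of jobs currently managed by the shell, and @P (bash 4.4+) applies prompt expansion to a variable's value, so the loop reads the job count without calling jobs or wc. A minimal sketch that wraps the answer's loop in functions; count_jobs and wait_for_slot are hypothetical names. count_jobs stores the count in a global variable instead of printing it, so callers can use it directly in the current shell rather than through a command substitution, which would run in a subshell.

```shell
#!/bin/bash
# Requires bash 4.4+ for the ${parameter@P} (prompt expansion) operator.
jobs_prompt='\j'   # \j = number of jobs currently managed by the shell

# Hypothetical helper: store the current job count in the global job_count.
count_jobs() {
  job_count=${jobs_prompt@P}
}

# Hypothetical helper: block until fewer than $1 jobs are running,
# reaping one finished job per wait -n (the same loop as in the answer).
wait_for_slot() {
  local max=$1
  count_jobs
  while (( job_count >= max )); do
    wait -n
    count_jobs
  done
}

# Usage, mirroring the answer:
#   wait_for_slot "$num_procs"; python foo.py "$i" arg2 &
```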


The GNU, macOS/OS X, FreeBSD and NetBSD versions of xargs all support -P, so this works without any particular bash version or package install. Here are 4 processes at a time:

printf "%s\0" {1..10} | xargs -0 -I @ -P 4 python foo.py @ arg2
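Brace expansion happens before variable expansion, so {1..$NUM_ITERS} does not work with the question's variables. A sketch that generates the 0-based indices with the same C-style loop as the question and feeds them to xargs. run_with_xargs is a hypothetical wrapper name, and the echo makes it a dry run that prints the commands (drop the echo to actually start python):

```shell
#!/bin/bash
# Hypothetical wrapper: run_with_xargs NUM_PROCS NUM_ITERS
# Emits 0 .. NUM_ITERS-1, NUL-terminated, and lets xargs keep up to
# NUM_PROCS commands running at once. "echo" makes this a dry run.
run_with_xargs() {
  local num_procs=$1 num_iters=$2 i
  for ((i = 0; i < num_iters; i++)); do
    printf '%s\0' "$i"
  done | xargs -0 -I @ -P "$num_procs" echo python foo.py @ arg2
}

run_with_xargs 4 10
```

Because the commands run in parallel, the printed lines may come out in any order.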



As a very simple implementation, assuming a version of bash new enough to have wait -n (4.3 or later), which waits only until the next job exits rather than for all jobs:

#!/bin/bash
#      ^^^^ - NOT /bin/sh!

num_procs=$1
num_iters=$2

declare -A pids=( )

for ((i=0; i<num_iters; i++)); do
  while (( ${#pids[@]} >= num_procs )); do
    wait -n
    for pid in "${!pids[@]}"; do
      kill -0 "$pid" &>/dev/null || unset "pids[$pid]"
    done
  done
  python foo.py "$i" arg2 & pids["$!"]=1
done
wait # wait for the remaining jobs to finish

If running on a shell without wait -n, one can (very inefficiently) replace it with a command such as sleep 0.2, to poll every 1/5th of a second.
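A sketch of that polling variant, assuming a sleep that accepts fractional seconds (GNU and BSD sleep do). It keeps the PIDs in an ordinary indexed array, so it also works on bash 3.x, which lacks declare -A. Between checks it sleeps 0.2 seconds and drops any PID that kill -0 no longer finds. throttle_poll is a hypothetical name; the command to run is passed as arguments, and the iteration index is appended to it.

```shell
#!/bin/bash
# Hypothetical helper: throttle_poll MAX N CMD [ARGS...]
# Runs CMD ARGS... i for i = 0 .. N-1, at most MAX at a time,
# polling with sleep instead of wait -n.
throttle_poll() {
  local num_procs=$1 num_iters=$2
  shift 2
  local -a pids=()
  local i pid
  for ((i = 0; i < num_iters; i++)); do
    # Poll until a slot frees up.
    while (( ${#pids[@]} >= num_procs )); do
      sleep 0.2
      local -a still_running=()
      for pid in "${pids[@]}"; do
        # kill -0 sends no signal; it only checks that the PID exists.
        kill -0 "$pid" 2>/dev/null && still_running+=("$pid")
      done
      pids=("${still_running[@]}")
    done
    "$@" "$i" &
    pids+=("$!")
  done
  wait   # wait for the last batch of workers
}

# Usage for the question:
#   run_foo() { python foo.py "$1" arg2; }
#   throttle_poll "$NUM_PROCS" "$NUM_ITERS" run_foo
```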


Since you're actually reading input from a file, another approach is to start N subprocesses, each of which processes only the lines where (linenum % N == threadnum):

num_procs=$1
infile=$2
for ((i=0; i<num_procs; i++)); do
  (
    while IFS= read -r line; do  # IFS= keeps leading/trailing whitespace
      echo "Thread $i: processing $line"
    done < <(awk -v num_procs="$num_procs" -v i="$i" \
                 'NR % num_procs == i { print }' <"$infile")
  ) &
done
wait # wait for all the $num_procs subprocesses to finish
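The awk filter keeps only the lines whose line number NR satisfies NR % num_procs == i. The same split can be sketched in pure bash with a line counter, if you'd rather not call awk. shard_worker and run_sharded are hypothetical names, and the printf stands in for the real per-line work:

```shell
#!/bin/bash
# Hypothetical worker: read the whole file, but handle only the lines whose
# 1-based line number satisfies  lineno % num_procs == i  (like awk's NR).
shard_worker() {
  local num_procs=$1 i=$2 infile=$3 line lineno=0
  # "|| [[ -n $line ]]" also handles a last line without a trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    lineno=$((lineno + 1))
    (( lineno % num_procs == i )) || continue
    printf 'worker %d: %s\n' "$i" "$line"   # replace with real per-line work
  done < "$infile"
}

# Hypothetical driver: start one worker per shard, then wait for all of them.
run_sharded() {
  local num_procs=$1 infile=$2 i
  for ((i = 0; i < num_procs; i++)); do
    shard_worker "$num_procs" "$i" "$infile" &
  done
  wait
}
```

Each worker still reads every line, so this trades some redundant reading for not needing an external filter. Only the per-line work runs in parallel.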

24 Comments

I tried both your earlier solution and this one. The first solution didn't parallelize at all (ran one process); this one ran all num_iters at once and then crashed the system.
What's the meaning of wait -n?
@tomas, wait -n waits for only a single process to exit, as opposed to waiting for all background processes.
Ahh. The advantage of read -a to read into an array, and then ${#array[@]} to test that array's length, is that unlike wc or tr, it's built into the shell itself -- the code in the first answer requires no external commands, whereas your pipeline needs several mkfifo/fork/exec sequences to execute. I'd have to reproduce the failure to speak to it.
@Amirmasudzarebidaki, ...that said, I replaced the first answer with an implementation that doesn't depend on process substitutions having access to the parent's job table -- some shell version without that property being the most obvious reason for the first implementation to fail.

Here is a relatively simple way to accomplish this with only two additional lines of code. The explanation is inline.

NUM_PROCS=$1
NUM_ITERS=$2

for ((i=0; i<NUM_ITERS; i++)); do
    python foo.py "$i" arg2 &
    (( i+1 >= NUM_PROCS )) && wait -n # i+1 workers spawned so far; once that reaches $NUM_PROCS, wait for one to exit before spawning the next
done
wait # wait for all remaining workers
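Because i starts at 0, i + 1 jobs have been started once the job for index i is launched. Waiting as soon as i + 1 reaches the limit therefore keeps at most that many running. A sketch that packages the loop as a reusable function. run_limited is a hypothetical name; the command is passed as arguments, and the index is appended to it.

```shell
#!/bin/bash
# Requires bash 4.3+ for wait -n.
# Hypothetical helper: run_limited MAX N CMD [ARGS...]
# Runs CMD ARGS... 0, CMD ARGS... 1, ..., CMD ARGS... N-1 with at most MAX at once.
run_limited() {
  local max=$1 n=$2 i
  shift 2
  for ((i = 0; i < n; i++)); do
    "$@" "$i" &
    # i + 1 jobs have been started so far; once that reaches max,
    # wait for one of them to exit before starting the next.
    if (( i + 1 >= max )); then
      wait -n
    fi
  done
  wait   # wait for the jobs that are still running
}

# Usage for the question (the index has to go before arg2, hence the wrapper):
#   run_foo() { python foo.py "$1" arg2; }
#   run_limited "$NUM_PROCS" "$NUM_ITERS" run_foo
```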

2 Comments

But is it possible to abort the command I am executing? I already searched for SIGINT approaches but didn't find anything I could apply to this approach. Did you manage it? Thanks.
@z3nth10n that's a more complex question that should be posted separately.

Are you aware that if you are allowed to write and run your own scripts, then you can also use GNU Parallel? In essence it is a Perl script in a single file.

From the README:

= Minimal installation =

If you just need parallel and do not have 'make' installed (maybe the system is old or Microsoft Windows):

wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem
mv parallel sem dir-in-your-$PATH/bin/
Then, for your script:

seq $2 | parallel -j$1 python foo.py {} arg2

parallel --embed (available since 20180322) even makes it possible to distribute GNU Parallel as part of a shell script (i.e. no extra files needed):

parallel --embed >newscript

Then edit the end of newscript.


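This isn't the simplest solution, but if your version of bash doesn't have wait -n and you don't want to use other programs like parallel or awk, here is a solution using while and for loops. It starts the jobs in batches of up to total_threads and waits for each whole batch to finish before starting the next one, so a single slow job holds up its batch.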

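num_iters=10
total_threads=4
iter=1
while (( iter <= num_iters )); do
    # number of iterations still to run, including this one
    iters_remainder=$(( num_iters - iter + 1 ))
    if (( iters_remainder < total_threads )); then
        threads=$iters_remainder
    else
        threads=$total_threads
    fi
    # start one batch of up to $total_threads background jobs
    for ((t = 1; t <= threads; t++)); do
        (
            echo "processing iteration $iter"  # do stuff
        ) &
        ((++iter))
    done
    wait  # block until the whole batch has finished
done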
