
I am a bit new to bash, and I need to run a short command several hundred times in parallel but print the output sequentially. The command prints fairly short output to stdout that I do not want to lose, or to have garbled/mixed up with the output of another thread. Is there a way in Linux to run several commands (e.g. no more than N in parallel) so that all command outputs are printed sequentially (in any order, as long as they don't overlap)?

Current bash script (full code here)

declare -a UPDATE_ERRORS
UPDATE_ERRORS=( )

function pull {
    git pull  # Assumes current dir is set
    if [[ $? -ne 0 ]]; then
      UPDATE_ERRORS+=("error message")
    fi
}

for f in extensions/*; do
  if [[ -d $f ]]; then
    ########## This code should run in parallel, but output of each thread
    ########## should be cached and printed sequentially one after another
    ########## pull function also updates a global var that will be used later
    pushd $f > /dev/null
    pull
    popd > /dev/null
  fi
done

if [[ ${#UPDATE_ERRORS[@]} -ne 0 ]]; then
  # print errors again
fi
  • Take a look at gnu.org/software/parallel
  • Thanks, looks promising, but how would I make each thread add an error message to the global array in case of a failure?
  • Point 1) Add -k to your invocation of GNU Parallel to keep the outputs in order. Point 2) Define a function, and be sure to export it, and pass the function to GNU Parallel to execute; inside the function, append the error message to your array (a minimal sketch of this follows below). gnu.org/software/parallel/…
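
A minimal sketch of that suggestion (my own illustration, not from the thread). One caveat: GNU Parallel runs each job in a child process, so appending to a global array inside the function is not visible in the parent shell; failures have to be reported through the job's output instead, which is what the answers below do.

#!/bin/bash
# Hypothetical stand-in for the question's pull function.
pull() {
    cd "$1" || return
    git pull || echo "GITERROR: pull failed in $1"   # report failure via stdout
}
export -f pull    # child shells spawned by parallel need the definition

# -k keeps each job's output contiguous and in input order.
printf '%s\n' extensions/*/ | parallel -k pull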

3 Answers


You can use flock for this. I have emulated a similar situation to test it. The do_the_things procedure generates output that overlaps in time. In the for loop, the text generation is started several times simultaneously. The output ought to get mixed up, but it is fed to the locked_print procedure, which waits until the lock is free and then prints the received input to stdout. The exports are needed so the functions can be called from inside a pipe.

#!/bin/bash

do_the_things()
        {
        # Start after a random delay so the jobs' outputs would interleave
        rand="$((RANDOM % 10))"
        sleep "$rand"
        for i in $(seq 1 10); do sleep 1; echo "${rand}-$i"; done
        }

locked_print()
        {
        echo Started
        # Take an exclusive lock on the file "testlock"; cat then copies
        # the job's buffered pipe contents to stdout, so only one job
        # prints at a time.
        flock -e testlock cat
        }

export -f do_the_things
export -f locked_print

for f in a b c d; do
        (do_the_things | locked_print) &
done
wait

2 Comments

@user313294 If there are lots of things to do, say 500, it will just slam them all onto the CPUs at once, won't it? Sometimes that actually slows things down when tasks compete for CPU/network/disk bandwidth. +1 for a nice, tidy solution.
@MarkSetchell I think 499 of them will fill their pipe buffers and be blocked on I/O until the lock is freed, while one pipes its output out.
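
One way to address that concern (a sketch of my own, assuming the do_the_things and locked_print functions above are defined and exported as in the answer): cap the number of simultaneous jobs with xargs -P instead of backgrounding everything at once.

#!/bin/bash
N=4    # run at most N jobs at a time
printf '%s\n' a b c d e f g h |
        xargs -P "$N" -n 1 bash -c 'do_the_things | locked_print' _

The flock in locked_print still serializes the printing; xargs only limits how many generators run concurrently.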

Try something like this. I don't have/use git, so I have used a dummy command to simulate it in my version.

#!/bin/bash
declare -a ERRORS
ERRORS=( )

function pull {
    cd "$1"
    echo "Starting pull in $1"
    for i in {0..9}; do echo "$1 Output line $i";done
    sleep 5
    echo "GITERROR: Dummy error in directory $1"
}

export -f pull

for f in extensions/*; do
  if [[ -d $f ]]; then
    ########## This code should run in parallel, but output of each thread
    ########## should be cached and printed sequentially one after another
    ########## pull function also updates a global var that will be used later
    echo "$f"
  fi
done | parallel -k pull | tee errors.tmp

IFS=$'\n' ERRORS=($(grep "^GITERROR:" errors.tmp))
rm errors.tmp

for i in "${ERRORS[@]}"; do
   echo "$i"
done

You will see that even though there are four directories to pull, the entire script takes only about 5 seconds, despite executing four lots of sleep 5.
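
If you also want to honour the "no more than N threads" part of the question, GNU Parallel's -j option caps the number of simultaneous jobs; as a variation of the pipeline above (the value 8 here is just an example):

done | parallel -k -j 8 pull | tee errors.tmp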

2 Comments

It would be better to use find extensions -type d | parallel -k pull | tee errors.tmp
@user3132194 I just re-used the OP's code where possible to demonstrate the technique. I think that find command will also find the extensions parent directory, which the OP's command will not. Actually, I think bash would do best with for d in extensions/*/, but I don't have bash 4 on my OSX, and as I said, I was concentrating on demonstrating the joys of GNU Parallel :-)
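
Building on that exchange (a refinement of my own, not from the thread), -mindepth 1 keeps find from listing the extensions directory itself:

find extensions -mindepth 1 -maxdepth 1 -type d | parallel -k pull | tee errors.tmp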

List the dirs by adding /. Parallel spawns a shell that cds into each dir. If git pull fails, a magic string is printed. All output is also kept as copies in out/1/*. When all pulls are done, check which files the magic string occurs in and print the STDOUT/STDERR of those commands. Then clean up.

parallel --results out 'cd {} && (git pull || echo e_R_r_O_r)' ::: extensions/*/
grep -l e_R_r_O_r out/*/stdout | parallel 'grep -v e_R_r_O_r {//}/stdout; cat {//}/stderr >&2'
rm -r out
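
As an alternative to the magic-string sentinel (an assumption on my part, not part of the answer): GNU Parallel's --joblog records each job's exit status, so failed pulls can be listed without grepping for a marker.

parallel --results out --joblog pull.log 'cd {} && git pull' ::: extensions/*/
# Field 7 of the tab-separated joblog is Exitval; print the command of
# every failed job, skipping the header line.
awk -F'\t' 'NR > 1 && $7 != 0 {print $9}' pull.log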
