
For example, let's say I want to count the number of lines in 10 BIG files and print a total.

for f in files
do
    # this creates a background process for each file
    wc -l "$f" | awk '{print $1}' &
done

I was trying something like:

for f in files
do
    # this does not work :/
    n=$( expr $(wc -l "$f" | awk '{print $1}') + $n ) &
done

echo $n
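
I suspect the problem is that every command ending in & runs in its own subshell, so the assignment to n happens in a child process and never reaches my shell. A quick check seems to confirm it:

n=0
n=$( expr 2 + 3 ) &   # the assignment runs in a background subshell
wait
echo $n               # still prints 0, not 5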

3 Answers


I finally found a working solution using anonymous pipes and bash:

#!/bin/bash

# This runs ./a.sh in a separate process and opens a new pipe, where the
# reading endpoint is fd 3 in our shell and the writing endpoint is the
# stdout of the other process. Note that you don't need the background
# operator (&): the process substitution <( ) already runs ./a.sh
# asynchronously; exec only binds its output to fd 3.
exec 3< <(./a.sh 2>&1)


# ... do other stuff


# read the contents of the pipe into a variable. If the other process
# hasn't terminated yet, cat will block until it does.
output=$(cat <&3)
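
Applied to the original problem, the same idea generalizes to one file descriptor per file. This is only a sketch, assuming bash >= 4.1 for the automatic {fd} file-descriptor allocation and a hypothetical files array holding the file names:

#!/bin/bash

files=(file1 file2 file3)            # hypothetical list of files

# start one background wc per file; $fd is the reading end of its pipe
for f in "${files[@]}"
do
    exec {fd}< <(wc -l < "$f")
    fds+=("$fd")
done

# collect the results; each read blocks until that wc has finished
n=0
for fd in "${fds[@]}"
do
    read -r count <&"$fd"
    exec {fd}<&-                     # close the descriptor again
    n=$(( n + count ))
done

echo "$n"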

1 Comment

I think this will create a file named "3". I'm trying not to write to disk.

You should probably use GNU parallel:

find . -maxdepth 1 -type f | parallel --gnu 'wc -l' | awk 'BEGIN {n=0} {n += $1} END {print n}'

or else xargs in parallel mode:

find . -maxdepth 1 -type f | xargs -n1 -P4 wc -l | awk 'BEGIN {n=0} {n += $1} END {print n}'

Another option, if this doesn't fit your needs, is to write to temp files. If you don't want to write to disk, just write to /dev/shm, which is a RAM-backed tmpfs filesystem on most Linux systems.

#!/bin/bash

declare -a temp_files

count=0
for f in *
do
  if [[ -f "$f" ]]; then
    temp_files[$count]="$(mktemp "/dev/shm/${f}-XXXXXX")"
    ((count++))
  fi
done

count=0
for f in *
do
  if [[ -f "$f" ]]; then
    cat "$f" | wc -l > "${temp_files[$count]}" &
    ((count++))
  fi
done

wait

cat "${temp_files[@]}" | awk 'BEGIN {n=0} {n += $1} END {print n}'

for tf in "${temp_files[@]}"
do
  rm "$tf"
done

By the way, this can be thought of as a map-reduce, with wc doing the mapping and awk doing the reduction.

3 Comments

find . -maxdepth 1 -type f | xargs -n1 -P4 wc -l | awk 'BEGIN {n=0} {n += $1} END {print n}' works perfectly. But I went with /dev/shm since I like having a dynamic number of background processes.
You may want to be a bit careful: if you have 1000 files you will end up with 1000 processes, which might really bog down your machine. It takes some extra work to limit the number of processes (see the sketch below for one way to do it).
I will; for now I have 10 files of around 20-30 GB each.
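
One sketch of that extra work, reusing the temp_files array from the script above and assuming bash >= 4.3 for wait -n, throttles the loop to at most 4 concurrent wc jobs:

max_jobs=4

count=0
for f in *
do
  if [[ -f "$f" ]]; then
    # if max_jobs background jobs are already running, wait for one to finish
    while (( $(jobs -rp | wc -l) >= max_jobs )); do
      wait -n
    done
    wc -l < "$f" > "${temp_files[$count]}" &
    ((count++))
  fi
done

wait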

You could write that to a file or, better, read from a fifo as soon as data arrives.

Here is a small example of how they work:

# create the fifo
mkfifo test

# listen to it
while true; do if read -r line < test; then echo "$line"; fi; done

# in another shell, write a line into the fifo
echo 'hi there' > test

# notice 'hi there' being printed in the first shell

So you could

for f in files
do
    # this creates a background process for each file
    wc -l "$f" | awk '{print $1}' > fifo &
done

and listen on the fifo for the line counts.
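
The reading side could then sum the counts as they arrive. Here is a sketch, assuming Linux (where opening a fifo read-write does not block) and a hypothetical files array:

#!/bin/bash

mkfifo fifo

files=(file1 file2 file3)   # hypothetical list of files

# Open the fifo read-write in this shell: it keeps a write end open, so the
# reads below never see a premature EOF, and the writers never block on open.
exec 3<> fifo

for f in "${files[@]}"
do
    wc -l < "$f" > fifo &   # each background wc writes one count
done

# read exactly one count per file from fd 3 and sum them
n=0
for (( i = 0; i < ${#files[@]}; i++ ))
do
    read -r count <&3
    n=$(( n + count ))
done

exec 3<&-                   # close our end of the fifo
echo "$n"
rm fifo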

