
For example, let's say I want to count the number of lines in 10 BIG files and print a total.

for f in files
do
    # this creates a background process for each file
    wc -l "$f" | awk '{print $1}' &
done

I was trying something like:

for f in files
do
    # this does not work :/
    n=$( expr $(wc -l "$f" | awk '{print $1}') + $n ) &
done

echo $n
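
I suspect the problem is that every command ending in & runs in its own subshell, so the assignment to n happens in a child process and never reaches my shell. A quick check seems to confirm it:

n=0
n=$( expr 2 + 3 ) &   # the assignment runs in a background subshell
wait
echo $n               # still prints 0, not 5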

3 Answers


I finally found a working solution using anonymous pipes and bash:

#!/bin/bash

# This runs ./a.sh in a separate process and opens a new pipe, where the
# reading endpoint is fd 3 in our shell and the writing endpoint is the
# stdout of the other process. Note that you don't need the background
# operator (&): the process substitution <( ) already runs ./a.sh
# asynchronously; exec only binds its output to fd 3.
exec 3< <(./a.sh 2>&1)


# ... do other stuff


# read the contents of the pipe into a variable. If the other process
# hasn't terminated yet, cat will block until it does.
output=$(cat <&3)
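
Applied to the original problem, the same idea generalizes to one file descriptor per file. This is only a sketch, assuming bash >= 4.1 for the automatic {fd} file-descriptor allocation and a hypothetical files array holding the file names:

#!/bin/bash

files=(file1 file2 file3)            # hypothetical list of files

# start one background wc per file; $fd is the reading end of its pipe
for f in "${files[@]}"
do
    exec {fd}< <(wc -l < "$f")
    fds+=("$fd")
done

# collect the results; each read blocks until that wc has finished
n=0
for fd in "${fds[@]}"
do
    read -r count <&"$fd"
    exec {fd}<&-                     # close the descriptor again
    n=$(( n + count ))
done

echo "$n"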

1 Comment

I think this will create a file named "3". I'm trying not to write to disk.

You should probably use GNU parallel:

find . -maxdepth 1 -type f | parallel --gnu 'wc -l' | awk 'BEGIN {n=0} {n += $1} END {print n}'

or else xargs in parallel mode:

find . -maxdepth 1 -type f | xargs -n1 -P4 wc -l | awk 'BEGIN {n=0} {n += $1} END {print n}'

Another option, if this doesn't fit your needs, is to write to temp files. If you don't want to write to disk, just write to /dev/shm, which is a RAM-backed tmpfs filesystem on most Linux systems.

#!/bin/bash

declare -a temp_files

count=0
for f in *
do
  if [[ -f "$f" ]]; then
    temp_files[$count]="$(mktemp "/dev/shm/${f}-XXXXXX")"
    ((count++))
  fi
done

count=0
for f in *
do
  if [[ -f "$f" ]]; then
    cat "$f" | wc -l > "${temp_files[$count]}" &
    ((count++))
  fi
done

wait

cat "${temp_files[@]}" | awk 'BEGIN {n=0} {n += $1} END {print n}'

for tf in "${temp_files[@]}"
do
  rm "$tf"
done

By the way, this can be thought of as a map-reduce, with wc doing the mapping and awk doing the reduction.

3 Comments

find . -maxdepth 1 -type f | xargs -n1 -P4 wc -l | awk 'BEGIN {n=0} {n += $1} END {print n}' works perfectly. But I went with /dev/shm since I like having a dynamic number of background processes.
You may want to be a bit careful: if you have 1000 files you will end up with 1000 processes, which might really bog down your machine. It takes some extra work to limit the number of processes (see the sketch below for one way to do it).
I will; for now I have 10 files of around 20-30 GB each.
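
One sketch of that extra work, reusing the temp_files array from the script above and assuming bash >= 4.3 for wait -n, throttles the loop to at most 4 concurrent wc jobs:

max_jobs=4

count=0
for f in *
do
  if [[ -f "$f" ]]; then
    # if max_jobs background jobs are already running, wait for one to finish
    while (( $(jobs -rp | wc -l) >= max_jobs )); do
      wait -n
    done
    wc -l < "$f" > "${temp_files[$count]}" &
    ((count++))
  fi
done

wait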

You could write that to a file or, better, read from a fifo as soon as data arrives.

Here is a small example of how they work:

# create the fifo
mkfifo test

# listen to it
while true; do if read -r line < test; then echo "$line"; fi; done

# in another shell, write a line into the fifo
echo 'hi there' > test

# notice 'hi there' being printed in the first shell

So you could

for f in files
do
    # this creates a background process for each file
    wc -l "$f" | awk '{print $1}' > fifo &
done

and listen on the fifo for the line counts.
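
The reading side could then sum the counts as they arrive. Here is a sketch, assuming Linux (where opening a fifo read-write does not block) and a hypothetical files array:

#!/bin/bash

mkfifo fifo

files=(file1 file2 file3)   # hypothetical list of files

# Open the fifo read-write in this shell: it keeps a write end open, so the
# reads below never see a premature EOF, and the writers never block on open.
exec 3<> fifo

for f in "${files[@]}"
do
    wc -l < "$f" > fifo &   # each background wc writes one count
done

# read exactly one count per file from fd 3 and sum them
n=0
for (( i = 0; i < ${#files[@]}; i++ ))
do
    read -r count <&3
    n=$(( n + count ))
done

exec 3<&-                   # close our end of the fifo
echo "$n"
rm fifo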

