I have a large number (nTotal) of files, each containing one column of L floating-point numbers. For each line i, I want to add up the i-th entry of every file and, at the end, compute the average and standard deviation across files. I first read each file into an array. Then I try to add that array to a running-sum array, which gives me a syntax error: (standard_in) 2: syntax error. I expect sum[i] to contain the sum of the i-th entries of all the files, and from that I compute the average. Edit: I changed the for loops as suggested.

for (( n= 1 ; n < $nTotal; n++ ))
do

   IFS=$'\n'
   arr1=($(./a.out filename | sed 's/:.*//'))

   for (( i= 1 ; i < $L; i++ ))       
   do
       sum[i]=`echo "${sum[i]} - ${arr1[i]}" | bc`
   done
done

for (( i= 1 ; i < $L; i++ ))  
do
   ya=$(echo -1*${sum[i]} | bc)
   aveSum=$(echo $ya/$nTotal | bc -l)
done

Edit: ./a.out produces files with one column of float numbers.

To find standard deviation though, I again read data files and store them in arrays (I'm sure this is not the smartest way of doing it but I couldn't think of anything else.). I also could not find the standard deviation using:

for (( i= 1 ; i < $L; i++ ))  
 do
    ya=$(echo -1*${sum[i]} | bc)
    ta=$(echo $ya/$nTotal | bc -l)

    tempval=`echo "${arr1[i]} - $ta * ${arr1[i]} - $ta" | bc`
    val[i]=`echo "${val[i]} - $tempval" | bc`
 done

Here I get zero for every val[i] element, and I can't figure out what is wrong. I would really appreciate any guidance on this problem.

14
  • 3
    for ((i=1; i<=L; i++)); do ...; done is the Right Way to write what you might mean by for i in {1..L}. Commented Oct 28, 2014 at 15:12
  • 2
    ...by the way, that's entry 33 in mywiki.wooledge.org/BashPitfalls Commented Oct 28, 2014 at 15:13
  • 1
    I'd also suggest using set -x to look at what commands your script actually invokes when run (maybe with PS4=':$LINENO+' to show which line it's on at any given time), finding the first place it behaves unexpectedly, and asking a question focused on that behavior specifically (if it isn't obvious). Commented Oct 28, 2014 at 15:14
  • 2
    Brace expansions do not create an arithmetic environment, so even if parameter expansions could be used in braces, you would still have to write {1..$L}, not {1..L}. Commented Oct 28, 2014 at 15:30
  • 2
    In case you haven't figured this out by now, you have a number of serious problems with this script, sufficient in number and severity to make it virtually impossible for any one answer to actually be of help to you, short of simply rewriting the script for you. Perhaps you should retract the question. Go over some of the suggestions you've been given (particularly the set -x suggestion from @CharlesDuffy), fix the errors as best you can, and come back with a better script and a more focused question. Commented Oct 28, 2014 at 15:34

1 Answer

1

Bash might not be exactly the easiest for this problem, particularly since it doesn't implement non-integer arithmetic. I'd use awk:

awk '{ n[FNR]++;
       delta = $1 - mean[FNR];
       mean[FNR] += delta / n[FNR];
       m2[FNR] += delta * ($1 - mean[FNR]);
     }
     END {for (i=1; i in n; ++i)
            print mean[i], sqrt(m2[i]/(n[i]-1));
     }' file1 file2 ...

The math is taken directly from the well-known "online" mean and variance algorithms (Welford's method). The program assumes that all files have exactly L lines, but if a few have more or fewer, the missing data will just be ignored; you might want to do a better validity test. In the particular case that only one file has too many lines, the standard deviation computation for the extra lines divides by n - 1 = 0 and awk stops with a division-by-zero error; in one reading, that doesn't matter since the correct data will already have been printed, but you might want to fix that, too.
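As a rough sketch of the validity test and divide-by-zero fix mentioned above, the same computation can be wrapped in a shell function. The function name line_stats, the warning text, and the choice to print "nan" when a line number has fewer than two values are all assumptions made for illustration; they are not part of the original program.

```shell
# line_stats is a hypothetical wrapper name; pass it the data files as arguments.
line_stats() {
    awk '
    {
        n[FNR]++                              # values seen so far for this line number
        delta = $1 - mean[FNR]
        mean[FNR] += delta / n[FNR]           # running mean
        m2[FNR] += delta * ($1 - mean[FNR])   # running sum of squared deviations
    }
    END {
        for (i = 1; i in n; ++i) {
            # Assumption: line 1 is present in every file, so n[1] is the file count.
            if (n[i] != n[1])
                printf "warning: line %d has %d values, expected %d\n", i, n[i], n[1] > "/dev/stderr"
            # The sample standard deviation needs at least two values.
            sd = (n[i] > 1) ? sqrt(m2[i] / (n[i] - 1)) : "nan"
            print mean[i], sd
        }
    }' "$@"
}
```

It would be called as line_stats file1 file2 ... and prints one "mean stddev" pair per line number, with any warnings going to standard error so they do not mix with the results.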

The program makes use of a couple of awk features: first, arrays are automatically (and lazily) initialized to 0 (if used as numbers); second, FNR is the line number in the current file. (NR is the line number in the input as a whole, but in this case FNR is more useful.)
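To see both features in isolation, here is a small sketch using two throwaway files; the file names a.txt and b.txt and the variable fnr_demo are made up for the example. It accumulates a per-line sum without ever initializing the array and prints NR next to FNR.

```shell
# Hypothetical sample files in a temporary directory, for illustration only.
dir=$(mktemp -d)
printf '1.5\n2.5\n' > "$dir/a.txt"
printf '3.5\n4.5\n' > "$dir/b.txt"

# NR counts lines across all input; FNR restarts at 1 for each file.
# sum[FNR] is never set explicitly: awk treats it as 0 on first numeric use.
fnr_demo=$(awk '{ sum[FNR] += $1; print NR, FNR, sum[FNR] }' "$dir/a.txt" "$dir/b.txt")
printf '%s\n' "$fnr_demo"

rm -r "$dir"
```

The third column ends at 5 and 7 for lines 1 and 2, which are the per-line sums across both files, while the first column keeps counting up to 4 and the second goes 1, 2, 1, 2.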


1 Comment

Thanks, this is very nice. I ended up using C code; however, I was still curious about how to do this in shell. Thanks :)
