1

I have below code and I want to get daily_balue, monthly_value, priorday_value, priorweek_value, priormonth_value from log file. As per the code the daily_value, monthly value, priorday value, prior week and priormonth value are stored in first index of array.

How to get daily,monthly value priority prior week and prior month values from first index of array ?

sql file as shown below:

first input file columns order:

cash , monthly_value, null,null, 0.000
cash, daily_value,2022-02-22,2022-02-22,3
cash, priormonth_value,2022-02-22,2022-02,8
cash , priorday_value, null,null, 10
cash, priorweek_value,2022-02-22,2022-02-22,32
loans, monthly_value, 2022-02-22,2022-02-22, 21.000
loans, daily_value,2022-02-22,2022-02-22,4
loans,priormonth_value,2022-02-22,2022-02,9
loans , priorday_value, null,null, 12
loans,priorweek_value,2022-02-22,2022-02-22,23

Output for first input should be stored in the below format in a csv file format : $date, $row, $daily_balue, $monthly_value, $priorday_value, $priorweek_value, priormonth_value

2022-02-22, cash, 3, 0.00, 10, 32, 8
2022-02-22, loans, 4, 21.00, 12, 23, 9

In some case the priormonth_values might be missing, so need to add zero in the output file second input file columns order:

cash , monthly_value, null,null, 0.000
cash, daily_value,2022-02-22,2022-02-22,3
cash , priorday_value, null,null, 10
cash, priorweek_value,2022-02-22,2022-02-22,32
loans, monthly_value, 2022-02-22,2022-02-22, 21.000
loans, daily_value,2022-02-22,2022-02-22,4
loans , priorday_value, null,null, 12
loans,priorweek_value,2022-02-22,2022-02-22,23

Output for second input should be stored in the below format in a csv file format : $date, $row, $daily_balue, $monthly_vale, $priorday_value, $priorweek_value, priormonth_value

2022-02-22, cash, 3, 0.00, 10, 32, 0
2022-02-22, loans, 4, 21.00, 12, 23, 0

third input file columns order . here daily_value and monthly_value are missing from the file

cash, priormonth_value,2022-02-22,2022-02,8
cash , priorday_value, null,null, 10
cash, priorweek_value,2022-02-22,2022-02-22,32
loans,priormonth_value,2022-02-22,2022-02,9
loans , priorday_value, null,null, 12
loans, priorweek_value,2022-02-22, 2022-02-02,11

Output for third input should be stored in the below format in a csv file. daily_value and monthly_value are assigned 0 format : $date, $row, $daily_balue, $monthly_vale, $priorday_value, $priorweek_value, priormonth_value

2022-02-22, cash, 0, 0, 10, 32, 8
2022-02-22, loans, 0, 0, 12, 11, 9
declare -A arr


if [ -s $sqlfile ]
then 
  while IFS=, read key value; do
          arr[$key] = "${arr[$key]}$arr[$key]:+,}$value"
  done < $sqlfile

       i=0
       for key in "${!arr[@]}"; do
       row=`echo $key | xargs`
       values = ${arr[$key]}

       daily_value = `echo $values | cut -d ',' -f8 | xargs`
       priomonth_value = `echo $value | cut -d ',' -f12 | xargs`

       i = $((i+1))
       done
else
       echo "$sqlfile is empty"
fi

exit 0

6
  • You have kinda sorta a CSV file. Do you really have variable spacing around the comma delimiters or not? Commented Feb 24, 2022 at 17:26
  • Sorry... no .csv file available to load. Commented Feb 24, 2022 at 17:39
  • 1
    @EdMorton, I have provided expected output files and their corresponding input files. Commented Feb 25, 2022 at 6:15
  • @ted Lyngmo ,The lines don't always start with cash. Commented Feb 25, 2022 at 6:26
  • Please read the description of tags you apply! Neither "linux" nor "shell" are good choices. Also, which shell are you using even? Commented Feb 25, 2022 at 6:52

2 Answers 2

1

You could use (GNU's version of ) to do this. The sporadic spaces around , in the input could be dealt with by using *, * as the field separator (FS).

In some cases, there is null where the date should be, so I add the value in the date array with the value in the null array. Since only one of them is set, the sum will be the correct value.

This also solves the case if neither the date nor the null array contains a value. blank + blank becomes 0.

#!/usr/bin/gawk -f

BEGIN {
  FS=" *, *";
  OFS=", ";
}
{
  # [date][row][value_type] = value
  arr[$3][$1][$2] = $5;
}
END {
  cl[0] = "cash";
  cl[1] = "loans";
  for(date in arr) {
    if(date == "null") continue;
    for(i = 0; i < 2; ++i) {
      row = cl[i];
      print date, row,
        arr[date][row]["daily_value"]     +arr["null"][row]["daily_value"],
        arr[date][row]["monthly_value"]   +arr["null"][row]["monthly_value"],
        arr[date][row]["priorday_value"]  +arr["null"][row]["priorday_value"],
        arr[date][row]["priorweek_value"] +arr["null"][row]["priorweek_value"],
        arr[date][row]["priormonth_value"]+arr["null"][row]["priormonth_value"];
    }
  }
}
Output

It's not exactly what you've listed as your expected output. When you expect 21.00, it'll be just 21 here. If .00 is important it can be fixed by using printf("%.2f", the_value); instead of just printing the result of the addition above.

File 1:

2022-02-22, cash, 3, 0, 10, 32, 8
2022-02-22, loans, 4, 21, 12, 23, 9

File 2:

2022-02-22, cash, 3, 0, 10, 32, 0
2022-02-22, loans, 4, 21, 12, 23, 0

File 3:

2022-02-22, cash, 0, 0, 10, 32, 8
2022-02-22, loans, 0, 0, 12, 11, 9


A version with a scale of 2:

#!/usr/bin/gawk -f

BEGIN {
  FS=" *, *";
}
{
  # [date][row][value_type] = value
  arr[$3][$1][$2] = $5;
}
END {
  cl[0] = "cash";
  cl[1] = "loans";
  vt[0] = "daily_value";
  vt[1] = "monthly_value";
  vt[2] = "priorday_value";
  vt[3] = "priorweek_value";
  vt[4] = "priormonth_value";
  for(date in arr) {
    if(date == "null") continue;
    for(i = 0; i < 2; ++i) {
      row = cl[i];
      printf("%s, %s", date, row);
      for(j = 0; j < 5; ++j) {
        vtype=vt[j];
        printf(", %.2f", arr[date][row][vtype] + arr["null"][row][vtype]);
      }
      printf("\n");
    }
  }
}
Output

File 1:

2022-02-22, cash, 3.00, 0.00, 10.00, 32.00, 8.00
2022-02-22, loans, 4.00, 21.00, 12.00, 23.00, 9.00

File 2:

2022-02-22, cash, 3.00, 0.00, 10.00, 32.00, 0.00
2022-02-22, loans, 4.00, 21.00, 12.00, 23.00, 0.00

File 3:

2022-02-22, cash, 0.00, 0.00, 10.00, 32.00, 8.00
2022-02-22, loans, 0.00, 0.00, 12.00, 11.00, 9.00
Sign up to request clarification or add additional context in comments.

2 Comments

I have edited the files and expected output. Thanks
@blackhole I've updated the output part of my answer too
0

Given:

$ head file{1..3}
==> file1 <==
cash , monthy_value, null,null, 0.000
cash,daily_value,2022-02-22,2022-02-22,3
cash,priormonth_value,20shelll22-02-22,2022-02,8

==> file2 <==
cash,priormonth_value,2022-02-22,2022-02,8
cash , monthy_value, null,null, 0.000
cash,daily_value,2022-02-22,2022-02-22,3

==> file3 <==
cash , monthy_value, null,null, 0.000
cash,daily_value,2022-02-22,2022-02-22,3

I personally would use a combination of awk and Bash like so:

for fn in file{1..3}; do 
    declare -A arr
    
    while read k v; do 
        arr["$k"]="$v"
    done <<< $(awk -F, -v keys='daily_value priormonth_value' '
    BEGIN{split(keys,tmp,"[ \t]+"); for (e in tmp) tgt[tmp[e]]}
    $2 in tgt {
        gsub(/\s*,\s*/,",")
        printf("%s %s\n",$2,$(NF))
    }' "$fn")
    echo "$fn"
    declare -p arr
    unset arr 
    echo 
done 

Prints:

file1
declare -A arr=([daily_value]="3" [priormonth_value]="8" )

file2
declare -A arr=([daily_value]="3" [priormonth_value]="8" )

file3
declare -A arr=([daily_value]="3" )

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.