How to check if string is present in bash array using awk

Question

I've got a file that looks like this:

a    12345
b    3456
c    45678

and I've got a bash array of strings:

mylist=("a" "b")

What I want to do is sum the numbers in the second column, but only for rows whose first-column value is present in mylist (here, "a" or "b").

My not-working code:

cat myfile.txt | awk -F'\t' '{BEGIN{sum=0} {if ($1 in ${mylist[@]}) sum+=$2} END{print sum}}'

The expected result is 12345+3456=15801. I understand that the problem is in the if statement, but I can't figure out how to rearrange this code to make it work.

awk can't see bash variables; they're two different interpreters in two different processes. It's not clear how you'd expect this to work -- and you don't need awk for the job you're doing anyhow; native bash can do it just fine.
– Charles Duffy, Commented Feb 3, 2023 at 14:09
Or if you want something faster than native bash when operating on very large input files, the standard UNIX toolkit has join, perfectly well-suited to extracting only the lines you care about.
– Charles Duffy, Commented Feb 3, 2023 at 14:11
Thanks, Shawn. Yes, it was my typo; I didn't use them in the original code. I've edited it.
– kirill tarasov, Commented Feb 3, 2023 at 14:14
(And if you want to quickly check if a bash array contains a string, you should make it an associative array with that string as the key instead of the value; that way it's an O(1) lookup instead of an O(n) one.)
– Charles Duffy, Commented Feb 3, 2023 at 14:14
Do you actually think an approach involving unnecessary pre-sorting is a good solution to big-data joining? Ha.
– RARE Kpop Manifesto, Commented Feb 3, 2023 at 15:50
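
To illustrate the first comment: awk cannot read a bash variable by name, so the list has to be handed over explicitly. Below is a minimal sketch of one way to do that, assuming no array element contains whitespace. It joins the array into one string, passes it with -v, and splits it back inside awk. The names list, items, wanted, input, and total are illustrative only, and the sample data is piped in with printf rather than read from myfile.txt.

```bash
#!/usr/bin/env bash

mylist=(a b)

# Sample input matching the question's file (tab-separated columns).
input=$'a\t12345\nb\t3456\nc\t45678\n'

# "${mylist[*]}" joins the elements with the first character of IFS
# (a space by default), so awk receives the single string "a b".
total=$(printf '%s' "$input" | awk -F'\t' -v list="${mylist[*]}" '
    BEGIN {
        # Split the string back into items and use each one as an array key.
        n = split(list, items, " ")
        for (i = 1; i <= n; i++) wanted[items[i]]
    }
    $1 in wanted { sum += $2 }
    END { print sum + 0 }   # + 0 prints 0 instead of an empty line when nothing matches
')

echo "$total"
```

With the sample data this prints 15801. If elements may contain spaces, feeding the list to awk as a separate input, one element per line, is likely the safer route.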

Shawn · Accepted Answer · 2023-02-03 14:16:02Z

2

You can do it in pure bash by making the elements of the original array keys in an associative one:

#!/usr/bin/env bash

mylist=(a b)

# Use the elements of the array as the keys in an associative array
declare -A keys
for elem in "${mylist[@]}"; do
    keys[$elem]=1
done


declare -i sum=0
# Read the lines on standard input
# For example, ./sum.sh < input.txt
while read -r name num; do
    # If the name is a key in the associative array, add to the sum
    if [[ -v keys[$name] ]]; then
        sum+=$num
    fi
done

printf "%d\n" "$sum"

answered Feb 3, 2023 at 14:16


1 Comment

Charles Duffy Over a year ago

Maybe also show the alternative declare -A keys=([a]=1 [b]=1) defining an associative array up-front, vs starting from an indexed array and transforming?
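
The alternative mentioned in this comment might look like the sketch below, which declares the associative array directly and skips the conversion loop. It assumes bash 4.3 or newer (needed for -v on an array element). Feeding the sample data from a here-document and skipping lines with an empty name are additions for illustration, not part of the answer above.

```bash
#!/usr/bin/env bash

# Keys are the names to match; the value 1 is only a placeholder.
declare -A keys=([a]=1 [b]=1)

declare -i sum=0
while read -r name num; do
    # Skip blank lines, then test whether the name exists as a key.
    [[ -n $name ]] || continue
    if [[ -v keys[$name] ]]; then
        sum+=$num   # sum has the integer attribute, so += adds arithmetically
    fi
done <<'EOF'
a    12345
b    3456
c    45678
EOF

printf '%d\n' "$sum"
```

With this input the script prints 15801.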

M. Nejat Aydin · Accepted Answer · 2023-02-03 15:11:33Z

2

One method would be:

#!/bin/bash

mylist=(a b)

awk '
    FNR==NR { a[$1]; next }
    $1 in a { sum += $2 }
        END { print sum }
' <(printf '%s\n' "${mylist[@]}") file

Note that when initializing an array in bash, elements are separated by whitespace, not commas.
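
The FNR==NR test in the script above is true only while awk reads its first input: NR counts records across all inputs, while FNR restarts at 1 for each one. A small sketch that makes this visible by printing both counters for two throwaway inputs (the labels "first" and "later" and the variable counters are illustrative):

```bash
#!/usr/bin/env bash

# First input has 2 lines, second has 3; print NR, FNR, and which phase we are in.
counters=$(awk '{ print NR, FNR, (FNR == NR ? "first" : "later") }' \
    <(printf '%s\n' a b) \
    <(printf '%s\n' a b c))

printf '%s\n' "$counters"
```

Only the two lines from the first input are tagged "first", which is why the first rule can fill the lookup array and then skip the remaining rules with next.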

edited Feb 3, 2023 at 15:11

answered Feb 3, 2023 at 14:16


Charles Duffy · Accepted Answer · 2023-02-03 14:21:51Z

1

There's no good reason to make awk read the array in the first place. Let join quickly pick out the matching lines -- that's what it's specialized to do.

And if in real life your array and input file keys are guaranteed to be sorted, as they are in the example, you can drop the sort calls from the code below.

# Cautious code that doesn't assume input sort order
LC_ALL=C join -1 1 -2 1 -o1.2 \
  <(LC_ALL=C sort <myfile.txt) \
  <(printf '%s\n' "${mylist[@]}" | LC_ALL=C sort) \
  | awk '{ sum += $1 } END { print sum }'

...or...

# Fast code that requires both the array and the file to be pre-sorted
join -1 1 -2 1 -o1.2 myfile.txt <(printf '%s\n' "${mylist[@]}") \
  | awk '{ sum += $1 } END { print sum }'
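
To try the cautious variant without creating myfile.txt, here is a self-contained sketch that feeds deliberately unsorted sample data through the same pipeline. The variable total and the inline printf data are stand-ins for illustration; print sum + 0 is used so that an empty result prints 0.

```bash
#!/usr/bin/env bash

mylist=(b a)   # deliberately out of order

total=$(
    # Sort both inputs in the C locale so join sees them in the order it expects.
    LC_ALL=C join -1 1 -2 1 -o 1.2 \
        <(printf '%s\t%s\n' c 45678 a 12345 b 3456 | LC_ALL=C sort) \
        <(printf '%s\n' "${mylist[@]}" | LC_ALL=C sort) \
    | awk '{ sum += $1 } END { print sum + 0 }'
)

echo "$total"
```

With this data the result is 15801, the same as the other approaches.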

edited Feb 3, 2023 at 14:21

answered Feb 3, 2023 at 14:16
