Replace array iteration with regex

Question

I want to find partially matching IPv6 prefixes in two arrays. For instance, 2001:db8: from one array should match both 2001:db8:1::/48 and 2001:db8:2::/48 from the other.

I already have it working by iterating over one array inside a loop over the other:

ru_routes=( $(curl -4 ftp://ftp.ripe.net/ripe/stats/delegated-ripencc-latest | egrep -o '\|RU\|ipv6\|.+?::\|[0-9]+' | cut -d'|' -f4 | sed 's/::$/:/g') );
msk_ix_routes=( $(curl -4 http://www.msk-ix.ru/download/lg/msk_ipv6_pfx.txt.gz | gunzip | egrep -o '\b.*::/[0-9]*') );
routes=();
for item1 in ${msk_ix_routes[@]}; do
    for item2 in ${ru_routes[@]}; do
        if [[ $item1 = $item2* ]]; then
            routes+=( $item1 );
            break
        fi
    done
done
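For reference, here is the same nested loop with the array expansions quoted and the right-hand side quoted too, so that only the trailing * acts as a wildcard. The sample prefixes are hypothetical stand-ins for the two curl downloads.

```bash
#!/bin/bash
# Hypothetical sample data standing in for the two curl downloads.
ru_routes=(2001:db8: 2a02:bc8:)        # RU delegations with "::" trimmed to ":"
msk_ix_routes=(2001:db8:1::/48 2001:db8:2::/48 2001:db9::/32 2a02:bc8:fffe::/48)

routes=()
for item1 in "${msk_ix_routes[@]}"; do
    for item2 in "${ru_routes[@]}"; do
        # "$item2" is matched literally; only the trailing * is a glob,
        # so this is a plain "starts with" test.
        if [[ $item1 == "$item2"* ]]; then
            routes+=("$item1")
            break                      # one matching RU prefix is enough
        fi
    done
done

printf '%s\n' "${routes[@]}"
```

This keeps the question's logic unchanged; it is still one comparison per pair of elements, which is what makes it slow on a small router.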

But it runs rather slowly on my MIPS router (~90 s). I found a useful answer that runs much faster, but I cannot get it to work the same way as the loop above. I also don't think I need the "if" construct from that example, because it would do the same thing twice. My non-working version:

msk=" ${msk_ix_routes[*]} ";         # add framing blanks

for item in ${ru_routes[@]}; do
  routes+=( egrep -o "$item[\S]*/g" <<< $msk );
done

I guess there are problems with quoting and escaping here, but I cannot solve them. Please help :) I am open to suggestions.
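For comparison, a working variant of the per-prefix grep idea might look like the sketch below (sample arrays are hypothetical). Without $( ... ), the words inside routes+=( ... ) are never run as a grep command; inside a bracket expression, \S is not a whitespace class in POSIX regular expressions; and /g is sed syntax, not grep. Feeding grep one route per line and anchoring each prefix with ^ makes the framing blanks unnecessary.

```bash
#!/bin/bash
# Hypothetical sample data standing in for the two curl downloads.
ru_routes=(2001:db8: 2a02:bc8:)
msk_ix_routes=(2001:db8:1::/48 2001:db8:2::/48 2001:db9::/32 2a02:bc8:fffe::/48)

routes=()
for item in "${ru_routes[@]}"; do
    # The prefixes contain only hex digits and colons, so they are safe to use
    # as a basic regular expression; "^" anchors them at the start of the line.
    while IFS= read -r match; do
        routes+=("$match")
    done < <(printf '%s\n' "${msk_ix_routes[@]}" | grep -- "^$item")
done

printf '%s\n' "${routes[@]}"
```

This still starts one grep per RU prefix; the accepted answer's single grep -f pass avoids even that loop.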

By the way, my first version used comm, which runs even faster, but it only finds exact matches, so I started experimenting with loops:

routes=( $(comm -12 <(printf '%s\n' "${ru_routes[@]}" | LC_ALL=C sort) <(printf '%s\n' "${msk_ix_routes[@]}" | LC_ALL=C sort)) );
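To see why comm falls short here: comm -12 prints only lines that appear in both sorted inputs, so a shortened prefix such as 2001:db8: can never equal a full route such as 2001:db8:1::/48. A small demonstration with hypothetical data (the identical /32 entry is made up just to show what comm does keep):

```bash
#!/bin/bash
export LC_ALL=C    # sort and comm must agree on collation; C means plain byte order

# Hypothetical sample data.
ru_routes=(2001:db8: 2a02:bc8::/32)
msk_ix_routes=(2001:db8:1::/48 2a02:bc8::/32)

# -1 and -2 suppress lines unique to either input; only shared lines remain.
common=$(comm -12 <(printf '%s\n' "${ru_routes[@]}" | sort) \
                  <(printf '%s\n' "${msk_ix_routes[@]}" | sort))
echo "$common"    # prints 2a02:bc8::/32 only
```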

Unrelated to anything else: you want to quote the [@] list expansions to prevent word splitting of the array elements (probably not an issue in your case, but the right way to do things in general).
– Etan Reisner, commented Aug 31, 2014 at 23:48
What about those two non-working options is not working? What are they doing? (The second one looks like it will create an empty list, since the [[ test doesn't output anything; it only sets a return code.) You almost certainly want that test in an if block, and then to append $item to the list (like in the linked question).
– Etan Reisner, commented Aug 31, 2014 at 23:50
I agree about the second option (removed it). The first one gives me 889111 matches instead of the 4xx valid ones. $item would be an exact match, and I want all the longer matches (substrings).
– Xand, commented Aug 31, 2014 at 23:55
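Etan's point about quoting [@] expansions can be seen with a two-element array whose first element contains a space (hypothetical values):

```bash
#!/bin/bash
arr=("one item" "two")

unquoted=( ${arr[@]} )    # unquoted: each element is word-split again
quoted=( "${arr[@]}" )    # quoted: elements are preserved exactly

echo "${#unquoted[@]} ${#quoted[@]}"    # prints: 3 2
```

IPv6 prefixes contain no spaces or glob characters, which is why the unquoted form happens to work in the question.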

Dmitry Alexandrov · Accepted Answer · 2014-09-01 15:07:33Z


Bash loops are not efficient at all for this kind of work. Try this:

#!/bin/bash

# e. g.: ripencc|RU|ipv6|2001:640::|32|19991115|allocated -> ^2001:640:
awk -v FS='|' \
    '$2 == "RU" && $3 == "ipv6" { sub(/::/, ":", $4); print "^" $4 }' \
    <(curl -4 ftp://ftp.ripe.net/ripe/stats/delegated-ripencc-latest) \
|\
# grep finds e.g. '^2001:640:' in '2001:640:8000::/33'
grep --basic-regexp --file - \
    <(curl -4 http://www.msk-ix.ru/download/lg/msk_ipv6_pfx.txt.gz | gunzip)
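To try the answer's awk-plus-grep pipeline without network access, it can be wrapped in a function that reads local files. The function name match_ru_prefixes and the sample lines are hypothetical (the RIPE line follows the format quoted in the answer's first comment; the MSK-IX file is assumed to hold one prefix per line), and the sketch assumes a grep that accepts -f - to read patterns from standard input, as GNU grep does. The sample also shows why the trailing colon matters: ^2001:640: does not match 2001:6400::/32.

```bash
#!/bin/bash
# Hypothetical helper: print MSK-IX routes that fall under an RU IPv6 delegation.
#   $1 = RIPE delegated-stats file, $2 = MSK-IX prefix list
match_ru_prefixes() {
    # One anchored BRE per RU IPv6 delegation, e.g. 2001:640:: -> ^2001:640:
    awk -F '|' '$2 == "RU" && $3 == "ipv6" { sub(/::/, ":", $4); print "^" $4 }' "$1" |
        grep -f - "$2"    # a single pass checks every line against all patterns
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Made-up sample data in the two formats.
cat > "$tmp/delegated" <<'EOF'
ripencc|RU|ipv6|2001:640::|32|19991115|allocated
ripencc|DE|ipv6|2001:db8::|32|20000101|allocated
ripencc|RU|ipv4|192.0.2.0|256|20000101|allocated
EOF
cat > "$tmp/msk" <<'EOF'
2001:640:8000::/33
2001:db8:1::/48
2001:6400::/32
EOF

result=$(match_ru_prefixes "$tmp/delegated" "$tmp/msk")
echo "$result"    # prints 2001:640:8000::/33
```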

answered Sep 1, 2014 at 0:06 by Dmitry Alexandrov; edited Sep 1, 2014 at 15:07


7 Comments

Xand Over a year ago

Thanks, comrade Dmitry, your last edit nailed it, and it is way faster than the loops. Is there any way to also remove redundant records coming from msk_ipv6_pfx.txt? For example, 2a02:bc8::/32 and 2a02:bc8:fffe::/48: both routes will go to the IX, but the less specific /32 is enough (2a02:bc8::/32 is the delegated block). Thank you again.

Dmitry Alexandrov Over a year ago

@Xand You’re welcome. :-) As for removing duplicate entries in msk_ipv6_pfx.txt, it looks non-trivial enough to be a separate question here on SO. But yes, of course, it’s possible. For instance: $ tac msk_ipv6_pfx.txt | awk -F '::' -v P='^$' '!/^[# ]/ && $0 != "" && $1 !~ P { P = "^" $1; print }'. That is not an optimal way, though: it can be done without reversing the file. Do you need any explanation?

Xand Over a year ago

I do see that the number of lines is reduced, but both prefixes mentioned above remain (I would need to sleep on it anyway). But I would really appreciate hints on redirects to feed grep the MSK-IX file after curl and gunzip. Otherwise I guess I have to store it in a separate variable. Thank you.

Dmitry Alexandrov Over a year ago

@Xand As for curl, if temporary files don't suit you, see the edited answer.

Dmitry Alexandrov Over a year ago

@Xand As for removing redundancy from msk_ipv6_pfx.txt, well, I wasn’t attentive: tac is actually not enough, we have to re-sort the file: $ sort -n msk_ipv6_pfx.txt | awk -F '::' -v P='^$' '!/^[# ]/ && $0 != "" && $1 !~ P { P = "^" $1; print }'.
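The one-liner above compares only the text before "::" and ignores prefix lengths, so, for example, it would drop 2001:db8:1::/48 after keeping 2001:db8::/48, although one /48 does not cover another. A more careful sketch, assuming one addr/len prefix per line as the first field, expands each address to 32 hex digits and drops an entry only when an already kept prefix of equal or shorter length has the same leading bits. The function name drop_covered_prefixes is hypothetical, addresses are not validated, and only the prefix field is printed.

```bash
#!/bin/bash
# Hypothetical helper: drop prefixes covered by a less specific (or identical)
# prefix in the same list. Skips blank lines and lines starting with "#".
drop_covered_prefixes() {
    awk '
    function pad4(g) { while (length(g) < 4) g = "0" g; return g }

    # Expand an IPv6 address to 32 lowercase hex digits (no validation).
    function expand(a,    pos, head, tail, nh, nt, hp, tp, i, out) {
        a = tolower(a)
        pos = index(a, "::")
        if (pos) { head = substr(a, 1, pos - 1); tail = substr(a, pos + 2) }
        else     { head = a; tail = "" }
        nh = (head == "") ? 0 : split(head, hp, ":")
        nt = (tail == "") ? 0 : split(tail, tp, ":")
        out = ""
        for (i = 1; i <= nh; i++) out = out pad4(hp[i])
        for (i = 1; i <= 8 - nh - nt; i++) out = out "0000"
        for (i = 1; i <= nt; i++) out = out pad4(tp[i])
        return out
    }

    # Key for the first len bits of an expanded address, tagged with len.
    function key(h, len,    full, rem, v) {
        full = int(len / 4); rem = len % 4
        if (rem == 0) return len "/" substr(h, 1, full)
        v = index("0123456789abcdef", substr(h, full + 1, 1)) - 1
        return len "/" substr(h, 1, full) int(v / 2 ^ (4 - rem))
    }

    $1 == "" || $1 ~ /^#/ { next }
    {
        split($1, p, "/")
        if (p[2] == "") next                  # no prefix length: skip
        n++; addr[n] = expand(p[1]); plen[n] = p[2] + 0; line[n] = $1
    }

    END {
        # Visit shortest prefixes first, so every possible cover is already kept.
        for (L = 0; L <= 128; L++)
            for (i = 1; i <= n; i++) {
                if (plen[i] != L) continue
                covered = 0
                for (k in kept_len) {
                    kk = key(addr[i], k + 0)
                    if (kk in kept) { covered = 1; break }
                }
                if (!covered) { kept[key(addr[i], L)] = 1; kept_len[L] = 1; print line[i] }
            }
    }' "$@"
}

result=$(printf '%s\n' \
    '2a02:bc8:fffe::/48' '2a02:bc8::/32' '2a02:bca::/32' '# comment line' \
    '2001:640:c000::/34' '2001:640:8000::/33' '2001:db8::/48' '2001:db8:1::/48' |
    drop_covered_prefixes)
echo "$result"
```

With this sample, 2a02:bc8:fffe::/48 and 2001:640:c000::/34 are dropped, while 2a02:bca::/32 and both /48s under 2001:db8 are kept. Scanning the list once per possible length keeps the logic simple; for a few thousand lines that is cheap, although sorting by length first would avoid the repeated scans.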
