I want to find partially matching IPv6 prefixes in two arrays. For instance, 2001:db8: from one array should match 2001:db8:1::/48 and 2001:db8:2::/48 from the other.

I already have it working by iterating over one array inside another:

ru_routes=( $(curl -4 ftp://ftp.ripe.net/ripe/stats/delegated-ripencc-latest | egrep -o '\|RU\|ipv6\|.+?::\|[0-9]+' | cut -d'|' -f4 | sed 's/::$/:/g') );
msk_ix_routes=( $(curl -4 http://www.msk-ix.ru/download/lg/msk_ipv6_pfx.txt.gz | gunzip | egrep -o '\b.*::/[0-9]*') );
routes=();
for item1 in ${msk_ix_routes[@]}; do
    for item2 in ${ru_routes[@]}; do
        if [[ $item1 = $item2* ]]; then
            routes+=( $item1 );
            break
        fi
    done
done

But it is rather slow on my MIPS router (~90 sec). I found this useful answer, which runs much faster, but I cannot get it to work the same way as the loop above. I also don't think I need the "if" construction from that example, because it would do the same thing twice. My non-working version:

msk=" ${msk_ix_routes[*]} ";         # add framing blanks

for item in ${ru_routes[@]}; do
  routes+=( egrep -o "$item[\S]*/g" <<< $msk );
done

I guess there are problems with quoting and escaping here, but I cannot solve it. Please help) I am open to suggestions.

Btw, my first version used "comm", which runs even faster, but it only does exact matches, hence I started to play with loops:

routes=( $(comm -12 <(printf '%s\n' "${ru_routes[@]}" | LC_ALL=C sort) <(printf '%s\n' "${msk_ix_routes[@]}" | LC_ALL=C sort)) );
  • Unrelated to anything else, you want to quote the [@] list expansions to prevent word splitting of the array elements (probably not an issue in your case, but the right way to do things in general). Commented Aug 31, 2014 at 23:48
  • What about those two non-working options is not working? What are they doing? The second one looks like it will create an empty list, since the [[ test doesn't return any contents (only a return code). You almost certainly want that test in an if block and then to append $item to the list (like in the linked question). Commented Aug 31, 2014 at 23:50
  • I agree about the second option (removed it). The first one gives me 889111 matches instead of 4xx valid matches. $item would be an exact match, and I want to get all longer matches (substring). Commented Aug 31, 2014 at 23:55

1 Answer

Bash loops are not efficient at all for this kind of matching. Try this instead, which lets awk build the patterns and grep do the matching:

#!/bin/bash

# e. g.: ripencc|RU|ipv6|2001:640::|32|19991115|allocated -> ^2001:640:
awk -v FS='|' \
    '$2 == "RU" && $3 == "ipv6" { sub(/::/, ":", $4); print "^" $4 }' \
    <(curl -4 ftp://ftp.ripe.net/ripe/stats/delegated-ripencc-latest) \
|\
# grep e. g. '^2001:640:' in '2001:640:8000::/33'
grep --basic-regexp --file - \
    <(curl -4 http://www.msk-ix.ru/download/lg/msk_ipv6_pfx.txt.gz | gunzip)
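To see what the pipeline does without touching the network, here is a minimal offline sketch of the same awk-plus-grep idea. The sample records and the function names ru_patterns and filter_ru are made up for illustration (only the 2001:640:: record comes from the comment in the script above); the real script reads both inputs from curl.

```bash
#!/bin/bash
# Hypothetical sample data standing in for the two downloaded files.
delegated='ripencc|RU|ipv6|2001:640::|32|19991115|allocated
ripencc|DE|ipv6|2001:638::|32|19991115|allocated
ripencc|RU|ipv4|5.8.0.0|2048|20120413|allocated
ripencc|RU|ipv6|2a02:bc8::|32|20080807|allocated'

ix_prefixes='2001:640:8000::/33
2001:638::/32
2a02:bc8::/32
2a02:bc8:fffe::/48
2a02:bc80::/32'

# Turn RU ipv6 records into anchored patterns such as ^2001:640:
ru_patterns() {
    awk -F '|' '$2 == "RU" && $3 == "ipv6" { sub(/::/, ":", $4); print "^" $4 }'
}

# $1: delegation records, $2: IX prefixes.
# grep reads the pattern list from stdin (-f -) and the prefixes
# from a process substitution, as in the script above.
filter_ru() {
    printf '%s\n' "$1" | ru_patterns | grep -f - <(printf '%s\n' "$2")
}

filter_ru "$delegated" "$ix_prefixes"
```

The trailing colon left in each pattern matters: ^2a02:bc8: matches 2a02:bc8::/32 and 2a02:bc8:fffe::/48 but not 2a02:bc80::/32.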

7 Comments

Thanks comrade Dmitry, your last edit nailed it. And it is way faster than loops. Is there any way to also remove redundant records coming from msk_ipv6_pfx.txt? For example 2a02:bc8::/32 and 2a02:bc8:fffe::/48. Both routes will go to the IX, but the shorter (covering) prefix is enough (2a02:bc8::/32 is delegated). Thank you again
@Xand You’re welcome. :-) As for removing duplicate entries in msk_ipv6_pfx.txt, it looks non-trivial enough to be a separate question, actually here, on SO. But yes, of course, it’s possible. For instance: $ tac msk_ipv6_pfx.txt | awk -F '::' -v P='^$' '!/^[# ]/ && $0 != "" && $1 !~ P { P = "^" $1; print }'. That is not an optimal way, though – it can be accomplished without reversing the file. Do you need any comments?
I do see that the number of lines is reduced, but both prefixes mentioned above remain (I would need to sleep on it anyway). But I would really appreciate hints on redirects to feed grep with the msk-ix file after curl and gunzip. Otherwise I guess I have to define it as a separate variable. Thank you
@Xand As for curl, if temporary files do not suit, see the edited answer.
@Xand As for removing redundancy in msk_ipv6_pfx.txt, well, I wasn’t attentive – actually tac is not enough, we have to re-sort the file: $ sort -n msk_ipv6_pfx.txt | awk -F '::' -v P='^$' '!/^[# ]/ && $0 != "" && $1 !~ P { P = "^" $1; print }'.
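For readers following the redundancy discussion: here is one possible sketch that does not depend on sort order at all. In byte order the digits 0-9 sort before ':', so a line like 2a02:bc8:1::/48 can land ahead of 2a02:bc8::/32 after sorting. This is not the commenter's method. The function name drop_covered and the sample list are hypothetical, and the check is purely textual: it assumes every entry looks like "<hextets>::/<len>" and that a listed prefix really covers everything that starts with its written-out hextets.

```bash
#!/bin/bash
# Hypothetical sample; in practice this would be the filtered MSK-IX list.
sample='2a02:bc8:fffe::/48
2a02:bc80::/32
2a02:bc8::/32
2001:640:8000::/33
2001:640::/32'

# Print only the prefixes that have no shorter listed prefix above them.
drop_covered() {
    awk -F '::' '
        # Skip comment/blank lines; remember each distinct prefix once.
        !/^[# ]/ && NF > 1 && !seen[$1]++ { key[++n] = $1; rec[n] = $0 }
        END {
            for (i = 1; i <= n; i++) {
                m = split(key[i], h, ":")
                anc = ""; covered = 0
                # Walk the proper ancestors: "2a02", then "2a02:bc8", ...
                for (j = 1; j < m && !covered; j++) {
                    anc = (j == 1) ? h[1] : anc ":" h[j]
                    if (anc in seen) covered = 1
                }
                if (!covered) print rec[i]
            }
        }'
}

printf '%s\n' "$sample" | drop_covered
```

With this sample, 2a02:bc8:fffe::/48 and 2001:640:8000::/33 are dropped because 2a02:bc8 and 2001:640 are listed, while 2a02:bc80::/32 is kept since it only shares leading characters with 2a02:bc8, not a whole hextet.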