Perl6 Regexen:
Reducing line noise in your code.
Steven Lembark
Workhorse Computing
lembark@wrkhors.com
The difference
… prefer readability over compactness.
– Larry Wall
Regexes you can read.
Code you can maintain.
Do what I mean need.
What is line noise?
Random garbage on the screen.
Due to signal noise.
Eliminated by error-correcting modems.
More effective use of Regex
Daisy-chaining tokens.
RX-from-hell with alternations.
Q: Why?
A: Parsing.
What we had to do.
Without grammars.
Parsing order implicit in RX, code.
Branching logic on matched tokens.
Nested-if in for loops.
Parse *.ini file
my $header     = qr{ ^ \h* \[ (?<name> [^][]+ ) \] \h* $ }xm;
my $property   = qr{ ^ \h* (?<key> .+? ) \h* = \h* (?<value> .+ ) $ }xm;
my $comment    = qr{ ^ \h* \# }xm;
my $empty_line = qr{ ^ \h* $ }xm;
Regexen tokenize input.
Parse *.ini
for my $nextline (readline($INI_FILE)) {
    # If it's a header line, start a new section...
    if ($nextline =~ /$header/) {
        $section = $config{ $+{name} } //= {};
    }
    # If it's a property, add the key and value to the current section...
    elsif ($nextline =~ /$property/) {
        $section->{ $+{key} } = $+{value};
    }
    # Ignore comments or empty lines
    elsif ($nextline =~ /$comment|$empty_line/) {
        # Do nothing
    }
    # Report anything else as a probable error...
    else {
        warn "Invalid data in INI file at line $.\n"
           . "\t$nextline\n";
    }
}
Code processes it.
Maintainable?
Inter-related order of code and rx.
Code changes affect rx?
Rx changes affect code?
Find out: Try it and see...
Perl6: Grammars
Structure in one place.
Declarative.
No iterative code.
Tokens & Structure
grammar INI
{
    token TOP        { <section>* }
    token section    { [ ^ | <header> ] <block> }
    token header     { '[' $<ID> = <-[ \[ \] \n ]>+ ']' \h* \n }
    token block      { [ <property> | <.emptylines> | <.comment> ]* }
    token property   { \h* $<name>=\N+? \h* '=' \h* $<value>=\N+ \n }
    token comment    { ^^ \h* '#' \N* \n }
    token emptylines { [ ^^ \h* \n ]+ }
}
Process the content
class INI::hash_builder
{
    method TOP      ($/) { make %( $<section>».ast ) }
    method section  ($/) { make ~($<header><ID>//'') => $<block>.ast }
    method block    ($/) { make %( $<property>».ast ) }
    method property ($/) { make ~$<name> => ~$<value> }
}
my %config = INI.parsefile( 'example.ini', :actions(INI::hash_builder) ).ast;
say %config.perl;
Care to guess what this does?
/{~}!@#$%^(&*)-+=[\/]:;"'<.,>?/
Q: Which characters match themselves?
A: None. In Perl6 punctuation never matches itself; it has to be quoted or escaped.
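A small sketch of that rule, assuming a current Perl 6 implementation (Rakudo): letters, digits and underscore match themselves, while anything else has to be quoted or backslashed. The sample strings and variable names are made up for illustration.

```raku
use v6;

# Letters, digits and underscore are literal.
my $plain   = so 'foo_bar1' ~~ / foo_bar1 /;

# Punctuation is literal only when quoted...
my $quoted  = so 'a-b' ~~ / a '-' b /;

# ...or escaped with a backslash.
my $escaped = so 'a-b' ~~ / a \- b /;

say $plain;     # True
say $quoted;    # True
say $escaped;   # True
```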
Saner metachars
Match integers enclosed in square brackets.
Perl 5:
/ \[ \s* (?: \d+ (?: \s* , \s* \d+ )* \s* )? \] /x
Perl 6:
/ :s '[' [\d+]* % ',' ']' /
:s            Consume WS
'[' ',' ']'   Literals
[\d+]         Non-capturing match
%             Separator
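The `%` separator can be tried on its own. This sketch changes the slide's pattern in three ways, each chosen for illustration: it leaves out `:s` so whitespace handling stays out of the picture, it adds `^` and `$` anchors so the whole string must match, and it swaps the non-capturing `[\d+]` for a capturing `(\d+)` so the matched integers can be read back. The variable names are hypothetical.

```raku
use v6;

# (\d+)* % ','  reads: zero or more integers, with a comma between neighbours.
my $list = / ^ '[' (\d+)* % ',' ']' $ /;

say so '[1,2,3]' ~~ $list;   # True
say so '[]'      ~~ $list;   # True
say so '[1,2,]'  ~~ $list;   # False: % does not accept a trailing separator

# A quantified capture collects every integer that matched.
my $m = '[10,20,30]' ~~ $list;
my @numbers = $m[0].map(+*);
say @numbers;                # [10 20 30]
```

One quantifier plus `%` replaces the Perl 5 idiom of "one item, then zero or more separator-item pairs, all optional".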
Aside: Future speed
Perl6 regexen execute as DFA where possible.
Opportunity for much faster execution.
For some definition of ...
Also benefit from saner results.
DFA more likely to Do What I Mean.
Alternations take the best (longest) match, not the first one listed.
Smartest matching
Perl 5:
say $& if 'Which regex engine is smartest?'
    =~ /smart|smarter|smartest/;
Perl 6:
say $/ if 'Which regex engine is smartest?'
    ~~ /smart|smarter|smartest/;
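For reference: the Perl 5 snippet prints `smart`, because its `|` stops at the first alternative that matches, while the Perl 6 snippet matches `smartest`. The sketch below shows both behaviours inside Perl 6, where `||` gives the Perl 5 style first-match alternation. The variable names are made up for illustration.

```raku
use v6;

my $text = 'Which regex engine is smartest?';

# `|` picks the longest alternative that matches at a given position.
my $longest = ~( $text ~~ / smart | smarter | smartest / );

# `||` tries the alternatives left to right and keeps the first that matches.
my $first   = ~( $text ~~ / smart || smarter || smartest / );

say $longest;   # smartest
say $first;     # smart
```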
Perl5 nested structure
#! /usr/bin/env perl
use 5.010;
# We're going to need this to extract hierarchical data structures...
our @stack = [];
my $LIST = qr{
    # Match this...
    (?&NESTED)
    # Which is defined as...
    (?(DEFINE)
        (?<NESTED>
            # Keep track of recursions on a stack...
            (?{ local @::stack = (@::stack, []); })
            # Match a list of items...
            \[ \s* (?>
                (?&ITEM)
                (?:
                    \s* , \s* (?&ITEM)
                )*+
            )? \s*
            \]
            # Pop the stack and add that frame to the growing data structure...
            (?{ local @::stack = @::stack;
                my $nested = pop @stack;
                push @{$::stack[-1]}, $nested;
            })
        )
        # For each item, push it onto the stack if it's a leaf node...
        (?<ITEM>
            (\d+) (?{ push @{$stack[-1]}, $^N })
          | (?&NESTED)
        )
    )
}x;
# Match, extracting a data structure...
'[1,2,[3,3,[4,4]],5]' =~ /$LIST/;
# Retrieve the data structure...
my $parse_tree = pop @stack;
# Show it...
use Data::Dumper 'Dumper';
say Dumper($parse_tree);
Perl6 nested grammar
#! /usr/bin/env perl6
use v6;
# Define the structure of a list...
grammar LIST {
    rule  TOP  { '[' <ITEM>* % ',' ']' }
    token ITEM { \d+ | <TOP> }
}
# Define how to convert list elements to a suitable data structure...
class TREE {
    method TOP  ($/) { make [ $<ITEM>».ast ] }
    method ITEM ($/) { make $<TOP>.ast // +$/ }
}
# Parse, extracting the data structure...
my $parse_tree = LIST.parse('[1,2,[3,3,[4,4]],5]', :actions(TREE)).ast;
# Show what we got...
say $parse_tree.perl;
Perl6 nested regex
#! /usr/bin/env perl6
use v6;
'[1,2,[3,3,[4,4]],5]'
    ~~ /'[' [ (\d+) | $<0>=<~~> ]* % ',' ']' /;
say $/;
In Perl6
Regexen are saner.
Grammars offer cleaner code.
Smart matching works.
Objects have useful methods.
It's new.
It's worth learning.
