Perl6 Regexen: Reduce the line noise in your code.

Perl6 Regexen:
Reducing line noise in your code.
Steven Lembark
Workhorse Computing
lembark@wrkhors.com

The difference
… prefer readability over compactness.
– Larry Wall
Regexes you can read.
Code you can maintain.
Do what I need, not just what I mean.

What is line noise?
Random garbage on the screen.
Due to signal noise.
Eliminated by error-correcting modems.

More effective use of Regex
Daisy-chaining tokens.
RX-from-hell with alternations.
Q: Why?
A: Parsing.

What we had to do.
Without grammars.
Parsing order implicit in RX, code.
Branching logic on matched tokens.
Nested-if in for loops.

Parse *.ini file
my $header     = qr{ ^ \h* \[ (?<name> [^][]+ ) \] \h* $ }xm;
my $property   = qr{ ^ \h* (?<key> .+? ) \h* = \h* (?<value> .+ ) $ }xm;
my $comment    = qr{ ^ \h* \# }xm;
my $empty_line = qr{ ^ \h* $ }xm;
Regexen tokenize input.

Parse *.ini
my (%config, $section);
open my $INI_FILE, '<', 'example.ini' or die "Can't open example.ini: $!";
# Read line by line so $. holds the current line number...
while (my $nextline = readline($INI_FILE)) {
    # Ignore comments or empty lines (test these first: a commented-out
    # "key = value" line would also match $property)...
    if ($nextline =~ /$comment|$empty_line/) {
        next;
    }
    # If it's a header line, start a new section...
    elsif ($nextline =~ /$header/) {
        $section = $config{ $+{name} } //= {};
    }
    # If it's a property, add the key and value to the current section...
    elsif ($nextline =~ /$property/) {
        $section->{ $+{key} } = $+{value};
    }
    # Report anything else as a probable error...
    else {
        warn "Invalid data in INI file at line $.\n"
           . "\t$nextline";
    }
}
Code processes it.
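A self-contained Perl 5 sketch of the tokenize-then-dispatch approach above. It assumes the INI text arrives as a string rather than a file, so it opens the string as an in-memory file handle. The helper name `parse_ini`, the `''` default section (borrowed from the Perl 6 grammar's `//''`) and the sample data are all hypothetical.

```perl
use strict;
use warnings;

# The four token regexes from the slide, backslashes restored.
my $header     = qr{ ^ \h* \[ (?<name> [^][]+ ) \] \h* $ }xm;
my $property   = qr{ ^ \h* (?<key> .+? ) \h* = \h* (?<value> .+ ) $ }xm;
my $comment    = qr{ ^ \h* \# }xm;
my $empty_line = qr{ ^ \h* $ }xm;

# Hypothetical helper: parse INI text held in a string and return a
# hash ref of { section => { key => value } }.
sub parse_ini {
    my ($text) = @_;
    my %config;
    my $section = $config{''} = {};    # properties before any [header]

    # Read the string line by line through an in-memory file handle.
    open my $INI_FILE, '<', \$text or die "Cannot open string: $!";
    while ( my $nextline = readline($INI_FILE) ) {
        if ( $nextline =~ /$comment|$empty_line/ ) {
            next;                      # skip comments and blank lines
        }
        elsif ( $nextline =~ /$header/ ) {
            $section = $config{ $+{name} } //= {};
        }
        elsif ( $nextline =~ /$property/ ) {
            $section->{ $+{key} } = $+{value};
        }
        else {
            warn "Invalid data in INI file at line $.\n\t$nextline";
        }
    }
    close $INI_FILE;
    return \%config;
}

my $config = parse_ini(<<'END_INI');
# sample data, made up for this sketch
[server]
host = example.org
port = 8080

[client]
retries = 3
END_INI

print "$config->{server}{port}\n";    # prints 8080
```

All of the parsing order lives in the `if`/`elsif` chain. The regexes only recognise tokens.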

Maintainable?
Inter-related order of code and rx.
Code changes affect rx?
Rx changes affect code?
Find out: Try it and see...
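A minimal Perl 5 sketch of why the order of the tests matters. A commented-out property such as `# port = 8080` matches both `$property` and `$comment`, so whichever test runs first decides what the line is. The `classify` helper is hypothetical and exists only for this demonstration.

```perl
use strict;
use warnings;

my $property = qr{ ^ \h* (?<key> .+? ) \h* = \h* (?<value> .+ ) $ }xm;
my $comment  = qr{ ^ \h* \# }xm;

# Hypothetical helper: return the name of the first token regex,
# in the order given, that matches the line.
sub classify {
    my ( $line, @tests ) = @_;
    for my $test (@tests) {
        my ( $name, $rx ) = @$test;
        return $name if $line =~ $rx;
    }
    return 'invalid';
}

my $line = "# port = 8080\n";    # a commented-out property

# Same line, same regexes, different order of tests:
print classify( $line, [ property => $property ], [ comment => $comment ] ), "\n";  # property
print classify( $line, [ comment => $comment ], [ property => $property ] ), "\n";  # comment
```

Neither regex changed between the two calls. Only the order of the code did, and the meaning of the input changed with it.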

Perl6: Grammars
Structure in one place.
Declarative.
No iterative code.

Tokens & Structure
grammar INI
{
    token TOP        { <section>* }
    token section    { [ ^ | <header> ] <block> }
    token header     { '[' $<ID> = <-[ \[ \] \n ]>+ ']' \h* \n }
    token block      { [ <property> | <.emptylines> | <.comment> ]* }
    token property   { \h* $<name>=\N+? \h* '=' \h* $<value>=\N+ \n }
    token comment    { ^^ \h* '#' \N* \n }
    token emptylines { [ ^^ \h* \n ]+ }
}

Process the content
class INI::hash_builder
{
    method TOP      ($/) { make %( $<section>».ast ) }
    method section  ($/) { make ~($<header><ID>//'') => $<block>.ast }
    method block    ($/) { make %( $<property>».ast ) }
    method property ($/) { make ~$<name> => ~$<value> }
}
my %config = INI.parsefile( 'example.ini', :actions(INI::hash_builder) ).ast;
say %config.perl;

Care to guess what this does?
/{~}!@#$%^(&*)-+=[/]:;"'<.,>?/
Q: Which chars match themselves?
A: None. In Perl6 every punctuation character is reserved metasyntax; to match one literally, quote it ('#') or escape it (\#).
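Perl 5 handles the same problem differently. Rather than work out which punctuation characters are special, Perl 5 code usually quotes a literal run with `\Q...\E`, which applies the same escaping as `quotemeta`. This is a short sketch; the variable names and sample text are illustrative.

```perl
use strict;
use warnings;

# The punctuation string from the slide, held as plain data.
# (q{} allows the balanced { } inside it.)
my $punct = q{{~}!@#$%^(&*)-+=[/]:;"'<.,>?};

# \Q...\E escapes every non-word character, so the whole
# string is matched literally instead of as regex syntax.
my $text = "before ${punct} after";
print $text =~ /\Q$punct\E/ ? "literal match\n" : "no match\n";   # literal match

# quotemeta performs the same escaping as a function.
print quotemeta('a.b*c'), "\n";    # a\.b\*c
```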

Saner metachars
Match a list of integers enclosed in square brackets.
Perl 5:
/ \[ \s* (?: \d+ (?: \s* , \s* \d+ )* \s* )? \] /x
Perl 6:
/ :s '[' [\d+]* % ',' ']' /
Consume WS: :s makes whitespace in the pattern match whitespace in the input.
Literals: '[', ',' and ']' are quoted.
Non-capturing match: [\d+] groups without capturing.
Separator: % ',' requires a comma between repetitions.
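Here is a small Perl 5 harness for the Perl 5 pattern above. The `\A`/`\z` anchors are an addition for this sketch, so the whole input must be a bracketed list. The sample inputs are made up.

```perl
use strict;
use warnings;

# The Perl 5 pattern from the slide, anchored to the whole string.
my $int_list = qr{ \A \[ \s* (?: \d+ (?: \s* , \s* \d+ )* \s* )? \] \z }x;

# Try a few well-formed and malformed lists.
for my $input ( '[1, 2, 3]', '[]', '[ 42 ]', '[1,,2]', '[1, 2' ) {
    printf "%-10s %s\n", $input, $input =~ $int_list ? 'match' : 'no match';
}
```

In Perl 5 the separator logic is spelled out by hand as `\d+ (?: , \d+ )*`. Perl 6's `%` states the same thing directly.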

Aside: Future speed
Perl6 regexen execute as DFA where possible.
Opportunity for much faster execution.
For some definition of ...
Also benefit from saner results.
DFA more likely to Do What I Mean.
Alternations with | match the longest alternative, not the first one listed.
(|| keeps Perl 5's first-match behavior.)

Smartest matching
Perl 5 (first alternative wins):
use v5.10;
say $& if 'Which regex engine is smartest?'
    =~ /smart|smarter|smartest/;    # prints: smart
Perl 6 (longest alternative wins):
say $/ if 'Which regex engine is smartest?'
    ~~ /smart|smarter|smartest/;    # prints: ｢smartest｣
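The Perl 5 side of this comparison can be tried directly. The sketch also shows one common workaround: listing longer literal alternatives first, so that ordered alternation picks the longest word. This only approximates Perl 6's longest-token matching, and only for plain literals. The helper names are hypothetical.

```perl
use strict;
use warnings;

# Perl 5 alternation is ordered: the first alternative that lets the
# match succeed wins, even if a later one would match more text.
sub first_alternative {
    my ( $text, @words ) = @_;
    my $alt = join '|', map { quotemeta($_) } @words;
    return $text =~ /($alt)/ ? $1 : undef;
}

# Hypothetical workaround: sort literal alternatives longest first,
# so ordered alternation tries the longest candidate first.
sub longest_alternative {
    my ( $text, @words ) = @_;
    return first_alternative( $text, sort { length($b) <=> length($a) } @words );
}

my $text = 'Which regex engine is smartest?';
print first_alternative( $text, qw(smart smarter smartest) ), "\n";     # smart
print longest_alternative( $text, qw(smart smarter smartest) ), "\n";   # smartest
```

The workaround has to be re-applied by hand every time an alternative is added. Perl 6's `|` makes longest-match the default.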

Perl5 nested structure
#! /usr/bin/env perl
use 5.010;
# We're going to need this to extract hierarchical data structures...
our @stack = [];
my $LIST = qr{
    # Match this...
    (?&NESTED)
    # Which is defined as...
    (?(DEFINE)
        (?<NESTED>
            # Keep track of recursions on a stack...
            (?{ local @::stack = (@::stack, []); })
            # Match a list of items...
            \[ \s* (?>
                (?&ITEM)
                (?:
                    \s* , \s* (?&ITEM)
                )*+
            )? \s*
            \]
            # Pop the stack and add that frame to the growing data structure...
            (?{ local @::stack = @::stack;
                my $nested = pop @stack;
                push @{$::stack[-1]}, $nested;
            })
        )
        # For each item, push it onto the stack if it's a leaf node...
        (?<ITEM>
            (\d+) (?{ push @{$stack[-1]}, $^N })
          | (?&NESTED)
        )
    )
}x;
# Match, extracting a data structure...
'[1,2,[3,3,[4,4]],5]' =~ /$LIST/;
# Retrieve the data structure...
my $parse_tree = pop @stack;
# Show it...
use Data::Dumper 'Dumper';
say Dumper($parse_tree);
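If only validation is needed, Perl 5.10+ can recognise the same nesting with a short recursive subpattern. Most of the machinery above, the `(?{ ... })` stack, exists to build the data structure. Here is a sketch with hypothetical names.

```perl
use strict;
use warnings;

# (?1) re-enters capture group 1, so a list may contain lists.
my $nested_list = qr{
    \A
    (                                   # group 1: one bracketed list
        \[ \s*
        (?:
            (?: \d+ | (?1) )            # first item: integer or nested list
            (?: \s* , \s* (?: \d+ | (?1) ) )*
            \s*
        )?
        \]
    )
    \z
}x;

for my $input ( '[1,2,[3,3,[4,4]],5]', '[1,[2]', '[]' ) {
    print "$input: ", ( $input =~ $nested_list ? 'valid' : 'invalid' ), "\n";
}
```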

Perl6 nested grammar
#! /usr/bin/env perl6
use v6;
# Define the structure of a list...
grammar LIST {
    rule  TOP  { '[' <ITEM>* % ',' ']' }
    token ITEM { \d+ | <TOP> }
}
# Define how to convert list elements to a suitable data structure...
class TREE {
    method TOP  ($/) { make [ $<ITEM>».ast ] }
    method ITEM ($/) { make $<TOP>.ast // +$/ }
}
# Parse, extracting the data structure...
my $parse_tree = LIST.parse('[1,2,[3,3,[4,4]],5]', :actions(TREE)).ast;
# Show what we got...
say $parse_tree.perl;

Perl6 nested regex
#! /usr/bin/env perl6
use v6;
'[1,2,[3,3,[4,4]],5]'
    ~~ /'[' [ (\d+) | $<0>=<~~> ]* % ',' ']' /;
say $/;

In Perl6
Regexen are saner.
Grammars offer cleaner code.
Smart matching works.
Objects have useful methods.
It's new.
It's worth learning.
