Ruby REGEX parser

Question

Can someone have a look at the code below and tell me whether this is truly the correct way to go about parsing the text after the ":" sign?

require 'yaml'

the_file = ARGV[0]
f =  File.open(the_file)
content = f.read
r = Regexp.new(/((?=:).+)/)
emails = content.scan(r).uniq
puts YAML.dump(emails)

This script parses email addresses out of text files to clean out junk. Each line has the form TEXT:email_address.

I'm trying to make my scripts a bit more efficient. All my Ruby/regex scripts look the same, only with different regex patterns. I wrote them in Ruby by cutting and pasting here and there, and because I have Ruby on the majority of my servers, it's easier to run any script anywhere.

Any help would be appreciated.

Kyle · Accepted Answer · 2014-02-13 14:57:36Z

3

If you truly just want the text after the first :, I would not use a regex. I would use String#split:

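# Read every line, then keep only what follows the first ":" on each one.
lines = File.readlines(the_file)
# strip drops the trailing newline that readlines leaves on each line.
emails = lines.map { |line| line.split(':', 2).last.strip }.uniq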
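
To show what the limit argument of 2 does, here is a minimal sketch on in-memory strings instead of a file. The sample lines and the helper name text_after_first_colon are made up for illustration; they are not from the question.

```ruby
# Hypothetical helper: return whatever follows the first ":" on a line.
# The limit of 2 makes split stop after the first colon, so any later
# colons stay in the result.
def text_after_first_colon(line)
  line.split(':', 2).last.strip
end

# Made-up sample input in the TEXT:email_address form from the question.
sample_lines = [
  "junk:alice@example.com\n",
  "more junk:bob@example.com\n",
  "junk:alice@example.com\n",
  "a:b:c\n"
]

emails = sample_lines.map { |line| text_after_first_colon(line) }.uniq
puts emails
```

Note that a line with no colon at all comes back whole, so this only cleans input that really follows the TEXT:email_address layout.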

hirolau · Accepted Answer · 2014-02-13 15:29:11Z

1

If you only want valid emails, I would just search for a regexp that captures emails:

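require 'yaml'

# The character classes list only uppercase letters, so the i flag is
# needed for the pattern to match lowercase addresses as well.
email_regexp = /[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}/i
puts YAML.dump(
  File.read(ARGV[0]).scan(email_regexp)
)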
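
As a rough sketch of how scan behaves with this kind of pattern, the snippet below runs it over an in-memory string rather than a file. The sample text and the constant name EMAIL_REGEXP are assumptions for illustration only.

```ruby
# The pattern from this answer, written with the i flag so that the
# uppercase-only character classes also match lowercase input.
EMAIL_REGEXP = /[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}/i

# Made-up sample text: two distinct addresses, one of them repeated,
# plus a line that contains no address at all.
sample_text = "junk:alice@example.com\n" \
              "noise bob@example.org more noise\n" \
              "junk:alice@example.com\n" \
              "no address here\n"

# With no capture groups, scan returns a flat array of matched strings.
found = sample_text.scan(EMAIL_REGEXP).uniq
puts found
```

Because the pattern looks for the address itself, it does not depend on the colon delimiter, and lines without an address simply contribute nothing.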

2 Comments

Toto Over a year ago

TLDs can be much longer than 6 characters, see: data.iana.org/TLD/tlds-alpha-by-domain.txt

Toto Over a year ago

This regex produces a lot of false negatives, see: en.wikipedia.org/wiki/Email_address#Valid_email_addresses
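
To make the TLD-length point concrete, here is a small sketch comparing the {2,6} bound with an open-ended {2,} bound. The relaxed pattern, the constant names, and the sample address are assumptions for illustration; the relaxed pattern is still far from a full validator and does not address the other false negatives mentioned above.

```ruby
# Pattern from the answer (with the i flag): TLD limited to 2..6 letters.
BOUNDED_TLD = /[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}/i
# Assumed relaxation, not from the answer: no upper bound on TLD length.
OPEN_TLD = /[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/i

# Made-up address using a TLD longer than six letters.
sample = "junk:carol@example.photography"

bounded = sample.scan(BOUNDED_TLD)
open_ended = sample.scan(OPEN_TLD)

puts bounded.inspect    # the address is cut off after six TLD letters
puts open_ended.inspect # the full address is kept
```

With the bounded pattern the address is not rejected but silently truncated, which is arguably worse than a missed match when the goal is to clean a list.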

SciPhi · Accepted Answer · 2014-02-13 15:06:49Z

0

If you know the colon is the left delimiter before the email, and a close paren on the right, then you can just use

:(.+[^)])

as your regex to extract whatever is in between. There are some very specific email-matching regexen out there, though, which may be more appropriate (for when the source text is less 'regular').
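
One thing to watch with this pattern is that .+ is greedy, so it only stops at the close paren when nothing follows it on the line. The sketch below compares it with a character-class variant that stops at the first close paren. The variant, the constant names, and the sample line are assumptions for illustration, not part of this answer.

```ruby
# Pattern from the answer: greedy, so it runs to the end of the line
# and only backs off a trailing ")".
GREEDY = /:(.+[^)])/
# Assumed alternative (not from the answer): take everything after the
# colon up to, but not including, the first ")".
UP_TO_PAREN = /:([^)]+)/

# Made-up sample line with extra text after the close paren.
line = "(junk:alice@example.com) trailing text"

# String#[] with a regexp and a group index returns that capture group.
greedy_match = line[GREEDY, 1]
bounded_match = line[UP_TO_PAREN, 1]

puts greedy_match
puts bounded_match
```

If the close paren is always the last character on the line, both patterns give the same result, so the original is fine for strictly regular input.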