0

Can someone have a look at the code below and tell me whether this is the correct way to go about parsing the text after the ":" sign?

require 'yaml'

the_file = ARGV[0]
f =  File.open(the_file)
content = f.read
r = Regexp.new(/((?=:).+)/)
emails = content.scan(r).uniq
puts YAML.dump(emails)

This script extracts email addresses from text files to clean out junk. The input has the form TEXT:email_address.

I'm trying to make my scripts a bit more efficient. All my Ruby/regex scripts look the same, only with different regex patterns. I wrote them in Ruby by cutting and pasting here and there, because I have Ruby on the majority of my servers, so it's easy to run any script anywhere.

Any help would be appreciated.

3 Answers

3

If you truly just want the text after the first :, I would not use a regex. I would use String#split:

# Read every line, keep the text after the first ':', and drop duplicates
lines = File.readlines(the_file)
emails = lines.map { |line| line.split(':', 2).last.strip }.uniq
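One thing to watch with the split approach: a line that contains no ":" comes back whole, and a line ending in ":" yields an empty string. A minimal sketch of one way to guard against both follows; the helper name `values_after_colon` and the sample lines are hypothetical, not part of the original script.

```ruby
# Hypothetical helper: returns the unique, non-empty text found after the
# first ':' on each line, skipping lines that have no ':' at all.
def values_after_colon(lines)
  lines
    .map { |line| line.chomp.split(':', 2) }  # ["TEXT", "value"], or one element if no ':'
    .select { |parts| parts.length == 2 }     # keep only lines that contained a ':'
    .map { |parts| parts.last.strip }         # text after the first ':', whitespace trimmed
    .reject(&:empty?)                         # drop lines such as "TEXT:"
    .uniq
end

# Made-up sample input in the TEXT:email_address shape
sample = [
  "junk:alice@example.com\n",
  "no delimiter here\n",
  "more:alice@example.com\n",
  "x:bob@example.org\n"
]
puts values_after_colon(sample)
```

With real input, `File.readlines(the_file)` could be passed in place of `sample`.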

1

If you only want valid emails, I would just search for a regexp that captures emails:

require 'yaml'

# The /i flag is needed because the character classes list only uppercase letters
email_regexp = /[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}/i
puts YAML.dump(
  File.read(ARGV[0]).scan(email_regexp)
)

2 Comments

TLDs can be much longer than 6 characters, see: data.iana.org/TLD/tlds-alpha-by-domain.txt
This regex produces a lot of false negatives, see: en.wikipedia.org/wiki/Email_address#Valid_email_addresses
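To illustrate the points raised in the comments, here is a rough sketch of a looser pattern: it is case-insensitive and puts no upper bound on the TLD length. The constant name `LOOSE_EMAIL` and the sample text are assumptions for illustration. This is still only a heuristic for pulling likely addresses out of text, not a validator for every address the standards allow.

```ruby
# Assumption: a loose extraction pattern, not a full email validator.
# /i makes it case-insensitive; {2,} removes the 6-character TLD cap.
LOOSE_EMAIL = /[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}/i

# Made-up sample input in the TEXT:email_address shape
text = "junk:alice@example.com\nmore:bob@example.photography\n"

found = text.scan(LOOSE_EMAIL).uniq
puts found
```

Because the pattern contains no capture groups, `String#scan` returns the full matches as a flat array of strings.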
0

If you know the colon is the left delimiter before the email, and a close paren on the right, then you can just use

:(.+[^)])

as your regex to extract whatever is in between. There are some very specific email-matching regexen out there, though, which may be more appropriate when the source text is less 'regular'.
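One caveat worth checking against your data: `.+` is greedy, so if more text follows the closing paren on the same line, the pattern above keeps going to the last non-paren character it can reach. A small sketch comparing it with a negated character class, which stops at the first ")"; the sample line and variable names are made up for illustration.

```ruby
# Made-up sample line with two delimited addresses
line = "(contact:alice@example.com) and (other:bob@example.org)"

# Pattern from the answer: greedy, runs past the first ')'
greedy = line.scan(/:(.+[^)])/).flatten

# Alternative sketch: capture everything up to, but not including, the next ')'
bounded = line.scan(/:([^)]+)/).flatten

puts greedy.inspect
puts bounded.inspect
```

When each line holds exactly one address and ends with the closing paren, both patterns capture the same text.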
