1

I have a group of string variables that look something like "height_low". I want to use something clean like gsub to get rid of the underscore and everything after it, so the result is "height". Does someone have a solution for this? Thanks.

2
  • Wrong question. Thanks for the help but I posted all relevant information. If you have any ideas I would be glad to hear. Commented Feb 25, 2014 at 13:50
  • Yes I understood you. Thanks. I was just wondering why you commented on this question regarding another. Commented Feb 25, 2014 at 13:53

7 Answers

3

Try this:

strings.map! {|s| s.split('_').first}

3 Comments

@lipanski I'm guessing Linuxios saw "I have a grouping of string variables", and assumed Brent was talking about an Array.
Here strings is an Array of Strings. It's important to mention that, because individual Strings work like Arrays in some ways but don't respond to map.
Thanks. Your answer got me where I needed to be.
1

Shorter:

my_string.split('_').first

1 Comment

All very good answers but yours gets the check because it's the version I used. I actually was looping over a hash and needed to convert each key to a string and then cut off the end of each. I used x.to_s.split('_').first where x is the hash key to accomplish this. Thanks guys.
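A minimal sketch of the hash-key case described in the comment above. The hash contents and the names `measurements` and `base_names` are made up for illustration; only the `x.to_s.split('_').first` expression comes from the comment.

```ruby
# Hypothetical data: symbol keys, some with a "_suffix" to strip.
measurements = { height_low: 1, width_high: 2, depth: 3 }

# Convert each key to a String and keep only the part before the first "_".
base_names = measurements.keys.map { |key| key.to_s.split('_').first }

p base_names # => ["height", "width", "depth"]
```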
1

The unavoidable regex answer. (Assuming strings is an array of strings.)

strings.map! { |s| s[/^.+?(?=_)/] }

1

FWIW, solutions based on String#split perform poorly because they have to parse the whole string and allocate an array. Their performance degrades as the number of underscores increases. The following performs better:

string[0, string.index("_") || string.length]

Benchmark results (with number of underscores in parentheses):

                       user     system      total        real
String#split (0)   0.640000   0.000000   0.640000 (  0.650323)
String#split (1)   0.760000   0.000000   0.760000 (  0.759951)
String#split (9)   2.180000   0.010000   2.190000 (  2.192356)
String#index (0)   0.610000   0.000000   0.610000 (  0.625972)
String#index (1)   0.580000   0.010000   0.590000 (  0.589463)
String#index (9)   0.600000   0.000000   0.600000 (  0.605253)

Benchmarks:

require 'benchmark'

# One sample each with 0, 1, and 9 underscores.
strings = ["x", "x_x", "x_x_x_x_x_x_x_x_x_x"]

Benchmark.bm(16) do |bm|
  strings.each do |string|
    bm.report("String#split (#{string.count("_")})") do
      1000000.times { string.split("_").first }
    end
  end
  strings.each do |string|
    bm.report("String#index (#{string.count("_")})") do
      1000000.times { string[0, string.index("_") || string.length] }
    end
  end
end
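
As a quick check of what the String#index expression returns, here is a small sketch that wraps it in a helper. The method name `prefix_before_underscore` is made up for this example. It also shows String#split with a limit of 2, which stops splitting after the first underscore; that variant was not part of the benchmark, so no timing claim is made for it here.

```ruby
# Hypothetical helper: returns everything before the first "_",
# or the whole string when there is no "_".
def prefix_before_underscore(string)
  string[0, string.index("_") || string.length]
end

p prefix_before_underscore("height_low") # => "height"
p prefix_before_underscore("height")     # => "height"
p prefix_before_underscore("a_b_c")      # => "a"

# split with a limit of 2 produces at most two pieces.
p "a_b_c".split("_", 2)                  # => ["a", "b_c"]
```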

1

Try as below using str[regexp, capture] → new_str or nil:

If a Regexp is supplied, the matching portion of the string is returned. If a capture, which may be a capture group index or name, follows the regular expression, that component of the MatchData is returned instead.

strings.map { |s| s[/(.*?)_.*$/, 1] }

1 Comment

Beware of greediness in the regex engine: 'a_b_c'[/(.*)_.*$/, 1] # => "a_b". You can fix that using: 'a_b_c'[/(.*?)_.*$/, 1] # => "a" but anchoring and simplification will do as well: 'a_b_c'[/^(.*?)_/, 1] # => "a". When writing patterns, don't add more than necessary, otherwise you can cause the engine to waste a lot of CPU on unnecessary work.
0

If you're looking for something "like gsub", why not just use gsub?

"height_low".gsub(/_.*$/, "") #=> "height"

In my opinion though, this is a bit cleaner:

"height_low".split('_').first #=> "height"

Another option is to use partition:

"height_low".partition("_").first #=> "height"
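
The three options above agree on "height_low" but can differ on edge cases. The sketch below runs them side by side; the sample strings and the `results` structure are chosen only for illustration.

```ruby
# Compare the three approaches on a few edge cases.
samples = ["height_low", "height", "_low", ""]

results = samples.map do |s|
  {
    input:     s,
    gsub:      s.gsub(/_.*$/, ""),
    split:     s.split("_").first,
    partition: s.partition("_").first
  }
end

results.each { |row| p row }
```

In this sample the only disagreement is the empty string: split returns an empty array, so .first is nil, while gsub and partition both return "".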

0

Learn to think in terms of searches vs. replacements. It's usually easier, faster, and cleaner to search for, and extract, what you want, than it is to search for, and strip, what you don't want.

Consider this:

'a_b_c'[/^(.*?)_/, 1] # => "a"

It looks for only what you want, which is the text from the start of the string, up to _. Everything preceding _ is captured, and returned.

The alternatives:

'a_b_c'.sub(/_.+$/, '')  # => "a"
'a_b_c'.gsub(/_.+$/, '') # => "a"

have to find the first _ and then scan all the way to the end of the string before it can be truncated.

Here's a little benchmark showing how that affects things:

require 'fruity'

# Sample string used by string_index (s was undefined in the original snippet).
s = 'a_b_c'

compare do
  string_capture { 'a_b_c'[/^(.*?)_/, 1]          }
  string_sub     { 'a_b_c'.sub(/_.+$/, '')        }
  string_gsub    { 'a_b_c'.gsub(/_.+$/, '')       }
  look_ahead     { 'a_b_c'[/^.+?(?=_)/]           }
  string_index   { s[0, s.index("_") || s.length] }
end

# >> Running each test 8192 times. Test will take about 1 second.
# >> string_index is faster than string_capture by 19.999999999999996% ± 10.0%
# >> string_capture is similar to look_ahead
# >> look_ahead is faster than string_sub by 70.0% ± 10.0%
# >> string_sub is faster than string_gsub by 2.9x ± 0.1

Again, searching is going to be faster than any sort of replace, so think about what you're doing.

The downside of the "search" regex-based tactics like "string_capture" and "look_ahead" is that they return nil when the string has no _. If there's any question whether your string will have a _, use the "string_index" method, which falls back to string.length to grab the entire string.
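
A short sketch of how the approaches behave when the underscore is missing. The variable names are made up for illustration; the expressions are the ones benchmarked above.

```ruby
with_underscore    = 'a_b_c'
without_underscore = 'abc'

# Regex searches return nil when the pattern does not match.
p with_underscore[/^(.*?)_/, 1]      # => "a"
p without_underscore[/^(.*?)_/, 1]   # => nil
p without_underscore[/^.+?(?=_)/]    # => nil

# The index-based slice falls back to the full length of the string.
p without_underscore[0, without_underscore.index("_") || without_underscore.length] # => "abc"

# sub leaves the string unchanged when nothing matches.
p without_underscore.sub(/_.+$/, '') # => "abc"
```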
