Manipulate string in ruby

Question

I have a group of string variables with values like "height_low". I want to use something clean, like gsub, to remove the underscore and everything after it, so that "height_low" becomes "height". Does anyone have a solution for this? Thanks.

Wrong question. Thanks for the help but I posted all relevant information. If you have any ideas I would be glad to hear.
– Brent Moses, Commented Feb 25, 2014 at 13:50
Yes I understood you. Thanks. I was just wondering why you commented on this question regarding another.
– Brent Moses, Commented Feb 25, 2014 at 13:53

Linuxios · Accepted Answer · 2014-01-20 20:11:56Z

3

Try this:

strings.map! {|s| s.split('_').first}
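
A minimal sketch of how this behaves, assuming strings is an Array of Strings. The sample values are made up for illustration; only "height_low" appears in the question. Note that map! changes the array in place, while map returns a new array.

```ruby
# Hypothetical sample data; the question only mentions "height_low".
strings = ["height_low", "height_high", "width"]

# map returns a new array and leaves the original untouched.
trimmed = strings.map { |s| s.split('_').first }

# map! replaces each element of the original array in place.
strings.map! { |s| s.split('_').first }
```

A string without an underscore, such as "width", comes back unchanged because split returns a one-element array.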


3 Comments

Ajedi32 Over a year ago

@lipanski I'm guessing Linuxios saw "I have a grouping of string variables", and assumed Brent was talking about an Array.

lipanski Over a year ago

Here strings is an Array of Strings. It's important to mention that, because an individual String can be indexed like an Array but doesn't respond to map.

Brent Moses Over a year ago

Thanks. Your answer got me where I needed to be.

lipanski · Accepted Answer · 2014-01-20 20:19:03Z

1

Shorter:

my_string.split('_').first


1 Comment

Brent Moses Over a year ago

All very good answers but yours gets the check because it's the version I used. I actually was looping over a hash and needed to convert each key to a string and then cut off the end of each. I used x.to_s.split('_').first where x is the hash key to accomplish this. Thanks guys.
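
A rough sketch of the hash-key case described in this comment. The hash contents and variable names are hypothetical, since the real data was not posted; only the x.to_s.split('_').first expression comes from the comment.

```ruby
# Hypothetical hash with symbol keys; the real data was not posted.
limits = { height_low: 10, height_high: 20, width_low: 5 }

# Convert each key to a String, then keep the part before the first underscore.
prefixes = limits.keys.map { |key| key.to_s.split('_').first }

# uniq drops repeated prefixes if only the distinct names are needed.
distinct = prefixes.uniq
```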

Marcelo De Polli · Accepted Answer · 2014-01-20 20:23:17Z

1

The unavoidable regex answer. (Assuming strings is an array of strings.)

strings.map! { |s| s[/^.+?(?=_)/] }


jonahb · Accepted Answer · 2014-01-20 21:16:26Z

FWIW, solutions based on String#split perform poorly because they have to parse the whole string and allocate an array. Their performance degrades as the number of underscores increases. The following performs better:

string[0, string.index("_") || string.length]

Benchmark results (with number of underscores in parentheses):

                       user     system      total        real
String#split (0)   0.640000   0.000000   0.640000 (  0.650323)
String#split (1)   0.760000   0.000000   0.760000 (  0.759951)
String#split (9)   2.180000   0.010000   2.190000 (  2.192356)
String#index (0)   0.610000   0.000000   0.610000 (  0.625972)
String#index (1)   0.580000   0.010000   0.590000 (  0.589463)
String#index (9)   0.600000   0.000000   0.600000 (  0.605253)

Benchmarks:

require "benchmark"

# Inputs with 0, 1, and 9 underscores.
strings = ["x", "x_x", "x_x_x_x_x_x_x_x_x_x"]

Benchmark.bm(16) do |bm|
  strings.each do |string|
    bm.report("String#split (#{string.count("_")})") do
      1000000.times { string.split("_").first }
    end
  end
  strings.each do |string|
    bm.report("String#index (#{string.count("_")})") do
      1000000.times { string[0, string.index("_") || string.length] }
    end
  end
end
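
For reuse, the String#index expression from this answer could be wrapped in a small helper. This is only a sketch: the method name before_underscore and the sample inputs are made up for illustration.

```ruby
# Hypothetical helper name, for illustration only.
def before_underscore(string)
  # index returns nil when there is no underscore, so fall back to the full length.
  string[0, string.index("_") || string.length]
end

results = ["height_low", "height", "_low", "a_b_c"].map { |s| before_underscore(s) }
```

With these inputs, a string that has no underscore is returned whole, and a string that starts with an underscore yields an empty string.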

Arup Rakshit · Accepted Answer · 2014-01-20 21:27:14Z

1

Try the following, using str[regexp, capture] → new_str or nil:

If a Regexp is supplied, the matching portion of the string is returned. If a capture, which may be a capture group index or name, follows the regular expression, that component of the MatchData is returned instead.

strings.map { |s| s[/(.*?)_.*$/, 1] }

edited Jan 20, 2014 at 21:27

answered Jan 20, 2014 at 20:28


1 Comment

the Tin Man Over a year ago

Beware of greediness in the regex engine: 'a_b_c'[/(.*)_.*$/, 1] # => "a_b". You can fix that using: 'a_b_c'[/(.*?)_.*$/, 1] # => "a" but anchoring and simplification will do as well: 'a_b_c'[/^(.*?)_/, 1] # => "a". When writing patterns, don't add more than necessary, otherwise you can cause the engine to waste a lot of CPU on unnecessary work.

Ajedi32 · Accepted Answer · 2014-01-20 20:29:51Z

0

If you're looking for something "like gsub", why not just use gsub?

"height_low".gsub(/_.*$/, "") #=> "height"

In my opinion though, this is a bit cleaner:

"height_low".split('_').first #=> "height"

Another option is to use partition:

"height_low".partition("_").first #=> "height"

edited Jan 20, 2014 at 20:29

answered Jan 20, 2014 at 20:22
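
The three one-liners in this answer agree on typical input but not on every edge case. The following is a small comparison sketch; the extra inputs are chosen for illustration, and only "height_low" comes from the question.

```ruby
# Sample inputs chosen for illustration; only "height_low" comes from the question.
inputs = ["height_low", "a_b_c", "height", ""]

with_gsub      = inputs.map { |s| s.gsub(/_.*$/, "") }
with_split     = inputs.map { |s| s.split('_').first }
with_partition = inputs.map { |s| s.partition("_").first }
```

The only difference here is the empty string: split returns an empty array for it, so first yields nil, while gsub and partition both give back an empty string.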


the Tin Man · Accepted Answer · 2014-01-20 23:37:36Z

Learn to think in terms of searches vs. replacements. It's usually easier, faster, and cleaner to search for, and extract, what you want, than it is to search for, and strip, what you don't want.

Consider this:

'a_b_c'[/^(.*?)_/, 1] # => "a"

It looks for only what you want, which is the text from the start of the string, up to _. Everything preceding _ is captured, and returned.

The alternatives:

'a_b_c'.sub(/_.+$/, '')  # => "a"
'a_b_c'.gsub(/_.+$/, '') # => "a"

have to look backwards until the engine is sure there are no more _, then the string can be truncated.

Here's a little benchmark showing how that affects things:

require 'fruity'

compare do
  string_capture { 'a_b_c'[/^(.*?)_/, 1]                                   }
  string_sub     { 'a_b_c'.sub(/_.+$/, '')                                 }
  string_gsub    { 'a_b_c'.gsub(/_.+$/, '')                                }
  look_ahead     { 'a_b_c'[/^.+?(?=_)/]                                    }
  string_index   { 'a_b_c'[0, 'a_b_c'.index("_") || 'a_b_c'.length]        }
end

# >> Running each test 8192 times. Test will take about 1 second.
# >> string_index is faster than string_capture by 19.999999999999996% ± 10.0%
# >> string_capture is similar to look_ahead
# >> look_ahead is faster than string_sub by 70.0% ± 10.0%
# >> string_sub is faster than string_gsub by 2.9x ± 0.1

Again, searching is going to be faster than any sort of replace, so think about what you're doing.

The downside of the "search" regex-based tactics like "string_capture" and "look_ahead" is that they return nil when there is no _ in the string. If there's any question whether your string will have a _, use the "string_index" method, which falls back to string.length and grabs the entire string.
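
A short sketch of that failure mode, using a made-up input with no underscore. The || s fallback at the end is one possible workaround and is not part of the original benchmark.

```ruby
s = "height" # hypothetical input with no underscore

# Both search patterns require a "_" somewhere, so neither can match here.
capture    = s[/^(.*?)_/, 1]
look_ahead = s[/^.+?(?=_)/]

# One possible workaround: fall back to the whole string when the match is nil.
capture_or_whole = s[/^(.*?)_/, 1] || s
```

Both capture and look_ahead are nil for this input, so any later string method called on them would raise NoMethodError unless a fallback like the last line is used.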
