I have a group of string variables with values like "height_low". I want to use something clean, like gsub, to get rid of the underscore and everything after it, so the result is "height". Does someone have a solution for this? Thanks.
Wrong question. Thanks for the help but I posted all relevant information. If you have any ideas I would be glad to hear. – Brent Moses, Feb 25, 2014 at 13:50
Yes I understood you. Thanks. I was just wondering why you commented on this question regarding another. – Brent Moses, Feb 25, 2014 at 13:53
7 Answers
Try this:
strings.map! {|s| s.split('_').first}
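As a minimal sketch of what that does (the sample values and the variable names below are made up for illustration), map returns a new array while map! changes the array in place. Note the edge cases: a string without an underscore comes back unchanged, but an empty string yields nil because split returns an empty array.

```ruby
# Sample data is hypothetical; only "height_low" comes from the question.
strings = ["height_low", "width_high", "depth"]

# Non-destructive: builds a new array and leaves `strings` alone.
trimmed = strings.map { |s| s.split('_').first }

# Edge cases worth knowing about before relying on split.
leading = "_low".split('_').first   # empty string: the leading empty field is kept
empty   = "".split('_').first       # nil: split returns [] for an empty string

# Destructive: replaces the elements of `strings` themselves.
strings.map! { |s| s.split('_').first }
```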
Here, strings is an Array of Strings. That is important to mention, because an individual String does not respond to map.
Shorter:
my_string.split('_').first
FWIW, solutions based on String#split perform poorly because they have to parse the whole string and allocate an array. Their performance degrades as the number of underscores increases. The following performs better:
string[0, string.index("_") || string.length]
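If it helps, the expression can be wrapped in a small helper. This is only a sketch: the method name prefix_before_underscore is hypothetical, not something from a library. It relies on String#index returning nil when there is no underscore, in which case the whole string is kept.

```ruby
# Hypothetical helper name, chosen for illustration only.
def prefix_before_underscore(string)
  # index returns the position of the first "_" or nil if there is none;
  # falling back to the full length keeps the whole string in that case.
  string[0, string.index("_") || string.length]
end

with_suffix    = prefix_before_underscore("height_low")  # one underscore
many_parts     = prefix_before_underscore("a_b_c")       # cuts at the first one
no_underscore  = prefix_before_underscore("height")      # returned unchanged
leading_under  = prefix_before_underscore("_low")        # nothing before the "_"
```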
Benchmark results (with number of underscores in parentheses):
                       user     system      total        real
String#split (0)   0.640000   0.000000   0.640000 (  0.650323)
String#split (1)   0.760000   0.000000   0.760000 (  0.759951)
String#split (9)   2.180000   0.010000   2.190000 (  2.192356)
String#index (0)   0.610000   0.000000   0.610000 (  0.625972)
String#index (1)   0.580000   0.010000   0.590000 (  0.589463)
String#index (9)   0.600000   0.000000   0.600000 (  0.605253)
Benchmarks:
require 'benchmark'

# Test strings with 0, 1 and 9 underscores.
strings = ["x", "x_x", "x_x_x_x_x_x_x_x_x_x"]
Benchmark.bm(16) do |bm|
  strings.each do |string|
    bm.report("String#split (#{string.count("_")})") do
      1000000.times { string.split("_").first }
    end
  end
  strings.each do |string|
    bm.report("String#index (#{string.count("_")})") do
      1000000.times { string[0, string.index("_") || string.length] }
    end
  end
end
Try as below using str[regexp, capture] → new_str or nil:
If a Regexp is supplied, the matching portion of the string is returned. If a capture, which may be a capture group index or name, follows the regular expression, that component of the MatchData is returned instead.
strings.map { |s| s[/(.*?)_.*$/,1] }
'a_b_c'[/(.*)_.*$/, 1] # => "a_b". You can fix that using: 'a_b_c'[/(.*?)_.*$/, 1] # => "a" but anchoring and simplification will do as well: 'a_b_c'[/^(.*?)_/, 1] # => "a". When writing patterns, don't add more than necessary, otherwise the engine can waste a lot of CPU time on needless work.
Learn to think in terms of searches vs. replacements. It's usually easier, faster, and cleaner to search for, and extract, what you want than it is to search for, and strip, what you don't want.
Consider this:
'a_b_c'[/^(.*?)_/, 1] # => "a"
It looks for only what you want, which is the text from the start of the string, up to _. Everything preceding _ is captured, and returned.
The alternatives:
'a_b_c'.sub(/_.+$/, '') # => "a"
'a_b_c'.gsub(/_.+$/, '') # => "a"
have to look backwards until the engine is sure there are no more _, then the string can be truncated.
Here's a little benchmark showing how that affects things:
require 'fruity'
compare do
  string_capture { 'a_b_c'[/^(.*?)_/, 1] }
  string_sub { 'a_b_c'.sub(/_.+$/, '') }
  string_gsub { 'a_b_c'.gsub(/_.+$/, '') }
  look_ahead { 'a_b_c'[/^.+?(?=_)/] }
  # s must be defined inside the block; the original referenced an undefined s.
  string_index { s = 'a_b_c'; s[0, s.index("_") || s.length] }
end
# >> Running each test 8192 times. Test will take about 1 second.
# >> string_index is faster than string_capture by 19.999999999999996% ± 10.0%
# >> string_capture is similar to look_ahead
# >> look_ahead is faster than string_sub by 70.0% ± 10.0%
# >> string_sub is faster than string_gsub by 2.9x ± 0.1
Again, searching is going to be faster than any sort of replace, so think about what you're doing.
The downside of the "search" regex-based tactics like "string_capture" and "look_ahead" is that they don't handle a missing _: they return nil instead of the original string. If there's any question whether your string will, or will not, have _, then use the "string_index" method, which falls back to string.length to grab the entire string.
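To make the missing-underscore case concrete, here is a small sketch. The variable names are made up, and the last pattern, a negated character class anchored at the start of the string, is an alternative that was not part of the benchmark above, so no claim is made here about its speed.

```ruby
with_underscore    = "height_low"
without_underscore = "height"

# The capture-based search finds nothing to match when "_" is absent.
captured_hit  = with_underscore[/^(.*?)_/, 1]
captured_miss = without_underscore[/^(.*?)_/, 1]   # nil, not "height"

# The index-based slice keeps the whole string when "_" is absent.
index_miss = without_underscore[0, without_underscore.index("_") || without_underscore.length]

# Alternative pattern (an assumption, not from the answer above):
# match everything from the start that is not an underscore.
class_hit  = with_underscore[/\A[^_]*/]
class_miss = without_underscore[/\A[^_]*/]
```

Whichever form you pick, decide up front what a string without an underscore should produce, since nil and the unchanged string behave very differently downstream.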