Ruby String Naive Split because split is too clever

Problem

"aaa".split('a') == []
"aaa".split('a').join('a') == ""

The standard split tries to be ‘clever’: it silently drops trailing empty strings, so its result is neither predictable nor symmetric to join. As a fix, here is a naive alternative that behaves ‘dumb’ but logically.
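
A few more cases may make the asymmetry easier to see. This is only an illustrative sketch; the variable names are made up for the example, and the results in the comments are what plain Ruby String#split returns for these inputs.

```ruby
# Trailing empty fields are dropped by default ...
trailing = "a,b,,".split(',')          # ["a", "b"]

# ... while leading empty fields are kept.
leading = ",,a,b".split(',')           # ["", "", "a", "b"]

# A single-space String separator switches to awk-style whitespace splitting.
whitespace = " a  b ".split(' ')       # ["a", "b"]

# Because of the dropped fields, join cannot restore the original string.
round_trip = trailing.join(',')        # "a,b", not "a,b,,"

p trailing, leading, whitespace, round_trip
```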

Solution

class String
  # https://grosser.it/2011/08/28/ruby-string-naive-split-because-split-is-to-clever/
  # "    ".split(' ') == []
  # "    ".naive_split(' ') == ['','','','','']
  # "".split(' ') == []
  # "".naive_split(' ') == ['']
  def naive_split(pattern)
    pattern = /#{Regexp.escape(pattern)}/ unless pattern.is_a?(Regexp)
    result = split(pattern, -1)
    result.empty? ? [''] : result
  end
end
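
As a rough check of the symmetry claim, the sketch below splits a few sample strings and joins them again. The method is repeated so the snippet runs on its own, and the sample inputs are chosen only for illustration. It assumes a plain, non-empty String separator; with a Regexp pattern there is no single separator to join with.

```ruby
# Repeated from the post so this snippet is self-contained.
class String
  def naive_split(pattern)
    pattern = /#{Regexp.escape(pattern)}/ unless pattern.is_a?(Regexp)
    result = split(pattern, -1)
    result.empty? ? [''] : result
  end
end

# Hypothetical sample inputs, each paired with its separator.
samples = { "aaa" => "a", "a,b,," => ",", "" => ",", "hello?" => "?" }

samples.each do |input, separator|
  parts = input.naive_split(separator)
  restored = parts.join(separator)
  # Each restored string should equal its input.
  puts "#{input.inspect} -> #{parts.inspect} -> #{restored.inspect}"
end
```

For example, "aaa".naive_split("a") gives ["", "", "", ""], and joining that with "a" gives "aaa" again.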

5 thoughts on “Ruby String Naive Split because split is too clever”

Gregor Schmidt says:

2011-08-28 at 18:16:52

I’ve got a simpler implementation for you:

class String
  def naive_split(pattern)
    split(pattern, -1)
  end
end

Or in other words, it is already part of the standard library, though a bit hidden.

- pragmatig says:
  
  2011-08-29 at 6:44:02
  
  That would not work:
  " ".split(' ', -1) == [""]
  
Kieran P (@k776) says:

2011-08-28 at 20:09:17

You probably need to escape that regexp.

pattern = /#{Regexp.escape(pattern)}/ unless pattern.is_a?(Regexp)

If you don’t, I’m pretty sure this’ll break it:

"hello?".naive_split('?')

- pragmatig says:
  
  2011-08-29 at 6:45:31
  
  Thanks, just added it 🙂
  
karatedog says:

2015-12-30 at 20:53:37

Both #split and #join accept an empty separator (''). However, #split consumes the separator string and #scan does not, so you can either #scan for 'a' or #split on the empty string '':
> "aaa".scan('a')
=> ["a", "a", "a"]
or
> "aaa".split('')
=> ["a", "a", "a"]
#join inserts new characters when joining, so .join('a') inserts extra 'a's into the string (which contradicts symmetry). You can join with the empty string instead, so whichever of the two you use, the reverse is .join('') and not .join('a').
