Splitting 1 big CSV file into multiple smaller ones without parsing it

How do you turn a 300 MB CSV into 3 x 100 MB files?
Cut and slice with head/tail and add the header on top of each piece!
Code

require 'rake' # for `sh` helper

# split giga-csv into n smaller files
def split_csv(original, file_count)
  header_lines = 1
  # wc -l counts newlines, so this assumes the file ends with one
  lines = Integer(`cat #{original} | wc -l`) - header_lines
  lines_per_file = (lines / file_count.to_f).ceil
  header = `head -n #{header_lines} #{original}`

  start = header_lines + 1 # 1-based number of the first data line
  file_count.times.map do |i|
    file = "#{original}-#{i}.csv"

    File.write(file, header)
    # tail -n +N starts output at line N; head then takes one chunk
    sh "tail -n +#{start} #{original} | head -n #{lines_per_file} >> #{file}"

    start += lines_per_file
    file
  end
end
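
Where head/tail are not available, the same idea (copy raw lines, repeat the header in every piece) can be sketched in plain Ruby. This is only a sketch and not the method above: split_csv_lines is a hypothetical name, and like the shell version it assumes a single header line and no newlines inside quoted fields.

```ruby
# Hypothetical pure-Ruby variant: splits by raw lines, never parses fields.
# Assumes one header line and no newlines inside quoted fields.
def split_csv_lines(original, file_count)
  data_lines = File.foreach(original).count - 1 # streams the file once
  lines_per_file = [(data_lines / file_count.to_f).ceil, 1].max
  files = []

  File.open(original) do |input|
    header = input.gets.to_s # first line, copied into every piece
    file_count.times do |i|
      name = "#{original}-#{i}.csv"
      File.open(name, 'w') do |out|
        out.write(header)
        lines_per_file.times do
          line = input.gets
          break if line.nil? # input exhausted
          out.write(line)
        end
      end
      files << name
    end
  end

  files
end
```

Because the input is read once with gets, each piece picks up exactly where the previous one stopped, so no line offsets need to be computed.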

Ruby String Naive Split because split is too clever

Problem

"aaa".split('a') == []
"aaa".split('a').join('a') == ""

Standard split is often ‘clever’: it drops trailing empty strings, so it is neither logical nor symmetric with join. To fix this, here is a naive alternative that behaves ‘dumb’ but logically and keeps every empty field.

Solution

class String
  # https://grosser.it/2011/08/28/ruby-string-naive-split-because-split-is-to-clever/
  # "    ".split(' ') == []
  # "    ".naive_split(' ') == ['','','','','']
  # "".split(' ') == []
  # "".naive_split(' ') == ['']
  def naive_split(pattern)
    pattern = /#{Regexp.escape(pattern)}/ unless pattern.is_a?(Regexp)
    result = split(pattern, -1)
    result.empty? ? [''] : result
  end
end
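
To see the symmetry with join without patching String, here is a small sketch that uses only the built-in split and its limit argument. round_trips? is a hypothetical helper introduced for illustration; a limit of 0 is the default behavior, and -1 keeps trailing empty fields.

```ruby
# Hypothetical helper: does split followed by join restore the input?
# limit 0 is split's default; limit -1 keeps trailing empty strings.
def round_trips?(string, separator, limit = 0)
  string.split(separator, limit).join(separator) == string
end

pairs = [['aaa', 'a'], ['a,b,,', ','], ['a,b', ',']]
pairs.each do |string, separator|
  default = round_trips?(string, separator)
  naive = round_trips?(string, separator, -1)
  puts "#{string.inspect}: default=#{default} limit -1=#{naive}"
end
```

Only the inputs with trailing separators fail the round trip with the default limit; with -1 all three survive.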

Ruby Hash leaves (leafs)

Get all leaves of a Hash (like recursive values).

Usage
{:x => 1, :y => {:z => 2}}.leaves == [1,2]

Code

class Hash
  # {'x'=>{'y'=>{'z'=>1,'a'=>2}}}.leaves == [1,2]
  def leaves
    leaves = []

    each_value do |value|
      value.is_a?(Hash) ? value.leaves.each{|l| leaves << l } : leaves << value
    end

    leaves
  end
end
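
If reopening Hash is not wanted, the same recursion can be sketched as a standalone method built on flat_map. hash_leaves is a hypothetical name; wrapping non-Hash values in a one-element array keeps Array values intact instead of flattening them.

```ruby
# Hypothetical standalone helper: collects leaf values without patching Hash.
def hash_leaves(hash)
  hash.values.flat_map do |value|
    # Recurse into nested hashes; wrap everything else so flat_map
    # adds it as a single element (arrays stay arrays).
    value.is_a?(Hash) ? hash_leaves(value) : [value]
  end
end

p hash_leaves({ x: 1, y: { z: 2 } })         # [1, 2]
p hash_leaves({ list: [1, 2], nested: {} })  # [[1, 2]]
```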

Ruby Array.diff(other) difference between 2 Arrays

Symmetric difference (^) is defined on Set, but not on Array, so we patch it in… (thanks to reto)
Usage
[1,2] ^ [2,3,4] == [1,3,4]

Code

class Array
  def ^(other)
    result = dup
    other.each{|e| result.include?(e) ? result.delete(e) : result.push(e) }
    result
  end unless method_defined?(:^)
  alias diff ^ unless method_defined?(:diff)
end

puts ([] ^ [1]).inspect          # [1]
puts ([1] ^ []).inspect          # [1]
puts ([1] ^ [2]).inspect         # [1, 2]
puts ([] ^ []).inspect           # []
puts ([1,1] ^ [1,1,2,2]).inspect # [1]

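For arrays without duplicates, the same could be done with (self | other) - (self & other), but that builds two intermediate arrays and would be less performant. With duplicates the results differ, because | and & collapse repeated elements.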
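
To compare the two approaches side by side, here is a small sketch with standalone copies of both, so nothing on Array is patched. toggle_diff and set_diff are hypothetical names used only for this comparison.

```ruby
# Standalone copy of the patched ^ logic: each element of b toggles
# its presence in the result.
def toggle_diff(a, b)
  result = a.dup
  b.each { |e| result.include?(e) ? result.delete(e) : result.push(e) }
  result
end

# Set-operator version: union minus intersection.
def set_diff(a, b)
  (a | b) - (a & b)
end

p toggle_diff([1, 2], [2, 3, 4])    # [1, 3, 4]
p set_diff([1, 2], [2, 3, 4])       # [1, 3, 4]
p toggle_diff([1, 1], [1, 1, 2, 2]) # [1]
p set_diff([1, 1], [1, 1, 2, 2])    # [2]
```

Both agree when neither array contains duplicates; once elements repeat, the toggling version and the set-operator version give different answers.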