Splitting one big CSV file into multiple smaller files without parsing it

How do you turn a 300 MB CSV into three 100 MB files?
Cut and slice with head/tail and add the header on top!
Code

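require 'rubygems'
require 'rake'
require 'shellwords'

# Split a giga-CSV into file_count smaller files, each starting with the header.
# The file is sliced by raw lines and never parsed, so a quoted field that
# contains a newline would be cut in the wrong place.
def self.split_csv(original, file_count)
  header_lines = 1
  source = Shellwords.escape(original) # survive spaces in the path
  lines = `wc -l < #{source}`.to_i - header_lines
  # round up so the remainder rows are not dropped
  lines_per_file = (lines + file_count - 1) / file_count
  header = `head -n #{header_lines} #{source}`

  start = header_lines + 1 # 1-based line number of the first data row
  generated_files = []
  file_count.times do |i|
    file = "#{original}-#{i}.csv"

    File.open(file, 'w') { |f| f.write header }
    # tail -n +K prints from line K onward; head keeps one chunk of that
    sh "tail -n +#{start} #{source} | head -n #{lines_per_file} >> #{Shellwords.escape(file)}"

    start += lines_per_file
    generated_files << file
  end

  generated_files
end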
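
If head, tail and wc are not available (for example on Windows), the same idea can be sketched in plain Ruby by streaming the file line by line. This is only a rough variant, not the method above: the name split_csv_streaming and the header_lines parameter are made up for illustration, and it still assumes every record sits on exactly one line.

```ruby
# Hypothetical pure-Ruby variant of the head/tail approach (name is illustrative).
# Streams the input once to count lines and once to copy them, never parsing CSV.
def split_csv_streaming(original, file_count, header_lines = 1)
  # First pass: count data lines without loading the file into memory.
  total = File.foreach(original).count - header_lines
  # Round up so the remainder rows end up in the earlier files.
  lines_per_file = (total + file_count - 1) / file_count

  generated_files = []
  File.open(original) do |input|
    # Read the header once; it is written on top of every output file.
    header = Array.new(header_lines) { input.gets }.join

    file_count.times do |i|
      file = "#{original}-#{i}.csv"
      File.open(file, 'w') do |out|
        out.write header
        lines_per_file.times do
          line = input.gets
          break if line.nil? # input exhausted, later files get only the header
          out.write line
        end
      end
      generated_files << file
    end
  end

  generated_files
end
```

Calling split_csv_streaming('big.csv', 3) on a hypothetical big.csv would write big.csv-0.csv, big.csv-1.csv and big.csv-2.csv and return those names, following the same naming pattern as the code above.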