Splitting 1 big CSV file into multiple smaller ones without parsing it

How do you turn a 300 MB CSV into 3 x 100 MB files?
Cut and slice with head/tail and put the header on top of each piece!
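
To make the slicing concrete, here is a minimal sketch of the line arithmetic, assuming a single header line and 1-based line numbers (the way `tail -n +K` counts them). The helper name `slice_starts` is hypothetical and only illustrates where each piece begins; it does not touch any file.

```ruby
# Hypothetical helper: for each piece, the 1-based line number where it
# starts in the original file and the maximum number of data lines it takes.
def slice_starts(total_lines, file_count, header_lines = 1)
  data_lines = total_lines - header_lines
  per_file = (data_lines / file_count.to_f).ceil
  file_count.times.map do |i|
    # skip the header, then skip the data lines taken by earlier pieces
    { first_line: header_lines + 1 + i * per_file, max_lines: per_file }
  end
end

# 10 lines total = 1 header + 9 data lines, split into 3 pieces
slice_starts(10, 3).each do |s|
  puts "tail -n +#{s[:first_line]} big.csv | head -n #{s[:max_lines]}"
end
```

Each printed command selects one piece's data lines; the header still has to be written to the top of every output file separately.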
Code

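require 'rake' # for `sh` helper

# split giga-csv into n smaller files
# works on raw lines, so quoted fields must not contain newlines
def split_csv(original, file_count)
  header_lines = 1
  # wc -l counts newline characters, so the file should end with one
  lines = Integer(`cat #{original} | wc -l`) - header_lines
  # data lines per file; the header is written separately
  lines_per_file = (lines / file_count.to_f).ceil
  header = `head -n #{header_lines} #{original}`

  start = header_lines # number of lines to skip before the current slice
  file_count.times.map do |i|
    file = "#{original}-#{i}.csv"

    File.write(file, header)
    # tail -n +K starts output at line K (1-based), so this skips `start` lines
    sh "tail -n +#{start + 1} #{original} | head -n #{lines_per_file} >> #{file}"

    start += lines_per_file
    file
  end
end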
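
Where `head` and `tail` are not available, the same line-based split can be sketched in plain Ruby. This is a swapped-in variant rather than the method above: it reads the file twice (once to count lines, once to copy them), and the name `split_csv_streaming` and its `header_lines:` keyword are hypothetical. Like the shell version, it assumes no quoted field contains a newline.

```ruby
# Sketch: split a CSV by lines in plain Ruby, streaming one line at a time.
def split_csv_streaming(original, file_count, header_lines: 1)
  data_lines = File.foreach(original).count - header_lines
  per_file = [(data_lines / file_count.to_f).ceil, 1].max

  File.open(original) do |input|
    # read the header once so it can be repeated on top of every piece
    header = Array.new(header_lines) { input.gets }.join

    file_count.times.map do |i|
      file = "#{original}-#{i}.csv"
      File.open(file, "w") do |out|
        out.write(header)
        per_file.times do
          line = input.gets or break # stop early when the input runs out
          out.write(line)
        end
      end
      file
    end
  end
end
```

Calling `split_csv_streaming("big.csv", 3)` would write `big.csv-0.csv`, `big.csv-1.csv` and `big.csv-2.csv`, following the naming used above, and return those file names.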

2 thoughts on “Splitting 1 big CSV file into multiple smaller ones without parsing it”

  1. I like this script very much. It works fine for me. In the first file, I lose the first line. All other files and headers look fine; only the first line from the original is missing. Maybe because of the header? Anyway, thanks for the help!
