
I use Postgresql and Rails. I use Postgres because I like a number of features more than MySQL - it has GiST indexes, and its SQL interpreter feels more robust to me - there are some valid SQL statements that don’t work in MySQL that do work in Postgres.
I have a need to load large delimited text files into my database periodically. There are many ways to do this slowly but only one way to do it quickly: “COPY FROM” - this command makes loading data into Postgres almost unbelievably fast. Try it! This article provides code for using the COPY FROM command with Rails and ActiveRecord..
Essentially, in order to use COPY FROM you either need admin rights on Postgres or you have to feed the data to Pg from STDIN (as opposed to loading it from a file). The following code shows how to load a file and feed it to Pg as STDIN over an ActiveRecord raw_connection. Because I’m using raw_connection, this will only work for Postgres but there are probably equivalent ways to accomplish this for MySQL and other databases as well. Happy file loading!
module TableLoaderclass CoreERR_SQLError < StandardError; enddef TableLoader.copy_from_file_to_table(table_name, field_names, import_file_path, options = {}) delete_existing_data = options[:delete_existing_data] || false sql_parameters = options[:sql_parameters] || ” skip_header_row = options[:skip_header_row] || false File::open(import_file_path, “r”) do |import_file| # eat first line if there is a header row import_file::gets if skip_header_row # CUSTOMSQL create dynamic import sql code using ‘COPY FROM’ sql statement import_sql = ” import_sql += “TRUNCATE TABLE #{table_name}; ” if delete_existing_data import_sql += “COPY #{table_name} ” import_sql += “(#{field_names})” import_sql += “FROM STDIN #{sql_parameters}; ” # exit if there are no data rows to process if import_file.eof? : return false; end # CUSTOMSQL we obtain a raw PGconn SQL connection to the database so we can ship data # directly to STDIN on the Postgres SQL connection raw_conn = ActiveRecord::Base.connection.raw_connection # execute the import SQL, which will leave the connection open so we can ship # raw records directly to STDIN on the server, via PGconn.putline command (below) raw_conn.exec(import_sql) # write all data rows to sql server - we write one line at a time to make debugging easier loop = TableLoader::get_line_from_file(import_file) while !loop.blank? raw_conn.putline(loop) if !loop.blank? loop = TableLoader::get_line_from_file(import_file) end # this alternative writes the entire file to disk at once # raw_conn.putline(import_file::gets(nil)) raw_conn.putline(”.\n”) # send Postgres EOF signal: .<new_line> raw_conn.endcopy # tell the driver we are done sending data if raw_conn.status != 0 raise CoreERR_SQLError, “SQL Server reports error code of #{conn.status}. Status code should have been 0″ end end # File::open end # return each line with the newline value for the platform in question # we strip any newlines from the end of each line and replace them with # Ruby “\n” which should be platform specific def TableLoader::get_line_from_file(file_handle) retval = file_handle.gets if retval retval.chomp! retval += “\n” end end end
Highly useful!
Thanks for posting.
(Now if only my parsing code didn’t suck!)
I didn’t know about raw_connection and putline/endcopy. I was struggling against COPY from within Rails and it saved me a lot of time. Thanks!
Putline no longer exists in rails 2.1. Any one know what to do?
Heya Keith,
Putline isn’t part of the ActiveRecord spec for Postgres but the “C” version of the adapter still supports it:
http://ruby.scripting.ca/postgres/rdoc/
Other Pg Ruby adapters may support it also, not sure. Let me know if you can’t get it working. -S