Importing to Postgresql from ActiveRecord and Rails 4

Posted by scientific on February 16, 2007


I use Postgresql and Rails. I use Postgres because I like a number of features more than MySQL – it has GiST indexes, and its SQL interpreter feels more robust to me – there are some valid SQL statements that don’t work in MySQL that do work in Postgres.

I have a need to load large delimited text files into my database periodically. There are many ways to do this slowly but only one way to do it quickly: “COPY FROM” – this command makes loading data into Postgres almost unbelievably fast. Try it! This article provides code for using the COPY FROM command with Rails and ActiveRecord..
Essentially, in order to use COPY FROM you either need admin rights on Postgres or you have to feed the data to Pg from STDIN (as opposed to loading it from a file). The following code shows how to load a file and feed it to Pg as STDIN over an ActiveRecord raw_connection. Because I’m using raw_connection, this will only work for Postgres but there are probably equivalent ways to accomplish this for MySQL and other databases as well. Happy file loading!

module TableLoader
    class CoreERR_SQLError < StandardError; end
    def TableLoader.copy_from_file_to_table(table_name, field_names, import_file_path, options = {})
      delete_existing_data = options[:delete_existing_data] || false
      sql_parameters = options[:sql_parameters] || ''
      skip_header_row = options[:skip_header_row] || false
      File::open(import_file_path, "r") do |import_file|
        # eat first line if there is a header row
        import_file::gets if skip_header_row
        # CUSTOMSQL create dynamic import sql code using 'COPY FROM' sql statement
        import_sql = ''
        import_sql += "TRUNCATE TABLE #{table_name}; " if delete_existing_data
        import_sql += "COPY #{table_name} "
        import_sql += "(#{field_names})"
        import_sql += "FROM STDIN #{sql_parameters}; "
        # exit if there are no data rows to process
        if import_file.eof? : return false; end
        # CUSTOMSQL we obtain a raw PGconn SQL connection to the database so we can ship data
        # directly to STDIN on the Postgres SQL connection
        raw_conn = ActiveRecord::Base.connection.raw_connection
        # execute the import SQL, which will leave the connection open so we can ship
        # raw records directly to STDIN on the server, via PGconn.putline command (below)
        raw_conn.exec(import_sql)

        # write all data rows to sql server - we write one line at a time to make debugging easier
        loop = TableLoader::get_line_from_file(import_file)
        while !loop.blank?
          raw_conn.putline(loop) if !loop.blank?
          loop = TableLoader::get_line_from_file(import_file)
        end
        # this alternative writes the entire file to disk at once
        # raw_conn.putline(import_file::gets(nil))
        raw_conn.putline(".\n") # send Postgres EOF signal: .<new_line>
        raw_conn.endcopy # tell the driver we are done sending data
        if raw_conn.status != 0
          raise CoreERR_SQLError, "SQL Server reports error code of #{conn.status}. Status code should have been 0"
        end
      end  # File::open
    end

    # return each line with the newline value for the platform in question
    # we strip any newlines from the end of each line and replace them with
    # Ruby "\n" which should be platform specific
    def TableLoader::get_line_from_file(file_handle)
      retval = file_handle.gets
      if retval
        retval.chomp!
        retval += "\n"
      end
    end
end
Trackbacks

Use this link to trackback from your own site.

Comments

Leave a response

  1. Ben W Thu, 20 Mar 2008 13:07:24 UTC

    Highly useful!

    Thanks for posting.

    (Now if only my parsing code didn’t suck!)

  2. Paolo Montrasio Mon, 26 May 2008 04:46:55 UTC

    I didn’t know about raw_connection and putline/endcopy. I was struggling against COPY from within Rails and it saved me a lot of time. Thanks!

  3. Keith Carter Fri, 04 Jul 2008 12:01:40 UTC

    Putline no longer exists in rails 2.1. Any one know what to do?

  4. science Fri, 04 Jul 2008 18:01:04 UTC

    Heya Keith,

    Putline isn’t part of the ActiveRecord spec for Postgres but the “C” version of the adapter still supports it:

    http://ruby.scripting.ca/postgres/rdoc/

    Other Pg Ruby adapters may support it also, not sure. Let me know if you can’t get it working. -S

Comments