Converting Rails apps from MySQL to PostgreSQL

Posted by Tom Copeland Wed, 20 Oct 2010 20:06:00 GMT

My roving Google bots turn up various notes on folks moving from MySQL to PostgreSQL. Here are a few accumulated entries:

  • Yangthaman describes the process of converting his app's database from MySQL to PostgreSQL. He notes that 'Postgres does not allow option for “text” data type' - in Rails 3 with the latest pg gem this works fine, though. He cites Heroku's PostgreSQL requirement as a motivator - this line of reasoning turned up a couple of times.

  • A detailed post from Mike Castleman recommends ensuring that config.active_record.schema_format is set to :ruby and has a pointer to SQL Fairy as a solution for complex schemas. He notes differences in quoting styles (PostgreSQL uses double quotes) and reminds the reader of ActiveRecord's quote_column_name, e.g., Book.connection.quote_column_name('when') #=> "\"when\"". He notes that PostgreSQL has a native boolean type, so if code was dependent on MySQL's tinyint columns it will need to be cleaned up. He mentions a few other changes they needed to make; definitely worth a read. Finally, they open sourced their data conversion rake task; good stuff!

  • minlev converted his database using Michael Siebert's AR-DBCopy. This utility creates ActiveRecord classes on the fly to copy records from one AR connection to another. He noted that this took 5 minutes for 100K records, so it's only really suited to smallish databases... but it has its place.

  • Mike Williamson explores some PostgreSQL interactions using the various command line tools - dropdb, psql, pg_dump, and so forth. He also notes the PostgreSQL native boolean type vs MySQL's tinyint.

  • StackOverflow has a long article with lots of resources on moving from MySQL to PostgreSQL. It includes a pointer to the PostgreSQL wiki page on switching from MySQL, which contains a variety of links to articles on the topic. There's also a less detailed article on more or less the same thing on StackOverflow here.

  • miles wrote up his notes on converting a MySQL Typo blog instance to PostgreSQL. The main hitch is the tinyint to boolean conversion, but he has the complete details of how he handled that.

  • Adam Wiggins (of Heroku fame) describes using taps to transfer data from a MySQL database to a PostgreSQL database. This tool results in about the same transfer speeds as AR-DBCopy, so it's not suitable for hefty databases... but, still. The taps git repo has had a bit of activity lately, so looks like that project is still going.

  • Harold Giménez wrote up a general overview of PostgreSQL's goodness, noting a variety of features like table partitioning, partial indexes, MVCC, tsearch2, PostGIS, and so on. He notes that replication is getting better soon... and since PostgreSQL 9.0 is out, soon is now here.
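Two of the hitches that recur in the posts above, identifier quoting and tinyint-versus-boolean columns, are small enough to sketch in plain Ruby. The helper names here are hypothetical stand-ins that imitate what the adapters do; in a real app you'd lean on the adapter's own quote_column_name and let your conversion task deal with the booleans.

```ruby
# Hypothetical helpers illustrating two MySQL -> PostgreSQL differences.
# They imitate adapter behavior; they are not ActiveRecord code.

# PostgreSQL quotes identifiers with double quotes (embedded quotes doubled).
def pg_quote_identifier(name)
  '"' + name.to_s.gsub('"', '""') + '"'
end

# MySQL quotes identifiers with backticks (embedded backticks doubled).
def mysql_quote_identifier(name)
  "`#{name.to_s.gsub('`', '``')}`"
end

# MySQL keeps Rails booleans in tinyint(1) columns as 0 and 1; PostgreSQL
# wants real booleans. NULL stays NULL; anything else is flagged as bad data.
def tinyint_to_boolean(value)
  case value
  when nil then nil
  when 0, '0', false then false
  when 1, '1', true then true
  else raise ArgumentError, "unexpected tinyint value: #{value.inspect}"
  end
end
```

Raising on unexpected values is a design choice for this sketch: code that stuffed 2 or -1 into a "boolean" tinyint column is exactly the kind of thing you want to find before the data lands in PostgreSQL.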

In summary, thanks to Heroku for providing a nudge for folks to switch to PostgreSQL!

Reducing log output from the Rails PostgreSQL connection adapter

Posted by Tom Copeland Wed, 06 Oct 2010 20:36:00 GMT

When you write a Rails app that uses PostgreSQL you'll probably see your development log filled with log entries like this:

 SQL (0.5ms)   SELECT a.attname, format_type(a.atttypid, a.atttypmod), d.adsrc, a.attnotnull
 FROM pg_attribute a LEFT JOIN pg_attrdef d
 ON a.attrelid = d.adrelid AND a.attnum = d.adnum
 WHERE a.attrelid = '"reading_lists"'::regclass
 AND a.attnum > 0 AND NOT a.attisdropped
 ORDER BY a.attnum

This is all boilerplate stuff and definitely not part of your app. So, how to get rid of it? Just use Evgeniy Dolzhenko's silent-postgres gem. Put this in your Rails 3 Gemfile (or add a config.gem entry in your environment.rb file for Rails 2):

group :development, :test do 
  gem 'silent-postgres'
end

And that's it. Cleaner logs, huzzah!

The gem itself is only about 60 lines of code. It creates an initializer that includes itself into ActiveRecord::ConnectionAdapters::PostgreSQLAdapter and uses alias_method_chain to silence output from a couple of methods - tables, pk_and_sequence_for, and a few others. Good stuff!
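For the curious, here's a rough sketch of that alias-chain pattern in plain Ruby. This is not the gem's source: FakeAdapter and SilenceSchemaQueries are made-up names, and since alias_method_chain comes from ActiveSupport, the two alias_method calls it would generate are written out by hand so the example runs on its own.

```ruby
require 'logger'
require 'stringio'

# Hypothetical stand-in for the PostgreSQL adapter: it logs the SQL it "runs".
class FakeAdapter
  attr_reader :logger

  def initialize(io)
    @logger = Logger.new(io)
  end

  def tables
    logger.debug('SELECT tablename FROM pg_tables')
    %w[books reading_lists]
  end
end

# Wraps #tables so that its log output is suppressed while it runs.
module SilenceSchemaQueries
  def self.included(base)
    base.class_eval do
      # These two lines are what alias_method_chain :tables, :silencer does.
      alias_method :tables_without_silencer, :tables
      alias_method :tables, :tables_with_silencer
    end
  end

  def tables_with_silencer(*args)
    old_level = logger.level
    logger.level = Logger::ERROR
    begin
      tables_without_silencer(*args)
    ensure
      logger.level = old_level # always restore the previous level
    end
  end
end

FakeAdapter.send(:include, SilenceSchemaQueries)
```

After the include, calling tables returns the same result as before but writes nothing to the log, and the original method is still reachable as tables_without_silencer.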

Rails 3, bundler, and the pg gem

Posted by Tom Copeland Thu, 16 Sep 2010 21:54:00 GMT

I was moving a Rails 2 app up to Rails 3 and, because I have PostgreSQL installed in a non-standard location on my server, I ran into problems when bundler was trying to install the pg gem. After fiddling about for a bit I ended up with this in my config/deploy.rb:

require 'bundler/capistrano'

task :set_config_for_pg_gem, :roles => [:app, :db] do
  run "cd #{current_path} && bundle config build.pg --with-pg-config=/usr/local/pgsql/bin/pg_config --no-rdoc --no-ri"
end

before "bundle:install", :set_config_for_pg_gem

This sets up the appropriate command line flags for the pg gem so that they're in place when Capistrano runs the bundle:install task. The --no-rdoc --no-ri part isn't necessary, but I figured it'll save a second or two. Note that these flags end up in the deploy user's home directory on the server:

$ cat ~/.bundle/config 
--- 
BUNDLE_BUILD__PG: --with-pg-config=/usr/local/pgsql/bin/pg_config --no-rdoc --no-ri

Running this task every time you deploy is a little wasteful since it sets the configuration unnecessarily - really you just need it before the first time you deploy. So you could optimize things by touching a file in shared/system/ or some such and checking it as part of this task.
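Here's a minimal sketch of that optimization, assuming a marker file under shared/system/. The file name .pg_gem_configured and the helper name are my own inventions; the helper just builds the shell command, so it can be tried outside of Capistrano.

```ruby
# Build a shell command that configures the pg gem only once per server.
# The marker file name is a hypothetical choice; any path under shared/ works.
def pg_gem_config_command(current_path, shared_path, pg_config)
  marker = "#{shared_path}/system/.pg_gem_configured"
  configure = "cd #{current_path} && " \
              "bundle config build.pg --with-pg-config=#{pg_config}"
  # `test -f` succeeds when the marker exists, so the right-hand side of `||`
  # (configure, then touch the marker) only runs when it is missing.
  "test -f #{marker} || (#{configure} && touch #{marker})"
end
```

Inside the task above, the run line would then become something like run pg_gem_config_command(current_path, shared_path, "/usr/local/pgsql/bin/pg_config"). If the pg_config location ever changes, delete the marker file so the setting gets rewritten.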

Rails and PostgreSQL job at Paperless Post

Posted by Tom Copeland Tue, 10 Aug 2010 13:17:00 GMT

Another Rails and PostgreSQL job, this time at Paperless Post (via the Github jobs board). Looks like they have a few more open positions. And you'd work with Aaron Quint, who's their CTO. Good times!

Rails and PostgreSQL job at elevenlearning.com

Posted by Tom Copeland Fri, 06 Aug 2010 09:33:00 GMT

PGCon 2010 talk on Rails and PostgreSQL

Posted by Tom Copeland Tue, 03 Aug 2010 17:04:00 GMT

A while back I posted a link to a talk by Gleb Arshinov that he gave at the SF PUG. This talk was on "PostgreSQL for high performance Rails apps", and was full of fine suggestions from their experiences with their Rails apps.

Gleb is back again, this time on May 21 2010 at PGCon where he and Alexander Dymo talked about PostgreSQL as a secret weapon for Rails apps. Some of the same ground is covered (use SQL DDL vs ActiveRecord create_table, etc), but there's lots of new information too. Here are some notes:


  • 1:10 They're using PostgreSQL 8.4, nginx, and mongrel

  • 4:00-6:00 Talks about dropping down into SQL via ActiveRecord

  • 6:30 Use include to eliminate N+1 queries.

  • 7:30 Watch for things like acts_as_tree that reintroduce lots of queries in exchange for the improvement in abstraction.

  • 9:00 One query, 12 joins - complicated, but query time goes from 8 seconds to 60 ms.

  • 14:00-17:00 A technique for recording SQL queries; this helps ensure you're not running unexpected queries

  • 19:00 Suggests using straight SQL for DDL rather than the ActiveRecord DSL

  • 20:00 Use constraints, FKs, etc to preserve data integrity - "anything you don't have a constraint on will get corrupted"

  • 23:00 Don't use CASCADE since app won't know about the deletions

  • 28:00 Keep a log of times for the most frequent user requests. Alex suggests using integration tests for this; code is at 29:10 and 29:30.

  • 32:30 A technique for loading data with ActiveRecord's select option with PostgreSQL arrays to save on object creation. Questions from the audience about normalization vs efficiency.

  • 38:50 Role/user/privilege checking can be slow; shows a technique for using PostgreSQL's bool_or and GROUP BY to get the data in one fell swoop. Query time went from 2+ seconds to 64 ms.

  • 42:00 Do analytics in the database. Saw speed improve from 90s to 5s and saved tons of RAM.

  • 44:40 Some excellent new PostgreSQL features that are either here now or are on the way (replication, windowing functions)

  • 46:30 Demonstrates a problem with PostgreSQL's LIMIT and OFFSET when used with subselects. Some discussion of pagination with the audience. Here's an excellent discussion of pagination alternatives written by Justin French.

  • 50:30 How to force PostgreSQL to use a subselect vs a join; the example goes from 605 ms to 325 ms.

  • 52:20 Be careful with generate_series. Apparently these functions cannot generate hints for the planner.

  • 55:30 General props to PostgreSQL community.

  • 59:40 Need to test queries both in cold state and hot state; they saw 14x speed difference.

  • 1:01:40 Tune PostgreSQL - shared_buffers, work_mem, autovacuum, etc. Rely on community knowledge for initial configuration.
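To make the N+1 point (6:30) and the query-recording idea (14:00) a bit more concrete, here's a toy sketch. It uses a made-up in-memory RecordingDb rather than ActiveRecord or anything from the talk, purely so the query counts are visible without a database.

```ruby
# Toy in-memory "database" that records every query it is asked to run.
# It stands in for ActiveRecord purely to make the query counts visible.
class RecordingDb
  attr_reader :queries

  def initialize(authors, books)
    @authors = authors # array of hashes with :id and :name
    @books = books     # array of hashes with :author_id and :title
    @queries = []
  end

  def all_authors
    @queries << 'SELECT * FROM authors'
    @authors
  end

  def books_for(author_id)
    @queries << "SELECT * FROM books WHERE author_id = #{author_id}"
    @books.select { |b| b[:author_id] == author_id }
  end

  def books_for_all(author_ids)
    @queries << "SELECT * FROM books WHERE author_id IN (#{author_ids.join(', ')})"
    @books.select { |b| author_ids.include?(b[:author_id]) }
  end
end

# N+1: one query for the authors, then one more per author.
def titles_n_plus_one(db)
  db.all_authors.map do |a|
    [a[:name], db.books_for(a[:id]).map { |b| b[:title] }]
  end.to_h
end

# Eager loading: two queries in total; the grouping happens in Ruby.
def titles_eager(db)
  authors = db.all_authors
  by_author = db.books_for_all(authors.map { |a| a[:id] })
                .group_by { |b| b[:author_id] }
  authors.map do |a|
    [a[:name], (by_author[a[:id]] || []).map { |b| b[:title] }]
  end.to_h
end
```

Both methods return the same hash, but the first leaves one entry in db.queries per author plus one, while the second always leaves two. Checking the recorded list in a test is a cheap way to notice when an innocent-looking change reintroduces the extra queries.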

Lots of good stuff there, enjoy!

Pivotal Labs Talk - Scaling a Rails App with Postgres

Posted by Tom Copeland Fri, 23 Jul 2010 21:01:00 GMT

I'm slowly catching up with my podcast backlog and came across a Pivotal Labs talk from May 2009. In this talk Josh Susser and Damon McCormick present on Scaling a Rails App with Postgres. It's a little dated now - the talk was given when PostgreSQL 8.4 was in beta - but, still, lots of good stuff. Here are some notes:

  • They started with an existing Rails app with lots of data, so they had some constraints - not greenfield development.

  • Around the 5-6 minute mark there's a good discussion of PostgreSQL's query optimizer and how it analyzes a table's data distribution. One takeaway (mentioned around 16:20) is to run vacuum more often on a particular table if there are a lot of writes.

  • 10:00 How to set STATISTICS for a particular table.

  • 11:00 Using partial indexes.

  • 14:00 Indexing on expressions.

  • 18:10-23:00 A nice discussion of the EXPLAIN output.

  • 23:45 Here they talk about wide columns. I've seen this in MySQL as well, where splitting text data out into a separate table yielded some good speedups.

  • 26:10 Some discussion of pg_bench.

  • 30:20 Discusses the PostgreSQL log analyzer pgFouine.

  • 35:30 How long does it take to add an index to large tables? They saw times of up to an hour for tables with millions of rows.

  • 36:30 Clustering your data in order to get PostgreSQL to write it more efficiently.

  • 37:30-48:00 A thorough discussion of partitioning tables via table inheritance. They used an ActiveRecord model (39:23) with a bunch of utility methods. They also had a cron job to periodically create new partitions. At 45:15 they make a nice distinction between using partial indexes and partitions - one advantage is that a partition's indexes can be different from its parent's indexes. At 49:00 they mention maybe doing a plugin; not sure if that happened.

  • 52:00 Some discussion of full text search via tsearch.

  • 53:00 PostgreSQL's lack of built in replication outside of WAL shipping, Slony, etc. Thank goodness 9.0 will address this!

  • 54:00 Some props to Engine Yard on their PostgreSQL support.
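As an illustration of the partition-creating cron job mentioned at 37:30, here's a rough sketch of a helper that builds the DDL for one monthly child table. It's not the speakers' code: the events table, the created_at column, and the child naming scheme are all assumptions.

```ruby
require 'date'

# Build the DDL for one monthly child table using table inheritance.
# `parent` and `column` are hypothetical, e.g. 'events' and 'created_at'.
def monthly_partition_sql(parent, column, year, month)
  first = Date.new(year, month, 1)
  next_first = first >> 1 # first day of the following month
  child = format('%s_%04d_%02d', parent, year, month)
  "CREATE TABLE #{child} (" \
    "CHECK (#{column} >= DATE '#{first}' AND #{column} < DATE '#{next_first}')" \
    ") INHERITS (#{parent});"
end
```

A cron-driven task could pass the result to ActiveRecord::Base.connection.execute a few days before each month starts. Note that this only creates the child table; routing inserts to the right child and adding per-partition indexes are separate steps that this sketch leaves out.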

Good stuff all around, and thanks to Pivotal for posting these great talks!

Intro to PostGIS

Posted by Doug Cole Thu, 05 Nov 2009 01:25:00 GMT

If you are planning on building a Rails application that uses spatial data in any way, then you owe it to yourself to take the time to investigate PostGIS. Out of the box you’ll be able to perform an array of powerful functions on your spatial data: from bounding boxes and distance queries to polygon area calculations.

Installation

I won’t go into setup in much detail. Under most Linux distributions PostGIS is a simple package install, and it’s in MacPorts if you’re a Mac user, so either way installation shouldn’t be too hard. For integration with Rails I recommend the GeoRuby gem and spatial_adapter. From the docs: “GeoRuby provides data types intended to hold data returned from PostGIS…”. spatial_adapter is a Rails plugin that extends ActiveRecord so that it understands geometric columns, transparently converting them to subclasses of the GeoRuby::Geometry class, supporting geometry columns in migrations, etc. It is hosted on GitHub: http://github.com/fragility/spatial_adapter.

Data Storage

PostGIS supports lots of datatypes. For the purpose of this blog post we’ll focus on just using points; all other datatypes are composed of points and most of the concepts are easily applied to them. Points are made up of at least three pieces of data: x, y, and a Spatial Reference System Identifier (SRID). SRIDs describe the coordinate system used by the point. My GIS background is poor so I’ll just suggest that if you are planning on working in lat/long values, SRID 4269 is probably the right choice for you (or maybe 4326, see the comments). The one downside of using lat/long data is that it makes distance queries more difficult: 1 degree isn’t a fixed number of meters, so you can’t directly ask the database for all rows within 100 meters of a given point, or at least not without having the database run a sequential scan of all rows. If people are interested in distance queries I’m happy to write about the subject.
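To put a rough number on "1 degree isn’t a fixed number of meters", here is a small sketch. It assumes a spherical Earth with the WGS84 equatorial radius, so treat the results as approximations; the method name is just for illustration.

```ruby
# Approximate ground distance covered by one degree of longitude at a given
# latitude, assuming a spherical Earth with the WGS84 equatorial radius.
EARTH_RADIUS_METERS = 6_378_137.0

def meters_per_degree_of_longitude(latitude_degrees)
  latitude_radians = latitude_degrees * Math::PI / 180.0
  (Math::PI / 180.0) * EARTH_RADIUS_METERS * Math.cos(latitude_radians)
end
```

The cosine term is the whole story: a degree of longitude shrinks from its full length at the equator to nothing at the poles, and at a latitude around 47.5 degrees it is only about two thirds of the equatorial value. That is why "100 meters" can't be translated into one fixed number of degrees.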

Queries

The most basic query is the bounding box: which geometries lie within the given box? It is defined in PostGIS as the && operator and is able to use spatial indexes directly, so it is very fast. Beyond that most queries take the form of a function and are well documented in the PostGIS documentation, but enough talk - let’s start an example.

Example

Let’s assume we have a table of restaurants that we want to display on a map. First let’s add a geometry column to the restaurants table, with spatial_adapter it is a simple migration:

add_column :restaurants, :the_geom, :point, :srid => 4269
add_index :restaurants, :the_geom, :spatial => true

If you open up a psql console and look at the table definition you will see that this adds the column the_geom with the type geometry as well as adding three new table constraints: srid=4269, number of dimensions is 2 (PostGIS supports more), and the geometry type is a point. Handy! You can see we’ve also added an index; spatial_adapter adds the :spatial option to indexes to simplify the creation of spatial indexes.

Now to add points let’s open up a console and add geometry data to our restaurants:

r = Restaurant.first
r.the_geom = Point.from_x_y(-122.39, 47.5123, 4269)
r.save!

Of course in real life you aren’t going to just make up data; you’ll likely want to use a geocoding service like Google’s geocoding API to determine the correct lat/long information. You’ll probably also want to store your chosen SRID in a constant somewhere, but now I’m nitpicking - let’s query our new data.

Restaurant.first(:conditions => ["the_geom && ?", Polygon.from_coordinates([[[x_min, y_min], [x_min, y_max], [x_max, y_max], [x_max, y_min], [x_min, y_min]]], 4269)])
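The nested array in that query is easy to get wrong by hand, so here is a small sketch of a helper that builds the ring for you. The name bounding_box_ring is hypothetical; it is plain Ruby and doesn't depend on GeoRuby.

```ruby
# Build the closed ring of [x, y] pairs for a rectangular bounding box, in
# the same corner order as the query above. The first point is repeated at
# the end because a polygon ring has to be closed.
def bounding_box_ring(x_min, y_min, x_max, y_max)
  raise ArgumentError, 'min must not exceed max' if x_min > x_max || y_min > y_max

  [[x_min, y_min], [x_min, y_max], [x_max, y_max], [x_max, y_min], [x_min, y_min]]
end
```

With that in place the query argument becomes Polygon.from_coordinates([bounding_box_ring(x_min, y_min, x_max, y_max)], 4269); the outer array is there because a polygon is a list of rings.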

Simple! Hope this helps show the basics of using PostGIS with Rails. This post just barely scratches the surface. Luckily PostGIS and GeoRuby have excellent documentation, but if you have any more questions don’t hesitate to ask in the comments and I’ll try to help.

Doug Cole is the CTO of www.estately.com, a real estate search website. Interested in working with Estately? Let us know!

Rails and PostgreSQL job in Honolulu

Posted by Tom Copeland Tue, 03 Nov 2009 18:20:00 GMT

Saw this advertisement for a Ruby/Rails/PostgreSQL job in Honolulu today. It's for eggup.com, which unfortunately is completely protected by an HTTP basic authentication challenge. First thing to do if you get that job - put up a nice "coming soon" page!

RailsOnPg by Alexander Tretyakov

Posted by Tom Copeland Fri, 23 Oct 2009 00:39:00 GMT

Thanks to Robby on Rails I heard about Alexander Tretyakov's interesting RailsOnPg plugin. This plugin makes it a bit easier to create PostgreSQL functions, triggers, views, and foreign keys by providing a nicer front end to calls to ActiveRecord::Base.connection.execute.

For example, here's a migration to add a foreign key to a Comment model that belongs to a User:

class AddForeignKeyFromCommentsToUsers < ActiveRecord::Migration 
  def self.up
    add_foreign_key :comments, :user_id, :users
  end
  def self.down
    remove_foreign_key :comments, :user_id, :users
  end
end

This results in the following SQL:

ALTER TABLE comments
                 ADD CONSTRAINT fk_comments_user_id
                 FOREIGN KEY (user_id)
                 REFERENCES users(id) 
                 ON UPDATE NO ACTION 
                 ON DELETE NO ACTION 

As you can see, this provides some sensible defaults and a consistent naming scheme so that you can reliably roll back a migration that created a foreign key.
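To show why a predictable constraint name matters for rollbacks, here's a hypothetical re-creation of that naming scheme in plain Ruby. It isn't the plugin's source; the method names are invented, and the SQL simply mirrors the shape of the output shown above.

```ruby
# Hypothetical re-creation of the naming scheme shown above; this is not the
# plugin's source, just a sketch that yields the same shape of SQL.
def foreign_key_name(table, column)
  "fk_#{table}_#{column}"
end

def add_foreign_key_sql(table, column, referenced_table)
  "ALTER TABLE #{table} " \
    "ADD CONSTRAINT #{foreign_key_name(table, column)} " \
    "FOREIGN KEY (#{column}) REFERENCES #{referenced_table}(id) " \
    'ON UPDATE NO ACTION ON DELETE NO ACTION'
end

# Because the name is derived from the same arguments, the down migration
# can find the constraint without anyone having to remember what it was called.
def remove_foreign_key_sql(table, column)
  "ALTER TABLE #{table} DROP CONSTRAINT #{foreign_key_name(table, column)}"
end
```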

I ran into some problems when creating a function; my migration failed with a PGError. Turns out that the plugin attempts to execute CREATE LANGUAGE plpgsql before it creates a function; in my case that language was already in place. I commented out line 16 of railsonpg/lib/functions.rb (the call to setlang) and everything worked fine. It looks like this need for a CREATE LANGUAGE IF NOT EXISTS (or something) has come up before, but I'm not sure what the status is. I'm using PostgreSQL 8.4.1 and that statement doesn't seem to be supported.
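One way around this without editing the plugin would be to check the pg_language catalog before creating the language. Here's a minimal sketch of that idea; ensure_plpgsql is a name I made up, and FakeConnection exists only so the example runs without a database. With a real app you'd pass ActiveRecord::Base.connection, which responds to the same select_value and execute calls.

```ruby
# Create plpgsql only when pg_language has no row for it. `connection` is
# anything that responds to select_value and execute, as ActiveRecord
# connection adapters do. Returns true if the language had to be created.
def ensure_plpgsql(connection)
  exists = connection.select_value(
    "SELECT 1 FROM pg_language WHERE lanname = 'plpgsql'"
  )
  return false if exists

  connection.execute('CREATE LANGUAGE plpgsql')
  true
end

# Minimal stand-in connection so the sketch can run without a database.
class FakeConnection
  attr_reader :executed

  def initialize(language_installed)
    @language_installed = language_installed
    @executed = []
  end

  def select_value(_sql)
    @language_installed ? '1' : nil
  end

  def execute(sql)
    @executed << sql
  end
end
```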

At any rate, this looks like a handy plugin that could remove a lot of raw SQL from your migrations. Good stuff!