
On Moving to WordPress from Pebble

Posted on Tuesday, 1 December, 2009

So, you may have noticed that I recently migrated this blog from Pebble to WordPress. Actually, I migrated two former Pebble blogs, plus one former WordPress blog, to this new one. The WP install was simple, and so was the transfer of blog entries from one WP instance to another. Migrating from Pebble was a little more challenging, particularly since I had a lot of posts that I didn’t want to handle manually.

I started out by looking at the Pebble2Wordpress importer code, which is written in Ruby. (I first found a reference to this project in a post on The Spritle Blog, which has more information about how it's meant to be used.) I chose this project in particular because it's written in a language I can fine-tune as needed.

I'm not sure whether this project ever really worked as-is out of the source repository, but I can say for sure that it didn't work for me using the instructions in the blog post. To understand where the developer was coming from, and to determine whether I could cobble together something that would work for my own migration, I dug into the source. What I found seemed pretty overwrought and confusing, particularly for what amounts to a simple migration script that morphs XML documents into DB calls (not something that will ever attract a huge user base). After refactoring the code to add minimal error handling, and inlining all class definitions into one Ruby script alongside the code that uses them, I thought I had something understandable enough to start testing.

The general approach of the code is to create ActiveRecord representations for terms (read: categories), term-post relationships, posts, and comments. Once the database connection is established, the script iterates through all *.xml files in the inputXML/ directory and parses the Pebble XML format into Post/Comment/Category instances, which are then saved into the WP database. During parsing, certain things like image URLs are massaged from the old Pebble layout into something more compatible with WP. Since the default WP uploads directory has the date baked into the file path, I opted to simplify things and put all migrated files into the base uploads/ directory. This has the added benefit of giving me one-stop shopping for anything that breaks during the migration, instead of picking over a nested directory structure.
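To make the parsing step more concrete, here is a minimal, standalone sketch in Ruby. It is not the code from convert.rb: plain Structs stand in for the ActiveRecord models, parsing uses REXML, and the Pebble element names (title, body, date, category, and comment as children of the entry root) are assumptions, so check them against your own XML files. OLD_BLOG_URL and UPLOADS_URL are placeholders for your old blog root and the new uploads URL.

```ruby
require 'rexml/document'

# Plain Structs standing in for the Post/Comment models the real script saves.
PebbleComment = Struct.new(:author, :body, :date)
PebbleEntry   = Struct.new(:title, :body, :date, :categories, :comments)

# Placeholder URLs (assumptions): the old Pebble blog root and the flat WP uploads/ URL.
OLD_BLOG_URL = 'http://example.com/blog'
UPLOADS_URL  = '/wp-content/uploads'

# Rewrite links under Pebble's images/ and files/ directories so they point
# at the single, flat WP uploads/ directory.
def rewrite_media_urls(html, old_base = OLD_BLOG_URL, uploads = UPLOADS_URL)
  html.gsub(%r{#{Regexp.escape(old_base)}/(?:images|files)/}, "#{uploads}/")
end

# Text of the first child element called +name+, or "" if it is absent.
# REXML returns the text with entities such as &lt; already decoded.
def child_text(node, name)
  el = node.elements[name]
  el ? el.text.to_s.strip : ''
end

# Parse one Pebble entry file. The element names are assumptions about the
# Pebble format; verify them against your own exported XML.
def parse_pebble_entry(xml)
  root = REXML::Document.new(xml).root
  comments = root.get_elements('comment').map do |c|
    PebbleComment.new(child_text(c, 'author'), child_text(c, 'body'), child_text(c, 'date'))
  end
  PebbleEntry.new(child_text(root, 'title'),
                  rewrite_media_urls(child_text(root, 'body')),
                  child_text(root, 'date'),
                  root.get_elements('category').map { |c| c.text.to_s.strip },
                  comments)
end

# Walk every exported entry in inputXML/, in file-name order.
def each_pebble_entry(dir = 'inputXML')
  Dir.glob(File.join(dir, '*.xml')).sort.each do |path|
    yield parse_pebble_entry(File.read(path))
  end
end
```

The URL rewrite only handles absolute links that start with the old blog root; relative image links, if your posts use them, would need an extra pattern.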

Before you can start using the code, you'll need to install some RubyGems and OS packages:

$  sudo apt-get -y install rubygems mysql-dev
...
$  gem install activerecord activerecord-jdbcmysql-adapter
...

Next, create a directory called inputXML/ alongside the convert.rb script, and copy all of the XML files from the Pebble data directory into it. My data directory was /opt/web/sites/ejlife.net/var/pebble/blogs/buildchimp/, so I used the following commands:

$ find /opt/web/sites/ejlife.net/var/pebble/blogs/buildchimp/2009 -type f -name '*.xml' -exec cp '{}' inputXML/ \;
...
$ find /opt/web/sites/ejlife.net/var/pebble/blogs/buildchimp/2008 -type f -name '*.xml' -exec cp '{}' inputXML/ \;
...
$ find /opt/web/sites/ejlife.net/var/pebble/blogs/buildchimp/2007 -type f -name '*.xml' -exec cp '{}' inputXML/ \;
...
$ find /opt/web/sites/ejlife.net/var/pebble/blogs/buildchimp/2006 -type f -name '*.xml' -exec cp '{}' inputXML/ \;

I know, with some more arcane shell-fu I could have combined the individual commands into one. However, rather than risk fouling something up with an overly complex command line, I chose to use the Bash arrow-up (command history) feature to retrieve and modify my previous command. Simple, effective, and nearly foolproof.
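If you'd rather skip the shell-fu altogether, the same copy step can be sketched in Ruby. collect_pebble_xml is a hypothetical helper, not part of convert.rb: it globs each year directory recursively for *.xml files and copies them flat into inputXML/, so, like cp, it overwrites any same-named file already there.

```ruby
require 'fileutils'

# Copy every *.xml file found under each year directory of a Pebble blog into
# one flat destination directory, and return the names of the copied files.
# As with cp, a file with the same name already in +dest+ is overwritten.
def collect_pebble_xml(blog_dir, years, dest = 'inputXML')
  FileUtils.mkdir_p(dest)
  copied = []
  years.each do |year|
    pattern = File.join(blog_dir, year.to_s, '**', '*.xml')
    Dir.glob(pattern).each do |path|
      next unless File.file?(path) # mirrors find's -type f
      FileUtils.cp(path, dest)
      copied << File.basename(path)
    end
  end
  copied
end

# Usage with the directory layout from this post:
# collect_pebble_xml('/opt/web/sites/ejlife.net/var/pebble/blogs/buildchimp', 2006..2009)
```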

Luckily, in my case I had a bare WP install that I could abuse a bit while I fine-tuned the migration code. With a decent database client (I love the free version of DbVisualizer for this), rolling back a failed migration is a breeze. After each failed attempt, I simply issued the following SQL commands to get back near enough to the base install:

DELETE FROM wp_posts;

DELETE FROM wp_comments;

DELETE FROM wp_term_relationships WHERE object_id > 999;

NOTE: I chose to delete term relationships where the object ID is 1000 or greater, since it seems like the auto-increment field for posts starts at around 1000. This may not be the case in all environments.

I’m not going to pollute this blog post by giving a full listing of the migration script, but you can download it here: convert.rb. It has an accompanying database configuration YAML file that looks like this: database.yml. The script expects the inputXML/ directory and the database.yml file to be in the same directory as the script itself. Use the script if you want to, enjoy it, but above all, DON’T BLAME ME IF YOU BLOW SOMETHING UP.
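Since the script looks for inputXML/ and database.yml next to itself rather than in the current working directory, here is a minimal sketch of how that lookup might be done with Ruby's standard library. It is not the code in convert.rb. It assumes a flat database.yml using the usual ActiveRecord connection keys (adapter, host, database, username, password); the helper names and the REQUIRED_DB_KEYS check are hypothetical.

```ruby
require 'yaml'

# Directory containing this script file, regardless of the working directory.
SCRIPT_DIR = File.expand_path(File.dirname(__FILE__))

# Keys assumed to be required for the connection (hypothetical minimum).
REQUIRED_DB_KEYS = %w[adapter database].freeze

# Path of the inputXML/ directory that sits next to the script.
def input_xml_dir(dir = SCRIPT_DIR)
  File.join(dir, 'inputXML')
end

# Load database.yml from the script's directory and check the required keys.
def load_db_config(dir = SCRIPT_DIR)
  config = YAML.safe_load(File.read(File.join(dir, 'database.yml'))) || {}
  missing = REQUIRED_DB_KEYS - config.keys
  raise ArgumentError, "database.yml is missing: #{missing.join(', ')}" unless missing.empty?
  config
end

# With the activerecord gem loaded, the hash can be passed straight through:
#   ActiveRecord::Base.establish_connection(load_db_config)
```

Failing early on a missing key gives a clearer message than an adapter error halfway through a migration run.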

I then tested, rinsed, and repeated until I was pleased with the result. As a final step, I copied the contents of the files/ and images/ directories from the Pebble data directory into my WP install:

$  cp -rf /opt/web/sites/ejlife.net/var/pebble/blogs/buildchimp/images/* /opt/web/sites/johnofalltrades.name/htdocs/wp-content/uploads
...
$  cp -rf /opt/web/sites/ejlife.net/var/pebble/blogs/buildchimp/files/* /opt/web/sites/johnofalltrades.name/htdocs/wp-content/uploads
...

That’s about all there is to it. Hackish? Maybe. Effective? Definitely. I’m very happy in my new WordPress digs!

UPDATE: I’ve submitted my changes back to the project in this issue.