Archive for category Code

On Moving to WordPress from Pebble

Posted by john on Tuesday, 1 December, 2009

So, you may have noticed that I recently migrated this blog from Pebble to WordPress. Actually, I migrated two former Pebble blogs, plus one former WordPress blog, to this new one. The WP install was simple, and so was the transfer of blog entries from one WP instance to another. Migrating from Pebble was a little more challenging, particularly since I had a lot of posts that I didn’t want to handle manually.

I started out by looking at the Pebble2Wordpress importer code, which is written in Ruby. (I first found a reference to this project in a post on The Spritle Blog, which has some more information about how it’s meant to be used.) I chose this project in particular because it’s written in a language I’m comfortable tweaking as needed.

I’m not sure whether this project ever really worked as-is out of the source repository, but I can say for sure that it didn’t work for me using the instructions in that blog post. To understand where the developer was coming from, and to decide whether I could cobble together something that would work for my own particular migration, I dug into the source. What I found seemed pretty overwrought and confusing, particularly for what amounts to a simple migration script that morphs XML documents into DB calls (not something that will ever attract a huge user base). After refactoring the code to add minimal error handling, and inlining all of the class definitions into one Ruby script alongside the code that uses them, I thought I had something understandable enough to start testing.

The general approach of the code is to create ActiveRecord representations for terms (read: categories), term-post relationships, posts, and comments. Once the database is connected, the script iterates through all *.xml files in the inputXML directory and parses the Pebble XML format into Post/Comment/Category instances, which are then saved into the WP database. During parsing, certain things like image URLs are massaged from the old Pebble layout into something more compatible with WP. Since the default WP uploads directory has the date baked into the file path, I opted to simplify things and put all migrated files into the uploads/ base directory. This has the added benefit of giving me one-stop shopping for anything that breaks during the migration, instead of picking over a nested directory structure.
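To make the URL massaging a little more concrete, here is a minimal Ruby sketch of the kind of rewrite involved. It is not taken from convert.rb: the old blog base URL is assumed from the Build Chimp address mentioned elsewhere on this page, `/wp-content/uploads` is the standard WordPress uploads path, and the helper name `rewrite_upload_urls` is hypothetical. It only handles absolute links under the old blog URL and relative `./images/` or `./files/` links, and keeps whatever path follows those directory names, matching a copy of images/* and files/* straight into uploads/.

```ruby
# Assumed values: the old Pebble blog URL and the standard WordPress
# uploads path. Adjust both for your own installation.
OLD_BLOG_URL = "http://www.ejlife.net/blogs/buildchimp"
NEW_UPLOADS  = "/wp-content/uploads"

# Hypothetical helper: point links to Pebble's images/ and files/ directories
# at the WordPress uploads/ base directory. Handles absolute links under
# old_base and relative "./images/..." or "./files/..." links; the path after
# images/ or files/ is preserved unchanged.
def rewrite_upload_urls(html, old_base = OLD_BLOG_URL, new_base = NEW_UPLOADS)
  pattern = %r{(?:#{Regexp.escape(old_base)}/|\./)(?:images|files)/}
  html.gsub(pattern, "#{new_base}/")
end
```

Links that match neither form (for example, images hosted on another site) pass through untouched, which keeps the rewrite conservative.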

Before you can start using the code, you’ll need to install a few RubyGems and OS packages:

$  sudo apt-get -y install rubygems mysql-dev
...
$  gem install activerecord activerecord-jdbcmysql-adapter
...

Next, you need to create a new directory called inputXML/ alongside the convert.rb script, and copy into it all of the XML files from the Pebble data directory. My data directory was /opt/web/sites/ejlife.net/var/pebble/blogs/buildchimp/, so I used the following commands:

$ find /opt/web/sites/ejlife.net/var/pebble/blogs/buildchimp/2009 -type f -name '*.xml' -exec cp '{}' inputXML/ \;
...
$ find /opt/web/sites/ejlife.net/var/pebble/blogs/buildchimp/2008 -type f -name '*.xml' -exec cp '{}' inputXML/ \;
...
$ find /opt/web/sites/ejlife.net/var/pebble/blogs/buildchimp/2007 -type f -name '*.xml' -exec cp '{}' inputXML/ \;
...
$ find /opt/web/sites/ejlife.net/var/pebble/blogs/buildchimp/2006 -type f -name '*.xml' -exec cp '{}' inputXML/ \;

I know, with some more arcane shell-fu I could have combined the individual commands into one. However, rather than risk fouling something up with an overly complex command line, I chose to use Bash’s arrow-up history feature to retrieve and modify my previous command. Simple, effective, and nearly fool-proof.
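For reference, the four copy commands can also be folded into a single pass without any shell-fu. This is a minimal Ruby sketch using only the standard library (`Dir.glob` and `FileUtils`); the method name `collect_pebble_xml` is hypothetical. Like the `find` commands, it flattens everything into one directory, so it assumes XML file names don’t collide across years.

```ruby
require "fileutils"

# Hypothetical helper: copy every *.xml file found anywhere under the given
# year directories of a Pebble blog into one flat destination directory,
# mirroring the `find ... -exec cp` commands. Returns the copied file names.
def collect_pebble_xml(blog_dir, years, dest = "inputXML")
  FileUtils.mkdir_p(dest)
  copied = []
  years.each do |year|
    # "**" matches zero or more directories, so files directly under the
    # year directory are picked up as well as deeply nested ones.
    Dir.glob(File.join(blog_dir, year.to_s, "**", "*.xml")).sort.each do |path|
      next unless File.file?(path)
      FileUtils.cp(path, dest)
      copied << File.basename(path)
    end
  end
  copied
end

# Usage, with the data directory from this post:
# collect_pebble_xml("/opt/web/sites/ejlife.net/var/pebble/blogs/buildchimp", 2006..2009)
```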

Luckily, in my case I had a bare WP install that I could abuse a bit while I fine-tuned the migration code. With a decent database client (I love the free version of DBVisualizer for this), rolling back a failed migration is a breeze. After each failed attempt, I simply issued the following SQL commands to get back near enough to the base install:

DELETE FROM wp_posts;

DELETE FROM wp_comments;

DELETE FROM wp_term_relationships WHERE object_id > 999;

NOTE: I chose to delete term relationships where the object ID is 1000 or greater, since it seems like the auto-increment field for posts starts at around 1000. This may not be the case in all environments.
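If you end up rolling back repeatedly, it can help to keep that threshold in one place. Here is a minimal Ruby sketch, assuming the same three tables as above; `rollback_statements` is a hypothetical helper that only builds the SQL strings for you to paste into your database client. It uses `>= 1000`, which is equivalent to `> 999`.

```ruby
# Hypothetical helper: build the rollback statements with the first migrated
# post ID as a parameter, since the starting auto-increment value may differ
# between installations.
def rollback_statements(first_migrated_id = 1000)
  # Integer() raises ArgumentError for anything that isn't an integer, so
  # arbitrary text can't end up in the generated SQL.
  id = Integer(first_migrated_id)
  [
    "DELETE FROM wp_posts;",
    "DELETE FROM wp_comments;",
    "DELETE FROM wp_term_relationships WHERE object_id >= #{id};"
  ]
end

# Usage: puts rollback_statements(1000)
```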

I’m not going to pollute this blog post by giving a full listing of the migration script, but you can download it here: convert.rb. It has an accompanying database configuration YAML file that looks like this: database.yml. The script expects the inputXML/ directory and the database.yml file to be in the same directory as the script itself. Use the script if you want to, enjoy it, but above all, DON’T BLAME ME IF YOU BLOW SOMETHING UP.
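Since the script looks for its inputs relative to its own location, that lookup can be sketched with Ruby’s standard `yaml` library and `__dir__`. This is an illustration under assumptions rather than the code in convert.rb: the method name `load_migration_inputs` is hypothetical, and it simply returns the parsed settings plus the sorted list of XML files to process.

```ruby
require "yaml"

# Hypothetical helper: read database.yml and list inputXML/*.xml, both
# expected in the same directory as the script (base_dir defaults to the
# directory of the file defining this method).
def load_migration_inputs(base_dir = __dir__)
  config = YAML.safe_load(File.read(File.join(base_dir, "database.yml")))
  xml_files = Dir.glob(File.join(base_dir, "inputXML", "*.xml")).sort
  [config, xml_files]
end
```

A database.yml for this sketch might hold the usual ActiveRecord connection keys such as `adapter`, `host`, `username`, `password`, and `database` (an assumption, not a copy of the real file); the resulting hash can then be passed to `ActiveRecord::Base.establish_connection`.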

I then tested, rinsed, and repeated until I was pleased with the result. As a final step, I copied the contents of the files/ and images/ directories from the Pebble data directory into my WP install:

$  cp -rf /opt/web/sites/ejlife.net/var/pebble/blogs/buildchimp/images/* /opt/web/sites/johnofalltrades.name/htdocs/wp-content/uploads
...
$  cp -rf /opt/web/sites/ejlife.net/var/pebble/blogs/buildchimp/files/* /opt/web/sites/johnofalltrades.name/htdocs/wp-content/uploads
...

That’s about all there is to it. Hackish? Maybe. Effective? Definitely. I’m very happy in my new WordPress digs!

UPDATE: I’ve submitted my changes back to the project in this issue.


Announcing: Sonatype

Posted by john on Wednesday, 18 April, 2007

In case you missed it here or here, I’ve been working with some really talented folks of late, creating a new business centered on Maven in the enterprise. The company is called Sonatype, and our goal is to give customers access to a large support network, training, and more, in order to help them address the inefficiencies and risk in their software development process.

Our first focus is on user training, starting with a Maven user training session scheduled the Monday before JavaOne 2007 kicks off. We’ll be in the area throughout JavaOne, so don’t hesitate to get in touch. Training is far from the final word for Sonatype, but you’ll have to stay tuned for the details!

Our approach is to grow this company from the ground up, with nothing but top-quality members, partners, products, and information. We’re not relying on hype or analysts here; we understand what it takes to put together a first-rate development infrastructure, and we’re interested in building one for you. We want to start by helping you un-break your build process, using Maven.


<shameless-plug>
Oh, and if I haven’t already mentioned it, take a gander at our website!
</shameless-plug>

BTW, this is no longer my programming blog, so I’ll be cross-posting this to the Build Chimp.


Rise of the Build Chimp: Maven and Programming Content Separated

Posted by john on Wednesday, 11 April, 2007

I’m not going to be posting technical content here any more. Instead, I’ll be using the Build Chimp blog for that.

The reasons are manifold, but mainly I think it’d be better to keep the content I write for my profession, community involvement, and industry cleanly separated from that of my personal life. I’m aware that I could tweak Pebble a little and provide a code-only RSS feed for JavaBlogs to use, but I’d rather have two separate blogs so I can manage the two bodies of content independently.

So, if you read this blog for technical content, please update your Google Reader (or whatever) to: http://www.ejlife.net/blogs/buildchimp.
