Caching data from third party APIs in ActiveRecord

02/13/13 20:30:00    

By Michael Mealling

One of the Pipefish apps I'm building uses the themoviedb.org API as a source of information on movies. I chose it because it returns usable IDs in search results and doesn't have the severe usage limitations that IMDB does. Whichever provider I use, though, my service ends up depending on a third party that I have no control over. So I need to mitigate that dependency as much as possible by caching.

Since we are using Backbone.js on the client and ActiveRecord on the backend, we have a very transparent flow between the columns that exist in the database and the attributes that are available to the JavaScript in the page. As soon as I run a migration to add a new column to the model, it shows up in the page. But that doesn't mean there's actual data for it in the database.

Right now the logic is this: if the object exists in the database, load it and move on. If it doesn't, retrieve it from the web service, set the attributes, and save it to the database. To handle new data, I needed a way to mark the rows already in the database as dirty so they would be updated. I didn't want to refresh them all at once because of the load that would create.
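The load-or-fetch flow described above can be sketched in plain Ruby. This is only an illustration under stated assumptions: `MovieStore`, `CachedMovieLookup` and `fake_api` are hypothetical names, a Hash stands in for the movies table, and a lambda stands in for the web service call, so nothing here touches ActiveRecord or the real API.

```ruby
# In-memory stand-in for the movies table, keyed by TMDb id.
# (Hypothetical: the real app stores these rows through ActiveRecord.)
class MovieStore
  def initialize
    @rows = {}
  end

  def find(tmdb_id)
    @rows[tmdb_id]
  end

  def save(tmdb_id, attributes)
    @rows[tmdb_id] = attributes
  end
end

# Read-through lookup: serve the stored copy if there is one, otherwise
# call the remote fetcher once and keep the result.
class CachedMovieLookup
  attr_reader :remote_calls

  def initialize(store, fetcher)
    @store = store
    @fetcher = fetcher # any object responding to #call(tmdb_id)
    @remote_calls = 0
  end

  def fetch(tmdb_id)
    cached = @store.find(tmdb_id)
    return cached if cached

    @remote_calls += 1
    attributes = @fetcher.call(tmdb_id)
    @store.save(tmdb_id, attributes)
    attributes
  end
end

# Fake fetcher standing in for the web service call (an assumption).
fake_api = ->(tmdb_id) { { "tmdb_id" => tmdb_id, "title" => "Movie #{tmdb_id}" } }

lookup = CachedMovieLookup.new(MovieStore.new, fake_api)
first  = lookup.fetch(550) # miss: calls the fake API and stores the result
second = lookup.fetch(550) # hit: served from the store
```

The second `fetch` never reaches the fetcher, which is the whole point of the cache. It is also why a newly added column stays empty for rows that were stored before the column existed.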

After much googling I discovered the poorly documented “after_find” callback. I created a new column called 'dirty' and modified the model to look like this:

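class Movie < ActiveRecord::Base

  # after_find runs every time a record is loaded from the database.
  after_find :clean_if_dirty

  def clean_if_dirty
    return unless dirty == 1

    begin
      # Pull the current data from the API and copy over the new attribute.
      newattributes = Tmdb::Movie.find_by_id(tmdb_id)
      self.newthing = newattributes["newthing"]
      self.dirty = 0
      save
    rescue => e
      # Log the failure and keep the record flagged so the next find retries.
      Rails.logger.error "Could not refresh movie #{tmdb_id}: #{e.message}"
      self.dirty = 1
      save
    end
  end
end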

Then I created a rake task that uses update_all to invalidate the current models:

namespace :data do
  desc "Sets the dirty flag for all movies to true"
  task :invalidatetmdb => :environment do
    Movie.update_all :dirty => 1
  end
end

So I can deploy and immediately invalidate the data without taking downtime to rebuild the database. This deploys and works, but something tells me that I'm not doing this the Rails way, since I'm repeating much of what ActiveRecord already does to detect whether a model is dirty. So while I think this is a valid use of the “after_find” callback, I'm not sure the overall pattern is the right way. Thoughts?
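To make the behaviour of the pattern concrete, here is a minimal plain-Ruby simulation of flag-and-refresh-on-read. It is a sketch, not the app's code: `LazyRefreshCache` and its methods are hypothetical, a Hash stands in for the movies table, and a lambda stands in for the API call. `invalidate_all` plays the role of the rake task and `find` plays the role of the “after_find” callback.

```ruby
# Plain-Ruby simulation of flag-and-refresh-on-read. Every name here is
# hypothetical; a Hash stands in for the movies table.
class LazyRefreshCache
  attr_reader :refresh_count

  def initialize(refresher)
    @refresher = refresher # responds to #call(id), returns fresh attributes
    @rows = {}
    @refresh_count = 0
  end

  def insert(id, attributes)
    @rows[id] = attributes.merge("dirty" => 0)
  end

  # Similar in spirit to the rake task: flag every row, fetch nothing.
  def invalidate_all
    @rows.each_value { |row| row["dirty"] = 1 }
  end

  # Similar in spirit to after_find: refresh a flagged row when it is loaded.
  def find(id)
    row = @rows.fetch(id)
    return row unless row["dirty"] == 1

    begin
      row.merge!(@refresher.call(id))
      row["dirty"] = 0
      @refresh_count += 1
    rescue StandardError
      # Leave the flag set so the next find retries the refresh.
    end
    row
  end

  # Ids of rows that are still waiting to be refreshed.
  def dirty_ids
    @rows.select { |_id, row| row["dirty"] == 1 }.keys
  end
end

cache = LazyRefreshCache.new(->(id) { { "newthing" => "fresh-#{id}" } })
[1, 2, 3].each { |id| cache.insert(id, "title" => "Movie #{id}") }

cache.invalidate_all  # all three rows flagged, no remote calls yet
movie = cache.find(2) # only this row is refreshed
```

After the single `find`, only that row has been refreshed and the other two stay flagged until something loads them. The refresh cost is spread across normal reads instead of landing all at once, with the trade-off that rows nobody reads are never refreshed.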

