my recent reads..

Twitpocalypse II: Developers beware of DB variances


Alert: "Twitpocalypse II" coming Friday, September 11th - make sure you can handle large status IDs!
Twitter operations team will artificially increase the maximum status ID to 4294967296 this coming Friday, September 11th.

"Twitpocalypse (I)" occured back in June, when twitter and application developers had to deal with the fact that message status IDs broke the signed 32-bit integer limit (2,147,483,647).

At that point, the limit was raised to the unsigned 32-bit limit of 4,294,967,296. Now we're heading to crack that this week. You can track our collective rush to the brink social celebrity meltdown at www.twitpocalypse.com;-)

First reaction: OMG, it's taken only 3 months to double the volume of tweets sent over all time? That's a serious adoption curve.

Next reaction: once again, application developers are reminded that we unfortunately can't ignore the specifics of the database platform they are running on and just take it for granted.

It's actually quite common for development and production infrastructure to be subtly different. This is especially true in the Rails world where SQLite is the default development database, but production systems will often be using MySQL or PostgreSQL.

If you are using a hosted ("cloud") service it may even take some digging to actually find out what kind of database you are running on. For example, if you use Heroku to host Rails applications, most of the time you don't care that they run PostgreSQL (originally I think they were using MySQL but migrated a while back).

It's in situations like Twitpocalypse that you care. With a Rails-based twitter application, use an "integer" in your database migrations and you will have no problem running locally on SQLite, but you're app will blow up on a production PostgreSQL database when you encounter a message with status_id above 2,147,483,647.

Fortunately, the solution is simple: migrate to bigint data types.

And the even better news is that ActiveRecord database migrations make this a cinch if you have been using integer types in the past. For example, if you've been using an integer type to store "in_reply_to_status_id" references in twitter mentions table, the change_column method will happily manage the messy details for you:

class ForcebigintMentions < ActiveRecord::Migration
def self.up
change_column :mentions, :in_reply_to_status_id, :bigint
end

def self.down
change_column :mentions, :in_reply_to_status_id, :integer
end
end

It's always a good idea to check fundamental limits for the database platforms you are using. They are not always what you expect, and you can't safely apply lessons from one product to another without doing your homework.

Here's a quick comparison of integer on some of the common platforms:
  • SQLite: INTEGER. The value is a signed integer, stored in 1, 2, 3, 4, 6, or 8 bytes depending on the magnitude of the value. i.e. will automatically scale to an 8 byte signed BIGINT (-9223372036854775808 to 9223372036854775807)

  • PostgreSQL: INTEGER 4 bytes (-2147483648 to +2147483647). Use BIGINT for 8 byte signed integer.

  • MySQL: INT (alias INTEGER) has a signed range of -2147483648 to 2147483647, or an unsigned range of 0 to 4294967295. Use BIGINT is the 8 byte integers.

  • Oracle : NUMBER type ranges from 1.0 x 10^-130 to but not including 1.0 x 10^126. The activerecord-oracle-enhanced-adapter provides facilities for intepreting NUMBER as FixNum or BigDecimal in ActiveRecord as appropriate.


PS: there's been some discussion of why twitter would schedule this update on Sep 11th and publicise it as the Twitpocalypse II. I hope it was just an EQ+IQ deficiency, not someone's twisted idea of a funny or attention-grabbing stunt.