my recent reads..

Optimising presence in Rails with PostgreSQL

(blogarhythm ~ Can't Happen Here - Rainbow)
It is a pretty common pattern to branch depending on whether a query returns any data - for example to render a quite different view. In Rails we might do something like this:

query = User.where(deleted_at: nil).and_maybe_some_other_scopes
if results = query.presence
results.each {|row| ... }
else
# do something else
end
When this code executes, we raise at least 2 database requests: one to check presence, and another to retrieve the data. Running this at the Rails console, we can see the queries logged as they execute, for example:
(0.9ms)  SELECT COUNT(*) FROM "users" WHERE "users"."deleted_at" IS NULL
User Load (15.2ms) SELECT "users".* FROM "users" WHERE "users"."deleted_at" IS NULL
This is not surprising since under the covers, presence (or present?) end up calling count which must do the database query (unless you have already accessed/loaded the results set). And 0.9ms doesn't seem too high a price to pay to determine if you should even try to load the data, does it?

But when we are running on PostgreSQL in particular, we've learned to be leery of COUNT(*) due to it's well known performance problems. In fact I first started digging into this question when I started seeing expensive COUNT(*) queries show up in NewRelic slow transaction traces. How expensive COUNT(*) actually is depends on many factors including the complexity of the query, availability of indexes, size of the table, and size of the results set.

So can we improve things by avoiding the COUNT(*) query? Assuming we are going to use all the results anyway, and we haven't injected any calculated columns in the query, we could simply to_a the query before testing presence i.e.:
query = User.where(deleted_at: nil).and_maybe_some_other_scopes
if results = query.to_a.presence
results.each {|row| ... }
else
# do something else
end

I ran some benchmarks comparing the two approaches with different kinds of queries on a pretty well-tuned system and here are some of the results:
QueryUsing present?Using to_aFaster By
10k indexed queries returning 1 / 1716 rows17.511s10.938s38%
4k complex un-indexed queries returning 12 / 1716 rows23.603s15.221s36%
4k indexed queries returning 1 / 1763218 rows22.943s20.924s9%
10 complex un-indexed queries returning 15 / 1763218 rows23.196s14.072s40%

Clearly, depending on the type of query we can gain up to 40% performance improvement by restructuring our code a little. While my aggregate results were fairly consistent over many runs, the performance of individual queries did vary quite widely.

I should note that the numbers were *not* consistent or proportional across development, staging, test and production environments (mainly due to differences in data volumes, latent activity and hardware) - so you can't benchmark on development and assume the same applies in production.

Things get murky with ActiveRecord add-ons

So far we've talked about the standard ActiveRecord situation. But there are various gems we might also be using to add features like pagination and search magic. MetaSearch is an example: a pretty awesome gem for building complex and flexible search features. But (at least with version 1.1.3) present? has a little surprise in store for you:
irb> User.where(id: '0').class
=> ActiveRecord::Relation
irb> User.where(id: 0).present?
(0.8ms) SELECT COUNT(*) FROM "users" WHERE "users"."id" = 0
=> false
irb> User.search(id_eq: 0).class
=> MetaSearch::Searches::User
irb> User.search(id_eq: 0).present?
=> true

Any Guidelines?

So, always to_a my query results? Well, no, it's not that simple. Here are some things to consider:
  • First, don't assume that <my_scoped_query>.present? means what you think it might mean - test or play it safe
  • If you are going to need all result rows anyway, consider calling to_a or similar before testing presence
  • Avoid this kind of optimisation except at the point of use. One of the beauties of ActiveRecord::Relation is the chainability - something we'll kill as soon as we hydrate to a result set Array for example.
  • While I got a nice 40% performance bonus in some cases with a minor code fiddle, mileage varies and much depends on the actual query. You probably want to benchmark in the actual environment that matters and not make any assumptions.

read more and comment..

My Virtual Swag from #rdrc

(blogarhythm ~ Everybody's Everything - Santana)

So the best swag you can get from a technology conference is code, right? Well RedDotRubyConf 2013 did not disappoint! Thanks to some fantastic speakers, my weekends for months to come are spoken for. Here's just some of the goodness:


read more and comment..

Will I still be a Rubyist in 5 years? #rdrc

(blogarhythm ~ Ruby - Kaiser Chiefs)

The third RedDotRubyConf is over, and I think it just keeps getting better! Met lots of great people, and saw so many of my Ruby heroes speak on stage. Only thing that could make it even better next year would be to get the video recording thing happening!

I had the humbling opportunity to share the stage and here are my slides. Turned out to be a reflection on whether I'd still be a Rubyist in another 5 years, and what are the external trends that might change that. Short story: Yes! Of course. I'll always think like a Rubyist even though things will probably get more polyglot. The arena of web development is perhaps the most unpredictable though.

A couple of areas I highlight that really need a bit more love include:

  • There's a push on SciRuby. Analytics are no longer the esoteric domain of bioinformaticists. Coupled with Big Data (which Ruby is pretty good at), analytics are driving much of the significant innovation in things we build.
  • Krypt - an effort lead by Martin Boßlet to improve the cryptographic support in Ruby. My experience building megar made it painfully obvious why we need to fix this.

Let it never be said, the romance is dead
'Cos there’s so little else occupying my head

I mentioned a few of my projects in passing. Here are the links for convenience:
  • RGovData is a ruby library for really simple access to government data. It aims to make consuming government data sets a "one liner", letting you focus on what you are trying to achieve with the data, and happily ignore all the messy underlying details of transport protocols, authentication and so on.
  • sps_bill_scanner is a ruby gem for converting SP Services PDF invoices into data that can be analysed with R. Only useful if you are an SP Services subscriber in Singapore, but other wise perhaps an interesting example of extracting postitional text from PDF and doing some R.
  • megar ("megaargh!" in pirate-speak) is a Ruby wrapper and command-line (CLI) client for the mega.co.nz API. My example of how you *can* do funky crypto in Ruby ... it's just much harder than it should be!


read more and comment..

Amplifying Human Emotion

(blogarhythm ~ Sweet Emotion 相川七瀬)

It all comes back to connectivity. Om Malik (TWiST #327 @00:37:30) has a brilliant characterization of the true impact of the internet:

human emotion amplified at network scale

read more and comment..