February 5, 2009

Your Queries

Pop Quiz: What do the following three items all have in common?
  • staying at tropicana and getting girl
  • rap group goes shopping and eats ice cream in a music video
  • traffic light hairdo
If you guessed they are all key elements in a low-budget adult film, well, you might be right, but the answer I wanted was that they're all recent search queries that landed people on my web site (full list of recent queries here). How do I know this? Google Analytics!

I don't consider myself a Google apologist, but I do think Analytics is a really cool (and free) service for people interested in visitor information. It's targeted at people who want to monetize user traffic, but it's fun for the recreational web author too. The most amusing data for me is the queries people searched for that resulted in them visiting my pages. So amusing, in fact, that I thought they were worth sharing.

If you'd like to pull the queries that led people to your own site out of Analytics and post them, you can try to reproduce either of the convoluted hacks I came up with. If not, now would be a perfect time to leave because it just gets more boring from here.

Unfortunately, Analytics doesn't have a nice API for accessing its data (yet). Despite that, there are two hacky ways you can automatically fetch the data for whatever post-processing you'd like to do:

Method 1: Analytics can email reports in several formats (PDF, XML, CSV), and you can schedule these reports to be sent daily, weekly, or monthly. I created a monthly report that emailed a CSV file of the top query keywords, then wrote a simple Python script that uses IMAP to download the latest Analytics report and parse the CSV. The script is here and could probably be easily tweaked to suit your needs. I had trouble getting the scheduled report to reliably send all keywords, so I ended up switching to the ever so slightly more reliable Method 2.
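For the curious, here is a minimal sketch of what Method 1 boils down to. It is not the script linked above. The function names, the "Analytics" subject filter, and the "Keyword"/"Visits" column headers are all assumptions for illustration, so check them against the report that actually lands in your inbox.

```python
import csv
import email
import imaplib
import io


def csv_attachment_text(message):
    """Return the decoded text of the first .csv attachment, or None."""
    for part in message.walk():
        filename = part.get_filename() or ""
        if filename.lower().endswith(".csv"):
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return None


def parse_keywords(csv_text, keyword_col="Keyword", visits_col="Visits"):
    """Turn report CSV text into (keyword, visits) pairs, most visits first.

    The column names are assumptions; rows without a numeric visit
    count (blank lines, totals, comments) are skipped.
    """
    counts = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        keyword = (row.get(keyword_col) or "").strip()
        visits = (row.get(visits_col) or "").replace(",", "").strip()
        if keyword and visits.isdigit():
            counts.append((keyword, int(visits)))
    counts.sort(key=lambda pair: pair[1], reverse=True)
    return counts


def latest_report_csv(host, user, password, subject="Analytics"):
    """Fetch the newest email whose subject contains `subject` over IMAP
    and return its CSV attachment as text (None if nothing matches)."""
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(user, password)
        conn.select("INBOX", readonly=True)
        # Hypothetical filter: use whatever subject you gave the report.
        _, data = conn.search(None, "SUBJECT", '"%s"' % subject)
        ids = data[0].split()
        if not ids:
            return None
        _, fetched = conn.fetch(ids[-1].decode(), "(RFC822)")
        message = email.message_from_bytes(fetched[0][1])
    finally:
        conn.logout()
    return csv_attachment_text(message)
```

Something like `parse_keywords(latest_report_csv("imap.example.com", "me", "secret"))` would then give you the ranked list, with the host and credentials obviously being placeholders.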

Method 2: I stumbled upon pyGAPI, a simple yet very, very fragile API for accessing Analytics from Python. It handles authentication and report downloading via web requests and screen scraping. Analytics' behavior has changed since pyGAPI was written, so it didn't work as is, but adding the web requests needed to get it working wasn't very difficult. My patched version is here until the author updates his. The script I'm currently using to access and parse this data is here, and should also be easy to tweak. Note that because pyGAPI depends on screen scraping, whenever Analytics changes its request flow, things will probably break and need to be hacked back into shape.
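Whichever method fetches the data, the last step is the same: turn keyword counts into something you can paste into a page. A rough sketch of that step follows, assuming the data has already been reduced to (keyword, visits) pairs. The function name and the output markup are made up for illustration and are not taken from the scripts linked above.

```python
import html


def queries_to_html(counts, limit=None):
    """Render (keyword, visits) pairs as an HTML ordered list.

    Pairs are ranked by visits (highest first), ties broken
    alphabetically. Keywords are escaped, since search queries can
    contain anything a stranger felt like typing.
    """
    ranked = sorted(counts, key=lambda pair: (-pair[1], pair[0]))
    if limit is not None:
        ranked = ranked[:limit]
    items = ["  <li>%s (%d)</li>" % (html.escape(keyword), visits)
             for keyword, visits in ranked]
    return "<ol>\n%s\n</ol>" % "\n".join(items)


if __name__ == "__main__":
    sample = [("traffic light hairdo", 4), ("staying at tropicana", 2)]
    print(queries_to_html(sample))
```

The visit counts in the sample are invented; only the query strings come from the list at the top of this post.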

The Python scripts above both come with my usual software quality assurance guarantee.*

* The code worked for me at some point, but I didn't test it, think about design, or use any fancy safe coding techniques. I can make no guarantee that running it won't delete your hard drive or ruin your marriage (or both). Execute at your own risk.


L said...

Ok, I am so NOT making up what I am about to tell you. As soon as I finished reading this post, really, just finished reading "Execute at your own risk", my computer crashed.

I had to reboot and everything was ok, and I know it (probably) wasn't your fault, but I am going to be more cautious about your hyperlinks from now on...

Billy said...

There are a surprising number of searches for nipples that lead to your site. I can't imagine what sort of scandalous material you host that would create such search hits.

Asirap said...

Ya know Bill, I was thinking the exact same thing! What is leading these nipple loving perverts to my innocent little web page? At least I can take comfort in knowing I maintain friendships with solid, moral, upstanding citizens like yourself.

Anonymous said...

When Katy was still running her blog, it ended up as one of the foremost authorities for the search phrase "tongue cramping". The post was not actually inappropriate, but I'm sure most of the people searching for it were.

L said...

Hey Parisa, can you rank your search queries? I'd like to see where "Leo Linares" ranks with respect to "violent tortorous sex" and "flaming nipples"

No rush, my ego can wait.

Asirap said...

They are already ranked by number of hits. In March, there were no Leo hits, but 4 for "painful torturous sex". My readers' priorities have shifted, I suppose :(