Did you pull that number out of a hat?

After the #SMCSTL meeting last night, I’ve been asked several times where our most-influential ranking came from. Unlike Bullwinkle, we did NOT pull that number out of a hat.

The algorithm, obviously, can’t be fully revealed, for several reasons:

  1. People would try to game it
  2. It’s in flux as we add/remove/tune parameters
  3. I talk like a propeller-head and might bore you all to death
  4. We might want to monetize it ;)

The essence is this:

We gather all the “organic” tweets via various sources (searches by location, keywords, hashtags, etc.)

We gather all the “curated” tweets via the listed/curated twitterati we’ve identified for the site.

For each tweet, we extract all the links, hashtags and mentions. For each link, we canonicalize to the final endpoint and score that link. For each mention (and the posting tweep) we extract the basic Twitter stats (followers/following/tweets) and ALSO our own stats on how many STLTweets tweets we have from them (both “curated” and “organic”). For each hashtag, we gather the frequency of use, etc.
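
A minimal sketch of the extraction step, in Python. The regexes and the function name are illustrative assumptions, not the actual STLTweets code, and resolving each link to its final endpoint (following redirects) is left out because it needs network access.

```python
import re

# Hypothetical patterns for illustration; real tweet parsing has more edge cases.
LINK_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")

def extract_entities(text):
    """Return the links, hashtags and mentions found in one tweet's text."""
    links = LINK_RE.findall(text)
    # Remove links first so URL fragments such as "#top" are not counted as hashtags.
    stripped = LINK_RE.sub(" ", text)
    return {
        "links": links,
        "hashtags": [h.lower() for h in HASHTAG_RE.findall(stripped)],
        "mentions": [m.lower() for m in MENTION_RE.findall(stripped)],
    }
```

Hashtags and mentions are lowercased here so that “#SMCSTL” and “#smcstl” count as the same tag.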

From this, we compute a score for the interesting links this person posts (based on other tweets to the same canonical link) and give a bonus if they are the originator of the link. We do the same thing for interesting hashtags (originating/using/etc.).
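
To make the originator bonus concrete, here is a rough sketch. The scoring rule (one point per other user who posted the same link) and the bonus value are assumptions chosen for illustration; the real parameters are not published.

```python
from collections import defaultdict

ORIGINATOR_BONUS = 2.0  # assumed value, purely illustrative

def link_scores_by_user(tweets):
    """tweets: iterable of (timestamp, user, canonical_link) tuples."""
    by_link = defaultdict(list)
    for ts, user, link in tweets:
        by_link[link].append((ts, user))

    scores = defaultdict(float)
    for link, posts in by_link.items():
        posts.sort()                      # earliest post first
        originator = posts[0][1]
        posters = {user for _, user in posts}
        for user in posters:
            # Assumed rule: a link is worth the number of *other* users who posted it.
            scores[user] += len(posters) - 1
        if len(posters) > 1:
            # The bonus only applies when someone else picked the link up.
            scores[originator] += ORIGINATOR_BONUS
    return dict(scores)
```

A link nobody else posts earns nothing under this rule, which is one simple way to keep a user from scoring by posting links only they care about.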

We also factor in the weight of other people mentioning the posting person’s screen name (and make a lot of adjustments along the lines of “distinct within 24 hours”, “not the same user”, “closed-circle”, etc.). Also, we have a manually curated juice value for each tweep, tweet and link that lets us downvote or ban spammers and such.
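
The “distinct within 24 hours” and “not the same user” adjustments might look something like the sketch below. The function name, the counting rule and the way the juice value is applied (a plain multiplier, with 0 acting as a ban) are assumptions; closed-circle detection is not shown.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)

def mention_weight(mentions, target, juice=1.0):
    """mentions: iterable of (timestamp, author) pairs that mention `target`.

    Counts at most one mention per author per 24 hours, ignores
    self-mentions, then scales by the manually curated juice value.
    """
    last_counted = {}
    count = 0
    for ts, author in sorted(mentions):
        if author == target:
            continue                      # "not the same user"
        prev = last_counted.get(author)
        if prev is not None and ts - prev < WINDOW:
            continue                      # "distinct within 24 hours"
        last_counted[author] = ts
        count += 1
    return count * juice
```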

Each word in the tweet, as well as each word in the canonical link’s description and headline, is canonicalized (so that spelling variations and the like count the same) and scored. Those word scores determine the base weight of an individual tweet; the tweets referring to a canonical link help determine the base weight of that link; and the tweep’s base weight is based on source (location, mention of hashtag, curated, etc.). ALL of that goes into generating the final weighting of the tweets, links, mentions, and hashtags.
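
As a rough illustration of word canonicalization and the tweet base weight: the normalization rules below (lowercasing, dropping punctuation, collapsing stretched letters) and the simple sum are stand-ins, not the rules actually used, and the word scores are made up.

```python
import re

def canonical_word(word):
    """Reduce a word to a canonical form so near-identical spellings match."""
    word = word.lower()
    word = re.sub(r"[^a-z0-9]", "", word)     # drops punctuation (and non-ASCII letters)
    word = re.sub(r"(.)\1{2,}", r"\1", word)  # "goooo" -> "go"
    return word

def tweet_base_weight(text, word_scores):
    """Assumed rule: a tweet's base weight is the sum of its word scores."""
    words = (canonical_word(w) for w in text.split())
    return sum(word_scores.get(w, 0.0) for w in words if w)
```

With hypothetical scores of 1.0 for “go” and 3.0 for “cardinals”, the tweet “Goooo Cardinals!!! tonight” would get a base weight of 4.0.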

All those factors are used to compute how “interesting” a particular tweet (and, cumulatively, the tweep) is to St. Louis, in terms of our St. Louis database. We also consider the simplistic Twitter stats like followers/following/tweets, but those trivial numbers are smoothed by all the other math going on.

After this, we gather the Klout KScore and the Infochimps trstrank values for all the “contenders” and use those to give the ranking a “broader scope”. We would like to do this for everyone, but Klout isn’t too keen on giving us everything. Infochimps have been very gracious and are letting us suck everybody down. We use these “global-scoped” values to ensure that we’re not too biased by the location-specific dataset we have.

Lastly, all the various ranking inputs are scaled by “sensitivity” and multiplied by “weight” to compute the overall STLi rank. We tune those parameters as evidence feeds back into our ranking, to help ensure we are tracking actual retweets, click-throughs and (later) mentions of the users.
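
One plausible reading of “scaled by sensitivity and multiplied by weight”, sketched in Python. Treating sensitivity as the raw value at which an input saturates is an assumption, as are the input names and every number in the usage note.

```python
def stli_rank(inputs, params):
    """inputs: {name: raw value}; params: {name: (sensitivity, weight)}.

    Each input is divided by its sensitivity, capped at 1.0, multiplied
    by its weight, and the results are summed.
    """
    total = 0.0
    for name, (sensitivity, weight) in params.items():
        raw = inputs.get(name, 0.0)
        scaled = min(raw / sensitivity, 1.0)  # assumed meaning of "sensitivity"
        total += weight * scaled
    return total
```

For example, with hypothetical parameters `{"local": (50.0, 0.5), "klout": (100.0, 0.25), "trstrank": (10.0, 0.25)}`, a local score of 25, a KScore of 80 and a trstrank of 20 combine as 0.5 × 0.5 + 0.25 × 0.8 + 0.25 × 1.0 = 0.7. Tuning then amounts to adjusting the (sensitivity, weight) pairs as feedback comes in.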

  • http://linkedin.com/in/javastl brad hogenmiller

    Thanks for the clearer view into the process you guys used. If nothing else you’ve forced me to look up canonicalize (which I plan to use in 3 sentences this weekend).

    -Brad (@javastl)

  • http://www.underwaylife.com Todd Jordan

    What does curated mean in this context? Can you provide a curation example?

  • http://www.infuz.com Ryan Stephenson

    @Todd:

    When we talk about our curated users, we’re talking about the topic-experts and evangelists we file into our site’s categories if the majority of their tweets fit. These curated users’ tweets populate the main and sub-categories on the site, and their content is the basis for generating category-specific popular links and trends.

    For example, Mayor Slay is curated into several categories, including “Politics” and “City of St. Louis”. If you check http://stltweets.com/People/Politics you can see him among other recognized political tweeps. Clicking a Twitter ID on the site will take you to their profile page, and if they’re curated, you’ll see their categories listed next to their ID. Additionally, as we play nice in Twitter’s framework and use lists on STLTweets category accounts, anyone can see if they’re listed from their Twitter account’s list membership page.

    Curation began with us organizing all the interesting STL people we already followed, and through additional research and culling through suggestions, the curated lists grew. This remains a manual part of the site, but is one of the things we like as well – as much as we love automating systems, we think the human touch here improves on other impersonal and spammy aggregation sites.