Sunday, November 22, 2015

Vision of ORES & ORES vision

So I've been working on a blog post about ORES.  I talked about ORES a few weeks ago in ORES: Hacking social structures by building infrastructure, so check that out for reference.  Because the WMF blog is relatively high profile, the Comms team at the WMF doesn't want to just lift my personal bloggings about it -- which makes sense.  I usually spend 1-2 hours on these posts, so you get typos and unfinished thoughts.

In this post, I want to talk to you about something that I think is really important when communicating about what ORES is to a lay audience.

Visualizing ORES

The WMF Comms team is pushing me to make the topic of machine triage much more approachable to a broad audience.  So, I have been experimenting with visual metaphors that would make the kinds of things that ORES enables easier to understand.  I like to make simple diagrams like the one below for the presentations that I give. 

The flow of edits from The Internet to Wikipedia is highlighted by ORES 
quality prediction models as "good", "needs review" and "damaging".  
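The triage idea in the diagram can be sketched in a few lines of code.  The thresholds below are made up for illustration -- a real deployment would tune cutoffs per wiki and per model -- and the `triage` function is a hypothetical name, not part of ORES itself:

```python
# A minimal sketch of machine triage: bucket incoming edits by a
# model's "damaging" probability.  Thresholds are illustrative only.

def triage(damaging_probability):
    """Map a damaging-edit probability to a review bucket."""
    if damaging_probability < 0.1:
        return "good"
    elif damaging_probability < 0.7:
        return "needs review"
    else:
        return "damaging"

# Hypothetical revision IDs paired with model scores
edits = {101: 0.02, 102: 0.45, 103: 0.93}
buckets = {rev_id: triage(p) for rev_id, p in edits.items()}
print(buckets)  # {101: 'good', 102: 'needs review', 103: 'damaging'}
```

The point of the middle bucket is that most edits can flow through untouched while human attention concentrates on the uncertain and probably-bad ones.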

ORES vision

But it occurs to me that a metaphor might be more appropriate.  With the right metaphor, I can communicate a lot of important things through implications.  With that in mind, I really like using X-ray specs as a metaphor for what ORES does.  It hits a lot of important points about what using ORES means -- both what makes it powerful and useful and also why we should be cautious when using it.
A clipping from an old magazine showing fancy sci-fi specs. 
ORES shows you things that you couldn't see easily beforehand.  Like a pair of X-ray specs, ORES lets you peer into the firehose of edits coming into Wikipedia and see potentially damaging edits stand out in sharp contrast against the background of probably good edits.   But just like a pair of sci-fi specs, ORES alters your perception.  It implicitly makes subjective statements about what is important (separating the good from the bad) and it might bias you towards looking at the potentially bad with more scrutiny.  While this may be the point, it can also be problematic.  Profiling an editor's work by a small set of statistics is inherently imperfect, and the imperfections in the prediction can inevitably lead to biases.  So I think it is important to realize that, when using ORES, your perception is altered in ways that aren't simply more truthful.

So, I hope that the use of this metaphor will help ORES users calibrate the level of caution they employ as we carry on this socio-technical conversation about how we should use subjective, profiling algorithms as part of the construction of Wikipedia.

Sunday, November 1, 2015

Measuring value-adding in Wikipedia

So I've been working on this project on and off.  I've been trying to bring robust measures of edit quality/productivity to Wikipedians.  In this blog post, I'm going to summarize where I am with the project.  

First, the umbrella project:  Measuring value-added

Basically, I see the value of Wikipedia as a simple combination of two hidden variables: quality and importance.  If we focused on making our unimportant content really high quality, that wouldn't be very valuable.  Conversely, if we were to focus on increasing the quality of the most important content first, that would increase the value of Wikipedia most quickly.  
Value = Quality × Importance
But I want to look at value-adding activities, so I need to measure progress towards quality.  I think a nice term for that is productivity.  
Value-added = Productivity × Importance
So in order to take measurements of value-adding activity in Wikipedia, I need to bring together good measures of productivity and importance.
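To make the formulas concrete, here's a minimal sketch of the combination.  The function name and the numbers are invented for illustration; the real inputs would come from the persistence analysis (productivity) and view rates or link structure (importance) described below:

```python
# Value-added = Productivity x Importance, as a trivial function.
# Inputs here are placeholders: productivity might be persistent
# words added, importance a normalized view-rate or link score.

def value_added(productivity, importance):
    return productivity * importance

# 120 persistent words on a highly important article beats
# 500 persistent words on a marginal one.
print(value_added(productivity=120, importance=0.9))   # 108.0
print(value_added(productivity=500, importance=0.05))  # 25.0
```

The hard part, of course, is not the multiplication -- it's getting the two factors onto scales where multiplying them means something.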

Measuring importance

Density of log(view rate) for articles assessed by Wikipedians
for importance. 
I'm going to side-step a big debate purely because I don't feel like re-hashing it in text.  It's not clear what importance is.  But we have some good ways to measure it.  The two dominant strategies for determining the importance of a Wikipedia article's topic are (1) view rate counts and (2) link structure.  

With view rate counts, the assumption is made that the most important content in Wikipedia is viewed most often.  This works pretty well as far as assumptions go, but it has some notable weaknesses.  For example, the article on Breaking Bad (TV show) has about an order of magnitude more views than the article on Chemistry.  For an encyclopedia of knowledge, it doesn't feel right that we'd consider a TV show to be more important than a core academic discipline.  

Link structure provides another opportunity.  Google's founders famously used the link structure of the internet to build a ranking strategy for the most important websites.  See PageRank.  This also seems to work pretty well, but it's less clear what the relationship is between the link graph properties and the nebulous notion of importance.  At least with page view rates, you can plainly imagine the impact that a highly viewed article has.  
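For readers who haven't seen it, the core of PageRank is a short power iteration over the link graph.  This toy version uses an invented three-article graph; it's a sketch of the idea, not a measurement pipeline:

```python
# Toy power-iteration PageRank over a tiny article link graph.
# The articles and links are invented for illustration.

def pagerank(links, damping=0.85, iterations=50):
    """links: {page: [pages it links to]} -> {page: rank}"""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for p, outgoing in links.items():
            if not outgoing:
                # Dangling page: spread its rank evenly everywhere.
                for q in pages:
                    new[q] += damping * rank[p] / n
            else:
                for q in outgoing:
                    new[q] += damping * rank[p] / len(outgoing)
        rank = new
    return rank

links = {
    "Chemistry": ["Science"],
    "Breaking Bad": ["Chemistry"],
    "Science": ["Chemistry"],
}
ranks = pagerank(links)
# "Chemistry" collects links from both other pages, so it ranks highest.
print(max(ranks, key=ranks.get))
```

Notice that rank flows along incoming links, which is why a heavily linked-to article can outrank a heavily viewed one.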

Fun story though: Chemistry has 10 times as many incoming links as Breaking Bad.  It could be that this measurement strategy can help us deal with the icky feeling we academics get when thinking that a TV show is more important than centuries of difficult work building knowledge.  

Measuring productivity 


Luckily, there is a vast literature on measuring the quality of contributions in Wikipedia -- some of which I have published!  There are a lot of strategies, but the most robust (and difficult to compute) is tracking the persistence of content between revisions.  The assumption goes: the more subsequent edits a contribution survives, the higher quality it probably was.  We can quite easily weight "words added" by "persistence quality" to get a nice productivity measure.  It's not perfect, but it works.   The trick is figuring out the right way to scale and weight the measures so that they are intuitively meaningful.  
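As a sketch of the weighting idea: scale words added by how much of a review window the contribution survived.  The window size and the `productivity` function here are invented for illustration, not the scaling I'll actually settle on:

```python
# Weight "words added" by persistence: the fraction of a fixed
# window of subsequent revisions that the contribution survived.
# The window of 10 revisions is an arbitrary illustrative choice.

def productivity(words_added, revisions_survived, window=10):
    """Scale words added by the fraction of the window survived."""
    persistence = min(revisions_survived, window) / window
    return words_added * persistence

# A 100-word addition surviving all 10 following revisions counts
# fully; one reverted after 2 revisions counts only 20%.
print(productivity(100, 10))  # 100.0
print(productivity(100, 2))   # 20.0
```

The real measures are fancier than this, but the shape is the same: long-lived words count for more than reverted ones.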
The real trick here was making the computation tractable.  It turns out that tracking changes between revisions is extremely computationally intensive.  It would take me 60 days or so to track content persistence across the entire ~600m revisions of Wikipedia on a single core of the fastest processor on the market.  So the trick is to figure out how to distribute the processing across multiple processors.  We've been using Hadoop streaming.  See my past post about it: Fitting Hadoop streaming into my workflow.  It's been surprisingly difficult to work with memory issues in Hadoop streaming that don't happen when just using unix pipes on the command line.  I might make a post about that later, but honestly, it just makes me feel tired to think about those types of problems.  
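For readers unfamiliar with the model: Hadoop streaming runs plain programs over stdin/stdout, so the same scripts work locally with unix pipes.  This is a sketch of the pattern, not my actual job -- the tab-separated input format (page_id, rev_id, words_added) and the per-page aggregation are invented for illustration:

```python
# Streaming-style mapper/reducer pair.  The mapper keys each
# revision by its page so the framework's shuffle groups a page's
# whole history onto one reducer, where per-page work can happen.

import itertools

def mapper(lines):
    """Emit 'page_id<TAB>rev_id<TAB>words' keyed by page."""
    for line in lines:
        page_id, rev_id, words_added = line.rstrip("\n").split("\t")
        yield f"{page_id}\t{rev_id}\t{words_added}"

def reducer(sorted_lines):
    """Sum words added per page from shuffle-sorted mapper output."""
    rows = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for page_id, group in itertools.groupby(rows, key=lambda r: r[0]):
        total = sum(int(words) for _, _, words in group)
        yield f"{page_id}\t{total}"

demo = ["Chemistry\tr1\t5", "Chemistry\tr2\t3", "Breaking Bad\tr3\t7"]
shuffled = sorted(mapper(demo))  # what the framework's shuffle does
print(list(reducer(shuffled)))
```

Locally, the equivalent pipeline is just `cat revisions.tsv | mapper | sort | reducer` -- which is exactly why the memory behavior differing between the two environments is so annoying.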

Bringing it together

I'm almost there.  I've still got to work out some thresholding bits for the productivity measures, but I've already finished the hard computational work.  My next update (or paper) will be about the who, where, and when of value-adding in Wikipedia.  Until then, stay tuned.