M.CG

Matthew Caruana Galizia

In a nutshell: Erasmus Mundus in London, Amsterdam and Aarhus, News21 fellow at UC Berkeley.

Distributed thinking vs distributed computing

Problems with a project that went live the day before meant I only got to see one talk at last weekend’s Over the Air in Bletchley - Francois Grey’s closing keynote on Citizen Science and Open Science. He’s an excellent speaker and the subject, about getting the wider population involved in science, was really interesting.

Francois spent some time talking about how effective projects that utilise distributed computing have been at promoting the public understanding of science. Apparently, a while ago CERN called up the guy at Berkeley who made SETI@home and asked him for help. From there, Francois’ team went on to support a bunch of research projects that put distributed computing at their core, gathering or processing valuable data from thousands of participants all while getting publicity and informing people of the subject in the process.

This made me think though. In my day job at FT Labs the kind of research - yes, it’s research, because web developers also try to solve problems scientifically, after all - I do relies more on something akin to distributed thinking than it does on distributed computing. Stackoverflow.com is really just a very efficient source of and platform for distributed thinking. The #lazyweb hashtag on Twitter is a switch that we toggle when we want to ‘turn on’ the distributed thinking functionality built into our online social network.

I’ve always thought that the Internet is great because it allows us to take the problem-solving algorithms that we’ve always followed in analog life and accelerate them enormously. Pre-Web, the scope for answers to our problems was restricted to our real-life social circle. As a computer scientist, I would have had to wander the conference circuit for years before I bumped into someone who’d been thinking about the same problem and arrived at the answer. Never mind having to describe the same problem over and over to people I meet.

Stackoverflow takes that process and speeds it up, applying some other algorithms for sorting answers.

So I began wondering why, if it’s so important and we’re so dependent on it, distributed thinking isn’t studied as a science in itself. Sure, Stackoverflow and online forums are great, but can’t we make an active effort to produce even more efficient platforms for distributed thinking?

Aren’t there other fields that could benefit from distributed thinking in the same way that computer engineers have benefited from it?

One example I found of a project that comes kind of close to this is the Polymath Project, which I discovered following a mention on Francois’ own blog:

In his recent and eminently readable book Reinventing Discovery, physicist and open science advocate Michael Nielsen makes a good case for ‘a new era of networked science’, as the book is subtitled. His first example is of a blogger and mathematician, Tim Gowers, and the progress he made on solving a tough mathematical problem thanks to blogging about it, and getting a lot of other mathematicians – pros and amateurs – involved that way. Gowers called this blog-based experiment the Polymath Project.

That’s a start, but I’d argue that blogging is probably the first iteration of a model for distributed thinking on the Web. Surely we can take it further than that?

Decreasing the distance between producers and consumers of news

There’s a really persuasive post by Oliver Reichenstein of iA doing the rounds right now called Sweep the Sleaze.

It might appear to be critical of Facebook at first, but I don’t think it’s critical of the deeper mission of social media. Rather the opposite: to me it’s fighting for that real mission.

At the last FT hackday, one of the things my team and I were trying to achieve with what we called Shortest Path was to reduce the distance between the producers and consumers of news. We won the judges’ prize (just thought I’d drop that in there) and there’s a chance we might get more time to work on bringing it into the newsroom as a tool for the editorial team. The tool has promise, but it will only fix part of the problem - which is what I more or less realised when I read Oliver’s post.

This is because most publishers, including the FT, restrict interactions from their sites and apps to Like and Tweet buttons, which I believe increase the distance between readers and writers and shouldn’t be encouraged. This is important because we showed, using Shortest Path, that discussions about news (in this case business and finance) are happening everywhere and it’s up to journalists themselves to jump in and interact - but first we need those discussions to happen more often!

Oliver’s opinion seems to be backed up by Smashing Magazine, who are quoted as saying: “We removed FB buttons and traffic from Facebook increased. Reason: instead of ‘liking’ articles, readers share it on their timeline.”

In the long run, we think that encouraging more active interaction is better for everyone - better for Facebook because it would result in a greater volume of more meaningful interactions than plain ‘Likes’, better for Facebook users because they have those meaningful interactions and better for the FT because people have more and deeper discussions about our coverage, increasing our exposure and just generally encouraging comment, feedback and debate, which is always good.

In conclusion, what we’d really like (my team and I, not necessarily the FT) is for the editorial team to engage more actively on social media. The first step would be, as the article points out, to discourage passive ‘Liking’ and encourage the more active kind of posting that people would do otherwise. The second step would be to show journalists where these discussions are happening (using our tool) and lead them to participate.

The trouble with Scholastica

Reading Tim Gowers’s blog posts about the cost of knowledge, I was led to a company called Scholastica that seems to take (maybe coincidentally) some ideas from the campaign but which I don’t think goes far enough in doing so.

Scholastica still wants to ‘own’ the information and keep it within its own silo. This is the opposite of what the Web is about, and indeed is the kind of middleman-type business that the Web was supposed to allow us to bypass and consign to the dustbin of history in any case. The project is not open source and runs on its own servers - it chains its customers to its service. It doesn’t support open or standard communication protocols either. This means that the only way this business could work on a universal scale is if it were to be adopted universally, effectively becoming a global monopoly. I find this problematic and I think people in administration at universities all over the world will too.

I think a better solution is to develop an open source client that’s capable of communicating in a peer-to-peer fashion with other instances of itself on the Internet. Each participating university can host its own instance. Peers can review articles across the entire network or subscribe to any journal on the network. Any university can join or leave the network at any time, but entrance to the network is itself subject to peer-review by the rest of the network.

As WikiLeaks struggles, copycats die

The only WikiLeaks-inspired site launched in 2010-11 that is still genuinely going strong that I know of is Rospil.info, a Russian anti-corruption site founded by crusading blogger Alexei Navalny. It would appear that merely putting up a site to invite leaks from anonymous sources isn’t enough to generate much engagement—an engine (like a Navalny) is needed.

- Micah Sifry at Techpresident

Reverse engineering Chinese censorship: When and why are controversial tweets deleted?

MIT student Chi-Chu Tschang is working to detect patterns in the disappearance of thousands of weibos from the Chinese Internet.

There’s a great Nieman Lab article about the research, but in summary these seem to be his findings:

  • The day that saw the highest volume of deletions, in a dataset covering Feb. 1 to May 20, was March 8: the day rumors of Bo Xilai’s fall from power began to spread.

  • The second-busiest censorship day was March 15, the day Bo was sacked.

  • Tschang’s hypothesis, that Sina Weibo deletions correlate highly with spikes in media coverage of sensitive stories, is consistent with the findings of a similar study from researchers at Carnegie Mellon University, who evaluated 56 million weibos, of which about 16 percent were deleted.

  • The fastest a post was deleted on Sina Weibo was just over 4 minutes. The longest time it took for the censor to get around to deleting a message on Sina Weibo was over four months. For posts created and deleted on the same day, the average was 11 hours.

  • And the best time to weibo something politically sensitive in China? After 11 o’clock on a Friday night, according to the data.

The banality of evil - village edition

Hannah Arendt wrote in her 1963 book Eichmann in Jerusalem that, aside from a desire for improving his career, Adolf Eichmann, who was responsible for transporting all the Jews of Poland to Nazi death camps, showed no trace of anti-Semitism or psychological problems. After seeing him give evidence at his trial, she coined the phrase “the banality of evil” to describe people who, like him, show no psychopathic tendencies and yet display neither guilt for the crimes in which they have taken part nor hatred for their victims.

Every time I read a story about heinous crimes that appear to be carried out uninterrupted and in full view of others, I’m reminded of this phrase. Take this quote, from a BBC News story about a teenage girl who was kept as a slave for eight years in a Bosnian village. There’s absolutely no way that none of the other villagers knew what was going on - yet they allowed it to happen unabated for all that time.

One of the neighbours told local media he once witnessed Milenko Marinkovic harness the girl to a horse cart and whip her while she pulled it.

Easily parse a URL in JavaScript, without regex

    var a, url = 'http://m.cg/';
    a = document.createElement('a');
    a.href = url;
    console.log(a.protocol); // "http:"
    console.log(a.hostname); // "m.cg"
    console.log(a.pathname); // "/"
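For comparison, here is a minimal sketch of the same thing without a DOM element, assuming an environment that provides the standard URL constructor (current browsers and Node.js do, older browsers do not, which is where the anchor trick above still earns its keep). The parseUrl name is made up for this example.

```javascript
// Hypothetical helper: parse a URL with the standard URL constructor
// instead of a detached <a> element. Assumes a global URL is available.
function parseUrl(url) {
  var parsed = new URL(url);
  return {
    protocol: parsed.protocol,
    hostname: parsed.hostname,
    pathname: parsed.pathname
  };
}

var parts = parseUrl('http://m.cg/');
console.log(parts.protocol); // "http:"
console.log(parts.hostname); // "m.cg"
console.log(parts.pathname); // "/"
```

One difference worth knowing: the anchor element resolves relative URLs against the current page, whereas the URL constructor throws a TypeError for a relative URL unless a base URL is passed as the second argument.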

Beyond web developer tools: dtruss/dtrace and Mobile Safari

Having read Tony Gentilcore’s excellent post on using strace to debug the browser, I decided to plagiarise his title and give the thing an iOS slant.

Tony gives a good summary of why it’s sometimes necessary to go beyond the Web Inspector.

…great engineers are able to go beyond a browser’s developer tools to find out exactly what the browser is telling the operating system to do. On Linux, this source of ultimate truth can be found using strace. This tool can trace each system call made by a browser. Since every network and file access entails a system call, and this is where browsers spend a lot of their time, it is perfect for debugging many types of browser performance issues.

Mac users have DTrace rather than strace. Even better, you can use the dtruss wrapper for DTrace which makes things easier by simplifying the command syntax.

Example: monitoring localStorage

Would you like to know why your localStorage-heavy web app is slow to start? It’s probably doing a lot of file system I/O on startup. But there’s only one way to find out for sure.

  1. Fire up the iOS Simulator and open Safari in it, but stay on the initial blank page.

  2. Open Terminal and run the following:

    sudo dtruss -a -n MobileSafari

  3. Load your app in Safari in the Simulator. The screen will probably be flooded with output, and from here on you should follow Tony’s guide for interpreting it.
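If you don’t have a localStorage-heavy app to hand, something like the following could give you predictable storage activity to look for in the trace. It is only a sketch: the exerciseStorage and createMemoryStorage names, the key prefix and the entry size are all made up for illustration, and the in-memory stand-in exists only so the snippet also runs outside a browser.

```javascript
// Hypothetical helper: write, read back and remove `count` entries so there
// is some storage activity to look for in the trace.
function exerciseStorage(storage, count) {
  var value = new Array(1025).join('x'); // a 1024-character string
  var found = 0;
  var i;
  for (i = 0; i < count; i++) {
    storage.setItem('trace-test-' + i, value);
    if (storage.getItem('trace-test-' + i) === value) {
      found++;
    }
  }
  for (i = 0; i < count; i++) {
    storage.removeItem('trace-test-' + i);
  }
  return found;
}

// In-memory stand-in exposing the same three methods. This is an assumption
// for running the sketch without a browser, not part of any real API.
function createMemoryStorage() {
  var data = {};
  return {
    setItem: function (key, value) { data[key] = String(value); },
    getItem: function (key) {
      return Object.prototype.hasOwnProperty.call(data, key) ? data[key] : null;
    },
    removeItem: function (key) { delete data[key]; }
  };
}

var found = exerciseStorage(createMemoryStorage(), 100);
console.log(found);
```

In a page loaded in the Simulator you would pass window.localStorage instead of the stand-in, for example exerciseStorage(window.localStorage, 100), and then look for the matching file reads and writes in the dtruss output.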

Why JavaScript ‘obfuscation’ is not just pointless, but bad

I just saw a mention of JScrambler somewhere and looked it up to try and find out what it does. Looking at the examples page on their corporate site, I was horrified. Here’s the first example, for posterity.

Source code:

navigator.plugins.length

Obfuscated code:

    var _=this;
    for (H in _)
      if(H.length==9&&H.charCodeAt(0)==110&&H.charCodeAt(8)==114) break;
    for ($ in _[H])
      if($.length==7&&$.charCodeAt(0)==112&&$.charCodeAt(6)==115) break;
    _[H][$]["length"];

That has got to be crazy slow, but don’t take my word for it - a jsPerf test shows that the obfuscated code is 100% slower than the original in Chrome.
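To make it clearer where the extra work comes from, here is a rough de-obfuscated sketch of what that code appears to be doing: instead of looking a property up by name, it loops over every enumerable property and matches on name length plus first and last character codes (110 and 114 are 'n' and 'r', 112 and 115 are 'p' and 's'). The findKey helper and the fakeGlobal object are stand-ins invented for this example so that it runs without a browser.

```javascript
// Hypothetical helper: find the first enumerable key of `obj` with the given
// length and the given first and last character codes.
function findKey(obj, length, firstCode, lastCode) {
  for (var key in obj) {
    if (key.length === length &&
        key.charCodeAt(0) === firstCode &&
        key.charCodeAt(length - 1) === lastCode) {
      return key;
    }
  }
  return undefined;
}

// Stand-in for the browser's global object (an assumption for this sketch).
var fakeGlobal = {
  location: {},
  document: {},
  navigator: { language: 'en', plugins: ['a', 'b', 'c'] }
};

var first = findKey(fakeGlobal, 9, 110, 114);         // 'n' ... 'r'
var second = findKey(fakeGlobal[first], 7, 112, 115); // 'p' ... 's'
console.log(first, second, fakeGlobal[first][second].length);
```

On a real global object each of those loops may have to walk through a lot of properties before it hits a match, which is work a direct property access never does.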

What kind of an evil maniac made this obfuscator?

DOM Events in Internet Explorer 9 and 10

This post is more of a note to self about an idiosyncrasy of IE9-10.

It seems that in IE9 and IE10, in order to be backwards compatible with older code, Microsoft has left the legacy, proprietary pre-IE9 event object in window.event, but will pass the new W3C DOM Events Level 3 standard object into event listeners.

So, even though IE9 and IE10 support Event#stopPropagation, both will error in the following (contrived) example:

    function onClick() {
      window.event.stopPropagation();
    }

Whereas this will work:

    function onClick(event) {
      event.stopPropagation();
    }

In any case, if you want cross-browser functionality you should really be writing code like this:

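    function onClick(event) {
      // Older IE does not always pass an event object, so fall back to window.event.
      event = event || window.event;
      if (event.stopPropagation) {
        // W3C DOM Events object (IE9+ and other browsers).
        event.stopPropagation();
      } else {
        // Legacy IE event object, which has no stopPropagation method.
        event.cancelBubble = true;
      }
    }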
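The same idea can be pulled out into a small helper that checks for the method rather than for the argument. This is a sketch only: the stopEventPropagation name is invented, the second parameter stands in for window.event, and the two event objects are mocks so the snippet can run without a DOM.

```javascript
// Hypothetical helper: stop propagation on whichever event object is available.
// `legacyEvent` stands in for window.event so the sketch can run without a DOM.
function stopEventPropagation(event, legacyEvent) {
  var e = event || legacyEvent;
  if (typeof e.stopPropagation === 'function') {
    e.stopPropagation();
  } else {
    e.cancelBubble = true;
  }
  return e;
}

// Mock of a standard event object.
var standardEvent = {
  stopped: false,
  stopPropagation: function () { this.stopped = true; }
};

// Mock of a legacy IE event object (no stopPropagation method).
var legacyEvent = { cancelBubble: false };

stopEventPropagation(standardEvent, undefined);
stopEventPropagation(undefined, legacyEvent);
console.log(standardEvent.stopped, legacyEvent.cancelBubble); // true true
```

Checking for the method matters because handlers attached with attachEvent in IE8 and earlier do receive an event argument, just not one that has stopPropagation.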