Monday, December 19, 2011

A breakthrough in data visualization, what it means for data journalism, predicting the news



Earlier this month, the National Science Foundation announced a new system to help researchers make sense of stores of scientific papers, and potentially find the “next big thing.”

The Action Science Explorer, or ASE, developed jointly by University of Michigan and University of Maryland faculty, takes a difficult cognitive task -- backtracking through paper citations to identify a breakthrough -- and “offloads” it to the much easier task of perceiving density in network visualizations. In other words, it takes mounds of difficult-to-digest research and uses social network analysis and graphing techniques to make the patterns in it immediately recognizable.
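
To get a feel for the idea, here’s a minimal sketch in Python using the networkx library -- not ASE’s actual code, just the underlying principle, with made-up paper IDs:

# Not ASE's actual code -- just the underlying principle, with made-up
# paper IDs: represent citations as a directed graph and let density,
# rather than manual backtracking, point to the influential paper.
import networkx as nx

citations = [
    ("paper_B", "paper_A"),  # paper_B cites paper_A
    ("paper_C", "paper_A"),
    ("paper_D", "paper_A"),
    ("paper_D", "paper_B"),
    ("paper_E", "paper_C"),
]
G = nx.DiGraph(citations)

# A heavily cited paper shows up as a high in-degree node -- visually,
# the center of a dense region of the network.
ranked = sorted(G.nodes, key=G.in_degree, reverse=True)
print(ranked[0])  # -> paper_A, the likely "breakthrough"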

The ASE visually represents papers and concepts as they appear over time, identifies the moments when fields branched out and flourished, and finds the points where other research became obsolete or was lost. It also identifies emerging fields of study:

“Users can quickly appreciate the strength of relationships between groups of papers and see bridging papers that bring together established fields. Even more potent for those studying emerging fields is the capacity to explore an evolutionary visualization using a temporal slider. Temporal visualizations can show the appearance of an initial paper, the gradual increase in papers that cite it, and sometimes the explosion of activity for ‘hot’ topics. Other temporal phenomena are the bridging of communities, fracturing of research topics, and sometimes the demise of a hypothesis.”
(from the ASE tech report)
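
The temporal part of that quote boils down to binning: count the papers citing a given work each year and watch the curve climb. A toy sketch, with invented citation years:

from collections import Counter

# Invented years in which papers citing a single "hot" paper appeared.
citing_years = [2003, 2004, 2004, 2005, 2005, 2005, 2006, 2006]

# Bin by year; these growing bars are what ASE's temporal slider animates.
per_year = Counter(citing_years)
for year in sorted(per_year):
    print(year, "#" * per_year[year])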

The ASE researchers say this software has potential in the fields of linguistics, biology and sociology, writing: “Both students and educators must have access to accurate surveys of previous work, ranging from short summaries to in-depth historical notes. Government decision-makers must learn about different scientific fields to determine funding priorities.”

But suppose data journalists used similar tools to analyze legislation over time, to forecast future bills and political alliances. Clusters would indicate where certain provisions failed, where lobbyists and special interests had influenced legislation the most, and possibly how those interests would proceed in the future. Instead of conducting reactive reporting, or relying on too-late intelligence that lets legislation slip through unnoticed, reporters could use the system to help guide questions and investigations.
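
As a rough sketch of how that might start, consider scoring pairs of bills by how much language they share, so recurring provisions -- and the interests pushing them -- surface automatically. The bill numbers and texts below are invented:

def jaccard(a, b):
    # Similarity of two texts as the overlap of their word sets.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

# Invented bill numbers and texts, standing in for scraped legislation.
bills = {
    "HB_101_2009": "an act relating to utility rate caps for residential customers",
    "SB_455_2011": "an act relating to utility rate caps for commercial customers",
    "HB_882_2011": "an act concerning school lunch nutrition standards",
}

# Bill pairs that share most of their language likely share a sponsor,
# a lobbyist or a model bill -- a starting point for questions.
names = list(bills)
for i, x in enumerate(names):
    for y in names[i + 1:]:
        print(x, y, round(jaccard(bills[x], bills[y]), 2))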

In September, computer scientist Kalev Leetaru here on the University of Illinois campus did something just as remarkable. He compiled more than 100 million media reports, text-mined and crunched them on a supercomputer, and was able to chart and even predict the instability in Libya and Egypt.
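
Leetaru’s pipeline needed a supercomputer for the full archive, but the signal itself -- deteriorating tone over time -- can be sketched in a few lines. The monthly tone scores here are invented:

# Invented average monthly tone of coverage for one country; negative
# means gloomier. Leetaru's real input was 100 million-plus articles.
monthly_tone = [0.1, 0.2, 0.0, -0.1, -0.3, -0.6, -0.9]

def rolling_mean(xs, window=3):
    return [sum(xs[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(xs))]

# A sustained slide in smoothed tone is the "deteriorating sentiment"
# the model reads as forewarning of a major event.
smoothed = rolling_mean(monthly_tone)
falling = all(b < a for a, b in zip(smoothed, smoothed[1:]))
print("warning: tone deteriorating" if falling else "no clear trend")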

Impressively, Leetaru was also able to use those news reports to estimate the location of al-Qaeda leader Osama bin Laden to within 200 km. From BBC News, which reported on Leetaru’s research:
The computer event analysis model appears to give forewarning of major events, based on deteriorating sentiment.
However, in the case of this study, its analysis is applied to things that have already happened.
According to Kalev Leetaru, such a system could easily be adapted to work in real time, giving an element of foresight.
"That's the next stage," said Mr Leetaru, who is already working on developing the technology.
"It looks like a stock ticker in many regards and you know what direction it has been heading the last few minutes and you want to know where it is heading in the next few.
“Predictive reporting” or “news forecasting” could prove invaluable to digital newsrooms, where seconds mean the difference between breaking the news and just being one of the reporting mob. And if news agencies worked on integrating advances in computer and information science into the newsroom, instead of just reporting on them, they could enhance reporting across the entire organization.