By Alex Plough | September 29, 2014
Data journalism seems to be having its moment. Though journalists have been using computers, databases and coding to tell stories for decades, the approach is taking off in this era of Big Data and anemic newsroom budgets.
Three high-profile data journalism ventures launched this year testify to the popularity of the concept. In April, the New York Times released The Upshot, a data-heavy politics and policy website. That same month, former Washington Post Wonkblog journalist Ezra Klein launched Vox, which he calls a “news explanation site” and which uses data to add an extra layer of context to news stories.
Probably the best-known figure in the field is Nate Silver, a statistician who rose to fame by accurately forecasting the 2008 and 2012 US presidential elections on his FiveThirtyEight blog, which was later picked up and licensed by the New York Times. In March, Silver partnered with ESPN and relaunched it as a separate website under the same name.
All three websites aspire to take a quantitative approach to the major issues of the day, relying heavily on statistical analysis and data visualization to tell stories. Some in the data journalism world, including Silver, have claimed that such quantitative methods allow data journalists to avoid the biases and factual errors made by “traditional” reporters and pundits.
In his FiveThirtyEight ‘manifesto’ “What the Fox Knows,” Silver critiqued “conventional news organizations” for their poor data skills, singling out Wall Street Journal columnist Peggy Noonan for predicting a Romney win in 2012 when the polls said otherwise. Silver claimed Noonan’s clear disdain for data was representative of a wider problem in the media: an over-reliance on anecdotes.
In response, some “traditional” journalists retorted that numbers alone don’t tell a story. Others in the media bristled at FiveThirtyEight’s definition of “data journalism.”
Who Got There First
“The thing that Nate did was say that ‘this is data journalism,’ without any context about the other people who had done this before,” said Alex Howard, a TechRepublic columnist and fellow at the Tow Center for Digital Journalism at Columbia Journalism School. “He’s a brilliant guy and was able to do something that moved away from punditry, but he wasn’t the only one. PolitiFact has been rating the veracity of what political pundits say since 2007,” Howard said.
Readers in the information age are clamoring for data, Howard continued, but journalists who aspire to be scientists must be aware that datasets can be just as unreliable as human sources.
His point is underscored by early stumbles from FiveThirtyEight, such as a series of errors in data “cleaning” and interpretation in its report on the kidnapping of girls in Nigeria. To the site’s credit, it quickly issued a lengthy and brutally honest correction to the original piece.
These data gaffes are familiar to one group of conventional reporters in particular. Over 40 years ago, pioneering “computer-assisted reporting” (CAR) journalists like Philip Meyer were applying the quantitative methods of social science to their work, as in the Detroit Free Press’ Pulitzer Prize-winning reports on the causes of the 1967 Detroit riots. The paper used a mainframe computer to show that people who had attended college were as likely to have participated in the riots as high school dropouts.
Data has played a crucial role in investigative journalism for many years. Early in the history of Investigative Reporters and Editors (IRE), a grassroots nonprofit organization founded in 1975, a spin-off organization dedicated to this new field was established. The National Institute for Computer-Assisted Reporting (NICAR) launched in 1989, and its annual conference in Baltimore this year drew over 1,000 journalists from around the world.
The New Stuff
Just as CAR journalists borrowed research techniques from social science, the newest iterations of data journalists are drawing from the academic disciplines of computer science and statistics.
But a number of other things set them apart: access to an unprecedented amount of data, as well as to powerful and freely available new tools that are being shared, copied and used to manipulate data to tell stories.
“The new wave of data journalism includes two different things. One is the move to more open platforms and reporting models, through code sharing and the use of free, open-source tools,” said Howard.
Newsroom data experts now use free, open-source software such as the potent statistical analysis tool R and the database manager MySQL, tools that level the data journalism playing field for many smaller outlets.
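To give a concrete flavor of this kind of work, here is a minimal sketch in Python (standing in for R, since both are freely available) of the sort of summary analysis a newsroom might run; the CSV file and its column names are invented for illustration:

```python
# A minimal, hypothetical sketch of newsroom-style data analysis.
# "contributions.csv" and its columns are illustrative, not a real dataset.
import csv
from collections import defaultdict
from statistics import mean, median

totals = defaultdict(list)
with open("contributions.csv", newline="") as f:
    for row in csv.DictReader(f):
        # Group contribution amounts by candidate.
        totals[row["candidate"]].append(float(row["amount"]))

for candidate, amounts in sorted(totals.items()):
    print(f"{candidate}: n={len(amounts)}, "
          f"mean=${mean(amounts):,.2f}, median=${median(amounts):,.2f}")
```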
Another trend is the use of the code-hosting platform GitHub by news organizations. Typically used by the open-source software development community to store and share code online (in “repositories”), GitHub lets users duplicate others’ code and repurpose it for their own needs.
This feature lets data journalism teams across the world quickly replicate each other’s projects, spurring innovation with increasingly sophisticated news applications. At least 13 major publishers, from upstarts like FiveThirtyEight to legacy media groups such as the New York Times, now maintain their own dedicated GitHub repositories.
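As a small illustration of how open these platforms are, the following Python sketch uses GitHub’s public REST API to list the repositories a news organization has shared; FiveThirtyEight’s public “fivethirtyeight” organization serves as the example:

```python
# Sketch: list a news organization's public GitHub repositories
# via the GitHub REST API, using only the standard library.
import json
import urllib.request

org = "fivethirtyeight"  # a real public GitHub organization
url = f"https://api.github.com/orgs/{org}/repos?per_page=100"
req = urllib.request.Request(url, headers={"User-Agent": "data-journalism-demo"})
with urllib.request.urlopen(req) as resp:
    repos = json.load(resp)

for repo in repos:
    print(repo["name"], "-", repo.get("description") or "(no description)")
```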
The second thing that’s different about the new breed of data journalism, said Howard, is that these tools have spawned entirely new forms of storytelling. “You don’t just have to have the reverse pyramid; with data you can create a living story, a platform in which data can be explanatory and participatory, such as NPR’s ‘Playgrounds for Everyone’ app,” Howard said.
The NPR application crowdsourced data from its readers about playgrounds that are accessible to children with disabilities. After cleaning the submitted data and combining it with existing regional databases, NPR presented an interactive map that users could search to find accessible playgrounds nearby.
Many journalists would struggle to recognize the result as a traditional ‘story.’ But by designing the application as a personalized service, in this case for parents or guardians of children with disabilities, NPR developers gave users an incentive to share the map with others and keep it updated. In the process they also created the country’s most complete database of accessible playgrounds, a resource that can be mined for more traditional news and investigative stories.
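NPR’s actual pipeline is not described in detail here, but the basic cleaning-and-merging step behind any such crowdsourced app can be sketched in a few lines of Python; every field name below is invented for illustration:

```python
# Hypothetical sketch of merging crowdsourced playground submissions
# with an existing regional database. All field names are invented;
# this does not reproduce NPR's actual pipeline.
def normalize(record):
    """Standardize free-text fields so duplicate records can be matched."""
    return {
        "name": record["name"].strip().title(),
        "city": record["city"].strip().title(),
        "state": record["state"].strip().upper(),
        "accessible_swing": bool(record.get("accessible_swing")),
    }

def merge(existing, submissions):
    """Key records on (name, city, state); let submissions add new facts."""
    merged = {(r["name"], r["city"], r["state"]): r
              for r in map(normalize, existing)}
    for record in map(normalize, submissions):
        key = (record["name"], record["city"], record["state"])
        if key in merged:
            # A reader submission can confirm accessibility features
            # missing from the official database.
            merged[key]["accessible_swing"] |= record["accessible_swing"]
        else:
            merged[key] = record
    return list(merged.values())

regional = [{"name": "oak park playground", "city": "madison", "state": "wi"}]
submitted = [{"name": "Oak Park Playground ", "city": "Madison", "state": "WI",
              "accessible_swing": True}]
print(merge(regional, submitted))  # one deduplicated, enriched record
```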
The new data journalism is often crowdsourced and interactive, built for today’s connected world.
(This is Part I of a two-part series. Part II, Data Journalism: How to Get Started, coming Oct. 6)