By Alex Plough December 29, 2014
Rob Barry is an award-winning investigative reporter at the Wall Street Journal, where he specializes in the art and science of computer-assisted reporting. Over the years he has become an expert at wrangling large, unwieldy datasets into compelling narratives. This work has earned Barry numerous accolades, including two Gerald Loeb awards and recognition as a Pulitzer finalist in the Public Service category.
In 2013, Barry and his colleagues at the Wall Street Journal won a Barlett & Steele Bronze Award for their investigation into stock trading by corporate executives. Their article “Executives’ Good Luck in Trading Own Stock” revealed a group of corporate insiders whose trading patterns helped them profit or avoid losses year after year. Armed with a tip-off from a reliable source and online access to millions of SEC filings, the Journal team set about investigating how lucky these executives really were. I caught up with Barry to talk about the use of CAR in business journalism as well as the story behind how he creates some of his best work.
Alex Plough: ‘Executives’ Good Luck in Trading Own Stock’ is a great example of a business story that couldn’t have been done without data. But it also started from a conversation with a source. How do you manage both in your reporting process? Do you typically go from traditional tips to data?
Rob Barry: Great question. In our business, data on its own is rarely enough to tell a story. Just as important is the guidance you get from people involved in what you’re writing about. In that story’s case, for instance, Susan Pulliam, my co-author, was extremely well sourced and used those sources to help inform the analysis. What data does in cases like that is provide the framework in which to embed your story. It helps you determine whether there’s even a worthwhile story to tell, particularly on the investigative front, where what we’re often doing is looking for systemic problems and systemic breakdowns. One anecdote isn’t enough, but an anecdote can form the basis for a much broader analysis.
Plough: Cleaning and getting data into the right format often is the hardest part. What coding tools do you recommend journalists learn to make their lives easier?
Barry: I find myself changing toolkits on a fairly regular basis. Right now, my main tools of choice are a statistical programming language called R, a general programming language called Python and a database management program called Microsoft SQL Server. Oh, and when I need something to run fast, I use a language called C#, also made by Microsoft.
Rather than specific tools, I think the important thing to focus on is specific characteristics of your workflow. So, for instance, one of the most important things to me is reproducibility. Here’s the idea: say you’re combining 30 different data sets from different sources and in different formats. You could do that by hand in an Excel spreadsheet but then when it comes time to check your work, you’ll have to go through that process again, and if it doesn’t check out—if you get different answers—you’ll often have absolutely no idea why.
Instead, if you use a language like R or Python to glue your analysis together, you can often reduce the whole thing to a single command, and that’ll give you a lot more confidence in what you’re doing. Then, if you’re really crazy and you want to rigorously check your work, you can rewrite the glue from scratch. If you don’t get the same answer, you’ve got a script—step-by-step instructions—telling you exactly what you did, and you can see what went wrong.
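To make the idea concrete, here is a minimal sketch of what such “glue” code might look like in Python—not Barry’s actual workflow, and the file and column names are hypothetical. The point is that re-running one command rebuilds the combined dataset from the raw files, and each row records which file it came from so the merge can be audited.

```python
# A minimal, reproducible "glue" script: merge every CSV matching a
# pattern into one output file, tagging each row with its source file.
# Filenames and columns here are hypothetical.
import csv
import glob

def combine(pattern, out_path):
    """Merge all CSVs matching `pattern` into `out_path`, adding a
    `source_file` column so every row can be traced back."""
    with open(out_path, "w", newline="") as out:
        writer = None
        for path in sorted(glob.glob(pattern)):
            with open(path, newline="") as f:
                for row in csv.DictReader(f):
                    row["source_file"] = path
                    if writer is None:
                        # Write the header once, based on the first row seen.
                        writer = csv.DictWriter(out, fieldnames=row.keys())
                        writer.writeheader()
                    writer.writerow(row)

# One command reproduces the whole combined dataset from the raw files.
combine("filings_*.csv", "combined.csv")
```

If the numbers don’t check out later, the script itself is the record of exactly how the data were combined.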
Plough: For the executive trading story you used some pretty advanced modeling techniques. What were the challenges of using such simulations?
Barry: It’s important to talk things through with experts. To put that story together, I was performing an analysis using a technique called a Monte Carlo simulation. I needed to make sure I understood how that worked and how to interpret the results. In addition, I also needed a lot of speed. I think that analysis required on the order of four trillion simulations. Thus, a faster programming language like C# was very useful.
More recently, I wrote a story about “hot spots” where groups of stockbrokers with troubled regulatory records cluster. To report it, I sought out an epidemiologist who had created several statistical techniques for identifying disease hot spots, then got his guidance on how to apply that sort of analysis to my data on brokers.
All that said, I think it’s also key to realize that most analysis we do doesn’t need to be particularly sophisticated. A lot of what we do just involves counting things. I just did a story on unreported killings by police and all I really had to do from a technical perspective was send a bunch of emails and add some numbers together.
Plough: Can you go into more detail about the Monte Carlo simulation and why you used it for the executive trading story?
Barry: We started out with 20,000 executives and the dates they traded their company’s stock. The first step was to find how much money the execs earned from their trades, which narrowed it down to a smaller pool of hundreds of profitable trades. The problem was there was no particular evidence that this trading was exceptional. For example, someone buying Google in 2006 would have made money whenever they bought the stock that year, because it was always going up. So we looked at other academic work on this problem and at earlier work by the Journal.
We realized that we had to use a simulation-based approach. Say an executive traded four times a year: we took those trades, redistributed them to different points in the year, then recalculated the gains or losses. If that executive avoided a loss of one million dollars, we asked how many random permutations also avoided one million in losses.
If you do this a billion times, you get an idea of the probability that the trade would have been profitable by chance. After that we ended up looking at executives whose trading patterns were fortuitous year after year, then looking at news occurring around their trades and thinking about what market-sensitive news they might be party to. We didn’t really get into the underlying analysis in the story, as it is somewhat difficult to convey to a reader what this kind of probability means.
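The permutation idea above can be sketched in a few lines of Python. This is a toy illustration, not the Journal’s actual code: the daily prices are invented, and a real analysis would run across many executives and years. It shuffles an executive’s sale dates within the period many times and asks how often random timing would have done at least as well.

```python
# Toy Monte Carlo permutation test: how often would randomly timed
# sales have fetched at least as much as the actual sales?
# Prices and dates below are hypothetical.
import random

def simulate(prices, actual_sale_days, n_trials=100_000, seed=0):
    """Return the fraction of random sale-date permutations whose
    proceeds match or beat the executive's actual proceeds."""
    rng = random.Random(seed)
    actual = sum(prices[d] for d in actual_sale_days)
    hits = 0
    for _ in range(n_trials):
        random_days = rng.sample(range(len(prices)), len(actual_sale_days))
        if sum(prices[d] for d in random_days) >= actual:
            hits += 1
    return hits / n_trials

# Hypothetical daily prices: the stock slumps right after day 1.
prices = [50, 52, 40, 38, 37, 36, 35]
p = simulate(prices, actual_sale_days=[0, 1])  # sold at the two peaks
```

A small `p` means few random permutations did as well, so the timing looks unlikely to be luck; repeated small values year after year are what made the pattern newsworthy.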
Plough: What is the potential for more computational approaches to journalism, such as forecasting? Can you think of any good examples in business journalism specifically?
Barry: I think simple computationally informed journalism is incredibly useful and provides us with a backbone that makes our work defensible. I can’t think of a business story I’d do that isn’t rooted in or informed by data. But this is journalism: most analysis doesn’t need to be sophisticated. Most of what we do is counting things and measuring percent change.
I think at core, data journalism isn’t any different from good records reporting. My colleague Tom McGinty always likes to make this analogy: if you were in a room full of filing cabinets (remember those?) thumbing through thousands of files, would you call that data journalism? Probably not. But that process – thoroughly reviewing a lot of records – is the same.
Plough: I see you started using databases as a journalist while at the Miami Herald. Can you describe your early forays into computer-assisted reporting? Was there an ‘Aha’ moment when you saw its potential?
Barry: Coming to journalism, I had no idea “CAR” was a thing. After I’d been at the newspaper for a while – nearly a year, if I remember right – the managing editor found out I had a background in math and computers. Suddenly, I went from covering city council meetings to working on some of the newspaper’s biggest projects. While it was much more arduous work, I relished the opportunity to get involved in what I thought was some of the most interesting and important work being produced by the paper.
****
Alex Plough is a freelance business journalist based in New York. Originally from London, England, he has a background in data-driven investigative reporting and has worked on a number of agenda-setting projects, such as the award-winning Iraq War Logs for the Bureau of Investigative Journalism. He is a graduate of Columbia Journalism School’s M.A. in business and economics reporting, as well as Columbia’s Lede Program – a three-month course designed to apply the tools of computer science to journalism. He is particularly interested in the overlap of finance and technology, and in how young people are shaping the new American economy.