The Google/China Hacking Case: How Many News Outlets Do the Original Reporting on a Big Story?

MEDIA, 20 Apr 2015

Jonathan Stray – TRANSCEND Media Service

Of 800 news stories found online about the hacking story, all but 121 were duplicates. Of those 121, 13 contained at least one original quote, and just seven represented original journalistic work. In other words, about 85% of the 800 news articles were verbatim copies of other journalists' work, whereas only about 1% constituted original stories.

24 Feb 2010 – We often talk about the new news ecosystem — the network of traditional outlets, new startups, nonprofits, and individuals who are creating and filtering the news. But how is the work of reporting divvied up among the members of that ecosystem?

To try to build a datapoint on that question, I chose a single big story and read every single version listed on Google News to see who was doing the work. Out of the 121 distinct versions of last week’s story about tracing Google’s recent attackers to two schools in China, 13 (11 percent) included at least some original reporting. And just seven organizations (six percent) really got the full story independently.

But as usual, things are a little more subtle than that. I chose the Google-China story because it’s complex, international, sensitive, and important. It’s the sort of big story that requires substantial investigative effort, perhaps including inside sources and foreign-language reporting. Call it a stress test for our reporting infrastructure, a real-life worst case.

The New York Times broke the story last Thursday, writing that unnamed sources involved in the investigation of last year’s hacking of a number of American companies had traced the attacks to a prestigious technical university and a vocational college in mainland China. The article included comment from representatives of the schools and, while it had a San Francisco dateline, credited contributions from Shanghai staff. Immediately, the story was everywhere. Just about every major American newspaper and all the wires covered it.

When I started investigating the issue on Monday morning, Google News showed 800 different reports. But how many of these reports actually brought new information to light? By default, Google does not display duplicate copies of syndicated (or stolen) content, bringing the total down to more than 100 unique pieces of copy. I read each one, and several hours later, I had a spreadsheet recording the sourcing for each story. I also recorded the country of publication, the dateline or contributor location if noted, and the primary publishing medium of each outlet (paper, online, radio, etc.). An excerpt of this data is reproduced in the table below.

Here’s what I found:

— Out of 121 unique stories, 13 (11 percent) contained some amount of original reporting. I counted a story as containing original reporting if it included at least an original quote. From there, things get fuzzy. Several reports, especially the more technical ones, also brought in information from obscure blogs. In some sense they didn’t publish anything new, but I can’t help feeling that these outlets were doing something worthwhile even so. Meanwhile, many newsrooms diligently called up the Chinese schools to hear exactly the same denial, which may not be adding much value.

— Only seven stories (six percent) were primarily based on original reporting. These were produced by The New York Times, The Washington Post, the Wall Street Journal, The Guardian, Tech News World, Bloomberg, Xinhua (China), and the Global Times (China).

— Of the 13 stories with original reporting, eight were produced by outlets that primarily publish on paper, four were produced by wire services, and one was produced by a primarily online outlet. For this story, the news really does come from newspapers.

— 14 reports (12 percent) were produced by Chinese outlets, had a China dateline, or mentioned the assistance of staff in China. For a story about China, that seems awfully low to me. Perhaps this has to do with cutbacks of foreign correspondents?

— Nine reports (7 percent) mentioned no source at all. Five more were partially unsourced. Given the ease of hyperlinks, this frightens me.

— Google News tended to rank solid original stories fairly high in its list. Google says they rank stories based on criteria such as the reputation of a source, number of references by other articles, and the headline clickthrough rate — though they won’t reveal exactly how it’s done. The spreadsheet and table below list stories in the order that Google News ranked them.

— Google’s story-clustering algorithm included three unrelated stories and missed at least one original report. The three extraneous stories were about Google and China, but not about the recent trace. The exclusion of the Financial Times’ excellent piece is a disappointment — perhaps this has something to do with their paywall? Maybe I’m biased because, as a computer scientist, I appreciate the difficulty of the problem — but I actually think this means that Google News works remarkably well, for a completely unsupervised algorithm that crawls billions of pages to find millions of stories in dozens of languages.

— What were those other 100 reporters doing? When I think of how much human effort went into rewriting those hundred other unique stories that contained no original reporting, I cringe. That’s a huge amount of journalistic effort that could have gone into reporting other deserving stories. Why are we doing this? What are the legal, technical, economic and cultural barriers to simply linking to the best version of each story and moving on?

— The punchline is that no English-language outlet picked up the original reporting of the Chinese-language Qilu Evening News, which was even helpfully translated by Hong Kong blogger Roland Soong. A Chinese reporter visited one of the schools in question and advanced the story by clarifying that serious hackers were unlikely to have been trained in the vocational computer classes offered there. Soong told me that Lanxiang Vocational School is well known in China for its cheesy late-night commercials and low-quality schooling: more of an educational chop shop for cooks and mechanics than the training ground for military hackers that the Times claims.
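The percentages quoted in the list above follow directly from the raw counts. As a minimal sketch (the variable names and labels are illustrative, not taken from the original spreadsheet), the arithmetic can be checked like this:

```python
# Counts as stated in the article; the labels are hypothetical shorthand.
TOTAL_LISTED = 800   # reports Google News showed for the story
UNIQUE = 121         # distinct versions after duplicates were dropped

counts = {
    "some original reporting": 13,
    "primarily original reporting": 7,
    "Chinese outlet, dateline, or staff": 14,
    "no source mentioned": 9,
}


def share(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100 * part / whole


# Share of the 121 unique stories that falls into each category.
shares = {label: share(n, UNIQUE) for label, n in counts.items()}

# Share of all 800 listed reports that were duplicates of another version.
duplicate_share = share(TOTAL_LISTED - UNIQUE, TOTAL_LISTED)

for label, pct in shares.items():
    print(f"{label}: {pct:.1f}% of {UNIQUE}")
print(f"duplicates: {duplicate_share:.1f}% of {TOTAL_LISTED}")
```

Rounded to whole numbers, these shares match the 11, six, 12 and seven percent figures given in the list.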

Tracing one story doesn’t prove anything conclusive beyond that one story, of course. And using Google News as a filter doesn’t truly represent the new news ecosystem: It excludes lots of smaller blogs and other outlets. Soong said Google News told him that his site is not eligible for inclusion in their results because they don’t include small blogs written by a single author. This seems like an arbitrary distinction, but it’s hard to imagine what defensible choice Google could make in an era where the definition of a news source is so up for grabs.

The table below is an extract from the data I collected. Stories with original reporting are marked “original” in the Sources column. The full spreadsheet also includes country of publication, primary medium for each organization, and lists whether or not each story hyperlinked to its sources.

Article	Sources	Dateline
Calgary Herald	Xinhua, NYT (via AFP)
ABC	AP, Xinhua	Shanghai
Xinhua	original	Shanghai
MarketWatch	NYT, Xinhua	San Francisco
Reuters	Xinhua, NYT	Shanghai
OneIndia	China Daily, NYT (via ANI)	Beijing
Economic Times	?	Washington
PC Magazine Blogs	NYT
Washington Post	original, NYT	Beijing
Times Online	NYT	Washington
Information Week	NYT, original
FOX News	NYT (via AP)
The Canadian Press	NYT (via AP)
Taipei Times	(via NYT)	San Francisco
The Register	NYT, Guardian UK, blog
The Inquirer	AP
MarketWatch	NYT	San Francisco
ComputerWorld	NYT, blog
Telegraph UK	NYT
PC World	NYT, Xinhua
Telegraph UK	NYT	Los Angeles
Wall Street Journal	original, Xinhua, NYT
The Guardian	NYT, original
Business Week	(Bloomberg)	Washington
AFP	NYT	New York
Reuters	NYT	New York
New York Times	original	San Francisco, Shanghai
Daily Contributor	PC World
CCTV	China Daily, NYT, original
Australia Network News	Xinhua, NYT
After Dawn	?, NYT
Top News	NYT
Daily Latest News	?
Press Trust of India	China Daily, NYT	Beijing
UPI	NYT	New York
Security Pro News	?
Gizmodo	NYT
Tom’s Guide	NYT
Digital Media Wire	NYT	Mountain View
Tech News World	original, NYT
Global Times	original, “agencies”
io9	NYT, Guardian
ZD Net	NYT
Benzinga	NYT
Fox Business	NYT
CrunchGear	NYT
AOL News	NYT, Guardian, WSJ
Tech Blorge	NYT
KLIV	NYT	Silicon Valley
eWeek	NYT
TMCnet	NYT
News.am	NYT
Chattabox	NYT
Datamation	NYT
The New New Internet	NYT
IT Pro Portal	Business Week, Telegraph, PC World
The Hill	NYT
Grab Geek Points	NYT
DBTechno	NYT	Boston
IT Chuiko	NYT
All Things Digital	NYT
Before It’s News	NYT
V3	?
San Jose Business Journal	NYT
Help Net Security	NYT
Channel Web	NYT
Marketing Pilgrim	NYT
The Money Times	NYT
TG Daily	NYT, Guardian
ABH News	NYT, ?
Top News	NYT, ?
PCR	NYT
Top News	NYT
Daily Finance	NYT, Hacker Journals
Shuttervoice	?
Thinq	NYT
Top News	NYT
New York Magazine	NYT
Venture Beat	NYT
Fast Company	NYT
Gather News	NYT
Newser	NYT
NASDAQ	NYT (via Dow Jones Newswire)
Reuters	Xinhua	Shanghai
PC World	NYT, Xinhua
Herald Sun	NYT, Xinhua (via AFP)	Beijing
The Hindu	?
The Times of India	?
Daily Mail	NYT
PC World	NYT, blogs
ComputerWorld	NYT (via IDG)
News.com.au	NYT
The Globe and Mail	NYT, original (via Reuters)
9News	NYT
Redmond Pie	NYT, ?
Red Orbit	NYT
New Public	NYT
Sydney Morning Herald	NYT (via AP)
Gulf Times	NYT
MyNews	Xinhua, NYT (via Indo Asian News)
Zeenews (India)	NYT, Xinhua (via PTI)
The Tech Herald	NYT, Guardian	Beijing
Web Pro News	Financial Times, NYT
Business Insider	NYT
The Financial Express	original, NYT (via Bloomberg)
Tech Eye	NYT, ?
CIO	NYT, WSJ (via IDG)
Tech Blorge	NYT, Xinhua
CNET	NYT, Xinhua
ZD Net	NYT, Washington Post
China Daily	NYT, original
Beijing News	?
What’s on Xiamen	NYT, Xinhua
NPR	NYT
San Francisco Chronicle	NYT, Xinhua (via AP)	Shanghai
The Cap Times	NYT, AP, Computer World
Little About	NYT, Xinhua (via Indo Asian News)	Jinan
Little About	NYT, original (via Asian News Intl)	Beijing
San Francisco Chronicle	NYT (via AP)	San Francisco
Portfolio.com	NYT
World Market Media	?
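
A table like the one above can be tallied mechanically. The following is only a sketch: it uses a handful of rows copied from the table, and it assumes that "original" in the Sources column marks original reporting and that "?" marks a missing source. The function names `classify` and `tally` are hypothetical, not part of the original analysis.

```python
import csv
import io
from collections import Counter

# A few rows copied from the table (tab-separated: outlet, sources, dateline).
SAMPLE = (
    "Article\tSources\tDateline\n"
    "Calgary Herald\tXinhua, NYT (via AFP)\n"
    "Xinhua\toriginal\tShanghai\n"
    "New York Times\toriginal\tSan Francisco, Shanghai\n"
    "PC Magazine Blogs\tNYT\n"
    "Economic Times\t?\tWashington\n"
    "After Dawn\t?, NYT\n"
    "The Globe and Mail\tNYT, original (via Reuters)\n"
)


def classify(sources):
    """Label one Sources cell (assumed conventions: 'original' and '?')."""
    s = sources.strip().lower()
    if "original" in s:
        return "original"
    if s == "?":
        return "unsourced"
    if "?" in s:
        return "partly unsourced"
    return "derived"


def tally(tsv_text):
    """Count rows per label, skipping the header line."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    next(reader)  # header row
    counts = Counter()
    for row in reader:
        if len(row) < 2:  # ignore blank or malformed lines
            continue
        counts[classify(row[1])] += 1
    return counts


print(tally(SAMPLE))
```

The dateline column is optional in the data, so the sketch reads only the second field of each row and tolerates rows with two or three fields.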

________________________________

Jonathan Stray leads the Overview Project for the Associated Press, a Knight News Challenge-funded visualization system to help investigative journalists make sense of very large document sets, and teaches computational journalism at Columbia University. Formerly he was an interactive editor at the Associated Press, a freelance reporter in Hong Kong, and a senior computer scientist at Adobe Systems. He has contributed stories to The New York Times, Foreign Policy, Wired and China Daily. He has an MSc in computer science from the University of Toronto and an MA in journalism from the University of Hong Kong.

Go to Original – niemanlab.org


