Tracking News Trends

Now that news is published online as well as in print, it has become possible to map how stories spread across the globe. By monitoring news online, Cornell computer scientists have learned how to track and analyze what they call the "news cycle": the way stories rise and fall in popularity.

Three Cornell researchers (Jon Kleinberg, Tisch University Professor of Computer Science; postdoctoral researcher Jure Leskovec; and graduate student Lars Backstrom) tracked 1.6 million online news sites, including blogs and some 20,000 major media websites, during the three months leading up to the 2008 U.S. presidential election. In all, they analyzed 90 million news items, likely the largest analysis of online news sources yet performed.

News Cycle

The researchers found a characteristic pattern: stories rose to prominence and then lost steam within a few days. They also found what they call a "heartbeat" pattern, a kind of tradeoff between blogs and major media sites. On media sites, stories gained popularity slowly and then died off quickly. On blogs, by contrast, stories caught on quickly and lingered much longer, because discussion of them continued. In every case, however, stories were eventually pushed aside by new ones.

The researchers believe their work can help settle whether the so-called news cycle is merely a figure of speech for how we perceive current events or a real, measurable phenomenon. They think it is the latter, and they offer a mathematical model to show how it might work.

Kleinberg explains that researchers would like to track "memes," or units of information, as they move through cyberspace, but deciding which ideas count as true memes is still difficult. The team sidestepped this problem by tracking quotations found in news items, because quotes stay relatively consistent even as the wording of the surrounding stories varies from site to site and from author to author. Quotes do mutate a little as they travel, so the researchers developed an algorithm that finds phrases sharing similar words, including shorter fragments contained within longer phrases, and groups them together. They called these groups "phrase clusters."
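The grouping step can be illustrated with a simplified sketch. It is not the Cornell team's algorithm. It assumes that a quote counts as a variant of a longer quote when all of its words appear in the longer one in the same order, and that each quote joins the cluster of the longest quote containing it. The function names `normalize`, `is_variant_of`, and `phrase_clusters` are hypothetical.

```python
from collections import defaultdict


def normalize(quote: str) -> tuple[str, ...]:
    """Lowercase, replace punctuation with spaces, and split into words."""
    cleaned = "".join(ch.lower() if ch.isalnum() or ch.isspace() else " " for ch in quote)
    return tuple(cleaned.split())


def is_variant_of(short: tuple[str, ...], long: tuple[str, ...]) -> bool:
    """True if `short` is strictly shorter and its words appear in `long` in order."""
    remaining = iter(long)
    # `word in remaining` consumes the iterator, so word order is enforced.
    return len(short) < len(long) and all(word in remaining for word in short)


def phrase_clusters(quotes: list[str]) -> list[set[str]]:
    """Group quotes by attaching each one to the longest quote that contains it."""
    phrases: dict[tuple[str, ...], str] = {}
    for q in quotes:
        words = normalize(q)
        if words:  # skip quotes that are empty after normalization
            phrases.setdefault(words, q)

    ordered = sorted(phrases, key=len)
    parent: dict[tuple[str, ...], tuple[str, ...]] = {}
    for p in ordered:
        containers = [longer for longer in ordered if is_variant_of(p, longer)]
        # Roots (phrases contained in nothing longer) point to themselves.
        parent[p] = max(containers, key=len) if containers else p

    def root(p: tuple[str, ...]) -> tuple[str, ...]:
        # Parents are always strictly longer, so this walk terminates.
        while parent[p] != p:
            p = parent[p]
        return p

    clusters: dict[tuple[str, ...], set[str]] = defaultdict(set)
    for p in ordered:
        clusters[root(p)].add(phrases[p])
    return list(clusters.values())
```

Applied to three variants of the "lipstick on a pig" quote plus an unrelated quote, the sketch yields two clusters. A real system would need extra safeguards, such as a minimum phrase length, since very short fragments match almost anything.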

Phrase Clusters

With the phrase clusters in hand, the researchers could count how many posts mentioned each cluster and measure how long each one persisted online. Throughout August and September, threads rose and fell on a weekly rhythm, though some stories lasted longer or drew more attention than usual, such as the Democratic and Republican conventions, the "lipstick on a pig" thread, growing concern over the financial crisis, and the debate over the bailout plan.

The researchers think the gradual rise of news stories suggests a process of imitation: as more and more sites carry a story, others notice and jump on the bandwagon. Yet new stories keep arriving and drive out the old. The researchers say this supports a mathematical model that combines imitation with "recency" (a preference for newer stories), since models based on only one of these factors failed to match the observed data.
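A toy simulation can show how imitation and recency together produce a rise-and-fall pattern. This is an illustrative sketch, not the researchers' model. It assumes a fixed number of posts per time step, split among threads in proportion to an imitation term (a sublinear function of a thread's cumulative coverage) multiplied by an exponential recency decay. The function `simulate` and the parameter values `posts_per_step`, `beta`, and `tau` are hypothetical.

```python
import math


def simulate(start_times: list[int], steps: int, posts_per_step: float = 100.0,
             beta: float = 0.5, tau: float = 3.0) -> list[list[float]]:
    """Return history[t][i], the expected number of posts about thread i at step t."""
    counts = [0.0] * len(start_times)  # cumulative coverage of each thread
    history = []
    for t in range(steps):
        weights = []
        for i, start in enumerate(start_times):
            if t < start:
                weights.append(0.0)  # thread has not appeared yet
                continue
            imitation = (1.0 + counts[i]) ** beta   # coverage attracts more coverage
            recency = math.exp(-(t - start) / tau)  # appeal fades with age
            weights.append(imitation * recency)
        total = sum(weights)
        # Split this step's posts among threads in proportion to their weights.
        volume = [posts_per_step * w / total if total > 0 else 0.0 for w in weights]
        for i, v in enumerate(volume):
            counts[i] += v
        history.append(volume)
    return history
```

Running `simulate([0, 5, 10], steps=20)` with the defaults, the thread that starts at step 5 climbs for several steps, peaks, and then fades as the thread starting at step 10 takes over. Because the recency factor depends only on a thread's age, it handicaps an older thread relative to a newer one by a constant factor; in this sketch, choosing `beta` below 1 lets a newcomer overcome an older thread's head start in coverage within a few steps.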

Heartbeat Phenomenon

The heartbeat shows up in the timing: blog activity on a given story peaks about 2.5 hours after mainstream media coverage peaks. The vast majority of these stories start in the mainstream media; only 3.5% originate in the blogosphere and then move into mainstream online media.
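The two timing measurements described here can be expressed in a few lines. The sketch assumes hypothetical input: each mention of a phrase cluster is tagged with an hourly time bin and a source type ("media" or "blog"), and every cluster has mentions from both source types. The peak lag is the blog peak hour minus the media peak hour, and a thread counts as starting in the blogosphere if its earliest mention comes from a blog. The `Mention` type and the function names are illustrative.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    hour: int    # hypothetical: hourly time bin of the post
    source: str  # hypothetical: "media" or "blog"


def peak_hour(mentions: list[Mention], source: str) -> int:
    """Hour bin with the most mentions from the given source type."""
    counts = Counter(m.hour for m in mentions if m.source == source)
    return max(counts, key=counts.__getitem__)


def peak_lag(mentions: list[Mention]) -> int:
    """Hours by which blog activity peaks after mainstream media activity."""
    return peak_hour(mentions, "blog") - peak_hour(mentions, "media")


def blog_first_share(threads: list[list[Mention]]) -> float:
    """Fraction of threads whose earliest mention comes from a blog."""
    blog_first = sum(1 for t in threads if min(t, key=lambda m: m.hour).source == "blog")
    return blog_first / len(threads)
```

Averaging `peak_lag` over many clusters and applying `blog_first_share` to all of them gives statistics of the kind quoted above; resolving a lag as short as 2.5 hours would call for finer time bins than whole hours.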