Twitter Posts Help Scientists Track Flu in Real Time

May 10, 2017

By Lauren Santye, Assistant Editor

Computational model forecasts the evolution of the disease up to 6 weeks in advance.

Scientists have harnessed Twitter posts in combination with key parameters of each flu season’s epidemic to project the spread of the flu in real time.

The novel computational model uses posts on Twitter with parameters that include the incubation period of the disease, the immunization rate, how many people a patient can infect with the virus, and the presence of the viral strains.

The model was tested against official influenza surveillance systems and demonstrated the ability to accurately forecast the evolution of the disease up to 6 weeks in advance.

“In the past, we had no knowledge of initial conditions for the flu,” said lead investigator Alessandro Vespignani.

Initial conditions indicate where and when an epidemic has begun and the extent of infection. Using these initial conditions, the investigators incorporated Twitter into the model.

“This kind of integration has never been done before,” Vespignani said. “We were not looking for the number of people who were sick because Twitter will not tell you that. What we wanted to know was: Do we have more flu at this point in time in Texas or in New Jersey, in Seattle or in San Francisco? Twitter, which includes GPS locations, is a proxy for that. By looking at how many people were tweeting about their symptoms or how miserable they were because of the flu, we were able to get a relative weight in each of those areas of the US.”
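The idea of a relative weight per area can be illustrated with a minimal sketch. It assumes the weight is simply each region's share of all flu-related tweets; the article does not give the investigators' actual formula, and the function name and the tweet counts below are hypothetical.

```python
def relative_weights(tweet_counts):
    """Return each region's share of flu-related tweets.

    Assumption: the weight is a plain proportion of the total count.
    The study's real weighting scheme is not described in the article.
    """
    total = sum(tweet_counts.values())
    if total == 0:
        raise ValueError("no flu-related tweets to weight")
    return {region: count / total for region, count in tweet_counts.items()}


# Hypothetical weekly counts of geolocated flu-related tweets.
example_counts = {
    "Texas": 400,
    "New Jersey": 100,
    "Seattle": 300,
    "San Francisco": 200,
}
example_weights = relative_weights(example_counts)
```

The shares say nothing about how many people are sick in absolute terms. They only indicate where flu activity appears higher or lower relative to the other areas, which is the role Vespignani describes for Twitter.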

This development will help public health officials plan for necessary medical resources and launch campaigns that encourage individuals to get vaccinated.

In November 2013, the CDC announced the “Predict the Influenza Season Challenge,” which invited external researchers to participate in advancing the science of forecasting infectious diseases. The investigators have participated in the challenge ever since, and a new paper covers their projections in the United States, Italy, and Spain for the 2014-2015 and 2015-2016 flu seasons.

During that period, the investigators applied forecasting and other algorithms weekly to the key parameters taken from the Twitter data. This process allowed the investigators to generate as many scenarios as possible for how the disease might evolve.

The resulting simulations were then matched with the surveillance data obtained from the CDC and clinical and personal reports of influenza-like illnesses from the 3 countries.

“The surveillance data tells us the ground truth for the past 4 weeks, but it is always delayed by about 1 week because you need to get the report from the doctor,” Vespignani said.

The investigators were able to select the model that would most likely forecast the future by analyzing the evolving dynamics from the past data.
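The selection step can be sketched as picking the simulated trajectory that best matches recent surveillance data. The sketch below assumes a sum-of-squared-errors comparison over the most recent weeks; the article does not state which matching criterion the investigators used, and the scenario names and case counts are hypothetical.

```python
def select_best_simulation(simulations, observed):
    """Return the name of the simulation closest to the observed data.

    Assumption: closeness is measured by the sum of squared errors over
    the observed weeks. This is a stand-in, not the study's criterion.
    """
    def squared_error(trajectory):
        if len(trajectory) != len(observed):
            raise ValueError("trajectory and observations differ in length")
        return sum((s - o) ** 2 for s, o in zip(trajectory, observed))

    return min(simulations, key=lambda name: squared_error(simulations[name]))


# Hypothetical weekly influenza-like illness counts for the past 4 weeks.
observed_cases = [105, 135, 165, 230]

# Hypothetical simulated trajectories over the same 4 weeks.
candidate_simulations = {
    "early_peak": [120, 180, 260, 310],
    "late_peak": [100, 130, 170, 220],
    "mild_season": [90, 100, 110, 115],
}

best_match = select_best_simulation(candidate_simulations, observed_cases)
```

In this toy setup, the trajectory that tracks the past 4 weeks most closely would then be run forward to produce the forecast for the coming weeks.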

During the challenge, the novel model differed from those of other participants because it explicitly modeled the disease’s parameters. The investigators could identify the week the epidemic would reach its peak and the magnitude of the peak with an accuracy of 70% to 90% six weeks in advance.

“By capturing the key parameters, we could track how serious the flu was each year compared with every other year and see what was driving the spread,” said first author Qian Zhang, PhD. “That is what the public health agencies and the epidemiologists really care about. We are not just playing a game of numbers, which is what straightforward statistical models do.”

The authors noted that although they used Twitter data for their experiments, the model can be used with data from different digital sources and online surveys of patients, such as influenzanet.

“Our model is a work in progress,” Vespignani said. “We plan to add new parameters, for example, school and workplace structure. This is not a challenge in the sense that you want to win. This is a science challenge in which you want to learn, to see that there is not a single model but a portfolio of models that will tell us new things.”