How to turn 19,000 data points into 1 graph.
Turning complex data sets into a story about responses to influenza vaccines.
Science is stories.
Good stories move science forwards. The stories come from the data, and turning data into a story is a long and iterative process. The more data you have, the longer it can take; as our tools get better at producing more data per sample, it is getting harder to find the story. In our recently published study (Inflammatory Responses to Influenza Vaccination at the Extremes of Age) we measured 27 different mediators after giving 2 different vaccines 3 times to 3 different ages of mouse, sampling at 8 timepoints after vaccination with 5 replicate animals at each timepoint, leading to 19,440 data points. This was a tricky knot to unpick.
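For anyone who wants to check the arithmetic, the total is simply the product of every factor in the study design. The short Python sketch below reproduces it; the dictionary keys are labels chosen here for illustration, and the numbers are the ones quoted above.

```python
from math import prod

# Factors of the study design as described in the text.
# The key names are illustrative labels, not terms from the paper.
design = {
    "mediators": 27,
    "vaccines": 2,
    "immunisations": 3,
    "age_groups": 3,
    "timepoints": 8,
    "replicate_animals": 5,
}

# Every combination of factors yields one measurement.
total = prod(design.values())
print(total)  # 19440
```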
The aim of the study was to investigate whether age changed the immune response to vaccination. In particular we were interested in whether age affected inflammation after immunisation. Inflammation sounds bad, but we actually need a small amount to kick the immune system and make the vaccine work. We know that vaccines work less well at the extremes of age and wanted to determine whether the initial reaction to the vaccine shaped how well it worked. To investigate the inflammatory response, we used a tool called Luminex. Luminex measures chemical messengers in the blood called cytokines; these chemical messengers recruit cells of the immune system to the site of vaccination, activate them and shape the type of response they generate. However, as mentioned, Luminex generates LOTS of data: 19,440 data points. The first time we had the complete dataset, we had to book a study room to have sufficient space to spread out all the bits of paper with the data on. So how did we move it from there into a story?
It took four things: perseverance, perspective, peer review and bioinformatics.
Perseverance: With any dataset, but large ones in particular, time is the most critical factor in finding the story. You need to spend time with the dataset, getting to know it, formatting and reformatting: sorting by size, by time, alphabetically, into classes of cytokines. Analysis can’t be done piecemeal. Several times I got close to understanding the data, then had to take time off to do something else; when I came back, I had forgotten the trends I had been close to identifying and had to start from scratch. There were several dead ends, and times when I wanted to give up because there was no discernible pattern in the data.
Perspective: That said, analysis can’t all be done in one sitting. You need time for the subconscious to churn it through, you need to read around the subject to see what other people have seen, you need conversations with colleagues and chance insights when on the loo. The creative process can’t be rushed.
Peer review: Exposing your precious story to the slings and arrows of outrageous review is often frustrating and can be soul-destroying. However, in this case (and, I grudgingly admit, quite frequently for other studies) peer review significantly improved the paper. It gave us time and perspective to rethink the conclusions, and the reviewers suggested new ways of analysing and thinking about the dataset.
Bioinformatics: It turns out that, whilst easy and accessible, Excel may not be the most effective tool for looking at big datasets. There is a range of other bioinformatic tools that can help in the analysis. In this case we used principal component analysis. Now I have no idea how the maths behind this actually works, but I do know it squishes the many variables behind those 19,000 or so data points into 2, so that you can see broad trends in the data and then go back and look for individual variables of interest.
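For the curious, here is a minimal sketch of what that squishing looks like in Python. It is not the analysis from the paper. The data are randomly generated stand-ins, and the function name `pca_2d`, the number of samples and the choice of which columns differ between groups are all assumptions made for illustration. The only number taken from the study is the 27 measured mediators. Each row is one sample and each column one mediator; the code standardises the columns and uses a singular value decomposition (via NumPy) to find the two directions that capture the most variation.

```python
import numpy as np


def pca_2d(data):
    """Project the rows of `data` (samples x variables) onto the first
    two principal components.

    Returns the 2D scores, the loadings (how much each variable
    contributes to each component) and the fraction of total variance
    explained by each of the two components.
    """
    # Standardise each column so that mediators measured on very
    # different scales contribute equally.
    centered = data - data.mean(axis=0)
    scaled = centered / centered.std(axis=0, ddof=1)

    # SVD of the standardised matrix: the rows of vt are the principal axes.
    u, s, vt = np.linalg.svd(scaled, full_matrices=False)

    scores = u[:, :2] * s[:2]            # coordinates of each sample in 2D
    loadings = vt[:2]                    # weight of each variable on PC1, PC2
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:2]


# Synthetic stand-in data (NOT the study data): 200 hypothetical samples
# by 27 mediators, with two made-up groups that differ in four columns.
rng = np.random.default_rng(0)
n_samples, n_mediators = 200, 27
group = np.repeat([0, 1], n_samples // 2)
data = rng.normal(size=(n_samples, n_mediators))
data[:, :4] += 3.0 * group[:, None]

scores, loadings, explained = pca_2d(data)
print(scores.shape)    # (200, 2)
print(loadings.shape)  # (2, 27)
```

Plotting the two columns of `scores` against each other gives the kind of single graph described here, and the largest absolute values in `loadings` point to the individual variables worth going back to.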
So what did we learn?
Having spent time staring at the data, we did see a number of patterns emerge. First of all, age is a major factor in the inflammatory response to vaccination, with different cytokines being produced in young, adult and elderly animals. Secondly, adjuvants can shape the response. Adjuvants are compounds that improve vaccine efficacy; the addition of an adjuvant called MF59 reduced age-associated differences, inducing higher levels of the cytokines IL-5, G-CSF, KC and MCP-1. The level of these four cytokines correlated with the level of antibody produced after vaccination. This is important because it shows that poor responses at the extremes of age can be overcome through the addition of adjuvants; it also gives us some insight into which responses to a vaccine can lead to the best results. Taking a complex (and large) dataset and turning it into a story was a lengthy process, but it has helped us understand more about the immune response to vaccines.