The growing number of available sequences has radically improved our ability to understand how viruses are moving around, both for emerging pathogens, such as SARS-CoV-2, and for long-term endemic pathogens, such as dengue virus in Thailand (the focus of our project). However, a very common problem in these phylogeographic analyses is that we typically sequence only a small subset of viruses from all infections. Many people don’t get sick enough to visit healthcare centres, and we will usually obtain sequences from only a handful of locations, typically dictated by where surveillance systems are concentrated. A second limitation of traditional phylogeographic approaches is that, while they can infer rates of viral flow between pairs of locations, this provides only superficial mechanistic insight into how a pathogen is moving around and will often ignore transmission flows in unsampled locations. For dengue virus to move between two locations, an infected person needs to move to another location and infect a susceptible individual (via a mosquito), or a susceptible person needs to visit the location of an infected person, or both individuals need to move to a separate common location. In addition, the density of mosquitoes within a location may help dictate whether infections occur (the Aedes mosquitoes that spread dengue virus don’t fly very far, so they are unlikely to be moving the virus themselves). Finally, local population immunity, which changes over time as the level of infection in the community changes, can also determine the pathway a virus takes. Overall viral flows between locations are made up of all these different factors. However, with existing approaches we cannot disentangle the relative importance of each one in determining spread.
In this project, we developed an analytical framework that uses the time between sequential infections (around 17 days for dengue) to translate the evolutionary relationship between a pair of tips in a phylogenetic tree into the number of transmission steps that separate them. To move from individual transmission steps to the overall number of transmission steps inferred by a tree, we integrate over all possible pathways linking two locations, crucially allowing us to incorporate infections in unsampled locations.
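The idea of converting evolutionary time into transmission steps and then integrating over pathways can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the fitted model: the number of steps is assumed to be Poisson-distributed around the evolutionary time divided by the 17-day generation time, the virus is assumed to move between locations according to a single hypothetical per-step matrix, and the function names (`generation_pmf`, `location_probabilities`) are invented for this example.

```python
import math

GENERATION_TIME_DAYS = 17.0  # approximate time between sequential dengue infections


def generation_pmf(evol_days, gen_time=GENERATION_TIME_DAYS):
    """Probability of 0, 1, 2, ... transmission steps in evol_days (must be > 0).

    Assumption for this sketch: the step count is Poisson with mean
    evol_days / gen_time, truncated well into the upper tail and renormalised.
    """
    mean = evol_days / gen_time
    max_steps = int(mean + 10 * math.sqrt(mean) + 20)
    pmf = [
        math.exp(-mean + g * math.log(mean) - math.lgamma(g + 1))
        for g in range(max_steps + 1)
    ]
    total = sum(pmf)
    return [p / total for p in pmf]


def mat_mul(a, b):
    """Plain-Python matrix product of two square matrices."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def location_probabilities(step_matrix, evol_days):
    """Sum over g of P(g steps) * M^g.

    Taking matrix powers sums over every intermediate location, so
    unsampled places contribute to the pathways automatically.
    """
    n = len(step_matrix)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # M^0
    result = [[0.0] * n for _ in range(n)]
    for weight in generation_pmf(evol_days):
        for i in range(n):
            for j in range(n):
                result[i][j] += weight * power[i][j]
        power = mat_mul(power, step_matrix)
    return result


# Hypothetical per-step matrix for three locations; location 1 is treated as
# unsampled, and locations 0 and 2 have no direct one-step link.
toy_matrix = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.1, 0.9],
]
probs = location_probabilities(toy_matrix, evol_days=170.0)
print(probs[0][2])  # non-zero only because pathways pass through location 1
```

In this toy setting, locations 0 and 2 are linked only through the unsampled location 1, yet the marginal probability of moving from 0 to 2 is positive once several transmission steps are allowed.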
This approach allows us to concentrate our analysis on what is happening at each transmission step, which is a much more tractable level of analysis for identifying drivers of transmission. We identify the mobility matrix that best explains virus movement between all places in Thailand at each transmission step, including places for which we have no sequences but which may nevertheless be involved in intermediary (unobserved) transmission steps. We use a likelihood-based approach to fit a model that incorporates how infected people move, how susceptible people move, local vector density, human population distributions and time-varying serotype-specific local immunity. To calibrate the model, we use mobile phone data from one of the largest providers in Thailand that capture average human mobility across the country, detailed maps of human and vector population densities, and long-term serotype-specific incidence data.
We apply our framework to 726 geocoded dengue virus sequences from Thailand covering an 18-year period. We find that infected individuals spend 96% of their time in their home community, compared with 76% for the susceptible population (mainly children, as most people are immune by the time they reach adulthood) and 42% for adults. These estimates of differential mobility hold for the intervening unseen transmission events as well as for the observed cases in the phylogeny. This shows that, on average, infected individuals are more likely to stay in and around their home than susceptible individuals, and suggests that some subclinical dengue virus (DENV) infections may still result in symptoms severe enough to change daily routines and limit mobility. Overall, we find that a third of infections occur outside the 1 km² home grid cell of an infected individual. This is despite infected individuals spending, on average, only 4% of their time outside their home cells, highlighting the importance of considering mobility in both infected and susceptible populations when considering viral spread.
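Why the mobility of both populations matters can be seen in a toy calculation. The sketch below builds a per-step transmission kernel for two cells under an assumed functional form (co-location of an infected and a susceptible person, weighted by a hypothetical local vector weight, population size and susceptible fraction); the function name `transmission_kernel` and all inputs other than the 96% and 76% home-stay fractions are illustrative assumptions, not the fitted model.

```python
def transmission_kernel(inf_move, sus_move, population, susceptible_frac, vector_weight):
    """Row i gives the probability that a case living in cell i infects
    a resident of cell j in one transmission step.

    Assumed form for this sketch: the infector (from i) and the susceptible
    (from j) must be in the same cell k at the same time, weighted by a
    vector weight for k and by the number of susceptibles living in j.
    """
    n = len(population)
    kernel = []
    for i in range(n):
        row = []
        for j in range(n):
            contact = sum(
                inf_move[i][k] * sus_move[j][k] * vector_weight[k]
                for k in range(n)
            )
            row.append(contact * population[j] * susceptible_frac[j])
        total = sum(row)
        kernel.append([value / total for value in row])
    return kernel


# Two identical cells; entries are fractions of time a resident of the row
# cell spends in the column cell.
infected_move = [[0.96, 0.04], [0.04, 0.96]]      # infected: 96% at home
susceptible_move = [[0.76, 0.24], [0.24, 0.76]]   # susceptible: 76% at home
immobile_susceptibles = [[1.0, 0.0], [0.0, 1.0]]  # counterfactual: never leave home

common = dict(
    population=[1000, 1000],       # hypothetical
    susceptible_frac=[0.5, 0.5],   # hypothetical
    vector_weight=[1.0, 1.0],      # hypothetical, uniform
)
mobile = transmission_kernel(infected_move, susceptible_move, **common)
immobile = transmission_kernel(infected_move, immobile_susceptibles, **common)
print(mobile[0][1], immobile[0][1])
```

In this two-cell toy, if susceptible people never left home, only 4% of the infections caused by a case would be in residents of the other cell; once susceptible people spend 24% of their time away, that share rises to about 26%. The numbers are not comparable with the one-third estimate above, which comes from the full model, but they show how susceptible mobility can move far more virus than the 4% of time infected people spend away from home would suggest.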
We also find that dynamic pockets of local immunity make transmission more likely in places with high immunity from different serotypes and less likely in places with high immunity from the same serotype, consistent with a key role for population immunity in driving pathogen flows. By contrast, we find that mosquito levels do not appear to be important in determining transmission.
Reassuringly, we find that when we remove all sequences from a subset of individual locations from our analysis and use the remaining sequences to fit the model, we are able to correctly identify the probability of observing the sequenced viruses in the places not included in the model. Our approach therefore allows us to make unbiased inferences despite minimal and heavily biased sequence availability. While we have used this framework for DENV, it is applicable to other communicable pathogens for which a time-resolved phylogeny exists, the generation time distribution is known and relatively short (days or weeks), and spatial information or other discrete traits are available.