Klebsiella pneumoniae has garnered quite a reputation in the last decade as a human pathogen of increasing public health concern in healthcare facilities and some communities worldwide. As with other hospital superbugs, its notoriety largely stems from high levels of resistance to the antibiotics usually relied on to treat infections. This resistance is mostly due to the acquisition and accumulation of antimicrobial resistance (AMR) genes that easily transfer between strains. Of greatest concern, there are increasing cases of K. pneumoniae infections that display resistance to the last-line drugs reserved for when all other options have failed, effectively rendering these infections untreatable. Outside of the hospital setting, and notably concentrated in areas within the Asian Pacific Rim, K. pneumoniae serves up an entirely distinct public health threat. A subset of ‘hypervirulent’ strains, enriched with virulence genes that enhance their disease-causing capacity, can cause invasive and life-threatening infections in people who appear to be otherwise healthy and immunocompetent. Like the AMR genes, these virulence genes can also move between strains.
Under the lead of Professor Kathryn Holt, and building upon the early genomic insights gleaned from her 2015 study of 328 diverse K. pneumoniae genomes (Holt et al. 2015), several members of the Holt lab group have focused their efforts on investigating the evolution, genetic diversity and distribution of genes that enhance the virulence of K. pneumoniae. Prior to the commencement of the work leading to this paper, we had conducted a number of studies revealing extensive genetic diversity in the following key K. pneumoniae virulence loci, which are often associated with distinct lineages and/or mobile elements:
- The polysaccharide capsule encoded by the K locus (Wyres et al. 2016)
- The iron-scavenging siderophore yersiniabactin and the genotoxin colibactin (encoded by the ybt and clb loci, respectively), typically mobilised by a chromosomal integrative and conjugative element, ICEKp (Lam et al. 2018a)
- The siderophores aerobactin and salmochelin (iuc and iro respectively), typically mobilised by the so-called large virulence plasmids (Lam et al. 2018b)
A meeting was called one afternoon in late 2017 to discuss how we could harness the rich information embedded in the genetic diversity of these loci for tracking purposes, especially given that studies often only report on the presence or absence of virulence loci. Midway through, the conversation took a slight detour and we somehow landed on a game of wordplay, attempting to work ICEKp into the lyrics of Ice Ice Baby. “Alright, stop, Kleborate and listen…” And thus Kleborate was born. As for whether or not we managed to integrate ICEKp (pun intended) into Ice Ice Baby, the proof of our lyrical genius is evident on the Kleborate wiki page.
Our vision for Kleborate from the get-go was a tool that rapidly extracts key genotyping information for loci of clinical and epidemiological relevance. What this entails has naturally evolved over the years, and the code, logic and output from the initial Kleborate release made back in March 2018 look quite different to the version 2.0.0 release used in the paper. Edits to the code often came hand-in-hand with an arduous debugging process (many thanks to Ryan Wick on this front), followed by a re-running of the updated code on the ever-growing dataset of publicly available Klebsiella genomes, re-analysis of the data, re-drawing of figures and so on: a cycle that was finally (thankfully!) broken in late 2020 when we pre-printed the paper. Today’s version of Kleborate outputs an impressive 106 columns of data encompassing assembly metrics, species prediction, MLST, genotypes of the aforementioned virulence loci as well as the lipopolysaccharide O antigen locus and the hypermucoidy loci rmpADC and rmpA2, and AMR genes and mutations for 17 different drug classes. The phrase ‘one-stop shop for Klebsiella genomes’ has been thrown around on a number of occasions.
In our paper, we applied Kleborate to a number of datasets to demonstrate its suitability as a genomic surveillance tool. Starting with the 2013-14 EuSCAPE surveillance dataset comprising 1,600 carbapenem-susceptible or non-susceptible K. pneumoniae from across Europe (David et al. 2019), Kleborate not only recapitulated the main findings around country vs. strain (i.e. ST) vs. carbapenemase trends (Figure 1), but also revealed novel insights around the K and O surface antigens and the compounding effects of the Omp porin mutations required for enhanced levels of carbapenem resistance.
We next applied Kleborate to a dataset of publicly available Klebsiella whole genomes, which had grown from a relatively measly 3,000 genomes back when we first started the Kleborate project in 2017 to a whopping 13,000+ by 2020. Unsurprisingly, the dataset was dominated by clinical samples, but it nonetheless revealed some interesting trends in AMR, virulence and overall genetic diversity (Figure 2).
In particular, and worryingly, we identified 600 genomes in which virulence and AMR genes have converged in the same strain (Figure 3). This is a potentially deadly combination: the resulting infections are not only invasive due to the hypervirulence determinants (i.e. aerobactin) but also difficult or impossible to treat, thanks to resistance to the most relied-upon drugs conferred by the presence of ESBLs and/or carbapenemases.
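For readers who want to look for this kind of convergence in their own results, here is a rough sketch of how it might be done. It is not code from the paper. It assumes a tab-delimited results table with columns named `strain`, `ST`, `virulence_score` and `resistance_score`, and it assumes that a virulence score of 3 or more indicates aerobactin and a resistance score of 1 or more indicates an ESBL and/or carbapenemase. Those column names and cut-offs, along with the toy data and the `find_convergent` helper, are assumptions made for illustration and should be checked against the Kleborate documentation for the version you are running.

```python
import csv
import io

# Hypothetical miniature of a Kleborate-style tab-delimited results table.
# Column names, genome names and score values are illustrative assumptions.
SAMPLE = (
    "strain\tST\tvirulence_score\tresistance_score\n"
    "genomeA\tST23\t4\t0\n"
    "genomeB\tST258\t1\t2\n"
    "genomeC\tST11\t4\t2\n"
    "genomeD\tST231\t3\t1\n"
)


def find_convergent(handle, min_virulence=3, min_resistance=1):
    """Return the rows whose scores meet both (assumed) thresholds."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row
        for row in reader
        if int(row["virulence_score"]) >= min_virulence
        and int(row["resistance_score"]) >= min_resistance
    ]


# In practice you would pass an open file handle instead of the toy table.
convergent = find_convergent(io.StringIO(SAMPLE))
for row in convergent:
    print(row["strain"], row["ST"])
```

With the toy table above, only the two genomes that meet both thresholds are printed. Keeping the thresholds as function arguments makes it easy to tighten the definition, for example to carbapenemase carriers only.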
Lastly, we also highlighted the potential for Kleborate to genotype strains direct from culture-free metagenomics-based sequencing data, using the Baby Biome Study dataset as an example (Shao et al. 2019). We anticipate this will be an increasingly important application, as deep sequencing is rapidly being adopted for both clinical investigations and environmental surveillance.
With the tool cited in at least 74 publications at the time of finalising this study and implemented in the Pathogenwatch pipeline (Argimón et al. 2021), the usefulness of Kleborate in genomic analyses and surveillance of Klebsiella has already been widely recognised. We have also developed Kleborate-Viz which, together with the Centre for Genomic Pathogen Surveillance's Pathogenwatch, gives users two online platforms that facilitate the generation and/or exploration of Kleborate data. Going forward, the Kleborate code will continue to be modified in line with developments in research around AMR and virulence in Klebsiella for improved reporting and interpretation of clinically relevant features.
The study describing Kleborate, and the insights gleaned from applying it to 13,000+ publicly available genomes and to metagenomes from the Baby Biome Study, is now published online in Nature Communications.