As part of my Insight project, I built Flu Forecaster, which aims to forecast influenza sequence evolution using deep learning. In it, I used variational autoencoders (VAEs) to translate time-stamped influenza protein sequences into a continuous coordinate space, and then used Gaussian process regression to forecast future coordinates that could be translated back into sequence space.
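The forecasting step can be illustrated with a minimal sketch. This is not the Flu Forecaster code: the latent coordinates, the RBF kernel, the `length_scale` and `noise` values, and the function names are all assumptions made for illustration, and each latent dimension is modelled independently for simplicity. The sketch uses only the standard library and returns the Gaussian process posterior mean.

```python
import math

def rbf(a, b, length_scale):
    """Squared-exponential kernel between two time points."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x

def gp_forecast(times, values, future_times, length_scale=1.0, noise=1e-6):
    """Posterior mean of a GP (RBF kernel, constant mean) at future_times."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    # Gram matrix of the training times, with observation noise on the diagonal.
    gram = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
             for j, b in enumerate(times)]
            for i, a in enumerate(times)]
    weights = solve(gram, centered)
    return [mean + sum(w * rbf(t, a, length_scale) for a, w in zip(times, weights))
            for t in future_times]

# Hypothetical latent coordinates: one row per time point, two latent dimensions.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
latent = [(0.0, 1.0), (0.5, 0.8), (0.9, 0.7), (1.4, 0.4), (2.0, 0.1)]

# Forecast each latent dimension independently at the next time step.
forecast = [gp_forecast(times, [z[d] for z in latent], [5.0])[0] for d in range(2)]
print(forecast)
```

In a full pipeline, the forecast coordinates would then be passed through the VAE decoder to obtain a predicted sequence; that step is omitted here because it depends on the trained model.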
Learn More: Interactive Blog Post
To achieve real-time surveillance, we need machine learning models with high learning capacity that are also highly interpretable. I am currently working on extending neural fingerprints to protein structures using convolutions on graph-structured data. In the process, we are writing a graph convolution implementation as a Python package, as well as a software package for converting protein 3-D structures into their corresponding "protein interaction graph" representations. While these tools are being developed with deep learning in mind, we anticipate that they will be of general use as well.
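To make the idea of a graph representation concrete, here is a small sketch under stated assumptions. It does not reflect the actual packages: the distance cutoff used to define an "interaction", the residue labels, the coordinates, and the function names `distance_graph` and `convolve` are all hypothetical. It builds a graph from 3-D coordinates and applies a single neighbourhood-sum step, the basic operation that graph convolutions build on.

```python
import math
from itertools import combinations

def distance_graph(coords, cutoff=8.0):
    """Map each node to the set of nodes within `cutoff` (same units as coords)."""
    graph = {node: set() for node in coords}
    for (a, pos_a), (b, pos_b) in combinations(coords.items(), 2):
        if math.dist(pos_a, pos_b) <= cutoff:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def convolve(graph, features):
    """One aggregation step: each node's features plus the sum of its neighbours'."""
    return {
        node: [sum(vals) for vals in zip(features[node], *(features[n] for n in nbrs))]
        for node, nbrs in graph.items()
    }

# Hypothetical residues with made-up coordinates and two-element feature vectors.
coords = {
    "ALA1": (0.0, 0.0, 0.0),
    "GLY2": (3.8, 0.0, 0.0),
    "SER3": (7.6, 0.0, 0.0),
    "LYS4": (30.0, 0.0, 0.0),
}
features = {
    "ALA1": [1.0, 0.0],
    "GLY2": [0.0, 1.0],
    "SER3": [1.0, 1.0],
    "LYS4": [0.0, 0.0],
}

graph = distance_graph(coords)
updated = convolve(graph, features)
print(graph["ALA1"], updated["ALA1"])
```

A real protein interaction graph would likely distinguish interaction types rather than rely on a single distance cutoff; the cutoff here only keeps the example short.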
I once saw a probability estimate with mean 0.8 and variance 0.3. From that point onwards, I knew frequentist estimates could be horribly wrong, and decided to go Bayesian.
As part of my learning journey, I decided to make publicly available a GitHub repository of Bayesian statistical analysis recipes in PyMC3, featuring models and data that I've seen elsewhere. Most of them I implemented from scratch, both to get familiar with PyMC3 syntax and to learn the logic of Bayesian statistical modelling.
Some models that are implemented here include:
Learn More: GitHub Repository
With their segmented genomes, influenza viruses can reassort with other influenza viruses to produce hybrid progeny. Think of it as shuffling a red and a blue deck of cards together in a box, then picking out one copy of each card at random, so that the new deck is a mix of red and blue cards.
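The card-shuffling analogy can be turned into a short simulation. This is a sketch under a simplifying assumption: each of the eight influenza A genome segments is inherited from either parent independently and with equal probability, which real co-infections need not obey. The function names and the "red"/"blue" labels are illustrative only.

```python
import random

# The eight genome segments of influenza A.
SEGMENTS = ["PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS"]

def reassort(parent_a, parent_b, rng):
    """Build a progeny genome by picking each segment from a random parent."""
    return {seg: rng.choice((parent_a[seg], parent_b[seg])) for seg in SEGMENTS}

def is_reassortant(progeny):
    """A progeny is a reassortant if its segments come from more than one parent."""
    return len(set(progeny.values())) > 1

# Label every segment by the "deck" it came from.
red = {seg: "red" for seg in SEGMENTS}
blue = {seg: "blue" for seg in SEGMENTS}

rng = random.Random(42)  # fixed seed so the run is repeatable
progeny = [reassort(red, blue, rng) for _ in range(1000)]
fraction = sum(is_reassortant(p) for p in progeny) / len(progeny)
print(f"{fraction:.1%} of simulated progeny are reassortants")
```

Under this uniform model, only 2 of the 2^8 = 256 possible segment combinations come entirely from one parent, so about 99.2% of progeny would be reassortants. The simulation only illustrates the mechanism; it says nothing about how often reassortment occurs in nature.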
As part of my thesis work, I developed a phylogenetic heuristic algorithm to identify reassortant influenza viruses. Using this method, my colleagues and I were able to show that reassortment is over-represented (relative to a null model) when crossing between viral hosts; additionally, the more evolutionarily distant two viral hosts were, the more over-represented reassortment was.
This pattern may generalize across domains of life, where reticulate evolution enables organisms to more easily switch between ecological niches.
Learn More: Thesis
Zoonotic infections in humans originate in wild animals. To better understand their contact structure, I have been developing an open source hardware and software kit for monitoring wild animal behaviour using the Raspberry Pi, Python, and 3D printing. The time-lapse cameras, which we call TikiCams (they look like Hawaiian lamps when mounted), are based on off-the-shelf hardware available at computer and hardware stores.