For my Insight project, I built Flu Forecaster, which aims to forecast influenza sequence evolution using deep learning. I used variational autoencoders (VAEs) to translate time-stamped influenza protein sequences into a continuous coordinate space, then used Gaussian process regression to forecast future coordinates that could be translated back into sequence space.
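The forecasting step can be sketched in a few lines. This is a minimal illustration of Gaussian process regression on a single latent coordinate, not Flu Forecaster's actual code: the kernel choice (squared exponential), the `length_scale` and `noise` values, the function names, and the example coordinates are all assumptions made for illustration. It also assumes each latent dimension is forecast independently, and it leaves out the VAE encoder and decoder entirely.

```python
import math


def rbf_kernel(a, b, length_scale=2.0):
    """Squared-exponential covariance between two time points (unit variance)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def gp_forecast(times, values, new_time, length_scale=2.0, noise=1e-2):
    """Posterior mean and variance of one latent coordinate at new_time."""
    n = len(times)
    offset = sum(values) / n                      # centre the data: the GP prior has zero mean
    centred = [v - offset for v in values]
    K = [[rbf_kernel(times[i], times[j], length_scale) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [rbf_kernel(t, new_time, length_scale) for t in times]
    alpha = solve(K, centred)                     # K^{-1} y
    weights = solve(K, k_star)                    # K^{-1} k*
    mean = offset + sum(k * a for k, a in zip(k_star, alpha))
    var = rbf_kernel(new_time, new_time, length_scale) - sum(
        k * w for k, w in zip(k_star, weights))
    return mean, var


# Made-up latent coordinate, one value per time step.
times = [0, 1, 2, 3, 4, 5]
latent = [0.10, 0.35, 0.52, 0.80, 0.95, 1.20]
mean, var = gp_forecast(times, latent, new_time=6)
print(f"forecast at t=6: mean={mean:.3f}, variance={var:.3f}")
```

In a full pipeline, the forecast coordinates (ideally sampled using the variance, not just the mean) would be passed through the VAE decoder to obtain candidate sequences.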

Learn More: Interactive Blog Post

To achieve real-time surveillance, we need machine learning models with high learning capacity that are also highly interpretable. I am currently working on extending neural fingerprints to protein structures using convolutions on graph-structured data. In the process, we are writing a graph convolution implementation as a Python package, as well as a software package for converting protein 3-D structures into their corresponding "protein interaction graph" representations. While these tools are being developed with deep learning in mind, we anticipate that they will be useful more generally.
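To make the two ideas concrete, here is a minimal sketch, not the packages' actual implementation. It assumes the simplest possible graph definition (two residues are linked when their coordinates lie within a distance cutoff, here a hypothetical 8 Å), whereas a real protein interaction graph would likely encode specific interaction types. The convolution step shown is the unweighted neighbourhood sum at the core of neural fingerprints, without the learned weights and nonlinearity that follow it. All names, coordinates, and features are invented for illustration.

```python
import math
from itertools import combinations


def contact_graph(coords, cutoff=8.0):
    """Adjacency dict: residues are nodes; an edge links residues within `cutoff`."""
    graph = {name: set() for name in coords}
    for (a, xyz_a), (b, xyz_b) in combinations(coords.items(), 2):
        if math.dist(xyz_a, xyz_b) <= cutoff:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def neighbourhood_sum(graph, features):
    """One convolution step: each node's features plus those of its neighbours."""
    out = {}
    for node, neighbours in graph.items():
        total = list(features[node])
        for nb in neighbours:
            total = [t + f for t, f in zip(total, features[nb])]
        out[node] = total
    return out


# Hypothetical residues with made-up 3-D coordinates and two-element feature vectors.
coords = {
    "ALA1": (0.0, 0.0, 0.0),
    "GLY2": (3.8, 0.0, 0.0),
    "SER3": (7.6, 0.0, 0.0),
    "LYS4": (30.0, 0.0, 0.0),
}
features = {"ALA1": [1, 0], "GLY2": [0, 1], "SER3": [1, 1], "LYS4": [5, 5]}

graph = contact_graph(coords)
print(graph)
print(neighbourhood_sum(graph, features))
```

Stacking several such steps lets information travel further across the graph, which is what allows a fingerprint to summarise structure beyond immediate neighbours.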

*Software:*

- Graph Fingerprint on GitHub
- Protein Interaction Network on GitHub
- Protein Convolutional Networks on GitHub

*Senior Collaborators:*

I once saw a probability estimate with mean 0.8 and variance 0.3. From that point onwards, I knew frequentist estimates could be horribly wrong, and decided to go Bayesian.
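One way to read that anecdote: a quantity confined to the interval [0, 1] can have a variance of at most 0.25, so a variance of 0.3 cannot describe a coherent belief about a probability. The sketch below shows the simplest Bayesian alternative, a conjugate Beta-Binomial update, whose posterior always stays inside [0, 1]. The function name, the uniform prior, and the counts are assumptions chosen for illustration; this is not one of the repository's recipes.

```python
def beta_posterior(successes, failures, prior_a=1.0, prior_b=1.0):
    """Update a Beta(prior_a, prior_b) prior with binomial data.

    Returns the posterior parameters plus the posterior mean and variance.
    """
    a = prior_a + successes
    b = prior_b + failures
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return a, b, mean, variance


# Made-up data: 8 successes and 2 failures, starting from a uniform prior.
a, b, mean, variance = beta_posterior(successes=8, failures=2)
print(f"posterior Beta({a}, {b}): mean={mean:.3f}, variance={variance:.4f}")
```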

As part of my learning journey, I decided to make publicly available a GitHub repository of Bayesian statistical analysis recipes in PyMC3 featuring models and data that I've seen elsewhere. Most of them I implemented from scratch, to get familiar with PyMC3 syntax and to get familiar with the logic of Bayesian statistical modelling.

Some models that are implemented here include:

- Binary and multinomial classification
- Neural networks
- Linear regression
- Hierarchical modelling

Learn More: GitHub Repository

With their segmented genomes, influenza viruses can reassort with other influenza viruses to produce hybrid progeny. Think of it as shuffling a red deck and a blue deck of cards together in a box, then picking out each card of the suit at random, so that some come from the red deck and some from the blue.
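The card-shuffling picture translates directly into a toy simulation. This sketch assumes each of influenza A's eight genome segments is drawn independently and with equal probability from either parent, which is a simplification of real reassortment. It only illustrates what a reassortant is; it is not the phylogenetic heuristic described in this section, and the function names are invented.

```python
import random

# The eight genome segments of influenza A.
SEGMENTS = ["PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS"]


def reassort(parent_a, parent_b, rng):
    """Build a progeny genome by drawing each segment from one parent at random."""
    return {seg: rng.choice((parent_a, parent_b))[seg] for seg in SEGMENTS}


def is_reassortant(genome):
    """True if the genome carries segments from more than one parent."""
    return len(set(genome.values())) > 1


# Label every segment by the "deck" it came from.
red = {seg: "red" for seg in SEGMENTS}
blue = {seg: "blue" for seg in SEGMENTS}

rng = random.Random(42)  # seeded so repeated runs draw the same genome
progeny = reassort(red, blue, rng)
print(progeny)
print("reassortant:", is_reassortant(progeny))
```

Under this toy model there are 2^8 = 256 possible progeny genotypes, of which 254 mix segments from both parents.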

As part of my thesis work, I developed a phylogenetic heuristic algorithm to identify reassortant influenza viruses. Using this method, my colleagues and I were able to show that reassortment is over-represented (relative to a null model) when crossing between viral hosts; additionally, the more evolutionarily distant two viral hosts were, the more over-represented reassortment was.

This may generalize across domains of life, where reticulate evolution enables organisms to more easily switch between ecological niches.

Learn More: Thesis

Zoonotic infections in humans originate in wild animals. To better understand the contact structure among these animals, I have been developing an open-source hardware and software kit for monitoring wild animal behaviour using the Raspberry Pi, Python, and 3D printing. The time-lapse cameras, which we call TikiCams (they look like Hawaiian lamps when mounted), are based on off-the-shelf hardware available at computer and hardware stores.
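The core of a time-lapse camera is a small loop: name a frame by its timestamp, capture it, wait, repeat. The sketch below shows that loop with the capture call passed in as a function, so it runs without camera hardware. It is an illustration under stated assumptions, not the TikiCam software: the function names, the `tikicam` filename prefix, and the interval are hypothetical. On a Raspberry Pi, `capture` could be a camera library call such as `PiCamera.capture` from the `picamera` package.

```python
import time
from datetime import datetime


def frame_name(moment, prefix="tikicam"):
    """Timestamped filename, so frames sort chronologically."""
    return f"{prefix}_{moment:%Y%m%d_%H%M%S}.jpg"


def run_timelapse(capture, n_frames, interval_s, clock=datetime.now, sleep=time.sleep):
    """Call capture(filename) n_frames times, pausing interval_s seconds between frames."""
    names = []
    for i in range(n_frames):
        name = frame_name(clock())
        capture(name)
        names.append(name)
        if i < n_frames - 1:      # no need to wait after the final frame
            sleep(interval_s)
    return names


# Dry run without hardware: "capture" just records the filename it was given.
captured = []
run_timelapse(captured.append, n_frames=3, interval_s=0)
print(captured)
```

Passing `clock` and `sleep` as parameters keeps the loop testable on a laptop before it is deployed in the field.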

*Videos:*

*Images:*

- Hanging cameras
- Laptop + Pi in shed
- TikiCams in the wild - image 1
- TikiCams in the wild - image 2
- Selfie with the Tikis

*Resources:*

- GitHub Repositories: