Using Deep Learning for Avatar Aesthetics In Virtual Reality

As part of their course, TMCS students at Oxford spend a couple of days developing their programming skills in a hackathon. This year, we challenged the students to apply machine learning to various problems relevant to our research.

We have recently developed a multi-user virtual reality environment for molecular dynamics simulations, using the Nano Simbox. Within this environment, users can see each other's headsets and controllers, and interact with the same simulation. A testament to the quality of the tracking provided by the HTC Vive is that users can confidently assume that the position of the head and controllers in virtual space matches that in the real world – we often get new users to reach out and touch the head of another user in VR to get them used to the idea. While this is already great, can we render more than just the headset and controllers? For our multi-user setup, a full body representation of each user would be extremely beneficial: it is much easier for users to perform complex tasks together if they can reason about where each other's limbs are, as well as the head and hands. More broadly, for many novice users of VR the frequent lack of an avatar for oneself can be disconcerting, and many video games and VR applications could benefit from a full body avatar.

The Vive Trackers (or “pucks”) recently released by HTC are an obvious solution to this problem. HTC have already released code which produces full avatars for a user wearing several of these trackers on their body. However, for multi-user VR this isn’t very practical, as we would need an inordinate number of trackers, and it would be cumbersome to put them on every time you stepped into VR.

We do already have a lot of information about the user: the positions and orientations of their head, left hand and right hand from the devices being tracked. Can we use that information to construct an avatar? We decided that this problem would be an interesting challenge to tackle in a 2-day hackathon: try to predict the full body positions by training an artificial neural network on example poses. Neural networks – or “deep learning” – have become something of a buzzword in the field of machine learning, due to their unparalleled success in several difficult tasks, including image and speech recognition.

The idea is to produce a training set with labelled positions of the head, left controller and right controller as the features, and positions of other points of the body as the targets to be predicted. Since at the time of writing the pucks had not yet been released, we commandeered our multi-user VR setup to track additional controllers carefully placed (with a lot of duct tape) on representative points on a user’s body. We chose the elbows, the top of the back, the belly button and the knees for these representative points. The images show how the controllers were placed on the body, and what this looked like in VR through a simple render of the controllers.

We had 7 volunteers from the group and the wider Centre for Computational Chemistry at Bristol get duct taped up and perform various representative tasks in the Nano Simbox, such as tying knots in peptides and making chemical reactions happen, as well as performing more general movements. The video below shows what the avatar looked like with just the controllers being rendered on the various positions on the body. While crude, the representation does add a more physical nature to the user’s representation compared to simply a floating head and controllers.

The data collection resulted in 36000 example poses from a variety of people of different shapes and sizes. We gave this data to 4 TMCS students – Laszlo Berencei, Callum Bungey, Thomas Fay and Jonathan Milward – who had spent a couple of days learning about python and scikit-learn, a set of python tools for performing machine learning, and tasked them with setting up a pipeline for training a neural network to predict the body positions given just the headset and controller positions.

There were several tasks necessary to complete:

  • Preprocessing and standardising the data.
  • Removing outliers.
  • Setting up the scikit-learn pipeline for training the machine learning algorithm.
  • Creating a renderer so we could compare the predicted avatar positions to the training set.

Since we only had two short days to do this, the aim was to get a skeleton pipeline from the raw data through to rendering the predicted positions up and running to see if the idea had any chance of working.

In preprocessing the data, we centred all the poses and rotated them about the Y axis to best match a reference frame. This kind of standardisation is very common in molecular simulation, where one seeks to superpose a structure over some reference (for example a protein crystal structure). This was necessary because the data was recorded in a 5m x 5m space, but for the purposes of predicting body positions only the relative distances and orientations between the controllers and the head are important.
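As a rough sketch of this standardisation step (the function names and the toy pose are illustrative, not the hackathon code), centring subtracts the centroid of each pose, and rotation about the vertical Y axis is a standard rotation matrix:

```python
import numpy as np

def centre_pose(pose):
    """Translate a pose so its centroid sits at the origin.

    pose: (n_points, 3) array of x, y, z positions.
    """
    return pose - pose.mean(axis=0)

def rotate_about_y(pose, angle):
    """Rotate a pose about the vertical (Y) axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, 0.0, s],
                    [0.0, 1.0, 0.0],
                    [-s, 0.0, c]])
    return pose @ rot.T

# Toy pose: headset, left controller, right controller.
pose = np.array([[2.0, 1.7, 3.0],
                 [1.5, 1.0, 3.2],
                 [2.5, 1.0, 3.2]])
centred = centre_pose(pose)
aligned = rotate_about_y(centred, np.pi / 2)
```

In practice the rotation angle would be chosen to best superpose each frame onto the reference pose, for example by a least-squares fit in the horizontal plane.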

We had to remove a lot of outliers in the data, where occlusion resulted in a systematic drift or “freezing” of the controllers during recording. For this, we used the Isolation Forest method, which worked well in cases where the controllers drifted to highly unrealistic positions.
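scikit-learn’s IsolationForest makes this kind of filtering straightforward. A minimal sketch on synthetic stand-in data (the real pose frames have more columns, and the contamination fraction here is an assumption, not the value we used):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
# Synthetic stand-in for the recorded frames: mostly plausible positions,
# plus a handful where a "controller" has drifted far from the body.
normal_frames = rng.normal(loc=0.0, scale=0.3, size=(500, 3))
drifted_frames = rng.normal(loc=10.0, scale=0.3, size=(10, 3))
frames = np.vstack([normal_frames, drifted_frames])

# Flag the most isolated ~2% of frames as outliers (-1) and drop them.
forest = IsolationForest(contamination=0.02, random_state=0)
labels = forest.fit_predict(frames)
clean = frames[labels == 1]
```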

For the machine learning, we opted for the Multi-Layer Perceptron Regressor, more commonly referred to as a neural network regressor. As a starting point, we used 2 hidden layers, and used a grid search to start tuning the hyperparameters of the network, varying the number of neurons in the two hidden layers and the regularisation term alpha.
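In scikit-learn this looks roughly as follows. The data here is a synthetic stand-in with the same shape as ours – 9 features (x, y, z for the headset and both controllers) and 18 targets (x, y, z for the six tracked body points) – and the grid values are illustrative rather than the ones we settled on:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
# Stand-in data: 9 features (headset + two controllers) and
# 18 targets (elbows, top of back, belly button and knees).
X = rng.normal(size=(200, 9))
y = X @ rng.normal(size=(9, 18)) + 0.01 * rng.normal(size=(200, 18))

# Two hidden layers; vary their width and the regularisation term alpha.
param_grid = {
    "hidden_layer_sizes": [(50, 50), (100, 100)],
    "alpha": [1e-4, 1e-2],
}
search = GridSearchCV(
    MLPRegressor(max_iter=1000, random_state=0),
    param_grid,
    cv=3,
)
search.fit(X, y)
best = search.best_estimator_
```

MLPRegressor handles multi-output targets directly, so a single model predicts all the body positions at once.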

To render the data, the students used pyglet to create simple 2D projections of the avatar positions and the predicted positions, so we could visually evaluate the performance of the regressor.

By the end of the two days, we’d hacked together all these components and trained the regressor on a small subset of the data. This subset consisted of 1148 training frames and 688 test frames, all from one continuous session with one person, so the results are extremely preliminary! Our R² score – a measure of how well the model will predict future samples – was a reasonable 0.67, leaving plenty of room for improvement. The plot below shows the distribution of error for each target in the test set: the median error varies between 15 and 20 cm, with the knees being the least well predicted (unsurprisingly).
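For reference, the R² score reported by scikit-learn is the coefficient of determination: 1.0 for perfect predictions, and 0.0 for a model that does no better than always predicting the mean. A toy illustration with made-up numbers:

```python
import numpy as np
from sklearn.metrics import r2_score

true = np.array([0.0, 1.0, 2.0, 3.0])
pred = np.array([0.1, 0.9, 2.1, 2.9])   # close, but not perfect
score = r2_score(true, pred)
# score sits just below 1.0; predicting the mean everywhere would give 0.0
```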


When we rendered the predicted positions in comparison to the true values, we found that the neural network predicted values are already qualitatively reasonable for aesthetic purposes, as shown in the video below (you will want to slow the playback down). The white square is the headset position, the red squares the position of the controllers (the hands), the blue squares are the true positions of the other body parts and the pink squares are the predicted positions.

To take this initial exploration forward, we will want to train the neural network on all of the data, and perform more sophisticated tuning of the hyperparameters. To do this, we’d move over to TensorFlow, a GPU-accelerated neural network library. The results so far are very exciting, and we hope that a practical solution to producing virtual avatars will emerge from this work. The repository used for the hackathon, which contains the data as well as the scripts we’ve written so far for processing and analysis, is publicly available here.


Exploration of molecular systems with Virtual Reality

Over the last few months I’ve been developing an interactive molecular dynamics platform that supports Virtual Reality (VR). Using the Nano Simbox framework, I can run a research-grade GPU-accelerated molecular dynamics simulation (OpenMM) and visualise it in VR.

Molecular simulations are incredibly complex systems, as every atom can interact with every other atom in 3D. For example, many drug design problems are akin to a sort of “3D tetris”, where you try to find a drug with the right shape such that it fits snugly into an enzyme’s active site. Virtual reality is a natural environment for exploring these systems, as the inherently 3D nature of VR interaction means we can at last manipulate the system in an intuitive way.


Simulation of penicillin binding to beta-lactamase, an enzyme instrumental in antibacterial resistance.

We’ve experimented with a variety of VR solutions, and have found the HTC Vive to be the most robust and enjoyable to use. The fact that you can freely walk around the space and that the controllers are tracked extremely well enables powerful interaction with a simulation. Pulling the triggers on the controllers results in a “force probe” being applied to the selected atoms, meaning you can influence the simulation in a physically meaningful way.

The visualisation and interaction tool we’ve created opens up some exciting prospects. Simply exploring the molecular structure in 3D and observing how the system responds to interaction can be a powerful way of gaining insight into its mechanisms, but I believe we can take this further.

One of the biggest problems in molecular simulations is the so-called ‘rare event problem’: interesting events that take the system from one molecular configuration to another (e.g. protein folding, chemical reactions) may occur on timescales of milliseconds or longer, while our simulations are typically restricted to the order of nanoseconds due to the computational cost of calculating the interactions between all the atoms. In order to compute metrics that can give insight into the system and be compared against experiment, the event has to be sampled many times to converge the statistics. This has led to a proliferation of methods that attempt to accelerate the occurrence of rare events, so that many short simulations can be used to capture the rare event. In previous work, I made some improvements to the Boxed Molecular Dynamics (BXD) algorithm, which is an example of one of these methods.

The problem with many of these methods is that they usually require the researcher to set up in advance a set of variables, called collective variables or reaction coordinates, that govern the event of interest. For example, in a simulation of a drug binding to an enzyme, one of the obvious variables governing the binding is the distance of the drug from the active site: it clearly needs to be minimised. However, there may be other, more subtle variables as well, such as the angle of the drug as it approaches the protein, or the position of a particular side-chain of the protein. Determining what these collective variables are requires a mixture of chemical intuition and a large amount of trial and error on the part of the researcher, and limits the ability to automate molecular simulations. For simulations of large biomolecules such as proteins, identifying these collective variables can be extremely difficult, as the concerted motions between the atoms are incredibly complex. For example, 1% of proteins in the Protein Data Bank are knotted, but it is not clear why or how they end up in this state.


Recording the path a methane molecule takes being pulled through a carbon nanotube.

There are methods that attempt to automatically identify the important collective variables for a particular system, but they typically require an initial path between the states of the system. How do you find this initial path if you don’t know what the important collective variables are? Finding these paths is exactly what interactive molecular dynamics could be useful for.

In the coming months, I’ll be seeing if we can use interactive molecular dynamics with virtual reality to enable researchers to find paths in molecular simulations, which can then be passed on to path refining and collective variable analysis methods. Combining human intuition with automated methods in this way could lead to a workflow that provides enhanced insight into chemical problems more quickly.





Python: Installing PIP and Virtualenv without root

Using virtualenv in combination with pip is a great way to run isolated python environments with all kinds of packages. However if the system you are using does not have pip, easy_install or virtualenv installed system-wide, it’s a little bit tricky to set up.

With some experimentation and some googling (stackoverflow), here’s a quick guide to do it. I’m assuming you have python already on the system.

First, we need ez_setup. The following commands grab the scripts and install them to ~/.local/bin:

curl -O https://bootstrap.pypa.io/ez_setup.py
python ez_setup.py --user

Now we can install pip in the same way:

curl -O https://bootstrap.pypa.io/get-pip.py
python get-pip.py --user

Now we have pip, and just need to add it to the path variable (add it to your .bashrc):

export PATH=~/.local/bin:$PATH

Finally, we can install packages. Get virtualenv:

pip install --user virtualenv

Now you can set up virtual environments for python, which is well-documented here.

Babun – A terminal emulator for Windows

As a scientific programmer, I spend most of my time working on Unix systems, and have grown accustomed to the range of features in various shells. I also develop on Windows occasionally, as in my opinion the Visual Studio IDE is excellent for large projects, and some of my projects currently require it.
I mostly used Git Bash, bundled with the Windows installation of git, for my command line needs, as it provides just enough to get by. However, I’ve recently been working exclusively on my Windows machine and needed a setup slightly more reminiscent of my meticulously crafted oh-my-zsh setup in iTerm2 for OS X. The obvious choice is to configure Cygwin, but the effort required is non-trivial.

I stumbled upon Babun, which has done all the hard work for me. It comes with Cygwin, oh-my-zsh, a Mintty console and a whole load of other stuff. Right out of the box, it comes close enough to my usual setup on a Unix system to be practical, and doesn’t look like something from the punch-card era. Install tmux with the following command, and nobody will ever know you’re using Windows:

pact install tmux

danceroom Spectroscopy in Bhutan


In February I had the unexpected privilege of taking danceroom Spectroscopy to the first Bhutan International Festival, with three other members of the dS team – artist Becca Rose, musician Lee J. Malcolm and University of Bristol PhD student Mike Limb. We transported the gear over 4800 miles to the Himalayan country of Bhutan, a remote Buddhist kingdom rich in history, culture and tradition.

After a 48 hour journey from Bristol to Thimphu, via London, Paris and Delhi, we arrived in the beautiful Centenary park with – to our surprise and immense relief – the dS system all present and in one piece. Setting up dS in Asia in a park for the first festival of its kind posed some new challenges, from hair-raising bus journeys through the Paro-Thimphu valley, last minute computer repair in the hotel room, packing down all the equipment every evening to avoid exposing it to sub-zero temperatures, getting repeated electric shocks from dodgy wiring, to manually grounding a geodesic dome to prevent the whole installation from becoming live. Through the ceaseless efforts of the team, the festival organisers, local volunteers and a good amount of duct tape we overcame all the hurdles and successfully ran an installation for the duration of the festival.

The festival was a 10 day extravaganza consisting of arts, live music, film and food; all open to the public.  We ran the dS installation every day peppered with talks, movement workshops and music sessions. Hundreds of locals came and spent time exploring the installation, which is the first of its kind to ever be displayed in Bhutan. Local children in particular spent hours at a time playing games and dancing with one another, then suddenly rushing off only to come back an hour later with more friends. We even had children get into trouble for skipping school to come play!

The theme of the festival was collaboration, with projects between both local and international artists occurring throughout the week. dS has been a collaborative project from its inception between artists, scientists, musicians and dancers, so we were well suited to the theme. Interacting and working with so many talented artists from all over the world generated some unforgettable moments, including a spontaneous dance by the Monks of Majuli in the dS dome, with their music and movements blending seamlessly with the soundscape and visuals generated by dS (full video here), and our very own Lee J. Malcolm performing with the Rajasthani folk musician Kutle Khan.

The ten days in Bhutan sped by in a surreal blur of molecular dynamics, incredible music and film, plates of chilli cheese and momos, and visits to remote mountain monasteries above the clouds.  To cap it all off, on the last day of running the installation we were suddenly bundled into a taxi to meet their Majesties the King and Queen of Bhutan, after being fitted with ‘goh’, the Bhutanese formal wear.

I can safely say I never expected my career in computer science to lead to attending such a special event. It was an honour to be included and I hope the Bhutan International Festival continues to be an incredible success in future years, I’ll be back!

Neat Python Debugging Trick

Debugging python is a pretty painless experience using pdb, the built in standard debugger. The basic usage is to insert breakpoints in the code:

import pdb

print('hello world')
print('entering breakpoint')
pdb.set_trace()  # execution pauses here and the debugger starts

sum = 0
for x in range(0, 10):
    sum += x
When the interpreter hits ‘pdb.set_trace()’ it’ll launch the interactive debugging interface, which lets you investigate the different variables, print things and run calculations. A tutorial on using pdb can be found here.

A really useful trick using this snippet allows for quick debugging anywhere in your code. Place the following lines at the top of your python script:

import signal

def int_handler(signal, frame):
    import pdb
    pdb.set_trace()  # drop into the debugger wherever execution currently is

signal.signal(signal.SIGINT, int_handler)

Then run the script, and wherever you want to start a debugger press ‘ctrl’+’c’. Press ‘c’ to carry on with execution. Really useful for unpredictable bugs!