Hand Pose Detection in Virtual Reality with Deep Learning

We recently had a 2-day TensorFlow hackathon in the group, and I used it as an opportunity to play with one of our new toys, the Noitom Hi5 VR gloves:


The gloves provide finger tracking and, when combined with HTC Vive Trackers on the wrists, full hand tracking, so you can effectively see your own hands in VR. This opens up some exciting possibilities for interaction and control in VR.

To develop interaction schemes, we need ways of detecting hand gestures and poses. This could be done with hand-written heuristics or with a variety of machine learning methods. In the 2 days of the hackathon, I developed a prototype that classifies hand poses using neural networks in TensorFlow and displays the result in VR in real time. The code is available on GitHub.

I spent the first morning collecting labelled data from myself and some members of the group in 4 poses: open palm, clenched fist, the ‘OK’ gesture, and pointing with the index finger. The features were the quaternions of all the joints of the hand in VR. Each quaternion has 4 components (X, Y, Z and W), and there were 21 “joints” (including the wrist and the overall hand), giving 21 × 4 = 84 features.
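Concretely, each recorded frame becomes one row of numbers. The sketch below flattens 21 (x, y, z, w) quaternions into a fixed-order vector of 84 values. The joint order and the `jointN_c` column names are hypothetical, since the post doesn't say how the glove SDK names its joints, and the optional sign flip (`canonicalize`) is an extra not described in the post: q and -q encode the same rotation, so forcing w >= 0 removes one source of ambiguity.

```python
from typing import List, Sequence, Tuple

Quaternion = Tuple[float, float, float, float]  # (x, y, z, w)

NUM_JOINTS = 21  # per-hand "joints", including the wrist and the overall hand
COMPONENTS = ("x", "y", "z", "w")


def feature_names() -> List[str]:
    """Hypothetical column names, in the same order to_feature_vector emits values."""
    return [f"joint{j}_{c}" for j in range(NUM_JOINTS) for c in COMPONENTS]


def to_feature_vector(rotations: Sequence[Quaternion], canonicalize: bool = False) -> List[float]:
    """Flatten one frame of per-joint quaternions into a fixed-order feature vector.

    The joints must be listed in the same order for every frame.
    If canonicalize is True, each quaternion is negated when w < 0,
    because q and -q describe the same rotation (an assumption, not from the post).
    """
    if len(rotations) != NUM_JOINTS:
        raise ValueError(f"expected {NUM_JOINTS} joints, got {len(rotations)}")
    features: List[float] = []
    for q in rotations:
        if len(q) != 4:
            raise ValueError(f"quaternion must have 4 components, got {len(q)}")
        x, y, z, w = (float(c) for c in q)
        if canonicalize and w < 0:
            x, y, z, w = -x, -y, -z, -w
        features.extend((x, y, z, w))
    return features
```

Stacking one such vector per frame, alongside the pose label recorded for that frame, gives the training table.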

In supervised machine learning, this is known as a classification task: given the 84 features, predict which of the 4 poses the hand is currently in. Recent versions of TensorFlow simplify common tasks like this with premade estimators. For this application I used the DNNClassifier, which constructs a deep neural network with a given number of hidden layers and trains it so that each output node corresponds to one class. I built a network with 2 hidden layers of 10 nodes each, using the default settings everywhere, resulting in a structure similar to the image below (except that there were actually 84 input features). Many of these features are constant or have little influence on the pose, so the input could be trimmed down.
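To make the topology concrete, here is a pure-Python sketch of the forward pass of an 84-10-10-4 network. It is not DNNClassifier's implementation: it only mirrors the shape described above, assuming ReLU on the hidden layers (DNNClassifier's default activation) and a softmax over the 4 output logits. The weights are random and untrained, and the order of the labels in `POSES` is hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# 84 inputs (21 joints x 4 quaternion components), two hidden layers of 10, 4 poses.
LAYER_SIZES = [84, 10, 10, 4]
POSES = ["open palm", "clenched fist", "OK", "pointing"]  # hypothetical label order

Layer = Tuple[List[List[float]], List[float]]  # (weights[out][in], biases[out])


def init_layers(sizes: Sequence[int], seed: int = 0) -> List[Layer]:
    """Random weights scaled by 1/sqrt(fan_in) and zero biases (i.e. untrained)."""
    rng = random.Random(seed)
    layers: List[Layer] = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def count_parameters(layers: Sequence[Layer]) -> int:
    """Total number of weights plus biases across all layers."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(layers: Sequence[Layer], features: Sequence[float]) -> List[float]:
    """Dense layers with ReLU on hidden layers, softmax over the output logits."""
    activations = list(features)
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activations)) + b
             for row, b in zip(weights, biases)]
        is_output = i == len(layers) - 1
        activations = z if is_output else [max(0.0, v) for v in z]
    return softmax(activations)
```

Usage: `probs = predict_proba(init_layers(LAYER_SIZES), features)` and the predicted pose is `POSES[probs.index(max(probs))]`. Counting parameters shows how small this network is: 84·10 + 10 + 10·10 + 10 + 10·4 + 4 = 1004 weights and biases in total.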

Illustration of the DNN classifier topology.

This very simple setup achieved 96% accuracy on the labelled data I produced, 20% of which was held out as a test set. Not bad for a first attempt, but the model is likely overfitted.
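The 80/20 hold-out and the accuracy figure can be sketched as follows. This is a plain shuffled split, an assumption since the post doesn't say how the test rows were chosen; the seed and function names are illustrative.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(samples: Sequence[T], labels: Sequence[int],
                     test_fraction: float = 0.2, seed: int = 0
                     ) -> Tuple[List[T], List[int], List[T], List[int]]:
    """Shuffle indices once, then hold out test_fraction of them as the test set."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def take(seq, idx):
        return [seq[i] for i in idx]

    return (take(samples, train_idx), take(labels, train_idx),
            take(samples, test_idx), take(labels, test_idx))


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of test samples whose predicted pose matches the label."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need equal-length, non-empty prediction and label lists")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

With a random split, consecutive frames from the same recording session can land in both the training and the test set, which may inflate the score. Holding out everything recorded from one person is a stricter check of how well the classifier generalises.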

Finally, I made it possible to query the classifier from within the VR application. In the time available I couldn't get TensorFlow's serving functionality working, so I used OSC messages to send the features from VR to the Python program running TensorFlow, and to send the predicted class back to VR. This introduced a noticeable lag, because the TensorFlow graph is reinitialized on every query, but it worked! The gif below shows different poses being detected:
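The post doesn't say which OSC library was used on either side. To show what actually travels over the wire, here is a hand-rolled sketch of the OSC 1.0 message format for a message whose arguments are all floats: a null-terminated address string padded to a multiple of 4 bytes, a type-tag string starting with `,` (one `f` per argument, padded the same way), then the arguments as big-endian float32. The address `/hand/features` is hypothetical; in practice a library such as python-osc would do this for you.

```python
import struct
from typing import List, Sequence, Tuple


def _pad_osc_string(text: str) -> bytes:
    """OSC strings are ASCII, null-terminated and null-padded to a multiple of 4 bytes."""
    data = text.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def encode_float_message(address: str, values: Sequence[float]) -> bytes:
    """Build an OSC message whose arguments are all float32 ('f' type tags)."""
    type_tags = "," + "f" * len(values)
    payload = struct.pack(f">{len(values)}f", *values)  # big-endian float32
    return _pad_osc_string(address) + _pad_osc_string(type_tags) + payload


def _read_osc_string(packet: bytes, offset: int) -> Tuple[str, int]:
    """Read a padded OSC string; return it and the offset of the next field."""
    end = packet.index(b"\x00", offset)
    text = packet[offset:end].decode("ascii")
    next_offset = end + 1
    next_offset += -next_offset % 4  # skip padding to the next 4-byte boundary
    return text, next_offset


def decode_float_message(packet: bytes) -> Tuple[str, List[float]]:
    """Parse a message produced by encode_float_message (float arguments only)."""
    address, offset = _read_osc_string(packet, 0)
    type_tags, offset = _read_osc_string(packet, offset)
    if not type_tags.startswith(",") or set(type_tags[1:]) - {"f"}:
        raise ValueError(f"unsupported type tags: {type_tags!r}")
    count = len(type_tags) - 1
    return address, list(struct.unpack_from(f">{count}f", packet, offset))
```

A packet for the 84 features is 440 bytes (16 for the address, 88 for the type tags, 336 for the floats) and can be sent as a single UDP datagram with `socket.sendto`. Note that float32 keeps only about 7 significant digits, which is ample for quaternion components.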


A functioning prototype is a great result for a hackathon. The next steps are to improve the classifier with more data and parameter tuning, to figure out how to serve it efficiently, and to hook it up to our molecular VR framework so that we can finally reach into the nanoscale world with our hands.

