In a recent post, I described the first pass at hand pose detection in VR using TensorFlow. I recently used this work as the starting point for this year’s TMCS hackathon. Each year we spend two days with students from the Theory and Modelling in Chemical Sciences (TMCS) course and give them a programming challenge. Last year I had students predict VR avatar positions, and this year we spent the two days improving the neural network and building a VR experience that used the gesture detection in real time.
To improve the classifier, we collected more data, implemented cross-validation and performed hyper-parameter optimizations on the number of nodes in the hidden layers, the learning rate and the optimizer. With these optimizations, we reached 99.5% accuracy on the labelled test set.
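To give a flavour of what such a search involves, here is a minimal sketch of a plain grid search scored by k-fold cross-validation. It is an illustration under assumptions, not the hackathon code: the search strategy used there may have been different, and `evaluate` is a hypothetical callback that would train a classifier with the given parameters on the training indices and return its accuracy on the validation indices.

```python
from itertools import product


def k_fold_indices(n_samples, k):
    """Yield k (train, validation) index lists using round-robin folds.

    No shuffling is done here; shuffle the data first if it is ordered by label.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, validation


def grid_search(evaluate, grid, n_samples, k=5):
    """Try every parameter combination and keep the best mean fold score.

    `evaluate(params, train_idx, val_idx)` is a hypothetical callback that
    trains a model and returns a score where higher is better.
    """
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [
            evaluate(params, train, validation)
            for train, validation in k_fold_indices(n_samples, k)
        ]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:  # ties keep the first combination seen
            best_params, best_score = params, mean_score
    return best_params, best_score
```

A grid for the three settings mentioned above might look like `{"hidden_units": [32, 64, 128], "learning_rate": [0.1, 0.01], "optimizer": ["Adagrad", "Adam"]}`; those particular values are placeholders rather than the ones tried at the hackathon.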
My main focus was to figure out how to make the DNNClassifier provide predictions responsively. If we were to run the classifier for real, we would use TensorFlow Serving. However, this is non-trivial to set up and seems like overkill for a simple test application. The DNNClassifier does have a predict method which can be used to run predictions, but as I noted in the last post, by default every call to this method reloads the graph.
There is a solution, however! The predict method can be configured to use a generator, so the graph can be loaded once and repeatedly called upon. Somebody has kindly implemented a simple class that does this; it is available as fast_predict2.py.
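The trick is easier to see with TensorFlow taken out of the picture. The sketch below is not fast_predict2.py itself: `FakeEstimator` is a hypothetical stand-in whose `predict` pays a pretend loading cost once per call, and this `FastPredict` is a simplified illustration of the pattern, which keeps one prediction generator alive and feeds it new features on demand.

```python
class FakeEstimator:
    """Hypothetical stand-in for an estimator; not a TensorFlow class."""

    def __init__(self):
        self.loads = 0  # counts how often the "graph" is loaded

    def predict(self, input_fn):
        # Generator: the expensive load runs once, when iteration starts.
        self.loads += 1
        for features in input_fn():
            yield sum(features)  # placeholder for a real prediction


class FastPredict:
    """Keep a single predict() generator open across many requests."""

    def __init__(self, estimator):
        self.estimator = estimator
        self.next_features = None
        self.predictions = None  # created lazily on the first request

    def _feature_stream(self):
        # Endless input: each pull hands over whatever was stored last.
        while True:
            yield self.next_features

    def predict(self, features):
        self.next_features = features
        if self.predictions is None:
            self.predictions = self.estimator.predict(self._feature_stream)
        # Advancing the generator pulls exactly one item from the stream.
        return next(self.predictions)
```

Calling `FastPredict(estimator).predict(...)` repeatedly advances the same generator each time, so the loading step inside `FakeEstimator.predict` runs only once, no matter how many predictions are requested.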
To use it, we have to set up an input function which takes the feature data and yields the features in the correct Dataset form for the DNNClassifier. I found it tricky to figure out exactly what form the numeric feature columns were supposed to take for this. The key is to pass the feature values as a tuple, and to generate a dictionary of feature labels and values, as shown in the gist below:
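Separately from that gist, here is a minimal sketch of the dictionary-building step on its own, written in plain Python so that the shape of the data is visible. The feature names are hypothetical, as is the `get_latest_values` callback that supplies the current hand measurements. In a real input function, a generator like this would presumably be wrapped into a Dataset, for example with `tf.data.Dataset.from_generator`, which is not shown here.

```python
# Hypothetical feature names; the real ones depend on the hand-tracking data
# and must match the numeric feature columns given to the classifier.
FEATURE_NAMES = ("thumb_curl", "index_curl", "middle_curl")


def feature_generator(get_latest_values):
    """Yield one {feature name: value} dictionary per requested prediction.

    `get_latest_values` is a hypothetical callable returning the current
    feature values in the same order as FEATURE_NAMES.
    """
    while True:
        values = tuple(get_latest_values())  # pass the values as a tuple
        if len(values) != len(FEATURE_NAMES):
            raise ValueError(
                "expected %d values, got %d" % (len(FEATURE_NAMES), len(values))
            )
        yield dict(zip(FEATURE_NAMES, values))
```

Each `next()` on the generator reads the latest values and pairs them with their labels, which is the per-sample structure described above.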
With that working, we were able to serve much faster predictions straight to the Unity app via OSC messages as shown in the video below:
The pose classification is now fast enough for real-time applications. It is also more accurate than before, so it is easier to trigger poses. There are still some false positives: for example, in the video above, a pose between pointing and a fist is classified as a fist. The algorithm still needs tuning.
With fast, accurate predictions available, the students could make something fun out of the new capabilities. They were struck with the idea of emulating what it would be like to be Darth Sidious, resulting in this rather sinister yet hilarious experience in which you use various hand gestures to throw, punch and strike innocent robot people with lightning. The following video does not represent the author’s opinion on virtual robot welfare: