Teaching a robot to read sign language
One of the primary use cases of deep learning is image classification. Applied to robotics, it means giving a robot the ability to see and understand its environment. In this article, we will discuss how to teach the Anki Vector robot to understand human sign language. As an example, we will consider the American Sign Language (ASL) signs for the English alphabet. An excellent video introducing this sign language is available here.
We build on the experimental program and dataset provided by the Anki Vector SDK. Here are the steps:
Gathering Labelled Data: In this step we use the data-gathering program to generate a labelled dataset. We show Vector the hand signs for all the letters and label the images manually. This is a labour-intensive effort. To expand the volume of the dataset, the program generates 10 copies of every captured image by slightly rotating it about the X and Y axes. This increases the variance in the dataset. Altogether, we captured 8,500 images covering 26 letters plus the background (no sign displayed). The images were taken at different times of the day to simulate different lighting conditions; the same human hand and background were used throughout. We have published this dataset on Kaggle. We hope that others can expand it by capturing images against different backgrounds. Multiple backgrounds and different human hands would greatly enrich this dataset.
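The augmentation idea can be sketched in a few lines of Python. The snippet below is only illustrative: the rotation range, image size, and file paths are assumptions, not the exact values used by the SDK's data-gathering program.

```python
# Sketch of the augmentation step: produce several slightly rotated copies
# of each captured frame to add variance to the dataset.
# Rotation range and grayscale conversion are assumptions for this sketch.
from PIL import Image

def augment(image_path, copies=10, max_angle=5):
    """Yield `copies` slightly rotated versions of the captured image."""
    original = Image.open(image_path).convert("L")   # grayscale frame
    for i in range(copies):
        # Spread small rotations evenly between -max_angle and +max_angle degrees
        angle = -max_angle + (2 * max_angle) * i / (copies - 1)
        yield original.rotate(angle, fillcolor=0)

# Example: save the augmented copies alongside the original, labelled by letter
for idx, img in enumerate(augment("raw/a_001.png")):
    img.save(f"dataset/a/a_001_rot{idx}.png")
```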
Training a Deep Learning Model: We generated a Kaggle kernel based on the program provided with the Anki Vector SDK. It builds a Convolutional Neural Network (CNN) using the Keras library. The CNN is a typical image-classification architecture; describing it in detail is beyond the scope of this article. There are some excellent articles describing how CNNs are built and how they work; here is one good reference. The dataset was split in a 7:1:2 ratio into training, validation, and test sets, and the test set was used to evaluate the accuracy of the CNN. The measured accuracy is ~96%. Given that a single hand and background were used, such a high accuracy suggests an overfit model, meaning that it would likely fail against a different background and/or a different human hand. This again illustrates the importance of gathering a larger and more varied labelled dataset. Please consider contributing to the dataset to make it richer.
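For readers who want a concrete picture, a representative Keras CNN for this 27-class problem (26 letters plus background) might look like the sketch below. The layer sizes, input shape, and hyperparameters are illustrative assumptions, not the exact architecture used in the kernel.

```python
# Illustrative Keras CNN for 27 classes (26 letters + "no sign" background).
# Layer sizes, input shape, and training settings are assumptions for this sketch.
from tensorflow.keras import layers, models

NUM_CLASSES = 27              # 'a'..'z' plus a background class
INPUT_SHAPE = (200, 200, 1)   # assumed grayscale crop of Vector's camera frame

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=INPUT_SHAPE),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),                       # helps counter the overfitting noted above
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=20)
```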
Applying the Deep Learning Model to Recognize Hand Signs: In this step, we let Anki Vector use our model to recognize hand signs shown to it. We use the recognizer program provided in the SDK to demonstrate how effectively Vector can understand them. Below is a video. Note how Vector easily recognizes 'f', 'b', and 'l' with high confidence, struggles a bit with 'a' and 'o' (lower confidence), and mispredicts 'x' (it should have been 'h').
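In rough terms, the recognition step grabs a frame from Vector's camera, runs it through the trained model, and has Vector speak the predicted letter when the confidence is high enough. The sketch below assumes the anki_vector SDK and a saved Keras model; the model filename, preprocessing details, and confidence threshold are assumptions and may differ from the SDK's recognizer program.

```python
# Sketch of the recognition loop: grab a camera frame, classify it, and have
# Vector speak the letter. Preprocessing (crop size, grayscale, normalization)
# and the "asl_cnn.h5" filename are assumptions, not the SDK program verbatim.
import numpy as np
import anki_vector
from tensorflow.keras.models import load_model

LETTERS = [chr(c) for c in range(ord("a"), ord("z") + 1)] + ["background"]
model = load_model("asl_cnn.h5")          # assumed filename for the trained model

def preprocess(pil_image):
    img = pil_image.convert("L").resize((200, 200))   # must match the training input shape
    return np.asarray(img, dtype="float32")[None, ..., None] / 255.0

with anki_vector.Robot() as robot:
    robot.camera.init_camera_feed()
    frame = robot.camera.latest_image.raw_image       # PIL image from Vector's camera
    probs = model.predict(preprocess(frame))[0]
    best = int(np.argmax(probs))
    if LETTERS[best] != "background" and probs[best] > 0.5:   # threshold is an assumption
        robot.behavior.say_text(f"I see the letter {LETTERS[best]}")
```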
Next Steps: This article only scratches the surface of how deep learning can be used in robotics. The project is obviously simplistic, but it can be extended in many ways. An example of a massive effort to create tagged data to support learning in robotics is RoboNet. RoboNet consists of 15 million video frames, collected by different robots interacting with different objects in a table-top setting. Models trained on such large datasets would behave much better when transferred to new surroundings (such as a new background or a new human hand).
In summary, we hope you enjoyed this effort. Here is the Kaggle link once again.
If you want to learn more about Vector, or learn AI with Vector, I have a new course at: https://robotics.thinkific.com