Introduction
Recently, I published an article in “Towards AI” on how to train a YOLOv5 model that can be used by Vector to detect another Vector. Brad Dwyer, founder of Roboflow, read the article and reached out to offer me a free upgraded Roboflow account, which provides all of Roboflow's functionality, including the ability to release publicly available datasets. Thanks Brad!
That motivated me to work further on this. As I have documented before, the main barrier to using supervised Machine Learning (ML), and Deep Learning (DL) in particular, is the requirement for large volumes of high-quality labelled data. Companies such as Google and Facebook can apply DL in far more sophisticated ways because they have access to huge volumes of labelled data. Several factors drive this advantage: (i) a billion-plus users, (ii) labelling mechanisms built into their products, which users can easily use to supply labelled data, and (iii) an early-mover advantage, since these companies understood the need for labelled data early and designed their products and user experience carefully to solicit feedback and labels. These advantages cannot be easily carried over to other industries and domains, which is why I believe companies such as Roboflow will have a major role to play in the democratization of AI/ML technologies.
What is Roboflow?
Quite simply, Roboflow allows you to enhance your dataset and make it far richer by providing you with sophisticated preprocessing and augmentation techniques (screenshots available on the left). There are many reasons why such techniques are important for improving the quality of your model. As we have discussed before, it is hard work to take pictures and label them. But even if you do all that hard work, your trained model will likely overfit your dataset. Then, when you apply your trained model in practice (inference, in ML terminology), it is very likely that the model fails, because reality is slightly different from what your pictures represented. Maybe the majority of the pictures you took were taken at night under artificial light, and now you need the model to work in daylight conditions as well. Steps such as automatically changing the brightness and exposure of your pictures to generate new labelled pictures can help enrich your dataset and thus make sure that your models aren’t trained for one particular condition only. Another useful feature that Roboflow provides is bounding-box-level augmentation, which applies augmentations only to the image region inside each labelled bounding box.
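To make the brightness idea concrete, here is a minimal sketch in plain Python. It is not Roboflow's implementation; the function names, the representation of an image as a list of rows of grayscale pixel values, and the (x_min, y_min, x_max, y_max) box format are all assumptions made for illustration. The point it shows is that a brightness change does not move the object, so the original labels can be reused for every generated picture.

```python
import random


def _clamp(value):
    """Round a pixel value and keep it inside the valid 0..255 range."""
    return min(255, max(0, round(value)))


def adjust_brightness(image, factor):
    """Return a copy of a grayscale image with every pixel scaled by factor."""
    return [[_clamp(p * factor) for p in row] for row in image]


def adjust_brightness_in_box(image, box, factor):
    """Scale brightness only inside one box (hypothetical box-level variant).

    box is (x_min, y_min, x_max, y_max) in pixels, with the max edges exclusive.
    """
    x_min, y_min, x_max, y_max = box
    out = [row[:] for row in image]
    for y in range(y_min, y_max):
        for x in range(x_min, x_max):
            out[y][x] = _clamp(image[y][x] * factor)
    return out


def augment(image, boxes, n_copies, max_change=0.25, seed=0):
    """Generate n_copies brightness variants of one labelled image.

    Brightness does not change object positions, so each variant
    reuses the original bounding boxes as its labels.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_copies):
        factor = 1.0 + rng.uniform(-max_change, max_change)
        samples.append((adjust_brightness(image, factor), list(boxes)))
    return samples
```

For example, `augment(image, boxes, n_copies=3)` turns one labelled picture into three labelled pictures with slightly different brightness. A real pipeline would work on colour images with an image library, and geometric augmentations such as flips or rotations would also need to transform the boxes.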
Public dataset
While I will be exploring all of Roboflow and writing more blog posts on the specific options that it provides, for now I have a public dataset that you can all use to study how to train the models. There are two download options available: (i) the raw dataset, which consists of the actual labelled images; and (ii) an augmented dataset, which is an enhanced version with various preprocessing and augmentation options applied. You can download both datasets here. You can use my Google Colab notebook to check out an example of how a YOLOv5 model can be trained with these datasets. Roboflow also provides a great option by which you can fork the dataset and create your own datasets with the preprocessing and augmentation options that Roboflow provides. A screenshot on the left shows you what this looks like. Brad has also been kind enough to write a blog post with details on the dataset and how you can benefit from it.
Conclusion
This post introduces Roboflow and how you can benefit from it.
PS: If you would like to learn Artificial Intelligence with the help of Vector, I have a course available at http://robotics.thinkific.com. I would be honored to have you as a student. I am also the editor of a Medium publication, “Programming Robots”, where you will find other articles on Vector and other robots.