Background
YOLO (You Only Look Once) is a family of machine learning models that perform object detection and image classification with great speed and accuracy. YOLO is very popular in applications that require real-time object detection, such as robots that need to understand the environment around them.
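If you have never run a YOLO model before, here is a minimal sketch of off-the-shelf inference using the ultralytics Python package. The weights name and image path are just placeholders, not files from our project:

```python
# Minimal sketch: running a pretrained YOLO model on a single image.
# Assumes the ultralytics package is installed (pip install ultralytics);
# the weights file name and image path are placeholders.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")          # small pretrained model, downloaded on first use
results = model("robot_scene.jpg")  # run inference on one image

# Each result holds the detected boxes, class ids, and confidences
for box in results[0].boxes:
    cls_id = int(box.cls)
    print(model.names[cls_id], float(box.conf), box.xyxy.tolist())
```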
Our posts describing how to let a Vector robot detect another Vector robot using YOLO (via the Python SDK) have been very popular. In the past we have written comparisons of YOLOv4 vs YOLOv5 and YOLOv5 vs YOLOv6 vs YOLOv7, as well as a post on YOLOv8. This article continues the series by exploring the recently released YOLOv9.
YOLOv9
YOLOv9, the latest update to the YOLO family of models, was released on 21st February 2024. The YOLOv9 academic paper reports an accuracy improvement of roughly 2-3% over previous object detection models of similar size on the MS COCO benchmark. While 2-3% might not sound like much, it is actually a big deal, because many classes of object detection models have been stuck at the 50% accuracy level on MS COCO.
It is also important to remember that the size of a model determines the computational cost of object detection, so an apples-to-apples comparison can only be made between two models of the same size. For robotic systems, the preference is for great accuracy in the small to medium sized models. In this regard, YOLO models have made great improvements over the years: the table below shows how the mAP 50:95 score (mean Average Precision averaged over IoU thresholds from 0.50 to 0.95; higher is better) has improved from YOLOv5 to YOLOv9, while the size of the model has shrunk. The data in the table has been extracted from Table 1 in the YOLOv9 paper. YOLOv9 also stands out in terms of parameter count, which directly reduces the compute needed for training and inference.
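To make the metric concrete: mAP 50:95 is simply the mAP computed at ten IoU thresholds and then averaged, which can be written as

```latex
% mAP 50:95: average mAP over IoU thresholds 0.50 to 0.95 in steps of 0.05
\mathrm{mAP}_{50:95} = \frac{1}{10} \sum_{t \in \{0.50,\,0.55,\,\dots,\,0.95\}} \mathrm{mAP}_{t}
```

A model therefore only scores well if its boxes stay accurate even under the strict 0.95 overlap requirement, which is why gains on this metric are hard-won.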
Let’s train YOLOv9 for Anki Vector
However, just because a model is proven to achieve better mAP scores on the MS COCO dataset does not mean it will do better for your specific use case: you need to try the new model and verify for yourself that it is indeed better. Thanks to our friends at Roboflow, fine-tuning a YOLOv9 model on a custom dataset is very easy and cheap to do on a Google Colab instance. For our use case, we want to train a YOLOv9 model that can help the Vector robot detect another Vector robot. We have a notebook which you can run yourself to train the YOLOv9 model on a Google Colab instance with the Vector dataset (which we have also made publicly available on Roboflow). The Vector dataset comprises 590 training images, 61 validation images, and 30 test images, and has proven very useful for training a YOLO model to detect another Vector robot.
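For reference, the core of the workflow boils down to two steps: download the dataset from Roboflow and kick off fine-tuning. The sketch below uses the ultralytics package for the training step (the notebook itself works with the reference YOLOv9 repository, so the exact commands differ); the API key, workspace, project name, and version number are placeholders:

```python
# Sketch of the fine-tuning workflow on Colab. The API key, workspace,
# project name, and version number are placeholders -- substitute your own.
from roboflow import Roboflow
from ultralytics import YOLO

# 1. Pull the Vector dataset from Roboflow in a YOLO-compatible format
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("vector-robot")
dataset = project.version(1).download("yolov9")

# 2. Fine-tune a pretrained YOLOv9 checkpoint on the downloaded dataset
model = YOLO("yolov9c.pt")  # compact YOLOv9 configuration shipped with ultralytics
model.train(
    data=f"{dataset.location}/data.yaml",  # paths and class names written by Roboflow
    epochs=100,
    imgsz=640,
)
```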
Having fine-tuned the YOLOv9 model, we can compare it with other YOLO models that have been fine-tuned on the same dataset. The following video shows the results from each YOLO model (medium sized configurations, chosen to match the table above) on a camera feed from Vector with another Vector robot in the vicinity that the model is trying to identify. Enjoy the video.
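If you want to reproduce the comparison yourself, the sketch below shows the general shape of the inference loop: grab frames from Vector's camera via the anki_vector SDK and run them through a fine-tuned model. The weights filename is a placeholder for whatever checkpoint your training run produced:

```python
# Sketch: running a fine-tuned model on Vector's camera feed.
# Assumes the anki_vector SDK is configured; the weights file is a placeholder.
import anki_vector
from ultralytics import YOLO

model = YOLO("yolov9-vector.pt")  # hypothetical fine-tuned weights from the notebook

with anki_vector.Robot() as robot:
    robot.camera.init_camera_feed()
    while True:  # runs until interrupted
        frame = robot.camera.latest_image.raw_image  # PIL image from Vector's camera
        results = model(frame)
        for box in results[0].boxes:
            print("Detected Vector with confidence", float(box.conf))
```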
Rating the model
Once we have the results from each model, we can decide which model is best for our use case. There are many ways of evaluating the performance of a model during inference. Let’s discuss some of these:
Key Performance Indicators for Training. The following tables compare some Key Performance Indicators (KPIs) for the different YOLO models. If you wish to understand these KPIs better, a previous post explores this topic in considerable depth.
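As a quick illustration of how such KPIs can be pulled programmatically, here is a sketch using the ultralytics validation API; the weights file and data.yaml path are placeholders:

```python
# Sketch: pulling common KPIs for a fine-tuned checkpoint on the validation split.
# The weights file and data.yaml path are placeholders.
from ultralytics import YOLO

model = YOLO("yolov9-vector.pt")
metrics = model.val(data="vector/data.yaml")  # runs validation and collects metrics

print("mAP 50:95 :", metrics.box.map)    # averaged over IoU 0.50-0.95
print("mAP 50    :", metrics.box.map50)  # at IoU 0.50 only
print("precision :", metrics.box.mp)     # mean precision across classes
print("recall    :", metrics.box.mr)     # mean recall across classes
```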