YOLOv5 vs YOLOv6 vs YOLOv7
Which YOLO version will help Anki Vector the most?
In a previous article, we discussed how to evaluate and choose the best Machine Learning (ML) model tailored to your use case. Specifically, we evaluated YOLOv5 vs Scaled YOLOv4, and concluded that, for our specific task of making a Vector robot detect another Vector, Scaled YOLOv4 performed better.
New beasts in town!!!
In the last few weeks, two new versions of YOLO have popped up. We wrote about YOLOv7 and YOLOv6. This article presents an evaluation of YOLOv5 vs YOLOv6 vs YOLOv7 in the context of our specific task: helping Anki Vector detect another Vector in its camera feed.
Evaluating an ML Model
It is important to first lay down the ground rules for a thorough evaluation of a Machine Learning model. The ground rules ensure that the comparison is indeed apples-to-apples and that we are not biased in any way. We discussed the key concepts of an apples-to-apples evaluation in a previous article. In short, we need to do two important things:
i) Build a dataset that is representative of the problem we want to address, and
ii) Build an evaluation strategy around metrics or Key Performance Indicators (KPIs) that are representative of our use case.
The right KPI
Choosing the right KPI is very important. For example, if we want to do object detection on a live video feed, our main KPI is the speed of inference, measured in Frames Per Second (FPS). On the other hand, if we want to update our Machine Learning model daily (to incorporate newly annotated training data), then our KPI would be the time to train the model. By now, you will have guessed it… There is no single answer to the question of which Machine Learning model is best; it all depends on your specific requirements and how they translate into KPIs.
Now, let us get into the meat of our evaluation. Our dataset is the Anki Vector dataset, which we have made publicly available at Roboflow. The dataset comprises 590 images for training, 61 for validation, and 30 for testing. Although this is a relatively small dataset for training an ML model, we have found it super useful for evaluating our ideas. We modified the notebooks that Roboflow has open-sourced for each of the models, and have made these modified notebooks freely available in our Git repository: YOLOv5, YOLOv6, and YOLOv7. These notebooks contain all the details of how we did the evaluation, and we strongly encourage you to work through them yourself on Google Colab or AWS SageMaker Studio Lab.
Let us now deep dive into a few comparisons for different KPIs. Hopefully, this analysis gives some insights into the strengths and weaknesses of each model.
Accuracy. We trained each model for 50 epochs and measured various accuracy metrics for each model. Here are the results:
mAP stands for Mean Average Precision. mAP@0.5 is the mean average precision when a detection counts as correct only if its bounding box overlaps the human-labelled box by at least 50% (overlap measured as Intersection over Union, or IoU). mAP@0.5:0.95 averages the precision over IoU thresholds from 0.5 to 0.95. If we are looking for accurate object detection, mAP@0.5 and mAP@0.5:0.95 are great KPIs to focus on. On both of these KPIs, YOLOv7 outperforms YOLOv6 and YOLOv5. YOLOv6 trails YOLOv5 and YOLOv7 on all 4 metrics.
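To make the overlap criterion concrete, here is a minimal sketch of how IoU between two boxes is computed. The helper name and the example boxes are ours for illustration, not taken from any of the YOLO codebases:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A detection shifted by half a box width overlaps its label by only 1/3,
# so it would fail the 50% threshold used by mAP@0.5.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # → 0.333...
```

Note that a half-width shift already drops IoU to 1/3, which is why even "close" boxes can count as misses under the stricter thresholds in mAP@0.5:0.95.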
Time to Train: The time to train reflects the time taken to train the model for 50 epochs. Usually, a larger model takes more time to train; therefore, we include the number of parameters in each model to provide a good comparison point. The GPU used is a key variable here; in our case, we trained the models on an NVIDIA Tesla T4 GPU available via Google Colab.
Time to run inference: In the case of Vector, we are looking for fast object detection that can work on a live video feed. The speed of inference directly determines the Frames Per Second (FPS) at which we can support object detection. Inference times are reported for the NVIDIA Tesla T4 GPU.
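The conversion from per-frame inference time to FPS is straightforward. The numbers below are illustrative placeholders, not our measured values:

```python
def fps(ms_per_frame):
    """Frames per second achievable at a given per-frame inference time."""
    return 1000.0 / ms_per_frame

# Illustrative numbers only: a model that needs 10 ms per frame
# sustains 100 FPS; one that is 15% slower drops to roughly 87 FPS.
print(fps(10.0))         # → 100.0
print(fps(10.0 * 1.15))  # → ~86.96
```

Because FPS is the reciprocal of inference time, a 15% slowdown in inference translates directly into about a 13% drop in the frame rate the live feed can sustain.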
YOLOv7 has a similar inference time to YOLOv5, while YOLOv6 is approximately 15% slower. Since YOLOv7 also has a much higher mAP@0.5:0.95 score (see the accuracy comparison above), we think YOLOv7 is the better model for this use case.
Application to a live setting: Finally, we would like to deploy the models in a live setup whose feed differs from the dataset we used to build our models. This shows which model would perform well in practice. Here are the results comparing YOLOv7 vs YOLOv5 on the same video. We left YOLOv6 out of this study because: (i) the YOLOv6 repository did not have ready-to-use code for running inference on videos, and (ii) YOLOv6 performed worse than YOLOv5 and YOLOv7 in all our studies.
First, let's look at how YOLOv5 performs:
Now, let's look at YOLOv7.
To help you decide which is better, we also have a split frame video. Please ignore the watermark in the lower frame of the video.
We ran a short poll in the Anki Vector subreddit asking users which video they thought was better. The results were a mixed bag: voters were split almost equally between deciding that YOLOv7 was the better model and that there was no difference between YOLOv7 and YOLOv5.
Some voters thought that YOLOv5 was better. Looking through the comments, the attributed reason was that YOLOv5 identified Vector in all the frames… while YOLOv7 missed some. On the other hand, YOLOv7 drew much tighter bounding boxes around Vector… which could be why many voters favored YOLOv7. The lesson is that ML models are subjective… their efficacy is often determined by the use case and the users' perception rather than pure science. Hence, it is important to inject a feedback loop into any ML Operations pipeline. A feedback loop should collect input from users and feed it into generating labelled data for training the next ML model. If you are interested in Machine Learning Operations, you can find our course material here.
In summary, we evaluated three YOLO models: YOLOv5, YOLOv6, and YOLOv7. The objective of this post is to illustrate that, unlike what the model authors claim, there is no easy way to decide which Machine Learning model is best… it depends on the use case and the KPIs targeted.
Hope you enjoyed this post. If you wish, you can buy me a coffee. There is one last request I have for you. If you have any thoughts/ideas, please comment. And if you are not a paying subscriber, please consider enrolling as one. Paying subscribers get early access to posts such as this one, and also get access to premium posts which are not available to the free tier. You also get to influence what I write. And thanks to Substack, every post is delivered to your inbox with no ads or display banners. Please click below to subscribe.