Tutorial Series: Instance Segmentation on Vector's camera feed
Part 1: How is instance segmentation different from object detection?
In a previous tutorial, we discussed how to train a Machine Learning model that allows your Vector robot to detect another Vector robot in its camera feed. We trained a YOLOv5 model to perform object detection; in that case, the goal was to draw a rectangular bounding box around wherever the model detected the presence of another Vector robot.
Object detection vs instance segmentation
In most cases, object detection works very well on a live video stream. Models such as YOLOv8 can run detection fast enough for real-time use, comfortably reaching 60 frames per second on capable hardware. However, one limitation of object detection is that it only constructs rectangular bounding boxes around objects: the contour of the object is not available, and in some cases the rectangular box gives an inaccurate idea of where the object actually sits in the picture. This leads us to instance segmentation. Instance segmentation attempts to classify each pixel of the image, producing contours of objects rather than rectangular bounding boxes. Identifying contours of objects is super useful when objects have complex shapes, or when multiple objects overlap with each other, such as when a satellite is tracking the movement of tanks.
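To make the distinction concrete, here is a minimal sketch using the Ultralytics Python package, which ships both detection and segmentation variants of YOLOv8. The image path is a placeholder for a saved frame from Vector's camera, and this is not the exact training setup we will build later in the series:

```python
# pip install ultralytics
from ultralytics import YOLO

# Object detection: the result contains rectangular bounding boxes only.
detector = YOLO("yolov8n.pt")
det = detector("vector_frame.jpg")  # placeholder image path
for box in det[0].boxes:
    print("bounding box (x1, y1, x2, y2):", box.xyxy[0].tolist())

# Instance segmentation: the result also carries per-pixel masks,
# from which per-object contours can be extracted.
segmenter = YOLO("yolov8n-seg.pt")
seg = segmenter("vector_frame.jpg")
if seg[0].masks is not None:
    for contour in seg[0].masks.xy:  # one polygon of (x, y) points per instance
        print("contour with", len(contour), "points")
```

Notice that the only difference on the calling side is the model weights; the segmentation result simply exposes an extra `masks` attribute alongside the familiar `boxes`.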
The following images may help illustrate the difference between object detection and instance segmentation. The images are from Vector’s camera with another Vector in the field of vision (borrowing from our past theme of training a Vector robot to recognize another Vector robot). The left panel shows the result of object detection; the right panel shows the result of instance segmentation. Notice how the contour is well formed around the Vector robot and even excludes Vector’s shadow, which is included in the bounding box on the left.
As you might have guessed, this tutorial will focus on how to train a model that helps Vector identify another Vector with instance segmentation. Since we do not have access to Vector’s firmware, we will leverage the Vector Python SDK (which can run on a Raspberry Pi) and Roboflow to orchestrate the Machine Learning lifecycle of (i) labelling data, (ii) creating a dataset, (iii) training a model, and (iv) deploying it for inference. We will then extend our Vector Python SDK code to run the model for inference and draw the instance segmentation contours on Vector’s camera feed (a first sketch of that SDK plumbing appears below). Since this process is quite detailed, we will split the tutorial into subparts, each covering one step in depth.
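As a preview of the SDK side of the pipeline, here is a minimal sketch of grabbing a single frame from Vector’s camera with the Vector Python SDK (the `anki_vector` package). It assumes the SDK has already been configured with `python -m anki_vector.configure`; the saved filename is a placeholder, and the actual model inference comes in a later part:

```python
import time
import anki_vector

# Connect to Vector over Wi-Fi (assumes the SDK is already configured).
with anki_vector.Robot() as robot:
    robot.camera.init_camera_feed()      # start streaming camera images
    time.sleep(1)                        # give the feed a moment to deliver a frame
    image = robot.camera.latest_image    # most recent CameraImage from the feed
    frame = image.raw_image              # the underlying PIL.Image
    frame.save("vector_frame.jpg")       # placeholder: a frame we can feed to a model
```

In later parts we will run frames like this through the trained segmentation model and draw the resulting contours back onto the feed.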
Follow the journey
I invite you to follow my journey as I work through these tutorials. Before we embark, I want to motivate you by showing what the final output will look like. Here is an output sample…
I will also be posting incremental progress pointers on Substack Notes. To find my notes, please head to substack.com/notes or open the “Notes” tab in the Substack app (see picture below). As a subscriber to Learn With A Robot, you’ll automatically see my notes. Feel free to like, reply, or share them around!
Finally, a tutorial such as this cannot be comprehensive without your input and feedback. Please keep sending your thoughts via comments, notes, or email. I hope to bring you the latest developments from this exciting corner of computer vision.