Prerequisites for YOLO

In computer vision, YOLO is one of the best-known solutions to the object detection problem. But what is the object detection problem? I suggest you regard it as two problems: image classification and object localization.

You can think of object localization as asking a robot the following two questions:

1. “Hi robot, why do you think there is a dog (or cat) in this picture?”

2. “Could you indicate the dog’s (or cat’s) location?”

If the robot's answers match what you think yourself, you can start to trust the robot.
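To make "the robot's answer matches what you think" concrete, here is a minimal sketch. It assumes boxes are written as (x_min, y_min, x_max, y_max) and uses intersection over union (IoU), a common overlap measure that I am bringing in for illustration. The names `Detection`, `iou`, `agrees`, and the 0.5 threshold are my own assumptions, not part of YOLO itself.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Detection:
    label: str  # answer to "what is in the picture?"
    box: Box    # answer to "where is it?"


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes, in [0, 1]."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def agrees(robot: Detection, human: Detection, threshold: float = 0.5) -> bool:
    """True if the labels match and the boxes overlap by at least `threshold`."""
    return robot.label == human.label and iou(robot.box, human.box) >= threshold


# Hypothetical example: what you marked versus what the robot answered.
human = Detection("dog", (10, 10, 110, 110))
robot = Detection("dog", (20, 20, 120, 120))
print(agrees(robot, human))  # True (IoU = 8100 / 11900, about 0.68)
```

The first field covers the classification question and the second covers the localization question, so a single answer can be checked on both at once.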

If you have studied CNNs, you have already learned to tackle the image classification problem. Object localization in deep learning, however, is probably less familiar to most of us, and YOLO has to tackle it as well. To understand YOLO, you need to know the following terms:

The term backbone refers to the feature-extraction network that processes the input data into a feature representation. Popular CNN architectures are typically used as the backbone.

So, in my opinion, the backbone mainly serves image classification.

The head processes the aggregated features (the output of the backbone and neck) and produces the model's predictions.

In my opinion, the head mainly serves object localization.

The neck connects the backbone and the head, i.e., it sits between the two. It is the feature-aggregation network. In my opinion, its purpose is to extract more useful information from the backbone's output. It is auxiliary, not required, which I think is why the earlier versions of YOLO did not use it.
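The way the three parts fit together can be sketched in a few lines. This is only a toy illustration of the data flow, not a real network: `make_detector` and the three `toy_*` stages are hypothetical stand-ins I made up, and the "image" is just a small grid of numbers.

```python
from typing import Callable, Dict, List, Optional

Image = List[List[float]]   # toy image: rows of pixel values
Features = List[float]      # toy feature representation


def make_detector(
    backbone: Callable[[Image], Features],
    head: Callable[[Features], Dict[str, float]],
    neck: Optional[Callable[[Features], Features]] = None,
) -> Callable[[Image], Dict[str, float]]:
    """Chain backbone -> (optional) neck -> head into one function."""
    def detect(image: Image) -> Dict[str, float]:
        features = backbone(image)       # feature extraction
        if neck is not None:             # feature aggregation, if present
            features = neck(features)
        return head(features)            # final prediction
    return detect


def toy_backbone(image: Image) -> Features:
    """Stand-in feature extractor: one feature per image row."""
    return [sum(row) for row in image]


def toy_neck(features: Features) -> Features:
    """Stand-in aggregator: mix every feature with the global mean."""
    mean = sum(features) / len(features)
    return [f + mean for f in features]


def toy_head(features: Features) -> Dict[str, float]:
    """Stand-in predictor: report the strongest row and its score."""
    best = max(range(len(features)), key=features.__getitem__)
    return {"row": best, "score": features[best]}


image = [[0, 0, 0], [1, 2, 3], [0, 1, 0]]
without_neck = make_detector(toy_backbone, toy_head)
with_neck = make_detector(toy_backbone, toy_head, neck=toy_neck)
print(without_neck(image))       # {'row': 1, 'score': 6}
print(with_neck(image)["row"])   # 1
```

Making `neck` an optional argument mirrors the point above: the pipeline still runs without it, and adding it only changes the features that the head receives.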

So, how do we evaluate a YOLO model (or an object detection model in general)? To answer the question, let’s think about the dog/cat classification project first. In that project, we focus on classification ability, and the metric we use is accuracy, which is the most common metric in classification problems. In the object detection task, however, we should add a second metric: prediction speed. So the best model for the task is the one with higher accuracy and lower computation time.

Namely, each part of the structure above in our YOLO model should serve two goals: high accuracy and low computation time.

If you want to compare YOLO models (or other object detection models), you must look at both metrics at the same time, for example on an XY plot.
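Here is a rough sketch of what checking both metrics at once could look like in code. It assumes each model is summarized by an accuracy and a mean prediction time, and it keeps only the models that no other model beats on both axes. The names `Result`, `evaluate`, `dominates`, and `best_tradeoffs` are hypothetical, and the numbers are placeholders, not real measurements.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Result:
    name: str
    accuracy: float  # higher is better
    seconds: float   # mean prediction time per sample, lower is better


def evaluate(name: str, predict: Callable[[object], object],
             samples: Sequence[Tuple[object, object]]) -> Result:
    """Measure accuracy and mean prediction time of `predict` on labeled samples."""
    correct = 0
    start = time.perf_counter()
    for x, label in samples:
        if predict(x) == label:
            correct += 1
    elapsed = time.perf_counter() - start
    return Result(name, correct / len(samples), elapsed / len(samples))


def dominates(a: Result, b: Result) -> bool:
    """True if `a` is no worse than `b` on both metrics and better on at least one."""
    no_worse = a.accuracy >= b.accuracy and a.seconds <= b.seconds
    better = a.accuracy > b.accuracy or a.seconds < b.seconds
    return no_worse and better


def best_tradeoffs(results: Sequence[Result]) -> List[Result]:
    """Keep only the results that no other result dominates."""
    return [r for r in results
            if not any(dominates(other, r) for other in results)]


# Placeholder numbers for illustration only, not real measurements.
results = [
    Result("model_a", accuracy=0.90, seconds=0.050),
    Result("model_b", accuracy=0.85, seconds=0.010),
    Result("model_c", accuracy=0.80, seconds=0.030),
]
print([r.name for r in best_tradeoffs(results)])  # ['model_a', 'model_b']
```

In this made-up example, model_c is dropped because model_b is both more accurate and faster, while model_a and model_b both stay because each wins on one axis. That is the same picture you would read off an XY plot.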

I thought about the terms above for a long time. If you look closely, you may notice that the structure above resembles our own bodies: a backbone, a neck, and a head. That is why I like to read these terms as similes for body parts.
