TedTest

From the TED Talk by Joseph Redmon: How computers learn to recognize objects instantly

Unscramble the Blue Letters

If we speed this up by another factor of 10, this is a detector rinnnug at five frames per second. This is a lot better, but for example, if there's any significant mmovenet, I wouldn't want a stysem like this driving my car.

This is our detection system running in real time on my laptop. So it smoothly tracks me as I move around the frame, and it's robust to a wide vretiay of changes in size, pose, forward, backward. This is great. This is what we really need if we're going to build systems on top of computer vision.

(Applause)

So in just a few years, we've gone from 20 seconds per image to 20 meicllosdnis per igame, a thousand times faster. How did we get there? Well, in the past, object detection ssymtes would take an image like this and split it into a bunch of regions and then run a cflsiiesar on each of these rengios, and high scores for that classifier would be considered dencieotts in the image. But this involved running a classifier thousands of tmeis over an image, tdoasuhns of neural network evaluations to produce detection. Instead, we tiarned a slgnie network to do all of detection for us. It produces all of the bounding boxes and class probabilities simultaneously. With our system, instead of looking at an image thousands of times to produce detection, you only look once, and that's why we call it the YOLO method of object detection. So with this speed, we're not just limited to images; we can process video in real time. And now, instead of just seeing that cat and dog, we can see them move around and interact with each other.

Open Cloze

If we speed this up by another factor of 10, this is a detector _______ at five frames per second. This is a lot better, but for example, if there's any significant ________, I wouldn't want a ______ like this driving my car.

This is our detection system running in real time on my laptop. So it smoothly tracks me as I move around the frame, and it's robust to a wide _______ of changes in size, pose, forward, backward. This is great. This is what we really need if we're going to build systems on top of computer vision.

(Applause)

So in just a few years, we've gone from 20 seconds per image to 20 ____________ per _____, a thousand times faster. How did we get there? Well, in the past, object detection _______ would take an image like this and split it into a bunch of regions and then run a __________ on each of these _______, and high scores for that classifier would be considered __________ in the image. But this involved running a classifier thousands of _____ over an image, _________ of neural network evaluations to produce detection. Instead, we _______ a ______ network to do all of detection for us. It produces all of the bounding boxes and class probabilities simultaneously. With our system, instead of looking at an image thousands of times to produce detection, you only look once, and that's why we call it the YOLO method of object detection. So with this speed, we're not just limited to images; we can process video in real time. And now, instead of just seeing that cat and dog, we can see them move around and interact with each other.

Solution

system
milliseconds
thousands
systems
regions
trained
image
times
classifier
movement
running
detections
variety
single

Original Text

If we speed this up by another factor of 10, this is a detector running at five frames per second. This is a lot better, but for example, if there's any significant movement, I wouldn't want a system like this driving my car.

This is our detection system running in real time on my laptop. So it smoothly tracks me as I move around the frame, and it's robust to a wide variety of changes in size, pose, forward, backward. This is great. This is what we really need if we're going to build systems on top of computer vision.

(Applause)

So in just a few years, we've gone from 20 seconds per image to 20 milliseconds per image, a thousand times faster. How did we get there? Well, in the past, object detection systems would take an image like this and split it into a bunch of regions and then run a classifier on each of these regions, and high scores for that classifier would be considered detections in the image. But this involved running a classifier thousands of times over an image, thousands of neural network evaluations to produce detection. Instead, we trained a single network to do all of detection for us. It produces all of the bounding boxes and class probabilities simultaneously. With our system, instead of looking at an image thousands of times to produce detection, you only look once, and that's why we call it the YOLO method of object detection. So with this speed, we're not just limited to images; we can process video in real time. And now, instead of just seeing that cat and dog, we can see them move around and interact with each other.

Frequently Occurring Word Combinations

N-grams of length 2

collocation	frequency
computer vision	5
object detection	4
real time	3
neural network	2
bounding boxes	2
times faster	2
detection system	2
stop signs	2

Important Words

applause
bounding
boxes
build
bunch
call
car
cat
class
classifier
computer
considered
detection
detections
detector
dog
driving
evaluations
factor
faster
frame
frames
great
high
image
interact
involved
laptop
limited
lot
method
milliseconds
move
movement
network
neural
object
pose
probabilities
process
produce
produces
real
regions
robust
run
running
scores
seconds
significant
simultaneously
single
size
smoothly
speed
split
system
systems
thousand
thousands
time
times
top
tracks
trained
variety
video
vision
wide
years
yolo