Object Recognition with YOLO and Darkflow – Part 1

With the rise of increasingly powerful GPUs (Graphics Processing Units), many object recognition algorithms that until a few years ago were implemented with classical computer vision techniques are now built on neural networks. These networks can detect the presence and the position of one or more objects inside an image.

Darknet is a powerful open-source framework, written in C and CUDA (with optional OpenCV support), that allows building different types of neural networks and supports both CPU and GPU computation. YOLO is an object recognition system based on a CNN (Convolutional Neural Network), built with the Darknet framework and trained on the COCO dataset.

Currently, the official version of Darknet works only on Unix-like systems, so in this tutorial we are going to use Darkflow, a TensorFlow implementation of Darknet that also allows YOLO to run on Microsoft Windows systems.


Before playing with the code, you have to prepare your environment. Below is the list of software to install:

And below is the list of Python libraries to install:

You can install the specified version of the listed libraries by using the following command:

pip install <PACKAGE>==<VERSION>

Also, you need to check out this ready-to-use code repository containing Darkflow as well as an execution Python script named run.py, which allows the YOLO object recognition system to be run on images, video or in live mode. Once cloned, you have to perform the following actions:

  1. Download the YOLO neural network weights file, named yolo.weights, from this Google Drive folder.
  2. Create a new directory, named bin, inside the root folder of the repository and put the downloaded weights file inside it.
  3. To install Darkflow as a Python library, inside the root folder of the repository, run the following commands by using a command prompt:
python setup.py build_ext --inplace
pip install .

Note: this tutorial relies on GPU / CUDA support to show the power of the YOLO real-time system, so make sure 3D hardware acceleration is enabled for your graphics card. If your computer doesn’t have a suitable graphics card, or you just want to compute on the CPU, install the standard version of TensorFlow and set the gpu option value (inside the main function of the run.py file) to 0.0.
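As a rough illustration of the gpu setting mentioned in the note, the sketch below builds the kind of options dictionary that Darkflow's TFNet constructor accepts. The helper names build_options and load_network, the cfg/yolo.cfg path and the 0.1 threshold are assumptions made for illustration; only the bin/yolo.weights location and the 0.0 value for CPU-only use come from this tutorial, and the actual run.py may be organised differently.

```python
def build_options(use_gpu):
    """Return a Darkflow options dictionary (hypothetical helper, not part of run.py)."""
    return {
        "model": "cfg/yolo.cfg",     # assumed location of the network definition
        "load": "bin/yolo.weights",  # weights file placed in bin/ during setup
        "threshold": 0.1,            # assumed minimum confidence for reported detections
        # Fraction of GPU memory Darkflow may use; 0.0 means CPU-only computation
        "gpu": 1.0 if use_gpu else 0.0,
    }


def load_network(use_gpu):
    """Create the TFNet object; Darkflow is imported lazily, only when this is called."""
    from darkflow.net.build import TFNet
    return TFNet(build_options(use_gpu))
```

Calling load_network(False) would then give a CPU-only network, provided Darkflow and the weights file are installed as described above.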


After having performed a long series of tedious setups, we need some fun, so let’s start playing with YOLO! From the root folder of the repository, run the following commands by using a command prompt:

python run.py image
python run.py video
python run.py camera

The first command will execute the YOLO object recognition process on a sample image. As a result, you will be able to see something like this:

The second command will execute the YOLO object recognition process on a sample video, while the third one will execute the same process on the frames captured from your webcam, so be sure to look good before running it! 😉

If you would like to test the object recognition process on your own image or video, just pass its path as a second argument:

python run.py image <YOUR_IMAGE_PATH>
python run.py video <YOUR_VIDEO_PATH>
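The script's real argument handling is not reproduced in this tutorial, so the following is only a sketch of how the mode and the optional path could be mapped to an input source. The function name resolve_source and the default sample paths are hypothetical. Returning 0 for the camera mode matches the live-mode call described later in this tutorial.

```python
# Hypothetical default samples; the repository's real file names may differ.
DEFAULT_SOURCES = {"image": "sample/image.jpg", "video": "sample/video.mp4"}


def resolve_source(args):
    """Map command-line arguments (without the script name) to (mode, source)."""
    if not args or args[0] not in ("image", "video", "camera"):
        raise ValueError("usage: run.py image|video|camera [PATH]")
    mode = args[0]
    if mode == "camera":
        return mode, 0  # device index 0 selects the default webcam in OpenCV
    # Use the user-supplied path when given, otherwise fall back to the sample
    source = args[1] if len(args) > 1 else DEFAULT_SOURCES[mode]
    return mode, source
```

For example, resolve_source(["video", "clip.mp4"]) yields ("video", "clip.mp4"), while resolve_source(["camera"]) yields ("camera", 0).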

But how does the magic work? Let’s take a look at the run.py file.
The main function initialises the TFNet neural network, allowing us to run one of the following procedures:

  • image_prediction
  • video_prediction

The first one allows us to run the neural network prediction procedure on an image and draws coloured boxes around the detected objects:

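import cv2
import numpy as np

def image_prediction(tfnet, source):
	# Load the image from disk and convert it to the array type passed to the network
	image = cv2.imread(source)
	array = np.array(image, dtype=np.float32)
	results = tfnet.return_predict(array)
	cv2.imshow('image', tracking(image, results))
	cv2.waitKey(0)  # keep the window open until a key is pressed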

The second one allows us to run the same process for each video frame:

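def video_prediction(tfnet, source):
	cap = cv2.VideoCapture(source)
	# Process frames until the stream ends or 'q' is pressed
	while cap.isOpened():
		ret, frame = cap.read()
		if not ret:
			break
		frame = np.asarray(frame)
		results = tfnet.return_predict(frame)
		new_frame = tracking(frame, results)
		cv2.imshow('frame', new_frame)
		if cv2.waitKey(25) & 0xFF == ord('q'):
			break
	# Release the capture device and close the display window
	cap.release()
	cv2.destroyAllWindows()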

This function can also be executed in live mode, capturing the frames to analyse them directly from your webcam:

# Enable the live mode by passing 0 as a parameter 
video_prediction(tfnet, 0)

The neural network prediction function, named return_predict, detects the position of every object in an image that belongs to one of the classes the network was trained on. It returns a list of dictionaries, one per detected object, in the following JSON-like format:

[{
  "label": "bicycle",
  "confidence": 0.8486798,
  "topleft": {
    "x": 80,
    "y": 114
  },
  "bottomright": {
    "x": 554,
    "y": 466
  }
}, ...]

Thus, for any recognised object, we know its name, its position and the confidence score (between 0 and 1) of the detection… amazing!
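To make the structure of these results more concrete, here is a small sketch that post-processes a list in the format shown above without needing Darkflow or OpenCV. The function name summarise is hypothetical, and the default 0.3 cut-off simply mirrors the threshold used by the tracking function in run.py.

```python
def summarise(predictions, min_confidence=0.3):
    """Return (label, confidence, width, height, centre) for each confident detection."""
    summary = []
    for p in predictions:
        if p["confidence"] <= min_confidence:
            continue  # skip weak detections
        left, top = p["topleft"]["x"], p["topleft"]["y"]
        right, bottom = p["bottomright"]["x"], p["bottomright"]["y"]
        # Box size and centre follow directly from the two corner points
        width, height = right - left, bottom - top
        centre = ((left + right) // 2, (top + bottom) // 2)
        summary.append((p["label"], round(p["confidence"], 3), width, height, centre))
    return summary


# The sample detection shown in the format above
sample = [{
    "label": "bicycle",
    "confidence": 0.8486798,
    "topleft": {"x": 80, "y": 114},
    "bottomright": {"x": 554, "y": 466},
}]
print(summarise(sample))  # [('bicycle', 0.849, 474, 352, (317, 290))]
```

The same pattern can be used to count objects per label or to ignore boxes that are too small to be of interest.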

Finally, the tracking function draws coloured boxes around the detected objects, marking each object’s position together with its name and confidence score:

def tracking(original_img, predictions):
	# Draw on a copy so the original image is left untouched
	image = np.copy(original_img)
	for prediction in predictions:
		top_x = prediction['topleft']['x']
		top_y = prediction['topleft']['y']
		btm_x = prediction['bottomright']['x']
		btm_y = prediction['bottomright']['y']
		confidence = prediction['confidence']
		label = prediction['label'] + " " + str(round(confidence, 3))
		# Only draw detections whose confidence exceeds 0.3
		if confidence > 0.3:
			# Box in blue (OpenCV uses BGR order), label in green just above the box
			image = cv2.rectangle(image, (top_x, top_y), (btm_x, btm_y), (255,0,0), 3)
			image = cv2.putText(image, label, (top_x, top_y-5), cv2.FONT_HERSHEY_COMPLEX_SMALL, 0.8, (0, 230, 0), 1, cv2.LINE_AA)
	return image

That’s all for now!
In the next part of this tutorial, we are going to learn how to train the YOLO neural network to recognise our own object classes. Don’t miss it!

Happy coding!