
Python - Video Processing with OpenCV

Sham Shui Po, Hong Kong

See the Git repository for the source code.

Basic Video Operations

pip install opencv-python
Requirement already satisfied: opencv-python in /usr/lib/python3.10/site-packages (4.5.5.62)

Getting Video Information

import cv2

video = cv2.VideoCapture('security_cam.mp4')

width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
fps = int(video.get(cv2.CAP_PROP_FPS))

print(width, height, frames, fps)

The security camera footage has a 2560x1440 (1440p) resolution and consists of 451 frames that are displayed at a frame rate of 30 fps:

python main.py
2560 1440 451 30
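
Dividing the frame count by the frame rate is a quick sanity check for the clip length - a one-liner based on the values read above:

# approximate clip duration from frame count and frame rate
duration = frames / fps
print(f'{duration:.1f} seconds')  # 451 / 30 ≈ 15.0 seconds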

Extracting Frames

The OpenCV read() function allows you to grab a single frame from the video file in the form of a NumPy array:

print(video.read())
python main.py
(True, array([[[ 45, 45, 45],
[ 47, 47, 47],
[ 39, 39, 39],
...,
[122, 122, 122],
[126, 126, 126],
[140, 140, 140]]], dtype=uint8))

Repeated calls return consecutive frames until the last one, after which the function returns False, None. To extract all frames into a folder named images we can run a while loop:

# get the first frame and see if successful
success, frame = video.read()
# initiate count
count = 1
# keep extracting frames as long as success is True
while success:
    cv2.imwrite(f'images/{count}.jpg', frame)
    success, frame = video.read()
    count += 1
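
If you do not need every single frame, a small variation of this loop keeps only every n-th frame - a sketch, where the step value of 30 is an arbitrary choice:

# rewind in case frames were already read
video.set(cv2.CAP_PROP_POS_FRAMES, 0)
# keep roughly one frame per second at 30 fps
step = 30
success, frame = video.read()
count = 1

while success:
    if count % step == 0:
        cv2.imwrite(f'images/{count}.jpg', frame)
    success, frame = video.read()
    count += 1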

Extract Frame at Timestamp

This way I ended up with 450, mostly useless, images from my camera's alarm video. The next step is to specify a timestamp from which a frame should be extracted - e.g. one provided by an object detection algorithm:

# get frame at a specific timestamp
timestamp = '00:00:03.00'
hours, minutes, seconds = [float(i) for i in timestamp.split(':')]
# print(hours, minutes, seconds)

# get number of frames up to timestamp
trigger_frame = hours * 3600 * fps + minutes * 60 * fps + seconds * fps
print(frames, trigger_frame)

The total number of frames in the video file is represented by frames, and the number of frames up to the specified timestamp by trigger_frame - in this example it is frame number 90 we want to take a look at:

python main.py
451 90.0

Now we can select this frame and store it inside the images directory:

# go to the frame selected by the timestamp
filename = 'security_cam'  # base name for the snapshot
video.set(cv2.CAP_PROP_POS_FRAMES, trigger_frame)
success, frame = video.read()

# save the frame into an image file
if success:
    cv2.imwrite(f'images/{filename}_{hours}-{minutes}-{seconds}.jpg', frame)

security_cam_0.0-0.0-4.25.jpg
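
Alternatively, OpenCV can seek by time directly via CAP_PROP_POS_MSEC, which skips the manual frame arithmetic - a minimal sketch using the same video object:

# jump to the 3 second mark by milliseconds instead of frame number
video.set(cv2.CAP_PROP_POS_MSEC, 3000)
success, frame = video.read()

if success:
    cv2.imwrite('images/frame_at_3s.jpg', frame)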

Python - Video Editing with OpenCV

Detect Faces using OpenCV Cascades & Alarm Recordings

Still Images

import cv2

# load image file & frontal face cascade
image = cv2.imread('images/faces.jpg', cv2.IMREAD_COLOR)
face_cascade = cv2.CascadeClassifier('cascades/adaboost_frontal_face_detector.xml')

# use cascade to detect frontal faces
faces = face_cascade.detectMultiScale(image, 1.1, 4)
# print(faces)

# use returned coordinates to draw a frame
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x+w, y+h), (30,211,198), 3)

# save edited image to file
cv2.imwrite('images/faces_detected.jpg', image)

faces_detected.jpg
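
The two positional numbers passed to detectMultiScale above are scaleFactor and minNeighbors. Written out with keyword arguments - and with the optional minSize parameter, whose 40x40 value here is just a guess, to suppress tiny detections - the call looks like this:

faces = face_cascade.detectMultiScale(
    image,
    scaleFactor=1.1,   # how much the image is shrunk at each pyramid step
    minNeighbors=4,    # overlapping detections required to confirm a face
    minSize=(40, 40),  # ignore detections smaller than 40x40 pixels
)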


Videos

Using a Haar Cascade to detect faces in a video file:

import cv2

# load video file & frontal face cascade
video = cv2.VideoCapture('videos/faces_01.mp4')
face_cascade = cv2.CascadeClassifier('cascades/adaboost_frontal_face_detector.xml')

# read first frame of the video
success, frame = video.read()

# get the fps of the input video
fps = int(video.get(cv2.CAP_PROP_FPS))

# prepare empty video file
output = cv2.VideoWriter('videos/output.avi', cv2.VideoWriter_fourcc(*'DIVX'), fps, (1280, 720))

while success:
    resize = cv2.resize(frame, (1280, 720))
    # use cascade to detect frontal faces
    faces = face_cascade.detectMultiScale(resize, 1.4, 4)

    # use returned coordinates to draw a frame
    for (x, y, w, h) in faces:
        cv2.rectangle(resize, (x, y), (x+w, y+h), (30,211,198), 3)

    # write frame to the output file
    output.write(resize)

    # read next frame to continue the loop
    success, frame = video.read()

# finalize the video once the end of the file is reached
output.release()
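
The FourCC code passed to VideoWriter selects the output codec - DIVX is a good match for .avi containers. Writing an .mp4 instead might look like this (a sketch, assuming your OpenCV/FFmpeg build ships an MPEG-4 encoder):

# 'mp4v' FourCC for MPEG-4 output into an .mp4 container
output = cv2.VideoWriter('videos/output.mp4', cv2.VideoWriter_fourcc(*'mp4v'), fps, (1280, 720))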

This code detects faces and uses the detected coordinates to draw a bounding box around them. The edited frames are then stashed in a new video file.


Two issues with this code - first, the detectMultiScale function is struggling with my source video. Setting the scale factor to 1.4 and minimum neighbors to 4 still gives me false positives on background objects. I hope to improve this by converting the image to grayscale.

And secondly, I am returning the entire video. By moving the write() function into the for-loop I would get an output video that only contains frames with detected faces. But I can also simply save every frame into a jpg file instead:

import cv2

# load video file & frontal face cascade
video = cv2.VideoCapture('videos/faces_01.mp4')
face_cascade = cv2.CascadeClassifier('cascades/adaboost_frontal_face_detector.xml')

# read first frame of the video
success, frame = video.read()
count = 1

while success:
    # resize frame and convert to grayscale
    resize = cv2.resize(frame, (1280, 720))
    gray_image = cv2.cvtColor(resize, cv2.COLOR_BGR2GRAY)
    # use cascade to detect frontal faces on the grayscale frame
    faces = face_cascade.detectMultiScale(gray_image, scaleFactor=1.4, minNeighbors=4)

    # use returned coordinates to draw a bounding box on the colour frame
    for (x, y, w, h) in faces:
        cv2.rectangle(resize, (x, y), (x+w, y+h), (198,211,30), thickness=3)

    # save the annotated frame as a jpg
    cv2.imwrite(f'images/face_detection_{count}.jpg', resize)
    count += 1

    # read next frame to continue the loop
    success, frame = video.read()

This time I am still seeing a few false positives, but the detection rate in general has improved considerably. And I end up with a collection of images that I can process further.


IP Camera Streams

Now that I am able to handle local video files, I want to use an RTSP stream from my INSTAR 2k+ WQHD camera as the input source. I have already looked at how to process RTSP streams with OpenCV, so now I just have to merge those two pieces of code to get an Alarm Snapshot function that records a single JPG image whenever a face is detected:

import cv2
import os

face_cascade = cv2.CascadeClassifier('cascades/adaboost_frontal_face_detector.xml')
RTSP_URL = 'rtsp://admin:instar@192.168.2.120/livestream/12'

# use tcp instead of udp if stream is unstable
os.environ['OPENCV_FFMPEG_CAPTURE_OPTIONS'] = 'rtsp_transport;udp'
# start the stream and verify
cap = cv2.VideoCapture(RTSP_URL, cv2.CAP_FFMPEG)

if not cap.isOpened():
    print("ERROR :: Cannot open RTSP stream")
    exit(-1)


# start reading frames
success, frame = cap.read()
count = 0

while success:
    # resize frame and convert to grayscale
    resize = cv2.resize(frame, (1280, 720))
    gray_image = cv2.cvtColor(resize, cv2.COLOR_BGR2GRAY)

    # use cascade to detect frontal faces on the grayscale frame
    faces = face_cascade.detectMultiScale(gray_image, scaleFactor=1.4, minNeighbors=4)

    # use returned coordinates to draw a bounding box on the colour
    # frame and save a snapshot whenever a face was detected
    for (x, y, w, h) in faces:
        cv2.rectangle(resize, (x, y), (x+w, y+h), (198,211,30), thickness=3)
        cv2.imwrite(f'images/face_detection_{count}.jpg', resize)

    count += 1
    success, frame = cap.read()

So now I have a piece of code that systematically ignores James May, but reliably triggers whenever Richard Hammond or Jeremy Clarkson walk in - fair enough.


And with a minor change I am also able to record every frame - with a successful face detection - into a video file:

# prepare empty video file
output = cv2.VideoWriter('videos/face_detection.avi', cv2.VideoWriter_fourcc(*'DIVX'), 15, (1280, 720))


# start reading frames
success, frame = cap.read()
count = 0

while success:
    # resize frame and convert to grayscale
    resize = cv2.resize(frame, (1280, 720))
    gray_image = cv2.cvtColor(resize, cv2.COLOR_BGR2GRAY)

    # use cascade to detect frontal faces on the grayscale frame
    faces = face_cascade.detectMultiScale(gray_image, scaleFactor=1.4, minNeighbors=4)

    # use returned coordinates to draw a bounding box on the colour
    # frame and write it to the output video
    for (x, y, w, h) in faces:
        cv2.rectangle(resize, (x, y), (x+w, y+h), (198,211,30), thickness=3)
        output.write(resize)

    count += 1
    success, frame = cap.read()

# finalize the video file and release the stream
output.release()
cap.release()

Continuous Video Display with Face Detection

To monitor the live video while it is being recorded with face detection, use the following code:

import cv2
import os

face_cascade = cv2.CascadeClassifier('cascades/adaboost_frontal_face_detector.xml')
RTSP_URL = 'rtsp://admin:instar@192.168.2.120/livestream/12'

os.environ['OPENCV_FFMPEG_CAPTURE_OPTIONS'] = 'rtsp_transport;udp' # Use tcp instead of udp if stream is unstable

cap = cv2.VideoCapture(RTSP_URL, cv2.CAP_FFMPEG)

if not cap.isOpened():
    print('Cannot open RTSP stream')
    exit(-1)

frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = 15

video_codec = cv2.VideoWriter_fourcc(*'DIVX')
# video_output = cv2.VideoWriter('videos/captured_video.avi', video_codec, fps, (frame_width, frame_height))
video_output = cv2.VideoWriter('videos/captured_video.avi', video_codec, 15, (1280, 720))

while True:
    success, frame = cap.read()

    if success:
        # resize frame and convert to grayscale
        resize = cv2.resize(frame, (1280, 720))
        gray_image = cv2.cvtColor(resize, cv2.COLOR_BGR2GRAY)

        # use cascade to detect frontal faces on the grayscale frame
        faces = face_cascade.detectMultiScale(gray_image, scaleFactor=1.4, minNeighbors=4)

        # use returned coordinates to draw a bounding box on the colour frame
        for (x, y, w, h) in faces:
            cv2.rectangle(resize, (x, y), (x+w, y+h), (198,211,30), thickness=3)

        cv2.imshow("Video Recording", resize)
        video_output.write(resize)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            cap.release()
            video_output.release()
            cv2.destroyAllWindows()
            print('INFO :: Video was saved.')
            break

    else:
        cap.release()
        video_output.release()
        cv2.destroyAllWindows()
        print('ERROR :: Video recording aborted!')
        break