Hi Guys,
Today, I’ll be sharing another exciting installment of Computer Vision. The application will read real-time human hand gestures to control the WebCAM’s zoom-in & zoom-out capability.
Why don’t we see the demo first before jumping into the technical details?
Architecture:
Let us understand the architecture –

As one can see, the application reads individual frames from the WebCAM & then maps the human hand gestures with MediaPipe. Finally, it calculates the distance between specific landmark points projected on the human hand.
Let’s take another depiction of the experiment to better understand the above statement.

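Before diving into the scripts, here is a minimal, self-contained sketch of that per-frame flow (my own illustrative outline, assuming the default WebCAM at index 0; the real application splits this work across the classes discussed below) –

# Illustrative sketch only – read a frame, find the hand, measure the fingertip distance,
# and map it onto a zoom scale. The actual scripts below do the same thing in a structured way.
import cv2, math
import numpy as np
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=1)
cap = cv2.VideoCapture(0)

while True:
    ok, frame = cap.read()
    if not ok:
        break

    res = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if res.multi_hand_landmarks:
        lm = res.multi_hand_landmarks[0].landmark
        h, w, _ = frame.shape

        # Thumb tip (id 4) & index fingertip (id 8) in pixel coordinates
        x1, y1 = int(lm[4].x*w), int(lm[4].y*h)
        x2, y2 = int(lm[8].x*w), int(lm[8].y*h)

        dist = math.hypot(x2-x1, y2-y1)
        zoomScale = np.interp(dist, [50, 270], [0.01, 1])
        print('Distance:', dist, '-> zoom scale:', zoomScale)

    cv2.imshow('Sketch', frame)
    if cv2.waitKey(1) == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()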
Python Packages:
Following are the Python packages that are necessary to develop this brilliant use case –
pip install mediapipe
pip install opencv-python
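Once installed, a quick sanity check like the following (my own suggestion, not part of the project code) should print both versions without any error –

# Quick sanity check (suggestion only): both imports must succeed before the scripts can run
import cv2
import mediapipe as mp

print('OpenCV version:', cv2.__version__)
print('MediaPipe version:', mp.__version__)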
CODE:
Let us now understand the code. For this use case, we will discuss only three Python scripts in detail. The application needs a few more than these three; however, we have already discussed those in some of the earlier posts, so we will skip them here.
1. clsConfig.py (Configuration script for the application.)
################################################
#### Written By: SATYAKI DE                 ####
#### Written On: 15-May-2020                ####
#### Modified On: 24-May-2022               ####
####                                        ####
#### Objective: This script is a config     ####
#### file, contains all the keys for        ####
#### Machine-Learning & streaming dashboard.####
####                                        ####
################################################

import os
import platform as pl

class clsConfig(object):
    Curr_Path = os.path.dirname(os.path.realpath(__file__))

    os_det = pl.system()
    if os_det == "Windows":
        sep = '\\'
    else:
        sep = '/'

    conf = {
        'APP_ID': 1,
        'ARCH_DIR': Curr_Path + sep + 'arch' + sep,
        'PROFILE_PATH': Curr_Path + sep + 'profile' + sep,
        'LOG_PATH': Curr_Path + sep + 'log' + sep,
        'REPORT_PATH': Curr_Path + sep + 'report',
        'SRC_PATH': Curr_Path + sep + 'data' + sep,
        'FINAL_PATH': Curr_Path + sep + 'Target' + sep,
        'APP_DESC_1': 'Hand Gesture Zoom Control!',
        'DEBUG_IND': 'N',
        'INIT_PATH': Curr_Path,
        'SUBDIR': 'data',
        'SEP': sep,
        'TITLE': "Human Hand Gesture Controlling App",
        'minVal': 0.01,
        'maxVal': 1
    }
2. clsVideoZoom.py (This script will zoom the video stream depending upon the hand gestures.)
##################################################
#### Written By: SATYAKI DE                   ####
#### Written On: 23-May-2022                  ####
#### Modified On 24-May-2022                  ####
####                                          ####
#### Objective: This python class reads the   ####
#### real-time human hand gesture from the    ####
#### Web-CAM video feed (with the help of     ####
#### clsHandMotionScanner) & controls the     ####
#### zoom-in & zoom-out accordingly.          ####
##################################################

import mediapipe as mp
import cv2
import time
import clsHandMotionScanner as hms
import math
import imutils
import numpy as np

from clsConfig import clsConfig as cf

class clsVideoZoom():
    def __init__(self):
        self.title = str(cf.conf['TITLE'])
        self.minVal = float(cf.conf['minVal'])
        self.maxVal = int(cf.conf['maxVal'])

    def zoomVideo(self, image, Iscale=1):
        try:
            scale = Iscale

            # Get the webcam size
            height, width, channels = image.shape

            # Prepare the crop
            centerX, centerY = int(height/2), int(width/2)
            radiusX, radiusY = int(scale*centerX), int(scale*centerY)

            minX, maxX = centerX-radiusX, centerX+radiusX
            minY, maxY = centerY-radiusY, centerY+radiusY

            cropped = image[minX:maxX, minY:maxY]
            resized_cropped = cv2.resize(cropped, (width, height))

            return resized_cropped
        except Exception as e:
            x = str(e)

            return image

    def runSensor(self):
        try:
            pTime = 0
            cTime = 0
            zRange = 0
            zRangeBar = 0

            cap = cv2.VideoCapture(0)
            detector = hms.clsHandMotionScanner(detectionCon=0.7)

            while True:
                success, img = cap.read()
                img = imutils.resize(img, width=720)
                #img = detector.findHands(img, draw=False)
                #lmList = detector.findPosition(img, draw=False)

                img = detector.findHands(img)
                lmList = detector.findPosition(img, draw=False)

                if len(lmList) != 0:
                    print('*'*60)
                    #print(lmList[4], lmList[8])
                    #print('*'*60)

                    x1, y1 = lmList[4][1], lmList[4][2]
                    x2, y2 = lmList[8][1], lmList[8][2]
                    cx, cy = (x1+x2)//2, (y1+y2)//2

                    cv2.circle(img, (x1,y1), 15, (255,0,255), cv2.FILLED)
                    cv2.circle(img, (x2,y2), 15, (255,0,255), cv2.FILLED)
                    cv2.line(img, (x1,y1), (x2,y2), (255,0,255), 3)
                    cv2.circle(img, (cx,cy), 15, (255,0,255), cv2.FILLED)

                    lenVal = math.hypot(x2-x1, y2-y1)
                    print('Length:', str(lenVal))
                    print('*'*60)

                    # Hand Range is from 50 to 270
                    # Camera Zoom Range is 0.01, 1
                    minVal = self.minVal
                    maxVal = self.maxVal

                    zRange = np.interp(lenVal, [50, 270], [minVal, maxVal])
                    zRangeBar = np.interp(lenVal, [50, 270], [400, 150])

                    print('Range: ', str(zRange))

                    if lenVal < 50:
                        cv2.circle(img, (cx,cy), 15, (0,255,0), cv2.FILLED)

                cv2.rectangle(img, (50, 150), (85, 400), (255,0,0), 3)
                cv2.rectangle(img, (50, int(zRangeBar)), (85, 400), (255,0,0), cv2.FILLED)

                cTime = time.time()
                fps = 1/(cTime-pTime)
                pTime = cTime

                image = cv2.flip(img, flipCode=1)
                cv2.putText(image, str(int(fps)), (10, 70), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 255), 3)
                cv2.imshow("Original Source", image)

                # Creating the new zoom video
                cropImg = self.zoomVideo(img, zRange)
                cv2.putText(cropImg, str(int(fps)), (10, 70), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 255), 3)
                cv2.imshow("Zoomed Source", cropImg)

                if cv2.waitKey(1) == ord('q'):
                    break

            cap.release()
            cv2.destroyAllWindows()

            return 0
        except Exception as e:
            x = str(e)
            print('Error:', x)

            return 1
Key snippets from the above script –
def zoomVideo(self, image, Iscale=1):
    try:
        scale = Iscale

        # Get the webcam size
        height, width, channels = image.shape

        # Prepare the crop
        centerX, centerY = int(height/2), int(width/2)
        radiusX, radiusY = int(scale*centerX), int(scale*centerY)

        minX, maxX = centerX-radiusX, centerX+radiusX
        minY, maxY = centerY-radiusY, centerY+radiusY

        cropped = image[minX:maxX, minY:maxY]
        resized_cropped = cv2.resize(cropped, (width, height))

        return resized_cropped
    except Exception as e:
        x = str(e)
        return image
The above method will zoom in & zoom out depending upon the scale value derived from the human hand gesture.
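To make the crop arithmetic concrete, here is a small worked example (the frame size & scale are assumed purely for illustration) –

# Worked example with an assumed 720 x 1280 frame & a scale of 0.5
height, width = 720, 1280
scale = 0.5

centerX, centerY = int(height/2), int(width/2)               # 360, 640
radiusX, radiusY = int(scale*centerX), int(scale*centerY)    # 180, 320

print(centerX-radiusX, centerX+radiusX)   # 180 540 -> rows kept
print(centerY-radiusY, centerY+radiusY)   # 320 960 -> columns kept
# Only the central half of the frame survives the crop & is resized back to 1280 x 720,
# which is what produces the zoom-in effect; a scale close to 1 keeps almost the whole frame.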
cap = cv2.VideoCapture(0)
detector = hms.clsHandMotionScanner(detectionCon=0.7)
The above lines read the individual frames from the WebCAM & instantiate a customized open-source class, which will find the hand’s landmark positions.
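One small, optional guard (my addition, not part of the original script): if device index 0 does not point to your WebCAM, cv2.VideoCapture fails quietly & cap.read() keeps returning empty frames, so it is worth checking –

# Optional guard (assumption: the WebCAM sits at device index 0)
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    raise RuntimeError('Unable to open the WebCAM; try a different device index.')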
img = detector.findHands(img)
lmList = detector.findPosition(img, draw=False)
These calls capture the hand’s landmark positions as they move from frame to frame.
x1, y1 = lmList[4][1], lmList[4][2]
x2, y2 = lmList[8][1], lmList[8][2]
cx, cy = (x1+x2)//2, (y1+y2)//2

cv2.circle(img, (x1,y1), 15, (255,0,255), cv2.FILLED)
cv2.circle(img, (x2,y2), 15, (255,0,255), cv2.FILLED)
To understand the above lines, let’s look into the following diagram –

As one can see, the thumb tip’s landmark ID is 4 & the index fingertip’s is 8. The application will mark these points with a solid circle.
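If you prefer named constants over the magic numbers 4 & 8, MediaPipe exposes the same IDs through its HandLandmark enum –

import mediapipe as mp

print(int(mp.solutions.hands.HandLandmark.THUMB_TIP))          # 4
print(int(mp.solutions.hands.HandLandmark.INDEX_FINGER_TIP))   # 8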
lenVal = math.hypot(x2-x1, y2-y1)
The above line calculates the distance between the thumb tip & the index fingertip.
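A quick worked example with assumed coordinates –

import math

x1, y1 = 120, 200   # thumb tip (assumed)
x2, y2 = 240, 110   # index fingertip (assumed)

print(math.hypot(x2-x1, y2-y1))   # 150.0 - the Euclidean distance in pixels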
# Camera Zoom Range is 0.01, 1
minVal = self.minVal
maxVal = self.maxVal

zRange = np.interp(lenVal, [50, 270], [minVal, maxVal])
zRangeBar = np.interp(lenVal, [50, 270], [400, 150])
In the above lines, the application takes the distance captured between the two fingertips (roughly 50 to 270 pixels) & maps it linearly onto a more meaningful camera zoom range from 0.01 to 1.
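np.interp performs a simple linear mapping between the two ranges (and clamps anything outside them), for example –

import numpy as np

print(np.interp(50,  [50, 270], [0.01, 1]))   # 0.01  - fingers almost touching
print(np.interp(160, [50, 270], [0.01, 1]))   # 0.505 - halfway
print(np.interp(270, [50, 270], [0.01, 1]))   # 1.0   - fingers fully stretched
print(np.interp(300, [50, 270], [0.01, 1]))   # 1.0   - values beyond the range are clamped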
if lenVal < 50:
    cv2.circle(img, (cx,cy), 15, (0,255,0), cv2.FILLED)
Any fingertip distance below 50 stays at the minimum zoom of 0.01, i.e., the WebCAM’s start value, & the application turns the centre point green to indicate that the lower limit has been reached.
cTime = time.time()
fps = 1/(cTime-pTime)
pTime = cTime

image = cv2.flip(img, flipCode=1)
cv2.putText(image, str(int(fps)), (10, 70), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 255), 3)
cv2.imshow("Original Source", image)

# Creating the new zoom video
cropImg = self.zoomVideo(img, zRange)
cv2.putText(cropImg, str(int(fps)), (10, 70), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 255), 3)
cv2.imshow("Zoomed Source", cropImg)
The application captures the frame rate & displays two windows: the original video frame and the zoomed frame, which zooms in or out depending on the hand gesture.
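As a side note, the FPS shown on both windows is the instantaneous rate, i.e., the reciprocal of the time between two consecutive frames, for example –

# Illustration only: if a frame takes ~40 ms end-to-end, the displayed FPS is about 25
frameTime = 0.04            # seconds between two consecutive frames (assumed)
print(int(1/frameTime))     # 25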
3. clsHandMotionScanner.py (This is an enhanced version of an open-source script, which will capture the hand position.)
##################################################
#### Written By: SATYAKI DE                   ####
#### Modified On 23-May-2022                  ####
####                                          ####
#### Objective: This is the main calling      ####
#### python class that will capture the       ####
#### human hand gesture on real-time basis    ####
#### and that will enable the video zoom      ####
#### capability of the feed directly coming   ####
#### out of a Web-CAM.                        ####
##################################################

import mediapipe as mp
import cv2
import time

class clsHandMotionScanner():
    def __init__(self, mode=False, maxHands=2, detectionCon=0.5, modelComplexity=1, trackCon=0.5):
        self.mode = mode
        self.maxHands = maxHands
        self.detectionCon = detectionCon
        self.modelComplex = modelComplexity
        self.trackCon = trackCon
        self.mpHands = mp.solutions.hands
        self.hands = self.mpHands.Hands(self.mode, self.maxHands, self.modelComplex, self.detectionCon, self.trackCon)

        # It draws small dots on the hands - 21 landmark points in total
        self.mpDraw = mp.solutions.drawing_utils

    def findHands(self, img, draw=True):
        try:
            # Send RGB image to hands
            imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            self.results = self.hands.process(imgRGB)

            # Process the frame
            if self.results.multi_hand_landmarks:
                for handLms in self.results.multi_hand_landmarks:
                    if draw:
                        # Draw dots and connect them
                        self.mpDraw.draw_landmarks(img, handLms, self.mpHands.HAND_CONNECTIONS)

            return img
        except Exception as e:
            x = str(e)
            print('Error: ', x)

            return img

    def findPosition(self, img, handNo=0, draw=True):
        try:
            lmlist = []

            # Check whether any landmark was detected
            if self.results.multi_hand_landmarks:
                # Which hand are we talking about
                myHand = self.results.multi_hand_landmarks[handNo]

                # Get id number and landmark information
                for id, lm in enumerate(myHand.landmark):
                    # id gives the landmark's exact index number
                    # height, width and channel
                    h, w, c = img.shape

                    # Find the position - center
                    cx, cy = int(lm.x*w), int(lm.y*h)
                    #print(id,cx,cy)

                    lmlist.append([id, cx, cy])

                    # Draw a circle for each landmark
                    if draw:
                        cv2.circle(img, (cx,cy), 15, (255,0,255), cv2.FILLED)

            return lmlist
        except Exception as e:
            x = str(e)
            print('Error: ', x)

            lmlist = []
            return lmlist
Key snippets from the above script –
def findHands(self, img, draw=True):
    try:
        # Send RGB image to hands
        imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        self.results = self.hands.process(imgRGB)

        # Process the frame
        if self.results.multi_hand_landmarks:
            for handLms in self.results.multi_hand_landmarks:
                if draw:
                    # Draw dots and connect them
                    self.mpDraw.draw_landmarks(img, handLms, self.mpHands.HAND_CONNECTIONS)

        return img
    except Exception as e:
        x = str(e)
        print('Error: ', x)

        return img
The above function identifies the individual key points & marks them as dots on top of the human hand.
def findPosition(self, img, handNo=0, draw=True):
    try:
        lmlist = []

        # Check whether any landmark was detected
        if self.results.multi_hand_landmarks:
            # Which hand are we talking about
            myHand = self.results.multi_hand_landmarks[handNo]

            # Get id number and landmark information
            for id, lm in enumerate(myHand.landmark):
                # id gives the landmark's exact index number
                # height, width and channel
                h, w, c = img.shape

                # Find the position - center
                cx, cy = int(lm.x*w), int(lm.y*h)
                lmlist.append([id, cx, cy])

                # Draw a circle for each landmark
                if draw:
                    cv2.circle(img, (cx,cy), 15, (255,0,255), cv2.FILLED)

        return lmlist
    except Exception as e:
        x = str(e)
        print('Error: ', x)

        lmlist = []
        return lmlist
The above lines capture the position of each MediaPipe landmark along with its x & y coordinates & store them in a list, which is later parsed by the main use case.
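For a detected hand, the returned list holds 21 entries of the form [id, cx, cy], one per landmark in ID order, which is exactly why lmList[4] & lmList[8] in clsVideoZoom.py pick up the thumb tip & the index fingertip. A tiny illustrative example (coordinates assumed) –

# Three illustrative entries out of the 21 that findPosition returns for one hand
lmList = [[0, 355, 410],   # wrist
          [4, 120, 200],   # thumb tip
          [8, 240, 110]]   # index fingertip

for lmId, cx, cy in lmList:
    print('Landmark', lmId, '->', cx, cy)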
4. viewHandMotion.py (Main calling script.)
##################################################
#### Written By: SATYAKI DE                   ####
#### Written On: 23-May-2022                  ####
#### Modified On 23-May-2022                  ####
####                                          ####
#### Objective: This is the main calling      ####
#### python script that will invoke the       ####
#### clsVideoZoom class to initiate           ####
#### the model to read the real-time          ####
#### hand movements gesture that enables      ####
#### video zoom control.                      ####
##################################################

import time
import clsVideoZoom as vz
from clsConfig import clsConfig as cf

import datetime
import logging

###############################################
###           Global Section                ###
###############################################

# Instantiating the base class
x1 = vz.clsVideoZoom()

###############################################
###       End of Global Section             ###
###############################################

def main():
    try:
        # Other useful variables
        debugInd = 'Y'
        var = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        var1 = datetime.datetime.now()

        print('Start Time: ', str(var))
        # End of useful variables

        # Initiating Log Class
        general_log_path = str(cf.conf['LOG_PATH'])

        # Enabling Logging Info
        logging.basicConfig(filename=general_log_path + 'visualZoom.log', level=logging.INFO)

        print('Started Visual-Zoom Emotions!')

        r1 = x1.runSensor()

        if (r1 == 0):
            print('Successfully identified visual zoom!')
        else:
            print('Failed to identify the visual zoom!')

        var2 = datetime.datetime.now()

        c = var2 - var1
        minutes = c.total_seconds() / 60
        print('Total difference in minutes: ', str(minutes))

        print('End Time: ', str(var1))

    except Exception as e:
        x = str(e)
        print('Error: ', x)

if __name__ == "__main__":
    main()
The above lines are self-explanatory, so I’m not going to discuss anything further about this script.
FOLDER STRUCTURE:
Here is the folder structure that contains all the files & directories on macOS –

So, we’ve done it.
You will get the complete codebase from the following GitHub link.
I’ll bring some more exciting topics from the Python verse in the coming days. Please share & subscribe to my post & let me know your feedback.
Till then, Happy Avenging! 🙂
Note: All the data & scenarios posted here are representational, available over the internet & intended for educational purposes only. Some of the images (except my photo) that we’ve used are available over the net. We don’t claim ownership of these images. There is always room for improvement, especially in the prediction quality.