Real-time reading from a video stream using Computer Vision

This week, we're going to extend one of our earlier posts & try to read the entire text from a video stream using computer vision. If you want to view the previous post, please click the following link.

But, before we proceed, why don’t we view the demo first?

Demo

Architecture:

Let us understand the architecture flow –

Architecture flow

The above diagram shows that the application, which uses OpenCV, analyzes individual frames from the source, extracts the complete text within the video, displays it on top of the target screen & prints the same in the console.

Python Packages:

pip install imutils==0.5.4
pip install matplotlib==3.5.2
pip install numpy==1.21.6
pip install opencv-contrib-python==4.6.0.66
pip install opencv-contrib-python-headless==4.6.0.66
pip install opencv-python==4.6.0.66
pip install opencv-python-headless==4.6.0.66
pip install pandas==1.3.5
pip install Pillow==9.1.1
pip install pytesseract==0.3.9
pip install python-dateutil==2.8.2

CODE:

Let us now understand the code. For this use case, we will discuss only the key Python scripts. The application needs a few more, but we have already covered them in earlier posts, so we will skip them here.

  • clsReadingTextFromStream.py (This is the main Python class that will extract the text from the WebCAM stream in real-time.)


##################################################
#### Written By: SATYAKI DE ####
#### Written On: 22-Jul-2022 ####
#### Modified On 25-Jul-2022 ####
#### ####
#### Objective: This is the main class of ####
#### python script that will invoke the ####
#### extraction of texts from a WebCAM. ####
#### ####
##################################################
# Importing necessary packages
from clsConfig import clsConfig as cf
from imutils.object_detection import non_max_suppression
import numpy as np
import pytesseract
import imutils
import time
import cv2
###############################################
### Global Section ###
###############################################
# Two output layer names for the text detector model
lNames = cf.conf['LAYER_DET']
# Tesseract OCR text param values
strVal = "-l " + str(cf.conf['LANG']) + " --oem " + str(cf.conf['OEM_VAL']) + " --psm " + str(cf.conf['PSM_VAL']) + ""
config = (strVal)
###############################################
### End of Global Section ###
###############################################
class clsReadingTextFromStream:
    def __init__(self):
        self.sep = str(cf.conf['SEP'])
        self.Curr_Path = str(cf.conf['INIT_PATH'])
        self.CacheL = int(cf.conf['CACHE_LIM'])
        self.modelPath = str(cf.conf['MODEL_PATH']) + str(cf.conf['MODEL_FILE_NAME'])
        self.minConf = float(cf.conf['MIN_CONFIDENCE'])
        self.wt = int(cf.conf['WIDTH'])
        self.ht = int(cf.conf['HEIGHT'])
        self.pad = float(cf.conf['PADDING'])
        self.title = str(cf.conf['TITLE'])
        self.Otitle = str(cf.conf['ORIG_TITLE'])
        self.drawTag = cf.conf['DRAW_TAG']
        self.aRange = int(cf.conf['ASCII_RANGE'])
        self.sParam = cf.conf['SUBTRACT_PARAM']

    def findBoundBox(self, boxes, res, rW, rH, orig, origW, origH, pad):
        try:
            # Loop over the bounding boxes
            for (spX, spY, epX, epY) in boxes:
                # Scale the bounding box coordinates based on the respective
                # ratios
                spX = int(spX * rW)
                spY = int(spY * rH)
                epX = int(epX * rW)
                epY = int(epY * rH)

                # To obtain a better OCR of the text we can potentially
                # apply a bit of padding surrounding the bounding box.
                # And, computing the deltas in both the x and y directions
                dX = int((epX - spX) * pad)
                dY = int((epY - spY) * pad)

                # Apply padding to each side of the bounding box, respectively
                spX = max(0, spX - dX)
                spY = max(0, spY - dY)
                epX = min(origW, epX + (dX * 2))
                epY = min(origH, epY + (dY * 2))

                # Extract the actual padded ROI
                roi = orig[spY:epY, spX:epX]

                # Choose the proper OCR Config
                text = pytesseract.image_to_string(roi, config=config)

                # Add the bounding box coordinates and OCR'd text to the list
                # of results
                res.append(((spX, spY, epX, epY), text))

            # Sort the results bounding box coordinates from top to bottom
            res = sorted(res, key=lambda r: r[0][1])

            return res
        except Exception as e:
            x = str(e)
            print(x)

            return res

    def predictText(self, imgScore, imgGeo):
        try:
            minConf = self.minConf

            # Initializing the bounding box rectangles & confidence score by
            # extracting the rows & columns from the imgScore volume.
            (numRows, numCols) = imgScore.shape[2:4]
            rects = []
            confScore = []

            for y in range(0, numRows):
                # Extract the imgScore probabilities to derive potential
                # bounding box coordinates that surround text
                imgScoreData = imgScore[0, 0, y]
                xVal0 = imgGeo[0, 0, y]
                xVal1 = imgGeo[0, 1, y]
                xVal2 = imgGeo[0, 2, y]
                xVal3 = imgGeo[0, 3, y]
                anglesData = imgGeo[0, 4, y]

                for x in range(0, numCols):
                    # If our score does not have sufficient probability,
                    # ignore it
                    if imgScoreData[x] < minConf:
                        continue

                    # Compute the offset factor as our resulting feature
                    # maps will be 4x smaller than the input frame
                    (offX, offY) = (x * 4.0, y * 4.0)

                    # Extract the rotation angle for the prediction and
                    # then compute the sin and cosine
                    angle = anglesData[x]
                    cos = np.cos(angle)
                    sin = np.sin(angle)

                    # Derive the width and height of the bounding box from
                    # imgGeo
                    h = xVal0[x] + xVal2[x]
                    w = xVal1[x] + xVal3[x]

                    # Compute both the starting and ending (x, y)-coordinates
                    # for the text prediction bounding box
                    epX = int(offX + (cos * xVal1[x]) + (sin * xVal2[x]))
                    epY = int(offY - (sin * xVal1[x]) + (cos * xVal2[x]))
                    spX = int(epX - w)
                    spY = int(epY - h)

                    # Adding bounding box coordinates and probability score
                    # to the respective lists
                    rects.append((spX, spY, epX, epY))
                    confScore.append(imgScoreData[x])

            # return a tuple of the bounding boxes and associated confScore
            return (rects, confScore)
        except Exception as e:
            x = str(e)
            print(x)

            rects = []
            confScore = []

            return (rects, confScore)

    def processStream(self, debugInd, var):
        try:
            sep = self.sep
            Curr_Path = self.Curr_Path
            CacheL = self.CacheL
            modelPath = self.modelPath
            minConf = self.minConf
            wt = self.wt
            ht = self.ht
            pad = self.pad
            title = self.title
            Otitle = self.Otitle
            drawTag = self.drawTag
            aRange = self.aRange
            sParam = self.sParam

            val = 0

            # Initialize the video stream and allow the camera sensor to warm up
            print("[INFO] Starting video stream...")
            cap = cv2.VideoCapture(0)

            # Loading the pre-trained text detector
            print("[INFO] Loading Text Detector...")
            net = cv2.dnn.readNet(modelPath)

            # Loop over the frames from the video stream
            while True:
                try:
                    # Grab the frame from our video stream and resize it
                    success, frame = cap.read()

                    orig = frame.copy()
                    (origH, origW) = frame.shape[:2]

                    # Setting new width and height and then determine the ratio in change
                    # for both the width and height
                    (newW, newH) = (wt, ht)

                    rW = origW / float(newW)
                    rH = origH / float(newH)

                    # Resize the frame and grab the new frame dimensions
                    frame = cv2.resize(frame, (newW, newH))
                    (H, W) = frame.shape[:2]

                    # Construct a blob from the frame and then perform a forward pass of
                    # the model to obtain the two output layer sets
                    blob = cv2.dnn.blobFromImage(frame, 1.0, (W, H), sParam, swapRB=True, crop=False)
                    net.setInput(blob)
                    (confScore, imgGeo) = net.forward(lNames)

                    # Decode the predictions, then apply non-maxima suppression to
                    # suppress weak, overlapping bounding boxes
                    (rects, confidences) = self.predictText(confScore, imgGeo)
                    boxes = non_max_suppression(np.array(rects), probs=confidences)

                    # Initialize the list of results
                    res = []

                    # Getting BoundingBox boundaries
                    res = self.findBoundBox(boxes, res, rW, rH, orig, origW, origH, pad)

                    for ((spX, spY, epX, epY), text) in res:
                        # Display the text OCR by using Tesseract APIs
                        print("Reading Text::")
                        print("=" * 60)
                        print(text)
                        print("=" * 60)

                        # Removing the non-ASCII text so it can draw the text on the frame
                        # using OpenCV, then draw the text and a bounding box surrounding
                        # the text region of the input frame
                        text = "".join([c if ord(c) < aRange else "" for c in text]).strip()
                        output = orig.copy()

                        cv2.rectangle(output, (spX, spY), (epX, epY), drawTag, 2)
                        cv2.putText(output, text, (spX, spY - 20), cv2.FONT_HERSHEY_SIMPLEX, 1.2, drawTag, 3)

                        # Show the output frame
                        cv2.imshow(title, output)
                        #cv2.imshow(Otitle, frame)

                    # If the `q` key was pressed, break from the loop
                    if cv2.waitKey(1) == ord('q'):
                        break

                    val = 0
                except Exception as e:
                    x = str(e)
                    print(x)

                    val = 1

            # Performing cleanup at the end
            cap.release()
            cv2.destroyAllWindows()

            return val
        except Exception as e:
            x = str(e)
            print('Error:', x)

            return 1

Please find the key snippet from the above script –

# Two output layer names for the text detector model

lNames = cf.conf['LAYER_DET']

# Tesseract OCR text param values

strVal = "-l " + str(cf.conf['LANG']) + " --oem " + str(cf.conf['OEM_VAL']) + " --psm " + str(cf.conf['PSM_VAL']) + ""
config = (strVal)

The first line contains the two output layers' names for the text detector model. The first layer indicates the probability that a region contains text, & the second one is used to derive the bounding box coordinates of the predicted text.

The second line contains various options for the Tesseract APIs. You need to understand these options in detail to make them work. These are the essential options for our use case –

  • Language – The intended language, for example, English, Spanish, Hindi, Bengali, etc.
  • OEM flag – The OCR engine mode; here the application selects the LSTM neural net model for OCR.
  • PSM value – The page segmentation mode; in this case, the selected value is 7, indicating that the application treats the ROI as a single line of text.

For more details, please refer to the config file.
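
Since the configuration file for this post is not reproduced here, below is a minimal sketch of how the detector & Tesseract keys referenced above might look inside clsConfig.py. The key names come from the script itself; the values (layer names, OEM/PSM numbers, frame size) are my assumptions & should be taken from the actual config file.

# Hypothetical excerpt from clsConfig.py - illustrative values only
conf = {
    # Output layers of the text detector (standard EAST layer names)
    'LAYER_DET': ["feature_fusion/Conv_7/Sigmoid", "feature_fusion/concat_3"],
    'LANG': 'eng',           # Tesseract language pack
    'OEM_VAL': 1,            # OCR engine mode (1 = LSTM neural net)
    'PSM_VAL': 7,            # Page segmentation mode (7 = single text line)
    'MIN_CONFIDENCE': 0.5,   # Minimum score to accept a detection
    'WIDTH': 320,            # Detector input width (a multiple of 32)
    'HEIGHT': 320,           # Detector input height (a multiple of 32)
    'PADDING': 0.05          # Extra padding around each detected box
}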

print("[INFO] Loading Text Detector...")
net = cv2.dnn.readNet(modelPath)

The above lines load the pre-trained text-detector model into memory for evaluation.

# Setting new width and height and then determine the ratio in change
# for both the width and height
(newW, newH) = (wt, ht)
rW = origW / float(newW)
rH = origH / float(newH)

# Resize the frame and grab the new frame dimensions
frame = cv2.resize(frame, (newW, newH))
(H, W) = frame.shape[:2]

# Construct a blob from the frame and then perform a forward pass of
# the model to obtain the two output layer sets
blob = cv2.dnn.blobFromImage(frame, 1.0, (W, H), sParam, swapRB=True, crop=False)
net.setInput(blob)
(confScore, imgGeo) = net.forward(lNames)

# Decode the predictions, then apply non-maxima suppression to
# suppress weak, overlapping bounding boxes
(rects, confidences) = self.predictText(confScore, imgGeo)
boxes = non_max_suppression(np.array(rects), probs=confidences)

The above lines prepare each frame: the application resizes the frame to the detector's expected width & height, then performs a forward pass of the model to obtain the two output layer sets. It then decodes the predictions & applies non-maxima suppression to remove weak, overlapping bounding boxes. In short, this identifies the potential text regions & puts a bounding box around each of them.
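
To see what non-maxima suppression does in isolation, here is a small, self-contained sketch with made-up box coordinates, using the same imutils helper as the script above:

import numpy as np
from imutils.object_detection import non_max_suppression

# Three heavily overlapping candidates & one separate box (dummy values)
rects = np.array([(10, 10, 110, 50), (12, 12, 112, 52), (15, 8, 115, 48), (200, 30, 300, 70)])
probs = [0.90, 0.80, 0.70, 0.95]

# Keeps the strongest box of each overlapping cluster
boxes = non_max_suppression(rects, probs=probs)
print(boxes)   # only two boxes survive - one per text region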

# Initialize the list of results
res = []

# Getting BoundingBox boundaries
res = self.findBoundBox(boxes, res, rW, rH, orig, origW, origH, pad)

The above function creates the bounding box surrounding each predicted text region. It also captures the extracted text inside the res variable.

for (spX, spY, epX, epY) in boxes:
  # Scale the bounding box coordinates based on the respective
  # ratios
  spX = int(spX * rW)
  spY = int(spY * rH)
  epX = int(epX * rW)
  epY = int(epY * rH)

  # To obtain a better OCR of the text we can potentially
  # apply a bit of padding surrounding the bounding box.
  # And, computing the deltas in both the x and y directions
  dX = int((epX - spX) * pad)
  dY = int((epY - spY) * pad)

  # Apply padding to each side of the bounding box, respectively
  spX = max(0, spX - dX)
  spY = max(0, spY - dY)
  epX = min(origW, epX + (dX * 2))
  epY = min(origH, epY + (dY * 2))

  # Extract the actual padded ROI
  roi = orig[spY:epY, spX:epX]

Now, the application scales the bounding boxes based on the previously computed ratios for actual text recognition. In this process, the application also pads the bounding boxes & then extracts the padded region of interest.
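
As a quick worked example with made-up numbers: if the detector ran at 320 x 320 but the original frame is 640 x 480, then rW = 2.0 & rH = 1.5. A detected box (50, 40, 100, 60) therefore scales to (100, 60, 200, 90); with pad = 0.05, the deltas become dX = 5 & dY = 1, so the padded ROI roughly spans (95, 59) to (210, 92).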

# Choose the proper OCR Config
text = pytesseract.image_to_string(roi, config=config)

# Add the bounding box coordinates and OCR'd text to the list
# of results
res.append(((spX, spY, epX, epY), text))

Using OCR options, the application extracts the text within the video frame & adds that to the res list.

# Sort the results bounding box coordinates from top to bottom
res = sorted(res, key=lambda r:r[0][1])

It then returns the results, sorted from top to bottom, to the calling function.

for ((spX, spY, epX, epY), text) in res:
  # Display the text OCR by using Tesseract APIs
  print("Reading Text::")
  print("=" *60)
  print(text)
  print("=" *60)

  # Removing the non-ASCII text so it can draw the text on the frame
  # using OpenCV, then draw the text and a bounding box surrounding
  # the text region of the input frame
  text = "".join([c if ord(c) < aRange else "" for c in text]).strip()
  output = orig.copy()

  cv2.rectangle(output, (spX, spY), (epX, epY), drawTag, 2)
  cv2.putText(output, text, (spX, spY - 20), cv2.FONT_HERSHEY_SIMPLEX, 1.2, drawTag, 3)

  # Show the output frame
  cv2.imshow(title, output)

Finally, it fetches each potential text region along with the OCR'd text & prints it on top of the source video. It also removes the non-printable characters beforehand to avoid drawing any cryptic text.
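
For instance, the ASCII filter above simply drops every character whose code point is at or above the configured range (128 in a typical setup; the exact ASCII_RANGE value here is an assumption):

aRange = 128   # assumed ASCII_RANGE value
text = "Café – ₹100"
clean = "".join([c if ord(c) < aRange else "" for c in text]).strip()
print(clean)   # prints "Caf  100" - the accented & currency characters are gone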

  • readingVideo.py (Main calling script.)


#####################################################
#### Written By: SATYAKI DE ####
#### Written On: 22-Jul-2022 ####
#### Modified On 25-Jul-2022 ####
#### ####
#### Objective: This is the main calling ####
#### python script that will invoke the ####
#### clsReadingTextFromStream class to initiate ####
#### the reading capability in real-time ####
#### & display text via Web-CAM. ####
#####################################################
# We keep the setup code in a different class as shown below.
import clsReadingTextFromStream as rtfs
from clsConfig import clsConfig as cf
import datetime
import logging
###############################################
### Global Section ###
###############################################
# Instantiating all the main class
x1 = rtfs.clsReadingTextFromStream()
###############################################
### End of Global Section ###
###############################################
def main():
    try:
        # Other useful variables
        debugInd = 'Y'
        var = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        var1 = datetime.datetime.now()

        print('Start Time: ', str(var))
        # End of useful variables

        # Initiating Log Class
        general_log_path = str(cf.conf['LOG_PATH'])

        # Enabling Logging Info
        logging.basicConfig(filename=general_log_path + 'readingTextFromVideo.log', level=logging.INFO)

        print('Started reading text from videos!')

        # Execute all the pass
        r1 = x1.processStream(debugInd, var)

        if (r1 == 0):
            print('Successfully read text from the Live Stream!')
        else:
            print('Failed to read text from the Live Stream!')

        var2 = datetime.datetime.now()

        c = var2 - var1
        minutes = c.total_seconds() / 60
        print('Total difference in minutes: ', str(minutes))

        print('End Time: ', str(var1))
    except Exception as e:
        x = str(e)
        print('Error: ', x)

if __name__ == "__main__":
    main()


Please find the key snippet –

# Instantiating all the main class

x1 = rtfs.clsReadingTextFromStream()

# Execute all the pass
r1 = x1.processStream(debugInd, var)

if (r1 == 0):
    print('Successfully read text from the Live Stream!')
else:
    print('Failed to read text from the Live Stream!')

The above lines instantiate the main class & then invoke the function that extracts the desired text from the live streaming video, reporting whether it succeeded.

FOLDER STRUCTURE:

Here is the folder structure that contains all the files & directories in macOS –

You will get the complete codebase from the following GitHub link.

Unfortunately, I cannot upload the model due to its size. I will share it on a need basis.

I'll bring some more exciting topics from the Python-verse in the coming days. Please share & subscribe to my post & let me know your feedback.

Till then, Happy Avenging! 🙂

Note: All the data & scenarios posted here are representational, available over the internet, & meant for educational purposes only. Some of the images (except my photo) that we've used are available over the net. We don't claim ownership of these images. There is always room for improvement, especially in the prediction quality.

Real-time Zoom-In/Zoom-Out using Python-based Computer Vision

Hi Guys,

Today, I'll be presenting another exciting installment of Computer Vision. The application will read real-time human hand gestures to control the WebCAM's zoom-in or zoom-out capability.

Why don’t we see the demo first before jumping into the technical details?

Demo

Architecture:

Let us understand the architecture –

Broad Diagram

As one can see, the application reads individual frames from the WebCAM & then maps the human hand gestures with MediaPipe. Finally, it calculates the distance between specific landmark points projected on the human hand.

Let’s take another depiction of the experiment to better understand the above statement.

Camera & Subject Position

Python Packages:

Following are the Python packages that are necessary to develop this brilliant use case –

pip install mediapipe
pip install opencv-python

CODE:

Let us now understand the code. For this use case, we will discuss only the key Python scripts. The application needs a few more, but we have already covered them in earlier posts, so we will skip them here.

  1. clsConfig.py (Configuration script for the application.)


################################################
#### Written By: SATYAKI DE ####
#### Written On: 15-May-2020 ####
#### Modified On: 24-May-2022 ####
#### ####
#### Objective: This script is a config ####
#### file, contains all the keys for ####
#### Machine-Learning & streaming dashboard.####
#### ####
################################################
import os
import platform as pl
class clsConfig(object):
    Curr_Path = os.path.dirname(os.path.realpath(__file__))

    os_det = pl.system()
    if os_det == "Windows":
        sep = '\\'
    else:
        sep = '/'

    conf = {
        'APP_ID': 1,
        'ARCH_DIR': Curr_Path + sep + 'arch' + sep,
        'PROFILE_PATH': Curr_Path + sep + 'profile' + sep,
        'LOG_PATH': Curr_Path + sep + 'log' + sep,
        'REPORT_PATH': Curr_Path + sep + 'report',
        'SRC_PATH': Curr_Path + sep + 'data' + sep,
        'FINAL_PATH': Curr_Path + sep + 'Target' + sep,
        'APP_DESC_1': 'Hand Gesture Zoom Control!',
        'DEBUG_IND': 'N',
        'INIT_PATH': Curr_Path,
        'SUBDIR': 'data',
        'SEP': sep,
        'TITLE': "Human Hand Gesture Controlling App",
        'minVal': 0.01,
        'maxVal': 1
    }


2. clsVideoZoom.py (This script will zoom the video stream depending upon the hand gestures.)


##################################################
#### Written By: SATYAKI DE ####
#### Written On: 23-May-2022 ####
#### Modified On 24-May-2022 ####
#### ####
#### Objective: This is the main calling ####
#### python script that will invoke the ####
#### clsVideoZoom class to initiate ####
#### the model to read the real-time ####
#### human hand gesture from video ####
#### Web-CAM & control zoom-in & zoom-out. ####
##################################################
import mediapipe as mp
import cv2
import time
import clsHandMotionScanner as hms
import math
import imutils
import numpy as np
from clsConfig import clsConfig as cf
class clsVideoZoom():
    def __init__(self):
        self.title = str(cf.conf['TITLE'])
        self.minVal = float(cf.conf['minVal'])
        self.maxVal = int(cf.conf['maxVal'])

    def zoomVideo(self, image, Iscale=1):
        try:
            scale = Iscale

            # get the webcam size
            height, width, channels = image.shape

            # prepare the crop
            centerX, centerY = int(height/2), int(width/2)
            radiusX, radiusY = int(scale*centerX), int(scale*centerY)

            minX, maxX = centerX - radiusX, centerX + radiusX
            minY, maxY = centerY - radiusY, centerY + radiusY

            cropped = image[minX:maxX, minY:maxY]
            resized_cropped = cv2.resize(cropped, (width, height))

            return resized_cropped
        except Exception as e:
            x = str(e)

            return image

    def runSensor(self):
        try:
            pTime = 0
            cTime = 0
            zRange = 0
            zRangeBar = 0

            cap = cv2.VideoCapture(0)
            detector = hms.clsHandMotionScanner(detectionCon=0.7)

            while True:
                success, img = cap.read()
                img = imutils.resize(img, width=720)

                #img = detector.findHands(img, draw=False)
                #lmList = detector.findPosition(img, draw=False)
                img = detector.findHands(img)
                lmList = detector.findPosition(img, draw=False)

                if len(lmList) != 0:
                    print('*'*60)
                    #print(lmList[4], lmList[8])
                    #print('*'*60)

                    x1, y1 = lmList[4][1], lmList[4][2]
                    x2, y2 = lmList[8][1], lmList[8][2]
                    cx, cy = (x1+x2)//2, (y1+y2)//2

                    cv2.circle(img, (x1,y1), 15, (255,0,255), cv2.FILLED)
                    cv2.circle(img, (x2,y2), 15, (255,0,255), cv2.FILLED)
                    cv2.line(img, (x1,y1), (x2,y2), (255,0,255), 3)
                    cv2.circle(img, (cx,cy), 15, (255,0,255), cv2.FILLED)

                    lenVal = math.hypot(x2 - x1, y2 - y1)
                    print('Length:', str(lenVal))
                    print('*'*60)

                    # Hand Range is from 50 to 270
                    # Camera Zoom Range is 0.01, 1
                    minVal = self.minVal
                    maxVal = self.maxVal

                    zRange = np.interp(lenVal, [50, 270], [minVal, maxVal])
                    zRangeBar = np.interp(lenVal, [50, 270], [400, 150])

                    print('Range: ', str(zRange))

                    if lenVal < 50:
                        cv2.circle(img, (cx,cy), 15, (0,255,0), cv2.FILLED)

                cv2.rectangle(img, (50, 150), (85, 400), (255,0,0), 3)
                cv2.rectangle(img, (50, int(zRangeBar)), (85, 400), (255,0,0), cv2.FILLED)

                cTime = time.time()
                fps = 1/(cTime - pTime)
                pTime = cTime

                image = cv2.flip(img, flipCode=1)

                cv2.putText(image, str(int(fps)), (10, 70), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 255), 3)
                cv2.imshow("Original Source", image)

                # Creating the new zoom video
                cropImg = self.zoomVideo(img, zRange)

                cv2.putText(cropImg, str(int(fps)), (10, 70), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 255), 3)
                cv2.imshow("Zoomed Source", cropImg)

                if cv2.waitKey(1) == ord('q'):
                    break

            cap.release()
            cv2.destroyAllWindows()

            return 0
        except Exception as e:
            x = str(e)
            print('Error:', x)

            return 1


Key snippets from the above script –

def zoomVideo(self, image, Iscale=1):
    try:
        scale=Iscale

        #get the webcam size
        height, width, channels = image.shape

        #prepare the crop
        centerX,centerY=int(height/2),int(width/2)
        radiusX,radiusY= int(scale*centerX),int(scale*centerY)

        minX,maxX=centerX-radiusX,centerX+radiusX
        minY,maxY=centerY-radiusY,centerY+radiusY

        cropped = image[minX:maxX, minY:maxY]
        resized_cropped = cv2.resize(cropped, (width, height))

        return resized_cropped

    except Exception as e:
        x = str(e)

        return image

The above method will zoom in & zoom out depending upon the scale value derived from the human hand gesture.
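
For example, with a made-up 480 x 640 frame & a scale of 0.5, the crop keeps rows 120–360 & columns 160–480 around the centre & then stretches that region back to the full 640 x 480 size, which is what produces the zoom effect. The smaller the scale, the smaller the central region kept & hence the stronger the zoom.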

cap = cv2.VideoCapture(0)
detector = hms.clsHandMotionScanner(detectionCon=0.7)

The above lines read the individual frames from the WebCAM & instantiate a customized open-source class that finds the hand's position.

img = detector.findHands(img)
lmList = detector.findPosition(img, draw=False)

These lines capture the hand positions as the hand moves. Each entry of lmList has the form [landmark_id, x_pixel, y_pixel]; since the landmarks are appended in index order, lmList[4] & lmList[8] correspond to the thumb tip & the index fingertip.

x1, y1 = lmList[4][1], lmList[4][2]
x2, y2 = lmList[8][1], lmList[8][2]

cx, cy = (x1+x2)//2, (y1+y2)//2

cv2.circle(img, (x1,y1), 15, (255,0,255), cv2.FILLED)
cv2.circle(img, (x2,y2), 15, (255,0,255), cv2.FILLED)

To understand the above lines, let’s look into the following diagram –

Source: Mediapipe

As one can see, the thumb tip's landmark ID is 4 & the index fingertip's ID is 8. The application will mark these points with solid circles.
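
For reference, MediaPipe exposes these landmark indices as an enum, so the magic numbers 4 & 8 can also be looked up by name. A tiny sketch:

import mediapipe as mp

# MediaPipe hand landmark IDs used in this post
print(int(mp.solutions.hands.HandLandmark.THUMB_TIP))          # 4
print(int(mp.solutions.hands.HandLandmark.INDEX_FINGER_TIP))   # 8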

lenVal = math.hypot(x2-x1, y2-y1)

The above line will calculate the distance between the thumb tip & the index fingertip.

# Camera Zoom Range is 0.01, 1

minVal = self.minVal
maxVal = self.maxVal

zRange = np.interp(lenVal, [50, 270], [minVal, maxVal])
zRangeBar = np.interp(lenVal, [50, 270], [400, 150])

In the above lines, the application takes the distance captured between the two fingertips & translates it into a more meaningful camera zoom range from 0.01 to 1.
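
Under the hood, np.interp is just a linear mapping between the two ranges, with values outside the hand range clamped to the end points. For example, a fingertip distance of 160 pixels (the midpoint of 50–270) maps to a zoom scale of roughly 0.505:

import numpy as np

print(np.interp(160, [50, 270], [0.01, 1]))   # ~0.505
print(np.interp(40,  [50, 270], [0.01, 1]))   # clamped to 0.01
print(np.interp(300, [50, 270], [0.01, 1]))   # clamped to 1.0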

if lenVal < 50:
    cv2.circle(img, (cx,cy), 15, (0,255,0), cv2.FILLED)

Any distance below 50 is clamped to the minimum zoom value of 0.01; the application marks this state by filling the midpoint circle in green.

cTime = time.time()
fps = 1/(cTime-pTime)
pTime = cTime


image = cv2.flip(img, flipCode=1)
cv2.putText(image, str(int(fps)), (10, 70), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 255), 3)
cv2.imshow("Original Source",image)

# Creating the new zoom video
cropImg = self.zoomVideo(img, zRange)
cv2.putText(cropImg, str(int(fps)), (10, 70), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 255), 3)
cv2.imshow("Zoomed Source",cropImg)

The application calculates the frame rate & shows both the original video frame & the zoomed frame, which zooms in or out depending on the hand gesture.

3. clsHandMotionScanner.py (This is an enhanced version of an open-source script, which will capture the hand position.)


##################################################
#### Written By: SATYAKI DE ####
#### Modified On 23-May-2022 ####
#### ####
#### Objective: This is the main calling ####
#### python class that will capture the ####
#### human hand gesture on real-time basis ####
#### and that will enable the video zoom ####
#### capability of the feed directly coming ####
#### out of a Web-CAM. ####
##################################################
import mediapipe as mp
import cv2
import time
class clsHandMotionScanner():
    def __init__(self, mode=False, maxHands=2, detectionCon=0.5, modelComplexity=1, trackCon=0.5):
        self.mode = mode
        self.maxHands = maxHands
        self.detectionCon = detectionCon
        self.modelComplex = modelComplexity
        self.trackCon = trackCon

        self.mpHands = mp.solutions.hands
        self.hands = self.mpHands.Hands(self.mode, self.maxHands, self.modelComplex, self.detectionCon, self.trackCon)

        # it gives small dots on hands - total 21 landmark points
        self.mpDraw = mp.solutions.drawing_utils

    def findHands(self, img, draw=True):
        try:
            # Send rgb image to hands
            imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            self.results = self.hands.process(imgRGB)

            # process the frame
            if self.results.multi_hand_landmarks:
                for handLms in self.results.multi_hand_landmarks:
                    if draw:
                        # Draw dots and connect them
                        self.mpDraw.draw_landmarks(img, handLms, self.mpHands.HAND_CONNECTIONS)

            return img
        except Exception as e:
            x = str(e)
            print('Error: ', x)

            return img

    def findPosition(self, img, handNo=0, draw=True):
        try:
            lmlist = []

            # check whether any landmark was detected
            if self.results.multi_hand_landmarks:
                # Which hand are we talking about
                myHand = self.results.multi_hand_landmarks[handNo]

                # Get id number and landmark information
                for id, lm in enumerate(myHand.landmark):
                    # id will give id of landmark in exact index number
                    # height width and channel
                    h, w, c = img.shape

                    # find the position - center
                    cx, cy = int(lm.x*w), int(lm.y*h)
                    #print(id,cx,cy)
                    lmlist.append([id, cx, cy])

                # Draw circle for 0th landmark
                if draw:
                    cv2.circle(img, (cx,cy), 15, (255,0,255), cv2.FILLED)

            return lmlist
        except Exception as e:
            x = str(e)
            print('Error: ', x)

            lmlist = []
            return lmlist

Key snippets from the above script –

def findHands(self, img, draw=True):
    try:
        # Send rgb image to hands
        imgRGB = cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
        self.results = self.hands.process(imgRGB)

        # process the frame
        if self.results.multi_hand_landmarks:
            for handLms in self.results.multi_hand_landmarks:

                if draw:
                    #Draw dots and connect them
                    self.mpDraw.draw_landmarks(img,handLms,self.mpHands.HAND_CONNECTIONS)

        return img
    except Exception as e:
        x = str(e)
        print('Error: ', x)

        return img

The above function will identify the individual key points & mark them as dots on top of the human hand.

def findPosition(self, img, handNo=0, draw=True):
      try:
          lmlist = []

          # check wether any landmark was detected
          if self.results.multi_hand_landmarks:
              #Which hand are we talking about
              myHand = self.results.multi_hand_landmarks[handNo]
              # Get id number and landmark information
              for id, lm in enumerate(myHand.landmark):
                  # id will give id of landmark in exact index number
                  # height width and channel
                  h,w,c = img.shape
                  #find the position - center
                  cx,cy = int(lm.x*w), int(lm.y*h) 
                  lmlist.append([id,cx,cy])

              # Draw circle for 0th landmark
              if draw:
                  cv2.circle(img,(cx,cy), 15 , (255,0,255), cv2.FILLED)

          return lmlist
      except Exception as e:
          x = str(e)
          print('Error: ', x)

          lmlist = []
          return lmlist

The above function captures the position of each MediaPipe landmark, along with its x & y coordinates, & stores them in a list, which is later parsed for the main use case.

4. viewHandMotion.py (Main calling script.)


##################################################
#### Written By: SATYAKI DE ####
#### Written On: 23-May-2022 ####
#### Modified On 23-May-2022 ####
#### ####
#### Objective: This is the main calling ####
#### python script that will invoke the ####
#### clsVideoZoom class to initiate ####
#### the model to read the real-time ####
#### hand movements gesture that enables ####
#### video zoom control. ####
##################################################
import time
import clsVideoZoom as vz
from clsConfig import clsConfig as cf
import datetime
import logging
###############################################
### Global Section ###
###############################################
# Instantiating the base class
x1 = vz.clsVideoZoom()
###############################################
### End of Global Section ###
###############################################
def main():
    try:
        # Other useful variables
        debugInd = 'Y'
        var = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        var1 = datetime.datetime.now()

        print('Start Time: ', str(var))
        # End of useful variables

        # Initiating Log Class
        general_log_path = str(cf.conf['LOG_PATH'])

        # Enabling Logging Info
        logging.basicConfig(filename=general_log_path + 'visualZoom.log', level=logging.INFO)

        print('Started Visual-Zoom Emotions!')

        r1 = x1.runSensor()

        if (r1 == 0):
            print('Successfully identified visual zoom!')
        else:
            print('Failed to identify the visual zoom!')

        var2 = datetime.datetime.now()

        c = var2 - var1
        minutes = c.total_seconds() / 60
        print('Total difference in minutes: ', str(minutes))

        print('End Time: ', str(var1))
    except Exception as e:
        x = str(e)
        print('Error: ', x)

if __name__ == "__main__":
    main()

The above lines are self-explanatory. So, I’m not going to discuss anything on this script.


FOLDER STRUCTURE:

Here is the folder structure that contains all the files & directories in macOS –

Directory

So, we’ve done it.


You will get the complete codebase from the following GitHub link.

I'll bring some more exciting topics from the Python-verse in the coming days. Please share & subscribe to my post & let me know your feedback.

Till then, Happy Avenging! 🙂

Note: All the data & scenarios posted here are representational, available over the internet, & meant for educational purposes only. Some of the images (except my photo) that we've used are available over the net. We don't claim ownership of these images. There is always room for improvement, especially in the prediction quality.

Oracle procedure using Java

Today, I'm going to discuss another powerful feature of Oracle: embedding your Java code inside Oracle procedures. This gives a lot of flexibility & power to Oracle, and you can certainly do plenty of things that are otherwise very difficult to implement directly.

For this purpose, I cannot do better than the explanation made by Bulusu Lakshman, which is –

From Oracle 9i onwards, a new environment is in place where Java and PL/SQL can interact as the two major database languages. There are many advantages to using both languages –

PL/SQL Advantage:

  • Intensive database access – it is faster than Java.
  • Oracle-specific functionality that has no equivalent in Java, such as dbms_lock & dbms_alert.
  • Using the same data types & language constructs as SQL, providing seamless access to the database.

JAVA Advantage:

  • Automatic garbage collection, polymorphism, inheritance, multi-threading.
  • Access to system resources outside of the database, such as OS commands, files, sockets.
  • Functionality not available in PL/SQL, such as OS commands, fine-grained security policies, image generation, easy sending of e-mails with attachments using JavaMail.

But I disagree with him in the case of fine-grained security policies, as Oracle has drastically improved in this area & introduced features like VPD (Virtual Private Database) & Database Vault. Anyway, we'll discuss those topics some other day.

For a better understanding, I'll follow a few categories, & we will explore them one by one. I hope you get some basic idea of this powerful feature of Oracle.

Before we proceed, we have to know the basics of the main ingredient, called dbms_java.

We have to prepare the environment first.

In Sys,

sys@ORCL>select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
PL/SQL Release 11.1.0.6.0 - Production
CORE 11.1.0.6.0 Production
TNS for 32-bit Windows: Version 11.1.0.6.0 - Production
NLSRTL Version 11.1.0.6.0 - Production

Elapsed: 00:00:00.00
sys@ORCL>
sys@ORCL>
sys@ORCL>exec dbms_java.grant_permission('SCOTT','SYS:java.lang.RuntimePermission','writeFileDescriptor','');

PL/SQL procedure successfully completed.

Elapsed: 00:00:53.54
sys@ORCL>
sys@ORCL>exec dbms_java.grant_permission('SCOTT','SYS:java.lang.RuntimePermission','readFileDescriptor','');

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.08
sys@ORCL>
sys@ORCL>exec dbms_java.grant_permission('SCOTT','SYS:java.io.FilePermission','D:\Java_Output\*.*','read,write,execute,delete');

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.08
sys@ORCL>

Let’s concentrate on our test cases.

In Scott,

Type: 1

scott@ORCL>select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
PL/SQL Release 11.1.0.6.0 - Production
CORE 11.1.0.6.0 Production
TNS for 32-bit Windows: Version 11.1.0.6.0 - Production
NLSRTL Version 11.1.0.6.0 - Production

Elapsed: 00:00:02.77
scott@ORCL>
scott@ORCL>
scott@ORCL>create or replace and compile java source named "Print_Hello"
2 as
3 import java.io.*;
4 public class Print_Hello
5 {
6 public static void dislay()
7 {
8 System.out.println("Hello World...... In Java Through Oracle....... ");
9 }
10 };
11 /

Java created.

Elapsed: 00:00:44.17
scott@ORCL>
scott@ORCL>
scott@ORCL>create or replace procedure java_print
2 as
3 language java name 'Print_Hello.dislay()';
4 /

Procedure created.

Elapsed: 00:00:01.39
scott@ORCL>
scott@ORCL>call dbms_java.set_output(1000000);

Call completed.

Elapsed: 00:00:00.34
scott@ORCL>
scott@ORCL>set serveroutput on size 1000000;
scott@ORCL>
scott@ORCL>exec java_print;
Hello World...... In Java Through Oracle.......

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.22
scott@ORCL>

Type: 2 (Returning Value from JAVA)

scott@ORCL>
scott@ORCL>select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
PL/SQL Release 11.1.0.6.0 - Production
CORE 11.1.0.6.0 Production
TNS for 32-bit Windows: Version 11.1.0.6.0 - Production
NLSRTL Version 11.1.0.6.0 - Production

Elapsed: 00:00:00.13
scott@ORCL>
scott@ORCL>
scott@ORCL>create or replace and resolve java source named "ReturnVal"
2 as
3 import java.io.*;
4
5 public class ReturnVal extends Object
6 {
7 public static String Display()
8 throws IOException
9 {
10 return "Hello World";
11 }
12 };
13 /

Java created.

Elapsed: 00:00:00.22
scott@ORCL>
scott@ORCL>
scott@ORCL>create or replace function ReturnVal
2 return varchar2
3 is
4 language java
5 name 'ReturnVal.Display() return String';
6 /

Function created.

Elapsed: 00:00:00.00
scott@ORCL>
scott@ORCL>
scott@ORCL>call dbms_java.set_output(1000000);

Call completed.

Elapsed: 00:00:00.00
scott@ORCL>
scott@ORCL>
scott@ORCL>column ReturnVal format a15
scott@ORCL>
scott@ORCL>
scott@ORCL>
scott@ORCL>
scott@ORCL>select ReturnVal from dual;

RETURNVAL
---------------
Hello World

Elapsed: 00:00:00.12
scott@ORCL>
scott@ORCL>

So, you can return the value from the compiled Java source, too.

Type: 3 (Reading console value into JAVA)

scott@ORCL>ed
Wrote file C:\OracleSpoolBuf\BUF.SQL

1 create or replace java source named "ConsoleRead"
2 as
3 import java.io.*;
4 class ConsoleRead
5 {
6 public static void RDisplay(String Det)
7 {
8 String dd = Det;
9 System.out.println("Value Passed In Java Is: " + dd);
10 System.out.println("Exiting from the Java .....");
11 }
12* };
13 /

Java created.

scott@ORCL>
scott@ORCL>
scott@ORCL>create or replace procedure java_UserInput(InputStr in varchar2)
2 as
3 language java
4 name 'ConsoleRead.RDisplay(java.lang.String)';
5 /

Procedure created.

scott@ORCL>

scott@ORCL>
scott@ORCL>call dbms_java.set_output(100000);

Call completed.

scott@ORCL>
scott@ORCL>
scott@ORCL>set serveroutput on size 100000
scott@ORCL>
scott@ORCL>exec java_UserInput('Satyaki');
Value Passed In Java Is: Satyaki
Exiting from the Java .....

PL/SQL procedure successfully completed.

scott@ORCL>

Type: 4 (Reading a file from an OS directory using JAVA)

scott@ORCL>ed
Wrote file C:\OracleSpoolBuf\BUF.SQL

1 create or replace java source named "ReadTextFile"
2 as
3 import java.io.*;
4 class ReadTextFile
5 {
6 public static void Process(String FileName) throws IOException
7 {
8 int i;
9 FileInputStream fin;
10 try
11 {
12 fin = new FileInputStream(FileName);
13 }
14 catch(FileNotFoundException e)
15 {
16 System.out.println("File Not Found....");
17 return;
18 }
19 catch(ArrayIndexOutOfBoundsException e)
20 {
21 System.out.println("Usage: showFile File");
22 return;
23 }
24 do
25 {
26 i = fin.read();
27 if(i != -1)
28 System.out.println((char) i);
29 }while(i != -1);
30 fin.close();
31 }
32* };
33 /

Java created.

scott@ORCL>
scott@ORCL>create or replace procedure Java_ReadTextFile(FileNameWithPath in varchar2)
2 as
3 language java
4 name 'ReadTextFile.Process(java.lang.String)';
5 /

Procedure created.

scott@ORCL>
scott@ORCL>
scott@ORCL>call dbms_java.set_output(100000);

Call completed.

scott@ORCL>
scott@ORCL>
scott@ORCL>exec Java_ReadTextFile('D:\Java_Output\Trial.txt');

Type: 5 (Writing a file to an OS directory using JAVA)


In Scott,

scott@ORCL>
scott@ORCL>create or replace java source named "DynWriteTextFile"
2 as
3 import java.io.*;
4 class DynWriteTextFile
5 {
6 public static void proc(String ctent,String FlNameWithPath) throws IOException
7 {
8 int i,j;
9 String FileNm = FlNameWithPath;
10 RandomAccessFile rFile;
11
12 try
13 {
14 rFile = new RandomAccessFile(FileNm,"rw");
15 }
16 catch(FileNotFoundException e)
17 {
18 System.out.println("Error Writing Output File....");
19 return;
20 }
21
22 try
23 {
24 int ch;
25
26 System.out.println("Processing starts...");
27
28 ch = ctent.length();
29
30 rFile.seek(rFile.length());
31 for(int k=0; k<ch; k=k+ctent.length())
32 {
33 rFile.writeBytes(ctent);
34 }
35 }
36 catch(IOException e)
37 {
38 System.out.println("File Error....");
39 }
40 finally
41 {
42 try
43 {
44 System.out.println("Successfully file generated....");
45 rFile.close();
46 }
47 catch(IOException oe)
48 {
49 System.out.println("Exception in the catch block of finally is: " +oe);
50 System.exit(0);
51 }
52 }
53 }
54 };
55 /

Java created.

Elapsed: 00:00:00.17
scott@ORCL>
scott@ORCL>
scott@ORCL>create or replace procedure JavaDyn_WriteTextFile(para in varchar2,FileNameWithPath in varchar2)
2 as
3 language JAVA
4 name 'DynWriteTextFile.proc(java.lang.String, java.lang.String)';
5 /

Procedure created.

Elapsed: 00:00:00.15
scott@ORCL>

In Sys,

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
PL/SQL Release 11.1.0.6.0 - Production
CORE 11.1.0.6.0 Production
TNS for 32-bit Windows: Version 11.1.0.6.0 - Production
NLSRTL Version 11.1.0.6.0 - Production

sys@ORCL>set timi on
sys@ORCL>
sys@ORCL>
sys@ORCL>create or replace public synonym dbms_dwrite_file for scott.JavaDyn_WriteTextFile;

Synonym created.

Elapsed: 00:00:00.08
sys@ORCL>
sys@ORCL>grant execute on dbms_dwrite_file to scott;

Grant succeeded.

Elapsed: 00:00:00.18
sys@ORCL>
 
In Scott,  

scott@ORCL>
scott@ORCL>create or replace procedure DWrite_Content(dt in date,FileNmWithPath in varchar2)
2 is
3 cursor c1
4 is
5 select empno,ename,sal
6 from emp
7 where hiredate = dt;
8 r1 c1%rowtype;
9
10 str varchar2(500);
11 begin
12 str:= replace(FileNmWithPath,'\','\\');
13 dbms_dwrite_file('Employee No'||' '||'First Name'||' '||'Salary',str);
14 dbms_dwrite_file(chr(10),str);
15 dbms_dwrite_file('---------------------------------------------------',str);
16 dbms_dwrite_file(chr(10),str);
17 for r1 in c1
18 loop
19 dbms_dwrite_file(r1.empno||' '||r1.ename||' '||r1.sal,str);
20 dbms_dwrite_file(chr(10),str);
21 end loop;
22 exception
23 when others then
24 dbms_output.put_line(sqlerrm);
25 end;
26 /

Procedure created.

Elapsed: 00:00:00.43
scott@ORCL>
scott@ORCL>
scott@ORCL>call dbms_java.set_output(100000);

Call completed.

Elapsed: 00:00:00.02
scott@ORCL>
scott@ORCL>exec DWrite_Content(to_date('21-JUN-1999','DD-MON-YYYY'),'D:\Java_Output\satyaki.txt');
Processing starts...
Successfully file generated....

PL/SQL procedure successfully completed.
 
Hope this thread gives you some basic idea about using your Java code with Oracle PL/SQL.
I'll discuss another topic very soon. Till then – keep following. 😉
Regards.