Real-time reading from a video stream using Computer Vision

This week, we're going to extend one of our earlier posts & try to read the entire text from a video stream using computer vision. If you want to view the previous post, please click the following link.

But, before we proceed, why don’t we view the demo first?

Demo

Architecture:

Let us understand the architecture flow –

Architecture flow

The above diagram shows that the application, which uses OpenCV, analyzes individual frames from the source, extracts the complete text within the video, displays it on top of the target screen & prints the same in the console.

Python Packages:

pip install imutils==0.5.4
pip install matplotlib==3.5.2
pip install numpy==1.21.6
pip install opencv-contrib-python==4.6.0.66
pip install opencv-contrib-python-headless==4.6.0.66
pip install opencv-python==4.6.0.66
pip install opencv-python-headless==4.6.0.66
pip install pandas==1.3.5
pip install Pillow==9.1.1
pip install pytesseract==0.3.9
pip install python-dateutil==2.8.2

CODE:

Let us now understand the code. For this use case, we will discuss only the key Python scripts. The application needs a few more, but we have already covered them in earlier posts, so we will skip them here.

  • clsReadingTextFromStream.py (This is the main Python class that will extract the text from the WebCAM stream in real-time.)


##################################################
#### Written By: SATYAKI DE ####
#### Written On: 22-Jul-2022 ####
#### Modified On 25-Jul-2022 ####
#### ####
#### Objective: This is the main class of ####
#### python script that will invoke the ####
#### extraction of texts from a WebCAM. ####
#### ####
##################################################
# Importing necessary packages
from clsConfig import clsConfig as cf
from imutils.object_detection import non_max_suppression
import numpy as np
import pytesseract
import imutils
import time
import cv2
###############################################
### Global Section ###
###############################################
# Two output layer names for the text detector model
lNames = cf.conf['LAYER_DET']
# Tesseract OCR text param values
strVal = "-l " + str(cf.conf['LANG']) + " --oem " + str(cf.conf['OEM_VAL']) + " --psm " + str(cf.conf['PSM_VAL']) + ""
config = (strVal)
###############################################
### End of Global Section ###
###############################################
class clsReadingTextFromStream:
    def __init__(self):
        self.sep = str(cf.conf['SEP'])
        self.Curr_Path = str(cf.conf['INIT_PATH'])
        self.CacheL = int(cf.conf['CACHE_LIM'])
        self.modelPath = str(cf.conf['MODEL_PATH']) + str(cf.conf['MODEL_FILE_NAME'])
        self.minConf = float(cf.conf['MIN_CONFIDENCE'])
        self.wt = int(cf.conf['WIDTH'])
        self.ht = int(cf.conf['HEIGHT'])
        self.pad = float(cf.conf['PADDING'])
        self.title = str(cf.conf['TITLE'])
        self.Otitle = str(cf.conf['ORIG_TITLE'])
        self.drawTag = cf.conf['DRAW_TAG']
        self.aRange = int(cf.conf['ASCII_RANGE'])
        self.sParam = cf.conf['SUBTRACT_PARAM']

    def findBoundBox(self, boxes, res, rW, rH, orig, origW, origH, pad):
        try:
            # Loop over the bounding boxes
            for (spX, spY, epX, epY) in boxes:
                # Scale the bounding box coordinates based on the respective
                # ratios
                spX = int(spX * rW)
                spY = int(spY * rH)
                epX = int(epX * rW)
                epY = int(epY * rH)

                # To obtain a better OCR of the text we can potentially
                # apply a bit of padding surrounding the bounding box.
                # And, computing the deltas in both the x and y directions
                dX = int((epX - spX) * pad)
                dY = int((epY - spY) * pad)

                # Apply padding to each side of the bounding box, respectively
                spX = max(0, spX - dX)
                spY = max(0, spY - dY)
                epX = min(origW, epX + (dX * 2))
                epY = min(origH, epY + (dY * 2))

                # Extract the actual padded ROI
                roi = orig[spY:epY, spX:epX]

                # Choose the proper OCR Config
                text = pytesseract.image_to_string(roi, config=config)

                # Add the bounding box coordinates and OCR'd text to the list
                # of results
                res.append(((spX, spY, epX, epY), text))

            # Sort the results bounding box coordinates from top to bottom
            res = sorted(res, key=lambda r: r[0][1])

            return res
        except Exception as e:
            x = str(e)
            print(x)

            return res

    def predictText(self, imgScore, imgGeo):
        try:
            minConf = self.minConf

            # Initializing the bounding box rectangles & confidence score by
            # extracting the rows & columns from the imgScore volume.
            (numRows, numCols) = imgScore.shape[2:4]
            rects = []
            confScore = []

            for y in range(0, numRows):
                # Extract the imgScore probabilities to derive potential
                # bounding box coordinates that surround text
                imgScoreData = imgScore[0, 0, y]
                xVal0 = imgGeo[0, 0, y]
                xVal1 = imgGeo[0, 1, y]
                xVal2 = imgGeo[0, 2, y]
                xVal3 = imgGeo[0, 3, y]
                anglesData = imgGeo[0, 4, y]

                for x in range(0, numCols):
                    # If our score does not have sufficient probability,
                    # ignore it
                    if imgScoreData[x] < minConf:
                        continue

                    # Compute the offset factor as our resulting feature
                    # maps will be 4x smaller than the input frame
                    (offX, offY) = (x * 4.0, y * 4.0)

                    # Extract the rotation angle for the prediction and
                    # then compute the sin and cosine
                    angle = anglesData[x]
                    cos = np.cos(angle)
                    sin = np.sin(angle)

                    # Derive the width and height of the bounding box from
                    # imgGeo
                    h = xVal0[x] + xVal2[x]
                    w = xVal1[x] + xVal3[x]

                    # Compute both the starting and ending (x, y)-coordinates
                    # for the text prediction bounding box
                    epX = int(offX + (cos * xVal1[x]) + (sin * xVal2[x]))
                    epY = int(offY - (sin * xVal1[x]) + (cos * xVal2[x]))
                    spX = int(epX - w)
                    spY = int(epY - h)

                    # Adding bounding box coordinates and probability score
                    # to the respective lists
                    rects.append((spX, spY, epX, epY))
                    confScore.append(imgScoreData[x])

            # return a tuple of the bounding boxes and associated confScore
            return (rects, confScore)
        except Exception as e:
            x = str(e)
            print(x)

            rects = []
            confScore = []

            return (rects, confScore)

    def processStream(self, debugInd, var):
        try:
            sep = self.sep
            Curr_Path = self.Curr_Path
            CacheL = self.CacheL
            modelPath = self.modelPath
            minConf = self.minConf
            wt = self.wt
            ht = self.ht
            pad = self.pad
            title = self.title
            Otitle = self.Otitle
            drawTag = self.drawTag
            aRange = self.aRange
            sParam = self.sParam

            val = 0

            # Initialize the video stream and allow the camera sensor to warm up
            print("[INFO] Starting video stream...")
            cap = cv2.VideoCapture(0)

            # Loading the pre-trained text detector
            print("[INFO] Loading Text Detector...")
            net = cv2.dnn.readNet(modelPath)

            # Loop over the frames from the video stream
            while True:
                try:
                    # Grab the frame from our video stream and resize it
                    success, frame = cap.read()

                    orig = frame.copy()
                    (origH, origW) = frame.shape[:2]

                    # Setting new width and height and then determine the ratio in change
                    # for both the width and height
                    (newW, newH) = (wt, ht)

                    rW = origW / float(newW)
                    rH = origH / float(newH)

                    # Resize the frame and grab the new frame dimensions
                    frame = cv2.resize(frame, (newW, newH))
                    (H, W) = frame.shape[:2]

                    # Construct a blob from the frame and then perform a forward pass of
                    # the model to obtain the two output layer sets
                    blob = cv2.dnn.blobFromImage(frame, 1.0, (W, H), sParam, swapRB=True, crop=False)
                    net.setInput(blob)
                    (confScore, imgGeo) = net.forward(lNames)

                    # Decode the predictions, then apply non-maxima suppression to
                    # suppress weak, overlapping bounding boxes
                    (rects, confidences) = self.predictText(confScore, imgGeo)
                    boxes = non_max_suppression(np.array(rects), probs=confidences)

                    # Initialize the list of results
                    res = []

                    # Getting BoundingBox boundaries
                    res = self.findBoundBox(boxes, res, rW, rH, orig, origW, origH, pad)

                    for ((spX, spY, epX, epY), text) in res:
                        # Display the text OCR by using Tesseract APIs
                        print("Reading Text::")
                        print("=" * 60)
                        print(text)
                        print("=" * 60)

                        # Removing the non-ASCII text so it can draw the text on the frame
                        # using OpenCV, then draw the text and a bounding box surrounding
                        # the text region of the input frame
                        text = "".join([c if ord(c) < aRange else "" for c in text]).strip()
                        output = orig.copy()

                        cv2.rectangle(output, (spX, spY), (epX, epY), drawTag, 2)
                        cv2.putText(output, text, (spX, spY - 20), cv2.FONT_HERSHEY_SIMPLEX, 1.2, drawTag, 3)

                        # Show the output frame
                        cv2.imshow(title, output)
                        #cv2.imshow(Otitle, frame)

                    # If the `q` key was pressed, break from the loop
                    if cv2.waitKey(1) == ord('q'):
                        break

                    val = 0
                except Exception as e:
                    x = str(e)
                    print(x)

                    val = 1

            # Performing cleanup at the end
            cap.release()
            cv2.destroyAllWindows()

            return val
        except Exception as e:
            x = str(e)
            print('Error:', x)

            return 1

Please find the key snippet from the above script –

# Two output layer names for the text detector model

lNames = cf.conf['LAYER_DET']

# Tesseract OCR text param values

strVal = "-l " + str(cf.conf['LANG']) + " --oem " + str(cf.conf['OEM_VAL']) + " --psm " + str(cf.conf['PSM_VAL']) + ""
config = (strVal)

The first line contains the two output layers' names for the text detector model. The first layer indicates the probability that a region contains text, & the second one is used to derive the bounding box coordinates of the predicted text.

The second line contains various options for the Tesseract APIs. You need to understand these options in detail to make them work. These are the essential options for our use case –

  • Language – The intended language, for example, English, Spanish, Hindi, Bengali, etc.
  • OEM flag – The OCR engine mode; here the application selects the LSTM neural net model for OCR.
  • PSM value – The page segmentation mode; in this case, the selected value is 7, indicating that the application treats the ROI as a single line of text.

For more details, please refer to the config file.
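
Since the configuration file for this post is not reproduced here, below is a minimal sketch of how the detector & Tesseract keys referenced above might look inside clsConfig.py. The key names come from the script itself; the values (layer names, OEM/PSM numbers, frame size) are my assumptions & should be taken from the actual config file.

# Hypothetical excerpt from clsConfig.py - illustrative values only
conf = {
    # Output layers of the text detector (standard EAST layer names)
    'LAYER_DET': ["feature_fusion/Conv_7/Sigmoid", "feature_fusion/concat_3"],
    'LANG': 'eng',           # Tesseract language pack
    'OEM_VAL': 1,            # OCR engine mode (1 = LSTM neural net)
    'PSM_VAL': 7,            # Page segmentation mode (7 = single text line)
    'MIN_CONFIDENCE': 0.5,   # Minimum score to accept a detection
    'WIDTH': 320,            # Detector input width (a multiple of 32)
    'HEIGHT': 320,           # Detector input height (a multiple of 32)
    'PADDING': 0.05          # Extra padding around each detected box
}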

print("[INFO] Loading Text Detector...")
net = cv2.dnn.readNet(modelPath)

The above lines load the pre-trained text-detector model into memory for evaluation.

# Setting new width and height and then determine the ratio in change
# for both the width and height
(newW, newH) = (wt, ht)
rW = origW / float(newW)
rH = origH / float(newH)

# Resize the frame and grab the new frame dimensions
frame = cv2.resize(frame, (newW, newH))
(H, W) = frame.shape[:2]

# Construct a blob from the frame and then perform a forward pass of
# the model to obtain the two output layer sets
blob = cv2.dnn.blobFromImage(frame, 1.0, (W, H), sParam, swapRB=True, crop=False)
net.setInput(blob)
(confScore, imgGeo) = net.forward(lNames)

# Decode the predictions, then apply non-maxima suppression to
# suppress weak, overlapping bounding boxes
(rects, confidences) = self.predictText(confScore, imgGeo)
boxes = non_max_suppression(np.array(rects), probs=confidences)

The above lines prepare each frame: the application resizes the frame to the detector's expected width & height, then performs a forward pass of the model to obtain the two output layer sets. It then decodes the predictions & applies non-maxima suppression to remove weak, overlapping bounding boxes. In short, this identifies the potential text regions & puts a bounding box around each of them.
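
To see what non-maxima suppression does in isolation, here is a small, self-contained sketch with made-up box coordinates, using the same imutils helper as the script above:

import numpy as np
from imutils.object_detection import non_max_suppression

# Three heavily overlapping candidates & one separate box (dummy values)
rects = np.array([(10, 10, 110, 50), (12, 12, 112, 52), (15, 8, 115, 48), (200, 30, 300, 70)])
probs = [0.90, 0.80, 0.70, 0.95]

# Keeps the strongest box of each overlapping cluster
boxes = non_max_suppression(rects, probs=probs)
print(boxes)   # only two boxes survive - one per text region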

# Initialize the list of results
res = []

# Getting BoundingBox boundaries
res = self.findBoundBox(boxes, res, rW, rH, orig, origW, origH, pad)

The above function creates the bounding box surrounding each predicted text region. It also captures the extracted text inside the res variable.

for (spX, spY, epX, epY) in boxes:
  # Scale the bounding box coordinates based on the respective
  # ratios
  spX = int(spX * rW)
  spY = int(spY * rH)
  epX = int(epX * rW)
  epY = int(epY * rH)

  # To obtain a better OCR of the text we can potentially
  # apply a bit of padding surrounding the bounding box.
  # And, computing the deltas in both the x and y directions
  dX = int((epX - spX) * pad)
  dY = int((epY - spY) * pad)

  # Apply padding to each side of the bounding box, respectively
  spX = max(0, spX - dX)
  spY = max(0, spY - dY)
  epX = min(origW, epX + (dX * 2))
  epY = min(origH, epY + (dY * 2))

  # Extract the actual padded ROI
  roi = orig[spY:epY, spX:epX]

Now, the application scales the bounding boxes based on the previously computed ratios for actual text recognition. In this process, the application also pads the bounding boxes & then extracts the padded region of interest.
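
As a quick worked example with made-up numbers: if the detector ran at 320 x 320 but the original frame is 640 x 480, then rW = 2.0 & rH = 1.5. A detected box (50, 40, 100, 60) therefore scales to (100, 60, 200, 90); with pad = 0.05, the deltas become dX = 5 & dY = 1, so the padded ROI roughly spans (95, 59) to (210, 92).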

# Choose the proper OCR Config
text = pytesseract.image_to_string(roi, config=config)

# Add the bounding box coordinates and OCR'd text to the list
# of results
res.append(((spX, spY, epX, epY), text))

Using OCR options, the application extracts the text within the video frame & adds that to the res list.

# Sort the results bounding box coordinates from top to bottom
res = sorted(res, key=lambda r:r[0][1])

It then returns the results, sorted from top to bottom, to the calling function.

for ((spX, spY, epX, epY), text) in res:
  # Display the text OCR by using Tesseract APIs
  print("Reading Text::")
  print("=" *60)
  print(text)
  print("=" *60)

  # Removing the non-ASCII text so it can draw the text on the frame
  # using OpenCV, then draw the text and a bounding box surrounding
  # the text region of the input frame
  text = "".join([c if ord(c) < aRange else "" for c in text]).strip()
  output = orig.copy()

  cv2.rectangle(output, (spX, spY), (epX, epY), drawTag, 2)
  cv2.putText(output, text, (spX, spY - 20), cv2.FONT_HERSHEY_SIMPLEX, 1.2, drawTag, 3)

  # Show the output frame
  cv2.imshow(title, output)

Finally, it fetches each potential text region along with the OCR'd text & prints it on top of the source video. It also removes the non-printable characters beforehand to avoid drawing any cryptic text.
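
For instance, the ASCII filter above simply drops every character whose code point is at or above the configured range (128 in a typical setup; the exact ASCII_RANGE value here is an assumption):

aRange = 128   # assumed ASCII_RANGE value
text = "Café – ₹100"
clean = "".join([c if ord(c) < aRange else "" for c in text]).strip()
print(clean)   # prints "Caf  100" - the accented & currency characters are gone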

  • readingVideo.py (Main calling script.)


#####################################################
#### Written By: SATYAKI DE ####
#### Written On: 22-Jul-2022 ####
#### Modified On 25-Jul-2022 ####
#### ####
#### Objective: This is the main calling ####
#### python script that will invoke the ####
#### clsReadingTextFromStream class to initiate ####
#### the reading capability in real-time ####
#### & display text via Web-CAM. ####
#####################################################
# We keep the setup code in a different class as shown below.
import clsReadingTextFromStream as rtfs
from clsConfig import clsConfig as cf
import datetime
import logging
###############################################
### Global Section ###
###############################################
# Instantiating all the main class
x1 = rtfs.clsReadingTextFromStream()
###############################################
### End of Global Section ###
###############################################
def main():
    try:
        # Other useful variables
        debugInd = 'Y'
        var = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        var1 = datetime.datetime.now()

        print('Start Time: ', str(var))
        # End of useful variables

        # Initiating Log Class
        general_log_path = str(cf.conf['LOG_PATH'])

        # Enabling Logging Info
        logging.basicConfig(filename=general_log_path + 'readingTextFromVideo.log', level=logging.INFO)

        print('Started reading text from videos!')

        # Execute all the pass
        r1 = x1.processStream(debugInd, var)

        if (r1 == 0):
            print('Successfully read text from the Live Stream!')
        else:
            print('Failed to read text from the Live Stream!')

        var2 = datetime.datetime.now()

        c = var2 - var1
        minutes = c.total_seconds() / 60
        print('Total difference in minutes: ', str(minutes))

        print('End Time: ', str(var1))
    except Exception as e:
        x = str(e)
        print('Error: ', x)

if __name__ == "__main__":
    main()


Please find the key snippet –

# Instantiating all the main class

x1 = rtfs.clsReadingTextFromStream()

# Execute all the pass
r1 = x1.processStream(debugInd, var)

if (r1 == 0):
    print('Successfully read text from the Live Stream!')
else:
    print('Failed to read text from the Live Stream!')

The above lines instantiate the main class & then invoke the function that extracts the desired text from the live streaming video, reporting whether it succeeded.

FOLDER STRUCTURE:

Here is the folder structure that contains all the files & directories in macOS –

You will get the complete codebase from the following GitHub link.

Unfortunately, I cannot upload the model due to its size. I will share it on a need basis.

I'll bring some more exciting topics from the Python-verse in the coming days. Please share & subscribe to my post & let me know your feedback.

Till then, Happy Avenging! 🙂

Note: All the data & scenarios posted here are representational, available over the internet, & meant for educational purposes only. Some of the images (except my photo) that we've used are available over the net. We don't claim ownership of these images. There is always room for improvement, especially in the prediction quality.

Real-time Zoom-In/Zoom-Out using Python-based Computer Vision

Hi Guys,

Today, I'll be presenting another exciting installment of Computer Vision. The application will read real-time human hand gestures to control the WebCAM's zoom-in or zoom-out capability.

Why don’t we see the demo first before jumping into the technical details?

Demo

Architecture:

Let us understand the architecture –

Broad Diagram

As one can see, the application reads individual frames from the WebCAM & then maps the human hand gestures with MediaPipe. Finally, it calculates the distance between specific landmark points projected on the human hand.

Let’s take another depiction of the experiment to better understand the above statement.

Camera & Subject Position

Python Packages:

Following are the Python packages that are necessary to develop this brilliant use case –

pip install mediapipe
pip install opencv-python

CODE:

Let us now understand the code. For this use case, we will discuss only the key Python scripts. The application needs a few more, but we have already covered them in earlier posts, so we will skip them here.

  1. clsConfig.py (Configuration script for the application.)


################################################
#### Written By: SATYAKI DE ####
#### Written On: 15-May-2020 ####
#### Modified On: 24-May-2022 ####
#### ####
#### Objective: This script is a config ####
#### file, contains all the keys for ####
#### Machine-Learning & streaming dashboard.####
#### ####
################################################
import os
import platform as pl
class clsConfig(object):
    Curr_Path = os.path.dirname(os.path.realpath(__file__))

    os_det = pl.system()
    if os_det == "Windows":
        sep = '\\'
    else:
        sep = '/'

    conf = {
        'APP_ID': 1,
        'ARCH_DIR': Curr_Path + sep + 'arch' + sep,
        'PROFILE_PATH': Curr_Path + sep + 'profile' + sep,
        'LOG_PATH': Curr_Path + sep + 'log' + sep,
        'REPORT_PATH': Curr_Path + sep + 'report',
        'SRC_PATH': Curr_Path + sep + 'data' + sep,
        'FINAL_PATH': Curr_Path + sep + 'Target' + sep,
        'APP_DESC_1': 'Hand Gesture Zoom Control!',
        'DEBUG_IND': 'N',
        'INIT_PATH': Curr_Path,
        'SUBDIR': 'data',
        'SEP': sep,
        'TITLE': "Human Hand Gesture Controlling App",
        'minVal': 0.01,
        'maxVal': 1
    }


2. clsVideoZoom.py (This script will zoom the video stream depending upon the hand gestures.)


##################################################
#### Written By: SATYAKI DE ####
#### Written On: 23-May-2022 ####
#### Modified On 24-May-2022 ####
#### ####
#### Objective: This is the main calling ####
#### python script that will invoke the ####
#### clsVideoZoom class to initiate ####
#### the model to read the real-time ####
#### human hand gesture from video ####
#### Web-CAM & control zoom-in & zoom-out. ####
##################################################
import mediapipe as mp
import cv2
import time
import clsHandMotionScanner as hms
import math
import imutils
import numpy as np
from clsConfig import clsConfig as cf
class clsVideoZoom():
    def __init__(self):
        self.title = str(cf.conf['TITLE'])
        self.minVal = float(cf.conf['minVal'])
        self.maxVal = int(cf.conf['maxVal'])

    def zoomVideo(self, image, Iscale=1):
        try:
            scale = Iscale

            # get the webcam size
            height, width, channels = image.shape

            # prepare the crop
            centerX, centerY = int(height/2), int(width/2)
            radiusX, radiusY = int(scale*centerX), int(scale*centerY)

            minX, maxX = centerX - radiusX, centerX + radiusX
            minY, maxY = centerY - radiusY, centerY + radiusY

            cropped = image[minX:maxX, minY:maxY]
            resized_cropped = cv2.resize(cropped, (width, height))

            return resized_cropped
        except Exception as e:
            x = str(e)

            return image

    def runSensor(self):
        try:
            pTime = 0
            cTime = 0
            zRange = 0
            zRangeBar = 0

            cap = cv2.VideoCapture(0)
            detector = hms.clsHandMotionScanner(detectionCon=0.7)

            while True:
                success, img = cap.read()
                img = imutils.resize(img, width=720)

                #img = detector.findHands(img, draw=False)
                #lmList = detector.findPosition(img, draw=False)
                img = detector.findHands(img)
                lmList = detector.findPosition(img, draw=False)

                if len(lmList) != 0:
                    print('*'*60)
                    #print(lmList[4], lmList[8])
                    #print('*'*60)

                    x1, y1 = lmList[4][1], lmList[4][2]
                    x2, y2 = lmList[8][1], lmList[8][2]
                    cx, cy = (x1+x2)//2, (y1+y2)//2

                    cv2.circle(img, (x1,y1), 15, (255,0,255), cv2.FILLED)
                    cv2.circle(img, (x2,y2), 15, (255,0,255), cv2.FILLED)
                    cv2.line(img, (x1,y1), (x2,y2), (255,0,255), 3)
                    cv2.circle(img, (cx,cy), 15, (255,0,255), cv2.FILLED)

                    lenVal = math.hypot(x2 - x1, y2 - y1)
                    print('Length:', str(lenVal))
                    print('*'*60)

                    # Hand Range is from 50 to 270
                    # Camera Zoom Range is 0.01, 1
                    minVal = self.minVal
                    maxVal = self.maxVal

                    zRange = np.interp(lenVal, [50, 270], [minVal, maxVal])
                    zRangeBar = np.interp(lenVal, [50, 270], [400, 150])

                    print('Range: ', str(zRange))

                    if lenVal < 50:
                        cv2.circle(img, (cx,cy), 15, (0,255,0), cv2.FILLED)

                cv2.rectangle(img, (50, 150), (85, 400), (255,0,0), 3)
                cv2.rectangle(img, (50, int(zRangeBar)), (85, 400), (255,0,0), cv2.FILLED)

                cTime = time.time()
                fps = 1/(cTime - pTime)
                pTime = cTime

                image = cv2.flip(img, flipCode=1)

                cv2.putText(image, str(int(fps)), (10, 70), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 255), 3)
                cv2.imshow("Original Source", image)

                # Creating the new zoom video
                cropImg = self.zoomVideo(img, zRange)

                cv2.putText(cropImg, str(int(fps)), (10, 70), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 255), 3)
                cv2.imshow("Zoomed Source", cropImg)

                if cv2.waitKey(1) == ord('q'):
                    break

            cap.release()
            cv2.destroyAllWindows()

            return 0
        except Exception as e:
            x = str(e)
            print('Error:', x)

            return 1


Key snippets from the above script –

def zoomVideo(self, image, Iscale=1):
    try:
        scale=Iscale

        #get the webcam size
        height, width, channels = image.shape

        #prepare the crop
        centerX,centerY=int(height/2),int(width/2)
        radiusX,radiusY= int(scale*centerX),int(scale*centerY)

        minX,maxX=centerX-radiusX,centerX+radiusX
        minY,maxY=centerY-radiusY,centerY+radiusY

        cropped = image[minX:maxX, minY:maxY]
        resized_cropped = cv2.resize(cropped, (width, height))

        return resized_cropped

    except Exception as e:
        x = str(e)

        return image

The above method will zoom in & zoom out depending upon the scale value derived from the human hand gesture.
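
For example, with a made-up 480 x 640 frame & a scale of 0.5, the crop keeps rows 120–360 & columns 160–480 around the centre & then stretches that region back to the full 640 x 480 size, which is what produces the zoom effect. The smaller the scale, the smaller the central region kept & hence the stronger the zoom.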

cap = cv2.VideoCapture(0)
detector = hms.clsHandMotionScanner(detectionCon=0.7)

The above lines read the individual frames from the WebCAM & instantiate a customized open-source class that finds the hand's position.

img = detector.findHands(img)
lmList = detector.findPosition(img, draw=False)

These lines capture the hand positions as the hand moves. Each entry of lmList has the form [landmark_id, x_pixel, y_pixel]; since the landmarks are appended in index order, lmList[4] & lmList[8] correspond to the thumb tip & the index fingertip.

x1, y1 = lmList[4][1], lmList[4][2]
x2, y2 = lmList[8][1], lmList[8][2]

cx, cy = (x1+x2)//2, (y1+y2)//2

cv2.circle(img, (x1,y1), 15, (255,0,255), cv2.FILLED)
cv2.circle(img, (x2,y2), 15, (255,0,255), cv2.FILLED)

To understand the above lines, let’s look into the following diagram –

Source: Mediapipe

As one can see, the thumb tip's landmark ID is 4 & the index fingertip's ID is 8. The application will mark these points with solid circles.
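
For reference, MediaPipe exposes these landmark indices as an enum, so the magic numbers 4 & 8 can also be looked up by name. A tiny sketch:

import mediapipe as mp

# MediaPipe hand landmark IDs used in this post
print(int(mp.solutions.hands.HandLandmark.THUMB_TIP))          # 4
print(int(mp.solutions.hands.HandLandmark.INDEX_FINGER_TIP))   # 8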

lenVal = math.hypot(x2-x1, y2-y1)

The above line will calculate the distance between the thumb tip & the index fingertip.

# Camera Zoom Range is 0.01, 1

minVal = self.minVal
maxVal = self.maxVal

zRange = np.interp(lenVal, [50, 270], [minVal, maxVal])
zRangeBar = np.interp(lenVal, [50, 270], [400, 150])

In the above lines, the application takes the distance captured between the two fingertips & translates it into a more meaningful camera zoom range from 0.01 to 1.
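
Under the hood, np.interp is just a linear mapping between the two ranges, with values outside the hand range clamped to the end points. For example, a fingertip distance of 160 pixels (the midpoint of 50–270) maps to a zoom scale of roughly 0.505:

import numpy as np

print(np.interp(160, [50, 270], [0.01, 1]))   # ~0.505
print(np.interp(40,  [50, 270], [0.01, 1]))   # clamped to 0.01
print(np.interp(300, [50, 270], [0.01, 1]))   # clamped to 1.0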

if lenVal < 50:
    cv2.circle(img, (cx,cy), 15, (0,255,0), cv2.FILLED)

Any distance below 50 is clamped to the minimum zoom value of 0.01; the application marks this state by filling the midpoint circle in green.

cTime = time.time()
fps = 1/(cTime-pTime)
pTime = cTime


image = cv2.flip(img, flipCode=1)
cv2.putText(image, str(int(fps)), (10, 70), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 255), 3)
cv2.imshow("Original Source",image)

# Creating the new zoom video
cropImg = self.zoomVideo(img, zRange)
cv2.putText(cropImg, str(int(fps)), (10, 70), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 255), 3)
cv2.imshow("Zoomed Source",cropImg)

The application calculates the frame rate & shows both the original video frame & the zoomed frame, which zooms in or out depending on the hand gesture.

3. clsHandMotionScanner.py (This is an enhanced version of an open-source script, which will capture the hand position.)


##################################################
#### Written By: SATYAKI DE ####
#### Modified On 23-May-2022 ####
#### ####
#### Objective: This is the main calling ####
#### python class that will capture the ####
#### human hand gesture on real-time basis ####
#### and that will enable the video zoom ####
#### capability of the feed directly coming ####
#### out of a Web-CAM. ####
##################################################
import mediapipe as mp
import cv2
import time
class clsHandMotionScanner():
    def __init__(self, mode=False, maxHands=2, detectionCon=0.5, modelComplexity=1, trackCon=0.5):
        self.mode = mode
        self.maxHands = maxHands
        self.detectionCon = detectionCon
        self.modelComplex = modelComplexity
        self.trackCon = trackCon

        self.mpHands = mp.solutions.hands
        self.hands = self.mpHands.Hands(self.mode, self.maxHands, self.modelComplex, self.detectionCon, self.trackCon)

        # it gives small dots on hands - total 21 landmark points
        self.mpDraw = mp.solutions.drawing_utils

    def findHands(self, img, draw=True):
        try:
            # Send rgb image to hands
            imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            self.results = self.hands.process(imgRGB)

            # process the frame
            if self.results.multi_hand_landmarks:
                for handLms in self.results.multi_hand_landmarks:
                    if draw:
                        # Draw dots and connect them
                        self.mpDraw.draw_landmarks(img, handLms, self.mpHands.HAND_CONNECTIONS)

            return img
        except Exception as e:
            x = str(e)
            print('Error: ', x)

            return img

    def findPosition(self, img, handNo=0, draw=True):
        try:
            lmlist = []

            # check whether any landmark was detected
            if self.results.multi_hand_landmarks:
                # Which hand are we talking about
                myHand = self.results.multi_hand_landmarks[handNo]

                # Get id number and landmark information
                for id, lm in enumerate(myHand.landmark):
                    # id will give id of landmark in exact index number
                    # height width and channel
                    h, w, c = img.shape

                    # find the position - center
                    cx, cy = int(lm.x*w), int(lm.y*h)
                    #print(id,cx,cy)
                    lmlist.append([id, cx, cy])

                # Draw circle for 0th landmark
                if draw:
                    cv2.circle(img, (cx,cy), 15, (255,0,255), cv2.FILLED)

            return lmlist
        except Exception as e:
            x = str(e)
            print('Error: ', x)

            lmlist = []
            return lmlist

Key snippets from the above script –

def findHands(self, img, draw=True):
    try:
        # Send rgb image to hands
        imgRGB = cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
        self.results = self.hands.process(imgRGB)

        # process the frame
        if self.results.multi_hand_landmarks:
            for handLms in self.results.multi_hand_landmarks:

                if draw:
                    #Draw dots and connect them
                    self.mpDraw.draw_landmarks(img,handLms,self.mpHands.HAND_CONNECTIONS)

        return img
    except Exception as e:
        x = str(e)
        print('Error: ', x)

        return img

The above function will identify the individual key points & mark them as dots on top of the human hand.

def findPosition(self, img, handNo=0, draw=True):
      try:
          lmlist = []

          # check wether any landmark was detected
          if self.results.multi_hand_landmarks:
              #Which hand are we talking about
              myHand = self.results.multi_hand_landmarks[handNo]
              # Get id number and landmark information
              for id, lm in enumerate(myHand.landmark):
                  # id will give id of landmark in exact index number
                  # height width and channel
                  h,w,c = img.shape
                  #find the position - center
                  cx,cy = int(lm.x*w), int(lm.y*h) 
                  lmlist.append([id,cx,cy])

              # Draw circle for 0th landmark
              if draw:
                  cv2.circle(img,(cx,cy), 15 , (255,0,255), cv2.FILLED)

          return lmlist
      except Exception as e:
          x = str(e)
          print('Error: ', x)

          lmlist = []
          return lmlist

The above function captures the position of each MediaPipe landmark, along with its x & y coordinates, & stores them in a list, which is later parsed for the main use case.

4. viewHandMotion.py (Main calling script.)


##################################################
#### Written By: SATYAKI DE ####
#### Written On: 23-May-2022 ####
#### Modified On 23-May-2022 ####
#### ####
#### Objective: This is the main calling ####
#### python script that will invoke the ####
#### clsVideoZoom class to initiate ####
#### the model to read the real-time ####
#### hand movements gesture that enables ####
#### video zoom control. ####
##################################################
import time
import clsVideoZoom as vz
from clsConfig import clsConfig as cf
import datetime
import logging
###############################################
### Global Section ###
###############################################
# Instantiating the base class
x1 = vz.clsVideoZoom()
###############################################
### End of Global Section ###
###############################################
def main():
    try:
        # Other useful variables
        debugInd = 'Y'
        var = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        var1 = datetime.datetime.now()

        print('Start Time: ', str(var))
        # End of useful variables

        # Initiating Log Class
        general_log_path = str(cf.conf['LOG_PATH'])

        # Enabling Logging Info
        logging.basicConfig(filename=general_log_path + 'visualZoom.log', level=logging.INFO)

        print('Started Visual-Zoom Emotions!')

        r1 = x1.runSensor()

        if (r1 == 0):
            print('Successfully identified visual zoom!')
        else:
            print('Failed to identify the visual zoom!')

        var2 = datetime.datetime.now()

        c = var2 - var1
        minutes = c.total_seconds() / 60
        print('Total difference in minutes: ', str(minutes))

        print('End Time: ', str(var1))
    except Exception as e:
        x = str(e)
        print('Error: ', x)

if __name__ == "__main__":
    main()

The above lines are self-explanatory. So, I’m not going to discuss anything on this script.


FOLDER STRUCTURE:

Here is the folder structure that contains all the files & directories in macOS –

Directory

So, we’ve done it.


You will get the complete codebase from the following GitHub link.

I'll bring some more exciting topics from the Python-verse in the coming days. Please share & subscribe to my post & let me know your feedback.

Till then, Happy Avenging! 🙂

Note: All the data & scenarios posted here are representational, available over the internet, & meant for educational purposes only. Some of the images (except my photo) that we've used are available over the net. We don't claim ownership of these images. There is always room for improvement, especially in the prediction quality.

Oracle procedure using Java

Today, I'm going to discuss another powerful feature of Oracle: embedding your Java code inside Oracle procedures. This gives a lot of flexibility & power to Oracle, and you can certainly do plenty of things that are otherwise very difficult to implement directly.

For this purpose, I cannot do better than the explanation made by Bulusu Lakshman, which is –

From Oracle 9i onwards, a new environment is in place where Java and PL/SQL can interact as the two major database languages. There are many advantages to using both languages –

PL/SQL Advantage:

  • Intensive database access – it is faster than Java.
  • Oracle-specific functionality that has no equivalent in Java, such as dbms_lock & dbms_alert.
  • Using the same data types & language constructs as SQL, providing seamless access to the database.

JAVA Advantage:

  • Automatic garbage collection, polymorphism, inheritance, multi-threading.
  • Access to system resources outside of the database, such as OS commands, files, sockets.
  • Functionality not available in PL/SQL, such as OS commands, fine-grained security policies, image generation, easy sending of e-mails with attachments using JavaMail.

But I disagree with him in the case of fine-grained security policies, as Oracle has drastically improved in this area & introduced features like VPD (Virtual Private Database) & Database Vault. Anyway, we'll discuss those topics some other day.

For a better understanding, I'll follow a few categories, & we will explore them one by one. I hope you get some basic idea of this powerful feature of Oracle.

Before we proceed, we have to know the basics of the main ingredient, called dbms_java.

We have to prepare the environment first.

In Sys,

sys@ORCL>select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
PL/SQL Release 11.1.0.6.0 - Production
CORE 11.1.0.6.0 Production
TNS for 32-bit Windows: Version 11.1.0.6.0 - Production
NLSRTL Version 11.1.0.6.0 - Production

Elapsed: 00:00:00.00
sys@ORCL>
sys@ORCL>
sys@ORCL>exec dbms_java.grant_permission('SCOTT','SYS:java.lang.RuntimePermission','writeFileDescriptor','');

PL/SQL procedure successfully completed.

Elapsed: 00:00:53.54
sys@ORCL>
sys@ORCL>exec dbms_java.grant_permission('SCOTT','SYS:java.lang.RuntimePermission','readFileDescriptor','');

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.08
sys@ORCL>
sys@ORCL>exec dbms_java.grant_permission('SCOTT','SYS:java.io.FilePermission','D:\Java_Output\*.*','read,write,execute,delete');

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.08
sys@ORCL>

Let’s concentrate on our test cases.

In Scott,

Type: 1

scott@ORCL>select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
PL/SQL Release 11.1.0.6.0 - Production
CORE 11.1.0.6.0 Production
TNS for 32-bit Windows: Version 11.1.0.6.0 - Production
NLSRTL Version 11.1.0.6.0 - Production

Elapsed: 00:00:02.77
scott@ORCL>
scott@ORCL>
scott@ORCL>create or replace and compile java source named "Print_Hello"
2 as
3 import java.io.*;
4 public class Print_Hello
5 {
6 public static void dislay()
7 {
8 System.out.println("Hello World...... In Java Through Oracle....... ");
9 }
10 };
11 /

Java created.

Elapsed: 00:00:44.17
scott@ORCL>
scott@ORCL>
scott@ORCL>create or replace procedure java_print
2 as
3 language java name 'Print_Hello.dislay()';
4 /

Procedure created.

Elapsed: 00:00:01.39
scott@ORCL>
scott@ORCL>call dbms_java.set_output(1000000);

Call completed.

Elapsed: 00:00:00.34
scott@ORCL>
scott@ORCL>set serveroutput on size 1000000;
scott@ORCL>
scott@ORCL>exec java_print;
Hello World...... In Java Through Oracle.......

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.22
scott@ORCL>

Type: 2 (Returning Value from JAVA)

scott@ORCL>
scott@ORCL>select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
PL/SQL Release 11.1.0.6.0 - Production
CORE 11.1.0.6.0 Production
TNS for 32-bit Windows: Version 11.1.0.6.0 - Production
NLSRTL Version 11.1.0.6.0 - Production

Elapsed: 00:00:00.13
scott@ORCL>
scott@ORCL>
scott@ORCL>create or replace and resolve java source named "ReturnVal"
2 as
3 import java.io.*;
4
5 public class ReturnVal extends Object
6 {
7 public static String Display()
8 throws IOException
9 {
10 return "Hello World";
11 }
12 };
13 /

Java created.

Elapsed: 00:00:00.22
scott@ORCL>
scott@ORCL>
scott@ORCL>create or replace function ReturnVal
2 return varchar2
3 is
4 language java
5 name 'ReturnVal.Display() return String';
6 /

Function created.

Elapsed: 00:00:00.00
scott@ORCL>
scott@ORCL>
scott@ORCL>call dbms_java.set_output(1000000);

Call completed.

Elapsed: 00:00:00.00
scott@ORCL>
scott@ORCL>
scott@ORCL>column ReturnVal format a15
scott@ORCL>
scott@ORCL>
scott@ORCL>
scott@ORCL>
scott@ORCL>select ReturnVal from dual;

RETURNVAL
---------------
Hello World

Elapsed: 00:00:00.12
scott@ORCL>
scott@ORCL>

So, you can return the value from the compiled Java source, too.

Type: 3 (Reading console value into JAVA)

scott@ORCL>ed
Wrote file C:\OracleSpoolBuf\BUF.SQL

1 create or replace java source named "ConsoleRead"
2 as
3 import java.io.*;
4 class ConsoleRead
5 {
6 public static void RDisplay(String Det)
7 {
8 String dd = Det;
9 System.out.println("Value Passed In Java Is: " + dd);
10 System.out.println("Exiting from the Java .....");
11 }
12* };
13 /

Java created.

scott@ORCL>
scott@ORCL>
scott@ORCL>create or replace procedure java_UserInput(InputStr in varchar2)
2 as
3 language java
4 name 'ConsoleRead.RDisplay(java.lang.String)';
5 /

Procedure created.

scott@ORCL>

scott@ORCL>
scott@ORCL>call dbms_java.set_output(100000);

Call completed.

scott@ORCL>
scott@ORCL>
scott@ORCL>set serveroutput on size 100000
scott@ORCL>
scott@ORCL>exec java_UserInput('Satyaki');
Value Passed In Java Is: Satyaki
Exiting from the Java .....

PL/SQL procedure successfully completed.

scott@ORCL>

Type: 4 (Reading a file from an OS directory using JAVA)

scott@ORCL>ed
Wrote file C:\OracleSpoolBuf\BUF.SQL

1 create or replace java source named "ReadTextFile"
2 as
3 import java.io.*;
4 class ReadTextFile
5 {
6 public static void Process(String FileName) throws IOException
7 {
8 int i;
9 FileInputStream fin;
10 try
11 {
12 fin = new FileInputStream(FileName);
13 }
14 catch(FileNotFoundException e)
15 {
16 System.out.println("File Not Found....");
17 return;
18 }
19 catch(ArrayIndexOutOfBoundsException e)
20 {
21 System.out.println("Usage: showFile File");
22 return;
23 }
24 do
25 {
26 i = fin.read();
27 if(i != -1)
28 System.out.println((char) i);
29 }while(i != -1);
30 fin.close();
31 }
32* };
33 /

Java created.

scott@ORCL>
scott@ORCL>create or replace procedure Java_ReadTextFile(FileNameWithPath in varchar2)
2 as
3 language java
4 name 'ReadTextFile.Process(java.lang.String)';
5 /

Procedure created.

scott@ORCL>
scott@ORCL>
scott@ORCL>call dbms_java.set_output(100000);

Call completed.

scott@ORCL>
scott@ORCL>
scott@ORCL>exec Java_ReadTextFile('D:\Java_Output\Trial.txt');

Type: 5 (Writing a file to an OS directory using JAVA)


In Scott,

scott@ORCL>
scott@ORCL>create or replace java source named "DynWriteTextFile"
2 as
3 import java.io.*;
4 class DynWriteTextFile
5 {
6 public static void proc(String ctent,String FlNameWithPath) throws IOException
7 {
8 int i,j;
9 String FileNm = FlNameWithPath;
10 RandomAccessFile rFile;
11
12 try
13 {
14 rFile = new RandomAccessFile(FileNm,"rw");
15 }
16 catch(FileNotFoundException e)
17 {
18 System.out.println("Error Writing Output File....");
19 return;
20 }
21
22 try
23 {
24 int ch;
25
26 System.out.println("Processing starts...");
27
28 ch = ctent.length();
29
30 rFile.seek(rFile.length());
31 for(int k=0; k<ch; k=k+ctent.length())
32 {
33 rFile.writeBytes(ctent);
34 }
35 }
36 catch(IOException e)
37 {
38 System.out.println("File Error....");
39 }
40 finally
41 {
42 try
43 {
44 System.out.println("Successfully file generated....");
45 rFile.close();
46 }
47 catch(IOException oe)
48 {
49 System.out.println("Exception in the catch block of finally is: " +oe);
50 System.exit(0);
51 }
52 }
53 }
54 };
55 /

Java created.

Elapsed: 00:00:00.17
scott@ORCL>
scott@ORCL>
scott@ORCL>create or replace procedure JavaDyn_WriteTextFile(para in varchar2,FileNameWithPath in varchar2)
2 as
3 language JAVA
4 name 'DynWriteTextFile.proc(java.lang.String, java.lang.String)';
5 /

Procedure created.

Elapsed: 00:00:00.15
scott@ORCL>

In Sys,

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
PL/SQL Release 11.1.0.6.0 - Production
CORE 11.1.0.6.0 Production
TNS for 32-bit Windows: Version 11.1.0.6.0 - Production
NLSRTL Version 11.1.0.6.0 - Production

sys@ORCL>set timi on
sys@ORCL>
sys@ORCL>
sys@ORCL>create or replace public synonym dbms_dwrite_file for scott.JavaDyn_WriteTextFile;

Synonym created.

Elapsed: 00:00:00.08
sys@ORCL>
sys@ORCL>grant execute on dbms_dwrite_file to scott;

Grant succeeded.

Elapsed: 00:00:00.18
sys@ORCL>
 
In Scott,  

scott@ORCL>
scott@ORCL>create or replace procedure DWrite_Content(dt in date,FileNmWithPath in varchar2)
2 is
3 cursor c1
4 is
5 select empno,ename,sal
6 from emp
7 where hiredate = dt;
8 r1 c1%rowtype;
9
10 str varchar2(500);
11 begin
12 str:= replace(FileNmWithPath,'\','\\');
13 dbms_dwrite_file('Employee No'||' '||'First Name'||' '||'Salary',str);
14 dbms_dwrite_file(chr(10),str);
15 dbms_dwrite_file('---------------------------------------------------',str);
16 dbms_dwrite_file(chr(10),str);
17 for r1 in c1
18 loop
19 dbms_dwrite_file(r1.empno||' '||r1.ename||' '||r1.sal,str);
20 dbms_dwrite_file(chr(10),str);
21 end loop;
22 exception
23 when others then
24 dbms_output.put_line(sqlerrm);
25 end;
26 /

Procedure created.

Elapsed: 00:00:00.43
scott@ORCL>
scott@ORCL>
scott@ORCL>call dbms_java.set_output(100000);

Call completed.

Elapsed: 00:00:00.02
scott@ORCL>
scott@ORCL>exec DWrite_Content(to_date('21-JUN-1999','DD-MON-YYYY'),'D:\Java_Output\satyaki.txt');
Processing starts...
Successfully file generated....

PL/SQL procedure successfully completed.
 
Hope this thread gives you some basic idea about using your Java code with Oracle PL/SQL.
I'll discuss another topic very soon. Till then – keep following. 😉
Regards.