computervision Archives

Realtime reading from a Streaming using Computer Vision

Posted on July 26, 2022 by SatyakiDe in api, Azure, call, cloud, code, Computer-Vision, computing, Crossplatform, Data Science, exposure, extends, function, gui, IoT, json, Keras, machine-learning, matplotlib, mobile, Model, numpy, objects, Open-CV, Pandas, pytesseract, Python, Real-time, snippet, video

This week, we’re going to extend one of our earlier posts & read an entire text from a video stream using computer vision. If you want to view the previous post, please click the following link.

But, before we proceed, why don’t we view the demo first?

Demo

Architecture:

Let us understand the architecture flow –

The above diagram shows that the application, which uses Open-CV, analyzes individual frames from the source, extracts the complete text within the video, displays it on top of the target screen & prints the same in the console.

Python Packages:

pip install imutils==0.5.4
pip install matplotlib==3.5.2
pip install numpy==1.21.6
pip install opencv-contrib-python==4.6.0.66
pip install opencv-contrib-python-headless==4.6.0.66
pip install opencv-python==4.6.0.66
pip install opencv-python-headless==4.6.0.66
pip install pandas==1.3.5
pip install Pillow==9.1.1
pip install pytesseract==0.3.9
pip install python-dateutil==2.8.2

CODE:

Let us now understand the code. For this use case, we will only discuss three python scripts. The application needs more than these three; however, we have already discussed the others in some of the earlier posts. Hence, we will skip them here.

clsReadingTextFromStream.py (This is the main class of python script that will extract the text from the WebCAM streaming in real-time.)

	##################################################
	#### Written By: SATYAKI DE ####
	#### Written On: 22-Jul-2022 ####
	#### Modified On 25-Jul-2022 ####
	#### ####
	#### Objective: This is the main class of ####
	#### python script that will invoke the ####
	#### extraction of texts from a WebCAM. ####
	#### ####
	##################################################

	# Importing necessary packages
	from clsConfig import clsConfig as cf

	from imutils.object_detection import non_max_suppression
	import numpy as np
	import pytesseract
	import imutils
	import time
	import cv2

	###############################################
	### Global Section ###
	###############################################

	# Two output layer names for the text detector model

	lNames = cf.conf['LAYER_DET']

	# Tesseract OCR text param values

	strVal = "-l " + str(cf.conf['LANG']) + " --oem " + str(cf.conf['OEM_VAL']) + " --psm " + str(cf.conf['PSM_VAL']) + ""
	config = (strVal)

	###############################################
	### End of Global Section ###
	###############################################

	class clsReadingTextFromStream:
	    def __init__(self):
	        self.sep = str(cf.conf['SEP'])
	        self.Curr_Path = str(cf.conf['INIT_PATH'])
	        self.CacheL = int(cf.conf['CACHE_LIM'])
	        self.modelPath = str(cf.conf['MODEL_PATH']) + str(cf.conf['MODEL_FILE_NAME'])
	        self.minConf = float(cf.conf['MIN_CONFIDENCE'])
	        self.wt = int(cf.conf['WIDTH'])
	        self.ht = int(cf.conf['HEIGHT'])
	        self.pad = float(cf.conf['PADDING'])
	        self.title = str(cf.conf['TITLE'])
	        self.Otitle = str(cf.conf['ORIG_TITLE'])
	        self.drawTag = cf.conf['DRAW_TAG']
	        self.aRange = int(cf.conf['ASCII_RANGE'])
	        self.sParam = cf.conf['SUBTRACT_PARAM']

	    def findBoundBox(self, boxes, res, rW, rH, orig, origW, origH, pad):
	        try:
	            # Loop over the bounding boxes
	            for (spX, spY, epX, epY) in boxes:
	                # Scale the bounding box coordinates based on the respective
	                # ratios
	                spX = int(spX * rW)
	                spY = int(spY * rH)
	                epX = int(epX * rW)
	                epY = int(epY * rH)

	                # To obtain a better OCR of the text we can potentially
	                # apply a bit of padding surrounding the bounding box.
	                # And, computing the deltas in both the x and y directions
	                dX = int((epX - spX) * pad)
	                dY = int((epY - spY) * pad)

	                # Apply padding to each side of the bounding box, respectively
	                spX = max(0, spX - dX)
	                spY = max(0, spY - dY)
	                epX = min(origW, epX + (dX * 2))
	                epY = min(origH, epY + (dY * 2))

	                # Extract the actual padded ROI
	                roi = orig[spY:epY, spX:epX]

	                # Choose the proper OCR Config
	                text = pytesseract.image_to_string(roi, config=config)

	                # Add the bounding box coordinates and OCR'd text to the list
	                # of results
	                res.append(((spX, spY, epX, epY), text))

	            # Sort the results bounding box coordinates from top to bottom
	            res = sorted(res, key=lambda r:r[0][1])

	            return res
	        except Exception as e:
	            x = str(e)
	            print(x)

	            return res

	    def predictText(self, imgScore, imgGeo):
	        try:
	            minConf = self.minConf

	            # Initializing the bounding box rectangles & confidence score by
	            # extracting the rows & columns from the imgScore volume.
	            (numRows, numCols) = imgScore.shape[2:4]
	            rects = []
	            confScore = []

	            for y in range(0, numRows):
	                # Extract the imgScore probabilities to derive potential
	                # bounding box coordinates that surround text
	                imgScoreData = imgScore[0, 0, y]
	                xVal0 = imgGeo[0, 0, y]
	                xVal1 = imgGeo[0, 1, y]
	                xVal2 = imgGeo[0, 2, y]
	                xVal3 = imgGeo[0, 3, y]
	                anglesData = imgGeo[0, 4, y]

	                for x in range(0, numCols):
	                    # If our score does not have sufficient probability,
	                    # ignore it
	                    if imgScoreData[x] < minConf:
	                        continue

	                    # Compute the offset factor as our resulting feature
	                    # maps will be 4x smaller than the input frame
	                    (offX, offY) = (x * 4.0, y * 4.0)

	                    # Extract the rotation angle for the prediction and
	                    # then compute the sin and cosine
	                    angle = anglesData[x]
	                    cos = np.cos(angle)
	                    sin = np.sin(angle)

	                    # Derive the width and height of the bounding box from
	                    # imgGeo
	                    h = xVal0[x] + xVal2[x]
	                    w = xVal1[x] + xVal3[x]

	                    # Compute both the starting and ending (x, y)-coordinates
	                    # for the text prediction bounding box
	                    epX = int(offX + (cos * xVal1[x]) + (sin * xVal2[x]))
	                    epY = int(offY - (sin * xVal1[x]) + (cos * xVal2[x]))
	                    spX = int(epX - w)
	                    spY = int(epY - h)

	                    # Adding bounding box coordinates and probability score
	                    # to the respective lists
	                    rects.append((spX, spY, epX, epY))
	                    confScore.append(imgScoreData[x])

	            # return a tuple of the bounding boxes and associated confScore
	            return (rects, confScore)

	        except Exception as e:
	            x = str(e)
	            print(x)

	            rects = []
	            confScore = []

	            return (rects, confScore)

	    def processStream(self, debugInd, var):
	        try:
	            sep = self.sep
	            Curr_Path = self.Curr_Path
	            CacheL = self.CacheL
	            modelPath = self.modelPath
	            minConf = self.minConf
	            wt = self.wt
	            ht = self.ht
	            pad = self.pad
	            title = self.title
	            Otitle = self.Otitle
	            drawTag = self.drawTag
	            aRange = self.aRange
	            sParam = self.sParam

	            val = 0

	            # Initialize the video stream and allow the camera sensor to warm up
	            print("[INFO] Starting video stream...")
	            cap = cv2.VideoCapture(0)

	            # Loading the pre-trained text detector
	            print("[INFO] Loading Text Detector...")
	            net = cv2.dnn.readNet(modelPath)

	            # Loop over the frames from the video stream
	            while True:
	                try:
	                    # Grab the frame from our video stream and resize it
	                    success, frame = cap.read()

	                    orig = frame.copy()
	                    (origH, origW) = frame.shape[:2]

	                    # Setting new width and height and then determine the ratio in change
	                    # for both the width and height
	                    (newW, newH) = (wt, ht)
	                    rW = origW / float(newW)
	                    rH = origH / float(newH)

	                    # Resize the frame and grab the new frame dimensions
	                    frame = cv2.resize(frame, (newW, newH))
	                    (H, W) = frame.shape[:2]

	                    # Construct a blob from the frame and then perform a forward pass of
	                    # the model to obtain the two output layer sets
	                    blob = cv2.dnn.blobFromImage(frame, 1.0, (W, H), sParam, swapRB=True, crop=False)
	                    net.setInput(blob)
	                    (confScore, imgGeo) = net.forward(lNames)

	                    # Decode the predictions, then apply non-maxima suppression to
	                    # suppress weak, overlapping bounding boxes
	                    (rects, confidences) = self.predictText(confScore, imgGeo)
	                    boxes = non_max_suppression(np.array(rects), probs=confidences)

	                    # Initialize the list of results
	                    res = []

	                    # Getting BoundingBox boundaries
	                    res = self.findBoundBox(boxes, res, rW, rH, orig, origW, origH, pad)

	                    for ((spX, spY, epX, epY), text) in res:
	                        # Display the text OCR by using Tesseract APIs
	                        print("Reading Text::")
	                        print("=" *60)
	                        print(text)
	                        print("=" *60)

	                        # Removing the non-ASCII text so it can draw the text on the frame
	                        # using OpenCV, then draw the text and a bounding box surrounding
	                        # the text region of the input frame
	                        text = "".join([c if ord(c) < aRange else "" for c in text]).strip()
	                        output = orig.copy()

	                        cv2.rectangle(output, (spX, spY), (epX, epY), drawTag, 2)
	                        cv2.putText(output, text, (spX, spY - 20), cv2.FONT_HERSHEY_SIMPLEX, 1.2, drawTag, 3)

	                        # Show the output frame
	                        cv2.imshow(title, output)
	                        #cv2.imshow(Otitle, frame)

	                    # If the `q` key was pressed, break from the loop
	                    if cv2.waitKey(1) == ord('q'):
	                        break

	                    val = 0

	                except Exception as e:
	                    x = str(e)
	                    print(x)

	                    val = 1

	            # Performing cleanup at the end
	            cap.release()
	            cv2.destroyAllWindows()

	            return val
	        except Exception as e:
	            x = str(e)
	            print('Error:', x)

	            return 1

Please find the key snippet from the above script –

# Two output layer names for the text detector model

lNames = cf.conf['LAYER_DET']

# Tesseract OCR text param values

strVal = "-l " + str(cf.conf['LANG']) + " --oem " + str(cf.conf['OEM_VAL']) + " --psm " + str(cf.conf['PSM_VAL']) + ""
config = (strVal)

The first line contains the names of the two output layers of the text detector model. The first layer gives the probability that a region contains text & the second one is used to derive the bounding box coordinates of the predicted text.

The second line contains various options for the Tesseract APIs. You need to understand these options in detail to make them work. These are the essential options for our use case –

Language – The intended language, for example, English, Spanish, Hindi, Bengali, etc.
OEM value – The OCR engine mode. In this case, the application selects the LSTM neural net model for OCR, which Tesseract documents as engine mode 1 (the valid modes run from 0 to 3).
PSM value – The page segmentation mode. In this case, the selected value is 7, indicating that the application treats the ROI as a single line of text.

For more details, please refer to the config file.
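As a minimal sketch of how the option string above is assembled, the helper below builds the same kind of value that gets passed to pytesseract through `config=`. The function name `build_tesseract_config` and the sample values are assumptions for illustration; in the post, the real values come from `cf.conf`.

```python
def build_tesseract_config(lang: str, oem: int, psm: int) -> str:
    """Assemble a Tesseract option string (hypothetical helper)."""
    # The language uses the short "-l" flag, while the engine mode and the
    # page segmentation mode use double-hyphen long options.
    return f"-l {lang} --oem {oem} --psm {psm}"


# Sample values only (assumed): English, LSTM engine, single text line.
config = build_tesseract_config("eng", 1, 7)
print(config)
```

Keeping the three values separate until the last moment makes it easier to try a different page segmentation mode when the text in the stream spans more than one line.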

print("[INFO] Loading Text Detector...")
net = cv2.dnn.readNet(modelPath)

The above lines load the pre-trained text detector model into memory for evaluation.

# Setting new width and height and then determine the ratio in change
# for both the width and height
(newW, newH) = (wt, ht)
rW = origW / float(newW)
rH = origH / float(newH)

# Resize the frame and grab the new frame dimensions
frame = cv2.resize(frame, (newW, newH))
(H, W) = frame.shape[:2]

# Construct a blob from the frame and then perform a forward pass of
# the model to obtain the two output layer sets
blob = cv2.dnn.blobFromImage(frame, 1.0, (W, H), sParam, swapRB=True, crop=False)
net.setInput(blob)
(confScore, imgGeo) = net.forward(lNames)

# Decode the predictions, then apply non-maxima suppression to
# suppress weak, overlapping bounding boxes
(rects, confidences) = self.predictText(confScore, imgGeo)
boxes = non_max_suppression(np.array(rects), probs=confidences)

The above lines prepare each frame for detection. The application resizes the frame to the configured width & height, followed by a forward pass of the model to obtain the two output layer sets. It then decodes the predictions & applies non-maxima suppression to remove the weak, overlapping bounding boxes. In short, this will identify the potential text regions & put bounding boxes around them.
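To get a feel for what non-maxima suppression does, here is a simplified, pure-Python sketch. It is a greedy intersection-over-union variant written for illustration only; it is not the `imutils` implementation used in the script, and the names `iou`, `simple_nms` and the threshold of 0.3 are assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def simple_nms(boxes, scores, iou_thresh=0.3):
    """Keep the best-scoring boxes and drop those that overlap a kept box."""
    # Visit the boxes from the highest score to the lowest
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in kept):
            kept.append(i)
    return [boxes[i] for i in kept]


# Two near-duplicate detections of one word plus a separate word
boxes = [(10, 10, 110, 60), (12, 12, 112, 62), (200, 200, 300, 250)]
scores = [0.90, 0.75, 0.80]
print(simple_nms(boxes, scores))
```

In this example, the second box overlaps the first one heavily & has a lower score, so only the first & third boxes survive.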

# Initialize the list of results
res = []

# Getting BoundingBox boundaries
res = self.findBoundBox(boxes, res, rW, rH, orig, origW, origH, pad)

The above function will create the bounding boxes surrounding the predicted text regions. Also, it will capture the recognized text inside the result variable.

for (spX, spY, epX, epY) in boxes:
  # Scale the bounding box coordinates based on the respective
  # ratios
  spX = int(spX * rW)
  spY = int(spY * rH)
  epX = int(epX * rW)
  epY = int(epY * rH)

  # To obtain a better OCR of the text we can potentially
  # apply a bit of padding surrounding the bounding box.
  # And, computing the deltas in both the x and y directions
  dX = int((epX - spX) * pad)
  dY = int((epY - spY) * pad)

  # Apply padding to each side of the bounding box, respectively
  spX = max(0, spX - dX)
  spY = max(0, spY - dY)
  epX = min(origW, epX + (dX * 2))
  epY = min(origH, epY + (dY * 2))

  # Extract the actual padded ROI
  roi = orig[spY:epY, spX:epX]

Now, the application will scale the bounding boxes back to the original frame size, based on the previously computed ratios, for the actual text recognition. In this process, the application also pads the bounding boxes & then extracts the padded region of interest.
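The scaling & padding arithmetic can be tried without a camera. The sketch below mirrors the steps of the snippet above in a standalone function; the name `scale_and_pad_box` and the sample numbers are assumptions for illustration.

```python
def scale_and_pad_box(box, rW, rH, pad, origW, origH):
    """Scale a box to the original frame size, then pad and clamp it."""
    spX, spY, epX, epY = box

    # Scale the coordinates by the width & height ratios
    spX, spY = int(spX * rW), int(spY * rH)
    epX, epY = int(epX * rW), int(epY * rH)

    # Padding deltas are a fraction of the box width & height
    dX = int((epX - spX) * pad)
    dY = int((epY - spY) * pad)

    # Pad the box, keeping it inside the original frame
    spX = max(0, spX - dX)
    spY = max(0, spY - dY)
    epX = min(origW, epX + (dX * 2))
    epY = min(origH, epY + (dY * 2))
    return spX, spY, epX, epY


# A box found on the resized frame, mapped back to a 640x480 frame
print(scale_and_pad_box((10, 20, 50, 40), 2.0, 1.5, 0.25, 640, 480))
# -> (0, 23, 140, 74)
```

The `max` & `min` calls matter near the frame edges: without them, a padded box could produce negative or out-of-range slice indices when the ROI is cut from the frame.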

# Choose the proper OCR Config
text = pytesseract.image_to_string(roi, config=config)

# Add the bounding box coordinates and OCR'd text to the list
# of results
res.append(((spX, spY, epX, epY), text))

Using the OCR options, the application extracts the text within each region of the video frame & adds that to the res list.

# Sort the results bounding box coordinates from top to bottom
res = sorted(res, key=lambda r:r[0][1])

It then returns the results, sorted from top to bottom, to the calling function.
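The sort key `r[0][1]` picks the top y-coordinate of each bounding box, so the text comes back in reading order from top to bottom. A small sketch with made-up results (the coordinates & strings are assumptions for illustration):

```python
# Each entry is ((spX, spY, epX, epY), text), as built by the res list
results = [
    ((40, 300, 200, 340), "third line"),
    ((40, 20, 200, 60), "first line"),
    ((40, 150, 200, 190), "second line"),
]

# r[0] is the bounding box & r[0][1] is its top edge (spY)
ordered = sorted(results, key=lambda r: r[0][1])
texts = [text for _, text in ordered]
print(texts)
# -> ['first line', 'second line', 'third line']
```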

for ((spX, spY, epX, epY), text) in res:
  # Display the text OCR by using Tesseract APIs
  print("Reading Text::")
  print("=" *60)
  print(text)
  print("=" *60)

  # Removing the non-ASCII text so it can draw the text on the frame
  # using OpenCV, then draw the text and a bounding box surrounding
  # the text region of the input frame
  text = "".join([c if ord(c) < aRange else "" for c in text]).strip()
  output = orig.copy()

  cv2.rectangle(output, (spX, spY), (epX, epY), drawTag, 2)
  cv2.putText(output, text, (spX, spY - 20), cv2.FONT_HERSHEY_SIMPLEX, 1.2, drawTag, 3)

  # Show the output frame
  cv2.imshow(title, output)

Finally, it fetches each potential text region along with its text & then draws both on top of the source video. Also, it removes the non-ASCII characters during this time, since OpenCV cannot draw them on the frame.
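The character filter can be checked on its own. The sketch below applies the same expression as the script; the function name `strip_non_ascii` & the default limit of 128 are assumptions, since the real limit comes from `ASCII_RANGE` in the config file.

```python
def strip_non_ascii(text, ascii_range=128):
    """Drop every character whose code point is at or above ascii_range."""
    # Same filter as in the script, followed by trimming the whitespace
    return "".join([c if ord(c) < ascii_range else "" for c in text]).strip()


print(repr(strip_non_ascii("Héllo ✓ World\n")))
# -> 'Hllo  World'
```

Note that the filter removes the characters rather than replacing them, so accented letters simply disappear from the overlay text.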

readingVideo.py (Main calling script.)

	#####################################################
	#### Written By: SATYAKI DE ####
	#### Written On: 22-Jul-2022 ####
	#### Modified On 25-Jul-2022 ####
	#### ####
	#### Objective: This is the main calling ####
	#### python script that will invoke the ####
	#### clsReadingTextFromStream class to initiate ####
	#### the reading capability in real-time ####
	#### & display text via Web-CAM. ####
	#####################################################

	# We keep the setup code in a different class as shown below.
	import clsReadingTextFromStream as rtfs

	from clsConfig import clsConfig as cf

	import datetime
	import logging

	###############################################
	### Global Section ###
	###############################################
	# Instantiating all the main class

	x1 = rtfs.clsReadingTextFromStream()

	###############################################
	### End of Global Section ###
	###############################################

	def main():
	    try:
	        # Other useful variables
	        debugInd = 'Y'
	        var = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
	        var1 = datetime.datetime.now()

	        print('Start Time: ', str(var))
	        # End of useful variables

	        # Initiating Log Class
	        general_log_path = str(cf.conf['LOG_PATH'])

	        # Enabling Logging Info
	        logging.basicConfig(filename=general_log_path + 'readingTextFromVideo.log', level=logging.INFO)

	        print('Started reading text from videos!')

	        # Execute all the pass
	        r1 = x1.processStream(debugInd, var)

	        if (r1 == 0):
	            print('Successfully read text from the Live Stream!')
	        else:
	            print('Failed to read text from the Live Stream!')

	        var2 = datetime.datetime.now()

	        c = var2 - var1
	        minutes = c.total_seconds() / 60
	        print('Total difference in minutes: ', str(minutes))

	        # Print the end time (var2), not the start time
	        print('End Time: ', str(var2))

	    except Exception as e:
	        x = str(e)
	        print('Error: ', x)

	if __name__ == "__main__":
	    main()

Please find the key snippet –

# Instantiating all the main class

x1 = rtfs.clsReadingTextFromStream()

# Execute all the pass
r1 = x1.processStream(debugInd, var)

if (r1 == 0):
    print('Successfully read text from the Live Stream!')
else:
    print('Failed to read text from the Live Stream!')

The above lines instantiate the main class & then invoke the function that extracts the text from the live streaming video. A return value of 0 indicates success.

FOLDER STRUCTURE:

Here is the folder structure that contains all the files & directories in macOS –

You will get the complete codebase in the following Github link.

Unfortunately, I cannot upload the model due to its size. I will share it on a need basis.

I’ll bring some more exciting topics in the coming days from the Python verse. Please share & subscribe to my post & let me know your feedback.

Till then, Happy Avenging! 🙂

Note: All the data & scenarios posted here are representational, available over the internet & for educational purposes only. Some of the images (except my photo) that we’ve used are available over the net. We don’t claim the ownership of these images. There is always room for improvement, especially in the prediction quality.

Real-time stacked-up coin counts with the help of Computer Vision using Python-based OpenCV.

Posted on March 21, 2022 by SatyakiDe in api, Azure, BOT, cloud, code, Computer-Vision, computing, Crossplatform, function, gui, IoT, json, machine-learning, mobile, Model, numpy, Open-CV, order by, Pandas, Python, Real-time, select, snippet, Technology, video

Hi Guys,

Today, I’ll be presenting another exciting installment of Computer Vision. Our focus will be to get a sense of visual counting. Let me explain. This post will demonstrate how to count the number of stacked-up coins using computer vision. And, we’re going to add more coins to see the number change.

Why don’t we see the demo first before jumping into the technical details?

Isn’t it exciting?

Architecture:

Let us understand the architecture –

From the above diagram, one can notice that the raw video feed is captured from a specific location at a measured distance. The python-based intelligent application will read the numbers & project them on top of the video feed for human validation.

Let me share one more perspective of how you can configure this experiment with another diagram that I prepared for this post.

From the above picture, one can see that a specific distance exists between the camera & the stacked coins as that will influence the single coin width.

You can see how that changed with the following pictures –

This entire test will depend upon many factors to consider to get effective results. I provided the basic demo. However, to make it robust & dynamic, one can dynamically diagnose the distance & individual coin width before starting this project. I felt that part should be machine learning to correctly predict the particular coin width depending upon the length & number of coins stacked. I leave it to you to explore that part.

Then how does the Aruco marker come into the picture?

Let’s read it from the primary source side –

Please refer to the following link if you want to know more.

For our use case, we’ll be using the following aruco marker –

How will this help us? We know the width & height of the marker. Depending upon its placement & overall pixel area size, our application can identify the pixel-to-centimeter ratio, which will enable us to predict any other object’s height & width. Once we have that, the application will divide the height of the stack by the calculated width we observed for each coin from this distance. Then the application will be able to predict the actual count in real-time.
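The arithmetic behind this idea is short enough to sketch in plain Python. The function name `estimate_coin_count` & all the sample numbers below are assumptions for illustration; in the application, the marker perimeter in pixels comes from the Aruco detection & the other values come from the config file.

```python
def estimate_coin_count(marker_perimeter_px, marker_perimeter_cm,
                        stack_height_px, coin_height_cm):
    """Convert a stack height from pixels to cm & estimate the coin count."""
    # How many pixels represent one centimeter at this camera distance
    pixels_per_cm = marker_perimeter_px / marker_perimeter_cm

    # Height of the stack in centimeters, rounded to one decimal
    stack_height_cm = round(stack_height_px / pixels_per_cm, 1)

    # Number of coins = stack height / height of a single coin
    return stack_height_cm, round(stack_height_cm / coin_height_cm)


# Assumed sample: the marker spans 609.6 px around its 15.24 cm perimeter,
# and the stack is 88 px tall with a per-coin height of 0.22 cm
height_cm, coins = estimate_coin_count(609.6, 15.24, 88, 0.22)
print(height_cm, coins)
# -> 2.2 10
```

Because the ratio depends on the camera distance, both the marker & the coins should sit at roughly the same distance from the lens for the estimate to hold.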

How can you identify the individual width?

My easy process would be to put ten quarter dollars stacked up & then get the height of the stack from the computer vision application. You have to divide that height by 10 to get the individual width (thickness) of the coin, until you build a model to predict the correct width depending upon the distance.
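This calibration step is a single division. A minimal sketch, assuming a hypothetical helper `calibrate_coin_height` & a made-up measurement of 2.2 cm for a stack of ten coins:

```python
def calibrate_coin_height(stack_height_cm, coins_in_stack=10):
    """Per-coin height from the measured height of a known stack."""
    if coins_in_stack <= 0:
        raise ValueError("coins_in_stack must be positive")
    return stack_height_cm / coins_in_stack


# Assumed reading: the application reports 2.2 cm for ten stacked coins
coin_height = calibrate_coin_height(2.2)
print(round(coin_height, 2))
# -> 0.22
```

The rounded result is the kind of value that would go into `COIN_DEF_HEIGHT` for that particular camera distance.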

CODE:

Let us understand the code now –

clsConfig.py (Configuration file for the entire application.)

	################################################
	#### Written By: SATYAKI DE ####
	#### Written On: 15-May-2020 ####
	#### Modified On: 28-Dec-2021 ####
	#### ####
	#### Objective: This script is a config ####
	#### file, contains all the keys for ####
	#### Machine-Learning & streaming dashboard.####
	#### ####
	################################################

	import os
	import platform as pl

	class clsConfig(object):
	    Curr_Path = os.path.dirname(os.path.realpath(__file__))

	    os_det = pl.system()
	    if os_det == "Windows":
	        sep = '\\'
	    else:
	        sep = '/'

	    conf = {
	        'APP_ID': 1,
	        'ARCH_DIR': Curr_Path + sep + 'arch' + sep,
	        'PROFILE_PATH': Curr_Path + sep + 'profile' + sep,
	        'LOG_PATH': Curr_Path + sep + 'log' + sep,
	        'REPORT_PATH': Curr_Path + sep + 'report',
	        'FILE_NAME': Curr_Path + sep + 'Image' + sep + 'Orig.jpeg',
	        'SRC_PATH': Curr_Path + sep + 'data' + sep,
	        'APP_DESC_1': 'Old Video Enhancement!',
	        'DEBUG_IND': 'N',
	        'INIT_PATH': Curr_Path,
	        'SUBDIR': 'data',
	        'SEP': sep,
	        'COIN_DEF_HEIGHT':0.22,
	        'PIC_TO_CM_MAP': 15.24,
	        'CONTOUR_AREA': 2000
	    }

'COIN_DEF_HEIGHT':0.22,
'PIC_TO_CM_MAP': 15.24,
'CONTOUR_AREA': 2000

The above entries are the important ones for us.

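PIC_TO_CM_MAP is the perimeter of the Aruco marker in centimeters, i.e., the total length of all four sides.
CONTOUR_AREA is the minimum contour area (in pixels) & will change depending upon the minimum size you want to identify as part of the contour.
COIN_DEF_HEIGHT is the height of a single coin as measured through the calibration step explained earlier & needs to be revised whenever the camera distance changes.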

clsAutoDetector.py (This python script will detect the contour.)

	###############################################
	#### Written By: SATYAKI DE ####
	#### Written On: 17-Jan-2022 ####
	#### Modified On 20-Mar-2022 ####
	#### ####
	#### Objective: This python script will ####
	#### auto-detects the contours of an image ####
	#### using grayscale conversion & then ####
	#### share the contours details to the ####
	#### calling class. ####
	###############################################

	import cv2
	from clsConfig import clsConfig as cf

	class clsAutoDetector():
	    def __init__(self):
	        self.cntArea = int(cf.conf['CONTOUR_AREA'])

	    def detectObjects(self, frame):
	        try:
	            cntArea = self.cntArea

	            # Convert Image to grayscale Image
	            grayImage = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

	            # Create a Mask with adaptive threshold
	            maskImage = cv2.adaptiveThreshold(grayImage, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, 19, 5)

	            cv2.imshow("Masked-Image", maskImage)

	            # Find contours
	            conts, Oth = cv2.findContours(maskImage, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

	            objectsConts = []

	            for cnt in conts:
	                area = cv2.contourArea(cnt)
	                if area > cntArea:
	                    objectsConts.append(cnt)

	            return objectsConts

	        except Exception as e:
	            x = str(e)
	            print('Error: ', x)

	            objectsConts = []

	            return objectsConts

Key snippets from the above script are as follows –

# Find contours
conts, Oth = cv2.findContours(maskImage, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

objectsConts = []

for cnt in conts:
    area = cv2.contourArea(cnt)
    if area > cntArea:
        objectsConts.append(cnt)

Depending upon the supplied contour area, this script will keep only the contours larger than that threshold for every frame captured through the WebCam.
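To see the area filter in isolation, the sketch below computes a polygon area with the shoelace formula in pure Python & keeps only the large shapes. It stands in for `cv2.contourArea` for illustration only; the names `polygon_area` & `filter_by_area` and the sample shapes are assumptions.

```python
def polygon_area(points):
    """Area of a closed polygon given as a list of (x, y) vertices."""
    twice_area = 0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def filter_by_area(contours, min_area):
    """Keep only the contours whose area exceeds min_area."""
    return [c for c in contours if polygon_area(c) > min_area]


big = [(0, 0), (50, 0), (50, 50), (0, 50)]      # 2500 square pixels
small = [(0, 0), (10, 0), (10, 10), (0, 10)]    # 100 square pixels
kept = filter_by_area([big, small], 2000)
print(len(kept))
# -> 1
```

With a threshold of 2000, small specks of noise in the mask are ignored & only coin-stack-sized shapes reach the measuring step.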

clsCountRealtime.py (This is the main class to calculate the number of stacked coins after reading using computer vision.)

	##################################################
	#### Written By: SATYAKI DE ####
	#### Written On: 17-Jan-2022 ####
	#### Modified On 20-Mar-2022 ####
	#### ####
	#### Objective: This python class will ####
	#### learn the number of coins stacks on ####
	#### top of another using computer vision ####
	#### with the help from Open-CV after ####
	#### manually recalibrating the initial ####
	#### data (Individual Coin Heights needs to ####
	#### adjust based on the distance of camera.) ####
	##################################################

	import cv2
	from clsAutoDetector import *
	import numpy as np
	import os
	import platform as pl

	# Custom Class
	from clsConfig import clsConfig as cf
	import clsL as cl

	# Initiating Log class
	l = cl.clsL()

	# Load Aruco detector
	arucoParams = cv2.aruco.DetectorParameters_create()
	arucoDict = cv2.aruco.Dictionary_get(cv2.aruco.DICT_5X5_50)

	# Load Object Detector
	detector = clsAutoDetector()

	class clsCountRealtime:
	    def __init__(self):
	        self.sep = str(cf.conf['SEP'])
	        self.Curr_Path = str(cf.conf['INIT_PATH'])
	        self.coinDefH = float(cf.conf['COIN_DEF_HEIGHT'])
	        self.pics2cm = float(cf.conf['PIC_TO_CM_MAP'])

	    def learnStats(self, debugInd, var):
	        try:
	            # Per Coin Default Size from the known distance_to_camera
	            coinDefH = self.coinDefH
	            pics2cm = self.pics2cm

	            # Load Cap
	            cap = cv2.VideoCapture(0)
	            cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
	            cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

	            while True:
	                success, img = cap.read()

	                if success == False:
	                    break

	                # Get Aruco marker
	                imgCorners, a, b = cv2.aruco.detectMarkers(img, arucoDict, parameters=arucoParams)
	                if imgCorners:

	                    # Draw polygon around the marker
	                    imgCornersInt = np.int0(imgCorners)
	                    cv2.polylines(img, imgCornersInt, True, (0, 255, 0), 5)

	                    # Aruco Perimeter
	                    arucoPerimeter = cv2.arcLength(imgCornersInt[0], True)

	                    # Pixel to cm ratio
	                    pixelCMRatio = arucoPerimeter / pics2cm

	                    contours = detector.detectObjects(img)

	                    # Draw objects boundaries
	                    for cnt in contours:
	                        # Get rect
	                        rect = cv2.boundingRect(cnt)
	                        (x, y, w, h) = rect

	                        print('*' * 60)
	                        print('Width Pixel: ')
	                        print(str(w))
	                        print('Height Pixel: ')
	                        print(str(h))

	                        # Get Width and Height of the Objects by applying the Ratio pixel to cm
	                        objWidth = round(w / pixelCMRatio, 1)
	                        objHeight = round(h / pixelCMRatio, 1)

	                        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)

	                        cv2.putText(img, "Width {} cm".format(objWidth), (int(x - 100), int(y - 20)), cv2.FONT_HERSHEY_PLAIN, 2, (100, 200, 0), 2)
	                        cv2.putText(img, "Height {} cm".format(objHeight), (int(x - 100), int(y + 15)), cv2.FONT_HERSHEY_PLAIN, 2, (100, 200, 0), 2)

	                        NoOfCoins = round(objHeight / coinDefH)

	                        cv2.putText(img, "No Of Coins: {}".format(NoOfCoins), (int(x - 100), int(y + 35)), cv2.FONT_HERSHEY_PLAIN, 2, (250, 0, 250), 2)

	                        print('Final Height: ')
	                        print(str(objHeight))

	                        print('No Of Coins: ')
	                        print(str(NoOfCoins))

	                cv2.imshow("Image", img)

	                if cv2.waitKey(1) & 0xFF == ord('q'):
	                    break

	            cap.release()
	            cv2.destroyAllWindows()

	            return 0
	        except Exception as e:
	            x = str(e)
	            print('Error: ', x)

	            return 1

Some of the key snippets from this script –

# Aruco Perimeter
arucoPerimeter = cv2.arcLength(imgCornersInt[0], True)

# Pixel to cm ratio
pixelCMRatio = arucoPerimeter / pics2cm

The above lines will compute the Aruco marker perimeter in pixels & then the ratio of pixels to centimeters.
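For a closed shape, the perimeter is simply the sum of the distances between consecutive corners, including the edge from the last corner back to the first. The pure-Python sketch below illustrates that idea with an assumed set of marker corners; `closed_perimeter` is a hypothetical helper, not the OpenCV call itself.

```python
import math


def closed_perimeter(corners):
    """Sum of the edge lengths of a closed polygon of (x, y) corners."""
    n = len(corners)
    total = 0.0
    for i in range(n):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % n]  # last corner connects to the first
        total += math.hypot(x2 - x1, y2 - y1)
    return total


# Assumed detection: the marker appears as a 100 px square in the frame
corners = [(100, 100), (200, 100), (200, 200), (100, 200)]
perimeter_px = closed_perimeter(corners)

# 15.24 is the PIC_TO_CM_MAP value from the config file
pixels_per_cm = perimeter_px / 15.24
print(perimeter_px, round(pixels_per_cm, 2))
# -> 400.0 26.25
```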

contours = detector.detectObjects(img)

The application detects the contours of each frame using the previous class; they will be used here.

# Draw objects boundaries
for cnt in contours:
    # Get rect
    rect = cv2.boundingRect(cnt)
    (x, y, w, h) = rect

In this step, the application will loop over the object contours & capture the top-left corner, along with the width & height, of the bounding rectangle of each identified object.

# Get Width and Height of the Objects by applying the Ratio pixel to cm
objWidth = round(w / pixelCMRatio, 1)
objHeight = round(h / pixelCMRatio, 1)

Next, it converts the width & height of the contoured object into centimeters.

cv2.putText(img, "Width {} cm".format(objWidth), (int(x - 100), int(y - 20)), cv2.FONT_HERSHEY_PLAIN, 2, (100, 200, 0), 2)
cv2.putText(img, "Height {} cm".format(objHeight), (int(x - 100), int(y + 15)), cv2.FONT_HERSHEY_PLAIN, 2, (100, 200, 0), 2)

NoOfCoins = round(objHeight / coinDefH)

cv2.putText(img, "No Of Coins: {}".format(NoOfCoins), (int(x - 100), int(y + 35)), cv2.FONT_HERSHEY_PLAIN, 2, (250, 0, 250), 2)

It displays the height, width & total number of coins on top of the live video.

if cv2.waitKey(1) & 0xFF == ord('q'):
    break

The above lines will help the developer exit from the visual application by pressing the ‘q’ key on a MacBook.
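The `& 0xFF` mask keeps only the lowest byte of the value returned by `cv2.waitKey`, which is a common precaution because the return value may carry extra bits above the key code on some platforms. In Python, `&` binds tighter than `==`, so the mask is applied before the comparison. A minimal sketch with a hypothetical helper `is_quit_key` that takes the raw key code as a plain integer:

```python
def is_quit_key(raw_key, quit_char="q"):
    """True when the low byte of raw_key matches the quit character."""
    # Mask first, then compare; the parentheses only make the order explicit
    return (raw_key & 0xFF) == ord(quit_char)


print(is_quit_key(ord("q")))             # plain key code -> True
print(is_quit_key(0x100000 | ord("q")))  # extra high bits set -> True
print(is_quit_key(-1))                   # -1 means no key was pressed -> False
```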

visualDataRead.py (Main calling function.)

	###############################################
	#### Written By: SATYAKI DE ####
	#### Written On: 17-Jan-2022 ####
	#### Modified On 20-Mar-2022 ####
	#### ####
	#### Objective: This is the main calling ####
	#### python script that will invoke the ####
	#### clsCountRealtime class to initiate ####
	#### the model to read the real-time ####
	#### stacked-up coins & share the actual ####
	#### numbers on top of the video feed. ####
	###############################################

	# We keep the setup code in a different class as shown below.
	import clsCountRealtime as ar
	from clsConfig import clsConfig as cf

	import datetime
	import logging

	###############################################
	### Global Section ###
	###############################################
	# Instantiating all the three classes

	x1 = ar.clsCountRealtime()

	###############################################
	### End of Global Section ###
	###############################################

	def main():
	    try:
	        # Other useful variables
	        debugInd = 'Y'
	        var = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
	        var1 = datetime.datetime.now()

	        print('Start Time: ', str(var))
	        # End of useful variables

	        # Initiating Log Class
	        general_log_path = str(cf.conf['LOG_PATH'])

	        # Enabling Logging Info
	        logging.basicConfig(filename=general_log_path + 'restoreVideo.log', level=logging.INFO)

	        print('Started Capturing Real-Time Coin Counts!')

	        # Execute all the pass
	        r1 = x1.learnStats(debugInd, var)

	        if (r1 == 0):
	            print('Successfully counted the number of stacked coins!')
	        else:
	            print('Failed to count the number of stacked coins!')

	        var2 = datetime.datetime.now()

	        c = var2 - var1
	        minutes = c.total_seconds() / 60
	        print('Total difference in minutes: ', str(minutes))

	        # Print the end time (var2), not the start time
	        print('End Time: ', str(var2))

	    except Exception as e:
	        x = str(e)
	        print('Error: ', x)

	if __name__ == "__main__":
	    main()

And, the key snippet from the above script –

x1 = ar.clsCountRealtime()

The application instantiates the main class.

# Execute all the pass
r1 = x1.learnStats(debugInd, var)

if (r1 == 0):
    print('Successfully counted the number of stacked coins!')
else:
    print('Failed to count the number of stacked coins!')

The above code invokes the learnStats function to calculate the count of stacked coins.

FOLDER STRUCTURE:

So, we’ve done it.

You will get the complete codebase in the following Github link.

I’ll bring some more exciting topics in the coming days from the Python verse. Please share & subscribe to my post & let me know your feedback.

Till then, Happy Avenging! 😀
