prediction Archives

Detecting real-time human emotions using Open-CV, DeepFace & Python

Posted on April 23, 2022 by SatyakiDe in analytic function, api, Azure, cloud, code, Computer-Vision, computing, Crossplatform, design, emotion, features, feel, function, gui, human, json, machine-learning, matplotlib, Model, Open-CV, pattern matching, Python, Real-time, regex, return, snippet, Technology, video, voice

Hi Guys,

Today, I’ll be presenting another exciting installment on Computer Vision. Our focus will be on getting a sense of human emotions. Let me explain. This post will demonstrate how to read/detect human emotions by analyzing video frames with computer vision. We will be using part of a Bengali movie called “Ganashatru (An Enemy of the People)” entirely for educational purposes & also as a tribute to the legendary director, the late Satyajit Ray. To know more about him, please click the following link.

Why don’t we see the demo first before jumping into the technical details?

Demo

Architecture:

Let us understand the architecture –

From the above diagram, one can see that the application, which uses both Open-CV & DeepFace, analyzes individual frames from the source video. It then predicts the emotions & adds the label to the target B&W frames. Finally, it creates another video by mixing the source audio back in.

Python Packages:

Following are the python packages that are necessary to develop this use case (the scripts below also import imutils & numpy, so install them as well if they are missing) –

pip install deepface
pip install opencv-python
pip install ffpyplayer

CODE:

Let us now understand the code. For this use case, we will only discuss the key python scripts. The application needs a few more than these, but we have already discussed them in some of the earlier posts. Hence, we will skip them here.

clsConfig.py (This script holds the configuration parameters for the entire application.)

	################################################
	#### Written By: SATYAKI DE ####
	#### Written On: 15-May-2020 ####
	#### Modified On: 22-Apr-2022 ####
	#### ####
	#### Objective: This script is a config ####
	#### file, contains all the keys for ####
	#### Machine-Learning & streaming dashboard.####
	#### ####
	################################################

	import os
	import platform as pl

	class clsConfig(object):
	    Curr_Path = os.path.dirname(os.path.realpath(__file__))

	    os_det = pl.system()
	    if os_det == "Windows":
	        sep = '\\'
	    else:
	        sep = '/'

	    conf = {
	        'APP_ID': 1,
	        'ARCH_DIR': Curr_Path + sep + 'arch' + sep,
	        'PROFILE_PATH': Curr_Path + sep + 'profile' + sep,
	        'LOG_PATH': Curr_Path + sep + 'log' + sep,
	        'REPORT_PATH': Curr_Path + sep + 'report',
	        'FILE_NAME': 'GonoshotruClimax',
	        'SRC_PATH': Curr_Path + sep + 'data' + sep,
	        'FINAL_PATH': Curr_Path + sep + 'Target' + sep,
	        'APP_DESC_1': 'Video Emotion Capture!',
	        'DEBUG_IND': 'N',
	        'INIT_PATH': Curr_Path,
	        'SUBDIR': 'data',
	        'SEP': sep,
	        'VIDEO_FILE_EXTN': '.mp4',
	        'AUDIO_FILE_EXTN': '.mp3',
	        'IMAGE_FILE_EXTN': '.jpg',
	        'TITLE': "Gonoshotru - Emotional Analysis"
	    }

All the above inputs are generic & used as normal parameters.
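The manual separator handling in the config class can also be delegated to the standard library. The following is a minimal sketch, assuming a hypothetical helper named build_conf (it is not part of the original script) that builds a few of the same keys with pathlib. Note that, unlike the original, these paths carry no trailing separator.

```python
from pathlib import Path

def build_conf(base_dir):
    """Return a small config dict in the style of clsConfig.conf (hypothetical helper)."""
    base = Path(base_dir)
    return {
        'APP_ID': 1,
        'LOG_PATH': str(base / 'log'),        # pathlib picks the right separator
        'SRC_PATH': str(base / 'data'),
        'FINAL_PATH': str(base / 'Target'),
        'FILE_NAME': 'GonoshotruClimax',
        'VIDEO_FILE_EXTN': '.mp4',
    }

conf = build_conf(Path.cwd())

# Joining a directory and a file name no longer needs an explicit 'sep'
video_file = Path(conf['FINAL_PATH']) / (conf['FILE_NAME'] + conf['VIDEO_FILE_EXTN'])
print(video_file.name)
```

The rest of the application could keep reading the values as plain strings, so this would be a drop-in change only where callers do not rely on the trailing separator.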

clsFaceEmotionDetect.py (This python class will track the human emotions after splitting the audio from the video & put that label on top of the video frame.)

	##################################################
	#### Written By: SATYAKI DE ####
	#### Written On: 17-Apr-2022 ####
	#### Modified On 20-Apr-2022 ####
	#### ####
	#### Objective: This python class will ####
	#### track the human emotions after splitting ####
	#### the audio from the video & put that ####
	#### label on top of the video frame. ####
	#### ####
	##################################################

	from imutils.video import FileVideoStream
	from imutils.video import FPS
	import numpy as np
	import imutils
	import time
	import cv2

	from clsConfig import clsConfig as cf
	from deepface import DeepFace
	import clsL as cl

	import subprocess
	import sys
	import os

	# Initiating Log class
	l = cl.clsL()

	class clsFaceEmotionDetect:
	    def __init__(self):
	        self.sep = str(cf.conf['SEP'])
	        self.Curr_Path = str(cf.conf['INIT_PATH'])
	        self.FileName = str(cf.conf['FILE_NAME'])
	        self.VideoFileExtn = str(cf.conf['VIDEO_FILE_EXTN'])
	        self.ImageFileExtn = str(cf.conf['IMAGE_FILE_EXTN'])

	    def convert_video_to_audio_ffmpeg(self, video_file, output_ext="mp3"):
	        try:
	            """Converts video to audio directly using `ffmpeg` command
	            with the help of subprocess module"""
	            filename, ext = os.path.splitext(video_file)
	            subprocess.call(["ffmpeg", "-y", "-i", video_file, f"{filename}.{output_ext}"],
	                            stdout=subprocess.DEVNULL,
	                            stderr=subprocess.STDOUT)

	            return 0
	        except Exception as e:
	            x = str(e)
	            print('Error: ', x)

	            return 1

	    def readEmotion(self, debugInd, var):
	        try:
	            sep = self.sep
	            Curr_Path = self.Curr_Path
	            FileName = self.FileName
	            VideoFileExtn = self.VideoFileExtn
	            ImageFileExtn = self.ImageFileExtn
	            font = cv2.FONT_HERSHEY_SIMPLEX

	            # Load Video
	            videoFile = Curr_Path + sep + 'Video' + sep + FileName + VideoFileExtn
	            temp_path = Curr_Path + sep + 'Temp' + sep

	            # Extracting the audio from the source video
	            x = self.convert_video_to_audio_ffmpeg(videoFile)

	            if x == 0:
	                print('Successfully Audio extracted from the source file!')
	            else:
	                print('Failed to extract the source audio!')

	            # Loading the haarcascade xml class
	            faceCascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

	            # start the file video stream thread and allow the buffer to
	            # start to fill
	            print("[INFO] Starting video file thread...")
	            fvs = FileVideoStream(videoFile).start()
	            time.sleep(1.0)
	            cnt = 0

	            # start the FPS timer
	            fps = FPS().start()

	            try:
	                # loop over frames from the video file stream
	                while fvs.more():

	                    cnt += 1
	                    # grab the frame from the threaded video file stream, resize
	                    # it, and convert it to grayscale (while still retaining 3
	                    # channels)
	                    try:
	                        frame = fvs.read()
	                    except Exception as e:
	                        x = str(e)
	                        print('Error: ', x)

	                    frame = imutils.resize(frame, width=720)
	                    cv2.imshow("Gonoshotru - Source", frame)

	                    # Enforce Detection to False will continue the sequence even when there is no face
	                    result = DeepFace.analyze(frame, enforce_detection=False, actions = ['emotion'])

	                    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
	                    frame = np.dstack([frame, frame, frame])

	                    faces = faceCascade.detectMultiScale(image=frame, scaleFactor=1.1, minNeighbors=4, minSize=(80,80), flags=cv2.CASCADE_SCALE_IMAGE)

	                    # Draw a rectangle around the face
	                    for (x, y, w, h) in faces:
	                        cv2.rectangle(frame, (x, y), (x + w, y + h), (0,255,0), 2)

	                    # Use puttext method for inserting live emotion on video
	                    cv2.putText(frame, result['dominant_emotion'], (50,390), font, 3, (0,0,255), 2, cv2.LINE_4)

	                    # display the size of the queue on the frame
	                    #cv2.putText(frame, "Queue Size: {}".format(fvs.Q.qsize()), (10, 30), font, 0.6, (0, 255, 0), 2)

	                    # save the annotated frame into the temporary folder
	                    cv2.imwrite(temp_path+'frame-' + str(cnt) + ImageFileExtn, frame)

	                    # show the frame and update the FPS counter
	                    cv2.imshow("Gonoshotru - Emotional Analysis", frame)
	                    fps.update()

	                    if cv2.waitKey(2) & 0xFF == ord('q'):
	                        break
	            except Exception as e:
	                x = str(e)
	                print('Error: ', x)
	                print('No more frame exists!')

	            # stop the timer and display FPS information
	            fps.stop()
	            print("[INFO] Elapsed Time: {:.2f}".format(fps.elapsed()))
	            print("[INFO] Approx. FPS: {:.2f}".format(fps.fps()))

	            # do a bit of cleanup
	            cv2.destroyAllWindows()
	            fvs.stop()

	            return 0

	        except Exception as e:
	            x = str(e)
	            print('Error: ', x)

	            return 1

Key snippets from the above scripts –

def convert_video_to_audio_ffmpeg(self, video_file, output_ext="mp3"):
    try:
        """Converts video to audio directly using `ffmpeg` command
        with the help of subprocess module"""
        filename, ext = os.path.splitext(video_file)
        subprocess.call(["ffmpeg", "-y", "-i", video_file, f"{filename}.{output_ext}"],
                        stdout=subprocess.DEVNULL,
                        stderr=subprocess.STDOUT)

        return 0
    except Exception as e:
        x = str(e)
        print('Error: ', x)

        return 1

The above snippet represents an audio extraction function that extracts the audio from the source file & stores it next to the source video, using the same file name with the chosen extension (mp3 by default).
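One detail worth noting: subprocess.call returns ffmpeg's exit code instead of raising an error, so the function above reports success even when the conversion fails. Below is a minimal sketch of a stricter variant. The helper names build_ffmpeg_cmd and extract_audio are hypothetical and not part of the original class.

```python
import os
import subprocess

def build_ffmpeg_cmd(video_file, output_ext="mp3"):
    """Build the same ffmpeg command line the original function uses."""
    filename, _ = os.path.splitext(video_file)
    return ["ffmpeg", "-y", "-i", video_file, f"{filename}.{output_ext}"]

def extract_audio(video_file, output_ext="mp3"):
    """Return 0 on success, 1 if ffmpeg cannot be started or exits with an error."""
    try:
        proc = subprocess.run(build_ffmpeg_cmd(video_file, output_ext),
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.STDOUT)
    except OSError as e:
        # Raised, for example, when ffmpeg is not installed or not on PATH
        print('Error: ', str(e))
        return 1

    # A non-zero exit code means ffmpeg itself reported a failure
    return 0 if proc.returncode == 0 else 1

print(build_ffmpeg_cmd("Video/GonoshotruClimax.mp4"))
```

With this variant, the 'Failed to extract the source audio!' branch in readEmotion would be reached for a missing or unreadable video file as well.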

# Loading the haarcascade xml class
faceCascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

Next, the application loads OpenCV's pre-trained Haar cascade for frontal faces, one of the most widely used classifiers for face detection, which our application requires.

fvs = FileVideoStream(videoFile).start()

Using FileVideoStream enables our application to process the video faster than a plain cv2.VideoCapture() loop, since it decodes frames in a background thread & queues them for the main loop.
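To illustrate the idea behind a threaded reader, here is a minimal sketch that uses only the standard library. The class name ThreadedReader is hypothetical, the "frames" are plain integers, and this is not the imutils implementation, only the same producer/consumer pattern.

```python
import queue
import threading

class ThreadedReader:
    """Reads items from an iterable in a background thread (illustrative only)."""

    def __init__(self, source, queue_size=128):
        self._source = source
        self._queue = queue.Queue(maxsize=queue_size)
        self._thread = threading.Thread(target=self._fill, daemon=True)

    def start(self):
        self._thread.start()
        return self

    def _fill(self):
        for item in self._source:
            self._queue.put(item)   # blocks while the queue is full
        self._queue.put(None)       # sentinel: no more frames

    def read(self):
        return self._queue.get()    # blocks until an item or the sentinel arrives

frames = []
reader = ThreadedReader(range(5)).start()
while True:
    frame = reader.read()
    if frame is None:               # end of stream reached
        break
    frames.append(frame)

print(frames)
```

While the main loop is busy with the expensive per-frame analysis, the background thread keeps decoding, so the next frame is usually ready when it is requested.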

# start the FPS timer
fps = FPS().start()

The application then invokes FPS().start(), which initiates the FPS timer.

# loop over frames from the video file stream
while fvs.more():

The application will check using fvs.more() to find the EOF of the video file. Until then, it will try to read individual frames.

try:
    frame = fvs.read()
except Exception as e:
    x = str(e)
    print('Error: ', x)

The application will read individual frames. In case of any issue, it captures the error & prints it without terminating the main program. This exception strategy is beneficial at the end of the video, where a read on an already exhausted stream could otherwise bring down the entire application.

frame = imutils.resize(frame, width=720)
cv2.imshow("Gonoshotru - Source", frame)

At this point, the application resizes the frame to a width of 720 pixels (imutils.resize keeps the aspect ratio) for consistent performance. Furthermore, it shows this video feed in a window labeled as the source.

# Enforce Detection to False will continue the sequence even when there is no face
result = DeepFace.analyze(frame, enforce_detection=False, actions = ['emotion'])

Next, the application uses the deepface machine-learning API to analyze the subject's face & predict its emotion. With enforce_detection=False, the call does not raise an error on frames without a detectable face.
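The code in this post reads result['dominant_emotion'] directly, which assumes that the analysis returns a single dictionary. As an assumption to verify against your installed deepface version, some releases may instead return a list with one dictionary per detected face. A small, hypothetical helper (not part of the original class) can handle both shapes:

```python
def dominant_emotion(result, default="unknown"):
    """Extract 'dominant_emotion' from a dict or from a list of dicts."""
    if isinstance(result, list):
        # Use the first detected face; fall back to an empty dict if none
        result = result[0] if result else {}
    return result.get('dominant_emotion', default)

# Simulated analysis results; no deepface call is made in this sketch
print(dominant_emotion({'dominant_emotion': 'happy'}))
print(dominant_emotion([{'dominant_emotion': 'sad'}]))
print(dominant_emotion([]))
```

The label returned by the helper can then be passed to cv2.putText in place of the direct dictionary lookup.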

frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
frame = np.dstack([frame, frame, frame])

faces = faceCascade.detectMultiScale(image=frame, scaleFactor=1.1, minNeighbors=4, minSize=(80,80), flags=cv2.CASCADE_SCALE_IMAGE)

The detectMultiScale function is used to detect the faces. It returns a list of rectangles, each with coordinates (x, y, w, h), around the detected faces.

It takes three common arguments: the input image, scaleFactor, and minNeighbors.

scaleFactor specifies how much the image size is reduced at each image scale. There may be more faces near the camera in a group photo than others. Naturally, such faces would appear more prominent than the ones behind. Scanning the image at several scales compensates for that.

minNeighbors specifies how many neighboring detections each candidate rectangle needs in order to be retained as a face. Higher values give fewer but more reliable detections. One may have to tweak these values to get the best results.
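To get a feel for what scaleFactor costs, the following sketch lists the face sizes that would be examined when the search grows by that factor from minSize up to a maximum size. It is a conceptual illustration, not OpenCV's internal algorithm; the function name window_sizes and the 480-pixel upper bound are assumptions made for this example.

```python
def window_sizes(min_size, max_size, scale_factor):
    """List the square sizes visited when growing by scale_factor each step."""
    sizes = []
    size = float(min_size)
    while size <= max_size:
        sizes.append(int(round(size)))
        size *= scale_factor
    return sizes

coarse = window_sizes(80, 480, 1.3)   # bigger steps: fewer scales, faster
fine = window_sizes(80, 480, 1.1)     # smaller steps: more scales, slower

print('scaleFactor 1.3:', coarse)
print('scaleFactor 1.1:', fine)
```

A value close to 1 (such as the 1.1 used in this post) examines many more scales, which tends to find more faces at the price of extra processing per frame.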

# Draw a rectangle around the face
for (x, y, w, h) in faces:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0,255,0), 2)

As discussed above, the application now draws a green rectangle around each detected face using the values of x, y, w, & h.

# Use puttext method for inserting live emotion on video
cv2.putText(frame, result['dominant_emotion'], (50,390), font, 3, (0,0,255), 2, cv2.LINE_4)

Then, it captures the dominant emotion from the deepface API & writes it on top of the target video frame.

# save the annotated frame into the temporary folder
cv2.imwrite(temp_path+'frame-' + str(cnt) + ImageFileExtn, frame)

# show the frame and update the FPS counter
cv2.imshow("Gonoshotru - Emotional Analysis", frame)
fps.update()

The application also writes individual frames into a temporary folder, where they will later be consumed & mixed with the source audio.
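The script that rebuilds the video from these frames (clsFrame2Video) is not shown in this post. One thing any such step has to get right is the frame order, because a plain alphabetical sort places frame-10 before frame-2. The sketch below shows one possible way to sort numerically; the helper name frame_number is hypothetical.

```python
import re

def frame_number(path):
    """Return the integer N from a name like 'frame-N.jpg' (or -1 if absent)."""
    match = re.search(r'frame-(\d+)', path)
    return int(match.group(1)) if match else -1

names = ['frame-10.jpg', 'frame-2.jpg', 'frame-1.jpg']

print(sorted(names))                     # alphabetical: wrong playback order
ordered = sorted(names, key=frame_number)
print(ordered)                           # numeric: correct playback order
```

The same key function could be applied to a list of paths returned by glob before the frames are written into the target video.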

if cv2.waitKey(2) & 0xFF == ord('q'):
    break

At any given point, if the user wants to quit, the above snippet will allow them to do so by simply pressing the ‘q’-button on the keyboard.
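The check above only reacts to 'q'. If quitting with the Escape key is also desired, the key test could be extended as in this sketch. The helper name is_quit_key is hypothetical; 27 is the ASCII code of the Escape key.

```python
ESC_KEY = 27  # ASCII code for the Escape key

def is_quit_key(key_code):
    """Return True when a code from cv2.waitKey() means 'q' or Escape."""
    if key_code < 0:          # waitKey returns -1 when no key was pressed
        return False
    key = key_code & 0xFF     # keep only the low byte, as the original check does
    return key in (ord('q'), ESC_KEY)

print(is_quit_key(ord('q')), is_quit_key(27), is_quit_key(-1))
```

Inside the frame loop, the condition would then read: if is_quit_key(cv2.waitKey(2)): break.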

clsVideoPlay.py (This script will play the video along with audio in sync.)

	###############################################
	#### Updated By: SATYAKI DE ####
	#### Updated On: 17-Apr-2022 ####
	#### ####
	#### Objective: This script will play the ####
	#### video along with audio in sync. ####
	#### ####
	###############################################

	import os
	import platform as pl
	import cv2
	import numpy as np
	import glob
	import re
	import ffmpeg
	import time
	from clsConfig import clsConfig as cf
	from ffpyplayer.player import MediaPlayer

	import logging

	os_det = pl.system()
	if os_det == "Windows":
	    sep = '\\'
	else:
	    sep = '/'

	class clsVideoPlay:
	    def __init__(self):
	        self.fileNmFin = str(cf.conf['FILE_NAME'])
	        self.final_path = str(cf.conf['FINAL_PATH'])
	        self.title = str(cf.conf['TITLE'])
	        self.VideoFileExtn = str(cf.conf['VIDEO_FILE_EXTN'])

	    def videoP(self, file):
	        try:
	            cap = cv2.VideoCapture(file)
	            player = MediaPlayer(file)
	            start_time = time.time()

	            while cap.isOpened():
	                ret, frame = cap.read()
	                if not ret:
	                    break
	                _, val = player.get_frame(show=False)
	                if val == 'eof':
	                    break

	                cv2.imshow(file, frame)

	                elapsed = (time.time() - start_time) * 1000 # msec
	                play_time = int(cap.get(cv2.CAP_PROP_POS_MSEC))
	                sleep = max(1, int(play_time - elapsed))
	                if cv2.waitKey(sleep) & 0xFF == ord("q"):
	                    break

	            player.close_player()
	            cap.release()
	            cv2.destroyAllWindows()

	            return 0
	        except Exception as e:
	            x = str(e)
	            print('Error: ', x)

	            return 1

	    def stream(self, dInd, var):
	        try:
	            VideoFileExtn = self.VideoFileExtn
	            fileNmFin = self.fileNmFin + VideoFileExtn
	            final_path = self.final_path
	            title = self.title

	            FullFileName = final_path + fileNmFin

	            ret = self.videoP(FullFileName)

	            if ret == 0:
	                print('Successfully Played the Video!')

	                return 0
	            else:
	                return 1

	        except Exception as e:
	            x = str(e)
	            print('Error: ', x)

	            return 1

Let us explore the key snippet –

cap = cv2.VideoCapture(file)
player = MediaPlayer(file)

In the above snippet, the application first reads the video & at the same time, it will create an instance of the MediaPlayer.

play_time = int(cap.get(cv2.CAP_PROP_POS_MSEC))

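The application uses cv2.CAP_PROP_POS_MSEC to synchronize video and audio: it compares the current frame's timestamp with the wall-clock time elapsed since playback started & waits for the difference before showing the next frame.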
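The timing arithmetic from videoP can be isolated into a small function to see how it behaves. This is a sketch of the same calculation with a hypothetical function name, wait_ms, and made-up sample values.

```python
def wait_ms(play_time_ms, elapsed_ms):
    """Milliseconds to wait before showing the next frame (never below 1)."""
    return max(1, int(play_time_ms - elapsed_ms))

# Video is ahead of the clock: wait for the remaining 40 ms
print(wait_ms(1000, 960.0))

# Video is behind the clock: wait the minimum so playback can catch up
print(wait_ms(1000, 1200.0))
```

The lower bound of 1 matters because cv2.waitKey(0) would block until a key is pressed instead of continuing playback.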

peopleEmotionRead.py (This is the main calling python script that will invoke the class to initiate the model to read the real-time human emotions from video.)

	##################################################
	#### Written By: SATYAKI DE ####
	#### Written On: 17-Jan-2022 ####
	#### Modified On 20-Apr-2022 ####
	#### ####
	#### Objective: This is the main calling ####
	#### python script that will invoke the ####
	#### clsFaceEmotionDetect class to initiate ####
	#### the model to read the real-time ####
	#### human emotions from video or even from ####
	#### Web-CAM & predict it continuously. ####
	##################################################

	# We keep the setup code in a different class as shown below.
	import clsFaceEmotionDetect as fed
	import clsFrame2Video as fv
	import clsVideoPlay as vp

	from clsConfig import clsConfig as cf

	import datetime
	import logging

	###############################################
	### Global Section ###
	###############################################
	# Instantiating all the three classes

	x1 = fed.clsFaceEmotionDetect()
	x2 = fv.clsFrame2Video()
	x3 = vp.clsVideoPlay()

	###############################################
	### End of Global Section ###
	###############################################

	def main():
	    try:
	        # Other useful variables
	        debugInd = 'Y'
	        var = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
	        var1 = datetime.datetime.now()

	        print('Start Time: ', str(var))
	        # End of useful variables

	        # Initiating Log Class
	        general_log_path = str(cf.conf['LOG_PATH'])

	        # Enabling Logging Info
	        logging.basicConfig(filename=general_log_path + 'restoreVideo.log', level=logging.INFO)

	        print('Started Capturing Real-Time Human Emotions!')

	        # Execute all the pass
	        r1 = x1.readEmotion(debugInd, var)
	        r2 = x2.convert2Vid(debugInd, var)
	        r3 = x3.stream(debugInd, var)

	        if ((r1 == 0) and (r2 == 0) and (r3 == 0)):
	            print('Successfully identified human emotions!')
	        else:
	            print('Failed to identify the human emotions!')

	        var2 = datetime.datetime.now()

	        c = var2 - var1
	        minutes = c.total_seconds() / 60
	        print('Total difference in minutes: ', str(minutes))

	        # var2 holds the end timestamp (var1 is the start)
	        print('End Time: ', str(var2))

	    except Exception as e:
	        x = str(e)
	        print('Error: ', x)

	if __name__ == "__main__":
	    main()

The key snippets from the above script are as follows –

# Instantiating all the three classes

x1 = fed.clsFaceEmotionDetect()
x2 = fv.clsFrame2Video()
x3 = vp.clsVideoPlay()

As one can see from the above snippet, all the major classes are instantiated & loaded into the memory.

# Execute all the pass
r1 = x1.readEmotion(debugInd, var)
r2 = x2.convert2Vid(debugInd, var)
r3 = x3.stream(debugInd, var)

All the responses are captured into the corresponding variables, which are later checked for success status.

Let us capture & compare the emotions in a screenshot for better understanding –

So, one can see that the application correctly identifies the human emotions in most of the frames from the video, including the frame posted above.

FOLDER STRUCTURE:

Here is the folder structure that contains all the files & directories in macOS –

So, we’ve done it.

You will get the complete codebase in the following Github link.

If you want to know more about this legendary director & his famous work, please visit the following link.

I’ll bring some more exciting topics in the coming days from the Python verse. Please share & subscribe to my posts & let me know your feedback.

Till then, Happy Avenging! 😀

Note: All the data & scenarios posted here are representational & available over the internet & for educational purposes only. Some of the images (except my photo) that we’ve used are available over the net. We don’t claim ownership of these images. There is always room for improvement, especially in the prediction quality.

Neural prophet – The enhanced version of Facebook’s forecasting API

Posted on February 22, 2022 by SatyakiDe in api, cloud, code, dashboard, Data Science, design, function, IoT, json, machine-learning, matplotlib, neural prophet, Pandas, Python, snippet, sql, Technology

Hi Team,

Today, I’ll be explaining the enhancement of one of the previous posts. I know that I’ve shared the fascinating API named prophet-API, which Facebook developed. One can quickly get more accurate predictions with significantly fewer data points. (If you want to know more about that post, please click on the following link.)

However, there is another enhancement on top of that API, which is more accurate. Still, one needs to know when to consider using it. So, today, we’ll be talking about the neural prophet API.

But, before we start digging deep, why don’t we view the demo first?

Demo

Let’s visit a diagram. That way, you can understand where you can use it. Also, I’ll be sharing some of the links from the original site for better information mining.

**Source: Neural Prophet (Official Site)**

As one can see, this API is trying to bridge between the different groups & it enables the time-series computation efficiently.

WHERE TO USE:

Let’s visit another diagram from the same source.

So, I hope these two pictures give you a clear idea & set your expectations closer to ground reality.

ARCHITECTURE:

Let us explore the architecture –

As one can see, the application is processing IoT data & creating a historical data volume, out of which the model is gradually predicting correct outcomes with higher confidence.

For more information on this API, please visit the following link.

CODE:

Let’s explore the essential scripts here.

1. clsConfig.py (Configuration file for the entire application.)

	################################################
	#### Written By: SATYAKI DE ####
	#### Written On: 15-May-2020 ####
	#### Modified On: 28-Dec-2021 ####
	#### ####
	#### Objective: This script is a config ####
	#### file, contains all the keys for ####
	#### Machine-Learning & streaming dashboard.####
	#### ####
	################################################

	import os
	import platform as pl
	import pandas as p

	class clsConfig(object):
	    Curr_Path = os.path.dirname(os.path.realpath(__file__))

	    os_det = pl.system()
	    if os_det == "Windows":
	        sep = '\\'
	    else:
	        sep = '/'

	    conf = {
	        'APP_ID': 1,
	        'ARCH_DIR': Curr_Path + sep + 'arch' + sep,
	        'PROFILE_PATH': Curr_Path + sep + 'profile' + sep,
	        'LOG_PATH': Curr_Path + sep + 'log' + sep,
	        'REPORT_PATH': Curr_Path + sep + 'report',
	        'FILE_NAME': Curr_Path + sep + 'Data' + sep + 'thermostatIoT.csv',
	        'SRC_PATH': Curr_Path + sep + 'data' + sep,
	        'APP_DESC_1': 'Old Video Enhancement!',
	        'DEBUG_IND': 'N',
	        'INIT_PATH': Curr_Path,
	        'SUBDIR': 'data',
	        'SEP': sep,
	        'testRatio':0.2,
	        'valRatio':0.2,
	        'epochsVal':8,
	        'sleepTime':3,
	        'sleepTime1':6,
	        'factorVal':0.2,
	        'learningRateVal':0.001,
	        'event1': {
	            'event': 'SummerEnd',
	            'ds': p.to_datetime([
	                '2010-04-01', '2011-04-01', '2012-04-01',
	                '2013-04-01', '2014-04-01', '2015-04-01',
	                '2016-04-01', '2017-04-01', '2018-04-01',
	                '2019-04-01', '2020-04-01', '2021-04-01',
	            ]),},
	        'event2': {
	            'event': 'LongWeekend',
	            'ds': p.to_datetime([
	                '2010-12-01', '2011-12-01', '2012-12-01',
	                '2013-12-01', '2014-12-01', '2015-12-01',
	                '2016-12-01', '2017-12-01', '2018-12-01',
	                '2019-12-01', '2020-12-01', '2021-12-01',
	            ]),}
	    }

The only key snippet is the pair of nested dictionary elements, each carrying a list of dates converted with pandas, in the following lines –

'event1': {
    'event': 'SummerEnd',
    'ds': p.to_datetime([
        '2010-04-01', '2011-04-01', '2012-04-01',
        '2013-04-01', '2014-04-01', '2015-04-01',
        '2016-04-01', '2017-04-01', '2018-04-01',
        '2019-04-01', '2020-04-01', '2021-04-01',
    ]),},
'event2': {
    'event': 'LongWeekend',
    'ds': p.to_datetime([
        '2010-12-01', '2011-12-01', '2012-12-01',
        '2013-12-01', '2014-12-01', '2015-12-01',
        '2016-12-01', '2017-12-01', '2018-12-01',
        '2019-12-01', '2020-12-01', '2021-12-01',
    ]),}

As one can see, our application is equipped with the events to predict our use case better.
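To see what these nested elements turn into once the prediction class wraps them in DataFrames, here is a minimal sketch with a shortened date list. It assumes pandas is installed; the variable names are illustrative, and the two-date lists are shortened versions of the ones in the config.

```python
import pandas as pd

event1 = {'event': 'SummerEnd',
          'ds': pd.to_datetime(['2020-04-01', '2021-04-01'])}
event2 = {'event': 'LongWeekend',
          'ds': pd.to_datetime(['2020-12-01', '2021-12-01'])}

# The scalar 'event' label is repeated for every date in 'ds'
df_events = pd.concat((pd.DataFrame(event1), pd.DataFrame(event2)),
                      ignore_index=True)

print(df_events)
```

The result is one row per event date, with an 'event' column naming the event and a 'ds' column holding the timestamp, which is the layout the events DataFrame in the next script relies on.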

2. clsPredictIonIoT.py (Main class file, which will invoke neural-prophet forecast for the entire application.)

	################################################
	#### Written By: SATYAKI DE ####
	#### Written On: 19-Feb-2022 ####
	#### Modified On 21-Feb-2022 ####
	#### ####
	#### Objective: This python script will ####
	#### perform the neural-prophet forecast ####
	#### based on the historical input received ####
	#### from IoT device. ####
	################################################

	# We keep the setup code in a different class as shown below.
	from clsConfig import clsConfig as cf

	import psutil
	import os
	import pandas as p
	import json
	import datetime
	from neuralprophet import NeuralProphet, set_log_level
	from neuralprophet import set_random_seed
	from neuralprophet.benchmark import Dataset, NeuralProphetModel, SimpleExperiment, CrossValidationExperiment

	import time
	import clsL as cl

	import matplotlib.pyplot as plt

	###############################################
	### Global Section ###
	###############################################
	# Initiating Log class
	l = cl.clsL()

	set_random_seed(10)
	set_log_level("ERROR", "INFO")
	###############################################
	### End of Global Section ###
	###############################################

	class clsPredictIonIoT:
	    def __init__(self):
	        self.sleepTime = int(cf.conf['sleepTime'])
	        self.event1 = cf.conf['event1']
	        self.event2 = cf.conf['event2']

	    def forecastSeries(self, inputDf):
	        try:
	            sleepTime = self.sleepTime
	            event1 = self.event1
	            event2 = self.event2

	            df = inputDf

	            print('IoTData: ')
	            print(df)

	            ## user specified events
	            # history events
	            SummerEnd = p.DataFrame(event1)
	            LongWeekend = p.DataFrame(event2)

	            dfEvents = p.concat((SummerEnd, LongWeekend))

	            # NeuralProphet Object
	            # Adding events
	            m = NeuralProphet(loss_func="MSE")

	            # set the model to expect these events
	            m = m.add_events(["SummerEnd", "LongWeekend"])

	            # create the data df with events
	            historyDf = m.create_df_with_events(df, dfEvents)

	            # fit the model
	            metrics = m.fit(historyDf, freq="D")

	            # forecast with events known ahead
	            futureDf = m.make_future_dataframe(df=historyDf, events_df=dfEvents, periods=365, n_historic_predictions=len(df))
	            forecastDf = m.predict(df=futureDf)

	            events = forecastDf[(forecastDf['event_SummerEnd'].abs() + forecastDf['event_LongWeekend'].abs()) > 0]
	            events.tail()

	            ## plotting forecasts
	            fig = m.plot(forecastDf)

	            ## plotting components
	            figComp = m.plot_components(forecastDf)

	            ## plotting parameters
	            figParam = m.plot_parameters()

	            #################################
	            #### Train & Test Evaluation ####
	            #################################
	            m = NeuralProphet(seasonality_mode= "multiplicative", learning_rate = 0.1)

	            dfTrain, dfTest = m.split_df(df=df, freq="MS", valid_p=0.2)

	            metricsTrain = m.fit(df=dfTrain, freq="MS")
	            metricsTest = m.test(df=dfTest)

	            print('metricsTest:: ')
	            print(metricsTest)

	            # Predict Into Future
	            metricsTrain2 = m.fit(df=df, freq="MS")
	            futureDf = m.make_future_dataframe(df, periods=24, n_historic_predictions=48)
	            forecastDf = m.predict(futureDf)
	            fig = m.plot(forecastDf)

	            # Visualize training
	            m = NeuralProphet(seasonality_mode="multiplicative", learning_rate=0.1)
	            dfTrain, dfTest = m.split_df(df=df, freq="MS", valid_p=0.2)

	            metrics = m.fit(df=dfTrain, freq="MS", validation_df=dfTest, plot_live_loss=True)

	            print('Tail of Metrics: ')
	            print(metrics.tail(1))

	            ######################################
	            #### Time-series Cross-Validation ####
	            ######################################
	            METRICS = ['SmoothL1Loss', 'MAE', 'RMSE']
	            params = {"seasonality_mode": "multiplicative", "learning_rate": 0.1}

	            folds = NeuralProphet(**params).crossvalidation_split_df(df, freq="MS", k=5, fold_pct=0.20, fold_overlap_pct=0.5)

	            metricsTrain = p.DataFrame(columns=METRICS)
	            metricsTest = p.DataFrame(columns=METRICS)

	            for dfTrain, dfTest in folds:
	                m = NeuralProphet(**params)
	                train = m.fit(df=dfTrain, freq="MS")
	                test = m.test(df=dfTest)
	                metricsTrain = metricsTrain.append(train[METRICS].iloc[-1])
	                metricsTest = metricsTest.append(test[METRICS].iloc[-1])

	            print('Stats: ')
	            dfStats = metricsTest.describe().loc[["mean", "std", "min", "max"]]
	            print(dfStats)

	            ####################################
	            #### Using Benchmark Framework ####
	            ####################################
	            print('Starting extracting result set for Benchmark:')
	            ts = Dataset(df = df, name = "thermoStatsCPUUsage", freq = "MS")
	            params = {"seasonality_mode": "multiplicative"}
	            exp = SimpleExperiment(
	                model_class=NeuralProphetModel,
	                params=params,
	                data=ts,
	                metrics=["MASE", "RMSE"],
	                test_percentage=25,
	            )
	            resultTrain, resultTest = exp.run()

	            print('Test result for Benchmark:: ')
	            print(resultTest)
	            print('Finished extracting result test for Benchmark!')

	            ####################################
	            #### Cross Validate Experiment ####
	            ####################################
	            print('Starting extracting result set for Cross-Validation:')
	            ts = Dataset(df = df, name = "thermoStatsCPUUsage", freq = "MS")
	            params = {"seasonality_mode": "multiplicative"}
	            exp_cv = CrossValidationExperiment(
	                model_class=NeuralProphetModel,
	                params=params,
	                data=ts,
	                metrics=["MASE", "RMSE"],
	                test_percentage=10,
	                num_folds=3,
	                fold_overlap_pct=0,
	            )
	            resultTrain, resultTest = exp_cv.run()

	            print('resultTest for Cross Validation:: ')
	            print(resultTest)
	            print('Finished extracting result test for Cross-Validation!')

	            ######################################################
	            #### 3-Phase Train, Test & Validation Experiment ####
	            ######################################################
	            print('Starting 3-phase Train, Test & Validation Experiment!')

	            m = NeuralProphet(seasonality_mode= "multiplicative", learning_rate = 0.1)

	            # create a test holdout set:
	            dfTrainVal, dfTest = m.split_df(df=df, freq="MS", valid_p=0.2)
	            # create a validation holdout set:
	            dfTrain, dfVal = m.split_df(df=dfTrainVal, freq="MS", valid_p=0.2)

	            # fit a model on training data and evaluate on validation set.
	            metricsTrain1 = m.fit(df=dfTrain, freq="MS")
	            metrics_val = m.test(df=dfVal)

	            # refit model on training and validation data and evaluate on test set.
	            metricsTrain2 = m.fit(df=dfTrainVal, freq="MS")
	            metricsTest = m.test(df=dfTest)

	            metricsTrain1["split"] = "train1"
	            metricsTrain2["split"] = "train2"
	            metrics_val["split"] = "validate"
	            metricsTest["split"] = "test"
	            metrics_stat = metricsTrain1.tail(1).append([metricsTrain2.tail(1), metrics_val, metricsTest]).drop(columns=['RegLoss'])

	            print('Metrics Stat:: ')
	            print(metrics_stat)

	            # Train, Cross-Validate and Cross-Test evaluation
	            METRICS = ['SmoothL1Loss', 'MAE', 'RMSE']
	            params = {"seasonality_mode": "multiplicative", "learning_rate": 0.1}

	            crossVal, crossTest = NeuralProphet(**params).double_crossvalidation_split_df(df, freq="MS", k=5, valid_pct=0.10, test_pct=0.10)

	            metricsTrain1 = p.DataFrame(columns=METRICS)
	            metrics_val = p.DataFrame(columns=METRICS)
	            for dfTrain1, dfVal in crossVal:
	                m = NeuralProphet(**params)
	                # fit on the current fold (dfTrain1), not on the earlier dfTrain split
	                train1 = m.fit(df=dfTrain1, freq="MS")
	                val = m.test(df=dfVal)
	                metricsTrain1 = metricsTrain1.append(train1[METRICS].iloc[-1])
	                metrics_val = metrics_val.append(val[METRICS].iloc[-1])

	            metricsTrain2 = p.DataFrame(columns=METRICS)
	            metricsTest = p.DataFrame(columns=METRICS)
	            for dfTrain2, dfTest in crossTest:
	                m = NeuralProphet(**params)
	                train2 = m.fit(df=dfTrain2, freq="MS")
	                test = m.test(df=dfTest)
	                metricsTrain2 = metricsTrain2.append(train2[METRICS].iloc[-1])
	                metricsTest = metricsTest.append(test[METRICS].iloc[-1])

	            mtrain2 = metricsTrain2.describe().loc[["mean", "std"]]
	            print('Train 2 Stats:: ')
	            print(mtrain2)

	            mval = metrics_val.describe().loc[["mean", "std"]]
	            print('Validation Stats:: ')
	            print(mval)

	            mtest = metricsTest.describe().loc[["mean", "std"]]
	            print('Test Stats:: ')
	            print(mtest)

	            return 0
	        except Exception as e:
	            x = str(e)
	            print('Error: ', x)

	            return 1

Some of the key snippets that I will discuss here are as follows –

## user specified events
# history events
SummerEnd = p.DataFrame(event1)
LongWeekend = p.DataFrame(event2)

dfEvents = p.concat((SummerEnd, LongWeekend))

# NeuralProphet Object
# Adding events
m = NeuralProphet(loss_func="MSE")

# set the model to expect these events
m = m.add_events(["SummerEnd", "LongWeekend"])

# create the data df with events
historyDf = m.create_df_with_events(df, dfEvents)

Creating & adding events into your model will allow it to predict based on the milestones.

# fit the model
metrics = m.fit(historyDf, freq="D")

# forecast with events known ahead
futureDf = m.make_future_dataframe(df=historyDf, events_df=dfEvents, periods=365, n_historic_predictions=len(df))
forecastDf = m.predict(df=futureDf)

events = forecastDf[(forecastDf['event_SummerEnd'].abs() + forecastDf['event_LongWeekend'].abs()) > 0]
events.tail()

## plotting forecasts
fig = m.plot(forecastDf)

## plotting components
figComp = m.plot_components(forecastDf)

## plotting parameters
figParam = m.plot_parameters()

Based on the daily/monthly collected data, our algorithm tries to plot the data points & predict a future trend, which will look like this –

From the above diagram, we can conclude that the CPU’s trend has been growing day by day since the beginning. However, there are some events when we can see a momentary drop in requirements due to the climate & holidays. During those times, either people are not using them or are not at home.

Apart from that, I’ve demonstrated the use of the benchmark framework, & splitting the data into Train, Test & Validation sets & captured the RMSE values. I would request you to go through that & post any questions if you have any.
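For readers who want to see the idea behind the three-way split & the error metrics without the library, here is a minimal pure-Python sketch. The function names split_tail, rmse & mae are hypothetical, and the split logic is only an approximation of a chronological holdout; NeuralProphet's own split_df may differ in its details.

```python
import math

def split_tail(rows, valid_p):
    """Chronological split: the last valid_p share of rows becomes the holdout."""
    n_valid = int(len(rows) * valid_p)
    cut = len(rows) - n_valid
    return rows[:cut], rows[cut:]

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

series = list(range(100))                      # stand-in for 100 ordered observations

train_val, test = split_tail(series, 0.2)      # hold out the latest 20% for testing
train, val = split_tail(train_val, 0.2)        # then 20% of the rest for validation

print(len(train), len(val), len(test))
print(rmse([0, 0], [3, 3]), mae([0, 0], [3, 5]))
```

Applying valid_p=0.2 twice, as the 3-phase experiment does, leaves roughly 64% of the data for training, 16% for validation & 20% for testing. Keeping the splits in time order matters because a forecast should never be evaluated on data older than what it was trained on.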

You can also plot the train & validation datasets in the standard manner, which will look something like –

3. readingIoT.py (Main invoking script.)

	###############################################
	#### Written By: SATYAKI DE ####
	#### Written On: 21-Feb-2022 ####
	#### Modified On 21-Feb-2022 ####
	#### ####
	#### Objective: This python script will ####
	#### invoke the main class to use the ####
	#### stored historical IoT data & ####
	#### then transform, cleanse, predict & ####
	#### analyze the data points into more ####
	#### meaningful decision-making insights. ####
	###############################################

	# We keep the setup code in a different class as shown below.
	from clsConfig import clsConfig as cf

	import datetime
	import logging
	import pandas as p

	import clsPredictIonIoT as cpt
	###############################################
	### Global Section ###
	###############################################

	sep = str(cf.conf['SEP'])
	Curr_Path = str(cf.conf['INIT_PATH'])
	fileName = str(cf.conf['FILE_NAME'])

	###############################################
	### End of Global Section ###
	###############################################

	def main():
	    try:
	        # Other useful variables
	        debugInd = 'Y'
	        var = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
	        var1 = datetime.datetime.now()

	        # Initiating Prediction class
	        x1 = cpt.clsPredictIonIoT()

	        print('Start Time: ', str(var))
	        # End of useful variables

	        # Initiating Log Class
	        general_log_path = str(cf.conf['LOG_PATH'])

	        # Enabling Logging Info
	        logging.basicConfig(filename=general_log_path + 'IoT_NeuralProphet.log', level=logging.INFO)

	        # Reading the source IoT data
	        iotData = p.read_csv(fileName)
	        df = iotData.rename(columns={'MonthlyDate': 'ds', 'AvgIoTCPUUsage': 'y'})[['ds', 'y']]

	        r1 = x1.forecastSeries(df)

	        if (r1 == 0):
	            print('Successfully IoT forecast predicted!')
	        else:
	            print('Failed to predict IoT forecast!')

	        var2 = datetime.datetime.now()

	        # Elapsed wall-clock time between the start (var1) & the end (var2)
	        c = var2 - var1
	        minutes = c.total_seconds() / 60
	        print('Total Run Time in minutes: ', str(minutes))

	        print('End Time: ', str(var2))

	    except Exception as e:
	        x = str(e)
	        print('Error: ', x)

	if __name__ == "__main__":
	    main()


Here are some of the key snippets –

# Reading the source IoT data
iotData = p.read_csv(fileName)
df = iotData.rename(columns={'MonthlyDate': 'ds', 'AvgIoTCPUUsage': 'y'})[['ds', 'y']]

r1 = x1.forecastSeries(df)

if (r1 == 0):
    print('Successfully IoT forecast predicted!')
else:
    print('Failed to predict IoT forecast!')

var2 = datetime.datetime.now()

In the lines above, the main calling application renames the source columns to ds & y, invokes the neural-forecasting class & passes it the pandas dataframe containing the historical IoT data to train its model.
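
The rename step matters because the forecasting library expects the timestamp column to be called ds & the value column to be called y. The sketch below shows the same rename-and-select idea with only the standard library, so you can try it without installing anything. The sample rows & the extra `DeviceId` column are hypothetical; the real file comes from the FILE_NAME config entry.

```python
import csv
import io

# Hypothetical two-row sample; the real data is read from cf.conf['FILE_NAME'].
sample_csv = io.StringIO(
    "MonthlyDate,AvgIoTCPUUsage,DeviceId\n"
    "2021-01-01,41.5,A1\n"
    "2021-02-01,43.0,A1\n"
)

# Source column name -> name expected by the forecasting class.
column_map = {"MonthlyDate": "ds", "AvgIoTCPUUsage": "y"}

# Keep only the mapped columns and rename them in one pass.
rows = [
    {new: record[old] for old, new in column_map.items()}
    for record in csv.DictReader(sample_csv)
]
print(rows)
```

Every other column is dropped, which mirrors the `[['ds', 'y']]` selection in the script.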

For your information, here is the outcome of the run, when you invoke the main calling script –

FOLDER STRUCTURE:

Please find the folder structure as shown –

So, we’ve done it.

You will get the complete codebase in the following Github link.

I’ll bring some more exciting topics in the coming days from the Python verse. Please share & subscribe to my post & let me know your feedback.

Till then, Happy Avenging! 😀

Python-based dash framework visualizing real-time covid-19 trend.

Posted on September 9, 2021 by SatyakiDe in analytic function, api, Azure, call, cloud, code, comma-separated, Crossplatform, dashboard, Data Science, design, function, gui, html, integration, json, machine-learning, numpy, Pandas, pattern matching, prophet-api, Python, Real-time, sql, Technology, write

Hi Team,

We’ll enhance our last post on Covid-19 prediction & capture the results in a real-time dashboard, where the values in the visual display change as soon as the source data changes. In short, this is a genuinely real-time visual dashboard that refreshes all the graphs & trends whenever the third-party API source data changes.

However, I would like to share the run before we dig deep into this.

Demo Run

Architecture:

Let us understand the architecture for this solution –

From the above diagram, one can see that we’re maintaining a similar approach compared to our last initiative. However, we’ve used a different framework to display the data live.

To achieve this, we’ve used a compelling python-based framework called Dash. Other than that, we’ve used Ably, Plotly & Prophet API.

If you need to know more about our last post, please visit this link.

Package Installation:

Let us understand the packages that are required for this task.


And, here is the command to install those packages –

pip install pandas
pip install plotly
pip install prophet
pip install dash
pip install ably

Code:

Since this is an extension to our previous post, we’re not going to discuss the scripts that we’ve already covered there. Instead, we will talk about the enhanced scripts & the new scripts that are required for this use case.

1. clsConfig.py ( This native Python script contains the configuration entries. )

	################################################
	#### Written By: SATYAKI DE ####
	#### Written On: 15-May-2020 ####
	#### Modified On: 09-Sep-2021 ####
	#### ####
	#### Objective: This script is a config ####
	#### file, contains all the keys for ####
	#### Machine-Learning & streaming dashboard.####
	#### ####
	################################################

	import os
	import platform as pl

	class clsConfig(object):
	    Curr_Path = os.path.dirname(os.path.realpath(__file__))

	    os_det = pl.system()
	    if os_det == "Windows":
	        sep = '\\'
	    else:
	        sep = '/'

	    conf = {
	        'APP_ID': 1,
	        'ARCH_DIR': Curr_Path + sep + 'arch' + sep,
	        'PROFILE_PATH': Curr_Path + sep + 'profile' + sep,
	        'LOG_PATH': Curr_Path + sep + 'log' + sep,
	        'REPORT_PATH': Curr_Path + sep + 'report',
	        'FILE_NAME': Curr_Path + sep + 'data' + sep + 'TradeIn.csv',
	        'SRC_PATH': Curr_Path + sep + 'data' + sep,
	        'APP_DESC_1': 'Dash Integration with Ably!',
	        'DEBUG_IND': 'N',
	        'INIT_PATH': Curr_Path,
	        'SUBDIR' : 'data',
	        'ABLY_ID': 'XXX2LL.93kdkiU2:Kdsldoeie737484E',
	        "URL":"https://corona-api.com/countries/",
	        "appType":"application/json",
	        "conType":"keep-alive",
	        "limRec": 10,
	        "CACHE":"no-cache",
	        "MAX_RETRY": 3,
	        "coList": "DE, IN, US, CA, GB, ID, BR",
	        "FNC": "NewConfirmed",
	        "TMS": "ReportedDate",
	        "FND": "NewDeaths",
	        "FinData": "Cache.csv"
	    }


A few of the new entries, which are essential to this task are -> ABLY_ID & FinData.
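
The config script builds its paths by concatenating a platform-specific separator by hand. As an alternative sketch (this is not the code used in this post), `os.path.join` picks the right separator for you. The helper name `build_paths` & its `base_dir` argument are hypothetical, & only a few of the config keys are mirrored here.

```python
import os

def build_paths(base_dir):
    """Return a dict of paths mirroring a few keys of clsConfig.conf."""
    return {
        # A trailing '' makes os.path.join end the path with a separator.
        'LOG_PATH': os.path.join(base_dir, 'log', ''),
        'SRC_PATH': os.path.join(base_dir, 'data', ''),
        'FILE_NAME': os.path.join(base_dir, 'data', 'TradeIn.csv'),
        # The cache file name is kept relative, as in the original config.
        'FinData': 'Cache.csv',
    }

paths = build_paths(os.getcwd())
print(paths['LOG_PATH'])
```

With this approach, the Windows/non-Windows branch is no longer needed, since the standard library handles the separator.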

2. clsPublishStream.py ( This script will publish the data transformed for Covid-19 predictions from the third-party sources. )

	###############################################################
	#### ####
	#### Written By: Satyaki De ####
	#### Written Date: 26-Jul-2021 ####
	#### Modified Date: 08-Sep-2021 ####
	#### ####
	#### Objective: This script will publish real-time ####
	#### streaming data coming out from a hosted API ####
	#### sources using another popular third-party service ####
	#### named Ably. Ably mimics pubsub Streaming concept, ####
	#### which might be extremely useful for any start-ups. ####
	#### ####
	###############################################################

	from ably import AblyRest
	import logging
	import json

	from clsConfig import clsConfig as cf

	# Global Section

	logger = logging.getLogger('ably')
	logger.addHandler(logging.StreamHandler())

	ably_id = str(cf.conf['ABLY_ID'])

	ably = AblyRest(ably_id)
	channel = ably.channels.get('sd_channel')

	# End Of Global Section

	class clsPublishStream:
	    def __init__(self):
	        self.fnc = cf.conf['FNC']

	    def pushEvents(self, srcDF, debugInd, varVa, flg):
	        try:
	            # JSON data
	            # This is the default data for all the identified category
	            # we've prepared. You can extract this dynamically. Or, By
	            # default you can set their base trade details.

	            json_data = [{'Year_Mon': '201911', 'Brazil': 0.0, 'Canada': 0.0, 'Germany': 0.0, 'India': 0.0, 'Indonesia': 0.0, 'UnitedKingdom': 0.0, 'UnitedStates': 0.0, 'Status': flg},
	                         {'Year_Mon': '201912', 'Brazil': 0.0, 'Canada': 0.0, 'Germany': 0.0, 'India': 0.0, 'Indonesia': 0.0, 'UnitedKingdom': 0.0, 'UnitedStates': 0.0, 'Status': flg}]

	            jdata = json.dumps(json_data)

	            # Publish a message to the sd_channel channel
	            channel.publish('event', jdata)

	            # Capturing the inbound dataframe
	            iDF = srcDF

	            # Adding new selected points
	            covid_dict = iDF.to_dict('records')
	            jdata_fin = json.dumps(covid_dict)

	            # Publish rest of the messages to the sd_channel channel
	            channel.publish('event', jdata_fin)

	            jdata_fin = ''

	            return 0

	        except Exception as e:

	            x = str(e)
	            print(x)

	            logging.info(x)

	            return 1


We’ve already discussed this script. The only new line that appears here is –

json_data = [{'Year_Mon': '201911', 'Brazil': 0.0, 'Canada': 0.0, 'Germany': 0.0, 'India': 0.0, 'Indonesia': 0.0, 'UnitedKingdom': 0.0, 'UnitedStates': 0.0, 'Status': flg},
            {'Year_Mon': '201912', 'Brazil': 0.0, 'Canada': 0.0, 'Germany': 0.0, 'India': 0.0, 'Indonesia': 0.0, 'UnitedKingdom': 0.0, 'UnitedStates': 0.0, 'Status': flg}]

This statement is more like a dummy feed, which creates the basic structure of your graph.
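
If you would rather not hard-code the two placeholder rows, the same dummy feed can be generated. This is only a sketch: the helper name `skeleton_feed` is hypothetical, & the country columns & Year_Mon values are copied from the statement above.

```python
import json

# Country column names taken from the hard-coded feed above.
COUNTRIES = ['Brazil', 'Canada', 'Germany', 'India', 'Indonesia',
             'UnitedKingdom', 'UnitedStates']

def skeleton_feed(year_months, status):
    """Build zero-valued placeholder records, one per Year_Mon value."""
    records = []
    for year_mon in year_months:
        record = {'Year_Mon': year_mon}
        record.update({country: 0.0 for country in COUNTRIES})
        record['Status'] = status
        records.append(record)
    return records

json_data = skeleton_feed(['201911', '201912'], 'NewConfirmed')
jdata = json.dumps(json_data)
print(jdata)
```

The resulting `jdata` string has the same shape as the original payload, so it could be published on the channel in the same way.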

3. clsStreamConsume.py ( This script will consume the stream from the Ably queue. )

	##############################################
	#### Written By: SATYAKI DE ####
	#### Written On: 26-Jul-2021 ####
	#### Modified On 08-Sep-2021 ####
	#### ####
	#### Objective: Consuming Streaming data ####
	#### from Ably channels published by the ####
	#### callPredictCovidAnalysisRealtime.py ####
	#### ####
	##############################################

	import json
	from clsConfig import clsConfig as cf
	import requests
	import logging
	import time
	import pandas as p
	import clsL as cl

	from ably import AblyRest

	# Initiating Log class
	l = cl.clsL()

	class clsStreamConsume:
	    def __init__(self):
	        self.ably_id = str(cf.conf['ABLY_ID'])
	        self.fileName = str(cf.conf['FinData'])

	    def conStream(self, varVa, debugInd):
	        try:
	            ably_id = self.ably_id
	            fileName = self.fileName

	            var = varVa
	            debug_ind = debugInd

	            # Fetching the data
	            client = AblyRest(ably_id)
	            channel = client.channels.get('sd_channel')

	            message_page = channel.history()

	            # Counter Value
	            cnt = 0

	            # Declaring Global Data-Frame
	            df_conv = p.DataFrame()

	            for i in message_page.items:
	                print('Last Msg: {}'.format(i.data))
	                json_data = json.loads(i.data)

	                # Converting JSON to Dataframe
	                df = p.json_normalize(json_data)
	                df.columns = df.columns.map(lambda x: x.split(".")[-1])

	                if cnt == 0:
	                    df_conv = df
	                else:
	                    d_frames = [df_conv, df]
	                    df_conv = p.concat(d_frames)

	                cnt += 1

	            # Resetting the Index Value
	            df_conv.reset_index(drop=True, inplace=True)

	            # This will check whether the current load is happening
	            # or not. Based on that, it will capture the old events
	            # from cache.

	            if df_conv.empty:
	                # read_csv has no 'index' argument, so it is not passed here
	                df_conv = p.read_csv(fileName)
	            else:
	                l.logr(fileName, debug_ind, df_conv, 'log')

	            return df_conv

	        except Exception as e:

	            x = str(e)
	            print(x)

	            logging.info(x)

	            # This will handle the error scenario as well.
	            # Based on that, it will capture the old events
	            # from cache.

	            try:
	                df_conv = p.read_csv(self.fileName)
	            except Exception:
	                df_conv = p.DataFrame()

	            return df_conv


We’ve already discussed this script in one of my earlier posts, which you will get here.

So, I’m not going to discuss all the steps in detail.

The only added part was to introduce some temporary local caching mechanism.

if df_conv.empty:
    df_conv = p.read_csv(fileName)
else:
    l.logr(fileName, debug_ind, df_conv, 'log')
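
To make the caching idea concrete, here is a small standard-library sketch of the same fallback: when the stream returns rows, they are written to the cache file; when it returns nothing, the last cached rows are served instead. It assumes that `l.logr` in the script writes the dataframe to the cache file, which is not shown in this post. The helper name `load_with_cache` & the sample row are hypothetical.

```python
import csv
import os
import tempfile

def load_with_cache(live_rows, cache_file):
    """Return live rows when present (refreshing the cache); otherwise fall back to the cache."""
    if live_rows:
        with open(cache_file, 'w', newline='') as fh:
            writer = csv.DictWriter(fh, fieldnames=list(live_rows[0].keys()))
            writer.writeheader()
            writer.writerows(live_rows)
        return live_rows
    if os.path.exists(cache_file):
        with open(cache_file, newline='') as fh:
            return list(csv.DictReader(fh))
    # Neither live data nor a cache file is available.
    return []

cache_path = os.path.join(tempfile.mkdtemp(), 'Cache.csv')
first = load_with_cache([{'Year_Mon': '202108', 'India': '1200'}], cache_path)
second = load_with_cache([], cache_path)  # the stream is empty, so the cache is used
print(second)
```

This keeps the dashboard populated between two publish cycles, which is the purpose of the temporary cache in the script.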

4. callPredictCovidAnalysisRealtime.py ( Main calling script to fetch the COVID-19 data from the third-party source & then publish it to the Ably message queue after transforming the data & adding the prediction using Facebook’s prophet API. )

	##############################################
	#### Written By: SATYAKI DE ####
	#### Written On: 26-Jul-2021 ####
	#### Modified On 26-Jul-2021 ####
	#### ####
	#### Objective: Calling multiple API's ####
	#### that including Prophet-API developed ####
	#### by Facebook for future prediction of ####
	#### Covid-19 situations in upcoming days ####
	#### for world's major hotspots. ####
	##############################################

	import json

	import clsCovidAPI as ca
	from clsConfig import clsConfig as cf
	import datetime
	import logging
	import clsL as cl
	import math as m
	import clsPublishStream as cps

	import clsForecast as f

	from prophet import Prophet

	from prophet.plot import plot_plotly, plot_components_plotly

	import matplotlib.pyplot as plt
	import pandas as p
	import datetime as dt

	import time

	# Disabling warnings
	def warn(*args, **kwargs):
	pass

	import warnings
	warnings.warn = warn

	# Initiating Log class
	l = cl.clsL()

	# Helper function that maps a country code to its full country name
	def countryDet(inputCD):
	try:
	countryCD = inputCD

	if str(countryCD) == 'DE':
	cntCD = 'Germany'
	elif str(countryCD) == 'BR':
	cntCD = 'Brazil'
	elif str(countryCD) == 'GB':
	cntCD = 'UnitedKingdom'
	elif str(countryCD) == 'US':
	cntCD = 'UnitedStates'
	elif str(countryCD) == 'IN':
	cntCD = 'India'
	elif str(countryCD) == 'CA':
	cntCD = 'Canada'
	elif str(countryCD) == 'ID':
	cntCD = 'Indonesia'
	else:
	cntCD = 'N/A'

	return cntCD
	except:
	cntCD = 'N/A'

	return cntCD

	def lookupCountry(row):
	try:
	strCD = str(row['CountryCode'])

	retVal = countryDet(strCD)

	return retVal
	except:
	retVal = 'N/A'

	return retVal

	def adjustTrend(row):
	try:
	flTrend = float(row['trend'])
	flTrendUpr = float(row['trend_upper'])
	flTrendLwr = float(row['trend_lower'])

	retVal = m.trunc((flTrend + flTrendUpr + flTrendLwr)/3)

	if retVal < 0:
	retVal = 0

	return retVal
	except:
	retVal = 0

	return retVal

	def ceilTrend(row, colName):
	try:
	flTrend = str(row[colName])

	if '.' in flTrend:
	if float(flTrend) > 0:
	retVal = m.trunc(float(flTrend)) + 1
	else:
	retVal = m.trunc(float(flTrend))
	else:
	retVal = float(flTrend)

	if retVal < 0:
	retVal = 0

	return retVal
	except:
	retVal = 0

	return retVal

	def plot_picture(inputDF, debug_ind, var, countryCD, stat):
	try:
	iDF = inputDF

	# Lowercase the column names
	iDF.columns = [c.lower() for c in iDF.columns]
	# Determine which is Y axis
	y_col = [c for c in iDF.columns if c.startswith('y')][0]
	# Determine which is X axis
	x_col = [c for c in iDF.columns if c.startswith('ds')][0]

	# Data Conversion
	iDF['y'] = iDF[y_col].astype('float')
	iDF['ds'] = iDF[x_col].astype('datetime64[ns]')

	# Forecast calculations
	# Decreasing the changepoint_prior_scale to 0.001 to make the trend less flexible
	m = Prophet(n_changepoints=20, yearly_seasonality=True, changepoint_prior_scale=0.001)
	#m = Prophet(n_changepoints=20, yearly_seasonality=True, changepoint_prior_scale=0.04525)
	#m = Prophet(n_changepoints=['2021-09-10'])
	m.fit(iDF)

	forecastDF = m.make_future_dataframe(periods=365)

	forecastDF = m.predict(forecastDF)

	l.logr('15.forecastDF_' + var + '_' + countryCD + '.csv', debug_ind, forecastDF, 'log')

	# Taking an explicit copy so that the new columns are added safely
	df_M = forecastDF[['ds', 'trend', 'trend_lower', 'trend_upper']].copy()

	l.logr('16.df_M_' + var + '_' + countryCD + '.csv', debug_ind, df_M, 'log')

	# Getting Full Country Name
	cntCD = countryDet(countryCD)

	# Draw forecast results
	df_M['Country'] = cntCD

	l.logr('17.df_M_C_' + var + '_' + countryCD + '.csv', debug_ind, df_M, 'log')

	df_M['AdjustTrend'] = df_M.apply(lambda row: adjustTrend(row), axis=1)

	l.logr('20.df_M_AdjustTrend_' + var + '_' + countryCD + '.csv', debug_ind, df_M, 'log')

	return df_M

	except Exception as e:
	x = str(e)
	print(x)

	df = p.DataFrame()

	return df

	def countrySpecificDF(counryDF, val):
	try:
	countryName = val
	df = counryDF

	df_lkpFile = df[(df['CountryCode'] == val)]

	return df_lkpFile
	except:
	df = p.DataFrame()

	return df

	def toNum(row, colName):
	try:
	flTrend = str(row[colName])
	flTr, subpart = flTrend.split(' ')
	retVal = int(flTr.replace('-',''))

	return retVal
	except:
	retVal = 0

	return retVal

	def extractPredictedDF(OrigDF, MergePredictedDF, colName):
	try:
	iDF_1 = OrigDF
	iDF_2 = MergePredictedDF

	dt_format = '%Y-%m-%d'

	iDF_1_max_group = iDF_1.groupby(["Country"] , as_index=False)["ReportedDate"].max()

	iDF_2['ReportedDate'] = iDF_2.apply(lambda row: toNum(row, 'ds'), axis=1)

	col_one_list = iDF_1_max_group['Country'].tolist()
	col_two_list = iDF_1_max_group['ReportedDate'].tolist()

	print('col_one_list: ', str(col_one_list))
	print('col_two_list: ', str(col_two_list))

	cnt_1_x = 1
	cnt_1_y = 1
	cnt_x = 0

	df_M = p.DataFrame()

	for i in col_one_list:
	str_countryVal = str(i)
	cnt_1_y = 1

	for j in col_two_list:

	intReportDate = int(str(j).strip().replace('-',''))

	if cnt_1_x == cnt_1_y:
	print('str_countryVal: ', str(str_countryVal))
	print('intReportDate: ', str(intReportDate))

	iDF_2_M = iDF_2[(iDF_2['Country'] == str_countryVal) & (iDF_2['ReportedDate'] > intReportDate)]

	# Merging with the previous Country Code data
	if cnt_x == 0:
	df_M = iDF_2_M
	else:
	d_frames = [df_M, iDF_2_M]
	df_M = p.concat(d_frames)

	cnt_x += 1

	cnt_1_y += 1

	cnt_1_x += 1

	df_M.drop(columns=['ReportedDate'], axis=1, inplace=True)
	df_M.rename(columns={'ds':'ReportedDate'}, inplace=True)
	df_M.rename(columns={'AdjustTrend':colName}, inplace=True)

	return df_M
	except:
	df = p.DataFrame()

	return df

	def toPivot(inDF, colName):
	try:
	iDF = inDF

	iDF_Piv = iDF.pivot_table(colName, ['ReportedDate'], 'Country')
	iDF_Piv.reset_index( drop=False, inplace=True )

	list1 = ['ReportedDate']

	iDF_Arr = iDF['Country'].unique()
	list2 = iDF_Arr.tolist()

	listV = list1 + list2

	iDF_Piv = iDF_Piv.reindex(listV, axis=1)

	return iDF_Piv
	except Exception as e:
	x = str(e)
	print(x)

	df = p.DataFrame()

	return df

	def toAgg(inDF, var, debugInd, flg):
	try:
	iDF = inDF
	colName = "ReportedDate"

	list1 = list(iDF.columns.values)
	list1.remove(colName)

	list1 = ["Brazil", "Canada", "Germany", "India", "Indonesia", "UnitedKingdom", "UnitedStates"]

	iDF['Year_Mon'] = iDF[colName].apply(lambda x:x.strftime('%Y%m'))
	iDF.drop(columns=[colName], axis=1, inplace=True)

	ColNameGrp = "Year_Mon"
	print('List1 Aggregate:: ', str(list1))
	print('ColNameGrp :: ', str(ColNameGrp))

	iDF_T = iDF[["Year_Mon", "Brazil", "Canada", "Germany", "India", "Indonesia", "UnitedKingdom", "UnitedStates"]]
	iDF_T.fillna(0, inplace = True)
	print('iDF_T:: ')
	print(iDF_T)

	iDF_1_max_group = iDF_T.groupby(ColNameGrp, as_index=False)[list1].sum()
	iDF_1_max_group['Status'] = flg

	return iDF_1_max_group
	except Exception as e:
	x = str(e)
	print(x)

	df = p.DataFrame()

	return df

	def publishEvents(inDF1, inDF2, inDF3, inDF4, var, debugInd):
	try:
	# Original Covid Data from API
	iDF1 = inDF1
	iDF2 = inDF2

	NC = 'NewConfirmed'
	ND = 'NewDeaths'

	iDF1_PV = toPivot(iDF1, NC)
	iDF1_PV['ReportedDate'] = p.to_datetime(iDF1_PV['ReportedDate'])
	l.logr('57.iDF1_PV_' + var + '.csv', debugInd, iDF1_PV, 'log')

	iDF2_PV = toPivot(iDF2, ND)
	iDF2_PV['ReportedDate'] = p.to_datetime(iDF2_PV['ReportedDate'])
	l.logr('58.iDF2_PV_' + var + '.csv', debugInd, iDF2_PV, 'log')

	# Predicted Covid Data from Facebook API
	iDF3 = inDF3
	iDF4 = inDF4

	iDF3_PV = toPivot(iDF3, NC)
	l.logr('59.iDF3_PV_' + var + '.csv', debugInd, iDF3_PV, 'log')

	iDF4_PV = toPivot(iDF4, ND)
	l.logr('60.iDF4_PV_' + var + '.csv', debugInd, iDF4_PV, 'log')

	# Now aggregating data based on year-month only
	iDF1_Agg = toAgg(iDF1_PV, var, debugInd, NC)
	l.logr('61.iDF1_Agg_' + var + '.csv', debugInd, iDF1_Agg, 'log')

	iDF2_Agg = toAgg(iDF2_PV, var, debugInd, ND)
	l.logr('62.iDF2_Agg_' + var + '.csv', debugInd, iDF2_Agg, 'log')

	iDF3_Agg = toAgg(iDF3_PV, var, debugInd, NC)
	l.logr('63.iDF3_Agg_' + var + '.csv', debugInd, iDF3_Agg, 'log')

	iDF4_Agg = toAgg(iDF4_PV, var, debugInd, ND)
	l.logr('64.iDF4_Agg_' + var + '.csv', debugInd, iDF4_Agg, 'log')

	# Initiating Ably class to push events
	x1 = cps.clsPublishStream()

	# Pushing both the Historical Confirmed Cases
	retVal_1 = x1.pushEvents(iDF1_Agg, debugInd, var, NC)

	if retVal_1 == 0:
	print('Successfully historical event pushed!')
	else:
	print('Failed to push historical events!')

	# Pushing both the Historical Death Cases
	retVal_3 = x1.pushEvents(iDF2_Agg, debugInd, var, ND)

	if retVal_3 == 0:
	print('Successfully historical event pushed!')
	else:
	print('Failed to push historical events!')

	time.sleep(5)

	# Pushing both the New Confirmed Cases
	retVal_2 = x1.pushEvents(iDF3_Agg, debugInd, var, NC)

	if retVal_2 == 0:
	print('Successfully predicted event pushed!')
	else:
	print('Failed to push predicted events!')

	# Pushing both the New Death Cases
	retVal_4 = x1.pushEvents(iDF4_Agg, debugInd, var, ND)

	if retVal_4 == 0:
	print('Successfully predicted event pushed!')
	else:
	print('Failed to push predicted events!')


	return 0
	except Exception as e:
	x = str(e)

	print(x)

	return 1

	def main():
	try:
	var1 = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
	print('*' * 60)
	DInd = 'Y'
	NC = 'New Confirmed'
	ND = 'New Dead'
	SM = 'data process Successful!'
	FM = 'data process Failure!'

	print("Calling the custom Package for large file splitting..")
	print('Start Time: ' + str(var1))

	countryList = str(cf.conf['coList']).split(',')

	# Initiating Log Class
	general_log_path = str(cf.conf['LOG_PATH'])

	# Enabling Logging Info
	logging.basicConfig(filename=general_log_path + 'CovidAPI.log', level=logging.INFO)

	# Create the instance of the Covid API Class
	x1 = ca.clsCovidAPI()

	# Let's pass this to our map section
	retDF = x1.searchQry(var1, DInd)

	retVal = int(retDF.shape[0])

	if retVal > 0:
	print('Successfully Covid Data Extracted from the API-source.')
	else:
	print('Something wrong with your API-source!')

	# Extracting Skeleton Data
	df = retDF[['data.code', 'date', 'deaths', 'confirmed', 'recovered', 'new_confirmed', 'new_recovered', 'new_deaths', 'active']]

	df.columns = ['CountryCode', 'ReportedDate', 'TotalReportedDead', 'TotalConfirmedCase', 'TotalRecovered', 'NewConfirmed', 'NewRecovered', 'NewDeaths', 'ActiveCaases']

	df.dropna()

	print('Returned Skeleton Data Frame: ')
	print(df)

	l.logr('5.df_' + var1 + '.csv', DInd, df, 'log')

	# Due to source data issue, application will perform of
	# avg of counts based on dates due to multiple entries
	g_df = df.groupby(["CountryCode", "ReportedDate"] , as_index=False)[["TotalReportedDead","TotalConfirmedCase","TotalRecovered","NewConfirmed","NewRecovered","NewDeaths","ActiveCaases"]].mean()
	g_df['TotalReportedDead_M'] = g_df.apply(lambda row: ceilTrend(row, 'TotalReportedDead'), axis=1)
	g_df['TotalConfirmedCase_M'] = g_df.apply(lambda row: ceilTrend(row, 'TotalConfirmedCase'), axis=1)
	g_df['TotalRecovered_M'] = g_df.apply(lambda row: ceilTrend(row, 'TotalRecovered'), axis=1)
	g_df['NewConfirmed_M'] = g_df.apply(lambda row: ceilTrend(row, 'NewConfirmed'), axis=1)
	g_df['NewRecovered_M'] = g_df.apply(lambda row: ceilTrend(row, 'NewRecovered'), axis=1)
	g_df['NewDeaths_M'] = g_df.apply(lambda row: ceilTrend(row, 'NewDeaths'), axis=1)
	g_df['ActiveCaases_M'] = g_df.apply(lambda row: ceilTrend(row, 'ActiveCaases'), axis=1)

	# Dropping old columns
	g_df.drop(columns=['TotalReportedDead', 'TotalConfirmedCase', 'TotalRecovered', 'NewConfirmed', 'NewRecovered', 'NewDeaths', 'ActiveCaases'], axis=1, inplace=True)

	# Renaming the new columns to old columns
	g_df.rename(columns={'TotalReportedDead_M':'TotalReportedDead'}, inplace=True)
	g_df.rename(columns={'TotalConfirmedCase_M':'TotalConfirmedCase'}, inplace=True)
	g_df.rename(columns={'TotalRecovered_M':'TotalRecovered'}, inplace=True)
	g_df.rename(columns={'NewConfirmed_M':'NewConfirmed'}, inplace=True)
	g_df.rename(columns={'NewRecovered_M':'NewRecovered'}, inplace=True)
	g_df.rename(columns={'NewDeaths_M':'NewDeaths'}, inplace=True)
	g_df.rename(columns={'ActiveCaases_M':'ActiveCaases'}, inplace=True)

	l.logr('5.g_df_' + var1 + '.csv', DInd, g_df, 'log')

	# Working with forecast
	# Create the instance of the Forecast API Class
	x2 = f.clsForecast()

	# Fetching each country name & then get the details
	cnt = 6
	cnt_x = 0
	cnt_y = 0

	df_M_Confirmed = p.DataFrame()
	df_M_Deaths = p.DataFrame()

	for i in countryList:
	try:
	cntryIndiv = i.strip()

	cntryFullName = countryDet(cntryIndiv)

	print('Country Processing: ' + str(cntryFullName))

	# Creating dataframe for each country
	# Germany Main DataFrame
	dfCountry = countrySpecificDF(g_df, cntryIndiv)
	l.logr(str(cnt) + '.df_' + cntryIndiv + '_' + var1 + '.csv', DInd, dfCountry, 'log')

	# Let's pass this to our map section
	retDFGenNC = x2.forecastNewConfirmed(dfCountry, DInd, var1)

	statVal = str(NC)

	a1 = plot_picture(retDFGenNC, DInd, var1, cntryIndiv, statVal)

	# Merging with the previous Country Code data
	if cnt_x == 0:
	df_M_Confirmed = a1
	else:
	d_frames = [df_M_Confirmed, a1]
	df_M_Confirmed = p.concat(d_frames)

	cnt_x += 1

	retDFGenNC_D = x2.forecastNewDead(dfCountry, DInd, var1)

	statVal = str(ND)

	a2 = plot_picture(retDFGenNC_D, DInd, var1, cntryIndiv, statVal)

	# Merging with the previous Country Code data
	if cnt_y == 0:
	df_M_Deaths = a2
	else:
	d_frames = [df_M_Deaths, a2]
	df_M_Deaths = p.concat(d_frames)

	cnt_y += 1

	# Printing Proper message
	# plot_picture returns an empty dataframe on failure
	if not a1.empty and not a2.empty:
	oprMsg = cntryFullName + ' ' + SM
	print(oprMsg)
	else:
	oprMsg = cntryFullName + ' ' + FM
	print(oprMsg)

	# Resetting the dataframe value for the next iteration
	dfCountry = p.DataFrame()
	cntryIndiv = ''
	oprMsg = ''
	cntryFullName = ''
	a1 = 0
	a2 = 0
	statVal = ''

	cnt += 1
	except Exception as e:
	x = str(e)
	print(x)

	l.logr('49.df_M_Confirmed_' + var1 + '.csv', DInd, df_M_Confirmed, 'log')
	l.logr('50.df_M_Deaths_' + var1 + '.csv', DInd, df_M_Deaths, 'log')

	# Removing unwanted columns
	df_M_Confirmed.drop(columns=['trend', 'trend_lower', 'trend_upper'], axis=1, inplace=True)
	df_M_Deaths.drop(columns=['trend', 'trend_lower', 'trend_upper'], axis=1, inplace=True)

	l.logr('51.df_M_Confirmed_' + var1 + '.csv', DInd, df_M_Confirmed, 'log')
	l.logr('52.df_M_Deaths_' + var1 + '.csv', DInd, df_M_Deaths, 'log')

	# Creating original dataframe from the source API
	df_M_Confirmed_Orig = g_df[['CountryCode', 'ReportedDate','NewConfirmed']]
	df_M_Deaths_Orig = g_df[['CountryCode', 'ReportedDate','NewDeaths']]

	# Transforming Country Code
	df_M_Confirmed_Orig['Country'] = df_M_Confirmed_Orig.apply(lambda row: lookupCountry(row), axis=1)
	df_M_Deaths_Orig['Country'] = df_M_Deaths_Orig.apply(lambda row: lookupCountry(row), axis=1)

	# Dropping unwanted column
	df_M_Confirmed_Orig.drop(columns=['CountryCode'], axis=1, inplace=True)
	df_M_Deaths_Orig.drop(columns=['CountryCode'], axis=1, inplace=True)

	# Reordering columns
	df_M_Confirmed_Orig = df_M_Confirmed_Orig.reindex(['ReportedDate','Country','NewConfirmed'], axis=1)
	df_M_Deaths_Orig = df_M_Deaths_Orig.reindex(['ReportedDate','Country','NewDeaths'], axis=1)

	l.logr('53.df_M_Confirmed_Orig_' + var1 + '.csv', DInd, df_M_Confirmed_Orig, 'log')
	l.logr('54.df_M_Deaths_Orig_' + var1 + '.csv', DInd, df_M_Deaths_Orig, 'log')

	# Filter out only the predicted data
	filterDF_1 = extractPredictedDF(df_M_Confirmed_Orig, df_M_Confirmed, 'NewConfirmed')
	l.logr('55.filterDF_1_' + var1 + '.csv', DInd, filterDF_1, 'log')

	filterDF_2 = extractPredictedDF(df_M_Deaths_Orig, df_M_Deaths, 'NewDeaths')
	l.logr('56.filterDF_2_' + var1 + '.csv', DInd, filterDF_2, 'log')

	# Calling the final publish events
	retVa = publishEvents(df_M_Confirmed_Orig, df_M_Deaths_Orig, filterDF_1, filterDF_2, var1, DInd)

	if retVa == 0:
	print('Successfully stream processed!')
	else:
	print('Failed to process stream!')


	var2 = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
	print('End Time: ' + str(var2))
	print('*' * 60)

	except Exception as e:
	x = str(e)

	print(x)

	if __name__ == "__main__":
	main()


Let us understand the enhancement part of this script –

We’ve taken out the plotly part as we will use a separate dashboard script to visualize the data trend.

However, we need to understand the initial consumed data from API & how we transform the data, which will be helpful for visualization.

The initial captured data should look like this after extracting only the relevant elements from the API response.

As you can see, for each country & reported date, our application consumes attributes like Total-Reported-Death, Total-Recovered, New-Death, New-Confirmed & so on.

From this list, we’ve taken two attributes for our use cases & they are New-Death & New-Confirmed. Also, we’re predicting the Future-New-Death & Future-New-Confirmed based on the historical data using Facebook’s prophet API.

And, we will pivot them so that the countries become columns for better representation.

Hence, here is the code that we should be exploring –

def toPivot(inDF, colName):
    try:
        iDF = inDF

        iDF_Piv = iDF.pivot_table(colName, ['ReportedDate'], 'Country')
        iDF_Piv.reset_index( drop=False, inplace=True )

        list1 = ['ReportedDate']

        iDF_Arr = iDF['Country'].unique()
        list2 = iDF_Arr.tolist()

        listV = list1 + list2

        iDF_Piv = iDF_Piv.reindex(listV, axis=1)

        return iDF_Piv
    except Exception as e:
        x = str(e)
        print(x)

        df = p.DataFrame()

        return df

Now, using the pivot_table function, we’re transposing the row values into the columns. And, later we’ve realigned the column heading as per our desired format.
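
If the pivot step feels abstract, the following library-free sketch shows the same reshaping on three made-up rows: one row per date & country goes in, & one row per date with a column per country comes out. The helper name `pivot_by_country` & the numbers are illustrative assumptions. Note that pivot_table averages duplicate entries by default, whereas this simplified version just keeps the last value it sees.

```python
from collections import defaultdict

# Illustrative long-format rows; the numbers are made up.
rows = [
    {'ReportedDate': '2021-09-01', 'Country': 'India', 'NewConfirmed': 100},
    {'ReportedDate': '2021-09-01', 'Country': 'Brazil', 'NewConfirmed': 80},
    {'ReportedDate': '2021-09-02', 'Country': 'India', 'NewConfirmed': 120},
]

def pivot_by_country(records, value_col):
    """Turn one-row-per-(date, country) records into one-row-per-date dicts."""
    wide = defaultdict(dict)
    for rec in records:
        wide[rec['ReportedDate']][rec['Country']] = rec[value_col]
    # One output row per date, ordered by date.
    return [{'ReportedDate': day, **wide[day]} for day in sorted(wide)]

pivoted = pivot_by_country(rows, 'NewConfirmed')
print(pivoted)
```

A country with no entry for a given date is simply missing from that row here; in the pandas version it shows up as an empty value, which the aggregation step later fills with zero.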

However, we still have the data as per individual daily dates in this case. We want to eliminate that by removing the day part & then aggregating the values by month, as shown below –

And, here is the code for that –

def toAgg(inDF, var, debugInd, flg):
    try:
        iDF = inDF
        colName = "ReportedDate"

        list1 = list(iDF.columns.values)
        list1.remove(colName)

        list1 = ["Brazil", "Canada", "Germany", "India", "Indonesia", "UnitedKingdom", "UnitedStates"]

        iDF['Year_Mon'] = iDF[colName].apply(lambda x:x.strftime('%Y%m'))
        iDF.drop(columns=[colName], axis=1, inplace=True)

        ColNameGrp = "Year_Mon"
        print('List1 Aggregate:: ', str(list1))
        print('ColNameGrp :: ', str(ColNameGrp))

        iDF_T = iDF[["Year_Mon", "Brazil", "Canada", "Germany", "India", "Indonesia", "UnitedKingdom", "UnitedStates"]]
        iDF_T.fillna(0, inplace = True)
        print('iDF_T:: ')
        print(iDF_T)

        iDF_1_max_group = iDF_T.groupby(ColNameGrp, as_index=False)[list1].sum()
        iDF_1_max_group['Status'] = flg

        return iDF_1_max_group
    except Exception as e:
        x = str(e)
        print(x)

        df = p.DataFrame()

        return df

From the above snippet, we can conclude that the application strips the day part from each date & then aggregates the values by the Year_Mon attribute.
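
The same month-level roll-up can be sketched without pandas. The example below formats each date as Year_Mon, treats missing values as zero & sums per country. The helper name `aggregate_by_month` & the sample figures are hypothetical & only two countries are used to keep it short.

```python
from collections import defaultdict
from datetime import date

# Illustrative daily rows; None stands for a missing value.
daily = [
    {'ReportedDate': date(2021, 8, 30), 'India': 10, 'Brazil': 5},
    {'ReportedDate': date(2021, 8, 31), 'India': 20, 'Brazil': None},
    {'ReportedDate': date(2021, 9, 1), 'India': 30, 'Brazil': 7},
]

def aggregate_by_month(records, countries, status):
    """Sum daily values per country for every Year_Mon bucket."""
    totals = defaultdict(lambda: dict.fromkeys(countries, 0))
    for rec in records:
        year_mon = rec['ReportedDate'].strftime('%Y%m')
        for country in countries:
            # Missing values count as 0, like fillna(0) in the script.
            totals[year_mon][country] += rec.get(country) or 0
    return [{'Year_Mon': ym, **totals[ym], 'Status': status} for ym in sorted(totals)]

monthly = aggregate_by_month(daily, ['India', 'Brazil'], 'NewConfirmed')
print(monthly)
```

Each output row carries the Status flag, which is how the dashboard can tell confirmed cases apart from deaths later on.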

The following snippet will push the final transformed data to the Ably queue –

x1 = cps.clsPublishStream()

# Pushing both the Historical Confirmed Cases
retVal_1 = x1.pushEvents(iDF1_Agg, debugInd, var, NC)

if retVal_1 == 0:
    print('Successfully historical event pushed!')
else:
    print('Failed to push historical events!')
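
If you want to try the publish flow without an Ably account, you can swap the channel for a stand-in object. The sketch below is only an assumption-laden illustration: `FakeChannel` & `push_events` are hypothetical names, & the fake channel merely records what would have been published. It follows the same convention as the script, where 0 means success & 1 means failure.

```python
import json

class FakeChannel:
    """Hypothetical stand-in for an Ably channel; it only records what is published."""
    def __init__(self):
        self.messages = []

    def publish(self, name, data):
        self.messages.append((name, data))

def push_events(channel, records):
    """Publish the records as one JSON message; return 0 on success, 1 on failure."""
    try:
        channel.publish('event', json.dumps(records))
        return 0
    except Exception as exc:
        print(exc)
        return 1

channel = FakeChannel()
ret = push_events(channel, [{'Year_Mon': '202109', 'India': 30.0, 'Status': 'NewConfirmed'}])

if ret == 0:
    print('Successfully historical event pushed!')
else:
    print('Failed to push historical events!')
```

Because the caller only checks the return code, the real channel & the stand-in are interchangeable from its point of view.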

5. dashboard_realtime.py ( Main calling script to consume the data from Ably queue & then visualize the trend. )

	##############################################
	#### Written By: SATYAKI DE ####
	#### Written On: 08-Sep-2021 ####
	#### Modified On 08-Sep-2021 ####
	#### ####
	#### Objective: This is the main script ####
	#### to invoke dashboard after consuming ####
	#### streaming real-time predicted data ####
	#### using Facebook API & Ably message Q. ####
	#### ####
	#### This script will show the trend ####
	#### comparison between major democracies ####
	#### of the world. ####
	#### ####
	##############################################

	import datetime

	import dash
	from dash import dcc
	from dash import html
	import plotly
	from dash.dependencies import Input, Output
	from ably import AblyRest

	from clsConfig import clsConfig as cf
	import pandas as p

	# Main Class to consume streaming
	import clsStreamConsume as ca

	import numpy as np

	# Create the instance of the Covid API Class
	x1 = ca.clsStreamConsume()

	external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']

	app = dash.Dash(__name__, external_stylesheets=external_stylesheets)

	app.layout = html.Div(
	    html.Div([
	        html.H1("Covid-19 Trend Dashboard",
	                className='text-center text-primary mb-4'),
	        html.H5(children='''
	            Dash: Covid-19 Trend - (Present Vs Future)
	        '''),
	        html.P("Covid-19: New Confirmed Cases:",
	               style={"textDecoration": "underline"}),
	        dcc.Graph(id='live-update-graph-1'),
	        html.P("Covid-19: New Death Cases:",
	               style={"textDecoration": "underline"}),
	        dcc.Graph(id='live-update-graph-2'),
	        dcc.Interval(
	            id='interval-component',
	            interval=5*1000, # in milliseconds
	            n_intervals=0
	        )
	    ], className="row", style={'marginBottom': 10, 'marginTop': 10})
	)

	def to_OptimizeString(row):
	    try:
	        x_str = str(row['Year_Mon'])

	        dt_format = '%Y%m%d'
	        finStr = x_str + '01'

	        strReportDate = datetime.datetime.strptime(finStr, dt_format)

	        return strReportDate

	    except Exception as e:
	        x = str(e)
	        print(x)

	        dt_format = '%Y%m%d'
	        var = '20990101'

	        # The module is imported as "datetime", so the class needs the full path
	        strReportDate = datetime.datetime.strptime(var, dt_format)

	        return strReportDate

	def fetchEvent(var1, DInd):
	    try:
	        # Let's pass this to our map section
	        iDF_M = x1.conStream(var1, DInd)

	        # Converting Year_Mon to dates
	        iDF_M['Year_Mon_Mod']= iDF_M.apply(lambda row: to_OptimizeString(row), axis=1)

	        # Dropping old columns
	        iDF_M.drop(columns=['Year_Mon'], axis=1, inplace=True)

	        #Renaming new column to old column
	        iDF_M.rename(columns={'Year_Mon_Mod':'Year_Mon'}, inplace=True)

	        return iDF_M

	    except Exception as e:
	        x = str(e)
	        print(x)

	        iDF_M = p.DataFrame()

	        return iDF_M

	# Multiple components can update everytime interval gets fired.
	@app.callback(Output('live-update-graph-1', 'figure'),
	              Input('interval-component', 'n_intervals'))
	def update_graph_live(n):
	    try:
	        var1 = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
	        print('*' * 60)
	        DInd = 'Y'

	        # Let's pass this to our map section
	        retDF = fetchEvent(var1, DInd)

	        # Create the graph with subplots
	        #fig = plotly.tools.make_subplots(rows=2, cols=1, shared_xaxes=True, vertical_spacing=0.3, horizontal_spacing=0.2)
	        fig = plotly.tools.make_subplots(rows=2, cols=1, vertical_spacing=0.3, horizontal_spacing=0.2)

	        # Routing data to dedicated DataFrame
	        retDFNC = retDF.loc[(retDF['Status'] == 'NewConfirmed')]

	        # Adding different chart into one dashboard
	        # First Use Case - New Confirmed
	        fig.append_trace({'x':retDFNC.Year_Mon,'y':retDFNC.Brazil,'type':'scatter','name':'Brazil'},1,1)
	        fig.append_trace({'x':retDFNC.Year_Mon,'y':retDFNC.Canada,'type':'scatter','name':'Canada'},1,1)
	        fig.append_trace({'x':retDFNC.Year_Mon,'y':retDFNC.Germany,'type':'scatter','name':'Germany'},1,1)
	        fig.append_trace({'x':retDFNC.Year_Mon,'y':retDFNC.India,'type':'scatter','name':'India'},1,1)
	        fig.append_trace({'x':retDFNC.Year_Mon,'y':retDFNC.Indonesia,'type':'scatter','name':'Indonesia'},1,1)
	        fig.append_trace({'x':retDFNC.Year_Mon,'y':retDFNC.UnitedKingdom,'type':'scatter','name':'United Kingdom'},1,1)
	        fig.append_trace({'x':retDFNC.Year_Mon,'y':retDFNC.UnitedStates,'type':'scatter','name':'United States'},1,1)

	        return fig

	    except Exception as e:
	        x = str(e)
	        print(x)

	        # Create the graph with subplots
	        fig = plotly.tools.make_subplots(rows=2, cols=1, vertical_spacing=0.2)

	        fig['layout']['margin'] = {
	            'l': 30, 'r': 10, 'b': 30, 't': 10
	        }
	        fig['layout']['legend'] = {'x': 0, 'y': 1, 'xanchor': 'left'}

	        return fig

	# Multiple components can update everytime interval gets fired.
	@app.callback(Output('live-update-graph-2', 'figure'),
	              Input('interval-component', 'n_intervals'))
	def update_graph_live(n):
	    try:
	        var1 = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
	        print('*' * 60)
	        DInd = 'Y'

	        # Let's pass this to our map section
	        retDF = fetchEvent(var1, DInd)

	        # Create the graph with subplots
	        #fig = plotly.tools.make_subplots(rows=2, cols=1, shared_xaxes=True, vertical_spacing=0.3, horizontal_spacing=0.2)
	        fig = plotly.tools.make_subplots(rows=2, cols=1, vertical_spacing=0.3, horizontal_spacing=0.2)

	        # Routing data to dedicated DataFrame
	        retDFND = retDF.loc[(retDF['Status'] == 'NewDeaths')]

	        # Adding different chart into one dashboard
	        # Second Use Case - New Deaths
	        fig.append_trace({'x':retDFND.Year_Mon,'y':retDFND.Brazil,'type':'bar','name':'Brazil'},1,1)
	        fig.append_trace({'x':retDFND.Year_Mon,'y':retDFND.Canada,'type':'bar','name':'Canada'},1,1)
	        fig.append_trace({'x':retDFND.Year_Mon,'y':retDFND.Germany,'type':'bar','name':'Germany'},1,1)
	        fig.append_trace({'x':retDFND.Year_Mon,'y':retDFND.India,'type':'bar','name':'India'},1,1)
	        fig.append_trace({'x':retDFND.Year_Mon,'y':retDFND.Indonesia,'type':'bar','name':'Indonesia'},1,1)
	        fig.append_trace({'x':retDFND.Year_Mon,'y':retDFND.UnitedKingdom,'type':'bar','name':'United Kingdom'},1,1)
	        fig.append_trace({'x':retDFND.Year_Mon,'y':retDFND.UnitedStates,'type':'bar','name':'United States'},1,1)

	        return fig

	    except Exception as e:
	        x = str(e)
	        print(x)

	        # Create the graph with subplots
	        fig = plotly.tools.make_subplots(rows=2, cols=1, vertical_spacing=0.2)

	        fig['layout']['margin'] = {
	            'l': 30, 'r': 10, 'b': 30, 't': 10
	        }
	        fig['layout']['legend'] = {'x': 0, 'y': 1, 'xanchor': 'left'}

	        return fig

	if __name__ == '__main__':
	    app.run_server(debug=True)

Let us explore the critical snippet as this is a brand new script –

external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']

app = dash.Dash(__name__, external_stylesheets=external_stylesheets)

app.layout = html.Div(
    html.Div([
        html.H1("Covid-19 Trend Dashboard",
                        className='text-center text-primary mb-4'),
        html.H5(children='''
            Dash: Covid-19 Trend - (Present Vs Future)
        '''),
        html.P("Covid-19: New Confirmed Cases:",
               style={"textDecoration": "underline"}),
        dcc.Graph(id='live-update-graph-1'),
        html.P("Covid-19: New Death Cases:",
               style={"textDecoration": "underline"}),
        dcc.Graph(id='live-update-graph-2'),
        dcc.Interval(
            id='interval-component',
            interval=5*1000, # in milliseconds
            n_intervals=0
        )
    ], className="row", style={'marginBottom': 10, 'marginTop': 10})
)

You need to understand the basics of HTML as this framework works seamlessly with it. To know more about the supported HTML, one needs to visit the following link.

def to_OptimizeString(row):
    try:
        x_str = str(row['Year_Mon'])

        dt_format = '%Y%m%d'
        finStr = x_str + '01'

        strReportDate = datetime.datetime.strptime(finStr, dt_format)

        return strReportDate

    except Exception as e:
        x = str(e)
        print(x)

        dt_format = '%Y%m%d'
        var = '20990101'

        # The module is imported as "datetime", so the class needs the full path
        strReportDate = datetime.datetime.strptime(var, dt_format)

        return strReportDate

The application is converting Year-Month combinations from string to date for better projection.
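For larger frames, the same conversion can be done on the whole column at once instead of row by row. The sketch below is an alternative, not what the script above does; it assumes Year_Mon arrives as a 'YYYYMM' string and reuses the 2099-01-01 fallback date from the snippet.

```python
import pandas as pd

# Hypothetical Year_Mon values, including one that cannot be parsed
df = pd.DataFrame({"Year_Mon": ["202108", "202109", "bad"]})

# errors="coerce" turns unparseable values into NaT instead of raising
parsed = pd.to_datetime(df["Year_Mon"], format="%Y%m", errors="coerce")

# Mirror the fallback date used in to_OptimizeString for rows that fail to parse
df["Year_Mon"] = parsed.fillna(pd.Timestamp("2099-01-01"))
print(df)
```

Each valid value becomes the first day of its month, which is what appending '01' achieves in the original function.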

Also, we’ve implemented a dashboard that will refresh every five seconds, as the dcc.Interval component is configured with interval=5*1000 milliseconds.

def fetchEvent(var1, DInd):
    try:
        # Let's pass this to our map section
        iDF_M = x1.conStream(var1, DInd)

        # Converting Year_Mon to dates
        iDF_M['Year_Mon_Mod']= iDF_M.apply(lambda row: to_OptimizeString(row), axis=1)

        # Dropping old columns
        iDF_M.drop(columns=['Year_Mon'], axis=1, inplace=True)

        #Renaming new column to old column
        iDF_M.rename(columns={'Year_Mon_Mod':'Year_Mon'}, inplace=True)

        return iDF_M

    except Exception as e:
        x = str(e)
        print(x)

        iDF_M = p.DataFrame()

        return iDF_M

The application will consume all the events from the Ably Queue using the above snippet.

@app.callback(Output('live-update-graph-1', 'figure'),
              Input('interval-component', 'n_intervals'))
def update_graph_live(n):

We’ve implemented the callback mechanism to get the latest data from the queue, update the graph accordingly & return the refreshed figure to Dash, which invokes the callback every time the interval fires.

# Routing data to dedicated DataFrame
retDFNC = retDF.loc[(retDF['Status'] == 'NewConfirmed')]

Based on the flag, we’re pushing the data into our target dataframe, from where the application will consume the data into the charts.
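If both charts need their own slice, the routing can be done in one pass. This is a small sketch under the assumption that the consumed frame carries a Status column with the two flags used in this post; the sample values are invented.

```python
import pandas as pd

# Hypothetical combined frame holding both kinds of events
events = pd.DataFrame({
    "Year_Mon": ["202108", "202108", "202109", "202109"],
    "Brazil": [30, 2, 5, 1],
    "Status": ["NewConfirmed", "NewDeaths", "NewConfirmed", "NewDeaths"],
})

# One sub-frame per Status value, keyed by the flag
by_status = {status: frame for status, frame in events.groupby("Status")}

confirmed = by_status["NewConfirmed"]
deaths = by_status["NewDeaths"]
print(confirmed)
```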

fig.append_trace({'x':retDFNC.Year_Mon,'y':retDFNC.Brazil,'type':'scatter','name':'Brazil'},1,1)
fig.append_trace({'x':retDFNC.Year_Mon,'y':retDFNC.Canada,'type':'scatter','name':'Canada'},1,1)
fig.append_trace({'x':retDFNC.Year_Mon,'y':retDFNC.Germany,'type':'scatter','name':'Germany'},1,1)
fig.append_trace({'x':retDFNC.Year_Mon,'y':retDFNC.India,'type':'scatter','name':'India'},1,1)
fig.append_trace({'x':retDFNC.Year_Mon,'y':retDFNC.Indonesia,'type':'scatter','name':'Indonesia'},1,1)
fig.append_trace({'x':retDFNC.Year_Mon,'y':retDFNC.UnitedKingdom,'type':'scatter','name':'United Kingdom'},1,1)
fig.append_trace({'x':retDFNC.Year_Mon,'y':retDFNC.UnitedStates,'type':'scatter','name':'United States'},1,1)

Each country’s KPI values are fetched & mapped onto the corresponding axes to project the graph with visual details.
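Since the seven calls differ only in the column & the legend label, the trace dictionaries could also be built in a loop. The helper `build_traces` & the `COUNTRIES` mapping below are hypothetical names introduced for this sketch; it only prepares the dictionaries & does not need plotly to run.

```python
import pandas as pd

# Column name in the dataframe -> label shown in the legend
COUNTRIES = {
    "Brazil": "Brazil",
    "Canada": "Canada",
    "Germany": "Germany",
    "India": "India",
    "Indonesia": "Indonesia",
    "UnitedKingdom": "United Kingdom",
    "UnitedStates": "United States",
}

def build_traces(df, chart_type):
    """Return one trace dict per country column that is present in df."""
    return [
        {"x": df["Year_Mon"], "y": df[col], "type": chart_type, "name": label}
        for col, label in COUNTRIES.items()
        if col in df.columns
    ]

# Hypothetical frame with only two of the seven countries
sample = pd.DataFrame({
    "Year_Mon": ["202108", "202109"],
    "Brazil": [30, 5],
    "UnitedKingdom": [4, 6],
})

traces = build_traces(sample, "scatter")
print([t["name"] for t in traces])
```

Each dictionary has the same shape as the ones passed to fig.append_trace(..., 1, 1) above, so the callback could iterate over the list & append them one by one.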

The same approach goes for the other graph as well.

Run:

Let us run the application –

Dashboard:

So, we’ve done it.

You will get the complete codebase in the following GitHub link.

I’ll bring some more exciting topics in the coming days from the Python verse.

Till then, Happy Avenging! 😀

Note: All the data & scenarios posted here are representational, available over the internet & for educational purposes only.

One more thing you need to understand is that this prediction is based on limited data points. The actual events may unfold differently. Ideally, countries take a cue from this kind of analysis & initiate appropriate measures to avoid the high curve. And, that is one of the main objectives of time series analysis.

There is always room for improvement in this kind of model & the solution associated with it. I’ve shown the basic ways to achieve the same for educational purposes only.

Predicting Flipkart business growth factor using Linear-Regression Machine Learning Model

Posted on May 16, 2020 (updated March 11, 2021) by SatyakiDe in call, code, features, function, Linear-Regression, machine-learning, member function, Pandas, pattern matching, Python, regexp_substr, regular expression, snippet, String Manipulation

Hi Guys,

Today, we’ll be exploring the potential business growth factor using the “Linear-Regression Machine Learning” model. We’ve prepared a set of dummy data & we’ll base our predictions on that.

Let’s explore a few sample data –

So, based on these data, we would like to predict YearlyAmountSpent dependent on any one of the following features, i.e. [ Time On App / Time On Website / Flipkart Membership Duration (In Year) ].

You need to install the following packages –

pip install pandas
pip install matplotlib
pip install scikit-learn

We’ll be discussing only the main calling script & class script. However, we’ll be posting the parameters without discussing them. And, we won’t discuss clsL.py as we’ve already discussed that in our previous post.

1. clsConfig.py (This script contains all the parameter details.)

################################################
#### Written By: SATYAKI DE                 ####
#### Written On: 15-May-2020                ####
####                                        ####
#### Objective: This script is a config     ####
#### file, contains all the keys for        ####
#### Machine-Learning. Application will     ####
#### process these information & perform    ####
#### various analysis on Linear-Regression. ####
################################################

import os
import platform as pl

class clsConfig(object):
    Curr_Path = os.path.dirname(os.path.realpath(__file__))

    os_det = pl.system()
    if os_det == "Windows":
        sep = '\\'
    else:
        sep = '/'

    config = {
        'APP_ID': 1,
        'ARCH_DIR': Curr_Path + sep + 'arch' + sep,
        'PROFILE_PATH': Curr_Path + sep + 'profile' + sep,
        'LOG_PATH': Curr_Path + sep + 'log' + sep,
        'REPORT_PATH': Curr_Path + sep + 'report',
        'FILE_NAME': Curr_Path + sep + 'Data' + sep + 'FlipkartCustomers.csv',
        'SRC_PATH': Curr_Path + sep + 'Data' + sep,
        'APP_DESC_1': 'IBM Watson Language Understand!',
        'DEBUG_IND': 'N',
        'INIT_PATH': Curr_Path
    }

2. clsLinearRegression.py (This is the main script, which will invoke the Machine-Learning API & return 0 if successful.)

##############################################
#### Written By: SATYAKI DE               ####
#### Written On: 15-May-2020              ####
#### Modified On 15-May-2020              ####
####                                      ####
#### Objective: Main scripts for Linear   ####
#### Regression.                          ####
##############################################

import pandas as p
import numpy as np
import regex as re

import matplotlib.pyplot as plt
from clsConfig import clsConfig as cf

# %matplotlib inline -- for Jupyter Notebook
class clsLinearRegression:
    def __init__(self):
        self.fileName =  cf.config['FILE_NAME']

    def predictResult(self):
        try:

            inputFileName = self.fileName

            # Reading from Input File
            df = p.read_csv(inputFileName)

            print()
            print('Projecting sample rows: ')
            print(df.head())

            print()
            x_row = df.shape[0]
            x_col = df.shape[1]

            print('Total Number of Rows: ', x_row)
            print('Total Number of columns: ', x_col)

            # Adding Features
            x = df[['TimeOnApp', 'TimeOnWebsite', 'FlipkartMembershipInYear']]

            # Target Variable - Trying to predict
            y = df['YearlyAmountSpent']

            # Now Train-Test Split of your source data
            from sklearn.model_selection import train_test_split

            # test_size => % of allocated data for your test cases
            # random_state => A specific set of random split on your data
            X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.4, random_state=101)

            # Importing Model
            from sklearn.linear_model import LinearRegression

            # Creating an Instance
            lm = LinearRegression()

            # Train or Fit my model on Training Data
            lm.fit(X_train, Y_train)

            # Creating a prediction value
            flipKartSalePrediction = lm.predict(X_test)

            # Creating a scatter plot based on Actual Value & Predicted Value
            plt.scatter(Y_test, flipKartSalePrediction)

            # Adding meaningful Label
            plt.xlabel('Actual Values')
            plt.ylabel('Predicted Values')

            # Checking Individual Metrics
            from sklearn import metrics

            print()
            mea_val = metrics.mean_absolute_error(Y_test, flipKartSalePrediction)
            print('Mean Absolute Error (MAE): ', mea_val)

            mse_val = metrics.mean_squared_error(Y_test, flipKartSalePrediction)
            print('Mean Square Error (MSE): ', mse_val)

            rmse_val = np.sqrt(metrics.mean_squared_error(Y_test, flipKartSalePrediction))
            print('Square root Mean Square Error (RMSE): ', rmse_val)

            print()

            # Check Variance Score - R^2 Value
            print('Variance Score:')
            var_score = str(round(metrics.explained_variance_score(Y_test, flipKartSalePrediction) * 100, 2)).strip()
            print('Our Model is', var_score, '% accurate. ')
            print()

            # Finding Coefficient on X_train.columns
            print()
            print('Finding Coefficient: ')

            cedf = p.DataFrame(lm.coef_, x.columns, columns=['Coefficient'])
            print('Printing the All the Factors: ')
            print(cedf)

            print()

            # Getting the Max Value from it
            cedf['MaxFactorForBusiness'] = cedf['Coefficient'].max()

            # Filtering the max Value to identify the biggest Business factor
            dfMax = cedf[(cedf['MaxFactorForBusiness'] == cedf['Coefficient'])]

            # Dropping the derived column
            # drop() without inplace avoids modifying a filtered slice of cedf
            dfMax = dfMax.drop(columns=['MaxFactorForBusiness'])
            dfMax = dfMax.reset_index()

            print(dfMax)

            # Extracting Actual Business Factor from Pandas dataframe
            str_factor_temp = str(dfMax.iloc[0]['index'])
            # Raw strings keep \g<1> from being treated as an invalid escape sequence
            str_factor = re.sub(r"([a-z])([A-Z])", r"\g<1> \g<2>", str_factor_temp)
            str_value = str(round(float(dfMax.iloc[0]['Coefficient']),2))

            print()
            print('*' * 80)
            print('Major Business Activity - (', str_factor, ') - ', str_value, '%')
            print('*' * 80)
            print()

            # This is required when you run this as a conventional script
            # & not from a Jupyter notebook.
            plt.show()

            return 0

        except Exception  as e:
            x = str(e)
            print('Error : ', x)

            return 1

Key lines from the above snippet –

# Adding Features
x = df[['TimeOnApp', 'TimeOnWebsite', 'FlipkartMembershipInYear']]

Our application creates a subset of the main dataframe, which contains all the features.

# Target Variable - Trying to predict
y = df['YearlyAmountSpent']

Now, the application assigns the target variable to ‘y’.

# Now Train-Test Split of your source data
from sklearn.model_selection import train_test_split

# test_size => % of allocated data for your test cases
# random_state => A specific set of random split on your data
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.4, random_state=101)

As per “Supervised Learning,” our application is splitting the dataset into two subsets: one to train the model & another to test the final model. Here, test_size=0.4 keeps 40% of the rows for testing. With a large dataset, you can divide the data into three sets (training, validation & test), so that performance can be checked on the validation set while tuning. In our case, we don’t need that, as the dataset is small.
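For reference, a three-way split can be sketched by calling train_test_split twice. The data below is randomly generated & the 60/20/20 proportions are an assumption chosen for illustration; this post itself uses only the two-way split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 rows with 3 features
rng = np.random.default_rng(101)
X = rng.normal(size=(100, 3))
y = rng.normal(size=100)

# First carve out 20% of the rows as the final test set ...
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=101)

# ... then split the remaining 80 rows 75/25 into training & validation sets
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=101)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```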

# Train or Fit my model on Training Data
lm.fit(X_train, Y_train)

Our application now trains (fits) the model on the training data.

# Creating a scatter plot based on Actual Value & Predicted Value
plt.scatter(Y_test, flipKartSalePrediction)

Our application plots the actual values against the predicted values in a scatterplot.

Also, our program captures the following concepts. For more details, I’ve provided the external link for your reference –

And, the implementation is shown below –

mea_val = metrics.mean_absolute_error(Y_test, flipKartSalePrediction)
print('Mean Absolute Error (MAE): ', mea_val)

mse_val = metrics.mean_squared_error(Y_test, flipKartSalePrediction)
print('Mean Square Error (MSE): ', mse_val)

rmse_val = np.sqrt(metrics.mean_squared_error(Y_test, flipKartSalePrediction))
print('Square Root Mean Square Error (RMSE): ', rmse_val)
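To see what these three numbers measure, here is a hand-rolled version using plain NumPy on four made-up values. It is only meant to show the formulas; the script above relies on sklearn.metrics.

```python
import numpy as np

# Hypothetical actual & predicted values
actual = np.array([100.0, 200.0, 300.0, 400.0])
predicted = np.array([110.0, 190.0, 310.0, 380.0])

errors = actual - predicted            # [-10, 10, -10, 20]

mae = np.mean(np.abs(errors))          # average size of the error: 12.5
mse = np.mean(errors ** 2)             # average squared error: 175.0
rmse = np.sqrt(mse)                    # back in the original unit: about 13.23

print(mae, mse, round(rmse, 2))
```

MSE & RMSE penalize the single larger miss (20) more heavily than MAE does, which is why RMSE comes out above MAE here.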

At this moment, we would like to check the credibility of our model by using the variance score, as follows –

var_score = str(round(metrics.explained_variance_score(Y_test, flipKartSalePrediction) * 100, 2)).strip()
print('Our Model is', var_score, '% accurate. ')
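A word of caution on reading this number: the explained variance score is not the same thing as R², & the two agree only when the prediction errors average out to zero. The NumPy sketch below computes both on made-up values to show the difference; it is an illustration of the formulas, not a replacement for the sklearn call.

```python
import numpy as np

# Hypothetical actual & predicted values (mean error is 2.5, not zero)
actual = np.array([100.0, 200.0, 300.0, 400.0])
predicted = np.array([110.0, 190.0, 310.0, 380.0])
residuals = actual - predicted

# Explained variance: 1 - Var(residuals) / Var(actual)
explained_variance = 1 - np.var(residuals) / np.var(actual)

# R^2: 1 - (sum of squared residuals) / (total sum of squares)
r_squared = 1 - np.sum(residuals ** 2) / np.sum((actual - actual.mean()) ** 2)

print(round(explained_variance * 100, 2), round(r_squared * 100, 2))
```

Because the predictions are slightly biased, the explained variance (about 98.65%) comes out a little higher than R² (about 98.6%).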

Finally, we extract the coefficients to find out which particular feature will lead Flipkart to better sales & growth, by taking the maximum coefficient value among all the features, as shown below –

cedf = p.DataFrame(lm.coef_, x.columns, columns=['Coefficient'])

# Getting the Max Value from it
cedf['MaxFactorForBusiness'] = cedf['Coefficient'].max()

# Filtering the max Value to identify the biggest Business factor
dfMax = cedf[(cedf['MaxFactorForBusiness'] == cedf['Coefficient'])]

# Dropping the derived column
# drop() without inplace avoids modifying a filtered slice of cedf
dfMax = dfMax.drop(columns=['MaxFactorForBusiness'])
dfMax = dfMax.reset_index()
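The same lookup can be written more compactly with idxmax, which returns the index label of the largest value. The coefficient values below are invented for the sketch & are not the output of the model in this post.

```python
import pandas as pd

# Hypothetical coefficients, indexed by feature name as in cedf
coefficients = pd.DataFrame(
    {"Coefficient": [38.5, 0.4, 61.2]},
    index=["TimeOnApp", "TimeOnWebsite", "FlipkartMembershipInYear"],
)

# Label of the row with the largest coefficient & its value
top_feature = coefficients["Coefficient"].idxmax()
top_value = coefficients.loc[top_feature, "Coefficient"]

print(top_feature, top_value)
```

Keep in mind that raw coefficients are comparable across features only when the features are measured on similar scales.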

Note that we’ve used a regular expression to split the camel-case column name from our feature & represent that with a much more meaningful name without changing the column name.

# Extracting Actual Business Factor from Pandas dataframe
str_factor_temp = str(dfMax.iloc[0]['index'])
# Raw strings keep \g<1> from being treated as an invalid escape sequence
str_factor = re.sub(r"([a-z])([A-Z])", r"\g<1> \g<2>", str_factor_temp)
str_value = str(round(float(dfMax.iloc[0]['Coefficient']),2))

print('Major Business Activity - (', str_factor, ') - ', str_value, '%')
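Here is the camel-case split on its own, so that it is easy to try. The sketch uses the standard-library re module instead of the third-party regex package imported in the script; for this pattern, the two are assumed to behave the same. The helper name `split_camel_case` is made up for the example.

```python
import re

def split_camel_case(name):
    """Insert a space wherever a lowercase letter is followed by an uppercase one."""
    return re.sub(r"([a-z])([A-Z])", r"\g<1> \g<2>", name)

print(split_camel_case("FlipkartMembershipInYear"))  # Flipkart Membership In Year
print(split_camel_case("TimeOnApp"))                 # Time On App
```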

3. callLinear.py (This is the first calling script.)

##############################################
#### Written By: SATYAKI DE               ####
#### Written On: 15-May-2020              ####
#### Modified On 15-May-2020              ####
####                                      ####
#### Objective: Main calling scripts.     ####
##############################################

from clsConfig import clsConfig as cf
import clsL as cl
import logging
import datetime
import clsLinearRegression as cw

# Disabling Warning
def warn(*args, **kwargs):
    pass

import warnings
warnings.warn = warn

# Lookup functions from
# Azure cloud SQL DB

var = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")

def main():
    try:
        ret_1 = 0
        general_log_path = str(cf.config['LOG_PATH'])

        # Enabling Logging Info
        logging.basicConfig(filename=general_log_path + 'MachineLearning_LinearRegression.log', level=logging.INFO)

        # Initiating Log Class
        l = cl.clsL()

        # Moving previous day log files to archive directory
        log_dir = cf.config['LOG_PATH']
        curr_ver =datetime.datetime.now().strftime("%Y-%m-%d")

        tmpR0 = "*" * 157

        logging.info(tmpR0)
        tmpR9 = 'Start Time: ' + str(var)
        logging.info(tmpR9)
        logging.info(tmpR0)

        print("Log Directory::", log_dir)
        tmpR1 = 'Log Directory::' + log_dir
        logging.info(tmpR1)

        print('Machine Learning - Linear Regression Prediction : ')
        print('-' * 200)

        # Create the instance of the Linear-Regression Class
        x2 = cw.clsLinearRegression()

        ret = x2.predictResult()

        if ret == 0:
            print('Successful Linear-Regression Prediction Generated!')
        else:
            print('Failed to generate Linear-Regression Prediction!')

        print("-" * 200)
        print()

        print('Finding Analysis points..')
        print("*" * 200)
        logging.info('Finding Analysis points..')
        logging.info(tmpR0)


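        # Capture the time at this point; var still holds the start time
        tmpR10 = 'End Time: ' + datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")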
        logging.info(tmpR10)
        logging.info(tmpR0)

    except ValueError as e:
        print(str(e))
        logging.info(str(e))

    except Exception as e:
        # Python 3 exceptions have no .message attribute; str(e) gives the text
        print("Top level Error: args:{0}, message:{1}".format(e.args, str(e)))

if __name__ == "__main__":
    main()

Key snippet from the above script –

# Create the instance of the Linear-Regression
x2 = cw.clsLinearRegression()

ret = x2.predictResult()

In the above snippet, our application first creates an instance of the main class & then invokes the “predictResult” method.

Let’s run our application –

Step 1:

First, the application will fetch the following sample rows from our source file – if it is successful.

Step 2:

Then, it will create the following scatterplot by executing the following snippet –

# Creating a scatter plot based on Actual Value & Predicted Value
plt.scatter(Y_test, flipKartSalePrediction)

Note that our model is pretty accurate & it has a balanced success rate compared to our predicted numbers.

Step 3:

Finally, it successfully projects the critical feature, as shown below –

From the above picture, you can see that our model is pretty accurate (89% approx).

Also, the highlighted red square identifies the key features & their confidence scores & finally, the winning feature is marked in green.

So, as per that, we’ve come to the conclusion that Flipkart’s business growth depends on the tenure of its subscribers, i.e., long-standing members are prone to buy more than newer members.

Let’s look into our directory structure –

So, we’ve done it.

I’ll be posting another new post in the coming days. Till then, Happy Avenging! 😀

Note: All the data posted here are representational, available over the internet & for educational purposes only.
