Today, I’ll be discussing a short but critical Python topic: capturing performance metrics by analyzing memory profiling.
We’ll take an ordinary script & then use this package to analyze it.
But, before we start, why don’t we see the demo & then go through it?
Demo
Isn’t it exciting? Let us understand it in detail.
For this, we’ve used the following package –
pip install memory-profiler
How can you run this?
All you have to do is modify your existing Python function & decorate it with the “@profile” keyword. And this will open up a brand-new source of information for you.
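For instance, a minimal sketch of this idea (using a hypothetical allocate() function that is not part of the actual project) might look like this –
# Hypothetical toy example; run it via: python -m memory_profiler toyProfile.py
@profile
def allocate():
    # Build a large list & then release half of it
    data = [i for i in range(1000000)]
    del data[:500000]
    return len(data)

if __name__ == "__main__":
    allocate()
And here is the actual script that we’ll be profiling –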
#####################################################
#### Written By: SATYAKI DE ####
#### Written On: 22-Jul-2022 ####
#### Modified On 30-Aug-2022 ####
#### ####
#### Objective: This is the main calling ####
#### python script that will invoke the ####
#### clsReadForm class to initiate ####
#### the reading capability in real-time ####
#### & display text from a formatted forms. ####
#####################################################
# We keep the setup code in a different class as shown below.
import clsReadForm as rf
from clsConfig import clsConfig as cf
import datetime
import logging
###############################################
### Global Section ###
###############################################
# Instantiating all the main class
x1 = rf.clsReadForm()
###############################################
### End of Global Section ###
###############################################
@profile
def main():
    try:
        # Other useful variables
        debugInd = 'Y'
        var = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        var1 = datetime.datetime.now()

        print('Start Time: ', str(var))
        # End of useful variables

        # Initiating Log Class
        general_log_path = str(cf.conf['LOG_PATH'])

        # Enabling Logging Info
        logging.basicConfig(filename=general_log_path + 'readingForm.log', level=logging.INFO)

        print('Started extracting text from formatted forms!')

        # Execute all the pass
        r1 = x1.startProcess(debugInd, var)

        if (r1 == 0):
            print('Successfully extracted text from the formatted forms!')
        else:
            print('Failed to extract the text from the formatted forms!')

        var2 = datetime.datetime.now()

        c = var2 - var1
        minutes = c.total_seconds() / 60
        print('Total difference in minutes: ', str(minutes))

        print('End Time: ', str(var1))

    except Exception as e:
        x = str(e)
        print('Error: ', x)

if __name__ == "__main__":
    main()
Let us analyze the code. As you can see, we’ve taken a normal Python main function & marked it with the @profile decorator.
The next step is to run the following command –
python -m memory_profiler readingForm.py
This will trigger the script & it will collect all the memory information against individual lines & display it as shown in the demo.
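Note that if you prefer to run the script with the plain python command instead of the -m memory_profiler switch, you can import the decorator explicitly. This is a small optional variation, not part of the original script –
# Optional variation: importing the decorator explicitly makes
# "python readingForm.py" work as well, without the -m switch.
from memory_profiler import profile

@profile
def main():
    # Same body as the original main() shown above
    pass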
I think this will give all Python developers great insight into the quality of the code they have developed. To know more about this, you can visit the following link.
I’ll bring some more exciting topics from the Python-verse in the coming days. Please share & subscribe to my post & let me know your feedback.
Till then, Happy Avenging! 🙂
Note: All the data & scenarios posted here are representational & available over the internet for educational purposes only. Some of the images (except my photo) that we’ve used are available over the net. We don’t claim ownership of these images. There is always room for improvement, especially in the prediction quality.
This week, we’re going to extend one of our earlier posts & try to read the entire text from a video stream using computer vision. If you want to view the previous post, please click the following link.
But, before we proceed, why don’t we view the demo first?
Demo
Architecture:
Let us understand the architecture flow –
Architecture flow
The above diagram shows that the application, which uses OpenCV, analyzes individual frames from the source, extracts the complete text within the video & displays it on top of the target screen, besides printing the same in the console.
Let us now understand the code. For this use case, we will only discuss three Python scripts, although the solution needs more than these three. We have already discussed the rest in some of the earlier posts. Hence, we will skip them here.
clsReadingTextFromStream.py (This is the main class of python script that will extract the text from the WebCAM streaming in real-time.)
Please find the key snippet from the above script –
# Two output layer names for the text detector model
lNames = cf.conf['LAYER_DET']
# Tesseract OCR text param values
strVal = "-l " + str(cf.conf['LANG']) + " --oem " + str(cf.conf['OEM_VAL']) + " --psm " + str(cf.conf['PSM_VAL']) + ""
config = (strVal)
The first line contains the two output layers’ names for the text detector model. Among them, the first one indicates the outcome probability & the second one is used to derive the bounding box coordinates of the predicted text.
The second line contains various options for the Tesseract APIs. You need to understand these options in detail to make them work. These are the essential ones for our use case –
Language – The intended language, for example, English, Spanish, Hindi, Bengali, etc.
OEM flag – In this case, the application will use 4 to indicate LSTM neural net model for OCR.
PSM value – In this case, the selected value is 7, indicating that the application treats the ROI as a single line of text.
For more details, please refer to the config file.
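To make this concrete, the assembled option string might look like the sketch below. The specific values (eng, 1 & 7) are placeholder assumptions only; the actual values come from the LANG, OEM_VAL & PSM_VAL keys of the config file –
# Hypothetical values; the real ones live in clsConfig
langVal = 'eng'   # OCR language
oemVal = 1        # OCR engine mode
psmVal = 7        # page segmentation mode (single line of text)

config = "-l " + str(langVal) + " --oem " + str(oemVal) + " --psm " + str(psmVal)
print(config)     # -l eng --oem 1 --psm 7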
print("[INFO] Loading Text Detector...")
net = cv2.dnn.readNet(modelPath)
The above lines bring the already created model & load it to memory for evaluation.
# Setting new width and height and then determine the ratio in change
# for both the width and height
(newW, newH) = (wt, ht)
rW = origW / float(newW)
rH = origH / float(newH)
# Resize the frame and grab the new frame dimensions
frame = cv2.resize(frame, (newW, newH))
(H, W) = frame.shape[:2]
# Construct a blob from the frame and then perform a forward pass of
# the model to obtain the two output layer sets
blob = cv2.dnn.blobFromImage(frame, 1.0, (W, H), sParam, swapRB=True, crop=False)
net.setInput(blob)
(confScore, imgGeo) = net.forward(lNames)
# Decode the predictions, then apply non-maxima suppression to
# suppress weak, overlapping bounding boxes
(rects, confidences) = self.predictText(confScore, imgGeo)
boxes = non_max_suppression(np.array(rects), probs=confidences)
The above lines are more of preparing individual frames to get the bounding box by resizing the height & width followed by a forward pass of the model to obtain two output layer sets. And then apply the non-maxima suppression to remove the weak, overlapping bounding box by interpreting the prediction. In short, this will identify the potential text region & put the bounding box surrounding it.
# Initialize the list of results
res = []
# Getting BoundingBox boundaries
res = self.findBoundBox(boxes, res, rW, rH, orig, origW, origH, pad)
The above function will create the bounding box surrounding the predicted text regions. Also, we will capture the expected text inside the result variable.
for (spX, spY, epX, epY) in boxes:
    # Scale the bounding box coordinates based on the respective
    # ratios
    spX = int(spX * rW)
    spY = int(spY * rH)
    epX = int(epX * rW)
    epY = int(epY * rH)

    # To obtain a better OCR of the text we can potentially
    # apply a bit of padding surrounding the bounding box.
    # And, computing the deltas in both the x and y directions
    dX = int((epX - spX) * pad)
    dY = int((epY - spY) * pad)

    # Apply padding to each side of the bounding box, respectively
    spX = max(0, spX - dX)
    spY = max(0, spY - dY)
    epX = min(origW, epX + (dX * 2))
    epY = min(origH, epY + (dY * 2))

    # Extract the actual padded ROI
    roi = orig[spY:epY, spX:epX]
Now, the application will scale the bounding boxes based on the previously computed ratio for actual text recognition. In this process, the application also pads the bounding boxes & then extracts the padded region of interest.
# Choose the proper OCR Config
text = pytesseract.image_to_string(roi, config=config)
# Add the bounding box coordinates and OCR'd text to the list
# of results
res.append(((spX, spY, epX, epY), text))
Using OCR options, the application extracts the text within the video frame & adds that to the res list.
# Sort the results bounding box coordinates from top to bottom
res = sorted(res, key=lambda r:r[0][1])
It then sends a sorted output to the primary calling functions.
for ((spX, spY, epX, epY), text) in res:
    # Display the text OCR by using Tesseract APIs
    print("Reading Text::")
    print("=" * 60)
    print(text)
    print("=" * 60)

    # Removing the non-ASCII text so it can draw the text on the frame
    # using OpenCV, then draw the text and a bounding box surrounding
    # the text region of the input frame
    text = "".join([c if ord(c) < aRange else "" for c in text]).strip()
    output = orig.copy()

    cv2.rectangle(output, (spX, spY), (epX, epY), drawTag, 2)
    cv2.putText(output, text, (spX, spY - 20), cv2.FONT_HERSHEY_SIMPLEX, 1.2, drawTag, 3)

    # Show the output frame
    cv2.imshow(title, output)
Finally, it fetches the potential text region along with the text & then prints it on top of the source video. Also, it removes some non-ASCII characters during this step to avoid any cryptic text.
readingVideo.py (Main calling script.)
# Instantiating all the main class
x1 = rtfs.clsReadingTextFromStream()
# Execute all the pass
r1 = x1.processStream(debugInd, var)
if (r1 == 0):
print('Successfully read text from the Live Stream!')
else:
print('Failed to read text from the Live Stream!')
The above lines instantiate the main calling class & then invoke the function to get the desired extracted text from the live streaming video if that is successful.
FOLDER STRUCTURE:
Here is the folder structure that contains all the files & directories in macOS –
You will get the complete codebase in the following Github link.
Unfortunately, I cannot upload the model due to its size. I will share it on a need basis.
I’ll bring some more exciting topics from the Python-verse in the coming days. Please share & subscribe to my post & let me know your feedback.
Till then, Happy Avenging! 🙂
Note: All the data & scenarios posted here are representational & available over the internet for educational purposes only. Some of the images (except my photo) that we’ve used are available over the net. We don’t claim ownership of these images. There is always room for improvement, especially in the prediction quality.
Today, I’ll be demonstrating a short but significant topic. It is a widely known fact that, on many occasions, Python is relatively slower than other strongly typed programming languages like C++, Java, or even the latest version of PHP.
I found a relatively old post with a comparison shown between Python and the other popular languages. You can find the details at this link.
However, I haven’t verified the outcome. So, I can’t comment on the final statistics provided on that link.
My purpose is to find cases where I can take certain tricks to improve performance drastically.
One preferable option would be the use of Cython. That involves the middle ground between C & Python & brings the best out of both worlds.
The other option would be the use of GPU for vector computations. That would drastically increase the processing power. Today, we’ll be exploring this option.
Let’s find out what we need to prepare our environment before we try out on this.
Step – 1 (Installing dependent packages):
pip install pyopencl
pip install plaidml-keras
So, we will be taking advantage of the Keras package to use our GPU. And, the screen should look like this –
Installation Process of Python-based Packages
Once we’ve installed the packages, we’ll configure the package showing on the next screen.
Configuration of Packages
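If I remember correctly, this configuration step is typically done by running the plaidml-setup command that ships with the package, which lets you pick the preferred compute device –
plaidml-setup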
For our case, we also need to install pandas; numpy, which we’ll be using, comes bundled with it by default.
Installation of supplemental packages
Let’s explore our standard snippet to test this use case.
Case 1 (Normal computational code in Python):
##############################################
#### Written By: SATYAKI DE ####
#### Written On: 18-Jan-2020 ####
#### ####
#### Objective: Main calling scripts for ####
#### normal execution. ####
##############################################
import numpy as np
from timeit import default_timer as timer
def pow(a, b, c):
    for i in range(a.size):
        c[i] = a[i] ** b[i]

def main():
    vec_size = 100000000

    a = b = np.array(np.random.sample(vec_size), dtype=np.float32)
    c = np.zeros(vec_size, dtype=np.float32)

    start = timer()
    pow(a, b, c)
    duration = timer() - start

    print(duration)

if __name__ == '__main__':
    main()
Case 2 (GPU-based computational code in Python):
#################################################
#### Written By: SATYAKI DE ####
#### Written On: 18-Jan-2020 ####
#### ####
#### Objective: Main calling scripts for ####
#### use of GPU to speed-up the performance. ####
#################################################
import numpy as np
from timeit import default_timer as timer
# Adding GPU Instance
from os import environ
environ["KERAS_BACKEND"] = "plaidml.keras.backend"
def pow(a, b):
    return a ** b

def main():
    vec_size = 100000000

    a = b = np.array(np.random.sample(vec_size), dtype=np.float32)
    c = np.zeros(vec_size, dtype=np.float32)

    start = timer()
    c = pow(a, b)
    duration = timer() - start

    print(duration)

if __name__ == '__main__':
    main()
And, here comes the output for your comparisons –
Case 1 Vs Case 2:
Performance Comparisons
As you can see, there is a significant improvement that we can achieve using this. However, it has limited scope; you don’t get the benefits everywhere. Until & unless Python decides to work on the performance side, you’d better explore either of the two options that I’ve discussed here (I didn’t mention a lot about Cython here. Maybe some other day.).
To get the codebase, you can refer to the following Github link.
So, finally, we have done it.
I’ll bring some more exciting topics from the Python-verse in the coming days.
Till then, Happy Avenging! 😀
Note: All the data & scenarios posted here are representational & available over the internet for educational purposes only.
Today, I’ll be using a popular tool known as Mulesoft to generate a mock API & then we’ll be testing the same using Python. Mulesoft is an excellent tool to rapidly develop APIs & can also integrate multiple cloud environments as an integration platform. You can use their Anypoint platform to quickly design such APIs for your organization. You can find the details in the following link. However, considering the cost, many organizations have to devise their own product or tool to do the same. That’s where developing your own solution in Python, Node.js, or C# becomes a practical alternative on the cloud platform.
Before we start, let us quickly understand what a mock API is.
A mock API server imitates a real API server by providing realistic responses to requests. They can be on your local machine or the public Internet. Responses can be static or dynamic, and simulate the data the real API would return, matching the schema with data types, objects, and arrays.
And why do we need that?
A mock API server is useful during development and testing when live data is either unavailable or unreliable. While designing an API, you can use mock APIs to work concurrently on the front and back-end, as well as to gather feedback from developers. Our mock API server guide for testing covers how you can use a mock API server so the absence of a real API doesn’t hold you back.
Often with internal projects, the API consumer (such as a front end developer through REST APIs) moves faster than the backend team building the API. This API mocking guide shows how a mock API server allows developers to consume a working API with the same interface as the eventual production API. As an added benefit, the backend team can discover where the mock API doesn’t meet the developer’s needs without spending developer time on features that may be removed or changed. This fast feedback loop can make engineering teams much more efficient.
If you need more information on this topic, you can refer to the following link.
Great! Now that we have a background on mock APIs, let’s explore how Mulesoft can help us here.
Mulesoft uses the “RESTful API Modeling Language (RAML)”. We’ll be using this language to develop our mock API. To know more about this, you can view the following link.
Under the developer section, you can find Tutorials as shown in the screenshot given below –
You can select any of the categories & learn basic scripting from it.
Now, let’s take a look at the process of creating a Mulesoft free account to test our theories.
Step 1:
Click the following link, and you will see the page as shown below –
Step 2:
Now, click the login shown in the RED square. You will see the following page –
Step 3:
Please provide your credentials if you already have an account. Else, you have to click the “Sign-Up” & then you will need to provide the few details as shown below –
Step 4:
Once, you successfully create the account, you will see the following page –
So, now we are set. To design an API, you will need to click the design center as marked within the white square.
Once you click the “Start designing” button, this will land into the next screen.
As shown above, you need to click the “Create new” for fresh API design.
This will prompt you the next screen –
Now, you need to select “Create API specification” as marked in the RED square box. And that will prompt the following screen –
You have to provide a meaningful name for our API & you can choose either the Text or Visual editor. For this task, we’ll be selecting the Text Editor. And we’ll select RAML 1.0 as our preferred language. Once we provide all the relevant information, the “Create Specification” button marked in Green will be activated. You then need to click it. It will lead you to the next screen –
Since we’ll be preparing this for mock API, we need to activate that by clicking the toggle button marked in the GREEN square box on the top-right side. And, this will generate an automated baseUri script as shown below –
Now, we’re ready to develop our RAML code for the mock API. Let’s look into the RAML code.
1. phonevalisd.raml (This is the mock API script, which will send the response to an API request by returning a mock JSON if the success conditions are met.)
#%RAML 1.0
# Created By - Satyaki De
# Date: 01-Mar-2020
# Description: This is a Mock API
baseUri: https://anypoint.mulesoft.com/mocking/api/v1/links/09KK0pos-1080-4049-9e04-a093456a64a8/#
title: PhoneVSD
securitySchemes:
  basic:
    type: Basic Authentication
    displayName: Satyaki's Basic Authentication
    description: API Only works with the basic authentication
protocols:
  - HTTP
description: This is a REST API Json base service to verify any phone numbers.
documentation:
  - title: PHONE VERIFY API
    content: This is a Mock API, which will simulate the activity of a Phone Validation API.
types:
  apiresponse:
    properties:
      valid: boolean
      number: string
      local_format: string
      international_format: string
      country_prefix: string
      country_code: string
      country_name: string
      location: string
      carrier: string
      line_type: string
/validate:
  get:
    queryParameters:
      access_key: string
      number: string
      country_code: string
      format: string
    description: For Validating the phone
    displayName: Validate phone
    protocols:
      - HTTP
    responses:
      403:
        body:
          application/json:
            properties:
              message: string
            example:
              {
                message: "Resource does not exists!"
              }
      400:
        body:
          application/json:
            properties:
              message: string
            example:
              {
                message: "API Key is invalid!"
              }
      200:
        body:
          application/json:
            type: apiresponse
            example:
              {
                "valid": true,
                "number": "17579758240",
                "local_format": "7579758240",
                "international_format": "+17579758240",
                "country_prefix": "+1",
                "country_code": "US",
                "country_name": "United States of America",
                "location": "Nwptnwszn1",
                "carrier": "MetroPCS Communications Inc.",
                "line_type": "mobile"
              }
Let’s quickly explore the critical snippet from the above script.
We’ve created a provision for a few specific cases of response as part of our business logic & standards.
Once, we’re done with our coding, we need to focus on two places as shown in the below picture –
The snippet marked in the RED square box identifies our mandatory input parameters, shown in the code as well as on the right-hand side panel.
To test this mock API locally, you can pass these key parameters as follows –
Now, you have to click the Send button marked in a GREEN square box. This will send your query parameters & as per our API response, you can see the output just below the Send button as follows –
Now, we’re good to publish this mock API in the Mulesoft Anywhere portal. This will help us to test it from an external application i.e., Python-based application for our case. So, click the “Publish” button highlighted with the Blue square box. That will prompt the following screen –
Now, we’ll click the “Publish to Exchange” button marked with the GREEN square box. This will prompt the next screen as shown below –
Now, you need to fill in the relevant details & then click “Publish to Exchange,” as shown above. And that will lead to the following screen –
And, after a few seconds, you will see the next screen –
Now, you can click “Done” to close this popup. And, to verify the status, you can check it by clicking the top-left side of the code-editor & then click “Design Center” as shown below –
So, we’re done with our Mulesoft mock API design & deployment. Let’s test it from our Python application. We’ll be only discussing the key snippets here.
2. clsConfig.py (This is the parameter file for our mock API script.)
##################################################
#### Written By: SATYAKI DE                   ####
#### Written On: 04-Apr-2020                  ####
####                                          ####
#### Objective: This script is a config       ####
#### file, contains all the keys for          ####
#### Mulesoft Mock API. Application will      ####
#### process these information & perform      ####
#### the call to our newly developed Mock     ####
#### API in Mulesoft.                         ####
##################################################
import os
import platform as pl

class clsConfig(object):
    Curr_Path = os.path.dirname(os.path.realpath(__file__))

    os_det = pl.system()

    if os_det == "Windows":
        sep = '\\'
    else:
        sep = '/'

    config = {
        'APP_ID': 1,
        'URL': "https://anypoint.mulesoft.com/mocking/api/v1/links/a23e4e71-9c25-317b-834b-10b0debc3a30/validate",
        'CLIENT_SECRET': 'a12345670bacb1e3cec55e2f1234567d',
        'API_TYPE': "application/json",
        'CACHE': "no-cache",
        'CON': "keep-alive",
        'ARCH_DIR': Curr_Path + sep + 'arch' + sep,
        'PROFILE_PATH': Curr_Path + sep + 'profile' + sep,
        'LOG_PATH': Curr_Path + sep + 'log' + sep,
        'REPORT_PATH': Curr_Path + sep + 'report',
        'SRC_PATH': Curr_Path + sep + 'Src_File' + sep,
        'APP_DESC_1': 'Mule Mock API Calling!',
        'DEBUG_IND': 'N',
        'INIT_PATH': Curr_Path
    }
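The complete calling script isn’t reproduced here, but a minimal sketch of how it could invoke the mock API using the above configuration (with the requests package & purely illustrative parameter values) would look like this –
import requests
from clsConfig import clsConfig as cf

def callMockAPI():
    url = cf.config['URL']

    # Query parameters as defined in the RAML spec; sample values only
    payLoad = {
        'access_key': cf.config['CLIENT_SECRET'],
        'number': '17579758240',
        'country_code': 'US',
        'format': '1'
    }

    headers = {
        'Content-Type': cf.config['API_TYPE'],
        'Cache-Control': cf.config['CACHE'],
        'Connection': cf.config['CON']
    }

    resp = requests.get(url, params=payLoad, headers=headers)

    print('Status: ', resp.status_code)
    print('Response: ', resp.json())

if __name__ == "__main__":
    callMockAPI()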
Today, I’ll share a little different topic in Python compared to my last couple of posts, where I have demonstrated the use of Python in the field of machine learning & forecast modeling.
We’ll explore creating meaningful sample data points for airline & hotel reservations. At this moment, this industry is hard-hit due to the pandemic. And I personally wish a speedy recovery to all the employees who risked their lives to maintain operations or might have lost their jobs during this time.
I’ll be providing only major scripts & will show how you can extract critical data from their API.
However, to create the API, you need to register in Amadeus as a developer & follow specific steps to get the API details. You will need to register using the following link.
Step 1:
Once you provide the necessary details, you need to activate your account by clicking the email validation.
Step 2:
As part of the next step, you will be clicking the “Self-Service Workspace” option as marked in the green box shown above.
Now, you have to click “My apps“ & under that, you need to click – “Create new app” shown below –
Step 3:
You need to provide the following details before creating the API. Note that once you create – it will take 30 minutes to activate the API-link.
Step 4:
You will come to the next page once you click the “Create” button in the previous step.
For production, you need to create a separate key shown above.
You need to install the following packages –
pip install amadeus
And, the installation process is shown as –
pip install flatten_json
And, this installation process is shown as –
1. clsAmedeus (This is the API script, which will send the API requests & return JSON if successful.)
##################################################
#### Written By: SATYAKI DE                   ####
#### Written On: 05-Jul-2020                  ####
#### Modified On 05-Jul-2020                  ####
####                                          ####
#### Objective: Main calling scripts.         ####
##################################################
from amadeus import Client, ResponseError
import json
from clsConfig import clsConfig as cf

class clsAmedeus:
    def __init__(self):
        self.client_id = cf.config['CLIENT_ID']
        self.client_secret = cf.config['CLIENT_SECRET']
        self.type = cf.config['API_TYPE']

    def flightOffers(self, origLocn, destLocn, departDate, noOfAdult):
        try:
            cnt = 0

            # Setting Clients
            amadeus = Client(
                client_id=str(self.client_id),
                client_secret=str(self.client_secret)
            )

            # Flight Offers
            response = amadeus.shopping.flight_offers_search.get(
                originLocationCode=origLocn,
                destinationLocationCode=destLocn,
                departureDate=departDate,
                adults=noOfAdult)

            ResJson = response.data

            return ResJson
        except Exception as e:
            print(e)
            x = str(e)
            ResJson = {'errorDetails': x}

            return ResJson
    def cheapestDate(self, origLocn, destLocn):
        try:
            # Setting Clients
            amadeus = Client(
                client_id=self.client_id,
                client_secret=self.client_secret
            )

            # Flight Offers
            # Flight Cheapest Date Search
            response = amadeus.shopping.flight_dates.get(origin=origLocn, destination=destLocn)

            ResJson = response.data

            return ResJson
        except Exception as e:
            print(e)
            x = str(e)
            ResJson = {'errorDetails': x}

            return ResJson
    def listOfHotelsByCity(self, origLocn):
        try:
            # Setting Clients
            amadeus = Client(
                client_id=self.client_id,
                client_secret=self.client_secret
            )

            # Hotel Search
            # Get list of Hotels by city code
            response = amadeus.shopping.hotel_offers.get(cityCode=origLocn)

            ResJson = response.data

            return ResJson
        except Exception as e:
            print(e)
            x = str(e)
            ResJson = {'errorDetails': x}

            return ResJson
    def listOfOffersBySpecificHotels(self, hotelID):
        try:
            # Setting Clients
            amadeus = Client(
                client_id=self.client_id,
                client_secret=self.client_secret
            )

            # Get list of offers for a specific hotel
            response = amadeus.shopping.hotel_offers_by_hotel.get(hotelId=hotelID)

            ResJson = response.data

            return ResJson
        except Exception as e:
            print(e)
            x = str(e)
            ResJson = {'errorDetails': x}

            return ResJson
    def hotelReview(self, hotelID):
        try:
            # Setting Clients
            amadeus = Client(
                client_id=self.client_id,
                client_secret=self.client_secret
            )

            # Hotel Ratings
            # What travelers think about this hotel?
            response = amadeus.e_reputation.hotel_sentiments.get(hotelIds=hotelID)

            ResJson = response.data

            return ResJson
        except Exception as e:
            print(e)
            x = str(e)
            ResJson = {'errorDetails': x}

            return ResJson
    def process(self, choice, origLocn, destLocn, departDate, noOfAdult, hotelID):
        try:
            # Main Area to call apropriate choice
            if choice == 1:
                resJson = self.flightOffers(origLocn, destLocn, departDate, noOfAdult)
            elif choice == 2:
                resJson = self.cheapestDate(origLocn, destLocn)
            elif choice == 3:
                resJson = self.listOfHotelsByCity(origLocn)
            elif choice == 4:
                resJson = self.listOfOffersBySpecificHotels(hotelID)
            elif choice == 5:
                resJson = self.hotelReview(hotelID)
            else:
                resJson = {'errorDetails': 'Invalid Options!'}

            # Converting back to JSON
            jdata = json.dumps(resJson)

            # Checking the begining character
            # for the new package.
            # As that requires dictionary array,
            # hence, we'll be adding '[' if this
            # is missing from the return payload
            SYM = jdata[:1]
            if SYM != '[':
                rdata = '[' + jdata + ']'
            else:
                rdata = jdata

            ResJson = json.loads(rdata)

            return ResJson
        except ResponseError as error:
            x = str(error)
            resJson = {'errorDetails': x}

            return resJson
Let’s explore the key lines –
Creating an instance of the client by providing the recently acquired API Key & API-Secret.
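The full main calling script isn’t shown here, but, based on the process method signature above, a minimal sketch of invoking it (with illustrative input values only) could be –
import clsAmedeus as ca

# Instantiating the API class
x1 = ca.clsAmedeus()

# Choice 1 - Flight Offers; all input values below are samples only
choice = 1
origLocn = 'NYC'
destLocn = 'LON'
departDate = '2020-08-01'
noOfAdult = 1
hotelID = ''

resJson = x1.process(choice, origLocn, destLocn, departDate, noOfAdult, hotelID)
print(resJson)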
Today, I’ll be demonstrating some scenarios based on open-source data from Canada. In this post, I will only explain some of the significant parts of the code. Not the entire range of scripts here.
Let’s explore a couple of sample source data –
I would like to explore how much this disease caused an impact on the elderly in Canada.
Let’s explore the source directory structure –
For this, you need to install the following packages –
In this case, we’ve downloaded the data from Canada’s site. However, they have created an API, so you can consume the data that way as well. Since the volume is a little large, I decided to download it as CSV & then use that for my analysis.
Before I start, let me explain a couple of critical assumptions that I had to make due to data impurities or availabilities.
If there is no data available for a specific case, my application will consider that patient as COVID-Active.
We will consider the patient is affected through Community-spreading until we have data to find it otherwise.
If there is no data available for gender, we’re marking these records as “Other.” That way, we’re putting them into the category where the patient doesn’t want to disclose their gender.
If we don’t have any data, then by default, the application is considering the patient is alive.
Lastly, my application considers the middle point of the age range data for all the categories, i.e., the patient’s age between 20 & 30 will be considered as 25.
1. clsCovidAnalysisByCountryAdv (This is the main script, which will invoke the Machine-Learning API & return 0 if successful.)
##################################################
#### Written By: SATYAKI DE                   ####
#### Written On: 01-Jun-2020                  ####
#### Modified On 01-Jun-2020                  ####
####                                          ####
#### Objective: Main scripts for Logistic     ####
#### Regression.                              ####
##################################################
import pandas as p
import clsL as log
import datetime
import matplotlib.pyplot as plt
import seaborn as sns
from clsConfig import clsConfig as cf

# %matplotlib inline -- for Jupyter Notebook

class clsCovidAnalysisByCountryAdv:
    def __init__(self):
        self.fileName_1 = cf.config['FILE_NAME_1']
        self.fileName_2 = cf.config['FILE_NAME_2']
        self.Ind = cf.config['DEBUG_IND']
        self.subdir = str(cf.config['LOG_DIR_NAME'])

    def setDefaultActiveCases(self, row):
        try:
            str_status = str(row['case_status'])

            if str_status == 'Not Reported':
                return 'Active'
            else:
                return str_status
        except:
            return 'Active'

    def setDefaultExposure(self, row):
        try:
            str_exposure = str(row['exposure'])

            if str_exposure == 'Not Reported':
                return 'Community'
            else:
                return str_exposure
        except:
            return 'Community'

    def setGender(self, row):
        try:
            str_gender = str(row['gender'])

            if str_gender == 'Not Reported':
                return 'Other'
            else:
                return str_gender
        except:
            return 'Other'

    def setSurviveStatus(self, row):
        try:
            # 0 - Deceased
            # 1 - Alive
            str_active = str(row['ActiveCases'])

            if str_active == 'Deceased':
                return 0
            else:
                return 1
        except:
            return 1

    def getAgeFromGroup(self, row):
        try:
            # We'll take the middle of the Age group.
            # If an age range falls below 20, we'll
            # consider this as 10.
            # Similarly, an age group between 20 & 30
            # should be reflected by 25.
            # Anything above 80 will be considered as 85.
            str_age_group = str(row['AgeGroup'])

            if str_age_group == '<20':
                return 10
            elif str_age_group == '20-29':
                return 25
            elif str_age_group == '30-39':
                return 35
            elif str_age_group == '40-49':
                return 45
            elif str_age_group == '50-59':
                return 55
            elif str_age_group == '60-69':
                return 65
            elif str_age_group == '70-79':
                return 75
            else:
                return 85
        except:
            return 100

    def predictResult(self):
        try:
            # Initiating Logging Instances
            clog = log.clsL()

            # Important variables
            var = datetime.datetime.now().strftime(".%H.%M.%S")
            print('Target File Extension will contain the following:: ', var)

            Ind = self.Ind
            subdir = self.subdir

            ########################################
            ## Using Logistic Regression to       ##
            ## Idenitfy the following scenarios - ##
            ##                                    ##
            ## Age wise Infection Vs Deaths       ##
            ########################################
            inputFileName_2 = self.fileName_2

            # Reading from Input File
            df_2 = p.read_csv(inputFileName_2)

            # Fetching only relevant columns
            df_2_Mod = df_2[['date_reported','age_group','gender','exposure','case_status']]
            df_2_Mod['State'] = df_2['province_abbr']

            print()
            print('Projecting 2nd file sample rows: ')
            print(df_2_Mod.head())
            print()

            x_row_1 = df_2_Mod.shape[0]
            x_col_1 = df_2_Mod.shape[1]

            print('Total Number of Rows: ', x_row_1)
            print('Total Number of columns: ', x_col_1)

            ##########################################################################################
            ## Few Assumptions                                                                      ##
            ##########################################################################################
            ## By default, if there is no data on exposure - We'll treat that as community spreading##
            ## By default, if there is no data on case_status - We'll consider this as active       ##
            ## By default, if there is no data on gender - We'll put that under a separate Gender   ##
            ## category marked as the "Other". This includes someone who doesn't want to identify   ##
            ## his/her gender or wants to be part of LGBT community in a generic term.              ##
            ##                                                                                      ##
            ## We'll transform our data accordingly based on the above logic.                       ##
            ##########################################################################################
            df_2_Mod['ActiveCases'] = df_2_Mod.apply(lambda row: self.setDefaultActiveCases(row), axis=1)
            df_2_Mod['ExposureStatus'] = df_2_Mod.apply(lambda row: self.setDefaultExposure(row), axis=1)
            df_2_Mod['Gender'] = df_2_Mod.apply(lambda row: self.setGender(row), axis=1)

            # Filtering all other records where we don't get any relevant information
            # Fetching Data for
            df_3 = df_2_Mod[(df_2_Mod['age_group'] != 'Not Reported')]

            # Dropping unwanted columns
            df_3.drop(columns=['exposure'], inplace=True)
            df_3.drop(columns=['case_status'], inplace=True)
            df_3.drop(columns=['date_reported'], inplace=True)
            df_3.drop(columns=['gender'], inplace=True)

            # Renaming one existing column
            df_3.rename(columns={"age_group": "AgeGroup"}, inplace=True)

            # Creating important feature
            # 0 - Deceased
            # 1 - Alive
            df_3['Survived'] = df_3.apply(lambda row: self.setSurviveStatus(row), axis=1)

            clog.logr('2.df_3' + var + '.csv', Ind, df_3, subdir)

            print()
            print('Projecting Filter sample rows: ')
            print(df_3.head())
            print()

            x_row_2 = df_3.shape[0]
            x_col_2 = df_3.shape[1]

            print('Total Number of Rows: ', x_row_2)
            print('Total Number of columns: ', x_col_2)

            # Let's do some basic checkings
            sns.set_style('whitegrid')
            #sns.countplot(x='Survived', hue='Gender', data=df_3, palette='RdBu_r')

            # Fixing Gender Column
            # This will check & indicate yellow for missing entries
            #sns.heatmap(df_3.isnull(), yticklabels=False, cbar=False, cmap='viridis')
            #sex = p.get_dummies(df_3['Gender'], drop_first=True)
            sex = p.get_dummies(df_3['Gender'])

            df_4 = p.concat([df_3, sex], axis=1)

            print('After New addition of columns: ')
            print(df_4.head())

            clog.logr('3.df_4' + var + '.csv', Ind, df_4, subdir)

            # Dropping unwanted columns for our Machine Learning
            df_4.drop(columns=['Gender'], inplace=True)
            df_4.drop(columns=['ActiveCases'], inplace=True)
            df_4.drop(columns=['Male','Other','Transgender'], inplace=True)

            clog.logr('4.df_4_Mod' + var + '.csv', Ind, df_4, subdir)

            # Fixing Spread Columns
            spread = p.get_dummies(df_4['ExposureStatus'], drop_first=True)

            df_5 = p.concat([df_4, spread], axis=1)

            print('After Spread columns:')
            print(df_5.head())

            clog.logr('5.df_5' + var + '.csv', Ind, df_5, subdir)

            # Dropping unwanted columns for our Machine Learning
            df_5.drop(columns=['ExposureStatus'], inplace=True)

            clog.logr('6.df_5_Mod' + var + '.csv', Ind, df_5, subdir)

            # Fixing Age Columns
            df_5['Age'] = df_5.apply(lambda row: self.getAgeFromGroup(row), axis=1)
            df_5.drop(columns=["AgeGroup"], inplace=True)

            clog.logr('7.df_6' + var + '.csv', Ind, df_5, subdir)

            # Fixing Dummy Columns Name
            # Renaming one existing column Travel-Related with Travel_Related
            df_5.rename(columns={"Travel-Related": "TravelRelated"}, inplace=True)

            clog.logr('8.df_7' + var + '.csv', Ind, df_5, subdir)

            # Removing state for temporary basis
            df_5.drop(columns=['State'], inplace=True)
            # df_5.drop(columns=['State','Other','Transgender','Pending','TravelRelated','Male'], inplace=True)

            # Casting this entire dataframe into Integer
            # df_5_temp.apply(p.to_numeric)

            print('Info::')
            print(df_5.info())
            print("*" * 60)
            print(df_5.describe())
            print("*" * 60)

            clog.logr('9.df_8' + var + '.csv', Ind, df_5, subdir)

            print('Intermediate Sample Dataframe for Age::')
            print(df_5.head())

            # Plotting it to Graph
            sns.jointplot(x="Age", y='Survived', data=df_5)
            sns.jointplot(x="Age", y='Survived', data=df_5, kind='kde', color='red')
            plt.xlabel("Age")
            plt.ylabel("Data Point (0 - Died Vs 1 - Alive)")

            # Another check with Age Group
            sns.countplot(x='Survived', hue='Age', data=df_5, palette='RdBu_r')
            plt.xlabel("Survived(0 - Died Vs 1 - Alive)")
            plt.ylabel("Total No Of Patient")

            df_6 = df_5.drop(columns=['Survived'], axis=1)

            clog.logr('10.df_9' + var + '.csv', Ind, df_6, subdir)

            # Train & Split Data
            x_1 = df_6
            y_1 = df_5['Survived']

            # Now Train-Test Split of your source data
            from sklearn.model_selection import train_test_split

            # test_size => % of allocated data for your test cases
            # random_state => A specific set of random split on your data
            X_train_1, X_test_1, Y_train_1, Y_test_1 = train_test_split(x_1, y_1, test_size=0.3, random_state=101)

            # Importing Model
            from sklearn.linear_model import LogisticRegression

            logmodel = LogisticRegression()
            logmodel.fit(X_train_1, Y_train_1)

            # Adding Predictions to it
            predictions_1 = logmodel.predict(X_test_1)

            from sklearn.metrics import classification_report

            print('Classification Report:: ')
            print(classification_report(Y_test_1, predictions_1))

            from sklearn.metrics import confusion_matrix

            print('Confusion Matrix:: ')
            print(confusion_matrix(Y_test_1, predictions_1))

            # This is require when you are trying to print from conventional
            # front & not using Jupyter notebook.
            plt.show()

            return 0
        except Exception as e:
            x = str(e)

            print('Error : ', x)

            return 1
Key snippets from the above script –
df_2_Mod['ActiveCases'] = df_2_Mod.apply(lambda row: self.setDefaultActiveCases(row), axis=1)
df_2_Mod['ExposureStatus'] = df_2_Mod.apply(lambda row: self.setDefaultExposure(row), axis=1)
df_2_Mod['Gender'] = df_2_Mod.apply(lambda row: self.setGender(row), axis=1)

# Filtering all other records where we don't get any relevant information
df_3 = df_2_Mod[(df_2_Mod['age_group'] != 'Not Reported')]

# Dropping unwanted columns
df_3.drop(columns=['exposure'], inplace=True)
df_3.drop(columns=['case_status'], inplace=True)
df_3.drop(columns=['date_reported'], inplace=True)
df_3.drop(columns=['gender'], inplace=True)

# Renaming one existing column
df_3.rename(columns={"age_group": "AgeGroup"}, inplace=True)

# Creating important feature
# 0 - Deceased
# 1 - Alive
df_3['Survived'] = df_3.apply(lambda row: self.setSurviveStatus(row), axis=1)
The above lines point to the critical transformation areas, where the application is invoking various essential business logic.
The above lines will transform the data into this –
As you can see, we’ve transformed the row values into columns with binary values. This kind of transformation is beneficial.
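As a small illustration of what that transformation does (with a tiny made-up frame, not the actual source data), pd.get_dummies turns each category value into its own 0/1 column –
import pandas as p

dfDemo = p.DataFrame({'Gender': ['Female', 'Male', 'Female']})
print(p.get_dummies(dfDemo['Gender']))
#    Female  Male
# 0       1     0
# 1       0     1
# 2       1     0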
# Plotting it to Graph
sns.jointplot(x="Age", y='Survived', data=df_5)
sns.jointplot(x="Age", y='Survived', data=df_5, kind='kde', color='red')
plt.xlabel("Age")
plt.ylabel("Data Point (0 - Died Vs 1 - Alive)")

# Another check with Age Group
sns.countplot(x='Survived', hue='Age', data=df_5, palette='RdBu_r')
plt.xlabel("Survived(0 - Died Vs 1 - Alive)")
plt.ylabel("Total No Of Patient")
The above lines will process the data & visualize based on that.
x_1 = df_6
y_1 = df_5['Survived']
In the above snippet, we’ve assigned the features & target variable for our final logistic regression model.
# Now Train-Test Split of your source data
from sklearn.model_selection import train_test_split

# test_size => % of allocated data for your test cases
# random_state => A specific set of random split on your data
X_train_1, X_test_1, Y_train_1, Y_test_1 = train_test_split(x_1, y_1, test_size=0.3, random_state=101)

# Importing Model
from sklearn.linear_model import LogisticRegression

logmodel = LogisticRegression()
logmodel.fit(X_train_1, Y_train_1)
In the above snippet, we’re splitting the primary data & creating a set of test & train data. Once we have the collection, the application builds the logistic regression model. And, finally, we fit the training data.
The above lines finally use the model, & then we feed our test data to it.
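For reference, the prediction & evaluation step inside the class (already shown above) boils down to these few lines –
# Adding Predictions to it
predictions_1 = logmodel.predict(X_test_1)

from sklearn.metrics import classification_report
print('Classification Report:: ')
print(classification_report(Y_test_1, predictions_1))

from sklearn.metrics import confusion_matrix
print('Confusion Matrix:: ')
print(confusion_matrix(Y_test_1, predictions_1))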
Let’s see how it runs –
And, here is the log directory –
For better understanding, I’m just clubbing both the diagrams in one place, & the final outcome is shown as follows –
So, from the above picture, we can see that the most vulnerable patients are those who are 80+. The next two categories that also suffered are 70+ & 60+.
Also, We’ve checked the Female Vs. Male in the following code –
sns.countplot(x='Survived', hue='Female', data=df_5, palette='RdBu_r')
plt.xlabel("Survived(0 - Died Vs 1 - Alive)")
plt.ylabel("Female Vs Male (Including Other Genders)")
And, the analysis represents through this –
In this case, you have to consider that the Male part includes all the other genders apart from the actual Male. Hence, I believe deaths for females would be higher compared to people who identified themselves as males.
So, finally, we’ve done it.
During this challenging time, I would request you to follow strict health guidelines & stay healthy.
N.B.: All the data that are used here can be found in the public domain. We use this solely for educational purposes. You can find the details here.
Today, We’ll be exploring the potential business growth factor using the “Linear-Regression Machine Learning” model. We’ve prepared a set of dummy data & based on that, we’ll predict.
Let’s explore a few sample data –
So, based on these data, we would like to predict YearlyAmountSpent dependent on any one of the following features, i.e. [ Time On App / Time On Website / Flipkart Membership Duration (In Year) ].
You need to install the following packages –
pip install pandas
pip install matplotlib
pip install sklearn
We’ll be discussing only the main calling script & class script. However, we’ll be posting the parameters without discussing it. And, we won’t discuss clsL.py as we’ve already discussed that in our previous post.
1. clsConfig.py (This script contains all the parameter details.)
####################################################
#### Written By: SATYAKI DE                     ####
#### Written On: 15-May-2020                    ####
####                                            ####
#### Objective: This script is a config         ####
#### file, contains all the keys for            ####
#### Machine-Learning. Application will         ####
#### process these information & perform        ####
#### various analysis on Linear-Regression.     ####
####################################################
import os
import platform as pl

class clsConfig(object):
    Curr_Path = os.path.dirname(os.path.realpath(__file__))

    os_det = pl.system()

    if os_det == "Windows":
        sep = '\\'
    else:
        sep = '/'

    config = {
        'APP_ID': 1,
        'ARCH_DIR': Curr_Path + sep + 'arch' + sep,
        'PROFILE_PATH': Curr_Path + sep + 'profile' + sep,
        'LOG_PATH': Curr_Path + sep + 'log' + sep,
        'REPORT_PATH': Curr_Path + sep + 'report',
        'FILE_NAME': Curr_Path + sep + 'Data' + sep + 'FlipkartCustomers.csv',
        'SRC_PATH': Curr_Path + sep + 'Data' + sep,
        'APP_DESC_1': 'IBM Watson Language Understand!',
        'DEBUG_IND': 'N',
        'INIT_PATH': Curr_Path
    }
2. clsLinearRegression.py (This is the main script, which will invoke the Machine-Learning API & return 0 if successful.)
##################################################
#### Written By: SATYAKI DE                   ####
#### Written On: 15-May-2020                  ####
#### Modified On 15-May-2020                  ####
####                                          ####
#### Objective: Main scripts for Linear       ####
#### Regression.                              ####
##################################################
import pandas as p
import numpy as np
import regex as re
import matplotlib.pyplot as plt
from clsConfig import clsConfig as cf

# %matplotlib inline -- for Jupyter Notebook

class clsLinearRegression:
    def __init__(self):
        self.fileName = cf.config['FILE_NAME']

    def predictResult(self):
        try:
            inputFileName = self.fileName

            # Reading from Input File
            df = p.read_csv(inputFileName)

            print()
            print('Projecting sample rows: ')
            print(df.head())
            print()

            x_row = df.shape[0]
            x_col = df.shape[1]

            print('Total Number of Rows: ', x_row)
            print('Total Number of columns: ', x_col)

            # Adding Features
            x = df[['TimeOnApp', 'TimeOnWebsite', 'FlipkartMembershipInYear']]

            # Target Variable - Trying to predict
            y = df['YearlyAmountSpent']

            # Now Train-Test Split of your source data
            from sklearn.model_selection import train_test_split

            # test_size => % of allocated data for your test cases
            # random_state => A specific set of random split on your data
            X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.4, random_state=101)

            # Importing Model
            from sklearn.linear_model import LinearRegression

            # Creating an Instance
            lm = LinearRegression()

            # Train or Fit my model on Training Data
            lm.fit(X_train, Y_train)

            # Creating a prediction value
            flipKartSalePrediction = lm.predict(X_test)

            # Creating a scatter plot based on Actual Value & Predicted Value
            plt.scatter(Y_test, flipKartSalePrediction)

            # Adding meaningful Label
            plt.xlabel('Actual Values')
            plt.ylabel('Predicted Values')

            # Checking Individual Metrics
            from sklearn import metrics

            print()
            mea_val = metrics.mean_absolute_error(Y_test, flipKartSalePrediction)
            print('Mean Absolute Error (MEA): ', mea_val)

            mse_val = metrics.mean_squared_error(Y_test, flipKartSalePrediction)
            print('Mean Square Error (MSE): ', mse_val)

            rmse_val = np.sqrt(metrics.mean_squared_error(Y_test, flipKartSalePrediction))
            print('Square root Mean Square Error (RMSE): ', rmse_val)
            print()

            # Check Variance Score - R^2 Value
            print('Variance Score:')
            var_score = str(round(metrics.explained_variance_score(Y_test, flipKartSalePrediction) * 100, 2)).strip()
            print('Our Model is', var_score, '% accurate. ')
            print()

            # Finding Coeficent on X_train.columns
            print()
            print('Finding Coeficent: ')

            cedf = p.DataFrame(lm.coef_, x.columns, columns=['Coefficient'])
            print('Printing the All the Factors: ')
            print(cedf)
            print()

            # Getting the Max Value from it
            cedf['MaxFactorForBusiness'] = cedf['Coefficient'].max()

            # Filtering the max Value to identify the biggest Business factor
            dfMax = cedf[(cedf['MaxFactorForBusiness'] == cedf['Coefficient'])]

            # Dropping the derived column
            dfMax.drop(columns=['MaxFactorForBusiness'], inplace=True)
            dfMax = dfMax.reset_index()

            print(dfMax)

            # Extracting Actual Business Factor from Pandas dataframe
            str_factor_temp = str(dfMax.iloc[0]['index'])
            str_factor = re.sub("([a-z])([A-Z])", "\g<1> \g<2>", str_factor_temp)
            str_value = str(round(float(dfMax.iloc[0]['Coefficient']), 2))

            print()
            print('*' * 80)
            print('Major Busienss Activity - (', str_factor, ') - ', str_value, '%')
            print('*' * 80)
            print()

            # This is require when you are trying to print from conventional
            # front & not using Jupyter notebook.
            plt.show()

            return 0
        except Exception as e:
            x = str(e)

            print('Error : ', x)

            return 1
Our application creates a subset of the main dataframe, which contains all the features.
# Target Variable - Trying to predict
y = df['YearlyAmountSpent']
Now, the application is setting the target variable into ‘Y.’
# Now Train-Test Split of your source data
from sklearn.model_selection import train_test_split

# test_size => % of allocated data for your test cases
# random_state => A specific set of random split on your data
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.4, random_state=101)
As per “Supervised Learning,” our application splits the dataset into two subsets. One is used to train the model & the other segment is used to test your final model. However, for a large dataset, you can divide the data into three sets, keeping a third one for validating the performance statistics. In our case, we don’t need that, as this data is significantly smaller.
# Train or Fit my model on Training Data
lm.fit(X_train, Y_train)
Our application now trains/fits the model on the training data.
# Creating a scatter plot based on Actual Value & Predicted Value
plt.scatter(Y_test, flipKartSalePrediction)
Our application projected the outcome based on the predicted data in a scatterplot graph.
Also, the following concepts are captured by our program. For more details, I’ve provided the external links for your reference –
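For a quick intuition of what these metrics mean, they boil down to a few simple formulas. Here is a small sketch with a tiny made-up actual/predicted pair (not our real output) –
import numpy as np

y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([110.0, 190.0, 310.0])

mae = np.mean(np.abs(y_true - y_pred))    # Mean Absolute Error
mse = np.mean((y_true - y_pred) ** 2)     # Mean Square Error
rmse = np.sqrt(mse)                       # Root Mean Square Error

print(mae, mse, rmse)                     # 10.0 100.0 10.0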
Finally, we extract the coefficients to find out which particular feature will lead Flipkart to better sales & growth, by taking the maximum coefficient value among all the features, as shown below –
cedf = p.DataFrame(lm.coef_, x.columns, columns=['Coefficient'])

# Getting the Max Value from it
cedf['MaxFactorForBusiness'] = cedf['Coefficient'].max()

# Filtering the max Value to identify the biggest Business factor
dfMax = cedf[(cedf['MaxFactorForBusiness'] == cedf['Coefficient'])]

# Dropping the derived column
dfMax.drop(columns=['MaxFactorForBusiness'], inplace=True)

dfMax = dfMax.reset_index()
Note that we’ve used a regular expression to split the camel-case column name from our feature & represent that with a much more meaningful name without changing the column name.
# Extracting Actual Business Factor from Pandas dataframe
str_factor_temp = str(dfMax.iloc[0]['index'])
str_factor = re.sub("([a-z])([A-Z])", "\g<1> \g<2>", str_factor_temp)
str_value = str(round(float(dfMax.iloc[0]['Coefficient']), 2))

print('Major Busienss Activity - (', str_factor, ') - ', str_value, '%')
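As a quick illustration of that regular expression (with a hypothetical column name), it simply inserts a space wherever a lowercase letter is followed by an uppercase letter –
import re

str_factor_temp = 'FlipkartMembershipInYear'
str_factor = re.sub("([a-z])([A-Z])", "\g<1> \g<2>", str_factor_temp)
print(str_factor)   # Flipkart Membership In Year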
3. callLinear.py (This is the first calling script.)
##################################################
#### Written By: SATYAKI DE                   ####
#### Written On: 15-May-2020                  ####
#### Modified On 15-May-2020                  ####
####                                          ####
#### Objective: Main calling scripts.         ####
##################################################
from clsConfig import clsConfig as cf
import clsL as cl
import logging
import datetime
import clsLinearRegression as cw

# Disbling Warning
def warn(*args, **kwargs):
    pass

import warnings
warnings.warn = warn

# Lookup functions from
# Azure cloud SQL DB

var = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
def main():
    try:
        ret_1 = 0

        general_log_path = str(cf.config['LOG_PATH'])

        # Enabling Logging Info
        logging.basicConfig(filename=general_log_path + 'MachineLearning_LinearRegression.log', level=logging.INFO)

        # Initiating Log Class
        l = cl.clsL()

        # Moving previous day log files to archive directory
        log_dir = cf.config['LOG_PATH']
        curr_ver = datetime.datetime.now().strftime("%Y-%m-%d")

        tmpR0 = "*" * 157
        logging.info(tmpR0)
        tmpR9 = 'Start Time: ' + str(var)
        logging.info(tmpR9)
        logging.info(tmpR0)

        print("Log Directory::", log_dir)
        tmpR1 = 'Log Directory::' + log_dir
        logging.info(tmpR1)

        print('Machine Learning - Linear Regression Prediction : ')
        print('-' * 200)

        # Create the instance of the Linear-Regression Class
        x2 = cw.clsLinearRegression()

        ret = x2.predictResult()

        if ret == 0:
            print('Successful Linear-Regression Prediction Generated!')
        else:
            print('Failed to generate Linear-Regression Prediction!')

        print("-" * 200)
        print()

        print('Finding Analysis points..')
        print("*" * 200)
        logging.info('Finding Analysis points..')

        logging.info(tmpR0)
        tmpR10 = 'End Time: ' + str(var)
        logging.info(tmpR10)
        logging.info(tmpR0)

    except ValueError as e:
        print(str(e))
        logging.info(str(e))

    except Exception as e:
        print("Top level Error: args:{0}, message{1}".format(e.args, e.message))

if __name__ == "__main__":
    main()
Key snippet from the above script –
# Create the instance of the Linear-Regression
x2 = cw.clsLinearRegression()
ret = x2.predictResult()
In the above snippet, our application initially creates an instance of the main class & finally invokes the “predictResult” method.
Let’s run our application –
Step 1:
First, the application will fetch the following sample rows from our source file – if it is successful.
Step 2:
Then, It will create the following scatterplot by executing the following snippet –
# Creating a scatter plot based on Actual Value & Predicted Value
plt.scatter(Y_test, flipKartSalePrediction)
Note that our model is pretty accurate & it has a balanced success rate compared to our predicted numbers.
Step 3:
Finally, it successfully projects the critical features, as shown below –
From the above picture, you can see that our model is pretty accurate (89% approx).
Also, the highlighted red square identifies the key features & their confidence scores, & finally, the winning feature is marked in green.
So, as per that, we’ve come to one conclusion that Flipkart’s business growth depends on the tenure of their subscriber, i.e., old members are prone to buy more than newer members.
Let’s look into our directory structure –
So, we’ve done it.
I’ll be posting another new post in the coming days. Till then, Happy Avenging! 😀
Note: All the data posted here are representational data & available over the internet & for educational purpose only.
Today, I’ll be discussing the following topic – “How to analyze text using IBM Watson implementing through Python.”
IBM has significantly improved in the field of Visual Image Analysis or Text language analysis using its IBM Watson cloud platform. In this particular topic, we’ll be exploring the natural languages only.
To access IBM API, we need to first create an IBM Cloud account from this site.
Let us quickly go through the steps to create the IBM Language Understanding service. Click the Catalog on top of your browser menu as shown in the below picture –
After that, click the AI option on your left-hand side of the panel marked in RED.
Click the Watson-Studio & later choose the plan. In our case, We’ll select the “Lite” option as IBM provided this platform for all the developers to explore their cloud for free.
Clicking the create option will lead to a blank page of Watson Studio as shown below –
And, now, we need to click the Get Started button to launch it. This will lead to Create Project page, which can be done using the following steps –
Now, clicking the create a project will lead you to the next screen –
You can choose either an empty project, or you can create it from a sample file. In this case, we’ll be selecting the first option & this will lead us to the below page –
And, then you will click the “Create” option, which will lead you to the next screen –
Now, you need to click “Add to Project.” This will give you a variety of services that you want to explore/use from the list. If you want to create your own natural language classifier, which you can do that as follows –
Once, you click it – you need to select the associate service –
Here, you need to click the hyperlink, which prompts to the next screen –
You need to check the price for both the Visual & Natural Language Classifier. They are pretty expensive. The visual classifier has the Lite plan. However, it has limitations of output.
Clicking the “Create” will prompt to the next screen –
After successful creation, you will be redirected to the following page –
Now, We’ll be adding our “Natural Language Understand” for our test –
This will prompt the next screen –
Once, it is successful. You will see the service registered as shown below –
If you click the service marked in RED, it will lead you to another page, where you will get the API Key & Url. You need both pieces of information in the Python application to access this API, as shown below –
Now, we’re ready with the necessary cloud set-up. After this, we need to install the Python package for IBM Cloud as shown below –
We’ve noticed that, recently, IBM has launched one upgraded package. Hence, we installed that one as well. I would recommend you to install this second package directly instead of the first one shown above –
Now, we’re done with our set-up.
Let’s see the directory structure –
We’ll be discussing only the main calling script & class script. However, we’ll be posting the parameters without discussing it. And, we won’t discuss clsL.py as we’ve already discussed that in our previous post.
1. clsConfig.py (This script contains all the parameter details.)
##################################################
#### Written By: SATYAKI DE                   ####
#### Written On: 04-Apr-2020                  ####
####                                          ####
#### Objective: This script is a config       ####
#### file, contains all the keys for          ####
#### IBM Cloud API. Application will          ####
#### process these information & perform      ####
#### various analysis on IBM Watson cloud.    ####
##################################################
import os
import platform as pl

class clsConfig(object):
    Curr_Path = os.path.dirname(os.path.realpath(__file__))

    os_det = pl.system()

    if os_det == "Windows":
        sep = '\\'
    else:
        sep = '/'

    config = {
        'APP_ID': 1,
        'SERVICE_URL': "https://api.eu-gb.natural-language-understanding.watson.cloud.ibm.com/instances/xxxxxxxxxxxxxxXXXXXXXXXXxxxxxxxxxxxxxxxx",
        'API_KEY': "Xxxxxxxxxxxxxkdkdfifd984djddkkdkdkdsSSdkdkdd",
        'API_TYPE': "application/json",
        'CACHE': "no-cache",
        'CON': "keep-alive",
        'ARCH_DIR': Curr_Path + sep + 'arch' + sep,
        'PROFILE_PATH': Curr_Path + sep + 'profile' + sep,
        'LOG_PATH': Curr_Path + sep + 'log' + sep,
        'REPORT_PATH': Curr_Path + sep + 'report',
        'SRC_PATH': Curr_Path + sep + 'Src_File' + sep,
        'APP_DESC_1': 'IBM Watson Language Understand!',
        'DEBUG_IND': 'N',
        'INIT_PATH': Curr_Path
    }
Note that you will be placing your API_KEY & URL here, as shown in the configuration file.
2. clsIBMWatson.py (This is the main script, which will invoke the IBM Watson API based on the input from the user & return 0 if successful.)
#####################################################
#### Written By: SATYAKI DE                      ####
#### Written On: 04-Apr-2020                     ####
#### Modified On 04-Apr-2020                     ####
####                                             ####
#### Objective: Main scripts to invoke           ####
#### IBM Watson Language Understand API.         ####
#####################################################

import logging
from clsConfig import clsConfig as cf
import clsL as cl
import json
from ibm_watson import NaturalLanguageUnderstandingV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson.natural_language_understanding_v1 import Features, EntitiesOptions, KeywordsOptions, SentimentOptions, CategoriesOptions, ConceptsOptions
from ibm_watson import ApiException

class clsIBMWatson:
    def __init__(self):
        self.api_key = cf.config['API_KEY']
        self.service_url = cf.config['SERVICE_URL']

    def calculateExpressionFromUrl(self, inputUrl, inputVersion):
        try:
            api_key = self.api_key
            service_url = self.service_url

            print('-' * 60)
            print('Beginning of the IBM Watson for Input Url.')
            print('-' * 60)

            authenticator = IAMAuthenticator(api_key)

            # Authentication via service credentials provided in our config files
            service = NaturalLanguageUnderstandingV1(version=inputVersion, authenticator=authenticator)
            service.set_service_url(service_url)

            response = service.analyze(
                url=inputUrl,
                features=Features(entities=EntitiesOptions(),
                                  sentiment=SentimentOptions(),
                                  concepts=ConceptsOptions())).get_result()

            print(json.dumps(response, indent=2))

            return 0

        except ApiException as ex:
            print('-' * 60)
            print("Method failed for Url with status code " + str(ex.code) + ": " + ex.message)
            print('-' * 60)

            return 1

    def calculateExpressionFromText(self, inputText, inputVersion):
        try:
            api_key = self.api_key
            service_url = self.service_url

            print('-' * 60)
            print('Beginning of the IBM Watson for Input Text.')
            print('-' * 60)

            authenticator = IAMAuthenticator(api_key)

            # Authentication via service credentials provided in our config files
            service = NaturalLanguageUnderstandingV1(version=inputVersion, authenticator=authenticator)
            service.set_service_url(service_url)

            response = service.analyze(
                text=inputText,
                features=Features(entities=EntitiesOptions(),
                                  sentiment=SentimentOptions(),
                                  concepts=ConceptsOptions())).get_result()

            print(json.dumps(response, indent=2))

            return 0

        except ApiException as ex:
            print('-' * 60)
            print("Method failed for Text with status code " + str(ex.code) + ": " + ex.message)
            print('-' * 60)

            return 1
Some of the key lines from the above snippet –
authenticator = IAMAuthenticator(api_key)

# Authentication via service credentials provided in our config files
service = NaturalLanguageUnderstandingV1(version=inputVersion, authenticator=authenticator)
service.set_service_url(service_url)
By providing the API key & URL, the application initiates the Watson service.
Based on your type of input, it will fetch the entities, sentiment & concepts features here. Apart from that, you can additionally request the following features as well – Keywords & Categories.
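As a hedged illustration (not part of the original methods above), requesting those two extra features only needs the corresponding option classes, which clsIBMWatson.py already imports –

# Hypothetical extension of the same analyze() call; KeywordsOptions &
# CategoriesOptions are already imported in clsIBMWatson.py above.
response = service.analyze(
    text=inputText,
    features=Features(entities=EntitiesOptions(),
                      sentiment=SentimentOptions(),
                      concepts=ConceptsOptions(),
                      keywords=KeywordsOptions(),
                      categories=CategoriesOptions())).get_result()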
3. callIBMWatsonAPI.py (This is the first calling script. Based on user choice, it will receive input either as a URL or as plain text & then analyze it.)
#####################################################
#### Written By: SATYAKI DE                      ####
#### Written On: 04-Apr-2020                     ####
#### Modified On 04-Apr-2020                     ####
####                                             ####
#### Objective: Main calling scripts.            ####
#####################################################

from clsConfig import clsConfig as cf
import clsL as cl
import logging
import datetime
import clsIBMWatson as cw

# Disabling Warning
def warn(*args, **kwargs):
    pass

import warnings
warnings.warn = warn

# Lookup functions from
# Azure cloud SQL DB
var = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")

def main():
    try:
        ret_1 = 0

        general_log_path = str(cf.config['LOG_PATH'])

        # Enabling Logging Info
        logging.basicConfig(filename=general_log_path + 'IBMWatson_NaturalLanguageAnalysis.log', level=logging.INFO)

        # Initiating Log Class
        l = cl.clsL()

        # Moving previous day log files to archive directory
        log_dir = cf.config['LOG_PATH']
        curr_ver = datetime.datetime.now().strftime("%Y-%m-%d")

        tmpR0 = "*" * 157
        logging.info(tmpR0)
        tmpR9 = 'Start Time: ' + str(var)
        logging.info(tmpR9)
        logging.info(tmpR0)

        print("Log Directory::", log_dir)
        tmpR1 = 'Log Directory::' + log_dir
        logging.info(tmpR1)

        print('Welcome to IBM Watson Language Understanding Calling Program: ')
        print('-' * 60)
        print('Please Press 1 for Understand the language from Url.')
        print('Please Press 2 for Understand the language from your input-text.')
        input_choice = int(input('Please provide your choice:'))

        # Create the instance of the IBM Watson Class
        x2 = cw.clsIBMWatson()

        # Let's pass this to the respective Watson method
        if input_choice == 1:
            textUrl = str(input('Please provide the complete input url:'))
            ret_1 = x2.calculateExpressionFromUrl(textUrl, curr_ver)
        elif input_choice == 2:
            inputText = str(input('Please provide the input text:'))
            ret_1 = x2.calculateExpressionFromText(inputText, curr_ver)
        else:
            print('Invalid options!')

        if ret_1 == 0:
            print('Successful IBM Watson Language Understanding Generated!')
        else:
            print('Failed to generate IBM Watson Language Understanding!')

        print("-" * 60)
        print()
        print('Finding Analysis points..')
        print("*" * 157)
        logging.info('Finding Analysis points..')
        logging.info(tmpR0)

        tmpR10 = 'End Time: ' + str(var)
        logging.info(tmpR10)
        logging.info(tmpR0)

    except ValueError as e:
        print(str(e))
        print("Invalid option!")
        logging.info("Invalid option!")

    except Exception as e:
        print("Top level Error: args:{0}, message: {1}".format(e.args, str(e)))

if __name__ == "__main__":
    main()
This script is pretty straightforward: it first creates an instance of the main class & then, based on the user input, calls the respective method.
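If you want to try it yourself, a typical run (assuming the file names above) would simply be –

python callIBMWatsonAPI.py

The script will then prompt you to press 1 (URL) or 2 (plain text) before invoking Watson.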
As of now, IBM Watson supports a list of languages, which is available here.
If you want to start from scratch, please refer to the following link.
Please find the screenshot of our application run –
Case 1 (With Url):
Case 2 (With Plain text):
Now, don’t forget to delete all the services from your IBM Cloud.
As you can see from the service list, you need to delete the services one by one, as shown in the figure.
So, we’ve done it.
To explore my photography, you can visit the following link.
I’ll be posting another new post in the coming days. Till then, Happy Avenging! 😀
Note: All the data posted here are representational data & available over the internet & for educational purpose only.
Today, we’ll be discussing one more graphical package in Python, known as PyQt. To design the GUI faster, we’ll be exploring another tool called Qt Designer, which is available for multiple OS platforms.
This is similar to any other GUI based IDE like Microsoft Visual Studio, where you can quickly generate your GUI template.
The majority of internet posts talk about using the PyQt5 or PyQt4 packages. But, when it comes to using the .ui file inside your Python code, they either demonstrate fundamental options without any events, or they convert the .ui file into a .py file & then use it. This certainly doesn’t make it very useful for many developers who are trying to use it for the first time. Hence, my main goal is to use the .ui file inside my Python script as-is, use all the components out of it & assign various working events.
In this post, we’ll discuss only one script & then showcase the output in the form of a video (no audio). You can verify the output for both MAC & Windows.
Before we start, let us check the directory structure between Windows & MAC –
Let us explore what the GUI should look like –
So, as you can see, this tool is like any other GUI-based design tool; basically, you can create anything by a simple drag & drop method.
Before we start discussing our code, here is the sample basicAdv.ui file for your reference.
You need to install the following framework –
pip install PyQt5
1. GUIPyQt5.py (This script contains all the GUI details & it will invoke the instance along with the logic.)
#####################################################
#### Written By: SATYAKI DE                      ####
#### Written On: 12-Mar-2020                     ####
#### Modified On 12-Mar-2020                     ####
####                                             ####
#### Objective: Main calling scripts.            ####
#####################################################

from PyQt5 import QtWidgets, uic, QtGui, QtCore
from PyQt5.QtWidgets import *
import sys

class Ui(QtWidgets.QMainWindow):
    def __init__(self):
        # Instantiating the main class
        super(Ui, self).__init__()

        # Loading the Graphical Design without
        # converting it to any kind of Python code
        uic.loadUi('basicAdv.ui', self)

        # Adding all the essential buttons
        self.prtBtn = self.findChild(QtWidgets.QPushButton, 'prtBtn')  # Find the button
        self.prtBtn.clicked.connect(self.printButtonClick)  # Remember to pass the definition/method, not the return value!
        self.clrBtn = self.findChild(QtWidgets.QPushButton, 'clrBtn')  # Find the button
        self.clrBtn.clicked.connect(self.clearButtonClick)  # Remember to pass the definition/method, not the return value!
        self.addBtn = self.findChild(QtWidgets.QPushButton, 'addBtn')  # Find the button
        self.addBtn.clicked.connect(self.addItem)  # Remember to pass the definition/method, not the return value!
        self.selectImgBtn = self.findChild(QtWidgets.QPushButton, 'selectImgBtn')  # Find the button
        self.selectImgBtn.clicked.connect(self.setImage)  # Remember to pass the definition/method, not the return value!
        self.cnfBtn = self.findChild(QtWidgets.QPushButton, 'cnfBtn')  # Find the button
        self.cnfBtn.clicked.connect(self.showDialog)  # Remember to pass the definition/method, not the return value!

        # Adding other static input/output elements
        self.input = self.findChild(QtWidgets.QLineEdit, 'input')
        self.qlabel = self.findChild(QtWidgets.QLabel, 'qlabel')
        self.lineEdit = self.findChild(QtWidgets.QLineEdit, 'lineEdit')
        self.listWidget = self.findChild(QtWidgets.QListWidget, 'listWidget')
        self.imageLbl = self.findChild(QtWidgets.QLabel, 'imageLbl')

        # Adding Combobox
        self.combo = self.findChild(QtWidgets.QComboBox, 'sComboBox')  # Find the ComboBox
        # Adding static element to it
        self.combo.addItem("Sourav Ganguly")
        self.combo.addItem("Kapil Dev")
        self.combo.addItem("Sunil Gavaskar")
        self.combo.addItem("M. S. Dhoni")
        # Click Event
        self.combo.activated[str].connect(self.onChanged)  # Remember to pass the definition/method, not the return value!

        # Adding list Box
        self.listwidget2 = self.findChild(QtWidgets.QListWidget, 'listwidget2')  # Find the List
        # Adding static element to it
        self.listwidget2.insertItem(0, "Aamir Khan")
        self.listwidget2.insertItem(1, "Shahruk Khan")
        self.listwidget2.insertItem(2, "Salman Khan")
        self.listwidget2.insertItem(3, "Hrittik Roshon")
        self.listwidget2.insertItem(4, "Amitabh Bachhan")
        # Click Event
        self.listwidget2.clicked.connect(self.showIndividualElement)

        # Adding Group Box
        self.groupBox = self.findChild(QtWidgets.QGroupBox, 'groupBox')  # Find the GroupBox
        self.groupBox.setCheckable(True)

        # Adding Individual Radio Button
        self.rdButton1 = self.findChild(QtWidgets.QRadioButton, 'rdButton1')  # Find the button
        self.rdButton1.setChecked(True)
        self.rdButton1.toggled.connect(lambda: self.printRadioButtonClick(self.rdButton1))  # Remember to pass the definition/method, not the return value!
        self.rdButton2 = self.findChild(QtWidgets.QRadioButton, 'rdButton2')  # Find the button
        self.rdButton2.toggled.connect(lambda: self.printRadioButtonClick(self.rdButton2))  # Remember to pass the definition/method, not the return value!
        self.rdButton3 = self.findChild(QtWidgets.QRadioButton, 'rdButton3')  # Find the button
        self.rdButton3.toggled.connect(lambda: self.printRadioButtonClick(self.rdButton3))  # Remember to pass the definition/method, not the return value!
        self.rdButton4 = self.findChild(QtWidgets.QRadioButton, 'rdButton4')  # Find the button
        self.rdButton4.toggled.connect(lambda: self.printRadioButtonClick(self.rdButton4))  # Remember to pass the definition/method, not the return value!

        self.show()

    def printRadioButtonClick(self, radioOption):
        if radioOption.text() == 'China':
            if radioOption.isChecked() == True:
                print(radioOption.text() + ' is selected')
            else:
                print(radioOption.text() + ' is deselected')
        if radioOption.text() == 'India':
            if radioOption.isChecked() == True:
                print(radioOption.text() + ' is selected')
            else:
                print(radioOption.text() + ' is deselected')
        if radioOption.text() == 'Japan':
            if radioOption.isChecked() == True:
                print(radioOption.text() + ' is selected')
            else:
                print(radioOption.text() + ' is deselected')
        if radioOption.text() == 'France':
            if radioOption.isChecked() == True:
                print(radioOption.text() + ' is selected')
            else:
                print(radioOption.text() + ' is deselected')

    def printButtonClick(self):
        # This is executed when the button is pressed
        print('Input text:' + self.input.text())

    def clearButtonClick(self):
        # This is executed when the button is pressed
        self.input.clear()

    def onChanged(self, text):
        self.qlabel.setText(text)
        self.qlabel.adjustSize()
        self.lineEdit.clear()  # Clear the text

    def addItem(self):
        value = self.lineEdit.text()  # Get the value of the lineEdit
        self.lineEdit.clear()  # Clear the text
        self.listWidget.addItem(value)  # Add the value we got to the list

    def setImage(self):
        fileName, _ = QtWidgets.QFileDialog.getOpenFileName(None, "Select Image", "", "Image Files (*.png *.jpg *jpeg *.bmp);;All Files (*)")  # Ask for file
        if fileName:  # If the user gives a file
            pixmap = QtGui.QPixmap(fileName)  # Setup pixmap with the provided image
            pixmap = pixmap.scaled(self.imageLbl.width(), self.imageLbl.height(), QtCore.Qt.KeepAspectRatio)  # Scale pixmap
            self.imageLbl.setPixmap(pixmap)  # Set the pixmap onto the label
            self.imageLbl.setAlignment(QtCore.Qt.AlignCenter)  # Align the label to center

    def showDialog(self):
        msgBox = QMessageBox()
        msgBox.setIcon(QMessageBox.Information)
        msgBox.setText("Message box pop up window")
        msgBox.setWindowTitle("MessageBox Example")
        msgBox.setStandardButtons(QMessageBox.Ok | QMessageBox.Cancel)
        msgBox.buttonClicked.connect(self.msgButtonClick)
        returnValue = msgBox.exec()
        if returnValue == QMessageBox.Ok:
            print('OK clicked')

    def msgButtonClick(self, i):
        print("Button clicked is:", i.text())

    def showIndividualElement(self, qmodelindex):
        item = self.listwidget2.currentItem()
        print(item.text())

if __name__ == "__main__":
    import sys
    app = QtWidgets.QApplication(sys.argv)
    window = Ui()
    window.show()
    sys.exit(app.exec_())
Let us explore a few key lines from this script. The rest are almost identical.
# Loading the Graphical Design without
# converting it to any kind of Python code
uic.loadUi('basicAdv.ui', self)
Loading the GUI created using Qt Designer into the Python environment.
# Adding all the essential buttons
self.prtBtn = self.findChild(QtWidgets.QPushButton, 'prtBtn')  # Find the button
self.prtBtn.clicked.connect(self.printButtonClick)  # Remember to pass the definition/method, not the return value!
In this case, we’re dynamically binding the component from the GUI by using the findChild method & then, on the next line, connecting the appropriate event handler to it. In this case, it is self.printButtonClick.
The printButtonClick, as mentioned earlier, is a method that contains the following snippet –
def printButtonClick(self):
    # This is executed when the button is pressed
    print('Input text:' + self.input.text())
As you can see, this event will capture the text from the input textbox & print it on our terminal.
Here is the snippet for those widgets that are only input/output elements & generally don’t have events of their own. But we still need to bind them with our Python application.
The Group Box, along with the radio buttons, works slightly differently than our drop-down list.
For each radio button, we’ll have a dedicated text value that represents a different country in this context.
And, our application will bind all the radio buttons; they will share one common method for all four options, as shown below –
# Adding Individual Radio Button
self.rdButton1 = self.findChild(QtWidgets.QRadioButton, 'rdButton1')  # Find the button
self.rdButton1.setChecked(True)
self.rdButton1.toggled.connect(lambda: self.printRadioButtonClick(self.rdButton1))  # Remember to pass the definition/method, not the return value!
self.rdButton2 = self.findChild(QtWidgets.QRadioButton, 'rdButton2')  # Find the button
self.rdButton2.toggled.connect(lambda: self.printRadioButtonClick(self.rdButton2))  # Remember to pass the definition/method, not the return value!
self.rdButton3 = self.findChild(QtWidgets.QRadioButton, 'rdButton3')  # Find the button
self.rdButton3.toggled.connect(lambda: self.printRadioButtonClick(self.rdButton3))  # Remember to pass the definition/method, not the return value!
self.rdButton4 = self.findChild(QtWidgets.QRadioButton, 'rdButton4')  # Find the button
self.rdButton4.toggled.connect(lambda: self.printRadioButtonClick(self.rdButton4))  # Remember to pass the definition/method, not the return value!
Also, note that, by default, rdButton1 is set to True, i.e., it will be selected when the form loads initially.
Let’s explore the printRadioButtonClick event.
def printRadioButtonClick(self, radioOption):
    if radioOption.text() == 'China':
        if radioOption.isChecked() == True:
            print(radioOption.text() + ' is selected')
        else:
            print(radioOption.text() + ' is deselected')
    if radioOption.text() == 'India':
        if radioOption.isChecked() == True:
            print(radioOption.text() + ' is selected')
        else:
            print(radioOption.text() + ' is deselected')
    if radioOption.text() == 'Japan':
        if radioOption.isChecked() == True:
            print(radioOption.text() + ' is selected')
        else:
            print(radioOption.text() + ' is deselected')
    if radioOption.text() == 'France':
        if radioOption.isChecked() == True:
            print(radioOption.text() + ' is selected')
        else:
            print(radioOption.text() + ' is deselected')
This will capture the radio button option &, based on the currently clicked button, fetch its text. Finally, that text is matched against the logic here &, based on that, our application displays the output.
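As a design note, a more compact variant (not the author's original) could use Qt’s sender() to identify which radio button emitted the signal, so all four buttons can share one slot without the lambda wrappers –

# Hypothetical alternative: connect all four buttons directly, e.g.
#     self.rdButton1.toggled.connect(self.printRadioButtonClick)
def printRadioButtonClick(self):
    radioOption = self.sender()  # the QRadioButton that just toggled
    state = 'selected' if radioOption.isChecked() else 'deselected'
    print(radioOption.text() + ' is ' + state)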
Finally, the image-handling process is slightly different.
Initially, our application will load the components from the .ui file & bind them with the Python environment –
The image load option will only work when the user clicks the button, which triggers the following set of actions –
self.selectImgBtn = self.findChild(QtWidgets.QPushButton, 'selectImgBtn')  # Find the button
self.selectImgBtn.clicked.connect(self.setImage)  # Remember to pass the definition/method, not the return value!
Let’s explore the setImage method –
def setImage(self):
    fileName, _ = QtWidgets.QFileDialog.getOpenFileName(None, "Select Image", "", "Image Files (*.png *.jpg *jpeg *.bmp);;All Files (*)")  # Ask for file
    if fileName:  # If the user gives a file
        pixmap = QtGui.QPixmap(fileName)  # Setup pixmap with the provided image
        pixmap = pixmap.scaled(self.imageLbl.width(), self.imageLbl.height(), QtCore.Qt.KeepAspectRatio)  # Scale pixmap
        self.imageLbl.setPixmap(pixmap)  # Set the pixmap onto the label
        self.imageLbl.setAlignment(QtCore.Qt.AlignCenter)  # Align the label to center
This will open the corresponding file dialogue box of the respective O/S for choosing the desired image.
Last but not least, the use of MsgBox, which can be extremely useful for many GUI-based programs.
This MsgBox doesn’t exist in the form. However, we’re creating it on the “Confirm” button’s click event, as shown below –
self.cnfBtn = self.findChild(QtWidgets.QPushButton, 'cnfBtn')  # Find the button
self.cnfBtn.clicked.connect(self.showDialog)  # Remember to pass the definition/method, not the return value!
This will trigger the showDialog method –
def showDialog(self):
    msgBox = QMessageBox()
    msgBox.setIcon(QMessageBox.Information)
    msgBox.setText("Message box pop up window")
    msgBox.setWindowTitle("MessageBox Example")
    msgBox.setStandardButtons(QMessageBox.Ok | QMessageBox.Cancel)
    msgBox.buttonClicked.connect(self.msgButtonClick)
    returnValue = msgBox.exec()
    if returnValue == QMessageBox.Ok:
        print('OK clicked')
And, based on your option (“OK”/”Cancel”), it will print the captured message in your console.
Let’s explore the videos of output from Windows O/S –
Let’s explore the video output from MAC VM –
For more information on this package – please check the following link.
So, as you can see, we’ve finally achieved it. We’ve demonstrated a cross-platform GUI application using native Python. And we didn’t even have to convert the .ui design file into a Python script.
Please share your feedback.
I’ll be posting another new post in the coming days. Till then, Happy Avenging! 😀
Note: All the data posted here are representational data & available over the internet & for educational purpose only.
Today, I’ll be presenting a different kind of post here. I’ll be trying to predict health issues for senior citizens based on “realtime weather data”, blending it with open-source population data & using a mock risk-factor calculation. At the end of the post, I’ll be plotting these numbers into some graphs for better understanding.
Let’s drive!
For this, first, we need realtime weather data. To do that, we need to subscribe to the OpenWeather API. For that, you have to register as a developer & you’ll receive an email similar to the one below once they have approved your account –
So, from the above picture, you can see that you’ll be provided one API key & also a couple of useful API documentation links. I would recommend exploring all the links before you try to use the API.
You can also view your API key once you’ve logged into their console. You can also create multiple API keys & the screen should look something like this –
For security reasons, I’ll be hiding my own keys & the same should be applicable for you as well.
I would say many of these free APIs might have some issues. So, I would recommend testing the open API through Postman before you jump into the Python development. Here is a glimpse of my test through Postman –
Once I can see that the API is returning the result, I can work on it.
Apart from that, one needs to understand that these APIs might have usage limits & you also need to know the consequences in terms of price & tier in case you exceed the limit. Here are the details for this API –
For our demo, I’ll be using the Free tier only.
Let’s look into our other source data. We got the top 10 cities population-wise over the internet. Also, we have collected sample senior citizen percentages against the sex ratio across those cities. On top of that, we have masked these values, as this is just for educational purposes.
1. CityDetails.csv
Here is the glimpse of this file –
So, this file only contains the total population across the top 10 cities in the USA.
2. SeniorCitizen.csv
This file contains the Sex ratio of Senior citizens across those top 10 cities by population.
Again, we are not going to discuss any script that we’ve already discussed here.
Hence, we’re skipping clsL.py here.
1. clsConfig.py (This script contains all the parameters of the server.)
In the above snippet, our application first prepares the payload & the parameters received from our param script, then invokes the GET method to extract the real-time data as JSON & finally sends the JSON payload to the primary calling function.
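Since the clsWeather.py listing itself isn’t reproduced above, the following is only a minimal sketch of what such a call typically looks like. The endpoint is OpenWeather’s current-weather API; the function name searchQry matches the one used by the calling script below, but the exact parameter & config-key names are assumptions –

# A hedged sketch of a clsWeather-style call (not the author's original code)
import json
import requests

def searchQry(city, api_key):
    url = 'https://api.openweathermap.org/data/2.5/weather'
    payload = {'q': city, 'appid': api_key}   # prepare the payload & parameters
    resp = requests.get(url, params=payload)  # GET call to the realtime API
    return json.dumps(resp.json())            # JSON string handed back to the caller

# Example (hypothetical key name):
# ret_2 = searchQry('New York', cf.config['API_KEY'])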
3. clsMap.py (This script contains the main logic to prepare the map using the seaborn package & plot our custom-made risk factor by blending the realtime data with our statistical data received over the internet.)
#####################################################
#### Written By: SATYAKI DE                      ####
#### Written On: 19-Jan-2020                     ####
#### Modified On 19-Jan-2020                     ####
####                                             ####
#### Objective: Main scripts to invoke           ####
#### plot into the Map.                          ####
#####################################################

import seaborn as sns
import logging
from clsConfig import clsConfig as cf
import pandas as p
import clsL as cl

# This library is required later
# to print the chart
import matplotlib.pyplot as plt

class clsMap:
    def __init__(self):
        self.src_file = cf.config['SRC_FILE_1']

    def calculateRisk(self, row):
        try:
            # Let's assume some logic
            # 1. By default, 30% of Senior Citizen
            #    prone to health Issue for each City
            # 2. Male Senior Citizen is 19% more prone
            #    to illness than female.
            # 3. If humidity more than 70% or less
            #    than 40% are 22% main cause of illness
            # 4. If feels like more than 280 or
            #    less than 260 degree are 17% more prone
            #    to illness.
            # Finally, this will be calculated per 1K
            # people around 10 blocks

            str_sex = str(row['Sex'])
            int_humidity = int(row['humidity'])
            int_feelsLike = int(row['feels_like'])
            int_population = int(str(row['Population']).replace(',', ''))
            float_srcitizen = float(row['SeniorCitizen'])

            confidance_score = 0.0
            SeniorCitizenPopulation = (int_population * float_srcitizen)

            if str_sex == 'Male':
                confidance_score = (SeniorCitizenPopulation * 0.30 * 0.19) + confidance_score
            else:
                confidance_score = (SeniorCitizenPopulation * 0.30 * 0.11) + confidance_score

            if ((int_humidity > 70) | (int_humidity < 40)):
                confidance_score = confidance_score + (int_population * 0.30 * float_srcitizen) * 0.22

            if ((int_feelsLike > 280) | (int_feelsLike < 260)):
                confidance_score = confidance_score + (int_population * 0.30 * float_srcitizen) * 0.17

            final_score = round(round(confidance_score, 2) / (1000 * 10), 2)

            return final_score

        except Exception as e:
            x = str(e)
            return x

    def setMap(self, dfInput):
        try:
            resVal = 0
            df = p.DataFrame()
            debug_ind = 'Y'
            src_file = self.src_file

            # Initiating Log Class
            l = cl.clsL()

            df = dfInput

            # Creating a subset of desired columns
            dfMod = df[['CityName', 'temp', 'Population', 'humidity', 'feels_like']]
            l.logr('5.dfSuppliment.csv', debug_ind, dfMod, 'log')

            # Fetching Senior Citizen Data
            df = p.read_csv(src_file, index_col=False)

            # Merging two frames
            dfMerge = p.merge(df, dfMod, on=['CityName'])
            l.logr('6.dfMerge.csv', debug_ind, dfMerge, 'log')

            # Getting RiskFactor quotient from our custom made logic
            dfMerge['RiskFactor'] = dfMerge.apply(lambda row: self.calculateRisk(row), axis=1)
            l.logr('7.dfRiskFactor.csv', debug_ind, dfMerge, 'log')

            # Generating Map plots
            # sns.lmplot(x='RiskFactor', y='SeniorCitizen', data=dfMerge, hue='Sex')
            # sns.lmplot(x='RiskFactor', y='SeniorCitizen', data=dfMerge, hue='Sex', markers=['o','v'], scatter_kws={'s':25})
            sns.lmplot(x='RiskFactor', y='SeniorCitizen', data=dfMerge, col='Sex')

            # This is required when you are running
            # through normal Python & not through
            # Jupyter Notebook
            plt.show()

            return resVal

        except Exception as e:
            x = str(e)
            print(x)
            logging.info(x)
            resVal = x
            return resVal
Key lines from the above codebase –
# Creating a subset of desired columns
dfMod = df[['CityName', 'temp', 'Population', 'humidity', 'feels_like']]
l.logr('5.dfSuppliment.csv', debug_ind, dfMod, 'log')

# Fetching Senior Citizen Data
df = p.read_csv(src_file, index_col=False)

# Merging two frames
dfMerge = p.merge(df, dfMod, on=['CityName'])
l.logr('6.dfMerge.csv', debug_ind, dfMerge, 'log')

# Getting RiskFactor quotient from our custom made logic
dfMerge['RiskFactor'] = dfMerge.apply(lambda row: self.calculateRisk(row), axis=1)
l.logr('7.dfRiskFactor.csv', debug_ind, dfMerge, 'log')
Here, we’re combining our senior citizen data with the already processed data coming from our primary calling script. The application then applies our custom logic (calculateRisk) to derive the risk factor figures. If you want to go through that, I’ve provided the logic above. However, this is just a demo; you should not rely on the logic that I’ve used (it is kind of my observation of life till now. :D).
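To make the arithmetic concrete, here is a tiny worked example with made-up numbers (not real census or weather data), following the same steps as calculateRisk above –

# Hypothetical inputs
int_population = 8000000   # city population
float_srcitizen = 0.12     # senior citizen ratio
# Sex = 'Male', humidity = 75 (> 70), feels_like = 285 (> 280)

score = (int_population * float_srcitizen) * 0.30 * 0.19   # 54720.0  (male clause)
score += (int_population * 0.30 * float_srcitizen) * 0.22  # + 63360.0 (humidity clause)
score += (int_population * 0.30 * float_srcitizen) * 0.17  # + 48960.0 (feels-like clause)

final_score = round(round(score, 2) / (1000 * 10), 2)      # 16.7
print(final_score)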
The line below is only required when you run the script through normal Python & not via a Jupyter notebook (where the plot renders inline).
plt.show()
4. callOpenMapWeatherAPI.py (This is the first calling script. This script also calls the realtime API, then blends the first file with it & passes only the relevant columns of data to our Map script to produce the graph.)
#####################################################
#### Written By: SATYAKI DE                      ####
#### Written On: 19-Jan-2020                     ####
#### Modified On 19-Jan-2020                     ####
####                                             ####
#### Objective: Main calling scripts.            ####
#####################################################

from clsConfig import clsConfig as cf
import pandas as p
import clsL as cl
import logging
import datetime
import json
import clsWeather as ct
import re
import numpy as np
import clsMap as cm

# Disabling Warning
def warn(*args, **kwargs):
    pass

import warnings
warnings.warn = warn

# Lookup functions from
# Azure cloud SQL DB
var = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")

def getMainWeather(row):
    try:
        # Using regular expression to fetch time part only
        lkp_Columns = str(row['weather'])
        jpayload = str(lkp_Columns).replace("'", '"')

        #jpayload = json.dumps(lkp_Columns)
        payload = json.loads(jpayload)

        df_lkp = p.io.json.json_normalize(payload)
        df_lkp.columns = df_lkp.columns.map(lambda x: x.split(".")[-1])

        str_main_weather = str(df_lkp.iloc[0]['main'])

        return str_main_weather

    except Exception as e:
        x = str(e)
        str_main_weather = x

        return str_main_weather

def getMainDescription(row):
    try:
        # Using regular expression to fetch time part only
        lkp_Columns = str(row['weather'])
        jpayload = str(lkp_Columns).replace("'", '"')

        #jpayload = json.dumps(lkp_Columns)
        payload = json.loads(jpayload)

        df_lkp = p.io.json.json_normalize(payload)
        df_lkp.columns = df_lkp.columns.map(lambda x: x.split(".")[-1])

        str_description = str(df_lkp.iloc[0]['description'])

        return str_description

    except Exception as e:
        x = str(e)
        str_description = x

        return str_description

def main():
    try:
        dfSrc = p.DataFrame()
        df_ret = p.DataFrame()
        ret_2 = ''
        debug_ind = 'Y'

        general_log_path = str(cf.config['LOG_PATH'])

        # Enabling Logging Info
        logging.basicConfig(filename=general_log_path + 'consolidatedIR.log', level=logging.INFO)

        # Initiating Log Class
        l = cl.clsL()

        # Moving previous day log files to archive directory
        arch_dir = cf.config['ARCH_DIR']
        log_dir = cf.config['LOG_PATH']
        col_list = cf.config['COL_LIST']
        col_list_1 = cf.config['COL_LIST_1']
        col_list_2 = cf.config['COL_LIST_2']

        tmpR0 = "*" * 157
        logging.info(tmpR0)
        tmpR9 = 'Start Time: ' + str(var)
        logging.info(tmpR9)
        logging.info(tmpR0)

        print("Archive Directory:: ", arch_dir)
        print("Log Directory::", log_dir)
        tmpR1 = 'Log Directory::' + log_dir
        logging.info(tmpR1)

        df2 = p.DataFrame()

        src_file = cf.config['SRC_FILE']

        # Fetching data from source file
        df = p.read_csv(src_file, index_col=False)

        # Creating a list of City Name from the source file
        city_list = df['CityName'].tolist()

        # Declaring an empty dictionary
        merge_dict = {}
        merge_dict['city'] = df2

        start_pos = 1
        src_file_name = '1.' + cf.config['SRC_FILE_INIT']

        for i in city_list:
            x1 = ct.clsWeather()
            ret_2 = x1.searchQry(i)

            # Capturing the JSON Payload
            res = json.loads(ret_2)

            # Converting dictionary to Pandas Dataframe
            # df_ret = p.read_json(ret_2, orient='records')
            df_ret = p.io.json.json_normalize(res)
            df_ret.columns = df_ret.columns.map(lambda x: x.split(".")[-1])

            # Removing any duplicate columns
            df_ret = df_ret.loc[:, ~df_ret.columns.duplicated()]

            # l.logr(str(start_pos) + '.1.' + src_file_name, debug_ind, df_ret, 'log')
            start_pos = start_pos + 1

            # If all the conversion successful
            # you won't get any gust column
            # from OpenMap response. Hence, we
            # need to add dummy reason column
            # to maintain the consistent structures
            if 'gust' not in df_ret.columns:
                df_ret = df_ret.assign(gust=999999)[['gust'] + df_ret.columns.tolist()]

            # Resetting the column orders as per JSON
            column_order = col_list
            df_mod_ret = df_ret.reindex(column_order, axis=1)

            if start_pos == 1:
                merge_dict['city'] = df_mod_ret
            else:
                d_frames = [merge_dict['city'], df_mod_ret]
                merge_dict['city'] = p.concat(d_frames)

            start_pos += 1

        for k, v in merge_dict.items():
            l.logr(src_file_name, debug_ind, merge_dict[k], 'log')

        # Now opening the temporary file
        temp_log_file = log_dir + src_file_name

        dfNew = p.read_csv(temp_log_file, index_col=False)

        # Extracting Complex columns
        dfNew['WeatherMain'] = dfNew.apply(lambda row: getMainWeather(row), axis=1)
        dfNew['WeatherDescription'] = dfNew.apply(lambda row: getMainDescription(row), axis=1)

        l.logr('2.dfNew.csv', debug_ind, dfNew, 'log')

        # Removing unwanted columns & Renaming key columns
        dfNew.drop(['weather'], axis=1, inplace=True)
        dfNew.rename(columns={'name': 'CityName'}, inplace=True)

        l.logr('3.dfNewMod.csv', debug_ind, dfNew, 'log')

        # Now joining with the main csv
        # to get the complete picture
        dfMain = p.merge(df, dfNew, on=['CityName'])

        l.logr('4.dfMain.csv', debug_ind, dfMain, 'log')

        # Let's extract only relevant columns
        dfSuppliment = dfMain[['CityName', 'Population', 'State', 'country', 'feels_like', 'humidity', 'pressure', 'temp', 'temp_max', 'temp_min', 'visibility', 'deg', 'gust', 'speed', 'WeatherMain', 'WeatherDescription']]

        l.logr('5.dfSuppliment.csv', debug_ind, dfSuppliment, 'log')

        # Let's pass this to our map section
        x2 = cm.clsMap()
        ret_3 = x2.setMap(dfSuppliment)

        if ret_3 == 0:
            print('Successful Map Generated!')
        else:
            print('Please check the log for further issue!')

        print("-" * 60)
        print()
        print('Finding Story points..')
        print("*" * 157)
        logging.info('Finding Story points..')
        logging.info(tmpR0)

        tmpR10 = 'End Time: ' + str(var)
        logging.info(tmpR10)
        logging.info(tmpR0)

    except ValueError as e:
        print(str(e))
        print("No relevant data to proceed!")
        logging.info("No relevant data to proceed!")

    except Exception as e:
        print("Top level Error: args:{0}, message: {1}".format(e.args, str(e)))

if __name__ == "__main__":
    main()
Key snippet from the above script –
# Capturing the JSON Payload
res = json.loads(ret_2)

# Converting dictionary to Pandas Dataframe
df_ret = p.io.json.json_normalize(res)
df_ret.columns = df_ret.columns.map(lambda x: x.split(".")[-1])
Once the application receives the JSON response from the realtime API, it converts it to a pandas dataframe.
# Removing any duplicate columns
df_ret = df_ret.loc[:, ~df_ret.columns.duplicated()]
Since this is a complex JSON response, the application might encounter duplicate columns, which could cause a problem later. Hence, our app removes all these duplicate columns, as they are not required for our case.
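Here is a tiny, hypothetical illustration of what that line does when two columns share a name –

import pandas as p

# Two columns accidentally named 'temp'; only the first occurrence survives
df = p.DataFrame([[290, 291, 65]], columns=['temp', 'temp', 'humidity'])
df = df.loc[:, ~df.columns.duplicated()]
print(df.columns.tolist())   # ['temp', 'humidity']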
if 'gust' not in df_ret.columns:
    df_ret = df_ret.assign(gust=999999)[['gust'] + df_ret.columns.tolist()]
There is a possibility that the application might not receive all the desired attributes from the realtime API. Hence, the above lines check for & add a dummy column named gust for those records where it is not present in the JSON response.
These few lines are required, as our API has a limitation of responding with only one city at a time. Hence, we’re retrieving one city at a time & finally merging them into a single dataframe before creating a temporary source file for the next step.
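Here is a stripped-down, hypothetical version of that per-city loop – it keeps the one-call-per-city idea but collects the frames in a plain list before a single concat, instead of the dictionary used in the script above –

# Reuses ct (clsWeather) & city_list from the script above
import json
import pandas as p

frames = []
for city in city_list:
    ret_2 = ct.clsWeather().searchQry(city)        # one realtime call per city
    df_city = p.json_normalize(json.loads(ret_2))  # flatten the JSON payload
    frames.append(df_city)

df_all = p.concat(frames, ignore_index=True)       # single consolidated dataframe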
At this moment our data should look like this –
Let’s check the weather column. We need to extract the main & description for our dashboard, which will be coming in the next installment.
The getMainWeather & getMainDescription functions extract the weather column & replace the single quotes with double quotes before the application tries to convert the value to JSON. Once it is converted to JSON, json_normalize will easily serialize it & create individual columns out of it. Once you have them captured inside the pandas dataframe, you can extract the values & return them to your primary calling function.
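A tiny illustration with a made-up weather payload (the p.json_normalize call shown here is the modern pandas equivalent of the p.io.json.json_normalize used in the script) –

import json
import pandas as p

# Hypothetical sample of the 'weather' column as it appears in the CSV (single quotes)
lkp_Columns = "[{'id': 800, 'main': 'Clear', 'description': 'clear sky', 'icon': '01d'}]"

payload = json.loads(lkp_Columns.replace("'", '"'))   # make it valid JSON first
df_lkp = p.json_normalize(payload)

print(df_lkp.iloc[0]['main'])          # Clear
print(df_lkp.iloc[0]['description'])   # clear sky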
# Let's pass this to our map sectionx2 = cm.clsMap()ret_3 = x2.setMap(dfSuppliment)if ret_3 == 0: print('Successful Map Generated!')else: print('Please check the log for further issue!')
In the above lines, the application invokes the Map class to calculate the remaining logic & then plots the data in the seaborn graph.
Let’s just briefly see the central directory structure –
Here is the log directory –
And, finally, the source directory should look something like this –