cast Archives

Exploring the new Polars library in Python

Posted on October 28, 2023October 28, 2023 by SatyakiDe in analytic function, api, cast, cloud, code, computing, Crossplatform, Data Science, function, objects, Pandas, Polars, Python, snippet, Technology

Today, I will present some valid Python packages where you can explore most of the complex SQLs by using this new package named “Polars,” which can be extremely handy on many occasions.

This post will be short posts where I’ll prepare something new on LLMs for the upcoming posts for the next month.

Why not view the demo before going through it?

Demo

Python Packages:

pip install polars
pip install pandas

Code:

Let us understand the key class & snippets.

clsConfigClient.py (Key entries that will be discussed later)

################################################
#### Written By: SATYAKI DE                 ####
#### Written On:  15-May-2020               ####
#### Modified On: 28-Oct-2023               ####
####                                        ####
#### Objective: This script is a config     ####
#### file, contains all the keys for        ####
#### personal OpenAI-based MAC-shortcuts    ####
#### enable bot.                            ####
####                                        ####
################################################

import os
import platform as pl

class clsConfigClient(object):
    Curr_Path = os.path.dirname(os.path.realpath(__file__))

    os_det = pl.system()
    if os_det == "Windows":
        sep = '\\'
    else:
        sep = '/'

    conf = {
        'APP_ID': 1,
        'ARCH_DIR': Curr_Path + sep + 'arch' + sep,
        'LOG_PATH': Curr_Path + sep + 'log' + sep,
        'DATA_PATH': Curr_Path + sep + 'data' + sep,
        'TEMP_PATH': Curr_Path + sep + 'temp' + sep,
        'OUTPUT_DIR': 'model',
        'APP_DESC_1': 'Polars Demo!',
        'DEBUG_IND': 'Y',
        'INIT_PATH': Curr_Path,
        'TITLE': "Polars Demo!",
        'PATH' : Curr_Path,
        'OUT_DIR': 'data',
        'MERGED_FILE': 'mergedFile.csv',
        'ACCT_FILE': 'AccountAddress.csv',
        'ORDER_FILE': 'Orders.csv',
        'CUSTOMER_FILE': 'CustomerDetails.csv',
        'STATE_CITY_WISE_REPORT_FILE': 'StateCityWiseReport.csv'
    }

clsSQL.py (Main class file that contains how to use the SQL)

#####################################################
#### Written By: SATYAKI DE                      ####
#### Written On: 27-May-2023                     ####
#### Modified On 28-Oct-2023                     ####
####                                             ####
#### Objective: This is the main calling         ####
#### python class that will invoke the           ####
#### Polar class, which will enable SQL          ####
#### capabilitites.                              ####
####                                             ####
#####################################################

import polars as pl
import os
from clsConfigClient import clsConfigClient as cf
import pandas as p

###############################################
###           Global Section                ###
###############################################

# Disbling Warning
def warn(*args, **kwargs):
    pass

import warnings
warnings.warn = warn

###############################################
###    End of Global Section                ###
###############################################

class clsSQL:
    def __init__(self):
        self.acctFile = cf.conf['ACCT_FILE']
        self.orderFile = cf.conf['ORDER_FILE']
        self.stateWiseReport = cf.conf['STATE_CITY_WISE_REPORT_FILE']
        self.custFile = cf.conf['CUSTOMER_FILE']
        self.dataPath = cf.conf['DATA_PATH']

    def execSQL(self):
        try:
            dataPath = self.dataPath
            acctFile = self.acctFile
            orderFile = self.orderFile
            stateWiseReport = self.stateWiseReport
            custFile = self.custFile

            fullAcctFile = dataPath + acctFile
            fullOrderFile = dataPath + orderFile
            fullStateWiseReportFile = dataPath + stateWiseReport
            fullCustomerFile = dataPath + custFile

            ctx = pl.SQLContext(accountMaster = pl.scan_csv(fullAcctFile),
            orderMaster = pl.scan_csv(fullOrderFile),
            stateMaster = pl.scan_csv(fullStateWiseReportFile))

            querySQL = """
            SELECT orderMaster.order_id,
            orderMaster.total,
            stateMaster.state,
            accountMaster.Acct_Nbr,
            accountMaster.Name,
            accountMaster.Email,
            accountMaster.user_id,
            COUNT(*) TotalCount
            FROM orderMaster
            JOIN stateMaster USING (city)
            JOIN accountMaster USING (user_id)
            ORDER BY stateMaster.state
            """

            res = ctx.execute(querySQL, eager=True)
            res_Pandas = res.to_pandas()

            print('Result:')
            print(res_Pandas)
            print(type(res_Pandas))

            ctx_1 = pl.SQLContext(customerMaster = pl.scan_csv(fullCustomerFile),
            tempMaster=pl.from_pandas(res_Pandas))

            querySQL_1 = """
            SELECT tempMaster.order_id,
            tempMaster.total,
            tempMaster.state,
            tempMaster.Acct_Nbr,
            tempMaster.Name,
            tempMaster.Email,
            tempMaster.TotalCount,
            tempMaster.user_id,
            COUNT(*) OVER(PARTITION BY tempMaster.state ORDER BY tempMaster.state, tempMaster.Acct_Nbr) StateWiseCount,
            MAX(tempMaster.Acct_Nbr) OVER(PARTITION BY tempMaster.state ORDER BY tempMaster.state, tempMaster.Acct_Nbr) MaxAccountByState,
            MIN(tempMaster.Acct_Nbr) OVER(PARTITION BY tempMaster.state ORDER BY tempMaster.state, tempMaster.Acct_Nbr) MinAccountByState,
            CASE WHEN tempMaster.total < 70 THEN 'SILVER' ELSE 'GOLD' END CategoryStat,
            SUM(customerMaster.Balance) OVER(PARTITION BY tempMaster.state) SumBalance
            FROM tempMaster
            JOIN customerMaster USING (user_id)
            ORDER BY tempMaster.state
            """

            res_1 = ctx_1.execute(querySQL_1, eager=True)

            finDF = res_1.to_pandas()

            print('Result 2:')
            print(finDF)

            return 0
        except Exception as e:
            discussedTopic = []
            x = str(e)
            print('Error: ', x)

            return 1

If we go through some of the key lines, we will understand how this entire package works.

But, before that, let us understand the source data –

Let us understand the steps –

Join orderMaster, stateMaster & accountMaster and fetch the selected attributes. Store this in a temporary data frame named tempMaster.
Join tempMaster & customerMaster and fetch the relevant attributes with some more aggregation, which is required for the business KPIs.

ctx = pl.SQLContext(accountMaster = pl.scan_csv(fullAcctFile),
orderMaster = pl.scan_csv(fullOrderFile),
stateMaster = pl.scan_csv(fullStateWiseReportFile))

The above method will create three temporary tables by reading the source files – AccountAddress.csv, Orders.csv & StateCityWiseReport.csv.

And, let us understand the supported SQLs –

SELECT  orderMaster.order_id,
        orderMaster.total,
        stateMaster.state,
        accountMaster.Acct_Nbr,
        accountMaster.Name,
        accountMaster.Email,
        accountMaster.user_id,
        COUNT(*) TotalCount
FROM orderMaster
JOIN stateMaster USING (city)
JOIN accountMaster USING (user_id)
ORDER BY stateMaster.state

In this step, we’re going to store the output of the above query into a temporary view named – tempMaster data frame.

Since this is a polar data frame, we’re converting it to the pandas data frame.

res_Pandas = res.to_pandas()

Finally, let us understand the next part –

ctx_1 = pl.SQLContext(customerMaster = pl.scan_csv(fullCustomerFile),
tempMaster=pl.from_pandas(res_Pandas))

In the above section, one source is getting populated from the CSV file, whereas the other source is feeding from a pandas data frame populated in the previous step.

Now, let us understand the SQL supported by this package, which is impressive –

SELECT  tempMaster.order_id,
        tempMaster.total,
        tempMaster.state,
        tempMaster.Acct_Nbr,
        tempMaster.Name,
        tempMaster.Email,
        tempMaster.TotalCount,
        tempMaster.user_id,
        COUNT(*) OVER(PARTITION BY tempMaster.state ORDER BY tempMaster.state, tempMaster.Acct_Nbr) StateWiseCount,
        MAX(tempMaster.Acct_Nbr) OVER(PARTITION BY tempMaster.state ORDER BY tempMaster.state, tempMaster.Acct_Nbr) MaxAccountByState,
        MIN(tempMaster.Acct_Nbr) OVER(PARTITION BY tempMaster.state ORDER BY tempMaster.state, tempMaster.Acct_Nbr) MinAccountByState,
        CASE WHEN tempMaster.total < 70 THEN 'SILVER' ELSE 'GOLD' END CategoryStat,
        SUM(customerMaster.Balance) OVER(PARTITION BY tempMaster.state) SumBalance
FROM tempMaster
JOIN customerMaster USING (user_id)
ORDER BY tempMaster.state

As you can see it has the capability of all the advanced analytics SQL using partitions, and CASE statements.

The only problem with COUNT(*) with the partition is not working as expected. Not sure, whether that is related to any version issues or not.

COUNT(*) OVER(PARTITION BY tempMaster.state ORDER BY tempMaster.state, tempMaster.Acct_Nbr) StateWiseCount

I’m trying to get more information on this. Except for this statement, everything works perfectly.

1_testSQL.py (Main class file that contains how to use the SQL)

#########################################################
#### Written By: SATYAKI DE                          ####
#### Written On: 27-Jun-2023                         ####
#### Modified On 28-Oct-2023                         ####
####                                                 ####
#### Objective: This is the main class that invokes  ####
#### advanced analytic SQL in python.                ####
####                                                 ####
#########################################################

from clsConfigClient import clsConfigClient as cf
import clsL as log
import clsSQL as ccl

from datetime import datetime, timedelta

# Disbling Warning
def warn(*args, **kwargs):
    pass

import warnings
warnings.warn = warn

###############################################
###           Global Section                ###
###############################################

#Initiating Logging Instances
clog = log.clsL()
cl = ccl.clsSQL()

var = datetime.now().strftime(".%H.%M.%S")

documents = []

###############################################
###    End of Global Section                ###
###############################################
def main():
    try:
        var = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        print('*'*120)
        print('Start Time: ' + str(var))
        print('*'*120)

        r1 = cl.execSQL()

        if r1 == 0:
            print()
            print('Successfully SQL-enabled!')
        else:
            print()
            print('Failed to senable SQL!')

        print('*'*120)
        var1 = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        print('End Time: ' + str(var1))

    except Exception as e:
        x = str(e)
        print('Error: ', x)

if __name__ == '__main__':
    main()

As this is extremely easy to understand & self-explanatory.

To learn more about this package, please visit the following link.

So, finally, we’ve done it. I know that this post is relatively smaller than my earlier post. But, I think, you can get a good hack to improve some of your long-running jobs by applying this trick.

I’ll bring some more exciting topics in the coming days from the Python verse. Please share & subscribe to my post & let me know your feedback.

Till then, Happy Avenging! 🙂

Note: All the data & scenarios posted here are representational data & scenarios & available over the internet & for educational purposes only. Some of the images (except my photo) we’ve used are available over the net. We don’t claim ownership of these images. There is always room for improvement & especially in the prediction quality.

Handling unique data using the python-based FastDataMask Package package

Posted on February 18, 2023February 19, 2023 by SatyakiDe in api, Azure, call, cast, cloud, code, commit, Data Science, FastDataMask, function, numpy, Pandas, Python

Today, I’ll discuss one widespread use case of handling unique & critical data using a new python-based FastDataMask package. But before going through the details, why don’t we view the demo & then go through it?

Demo

Great! Let us understand in detail.

Architecture:

Let us understand the flow of events –

The application first invokes the FastDataMask python package, which accepts individual data in nature & then generates non-recoverable masked data, keeping the data pattern & nature in mind. Hence, anyone can still use the data for their analysis, whereas you can encapsulate the information from unauthorized pairs of eyes. Yet, they can get the essence & close data patterns to decide from any data analysis.

Python Packages:

Following are the python packages that are necessary to develop this brilliant use case –

pip install FastDataMask==0.0.6
pip install imutils==0.5.3
pip install numpy==1.23.2
pip install pandas==1.4.3

CODE:

Let us now understand the code. For this use case, we will only discuss three python scripts. However, we need more than these three. However, we have already discussed them in some of the early posts. Hence, we will skip them here.

clsConfigClient.py (Main configuration file)

	################################################
	#### Written By: SATYAKI DE ####
	#### Written On: 15-May-2020 ####
	#### Modified On: 15-Feb-2023 ####
	#### ####
	#### Objective: This script is a config ####
	#### file, contains all the keys for ####
	#### personal AI-driven voice assistant. ####
	#### ####
	################################################

	import os
	import platform as pl

	class clsConfigClient(object):
	Curr_Path = os.path.dirname(os.path.realpath(__file__))

	os_det = pl.system()
	if os_det == "Windows":
	sep = '\\'
	else:
	sep = '/'

	conf = {
	'APP_ID': 1,
	'ARCH_DIR': Curr_Path + sep + 'arch' + sep,
	'PROFILE_PATH': Curr_Path + sep + 'profile' + sep,
	'LOG_PATH': Curr_Path + sep + 'log' + sep,
	'REPORT_PATH': Curr_Path + sep + 'output' + sep,
	'REPORT_DIR': 'output',
	'SRC_PATH': Curr_Path + sep + 'data' + sep,
	'APP_DESC_1': 'Masking PII Data!',
	'DEBUG_IND': 'N',
	'INIT_PATH': Curr_Path,
	'FILE_NAME': 'PII.csv',
	'TITLE': "Masking PII Data!",
	'PATH' : Curr_Path
	}

view raw

clsConfigClient.py

hosted with ❤ by GitHub

Key entries from the above scripts are as follows –

'FILE_NAME': 'PII.csv',

This excel is a dummy input file, which looks like this –

In the above screenshot, our applications will use critical information like – First Name, Email, Address, Phone, Date Of Birth, SSN & Sal.

playPII.py (Main calling python script)

	#####################################################
	#### Written By: SATYAKI DE ####
	#### Written On: 12-Feb-2023 ####
	#### Modified On 16-Feb-2023 ####
	#### ####
	#### Objective: This is the main calling ####
	#### python script that will invoke the ####
	#### newly created light data masking class. ####
	#### ####
	#####################################################

	import pandas as p
	import clsL as cl
	from clsConfigClient import clsConfigClient as cf
	import datetime

	from FastDataMask import clsCircularList as ccl

	# Disbling Warning
	def warn(args, *kwargs):
	pass

	import warnings
	warnings.warn = warn

	######################################
	### Get your global values ####
	######################################
	debug_ind = 'Y'
	charList = ccl.clsCircularList()

	CurrPath = cf.conf['SRC_PATH']
	FileName = cf.conf['FILE_NAME']
	######################################
	#### Global Flag ########
	######################################

	######################################
	### Wrapper functions to invoke ###
	### the desired class from newly ###
	### built class. ###
	######################################
	def mask_email(email):
	try:
	maskedEmail = charList.maskEmail(email)
	return maskedEmail
	except:
	return ''

	def mask_phone(phone):
	try:
	maskedPhone = charList.maskPhone(phone)
	return maskedPhone
	except:
	return ''

	def mask_name(flname):
	try:
	maskedFLName = charList.maskFLName(flname)
	return maskedFLName
	except:
	return ''

	def mask_date(dt):
	try:
	maskedDate = charList.maskDate(dt)
	return maskedDate
	except:
	return ''

	def mask_uniqueid(unqid):
	try:
	maskedUnqId = charList.maskSSN(unqid)
	return maskedUnqId
	except:
	return ''

	def mask_sal(sal):
	try:
	maskedSal = charList.maskSal(sal)
	return maskedSal
	except:
	return ''
	######################################
	### End of wrapper functions. ###
	######################################

	def main():
	try:
	var = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
	print(''120)
	print('Start Time: ' + str(var))
	print(''120)

	inputFile = CurrPath + FileName

	print('Input File: ', inputFile)

	df = p.read_csv(inputFile)

	print(''120)
	print('Source Data: ')
	print(df)
	print(''120)

	hdr = list(df.columns.values)
	print('Headers:', hdr)

	df["MaskedFirstName"] = df["FirstName"].apply(mask_name)
	df["MaskedEmail"] = df["Email"].apply(mask_email)
	df["MaskedPhone"] = df["Phone"].apply(mask_phone)
	df["MaskedDOB"] = df["DOB"].apply(mask_date)
	df["MaskedSSN"] = df["SSN"].apply(mask_uniqueid)
	df["MaskedSal"] = df["Sal"].apply(mask_sal)

	# Dropping old columns
	df.drop(['FirstName','Email','Phone','DOB','SSN', 'Sal'], axis=1, inplace=True)

	# Renaming columns
	df.rename(columns={'MaskedFirstName': 'FirstName'}, inplace=True)
	df.rename(columns={'MaskedEmail': 'Email'}, inplace=True)
	df.rename(columns={'MaskedPhone': 'Phone'}, inplace=True)
	df.rename(columns={'MaskedDOB': 'DOB'}, inplace=True)
	df.rename(columns={'MaskedSSN': 'SSN'}, inplace=True)
	df.rename(columns={'MaskedSal': 'Sal'}, inplace=True)

	# Repositioning columns of dataframe
	df = df[hdr]

	print('Masked DF: ')
	print(df)

	print(''120)
	var1 = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
	print('End Time: ' + str(var1))

	except Exception as e:
	x = str(e)
	print('Error: ', x)

	if __name__ == "__main__":
	main()

view raw

playPII.py

hosted with ❤ by GitHub

Let us understand the key lines in details –

def mask_email(email):
    try:
        maskedEmail = charList.maskEmail(email)
        return maskedEmail
    except:
        return ''

def mask_phone(phone):
    try:
        maskedPhone = charList.maskPhone(phone)
        return maskedPhone
    except:
        return ''

def mask_name(flname):
    try:
        maskedFLName = charList.maskFLName(flname)
        return maskedFLName
    except:
        return ''

def mask_date(dt):
    try:
        maskedDate = charList.maskDate(dt)
        return maskedDate
    except:
        return ''

def mask_uniqueid(unqid):
    try:
        maskedUnqId = charList.maskSSN(unqid)
        return maskedUnqId
    except:
        return ''

def mask_sal(sal):
    try:
        maskedSal = charList.maskSal(sal)
        return maskedSal
    except:
        return ''

These functions take a value as input and attempt to mask it using the corresponding masking method from the charList module. If the masking is successful, the process will return a masked value per input; otherwise, the application will return an empty string.

More specifically, the functions are:

mask_email: masks the email address provided as input
mask_phone: masks the phone number provided as input
mask_name: masks the first and last name supplied as input
mask_date: masks the date provided as input
mask_uniqueid: masks the unique ID (e.g., Social Security Number) provided as input
mask_sal: masks the salary supplied as input

The functions use a try-except block to handle any exceptions that may arise when calling the corresponding masking method from the charList module. If the masking method raises an exception, the function will return an empty string to handle cases where the input value is invalid, or the masking method fails for another reason.

inputFile = CurrPath + FileName
df = p.read_csv(inputFile)
hdr = list(df.columns.values)


df["MaskedFirstName"] = df["FirstName"].apply(mask_name)
df["MaskedEmail"] = df["Email"].apply(mask_email)
df["MaskedPhone"] = df["Phone"].apply(mask_phone)
df["MaskedDOB"] = df["DOB"].apply(mask_date)
df["MaskedSSN"] = df["SSN"].apply(mask_uniqueid)
df["MaskedSal"] = df["Sal"].apply(mask_sal)

# Dropping old columns
df.drop(['FirstName','Email','Phone','DOB','SSN', 'Sal'], axis=1, inplace=True)

# Renaming columns
df.rename(columns={'MaskedFirstName': 'FirstName'}, inplace=True)
df.rename(columns={'MaskedEmail': 'Email'}, inplace=True)
df.rename(columns={'MaskedPhone': 'Phone'}, inplace=True)
df.rename(columns={'MaskedDOB': 'DOB'}, inplace=True)
df.rename(columns={'MaskedSSN': 'SSN'}, inplace=True)
df.rename(columns={'MaskedSal': 'Sal'}, inplace=True)

The first line inputFile = CurrPath + FileName concatenates the current working directory path (CurrPath) with the name of a file (FileName) and assigns the resulting file path to a variable inputFile.
The second line df = p.read_csv(inputFile) – reads the file located at inputFile into a Pandas DataFrame object called df.
The following few lines apply certain functions (mask_name, mask_email, mask_phone, mask_date, mask_uniqueid, and mask_sal) to specific columns in the DataFrame to mask sensitive data. These functions likely perform some data masking or obfuscation on the input data.
The following line df.drop([‘FirstName’,’Email’,’Phone’,’DOB’,’SSN’, ‘Sal’], axis=1, inplace=True) drops the original columns that we are supposed to mask (i.e., ‘FirstName’, ‘Email’, ‘Phone’, ‘DOB’, ‘SSN’, and ‘Sal’) from the DataFrame.
The remaining lines rename the masked columns to their original names (i.e., ‘MaskedFirstName’ is renamed to ‘FirstName’, ‘MaskedEmail’ is renamed to ‘Email’, and so on).

Overall, this code reads in a file, masking specific sensitive columns, then outputting a new file with the masked data.

Now, let’s compare the output against the source data –

As you can see the blue highlighted columns are the masked column & you can compare the data pattern against the source.

So, finally, we’ve done it.

I know that this post is relatively bigger than my earlier post. But, I think, you can get all the details once you go through it.

You will get the complete codebase in the following GitHub link.

I’ll bring some more exciting topics in the coming days from the Python verse. Please share & subscribe to my post & let me know your feedback.

Till then, Happy Avenging! 🙂

Predicting health issues for Senior Citizens based on “Realtime Weather Data” in Python

Posted on January 20, 2020March 11, 2021 by SatyakiDe in analytic function, api, BI Reports, call, cast, cloud, combining, comma-separated, Data Science, function, gui, json, member function, objects, Online Bi Reports, Pandas, partition, Python, regular expression, replace, return, snippet, String Manipulation, table

Hi Guys,

Today, I’ll be presenting a different kind of post here. I’ll be trying to predict health issues for senior citizens based on “realtime weather data” by blending open-source population data using some mock risk factor calculation. At the end of the post, I’ll be plotting these numbers into some graphs for better understanding.

Let’s drive!

For this first, we need realtime weather data. To do that, we need to subscribe to the data from OpenWeather API. For that, you have to register as a developer & you’ll receive a similar email from them once they have approved –

So, from the above picture, you can see that, you’ll be provided one API key & also offered a couple of useful API documentation. I would recommend exploring all the links before you try to use it.

You can also view your API key once you logged into their console. You can also create multiple API keys & the screen should look something like this –

For security reasons, I’ll be hiding my own keys & the same should be applicable for you as well.

I would say many of these free APIs might have some issues. So, I would recommend you to start testing the open API through postman before you jump into the Python development. Here is the glimpse of my test through the postman –

Once, I can see that the API is returning the result. I can work on it.

Apart from that, one needs to understand that these API might have limited use & also you need to know the consequences in terms of price & tier in case if you exceeded the limit. Here is the detail for this API –

For our demo, I’ll be using the Free tire only.

Let’s look into our other source data. We got the top 10 city population-wise over there internet. Also, we have collected sample Senior Citizen percentage against sex ratio across those cities. We have masked these values on top of that as this is just for education purposes.

1. CityDetails.csv

Here is the glimpse of this file –

So, this file only contains the total population across the top 10 cities in the USA.

2. SeniorCitizen.csv

This file contains the Sex ratio of Senior citizens across those top 10 cities by population.

Again, we are not going to discuss any script, which we’ve already discussed here.

Hence, we’re skipping clsL.py here.

1. clsConfig.py (This script contains all the parameters of the server.)

##############################################
#### Written By: SATYAKI DE               ####
#### Written On: 19-Jan-2019              ####
####                                      ####
#### Objective: This script is a config   ####
#### file, contains all the keys for      ####
#### azure cosmos db. Application will    ####
#### process these information & perform  ####
#### various CRUD operation on Cosmos DB. ####
##############################################

import os
import platform as pl

class clsConfig(object):
    Curr_Path = os.path.dirname(os.path.realpath(__file__))

    os_det = pl.system()
    if os_det == "Windows":
        sep = '\\'
    else:
        sep = '/'

    config = {
        'APP_ID': 1,
        'URL': "http://api.openweathermap.org/data/2.5/weather",
        'API_HOST': "api.openweathermap.org",
        'API_KEY': "XXXXXXXXXXXXXXXXXXXXXX",
        'API_TYPE': "application/json",
        'CACHE': "no-cache",
        'CON': "keep-alive",
        'ARCH_DIR': Curr_Path + sep + 'arch' + sep,
        'PROFILE_PATH': Curr_Path + sep + 'profile' + sep,
        'LOG_PATH': Curr_Path + sep + 'log' + sep,
        'REPORT_PATH': Curr_Path + sep + 'report',
        'SRC_PATH': Curr_Path + sep + 'Src_File' + sep,
        'APP_DESC_1': 'Open Weather Forecast',
        'DEBUG_IND': 'N',
        'INIT_PATH': Curr_Path,
        'SRC_FILE': Curr_Path + sep + 'Src_File' + sep + 'CityDetails.csv',
        'SRC_FILE_1': Curr_Path + sep + 'Src_File' + sep + 'SeniorCitizen.csv',
        'SRC_FILE_INIT': 'CityDetails.csv',
        'COL_LIST': ['base', 'all', 'cod', 'lat', 'lon', 'dt', 'feels_like', 'humidity', 'pressure', 'temp', 'temp_max', 'temp_min', 'name', 'country', 'sunrise', 'sunset', 'type', 'timezone', 'visibility', 'weather', 'deg', 'gust', 'speed'],
        'COL_LIST_1': ['base', 'all', 'cod', 'lat', 'lon', 'dt', 'feels_like', 'humidity', 'pressure', 'temp', 'temp_max', 'temp_min', 'CityName', 'country', 'sunrise', 'sunset', 'type', 'timezone', 'visibility', 'deg', 'gust', 'speed', 'WeatherMain', 'WeatherDescription'],
        'COL_LIST_2': ['CityName', 'Population', 'State']
    }

2. clsWeather.py (This script contains the main logic to extract the realtime data from our subscribed weather API.)

##############################################
#### Written By: SATYAKI DE               ####
#### Written On: 19-Jan-2020              ####
#### Modified On 19-Jan-2020              ####
####                                      ####
#### Objective: Main scripts to invoke    ####
#### Indian Railway API.                  ####
##############################################

import requests
import logging
import json
from clsConfig import clsConfig as cf

class clsWeather:
    def __init__(self):
        self.url = cf.config['URL']
        self.openmapapi_host = cf.config['API_HOST']
        self.openmapapi_key = cf.config['API_KEY']
        self.openmapapi_cache = cf.config['CACHE']
        self.openmapapi_con = cf.config['CON']
        self.type = cf.config['API_TYPE']

    def searchQry(self, rawQry):
        try:
            url = self.url
            openmapapi_host = self.openmapapi_host
            openmapapi_key = self.openmapapi_key
            openmapapi_cache = self.openmapapi_cache
            openmapapi_con = self.openmapapi_con
            type = self.type

            querystring = {"appid": openmapapi_key, "q": rawQry}

            print('Input JSON: ', str(querystring))

            headers = {
                'host': openmapapi_host,
                'content-type': type,
                'Cache-Control': openmapapi_cache,
                'Connection': openmapapi_con
            }

            response = requests.request("GET", url, headers=headers, params=querystring)

            ResJson  = response.text

            jdata = json.dumps(ResJson)
            ResJson = json.loads(jdata)

            return ResJson

        except Exception as e:
            ResJson = ''
            x = str(e)
            print(x)

            logging.info(x)
            ResJson = {'errorDetails': x}

            return ResJson

The key lines from this script –

querystring = {"appid": openmapapi_key, "q": rawQry}

print('Input JSON: ', str(querystring))

headers = {
    'host': openmapapi_host,
    'content-type': type,
    'Cache-Control': openmapapi_cache,
    'Connection': openmapapi_con
}

response = requests.request("GET", url, headers=headers, params=querystring)

ResJson  = response.text

In the above snippet, our application first preparing the payload & the parameters received from our param script. And then invoke the GET method to extract the real-time data in the form of JSON & finally sending the JSON payload to the primary calling function.

3. clsMap.py (This script contains the main logic to prepare the MAP using seaborn package & try to plot our custom made risk factor by blending the realtime data with our statistical data received over the internet.)

##############################################
#### Written By: SATYAKI DE               ####
#### Written On: 19-Jan-2020              ####
#### Modified On 19-Jan-2020              ####
####                                      ####
#### Objective: Main scripts to invoke    ####
#### plot into the Map.                   ####
##############################################

import seaborn as sns
import logging
from clsConfig import clsConfig as cf
import pandas as p
import clsL as cl

# This library requires later
# to print the chart
import matplotlib.pyplot as plt

class clsMap:
    def __init__(self):
        self.src_file =  cf.config['SRC_FILE_1']

    def calculateRisk(self, row):
        try:
            # Let's assume some logic
            # 1. By default, 30% of Senior Citizen
            # prone to health Issue for each City
            # 2. Male Senior Citizen is 19% more prone
            # to illness than female.
            # 3. If humidity more than 70% or less
            # than 40% are 22% main cause of illness
            # 4. If feels like more than 280 or
            # less than 260 degree are 17% more prone
            # to illness.
            # Finally, this will be calculated per 1K
            # people around 10 blocks

            str_sex = str(row['Sex'])

            int_humidity = int(row['humidity'])
            int_feelsLike = int(row['feels_like'])
            int_population = int(str(row['Population']).replace(',',''))
            float_srcitizen = float(row['SeniorCitizen'])

            confidance_score = 0.0

            SeniorCitizenPopulation = (int_population * float_srcitizen)

            if str_sex == 'Male':
                confidance_score = (SeniorCitizenPopulation * 0.30 * 0.19) + confidance_score
            else:
                confidance_score = (SeniorCitizenPopulation * 0.30 * 0.11) + confidance_score

            if ((int_humidity > 70) | (int_humidity < 40)):
                confidance_score = confidance_score + (int_population * 0.30 * float_srcitizen) * 0.22

            if ((int_feelsLike > 280) | (int_feelsLike < 260)):
                confidance_score = confidance_score + (int_population * 0.30 * float_srcitizen) * 0.17

            final_score = round(round(confidance_score, 2) / (1000 * 10), 2)

            return final_score

        except Exception as e:
            x = str(e)

            return x

    def setMap(self, dfInput):
        try:
            resVal = 0
            df = p.DataFrame()
            debug_ind = 'Y'
            src_file =  self.src_file

            # Initiating Log Class
            l = cl.clsL()

            df = dfInput

            # Creating a subset of desired columns
            dfMod = df[['CityName', 'temp', 'Population', 'humidity', 'feels_like']]

            l.logr('5.dfSuppliment.csv', debug_ind, dfMod, 'log')

            # Fetching Senior Citizen Data
            df = p.read_csv(src_file, index_col=False)

            # Merging two frames
            dfMerge = p.merge(df, dfMod, on=['CityName'])

            l.logr('6.dfMerge.csv', debug_ind, dfMerge, 'log')

            # Getting RiskFactor quotient from our custom made logic
            dfMerge['RiskFactor'] = dfMerge.apply(lambda row: self.calculateRisk(row), axis=1)

            l.logr('7.dfRiskFactor.csv', debug_ind, dfMerge, 'log')

            # Generating Map plotss
            # sns.lmplot(x='RiskFactor', y='SeniorCitizen', data=dfMerge, hue='Sex')
            # sns.lmplot(x='RiskFactor', y='SeniorCitizen', data=dfMerge, hue='Sex', markers=['o','v'], scatter_kws={'s':25})
            sns.lmplot(x='RiskFactor', y='SeniorCitizen', data=dfMerge, col='Sex')

            # This is required when you are running
            # through normal Python & not through
            # Jupyter Notebook
            plt.show()

            return resVal

        except Exception as e:
            x = str(e)
            print(x)

            logging.info(x)
            resVal = x

            return resVal

Key lines from the above codebase –

# Creating a subset of desired columns
dfMod = df[['CityName', 'temp', 'Population', 'humidity', 'feels_like']]

l.logr('5.dfSuppliment.csv', debug_ind, dfMod, 'log')

# Fetching Senior Citizen Data
df = p.read_csv(src_file, index_col=False)

# Merging two frames
dfMerge = p.merge(df, dfMod, on=['CityName'])

l.logr('6.dfMerge.csv', debug_ind, dfMerge, 'log')

# Getting RiskFactor quotient from our custom made logic
dfMerge['RiskFactor'] = dfMerge.apply(lambda row: self.calculateRisk(row), axis=1)

l.logr('7.dfRiskFactor.csv', debug_ind, dfMerge, 'log')

Combining our Senior Citizen data with already processed data coming from our primary calling script. Also, here the application is calculating our custom logic to find out the risk factor figures. If you want to go through that, I’ve provided the logic to derive it. However, this is just a demo to find out similar figures. You should not rely on the logic that I’ve used (It is kind of my observation of life till now. :D).

The below lines are only required when you are running seaborn, not via Jupyter notebook.

plt.show()

4. callOpenMapWeatherAPI.py (This is the first calling script. This script also calls the realtime API & then blend the first file with it & pass the only relevant columns of data to our Map script to produce the graph.)

##############################################
#### Written By: SATYAKI DE               ####
#### Written On: 19-Jan-2020              ####
#### Modified On 19-Jan-2020              ####
####                                      ####
#### Objective: Main calling scripts.     ####
##############################################

from clsConfig import clsConfig as cf
import pandas as p
import clsL as cl
import logging
import datetime
import json
import clsWeather as ct
import re
import numpy as np
import clsMap as cm

# Disbling Warning
def warn(*args, **kwargs):
    pass

import warnings
warnings.warn = warn

# Lookup functions from
# Azure cloud SQL DB

var = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")

def getMainWeather(row):
    try:
        # Using regular expression to fetch time part only

        lkp_Columns = str(row['weather'])
        jpayload = str(lkp_Columns).replace("'", '"')

        #jpayload = json.dumps(lkp_Columns)
        payload = json.loads(jpayload)

        df_lkp = p.io.json.json_normalize(payload)
        df_lkp.columns = df_lkp.columns.map(lambda x: x.split(".")[-1])

        str_main_weather = str(df_lkp.iloc[0]['main'])

        return str_main_weather

    except Exception as e:
        x = str(e)
        str_main_weather = x

        return str_main_weather

def getMainDescription(row):
    try:
        # Using regular expression to fetch time part only

        lkp_Columns = str(row['weather'])
        jpayload = str(lkp_Columns).replace("'", '"')

        #jpayload = json.dumps(lkp_Columns)
        payload = json.loads(jpayload)

        df_lkp = p.io.json.json_normalize(payload)
        df_lkp.columns = df_lkp.columns.map(lambda x: x.split(".")[-1])

        str_description = str(df_lkp.iloc[0]['description'])

        return str_description

    except Exception as e:
        x = str(e)
        str_description = x

        return str_description

def main():
    try:
        dfSrc = p.DataFrame()
        df_ret = p.DataFrame()
        ret_2 = ''
        debug_ind = 'Y'

        general_log_path = str(cf.config['LOG_PATH'])

        # Enabling Logging Info
        logging.basicConfig(filename=general_log_path + 'consolidatedIR.log', level=logging.INFO)

        # Initiating Log Class
        l = cl.clsL()

        # Moving previous day log files to archive directory
        arch_dir = cf.config['ARCH_DIR']
        log_dir = cf.config['LOG_PATH']
        col_list = cf.config['COL_LIST']
        col_list_1 = cf.config['COL_LIST_1']
        col_list_2 = cf.config['COL_LIST_2']

        tmpR0 = "*" * 157

        logging.info(tmpR0)
        tmpR9 = 'Start Time: ' + str(var)
        logging.info(tmpR9)
        logging.info(tmpR0)

        print("Archive Directory:: ", arch_dir)
        print("Log Directory::", log_dir)
        tmpR1 = 'Log Directory::' + log_dir
        logging.info(tmpR1)

        df2 = p.DataFrame()

        src_file =  cf.config['SRC_FILE']

        # Fetching data from source file
        df = p.read_csv(src_file, index_col=False)

        # Creating a list of City Name from the source file
        city_list = df['CityName'].tolist()

        # Declaring an empty dictionary
        merge_dict = {}
        merge_dict['city'] = df2

        start_pos = 1
        src_file_name = '1.' + cf.config['SRC_FILE_INIT']

        for i in city_list:
            x1 = ct.clsWeather()
            ret_2 = x1.searchQry(i)

            # Capturing the JSON Payload
            res = json.loads(ret_2)

            # Converting dictionary to Pandas Dataframe
            # df_ret = p.read_json(ret_2, orient='records')

            df_ret = p.io.json.json_normalize(res)
            df_ret.columns = df_ret.columns.map(lambda x: x.split(".")[-1])

            # Removing any duplicate columns
            df_ret = df_ret.loc[:, ~df_ret.columns.duplicated()]

            # l.logr(str(start_pos) + '.1.' + src_file_name, debug_ind, df_ret, 'log')
            start_pos = start_pos + 1

            # If all the conversion successful
            # you won't get any gust column
            # from OpenMap response. Hence, we
            # need to add dummy reason column
            # to maintain the consistent structures

            if 'gust' not in df_ret.columns:
                df_ret = df_ret.assign(gust=999999)[['gust'] + df_ret.columns.tolist()]

            # Resetting the column orders as per JSON
            column_order = col_list
            df_mod_ret = df_ret.reindex(column_order, axis=1)

            if start_pos == 1:
                merge_dict['city'] = df_mod_ret
            else:
                d_frames = [merge_dict['city'], df_mod_ret]
                merge_dict['city'] = p.concat(d_frames)

            start_pos += 1

        for k, v in merge_dict.items():
            l.logr(src_file_name, debug_ind, merge_dict[k], 'log')

        # Now opening the temporary file
        temp_log_file = log_dir + src_file_name

        dfNew = p.read_csv(temp_log_file, index_col=False)

        # Extracting Complex columns
        dfNew['WeatherMain'] = dfNew.apply(lambda row: getMainWeather(row), axis=1)
        dfNew['WeatherDescription'] = dfNew.apply(lambda row: getMainDescription(row), axis=1)

        l.logr('2.dfNew.csv', debug_ind, dfNew, 'log')

        # Removing unwanted columns & Renaming key columns
        dfNew.drop(['weather'], axis=1, inplace=True)
        dfNew.rename(columns={'name': 'CityName'}, inplace=True)

        l.logr('3.dfNewMod.csv', debug_ind, dfNew, 'log')

        # Now joining with the main csv
        # to get the complete picture
        dfMain = p.merge(df, dfNew, on=['CityName'])

        l.logr('4.dfMain.csv', debug_ind, dfMain, 'log')

        # Let's extract only relevant columns
        dfSuppliment = dfMain[['CityName', 'Population', 'State', 'country', 'feels_like', 'humidity', 'pressure', 'temp', 'temp_max', 'temp_min', 'visibility', 'deg', 'gust', 'speed', 'WeatherMain', 'WeatherDescription']]

        l.logr('5.dfSuppliment.csv', debug_ind, dfSuppliment, 'log')

        # Let's pass this to our map section
        x2 = cm.clsMap()
        ret_3 = x2.setMap(dfSuppliment)

        if ret_3 == 0:
            print('Successful Map Generated!')
        else:
            print('Please check the log for further issue!')

        print("-" * 60)
        print()

        print('Finding Story points..')
        print("*" * 157)
        logging.info('Finding Story points..')
        logging.info(tmpR0)


        tmpR10 = 'End Time: ' + str(var)
        logging.info(tmpR10)
        logging.info(tmpR0)

    except ValueError as e:
        print(str(e))
        print("No relevant data to proceed!")
        logging.info("No relevant data to proceed!")

    except Exception as e:
        print("Top level Error: args:{0}, message{1}".format(e.args, e.message))

if __name__ == "__main__":
    main()

Key snippet from the above script –

# Capturing the JSON Payload
res = json.loads(ret_2)

# Converting dictionary to Pandas Dataframe
df_ret = p.io.json.json_normalize(res)
df_ret.columns = df_ret.columns.map(lambda x: x.split(".")[-1])

Once the application received the JSON response from the realtime API, the application is converting it to pandas dataframe.

# Removing any duplicate columns
df_ret = df_ret.loc[:, ~df_ret.columns.duplicated()]

Since this is a complex JSON response. The application might encounter duplicate columns, which might cause a problem later. Hence, our app is removing all these duplicate columns as they are not required for our cases.

if 'gust' not in df_ret.columns:
    df_ret = df_ret.assign(gust=999999)[['gust'] + df_ret.columns.tolist()]

There is a possibility that the application might not receive all the desired attributes from the realtime API. Hence, the above lines will check & add a dummy column named gust for those records in case if they are not present in the JSON response.

if start_pos == 1:
    merge_dict['city'] = df_mod_ret
else:
    d_frames = [merge_dict['city'], df_mod_ret]
    merge_dict['city'] = p.concat(d_frames)

These few lines required as our API has a limitation of responding with only one city at a time. Hence, in this case, we’re retrieving one town at a time & finally merge them into a single dataframe before creating a temporary source file for the next step.

At this moment our data should look like this –

Let’s check the weather column. We need to extract the main & description for our dashboard, which will be coming in the next installment.

# Extracting Complex columns
dfNew['WeatherMain'] = dfNew.apply(lambda row: getMainWeather(row), axis=1)
dfNew['WeatherDescription'] = dfNew.apply(lambda row: getMainDescription(row), axis=1)

Hence, we’ve used the following two functions to extract these values & the critical snippet from one of the service is as follows –

lkp_Columns = str(row['weather'])
jpayload = str(lkp_Columns).replace("'", '"')
payload = json.loads(jpayload)

df_lkp = p.io.json.json_normalize(payload)
df_lkp.columns = df_lkp.columns.map(lambda x: x.split(".")[-1])

str_main_weather = str(df_lkp.iloc[0]['main'])

The above lines extracting the weather column & replacing the single quotes with the double quotes before the application is trying to convert that to JSON. Once it converted to JSON, the json_normalize will easily serialize it & create individual columns out of it. Once you have them captured inside the pandas dataframe, you can extract the unique values & store them & return them to your primary calling function.

# Let's pass this to our map section
x2 = cm.clsMap()
ret_3 = x2.setMap(dfSuppliment)

if ret_3 == 0:
    print('Successful Map Generated!')
else:
    print('Please check the log for further issue!')

In the above lines, the application will invoke the Map class to calculate the remaining logic & then plotting the data into the seaborn graph.

Let’s just briefly see the central directory structure –

Here is the log directory –

And, finally, the source directory should look something like this –

Now, let’s runt the application –

Following lines are essential –

sns.lmplot(x='RiskFactor', y='SeniorCitizen', data=dfMerge, hue='Sex')

This will project the plot like this –

Or,

sns.lmplot(x='RiskFactor', y='SeniorCitizen', data=dfMerge, hue='Sex', markers=['o','v'], scatter_kws={'s':25})

This will lead to the following figures –

As you can see, here, using the marker of (‘o’/’v’) leads to two different symbols for the different gender.

Or,

sns.lmplot(x='RiskFactor', y='SeniorCitizen', data=dfMerge, col='Sex')

This will lead to –

So, in this case, the application has created two completely different sets for Sex.

So, finally, we’ve done it. 😀

In the next post, I’ll be doing some more improvisation on top of these data sets. Till then – Happy Avenging! 🙂

Note: All the data posted here are representational data & available over the internet & for educational purpose only.

Building an Azure Function using Python (Crossover between Reality Stone & Time Stone in Python Verse)

Posted on June 9, 2019January 2, 2021 by SatyakiDe in analytic function, api, call, cast, code, combining, commit, Data Science, function, member function, null, objects, operating system, Pandas

Hi Guys!

Today, we’ll be discussing a preview features from Microsoft Azure. Building an Azure function using Python on it’s Linux/Ubuntu VM. Since this is a preview feature, we cannot implement this to production till now. However, my example definitely has more detailed steps & complete code guide compared to whatever available over the internet.

In this post, I will take one of my old posts & enhance it as per this post. Hence, I’ll post those modified scripts. However, I won’t discuss the logic in details as most of these scripts have cosmetic changes to cater to this requirement.

In this post, we’ll only show Ubuntu run & there won’t be Windows or MAC comparison.

Initial Environment Preparation:

Set-up new virtual machine on Azure.
Set-up Azure function environments on that server.

Set-up new virtual machine on Azure:

I’m not going into the details of how to create Ubuntu VM on Microsoft Azure. You can refer the steps in more information here.

After successful creation, the VM will look like this –

Detailed information you can get after clicking this hyperlink over the name of the VM.

You have to open port 7071 for application testing from the local using postman.

You can get it from the network option under VM as follows –

Make sure that you are restricting these ports to specific network & not open to ALL traffic.

So, your VM is ready now.

To update Azure CLI, you need to use the following commands –

sudo apt-get update && sudo apt-get install –only-upgrade -y azure-cli

Set-up Azure function environments on that server:

To set-up the environment, you don’t have to go for Python installation as by default Ubuntu in Microsoft Azure comes up with desired Python version, i.e., Python3.6. However, to run the python application, you need to install the following app –

Microsoft SDK. You will get the details from this link.
Installing node-js. You will get the details from this link.
You need to install a docker. However, as per Microsoft official version, this is not required. But, you can create a Docker container to distribute the python function in Azure application. I would say you can install this just in case if you want to continue with this approach. You will get the details over here. If you want to know details about the Docker. And, how you want to integrate python application. You can refer to this link.
Your desired python packages. In this case, we’ll be modifying this post – “Encryption/Decryption, JSON, API, Flask Framework in Python (Crossover between Reality Stone & Time Stone in Python Verse).” We’ll be modifying a couple of lines only to cater to this functionality & deploying the same as an Azure function.
Creating an Azure function template on Ubuntu. The essential detail you’ll get it from here. However, over there, it was not shown in detailed steps of python packages & how you can add all the dependencies to publish it in details. It was an excellent post to start-up your knowledge.

Let’s see these components status & very brief details –

Microsoft SDK:

To check the dot net version. You need to type the following commands in Ubuntu –

dotnet –info

And, the output will look like this –

Node-Js:

Following is the way to verify your node-js version & details –

node -v
npm -v

And, the output looks like this –

Docker:

Following is the way to test your docker version –

docker -v

And, the output will look like this –

Python Packages:

Following are the python packages that we need to run & publish that in Azure cloud as an Azure function –

pip freeze | grep -v “pkg-resources” > requirements.txt

And, the output is –

You must be wondered that why have I used this grep commands here. I’ve witnessed that on many occassion in Microsoft Azure’s Linux VM it produces one broken package called resource=0.0.0, which will terminate the deployment process. Hence, this is very crucial to eliminate those broken packages.

Now, we’re ready for our python scripts. But, before that, let’s see the directory structure over here –

Creating an Azure Function Template on Ubuntu:

Before we post our python scripts, we’ll create these following components, which is essential for our Python-based Azure function –

Creating a group:

Creating a group either through Azure CLI or using a docker, you can proceed. The commands for Azure CLI is as follows –

az group create –name “rndWestUSGrp” –location westus

It is advisable to use double quotes for parameters value. Otherwise, you might land-up getting the following error – “Error: “resourceGroupName” should satisfy the constraint – “Pattern”: /^[-w._]+$/“.

I’m sure. You don’t want to face that again. And, here is the output –

Note that, here I haven’t used the double-quotes. But, to avoid any unforeseen issues – you should use double-quotes. You can refer the docker command from the above link, which I’ve shared earlier.

Now, you need to create one storage account where the metadata information of your function will be stored. You will create that as follows –

az storage account create –name cryptpy2019 –location westus –resource-group rndWestUSGrp –sku Standard_LRS

And, the output will look like this –

Great. Now, we’ll create a virtual environment for Python3.6.

python3.6 -m venv .env
source .env/bin/activate

Now, we’ll create a local function project.

func init encPro

And, the output you will get is as follows –

Inside this directory, you’ll see the following files –

You need to edit the host.json with these default lines –

{
“version”: “2.0”,
“extensionBundle”: {
“id”: “Microsoft.Azure.Functions.ExtensionBundle”,
“version”: “[1.*, 2.0.0)”
}
}

And, the final content of these two files (excluding the requirements.txt) will look like this –

Finally, we’ll create the template function by this following command –

func new

This will follow with steps finish it. You need to choose Python as your programing language. You need to choose an HTTP trigger template. Once you created that successfully, you’ll see the following files –

Note that, our initial function name is -> getVal.

By default, Azure will generate some default code inside the __init__.py. The details of those two files can be found here.

Since we’re ready with our environment setup. We can now discuss our Python scripts –

1. clsConfigServer.py (This script contains all the parameters of the server.)

###########################################
#### Written By: SATYAKI DE        ########
#### Written On: 10-Feb-2019       ########
####                               ########
#### Objective: Parameter File     ########
###########################################

import os
import platform as pl

# Checking with O/S system
os_det = pl.system()

class clsConfigServer(object):
    Curr_Path = os.path.dirname(os.path.realpath(__file__))

    if os_det == "Windows":
        config = {
            'FILE': 'acct_addr_20180112.csv',
            'SRC_FILE_PATH': Curr_Path + '\\' + 'src_file\\',
            'PROFILE_FILE_PATH': Curr_Path + '\\' + 'profile\\',
            'HOST_IP_ADDR': '0.0.0.0',
            'DEF_SALT': 'iooquzKtqLwUwXG3rModqj_fIl409vemWg9PekcKh2o=',
            'ACCT_NBR_SALT': 'iooquzKtqLwUwXG3rModqj_fIlpp1vemWg9PekcKh2o=',
            'NAME_SALT': 'iooquzKtqLwUwXG3rModqj_fIlpp1026Wg9PekcKh2o=',
            'PHONE_SALT': 'iooquzKtqLwUwXG3rMM0F5_fIlpp1026Wg9PekcKh2o=',
            'EMAIL_SALT': 'iooquzKtqLwU0653rMM0F5_fIlpp1026Wg9PekcKh2o='
        }
    else:
        config = {
            'FILE': 'acct_addr_20180112.csv',
            'SRC_FILE_PATH': Curr_Path + '/' + 'src_file/',
            'PROFILE_FILE_PATH': Curr_Path + '/' + 'profile/',
            'HOST_IP_ADDR': '0.0.0.0',
            'DEF_SALT': 'iooquzKtqLwUwXG3rModqj_fIl409vemWg9PekcKh2o=',
            'ACCT_NBR_SALT': 'iooquzKtqLwUwXG3rModqj_fIlpp1vemWg9PekcKh2o=',
            'NAME_SALT': 'iooquzKtqLwUwXG3rModqj_fIlpp1026Wg9PekcKh2o=',
            'PHONE_SALT': 'iooquzKtqLwUwXG3rMM0F5_fIlpp1026Wg9PekcKh2o=',
            'EMAIL_SALT': 'iooquzKtqLwU0653rMM0F5_fIlpp1026Wg9PekcKh2o='
        }

2. clsEnDec.py (This script is a lighter version of encryption & decryption of our previously discussed scenario. Hence, we won’t discuss in details. You can refer my earlier post to understand the logic of this script.)

###########################################
#### Written By: SATYAKI DE        ########
#### Written On: 25-Jan-2019       ########
#### Package Cryptography needs to ########
#### install in order to run this  ########
#### script.                       ########
####                               ########
#### Objective: This script will   ########
#### encrypt/decrypt based on the  ########
#### hidden supplied salt value.   ########
###########################################

from cryptography.fernet import Fernet
import logging

from getVal.clsConfigServer import clsConfigServer as csf

class clsEnDec(object):

    def __init__(self):
        # Calculating Key
        self.token = str(csf.config['DEF_SALT'])

    def encrypt_str(self, data, token):
        try:
            # Capturing the Salt Information
            t1 = self.token
            t2 = token

            if t2 == '':
                salt = t1
            else:
                salt = t2

            logging.info("Encrypting the value!")

            # Checking Individual Types inside the Dataframe
            cipher = Fernet(salt)
            encr_val = str(cipher.encrypt(bytes(data,'utf8'))).replace("b'","").replace("'","")

            strV1 = "Encrypted value:: " + str(encr_val)
            logging.info(strV1)

            return encr_val

        except Exception as e:
            x = str(e)
            print(x)
            encr_val = ''

            return encr_val

    def decrypt_str(self, data, token):
        try:
            # Capturing the Salt Information
            t1 = self.token
            t2 = token

            if t2 == '':
                salt = t1
            else:
                salt = t2

            logging.info("Decrypting the value!")

            # Checking Individual Types inside the Dataframe
            cipher = Fernet(salt)
            decr_val = str(cipher.decrypt(bytes(data,'utf8'))).replace("b'","").replace("'","")

            strV2 = "Decrypted value:: " + str(decr_val)
            logging.info(strV2)

            return decr_val

        except Exception as e:
            x = str(e)
            print(x)
            decr_val = ''

            return decr_val

3. clsFlask.py (This is the main server script that will the encrypt/decrypt class from our previous scenario. This script will capture the requested JSON from the client, who posted from the clients like another python script or third-party tools like Postman.)

###########################################
#### Written By: SATYAKI DE            ####
#### Written On: 25-Jan-2019           ####
#### Package Flask package needs to    ####
#### install in order to run this      ####
#### script.                           ####
####                                   ####
#### Objective: This script will       ####
#### encrypt/decrypt based on the      ####
#### supplied salt value. Also,        ####
#### this will capture the individual  ####
#### element & stored them into JSON   ####
#### variables using flask framework.  ####
###########################################

from getVal.clsConfigServer import clsConfigServer as csf
from getVal.clsEnDec import clsEnDecAuth

getVal = clsEnDec()

import logging

class clsFlask(object):
    def __init__(self):
        self.xtoken = str(csf.config['DEF_SALT'])

    def getEncryptProcess(self, dGroup, input_data, dTemplate):
        try:
            # It is sending default salt value
            xtoken = self.xtoken

            # Capturing the individual element
            dGroup = dGroup
            input_data = input_data
            dTemplate = dTemplate

            # This will check the mandatory json elements
            if ((dGroup != '') & (dTemplate != '')):

                # Based on the Group & Element it will fetch the salt
                # Based on the specific salt it will encrypt the data
                if ((dGroup == 'GrDet') & (dTemplate == 'subGrAcct_Nbr')):
                    xtoken = str(csf.config['ACCT_NBR_SALT'])

                    strV1 = "xtoken: " + str(xtoken)
                    logging.info(strV1)
                    strV2 = "Flask Input Data: " + str(input_data)
                    logging.info(strV2)

                    #x = cen.clsEnDecAuth()
                    ret_val = getVal.encrypt_str(input_data, xtoken)
                elif ((dGroup == 'GrDet') & (dTemplate == 'subGrName')):
                    xtoken = str(csf.config['NAME_SALT'])

                    strV1 = "xtoken: " + str(xtoken)
                    logging.info(strV1)
                    strV2 = "Flask Input Data: " + str(input_data)
                    logging.info(strV2)

                    #x = cen.clsEnDecAuth()
                    ret_val = getVal.encrypt_str(input_data, xtoken)
                elif ((dGroup == 'GrDet') & (dTemplate == 'subGrPhone')):
                    xtoken = str(csf.config['PHONE_SALT'])

                    strV1 = "xtoken: " + str(xtoken)
                    logging.info(strV1)
                    strV2 = "Flask Input Data: " + str(input_data)
                    logging.info(strV2)

                    #x = cen.clsEnDecAuth()
                    ret_val = getVal.encrypt_str(input_data, xtoken)
                elif ((dGroup == 'GrDet') & (dTemplate == 'subGrEmail')):
                    xtoken = str(csf.config['EMAIL_SALT'])

                    strV1 = "xtoken: " + str(xtoken)
                    logging.info(strV1)
                    strV2 = "Flask Input Data: " + str(input_data)
                    logging.info(strV2)

                    #x = cen.clsEnDecAuth()
                    ret_val = getVal.encrypt_str(input_data, xtoken)
                else:
                    ret_val = ''
            else:
                ret_val = ''

            # Return value
            return ret_val

        except Exception as e:
            ret_val = ''
            # Return the valid json Error Response
            return ret_val

    def getDecryptProcess(self, dGroup, input_data, dTemplate):
        try:
            xtoken = self.xtoken

            # Capturing the individual element
            dGroup = dGroup
            input_data = input_data
            dTemplate = dTemplate

            # This will check the mandatory json elements
            if ((dGroup != '') & (dTemplate != '')):

                # Based on the Group & Element it will fetch the salt
                # Based on the specific salt it will decrypt the data
                if ((dGroup == 'GrDet') & (dTemplate == 'subGrAcct_Nbr')):
                    xtoken = str(csf.config['ACCT_NBR_SALT'])

                    strV1 = "xtoken: " + str(xtoken)
                    logging.info(strV1)
                    strV2 = "Flask Input Data: " + str(input_data)
                    logging.info(strV2)

                    #x = cen.clsEnDecAuth()
                    ret_val = getVal.decrypt_str(input_data, xtoken)
                elif ((dGroup == 'GrDet') & (dTemplate == 'subGrName')):
                    xtoken = str(csf.config['NAME_SALT'])

                    strV1 = "xtoken: " + str(xtoken)
                    logging.info(strV1)
                    strV2 = "Flask Input Data: " + str(input_data)
                    logging.info(strV2)

                    #x = cen.clsEnDecAuth()
                    ret_val = getVal.decrypt_str(input_data, xtoken)
                elif ((dGroup == 'GrDet') & (dTemplate == 'subGrPhone')):
                    xtoken = str(csf.config['PHONE_SALT'])

                    strV1 = "xtoken: " + str(xtoken)
                    logging.info(strV1)
                    strV2 = "Flask Input Data: " + str(input_data)
                    logging.info(strV2)

                    #x = cen.clsEnDecAuth()
                    ret_val = getVal.decrypt_str(input_data, xtoken)
                elif ((dGroup == 'GrDet') & (dTemplate == 'subGrEmail')):
                    xtoken = str(csf.config['EMAIL_SALT'])

                    strV1 = "xtoken: " + str(xtoken)
                    logging.info(strV1)
                    strV2 = "Flask Input Data: " + str(input_data)
                    logging.info(strV2)

                    #x = cen.clsEnDecAuth()
                    ret_val = getVal.decrypt_str(input_data, xtoken)
                else:
                    ret_val = ''
            else:
                ret_val = ''

            # Return value
            return ret_val

        except Exception as e:
            ret_val = ''
            # Return the valid Error Response
            return ret_val

4. __init__.py (This autogenerated script contains the primary calling methods of encryption & decryption based on the element header & values after enhanced as per the functionality.)

###########################################
#### Written By: SATYAKI DE            ####
#### Written On: 08-Jun-2019           ####
#### Package Flask package needs to    ####
#### install in order to run this      ####
#### script.                           ####
####                                   ####
#### Objective: Main Calling scripts.  ####
#### This is an autogenrate scripts.   ####
#### However, to meet the functionality####
#### we've enhanced as per our logic.  ####
###########################################
__all__ = ['clsFlask']

import logging
import azure.functions as func
import json

from getVal.clsFlask import clsFlask

getVal = clsFlask()

def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python Encryption function processed a request.')

    str_val = 'Input Payload:: ' + str(req.get_json())
    str_1 = str(req.get_json())

    logging.info(str_val)

    ret_val = {}
    DataIn = ''
    dGroup = ''
    dTemplate = ''
    flg = ''

    if (str_1 != ''):
        try:
            req_body = req.get_json()
            dGroup = req_body.get('dataGroup')

            try:
                DataIn = req_body.get('data')
                strV15 = 'If Part:: ' + str(DataIn)

                logging.info(strV15)

                if ((DataIn == '') | (DataIn == None)):
                    raise ValueError

                flg = 'Y'
            except ValueError:
                DataIn = req_body.get('edata')
                strV15 = 'Else Part:: ' + str(DataIn)
                logging.info(strV15)
                flg = 'N'
            except:
                DataIn = req_body.get('edata')
                strV15 = 'Else Part:: ' + str(DataIn)
                logging.info(strV15)
                flg = 'N'

            dTemplate = req_body.get('dataTemplate')

        except ValueError:
            pass

    strV5 = "Encrypt Decrypt Flag:: " + flg
    logging.info(strV5)

    if (flg == 'Y'):

        if ((DataIn != '') & ((dGroup != '') & (dTemplate != ''))):

            logging.info("Encryption Started!")
            ret_val = getVal.getEncryptProcess(dGroup, DataIn, dTemplate)
            strVal2 = 'Return Payload:: ' + str(ret_val)
            logging.info(strVal2)

            xval = json.dumps(ret_val)

            return func.HttpResponse(xval)
        else:
            return func.HttpResponse(
                 "Please pass a data in the request body",
                 status_code=400
            )
    else:

        if ((DataIn != '') & ((dGroup != '') & (dTemplate != ''))):

            logging.info("Decryption Started!")
            ret_val2 = getVal.getDecryptProcess(dGroup, DataIn, dTemplate)
            strVal3 = 'Return Payload:: ' + str(ret_val)
            logging.info(strVal3)

            xval1 = json.dumps(ret_val2)

            return func.HttpResponse(xval1)
        else:
            return func.HttpResponse(
                "Please pass a data in the request body",
                status_code=400
            )

In this script, based on the value of an flg variable, we’re calling our encryption or decryption methods. And, the value of the flg variable is set based on the following logic –

try:
    DataIn = req_body.get('data')
    strV15 = 'If Part:: ' + str(DataIn)

    logging.info(strV15)

    if ((DataIn == '') | (DataIn == None)):
        raise ValueError

    flg = 'Y'
except ValueError:
    DataIn = req_body.get('edata')
    strV15 = 'Else Part:: ' + str(DataIn)
    logging.info(strV15)
    flg = 'N'
except:
    DataIn = req_body.get('edata')
    strV15 = 'Else Part:: ' + str(DataIn)
    logging.info(strV15)
    flg = 'N'

So, if the application gets the “data” element then – it will consider the data needs to be encrypted; otherwise, it will go for decryption. And, based on that – it is setting the value.

Now, we’re ready to locally run our application –

func host start

And, the output will look like this –

Let’s test it from postman –

Encrypt:

Decrypt:

Great. Now, we’re ready to publish this application to Azure cloud.

As in our earlier steps, we’ve already built our storage account for the metadata. Please scroll to top to view that again. Now, using that information, we’ll make the function app with a more meaningful name –

az functionapp create –resource-group rndWestUSGrp –os-type Linux \
–consumption-plan-location westus –runtime python \
–name getEncryptDecrypt –storage-account cryptpy2019

Let’s publish the function –

sudo func azure functionapp publish “getEncryptDecrypt” –build-native-deps

On many occassion, without the use of “–build-native-deps” might leads to failure. Hence, I’ve added that to avoid such scenarios.

Now, we need to test our first published complex Azure function with Python through postman –

Encrypt:

Decrypt:

Wonderful! So, it is working.

You can see the function under the Azure portal –

Let’s see some other important features of this function –

Monitor: You can monitor two ways. One is by clicking the monitor options you will get the individual requests level details & also get to see the log information over here –

Clicking Application Insights will give you another level of detailed logs, which can be very useful for debugging. We’ll touch this at the end of this post with a very brief discussion.

As you can see, clicking individual lines will show the details further.

Let’s quickly check the application insights –

Application Insights will give you a SQL like an interface where you can get the log details of all your requests.

You can expand the individual details for further information.

You can change the parameter name & other details & click the run button to get all the log details for your debugging purpose.

So, finally, we’ve achieved our goal. This is relatively long posts. But, I’m sure this will help you to create your first python-based function on the Azure platform.

Hope, you will like this approach. Let me know your comment on the same.

I’ll bring some more exciting topic in the coming days from the Python verse.

Till then, Happy Avenging! 😀

Note: All the data posted here are representational data & available over the internet.

Magic SQL

Posted on January 5, 2010 by SatyakiDe in cast, comma-separated, dbms_debug_vc2coll, dbms_xmlgen, getxmltype, return, v$version, wm_concat, xquery

Few years before in OTN one of the user is looking for a solutions, which we think might not possible to provide in a single SQL solution. At that time Michael came to rescue that and for the first time he showed some interesting XML Kung-Fu to all of us and earned a great reputation for providing magic solution to others. I personally love to call them as Magic SQL.

The following SQL calculates number of rows in each table in a specific schema without visiting any dba views. This particular script is my 2nd personal favourite.

scott@ORCL>
scott@ORCL>select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
PL/SQL Release 11.1.0.6.0 - Production
CORE    11.1.0.6.0      Production
TNS for 32-bit Windows: Version 11.1.0.6.0 - Production
NLSRTL Version 11.1.0.6.0 - Production

Elapsed: 00:00:00.01
scott@ORCL>
scott@ORCL>
scott@ORCL>select table_name,
  2         DBMS_XMLGEN.getxmltype(
  3                                 'select count(*) c from '||table_name
  4                               ).extract('//text()').getnumberval() tot_rows
  5  from user_tables
  6  where iot_type is null
  7  or iot_type != 'IOT_OVERFLOW';

TABLE_NAME                       TOT_ROWS
------------------------------ ----------
DEPT                                    4
EMP                                    14
BONUS                                   0
SALGRADE                                5
EMP_DETAILS                             3
T_EMP                                   0
AUDIT_T                                 0
C_C                                     4
TRAIN_DET                               2
TEST_COM_IMP                            2
TIME_P                                  1

TABLE_NAME                       TOT_ROWS
------------------------------ ----------
PRI_UQ                                  4
TEST_CHK                                0
ANSHUMANSAHAISGENIOUS                   1
XEUROPE                                 2
D_D                                     8
PUBLICTRANSPORT                         4
XASIA                                   2
TF1                                     0
TEST_HH                                14
TEST_SWAP                               4
XGMT                                    1

TABLE_NAME                       TOT_ROWS
------------------------------ ----------
CUSTOMER_DET                            1
FOURWHEELER                             2
SPOOL_LOG                              13
CITYTRANSPORT                           8
T1                                      2
T2                                      2
A_A                                     1
B_B                                     2
AUTOMOBILE                              1
XDUAL                                   1
S_TEMP                                  0

33 rows selected.

Elapsed: 00:00:00.42
scott@ORCL>

But, this particular script has some limitations on Index-Organized-Table. But, overall this will work brilliantly without even touching all the dba views.

Grouped information in comma-separated values.

scott@ORCL>with tt
  2  as
  3    (
  4      select 1 id, 'saty' name from dual
  5      union all
  6      select 1, 'anup' from dual
  7      union all
  8      select 1, 'sas' from dual
  9      union all
 10      select 2, 'rajib' from dual
 11      union all
 12      select 2, 'amit' from dual
 13    )
 14  select id,
 15         cast(wmsys.wm_concat(name) as varchar2(100)) src
 16  from tt
 17  group by id;

        ID SRC
---------- ------------------------------------------------------
         1 saty,anup,sas
         2 rajib,amit

Elapsed: 00:00:01.62
scott@ORCL>
scott@ORCL>

Function wm_concat is undocumented. So, you cannot use it in production environment. Even if you use – you won’t get any technical support from oracle if you have any production issue due to this. So, better not to use this function. But, certainly this reduces lots of our effort and provides a solution using single SQL. I’m still waiting to see it as documented function by Oracle. Till then, you have to go for your custom tailored solution.

The following sqls parse comma-separated values into rows. But, this cannot be applicable into any column of existing tables.

scott@ORCL>
scott@ORCL>SELECT cast(column_value as varchar2(40)) res
  2  FROM TABLE(SYS.dbms_debug_vc2coll (24, 34, 25));

RES
----------------------------------------
24
34
25

Elapsed: 00:00:00.03
scott@ORCL>
scott@ORCL>
scott@ORCL>SELECT cast(column_value as varchar2(40)) res
  2  FROM TABLE(SYS.dbms_debug_vc2coll ('A', 'B', 'C'));

RES
----------------------------------------
A
B
C

Elapsed: 00:00:00.05
scott@ORCL>
scott@ORCL>SELECT cast(column_value as varchar2(40)) res
  2  FROM TABLE(SYS.dbms_debug_vc2coll (24, 'B', '@'));

RES
----------------------------------------
24
B
@

Elapsed: 00:00:00.14
scott@ORCL>

This can handle alpha numeric data type without declaring any custom type for it.

Following SQL is which i prefer most as one of the brilliant features introduced in Oracle and can surprise many developer. Write a select query to retrieve any column information without using the select clause. I know – it sounds crazy. But, you really can do that.

That is what i consider is the leading contender of my Magic SQL category.

scott@ORCL>
scott@ORCL>
scott@ORCL>xquery for $i in ora:view("emp")/ROW/ENAME return $i/text()
  2  /

Result Sequence
----------------------------------------------------------------------------
SMITH
ALLEN
WARD
JONES
MARTIN
BLAKE
CLARK
SCOTT
KING
TURNER
ADAMS

Result Sequence
----------------------------------------------------------------------------
JAMES
FORD
MILLER

14 item(s) selected.

Elapsed: 00:00:00.14
scott@ORCL>
scott@ORCL>

Hope you liked this edition.

Regards.

	The LLM Security Chr… on The LLM Security Chronicles…
	AGENTIC AI IN THE EN… on AGENTIC AI IN THE ENTERPRISE:…
	AGENTIC AI IN THE EN… on AGENTIC AI IN THE ENTERPRISE:…
	AGENTIC AI IN THE EN… on AGENTIC AI IN THE ENTERPRISE:…
	AGENTIC AI IN THE EN… on Agentic AI in the Enterprise:…

Category: cast

Handling unique data using the python-based FastDataMask Package package

Like this:

Predicting health issues for Senior Citizens based on “Realtime Weather Data” in Python

Like this:

Magic SQL

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Initial Environment Preparation:

Share this:

Like this:

Share this:

Like this: