## Predicting Flipkart business growth factor using Linear-Regression Machine Learning Model

Hi Guys,

Today, we’ll explore a potential business growth factor using a Linear-Regression machine-learning model. We’ve prepared a set of dummy data & we’ll base our predictions on it.

Let’s look at a few sample rows –

So, based on this data, we would like to predict YearlyAmountSpent & find out which one of the following features influences it the most, i.e. [ Time On App / Time On Website / Flipkart Membership Duration (In Year) ].

You need to install the following packages –

pip install pandas

pip install matplotlib

pip install scikit-learn

pip install regex

We’ll discuss only the main calling script & the class script. We’ll post the configuration parameters without discussing them. And, we won’t discuss clsL.py as we’ve already covered it in our previous post.

1. clsConfig.py (This script contains all the parameter details.)

```################################################
#### Written By: SATYAKI DE                 ####
#### Written On: 15-May-2020                ####
####                                        ####
#### Objective: This script is a config     ####
#### file, contains all the keys for        ####
#### Machine-Learning. Application will     ####
#### process these information & perform    ####
#### various analysis on Linear-Regression. ####
################################################

import os
import platform as pl

class clsConfig(object):
    Curr_Path = os.path.dirname(os.path.realpath(__file__))

    os_det = pl.system()
    if os_det == "Windows":
        sep = '\\'
    else:
        sep = '/'

    config = {
        'APP_ID': 1,
        'ARCH_DIR': Curr_Path + sep + 'arch' + sep,
        'PROFILE_PATH': Curr_Path + sep + 'profile' + sep,
        'LOG_PATH': Curr_Path + sep + 'log' + sep,
        'REPORT_PATH': Curr_Path + sep + 'report',
        'FILE_NAME': Curr_Path + sep + 'Data' + sep + 'FlipkartCustomers.csv',
        'SRC_PATH': Curr_Path + sep + 'Data' + sep,
        'APP_DESC_1': 'IBM Watson Language Understand!',
        'DEBUG_IND': 'N',
        'INIT_PATH': Curr_Path
    }
```

2. clsLinearRegression.py (This is the main script, which will invoke the Machine-Learning API & return 0 if successful.)

```##############################################
#### Written By: SATYAKI DE               ####
#### Written On: 15-May-2020              ####
#### Modified On 15-May-2020              ####
####                                      ####
#### Objective: Main scripts for Linear   ####
#### Regression.                          ####
##############################################

import pandas as p
import numpy as np
import regex as re

import matplotlib.pyplot as plt
from clsConfig import clsConfig as cf

# %matplotlib inline -- for Jupyter Notebook
class clsLinearRegression:
    def __init__(self):
        self.fileName = cf.config['FILE_NAME']

    def predictResult(self):
        try:
            inputFileName = self.fileName

            # Reading the source CSV into a dataframe
            # (assumed step: 'df' was used but never created in the listing)
            df = p.read_csv(inputFileName)

            print()
            print('Projecting sample rows: ')
            print(df.head())

            print()
            x_row = df.shape[0]
            x_col = df.shape[1]

            print('Total Number of Rows: ', x_row)
            print('Total Number of columns: ', x_col)

            # Adding Features
            x = df[['TimeOnApp', 'TimeOnWebsite', 'FlipkartMembershipInYear']]

            # Target Variable - Trying to predict
            y = df['YearlyAmountSpent']

            # Now Train-Test Split of your source data
            from sklearn.model_selection import train_test_split

            # test_size => fraction of the data allocated to your test cases
            # random_state => seed that makes the random split reproducible
            X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.4, random_state=101)

            # Importing Model
            from sklearn.linear_model import LinearRegression

            # Creating an Instance
            lm = LinearRegression()

            # Train or Fit my model on Training Data
            lm.fit(X_train, Y_train)

            # Creating a prediction value
            flipKartSalePrediction = lm.predict(X_test)

            # Creating a scatter plot based on Actual Value & Predicted Value
            plt.scatter(Y_test, flipKartSalePrediction)

            plt.xlabel('Actual Values')
            plt.ylabel('Predicted Values')

            # Checking Individual Metrics
            from sklearn import metrics

            print()
            mae_val = metrics.mean_absolute_error(Y_test, flipKartSalePrediction)
            print('Mean Absolute Error (MAE): ', mae_val)

            mse_val = metrics.mean_squared_error(Y_test, flipKartSalePrediction)
            print('Mean Square Error (MSE): ', mse_val)

            rmse_val = np.sqrt(metrics.mean_squared_error(Y_test, flipKartSalePrediction))
            print('Root Mean Square Error (RMSE): ', rmse_val)

            print()

            # Check Explained Variance Score (close to, but not the same as, R^2)
            print('Variance Score:')
            var_score = str(round(metrics.explained_variance_score(Y_test, flipKartSalePrediction) * 100, 2)).strip()
            print('Our Model is', var_score, '% accurate. ')
            print()

            # Finding Coefficient on X_train.columns
            print()
            print('Finding Coefficient: ')

            cedf = p.DataFrame(lm.coef_, x.columns, columns=['Coefficient'])
            print('Printing All the Factors: ')
            print(cedf)

            print()

            # Getting the Max Value from it
            # (the three statements below are reconstructed from the comments;
            # they were missing from the listing)
            cedf['MaxCoefficient'] = cedf['Coefficient'].max()

            # Filtering the max Value to identify the biggest Business factor
            dfMax = cedf[cedf['Coefficient'] == cedf['MaxCoefficient']]

            # Dropping the derived column
            dfMax = dfMax.drop(columns=['MaxCoefficient'])
            dfMax = dfMax.reset_index()

            print(dfMax)

            # Extracting Actual Business Factor from Pandas dataframe
            str_factor_temp = str(dfMax.iloc[0]['index'])
            str_factor = re.sub(r"([a-z])([A-Z])", r"\g<1> \g<2>", str_factor_temp)
            str_value = str(round(float(dfMax.iloc[0]['Coefficient']),2))

            print()
            print('*' * 80)
            print('Major Business Activity - (', str_factor, ') - ', str_value, '%')
            print('*' * 80)
            print()

            # This is required when you are running from a conventional
            # front end & not using Jupyter notebook.
            plt.show()

            return 0

        except Exception as e:
            x = str(e)
            print('Error : ', x)

            return 1
```

Key lines from the above snippet –

```# Adding Features
x = df[['TimeOnApp', 'TimeOnWebsite', 'FlipkartMembershipInYear']]```

Our application creates a subset of the main DataFrame, which contains all the features.

```# Target Variable - Trying to predict
y = df['YearlyAmountSpent']```

Now, the application sets the target variable as ‘y.’
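To make the shapes concrete, here is a minimal sketch of the same selection on a tiny hand-made DataFrame. The four rows are invented for illustration & are not taken from FlipkartCustomers.csv; only the column names come from the script above.

```python
import pandas as pd

# Hypothetical rows, made up for illustration only
df = pd.DataFrame({
    'TimeOnApp': [12.6, 11.1, 13.7, 12.0],
    'TimeOnWebsite': [39.5, 37.2, 36.7, 38.1],
    'FlipkartMembershipInYear': [4.1, 2.7, 3.1, 5.0],
    'YearlyAmountSpent': [587.9, 392.2, 581.8, 640.5],
})

# Double brackets return a DataFrame (2-D): one column per feature
x = df[['TimeOnApp', 'TimeOnWebsite', 'FlipkartMembershipInYear']]

# Single brackets return a Series (1-D): the target
y = df['YearlyAmountSpent']

print(x.shape)  # (4, 3)
print(y.shape)  # (4,)
```

scikit-learn expects the features as a two-dimensional structure & the target as a one-dimensional one, which is why the two selections use different bracket styles.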

```# Now Train-Test Split of your source data
from sklearn.model_selection import train_test_split

# test_size => % of allocated data for your test cases
# random_state => A specific set of random split on your data
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.4, random_state=101)```

As in any “Supervised Learning” workflow, our application splits the dataset into two subsets. One is used to train the model & the other is used to test the final model. For a large dataset, you can divide the data into three sets (training, validation & test), where the extra validation set is used to tune the model before the final test. In our case, we don’t need that as the dataset is small.
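For larger datasets, a three-way split can be sketched by calling train_test_split twice. This is only an illustration on random numbers; the 60/20/20 proportions & the variable names are my assumptions, not something the script above does.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 3 features
rng = np.random.default_rng(101)
x = rng.normal(size=(100, 3))
y = rng.normal(size=100)

# First cut: hold back 20% as the final test set
x_rest, x_test, y_rest, y_test = train_test_split(
    x, y, test_size=0.2, random_state=101)

# Second cut: 25% of the remaining 80% becomes the validation set (20% overall)
x_train, x_val, y_train, y_val = train_test_split(
    x_rest, y_rest, test_size=0.25, random_state=101)

print(len(x_train), len(x_val), len(x_test))  # 60 20 20
```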

```# Train or Fit my model on Training Data
lm.fit(X_train, Y_train)```

Our application now trains (fits) the model on the training data.
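As a sanity check of what fit() does, the sketch below trains LinearRegression on synthetic, noise-free data generated from known coefficients. The numbers (2.0, 0.5 & the intercept 10.0) are assumptions chosen for the demo; with no noise, the fitted model should recover them up to floating-point error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic features & a target built from known weights
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(50, 2))
y = 2.0 * x[:, 0] + 0.5 * x[:, 1] + 10.0

lm = LinearRegression()
lm.fit(x, y)

# The learned weights & intercept live on the fitted model
print(lm.coef_)
print(lm.intercept_)
```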

```# Creating a scatter plot based on Actual Value & Predicted Value
plt.scatter(Y_test, flipKartSalePrediction)```

Our application plots the predicted values against the actual values in a scatter plot.

Also, our program captures the following metrics. For more details, I’ve provided the external link for your reference –

1. Mean Absolute Error (MAE)
2. Mean Square Error (MSE)
3. Root Mean Square Error (RMSE)

And, the implementation is shown below –

```mae_val = metrics.mean_absolute_error(Y_test, flipKartSalePrediction)
print('Mean Absolute Error (MAE): ', mae_val)

mse_val = metrics.mean_squared_error(Y_test, flipKartSalePrediction)
print('Mean Square Error (MSE): ', mse_val)

rmse_val = np.sqrt(metrics.mean_squared_error(Y_test, flipKartSalePrediction))
print('Root Mean Square Error (RMSE): ', rmse_val)```
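To see what these three numbers mean, here is a small sketch that computes them by hand with NumPy. The four actual/predicted values are invented for illustration; the formulas are the standard definitions of the three metrics.

```python
import numpy as np

# Made-up actual & predicted values
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 310.0, 380.0])

errors = y_true - y_pred          # [-10, 10, -10, 20]

mae = np.mean(np.abs(errors))     # average absolute error
mse = np.mean(errors ** 2)        # average squared error
rmse = np.sqrt(mse)               # square root of MSE, back in the target's units

print(mae)   # 12.5
print(mse)   # 175.0
print(rmse)
```

Because the errors are squared before averaging, MSE & RMSE penalize the single large miss (20) more heavily than MAE does.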

Next, we would like to check the credibility of our model by using the explained variance score, as follows –

```var_score = str(round(metrics.explained_variance_score(Y_test, flipKartSalePrediction) * 100, 2)).strip()
print('Our Model is', var_score, '% accurate. ')```
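One caveat: the explained variance score is often close to R², but the two are not the same thing, & neither is an “accuracy” in the classification sense. The sketch below uses invented numbers & hand-written formulas (the same definitions that, to my understanding, scikit-learn’s explained_variance_score & r2_score follow for a single target) to show how they diverge when every prediction carries a constant bias.

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = y_true + 1.0   # every prediction is off by exactly +1

residual = y_true - y_pred

# Explained variance: 1 - Var(residual) / Var(y_true)
explained_variance = 1.0 - np.var(residual) / np.var(y_true)

# R^2: 1 - SS_res / SS_tot
ss_res = np.sum(residual ** 2)
ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(explained_variance)  # 1.0, because a constant bias has zero variance
print(r2)                  # about 0.2, because R^2 does penalize the bias
```

If the two scores differ noticeably on your test set, the model is systematically over- or under-predicting.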

Finally, we extract the coefficients to find out which particular feature will lead Flipkart to better sales & growth, by taking the maximum coefficient value among all the features, as shown below –

```cedf = p.DataFrame(lm.coef_, x.columns, columns=['Coefficient'])

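# Getting the Max Value from it
# (the three statements below are reconstructed from the comments;
# they were missing from the listing)
cedf['MaxCoefficient'] = cedf['Coefficient'].max()

# Filtering the max Value to identify the biggest Business factor
dfMax = cedf[cedf['Coefficient'] == cedf['MaxCoefficient']]

# Dropping the derived column
dfMax = dfMax.drop(columns=['MaxCoefficient'])
dfMax = dfMax.reset_index()```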
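A more compact way to get the same answer is pandas’ idxmax(), which returns the index label of the largest value. The coefficient values below are hypothetical & only there to make the sketch runnable; top_feature & top_value are names I picked.

```python
import pandas as pd

# Hypothetical coefficients, for illustration only
cedf = pd.DataFrame(
    {'Coefficient': [38.5, 0.4, 61.2]},
    index=['TimeOnApp', 'TimeOnWebsite', 'FlipkartMembershipInYear'],
)

# Label of the row holding the largest coefficient
top_feature = cedf['Coefficient'].idxmax()
top_value = cedf.loc[top_feature, 'Coefficient']

print(top_feature, top_value)
```

Keep in mind that raw coefficients are only directly comparable when the features are on similar scales; a feature measured in minutes & one measured in years will not have comparable coefficients without scaling.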

Note that we’ve used a regular expression to split the camel-case column name of the winning feature & display it as a more readable name, without changing the column name itself.

```# Extracting Actual Business Factor from Pandas dataframe
str_factor_temp = str(dfMax.iloc[0]['index'])
str_factor = re.sub(r"([a-z])([A-Z])", r"\g<1> \g<2>", str_factor_temp)
str_value = str(round(float(dfMax.iloc[0]['Coefficient']),2))

print('Major Business Activity - (', str_factor, ') - ', str_value, '%')```
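The camel-case split can be tried in isolation. The sketch below uses Python’s built-in re module instead of the third-party regex package (both accept this pattern), & split_camel_case is a helper name of my own.

```python
import re

def split_camel_case(name):
    # Insert a space wherever a lowercase letter is followed by an uppercase one
    return re.sub(r"([a-z])([A-Z])", r"\g<1> \g<2>", name)

print(split_camel_case('FlipkartMembershipInYear'))  # Flipkart Membership In Year
print(split_camel_case('TimeOnApp'))                 # Time On App
```

The raw-string prefix (r"...") keeps Python from treating \g as a string escape sequence.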

3. callLinear.py (This is the first calling script.)

```##############################################
#### Written By: SATYAKI DE               ####
#### Written On: 15-May-2020              ####
#### Modified On 15-May-2020              ####
####                                      ####
#### Objective: Main calling scripts.     ####
##############################################

from clsConfig import clsConfig as cf
import clsL as cl
import logging
import datetime
import clsLinearRegression as cw

# Disabling Warning
def warn(*args, **kwargs):
    pass

import warnings
warnings.warn = warn

# Lookup functions from
# Azure cloud SQL DB

var = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")

def main():
    try:
        ret_1 = 0
        general_log_path = str(cf.config['LOG_PATH'])

        # Enabling Logging Info
        logging.basicConfig(filename=general_log_path + 'MachineLearning_LinearRegression.log', level=logging.INFO)

        # Initiating Log Class
        l = cl.clsL()

        # Moving previous day log files to archive directory
        log_dir = cf.config['LOG_PATH']
        curr_ver = datetime.datetime.now().strftime("%Y-%m-%d")

        tmpR0 = "*" * 157

        logging.info(tmpR0)
        tmpR9 = 'Start Time: ' + str(var)
        logging.info(tmpR9)
        logging.info(tmpR0)

        print("Log Directory::", log_dir)
        tmpR1 = 'Log Directory::' + log_dir
        logging.info(tmpR1)

        print('Machine Learning - Linear Regression Prediction : ')
        print('-' * 200)

        # Create the instance of the Linear-Regression Class
        x2 = cw.clsLinearRegression()

        ret = x2.predictResult()

        if ret == 0:
            print('Successful Linear-Regression Prediction Generated!')
        else:
            print('Failed to generate Linear-Regression Prediction!')

        print("-" * 200)
        print()

        print('Finding Analysis points..')
        print("*" * 200)
        logging.info('Finding Analysis points..')
        logging.info(tmpR0)

        # Capturing the end time here (not the start time stored in 'var')
        tmpR10 = 'End Time: ' + datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        logging.info(tmpR10)
        logging.info(tmpR0)

    except ValueError as e:
        print(str(e))
        logging.info(str(e))

    except Exception as e:
        # Python 3 exceptions have no 'message' attribute, hence str(e)
        print("Top level Error: args:{0}, message:{1}".format(e.args, str(e)))

if __name__ == "__main__":
    main()
```

Key snippet from the above script –

```# Create the instance of the Linear-Regression
x2 = cw.clsLinearRegression()

ret = x2.predictResult()```

In the above snippet, our application first creates an instance of the main class & then invokes the “predictResult” method.

Let’s run our application –

Step 1:

First, the application will fetch the following sample rows from our source file – if it is successful.

Step 2:

Then, it will create the following scatterplot by executing the following snippet –

```# Creating a scatter plot based on Actual Value & Predicted Value
plt.scatter(Y_test, flipKartSalePrediction)```

Note that our model is fairly accurate, i.e., the predicted values stay reasonably close to the actual values.

Step 3:

Finally, it successfully projects the critical feature, as shown below –

From the above picture, you can see that our model is pretty accurate (an explained variance score of approx. 89%).

Also, the red square highlights the key features & their coefficients, and the winning feature is marked in green.

So, we’ve come to the conclusion that Flipkart’s business growth depends on the tenure of its subscribers, i.e., long-standing members tend to spend more than newer members.

Let’s look into our directory structure –

So, we’ve done it.

I’ll be posting another new post in the coming days. Till then, Happy Avenging! 😀

Note: All the data posted here are representational data & available over the internet & for educational purpose only.

## Analyzing Language using IBM Watson using Python

Hi Guys,

Today, I’ll be discussing the following topic – “How to analyze text using IBM Watson implementing through Python.”

IBM has made significant progress in the fields of visual image analysis & text (language) analysis through its IBM Watson cloud platform. In this post, we’ll be exploring natural language analysis only.

To access IBM API, we need to first create an IBM Cloud account from this site.

Let us quickly go through the steps to create the IBM Language Understanding service. Click the Catalog on top of your browser menu as shown in the below picture –

After that, click the AI option on your left-hand side of the panel marked in RED.

Click Watson Studio & then choose the plan. In our case, we’ll select the “Lite” option, as IBM provides this plan so that developers can explore its cloud for free.

Clicking the create option will lead to a blank page of Watson Studio as shown below –

And, now, we need to click the Get Started button to launch it. This will lead to the Create Project page, which can be completed using the following steps –

Now, clicking the create a project will lead you to the next screen –

You can choose either an empty project, or you can create it from a sample file. In this case, we’ll be selecting the first option & this will lead us to the below page –

And, then you will click the “Create” option, which will lead you to the next screen –

Now, you need to click “Add to Project.” This will show you the variety of services that you can explore/use from the list. If you want to create your own natural language classifier, you can do that as follows –

Once you click it, you need to select the associated service –

Here, you need to click the hyperlink, which leads to the next screen –

You need to check the price for both the Visual & Natural Language Classifier. They are pretty expensive. The visual classifier has a Lite plan. However, its output is limited.

Clicking “Create” will lead to the next screen –

After successful creation, you will be redirected to the following page –

Now, we’ll be adding the “Natural Language Understanding” service for our test –

This will prompt the next screen –

Once it is successful, you will see the service registered as shown below –

If you click the service marked in RED, it will lead you to another page, where you will get the API Key & Url. You need both of these in the Python application to access this API as shown below –

Now, we’re ready with the necessary cloud set-up. After this, we need to install the Python package for IBM Cloud as shown below –

We’ve noticed that IBM has recently launched an upgraded package. Hence, we installed that one as well. I would recommend installing this second package directly instead of the first one shown above (the scripts below import ibm_watson & ibm_cloud_sdk_core, which come with the ibm-watson package on PyPI) –

Now, we’re done with our set-up.

Let’s see the directory structure –

We’ll discuss only the main calling script & the class script. We’ll post the configuration parameters without discussing them. And, we won’t discuss clsL.py as we’ve already covered it in our previous post.

1. clsConfig.py (This script contains all the parameter details.)

```##############################################
#### Written By: SATYAKI DE               ####
#### Written On: 04-Apr-2020              ####
####                                      ####
#### Objective: This script is a config   ####
#### file, contains all the keys for      ####
#### IBM Cloud API.   Application will    ####
#### process these information & perform  ####
#### various analysis on IBM Watson cloud.####
##############################################

import os
import platform as pl

class clsConfig(object):
    Curr_Path = os.path.dirname(os.path.realpath(__file__))

    os_det = pl.system()
    if os_det == "Windows":
        sep = '\\'
    else:
        sep = '/'

    config = {
        'APP_ID': 1,
        'SERVICE_URL': "https://api.eu-gb.natural-language-understanding.watson.cloud.ibm.com/instances/xxxxxxxxxxxxxxXXXXXXXXXXxxxxxxxxxxxxxxxx",
        'API_KEY': "Xxxxxxxxxxxxxkdkdfifd984djddkkdkdkdsSSdkdkdd",
        'API_TYPE': "application/json",
        'CACHE': "no-cache",
        'CON': "keep-alive",
        'ARCH_DIR': Curr_Path + sep + 'arch' + sep,
        'PROFILE_PATH': Curr_Path + sep + 'profile' + sep,
        'LOG_PATH': Curr_Path + sep + 'log' + sep,
        'REPORT_PATH': Curr_Path + sep + 'report',
        'SRC_PATH': Curr_Path + sep + 'Src_File' + sep,
        'APP_DESC_1': 'IBM Watson Language Understand!',
        'DEBUG_IND': 'N',
        'INIT_PATH': Curr_Path
    }
```

Note that you will be placing your API_KEY & URL here, as shown in the configuration file.
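If you would rather not keep the real key inside a source file, one common alternative is to read it from environment variables. The sketch below is only a suggestion & not part of the original scripts; the variable names WATSON_API_KEY & WATSON_SERVICE_URL and the helper load_watson_credentials are hypothetical.

```python
import os

def load_watson_credentials(env=os.environ):
    """Return (api_key, service_url) read from a mapping of environment variables."""
    api_key = env.get('WATSON_API_KEY')
    service_url = env.get('WATSON_SERVICE_URL')
    if not api_key or not service_url:
        raise RuntimeError('Set WATSON_API_KEY and WATSON_SERVICE_URL before running.')
    return api_key, service_url

# Demo with a plain dict standing in for the real environment
demo_env = {
    'WATSON_API_KEY': 'dummy-key',
    'WATSON_SERVICE_URL': 'https://example.invalid/instances/dummy',
}
api_key, service_url = load_watson_credentials(demo_env)
print(service_url)
```

The two returned values could then be used in place of the 'API_KEY' & 'SERVICE_URL' entries of the config dictionary.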

2. clsIBMWatson.py (This is the main script, which will invoke the IBM Watson API based on the input from the user & return 0 if successful.)

```##############################################
#### Written By: SATYAKI DE               ####
#### Written On: 04-Apr-2020              ####
#### Modified On 04-Apr-2020              ####
####                                      ####
#### Objective: Main scripts to invoke    ####
#### IBM Watson Language Understand API.  ####
##############################################

import logging
from clsConfig import clsConfig as cf
import clsL as cl
import json
from ibm_watson import NaturalLanguageUnderstandingV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson.natural_language_understanding_v1 import Features, EntitiesOptions, KeywordsOptions, SentimentOptions, CategoriesOptions, ConceptsOptions
from ibm_watson import ApiException

class clsIBMWatson:
    def __init__(self):
        self.api_key = cf.config['API_KEY']
        self.service_url = cf.config['SERVICE_URL']

    def calculateExpressionFromUrl(self, inputUrl, inputVersion):
        try:
            api_key = self.api_key
            service_url = self.service_url
            print('-' * 60)
            print('Beginning of the IBM Watson for Input Url.')
            print('-' * 60)

            authenticator = IAMAuthenticator(api_key)

            # Authentication via service credentials provided in our config files
            service = NaturalLanguageUnderstandingV1(version=inputVersion, authenticator=authenticator)
            service.set_service_url(service_url)

            response = service.analyze(
                url=inputUrl,
                features=Features(entities=EntitiesOptions(),
                                  sentiment=SentimentOptions(),
                                  concepts=ConceptsOptions())).get_result()

            print(json.dumps(response, indent=2))

            return 0

        except ApiException as ex:
            print('-' * 60)
            print("Method failed for Url with status code " + str(ex.code) + ": " + ex.message)
            print('-' * 60)

            return 1

    def calculateExpressionFromText(self, inputText, inputVersion):
        try:
            api_key = self.api_key
            service_url = self.service_url
            print('-' * 60)
            print('Beginning of the IBM Watson for Input Text.')
            print('-' * 60)

            authenticator = IAMAuthenticator(api_key)

            # Authentication via service credentials provided in our config files
            service = NaturalLanguageUnderstandingV1(version=inputVersion, authenticator=authenticator)
            service.set_service_url(service_url)

            response = service.analyze(
                text=inputText,
                features=Features(entities=EntitiesOptions(),
                                  sentiment=SentimentOptions(),
                                  concepts=ConceptsOptions())).get_result()

            print(json.dumps(response, indent=2))

            return 0

        except ApiException as ex:
            print('-' * 60)
            print("Method failed for Text with status code " + str(ex.code) + ": " + ex.message)
            print('-' * 60)

            return 1
```

Some of the key lines from the above snippet –

```authenticator = IAMAuthenticator(api_key)

# Authentication via service credentials provided in our config files
service = NaturalLanguageUnderstandingV1(version=inputVersion, authenticator=authenticator)
service.set_service_url(service_url)```

By providing the API Key & Url, the application initiates the Watson service.

```response = service.analyze(
url=inputUrl,
features=Features(entities=EntitiesOptions(),
sentiment=SentimentOptions(),
concepts=ConceptsOptions())).get_result()```

Based on your type of input (Url or plain text), it will return the entities, sentiment & concepts found in it. Apart from that, you can additionally request the following features as well – Keywords & Categories.
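Since get_result() hands back a plain Python dictionary, you can pick individual values out of it instead of printing the whole JSON. The sketch below works on a hand-written sample; the field names reflect my understanding of the general shape of a Natural Language Understanding result & the values are invented, so please verify them against a real response. summarize is a hypothetical helper.

```python
# Hand-written sample in the assumed shape of an analyze() result;
# the values are invented & real responses carry more fields.
response = {
    'sentiment': {'document': {'score': 0.82, 'label': 'positive'}},
    'entities': [
        {'type': 'Company', 'text': 'IBM', 'relevance': 0.95},
        {'type': 'Location', 'text': 'London', 'relevance': 0.41},
    ],
    'concepts': [
        {'text': 'Cloud computing', 'relevance': 0.9},
    ],
}

def summarize(result):
    """Pull a few headline values out of the nested result dict."""
    doc_sentiment = result.get('sentiment', {}).get('document', {})
    return {
        'sentiment': doc_sentiment.get('label'),
        'score': doc_sentiment.get('score'),
        'entities': [(e.get('type'), e.get('text')) for e in result.get('entities', [])],
        'concepts': [c.get('text') for c in result.get('concepts', [])],
    }

summary = summarize(response)
print(summary['sentiment'], summary['score'])
print(summary['entities'])
```

Using .get() with defaults keeps the helper from failing when a feature was not requested & its key is therefore absent.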

3. callIBMWatsonAPI.py (This is the first calling script. Based on user choice, it will receive input either as Url or as the plain text & then analyze it.)

```##############################################
#### Written By: SATYAKI DE               ####
#### Written On: 04-Apr-2020              ####
#### Modified On 04-Apr-2020              ####
####                                      ####
#### Objective: Main calling scripts.     ####
##############################################

from clsConfig import clsConfig as cf
import clsL as cl
import logging
import datetime
import clsIBMWatson as cw

# Disabling Warning
def warn(*args, **kwargs):
    pass

import warnings
warnings.warn = warn

# Lookup functions from
# Azure cloud SQL DB

var = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")

def main():
    try:
        ret_1 = 0
        general_log_path = str(cf.config['LOG_PATH'])

        # Enabling Logging Info
        logging.basicConfig(filename=general_log_path + 'IBMWatson_NaturalLanguageAnalysis.log', level=logging.INFO)

        # Initiating Log Class
        l = cl.clsL()

        # Moving previous day log files to archive directory
        log_dir = cf.config['LOG_PATH']
        curr_ver = datetime.datetime.now().strftime("%Y-%m-%d")

        tmpR0 = "*" * 157

        logging.info(tmpR0)
        tmpR9 = 'Start Time: ' + str(var)
        logging.info(tmpR9)
        logging.info(tmpR0)

        print("Log Directory::", log_dir)
        tmpR1 = 'Log Directory::' + log_dir
        logging.info(tmpR1)

        print('Welcome to IBM Watson Language Understanding Calling Program: ')
        print('-' * 60)
        print('Please Press 1 to understand the language from Url.')
        # The next two statements are reconstructed: 'input_choice' was used
        # but never assigned in the listing
        print('Please Press 2 to understand the language from Text.')
        input_choice = int(input('Please provide your choice: '))

        # Create the instance of the IBM Watson Class
        x2 = cw.clsIBMWatson()

        # Let's pass this to our map section
        if input_choice == 1:
            textUrl = str(input('Please provide the complete input url:'))
            ret_1 = x2.calculateExpressionFromUrl(textUrl, curr_ver)
        elif input_choice == 2:
            inputText = str(input('Please provide the input text:'))
            ret_1 = x2.calculateExpressionFromText(inputText, curr_ver)
        else:
            print('Invalid options!')
            # Marking the run as failed so that no success message is printed
            ret_1 = 1

        if ret_1 == 0:
            print('Successful IBM Watson Language Understanding Generated!')
        else:
            print('Failed to generate IBM Watson Language Understanding!')

        print("-" * 60)
        print()

        print('Finding Analysis points..')
        print("*" * 157)
        logging.info('Finding Analysis points..')
        logging.info(tmpR0)

        # Capturing the end time here (not the start time stored in 'var')
        tmpR10 = 'End Time: ' + datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        logging.info(tmpR10)
        logging.info(tmpR0)

    except ValueError as e:
        print(str(e))
        print("Invalid option!")
        logging.info("Invalid option!")

    except Exception as e:
        # Python 3 exceptions have no 'message' attribute, hence str(e)
        print("Top level Error: args:{0}, message:{1}".format(e.args, str(e)))

if __name__ == "__main__":
    main()
```

This script is pretty straightforward: it first creates an instance of the main class & then, based on the user input, calls the respective method.
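The choice handling can also be sketched as a small validation step plus a dictionary lookup, which avoids a growing if/elif chain when more options are added. This is an alternative design & not what the script above does; parse_choice, dispatch & the stand-in handlers are hypothetical names.

```python
def parse_choice(raw, valid=(1, 2)):
    """Convert raw menu input to an int and make sure it is a known option."""
    choice = int(raw.strip())   # raises ValueError for non-numeric input
    if choice not in valid:
        raise ValueError('Invalid option: ' + str(choice))
    return choice

def dispatch(choice, handlers, payload):
    """Look up the handler for the choice and call it with the payload."""
    return handlers[choice](payload)

# Stand-in handlers; the real ones would be the two clsIBMWatson methods
handlers = {
    1: lambda url: 'analyze url: ' + url,
    2: lambda text: 'analyze text: ' + text,
}

print(dispatch(parse_choice(' 2 '), handlers, 'hello'))  # analyze text: hello
```

Because both a non-numeric entry & an unknown number raise ValueError, the calling script’s existing “except ValueError” branch would cover both cases.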

As of now, IBM Watson can work on a list of languages, which are available here.

If you want to start from scratch, please refer to the following link.

Please find the screenshot of our application run –

Case 1 (With Url):

Case 2 (With Plain text):

Now, don’t forget to delete all the services from your IBM Cloud.

As you can see, you need to delete the services one by one from the service list, as shown in the figure.

So, we’ve done it.

To explore my photography, you can visit the following link.

I’ll be posting another new post in the coming days. Till then, Happy Avenging! 😀

Note: All the data posted here are representational data & available over the internet & for educational purpose only.