Today, I’ll discuss another important topic before I share a bigger use case next month, as I still need some time to finish that one. We’ll see how we can leverage the capabilities of a low-code machine-learning library named PyCaret.
But before going through the details, why don’t we watch the demo first?
Architecture:
Let us understand the flow of events –

As one can see, the training script sends the initial training request to PyCaret, which trains several candidate models. The application then processes the results & identifies the best-performing model among them.
Python Packages:
Following are the Python packages that are necessary to develop this use case –
pip install pandas
pip install pycaret
PyCaret depends on a combination of other popular Python packages. pip pulls them in along with PyCaret, and all of them need to install successfully before you can run this package.
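Before running the scripts, a quick check that both packages are importable can save some debugging time. The following is only a minimal sketch using the standard library; the helper name `find_missing` and the `REQUIRED_PACKAGES` tuple are my own for illustration and are not part of the codebase.

```python
import importlib.util

# Packages this post installs with pip; extend the tuple if your setup needs more.
REQUIRED_PACKAGES = ("pandas", "pycaret")


def find_missing(packages):
    """Return the names in `packages` that cannot be found in this environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

This only confirms that the packages can be located; it does not verify that their versions are compatible with each other.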
CODE:
- clsConfigClient.py (Main configuration file)
################################################
#### Written By: SATYAKI DE                 ####
#### Written On: 15-May-2020                ####
#### Modified On: 31-Mar-2023               ####
####                                        ####
#### Objective: This script is a config     ####
#### file, contains all the keys for        ####
#### personal AI-driven voice assistant.    ####
####                                        ####
################################################

import os
import platform as pl

class clsConfigClient(object):
    Curr_Path = os.path.dirname(os.path.realpath(__file__))

    os_det = pl.system()
    if os_det == "Windows":
        sep = '\\'
    else:
        sep = '/'

    conf = {
        'APP_ID': 1,
        'ARCH_DIR': Curr_Path + sep + 'arch' + sep,
        'PROFILE_PATH': Curr_Path + sep + 'profile' + sep,
        'LOG_PATH': Curr_Path + sep + 'log' + sep,
        'DATA_PATH': Curr_Path + sep + 'data' + sep,
        'MODEL_PATH': Curr_Path + sep + 'model' + sep,
        'TEMP_PATH': Curr_Path + sep + 'temp' + sep,
        'MODEL_DIR': 'model',
        'APP_DESC_1': 'PyCaret Training!',
        'DEBUG_IND': 'N',
        'INIT_PATH': Curr_Path,
        'FILE_NAME': 'Titanic.csv',
        'MODEL_NAME': 'PyCaret-ft-personal-2023-03-31-04-29-53',
        'TITLE': "PyCaret Training!",
        'PATH' : Curr_Path,
        'OUT_DIR': 'data'
    }
I’m skipping this section as it is self-explanatory.
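As a side note, the platform-specific separator handling in the config can also be delegated to `os.path.join`. The snippet below is just a sketch of that alternative, not the way the codebase does it; the function `build_paths` and the uniform `*_PATH` key naming are assumptions of mine (the original config, for example, uses `ARCH_DIR` rather than `ARCH_PATH`).

```python
import os


def build_paths(base_dir, names=("arch", "profile", "log", "data", "model", "temp")):
    """Map each folder name to '<base_dir>/<name>/' using the platform's separator.

    Joining with a trailing empty string makes os.path.join append the separator,
    so file names can be concatenated directly, as the scripts in this post do.
    """
    return {name.upper() + "_PATH": os.path.join(base_dir, name, "") for name in names}


if __name__ == "__main__":
    # Same base directory choice as the config class: the folder holding this script.
    base = os.path.dirname(os.path.realpath(__file__))
    for key, value in build_paths(base).items():
        print(key, "->", value)
```

The benefit is that there is no need to branch on `platform.system()`; the trade-off is that the keys would have to be renamed wherever the scripts read them.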
- clsTrainModel.py (This is the main class that contains the core logic of the low-code machine-learning library to evaluate the best model for your solutions.)
#####################################################
#### Written By: SATYAKI DE                      ####
#### Written On: 31-Mar-2023                     ####
#### Modified On: 31-Mar-2023                    ####
####                                             ####
#### Objective: This is the main class that      ####
#### contains the core logic of low-code         ####
#### machine-learning library to evaluate the    ####
#### best model for your solutions.              ####
####                                             ####
#####################################################

import clsL as cl
from clsConfigClient import clsConfigClient as cf
import datetime

# Import necessary libraries
import pandas as p
from pycaret.classification import *

# Disabling warnings
def warn(*args, **kwargs):
    pass

import warnings
warnings.warn = warn

######################################
### Get your global values        ####
######################################
debug_ind = 'Y'

# Initiating Logging Instances
clog = cl.clsL()

###############################################
###           End of Global Section         ###
###############################################

class clsTrainModel:
    def __init__(self):
        self.model_path = cf.conf['MODEL_PATH']
        self.model_name = cf.conf['MODEL_NAME']

    def trainModel(self, FullFileName):
        try:
            df = p.read_csv(FullFileName)

            row_count = int(df.shape[0])

            print('Number of rows: ', str(row_count))
            print(df)

            # Initialize the setup in PyCaret
            clf_setup = setup(
                data=df,
                target="Survived",
                train_size=0.8,  # 80% for training, 20% for testing
                categorical_features=["Sex", "Embarked"],
                ordinal_features={"Pclass": ["1", "2", "3"]},
                ignore_features=["Name", "Ticket", "Cabin", "PassengerId"],
                #silent=True,  # Set to False for interactive setup
            )

            # Compare various models
            best_model = compare_models()

            # Create a specific model (e.g., Random Forest)
            rf_model = create_model("rf")

            # Hyperparameter tuning
            tuned_rf_model = tune_model(rf_model)

            # Evaluate model performance
            plot_model(tuned_rf_model, plot="confusion_matrix")
            plot_model(tuned_rf_model, plot="auc")

            # Finalize the model (train on the complete dataset)
            final_rf_model = finalize_model(tuned_rf_model)

            # Make predictions on new data
            new_data = df.drop("Survived", axis=1)
            predictions = predict_model(final_rf_model, data=new_data)

            # Writing into the Model
            FullModelName = self.model_path + self.model_name

            print('Model Output @:: ', str(FullModelName))
            print()

            # Save the fine-tuned model
            save_model(final_rf_model, FullModelName)

            return 0
        except Exception as e:
            x = str(e)
            print('Error: ', x)

            return 1
Let us understand the code in simple terms –
- Import necessary libraries and load the Titanic dataset.
- Initialize the PyCaret setup, specifying the target variable, train-test split, categorical and ordinal features, and features to ignore.
- Compare various models to find the best-performing one.
- Create a specific model (Random Forest in this case).
- Perform hyper-parameter tuning on the Random Forest model.
- Evaluate the model’s performance using a confusion matrix and AUC-ROC curve.
- Finalize the model by training it on the complete dataset.
- Make predictions on new data.
- Save the trained model for future use.
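The steps above can also be condensed into a single function with explicit imports instead of `import *`. Please treat this as a rough sketch and not the version in the codebase: the names `SETUP_KWARGS`, `train_best_model`, and `sort_metric` are mine, the `ordinal_features` setting is left out for brevity, and it saves whichever model `compare_models()` ranks first rather than the tuned Random Forest.

```python
# Settings mirrored from the setup() call in clsTrainModel.py (ordinal_features omitted).
SETUP_KWARGS = {
    "target": "Survived",
    "train_size": 0.8,
    "categorical_features": ["Sex", "Embarked"],
    "ignore_features": ["Name", "Ticket", "Cabin", "PassengerId"],
}


def train_best_model(csv_path, model_path, sort_metric="Accuracy"):
    """Load the CSV, rank candidate models, finalize the winner, and save it."""
    # Imported here so this module can be loaded even where PyCaret is not installed.
    import pandas as pd
    from pycaret.classification import (
        compare_models,
        finalize_model,
        save_model,
        setup,
    )

    df = pd.read_csv(csv_path)
    setup(data=df, **SETUP_KWARGS)           # preprocessing and train/test split
    best = compare_models(sort=sort_metric)  # rank candidates by the chosen metric
    final = finalize_model(best)             # refit the winner on the full dataset
    save_model(final, model_path)            # PyCaret appends ".pkl" to this name
    return final
```

Calling `train_best_model(data_path + 'Titanic.csv', model_path + model_name)` with the values from the config file should behave much like the training class, minus the plots and the tuning step.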
- trainPYCARETModel.py (This is the main calling Python script that will invoke the training class of the PyCaret package.)
#####################################################
#### Written By: SATYAKI DE                      ####
#### Written On: 31-Mar-2023                     ####
#### Modified On: 31-Mar-2023                    ####
####                                             ####
#### Objective: This is the main calling         ####
#### python script that will invoke the          ####
#### training class of Pycaret package.          ####
####                                             ####
#####################################################

import clsL as cl
from clsConfigClient import clsConfigClient as cf
import datetime
import clsTrainModel as tm

# Disabling warnings
def warn(*args, **kwargs):
    pass

import warnings
warnings.warn = warn

######################################
### Get your global values        ####
######################################
debug_ind = 'Y'

# Initiating Logging Instances
clog = cl.clsL()

data_path = cf.conf['DATA_PATH']
data_file_name = cf.conf['FILE_NAME']

tModel = tm.clsTrainModel()

######################################
####         Global Flag      ########
######################################

def main():
    try:
        var = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        print('*'*120)
        print('Start Time: ' + str(var))
        print('*'*120)

        FullFileName = data_path + data_file_name

        r1 = tModel.trainModel(FullFileName)

        if r1 == 0:
            print('Successfully Trained!')
        else:
            print('Failed to Train!')

        print('*'*120)
        var1 = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        print('End Time: ' + str(var1))

    except Exception as e:
        x = str(e)
        print('Error: ', x)

if __name__ == "__main__":
    main()
The above code is pretty self-explanatory as well.
- testPYCARETModel.py (This is the main calling Python script that will invoke the testing script for the PyCaret package.)
#####################################################
#### Written By: SATYAKI DE                      ####
#### Written On: 31-Mar-2023                     ####
#### Modified On: 31-Mar-2023                    ####
####                                             ####
#### Objective: This is the main calling         ####
#### python script that will invoke the          ####
#### testing script for PyCaret package.         ####
####                                             ####
#####################################################

import clsL as cl
from clsConfigClient import clsConfigClient as cf
import datetime
from pycaret.classification import load_model, predict_model
import pandas as p

# Disabling warnings
def warn(*args, **kwargs):
    pass

import warnings
warnings.warn = warn

######################################
### Get your global values        ####
######################################
debug_ind = 'Y'

# Initiating Logging Instances
clog = cl.clsL()

model_path = cf.conf['MODEL_PATH']
model_name = cf.conf['MODEL_NAME']

######################################
####         Global Flag      ########
######################################

def main():
    try:
        var = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        print('*'*120)
        print('Start Time: ' + str(var))
        print('*'*120)

        FullFileName = model_path + model_name

        # Load the saved model
        loaded_model = load_model(FullFileName)

        # Prepare new data for testing (make sure it has the same columns as the original data)
        new_data = p.DataFrame({
            "Pclass": [3, 1],
            "Sex": ["male", "female"],
            "Age": [22, 38],
            "SibSp": [1, 1],
            "Parch": [0, 0],
            "Fare": [7.25, 71.2833],
            "Embarked": ["S", "C"]
        })

        # Make predictions using the loaded model
        predictions = predict_model(loaded_model, data=new_data)

        # Display the predictions
        print(predictions)

        print('*'*120)
        var1 = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        print('End Time: ' + str(var1))

    except Exception as e:
        x = str(e)
        print('Error: ', x)

if __name__ == "__main__":
    main()
In this code, the application loads the stored model & then predicts the outcome for two new passenger records using the tuned & finalized PyCaret model.
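Since the new data must carry the same feature columns the model was trained on, a small guard before calling `predict_model()` can catch mistakes early. This is only an illustrative sketch; the helper `missing_columns` and the `EXPECTED_COLUMNS` list are my own additions, with the column names taken from the test data frame above.

```python
# Feature columns supplied at prediction time in testPYCARETModel.py.
EXPECTED_COLUMNS = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]


def missing_columns(columns, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from `columns`, in order."""
    present = set(columns)
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    # A record set that forgot two of the features.
    print(missing_columns(["Pclass", "Sex", "Age", "SibSp", "Parch"]))
```

In the test script, this could be called as `missing_columns(new_data.columns)` right before the prediction step, and the run stopped if the returned list is not empty.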
Conclusion:
The above code demonstrates an end-to-end binary classification pipeline using the PyCaret library for the Titanic dataset. The goal is to predict whether a passenger survived based on the available features. Here are some conclusions you can draw from the code and data:
- Ease of use: The code showcases how PyCaret simplifies the machine learning process, from data preprocessing to model training, evaluation, and deployment. With just a few lines of code, you can perform tasks that would require much more effort using lower-level libraries.
- Model selection: The compare_models() function provides a quick and easy way to compare various machine learning algorithms and identify the best-performing one based on the chosen evaluation metric (accuracy by default). This comparison helps you choose a suitable model for the given problem.
- Hyper-parameter tuning: The tune_model() function automates the process of hyper-parameter tuning to improve model performance. In the example, we tuned a Random Forest model to improve its predictive power.
- Model evaluation: PyCaret provides several built-in visualization tools for assessing model performance. In the example, we used a confusion matrix and AUC-ROC curve to evaluate the performance of the tuned Random Forest model.
- Model deployment: The example demonstrates how to make predictions using the trained model and save the model for future use. This deployment showcases how PyCaret can streamline the process of deploying a machine-learning model in a production environment.
It is important to note that the conclusions drawn from the code and data are specific to the Titanic dataset and the chosen features. Adjust the feature engineering, preprocessing, and model selection steps for different datasets or problems accordingly. However, the general workflow and benefits provided by PyCaret would remain the same.
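To make the evaluation step a little more concrete, here is how the headline metrics follow from the four cells of a binary confusion matrix. This is a plain-Python sketch of the standard formulas, not PyCaret code, and the counts in the usage example are made up for illustration; they are not results from the Titanic run.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from confusion-matrix counts.

    tp/fp/fn/tn are the true-positive, false-positive, false-negative,
    and true-negative counts for the positive class (here: "survived").
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Invented counts, purely for illustration.
    for name, value in binary_metrics(tp=50, fp=10, fn=20, tn=100).items():
        print(f"{name}: {value:.3f}")
```

Accuracy is the metric compare_models() sorts by unless told otherwise, so it is worth checking precision and recall as well when the classes are imbalanced.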
So, finally, we’ve done it.
I know that this post is relatively longer than my earlier ones. But I think you can get all the details once you go through it.
You will get the complete codebase in the following GitHub link.
I’ll bring some more exciting topics in the coming days from the Python verse. Please share & subscribe to my post & let me know your feedback.
Till then, Happy Avenging! 🙂
Note: All the data & scenarios posted here are representational data & scenarios & available over the internet & for educational purposes only. Some of the images (except my photo) we’ve used are available over the net. We don’t claim ownership of these images. There is always room for improvement & especially in the prediction quality.