This is a continuation of my previous post, which can be found here. This will be our last post of this series.
Let us recap the key takeaways from our previous post –
Two cloud patterns show how MCP standardizes safe AI-to-system work.
Azure “agent factory”: You ask in Teams; Azure AI Foundry dispatches a specialist agent (HR/Sales). The agent calls a specific MCP server (Functions/Logic Apps) for CRM, SharePoint, or SQL via API Management. Entra ID enforces access; Azure Monitor audits.
AWS “composable serverless agents”: In Bedrock, domain agents (Financial/IT Ops) invoke Lambda-based MCP tools for DynamoDB, S3, or CloudWatch through API Gateway with IAM and optional VPC.
In both, agents never hold credentials; tools map one-to-one to systems, improving security, clarity, scalability, and compliance.
In this post, we’ll discuss the GCP factory pattern.
Unified Workbench Pattern (GCP):
The GCP “unified workbench” pattern prioritizes a single, data-centric platform for AI development, integrating tightly with Vertex AI and Google’s strengths in AI and data analytics. This approach is well-suited for AI-first companies and data-intensive organizations that want to build agents on cutting-edge research tools.
Let’s explore the following diagram based on this –
Imagine Mia, a clinical operations lead, opens a simple app and asks: “Which clinics had the longest wait times this week? Give me a quick summary I can share.”
The app quietly sends Mia’s request to Vertex AI Agent Builder—think of it as the switchboard operator.
Vertex AI picks the Data Analysis agent (the “specialist” for questions like Mia’s).
That agent doesn’t go rummaging through databases. Instead, it uses a safe, preapproved tool—an MCP Server—to query BigQuery, where the data lives.
The tool fetches results and returns them to Mia—no passwords in the open, no risky shortcuts—just the answer, fast and safely.
Now meet Ravi, a developer who asks: “Show me the latest app metrics and confirm yesterday’s patch didn’t break the login table.”
The app routes Ravi’s request to Vertex AI.
Vertex AI chooses the Developer agent.
That agent calls a different tool—an MCP Server designed for Cloud SQL—to check the login table and run a safe query.
Results come back with guardrails intact. If the agent ever needs files, there’s also a Cloud Storage tool ready to fetch or store documents.
Let us understand how the underlying flow of activities takes place –
User Interface:
Entry point: Vertex AI console or a custom app.
Sends a single request; no direct credentials or system access exposed to the user.
Orchestration: Vertex AI Agent Builder (MCP Host)
Routes the request to the most suitable agent:
Agent A (Data Analysis) for analytics/BI-style questions.
Agent B (Developer) for application/data-ops tasks.
Tooling via MCP Servers on Cloud Run
Each MCP Server is a purpose-built adapter with least-privilege access to exactly one service:
Server1 → BigQuery (analytics/warehouse) — used by Agent A in this diagram.
Server2 → Cloud Storage (GCS) (files/objects) — available when file I/O is needed.
Server3 → Cloud SQL (relational DB) — used by Agent B in this diagram.
Agents never hold database credentials; they request actions from the right tool, as sketched below.
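To make this concrete, here is a minimal sketch of what Server1 (the BigQuery adapter) could look like. It is illustrative only: it assumes the official MCP Python SDK (FastMCP) and the google-cloud-bigquery client, and the dataset, table, and tool names are hypothetical.

# A minimal, illustrative sketch -- not production deployment code.
# Assumes the MCP Python SDK and google-cloud-bigquery; the table
# `ops.clinic_wait_times` and the tool name are hypothetical.
from mcp.server.fastmcp import FastMCP
from google.cloud import bigquery

mcp = FastMCP("bigquery-analytics")   # Server1 in the diagram
bq_client = bigquery.Client()         # Runs as the Cloud Run service account (least privilege via IAM)

@mcp.tool()
def weekly_wait_times(limit: int = 10) -> list[dict]:
    """Return the clinics with the longest average wait times this week."""
    query = """
        SELECT clinic_name, AVG(wait_minutes) AS avg_wait
        FROM `ops.clinic_wait_times`
        WHERE visit_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
        GROUP BY clinic_name
        ORDER BY avg_wait DESC
        LIMIT @limit
    """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("limit", "INT64", limit)]
    )
    rows = bq_client.query(query, job_config=job_config).result()
    return [dict(row) for row in rows]

if __name__ == "__main__":
    mcp.run()   # Default transport; a Cloud Run deployment would use an HTTP-based transport

Note how the tool, not the agent, holds the service identity: the agent only asks for “weekly wait times,” and IAM decides whether this server may read that table.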
Enterprise Systems
BigQuery, Cloud Storage, and Cloud SQL are the systems of record that the tools interact with.
Security, Networking, and Observability
GCP IAM: AuthN/AuthZ for Vertex AI and each MCP Server (fine-grained roles, least privilege).
GCP VPC: Private network paths for all Cloud Run MCP Servers (isolation, egress control).
Cloud Monitoring: Metrics, logs, and alerts across agents and tools (auditability, SLOs).
Return Path
Results flow back from the service → MCP Server → Agent → Vertex AI → UI.
Policies and logs track who requested what, when, and how.
Why does this design work?
One entry point for questions.
Clear accountability: specialists (agents) act within guardrails.
Built-in safety (IAM/VPC) and visibility (Monitoring) for trust.
Separation of concerns: agents decide what to do; tools (MCP Servers) decide how to do it.
Scalable: add a new tool (e.g., Pub/Sub or Vertex AI Feature Store) without changing the UI or agents.
Auditable & maintainable: each tool maps to one service with explicit IAM and VPC controls.
So, we’ve concluded the series with this post. I hope you liked it.
I’ll bring more exciting topics from the advanced world of technology in the coming days.
Till then, Happy Avenging! 🙂
Note: All the data & scenarios posted here are representative of data & scenarios available on the internet & are shared for educational purposes only. There is always room for improvement in this kind of model & the solution associated with it; I’ve shown only the basic approach.
This week, we’re planning to touch on an exciting topic: visually reading characters from a WebCAM & predicting the letters using CNN methods. Before we dig deep, why don’t we see the demo run first?
Demo
Isn’t it fascinating? As we can see, the computer can record events & read like humans. And, thanks to the brilliant packages available in Python, we can predict the correct letter out of an image.
What do we need to test it out?
Preferably an external WebCAM.
A moderate or good laptop to test this out.
Python
And a few other packages that we’ll mention in the next section.
What Python packages do we need?
Some of the critical packages that we need to test this application are –
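Based on the snippets that follow, these include –
opencv-python (cv2) for capturing & processing the WebCAM feed.
tensorflow / keras for building & training the CNN.
numpy & pandas for data handling.
matplotlib for the distribution, loss & accuracy plots.
scikit-learn for splitting the data into train, test & validation sets.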
In deep learning, a convolutional neural network (CNN/ConvNet) is a class of deep neural networks most commonly applied to analyze visual imagery.
Different Steps of CNN
We can understand from the above picture that a CNN generally takes an image as input. The neural network analyzes each pixel separately. The weights & biases of the model are then tweaked to detect the desired letters (in our use case) from the image. Like other algorithms, the data also has to pass through a pre-processing stage. However, a CNN needs relatively less pre-processing than most other Deep Learning algorithms.
If you want to know more about this, there is an excellent article on CNN with some on-point animations explaining this concept. Please read it here.
Where do we get the data sets for our testing?
For testing, we are fortunate enough to have Kaggle with us. It hosts a wide variety of sample data, which you can get from here.
Our use-case:
Architecture
From the above diagram, one can see that the Python application will consume a live video feed of random letters (both printed & handwritten) & predict the characters using the machine learning model that we trained.
Code:
1. clsConfig.py (Configuration file for the entire application.)
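The full file is in the GitHub repository linked at the end of this post; here is an illustrative fragment showing the parameters discussed below. The exact values & surrounding keys are assumptions.

# clsConfig.py (illustrative fragment; exact keys & values are assumptions)
class clsConfig(object):
    conf = {
        'numOfClasses': 26,                                  # One class per letter A-Z
        'sleepTime': 3,                                      # Pause (seconds) for the non-blocking plots
        'word_dict': {i: chr(65 + i) for i in range(26)},    # 0 -> 'A', 1 -> 'B', ..., 25 -> 'Z'
    }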
Since we have 26 letters, we have set numOfClasses to 26.
Since we are talking about characters, we had to come up with a process of identifying each character as a number before processing our logic. Hence, the parameter named word_dict above captures all the characters in a Python dictionary & stores them. The application also translates the final numeric output back to the appropriate character for the prediction.
2. clsAlphabetReading.py (Main training class to teach the model to predict alphabets from the visual reader.)
We are splitting the data into Train, Test & Validation sets to get more accurate predictions & reshaping the raw data, converting the 784 data columns into 28×28 pixel images, as sketched below.
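Here is a minimal sketch of that split-and-reshape step. It assumes the Kaggle CSV stores the label in its first column (named '0') followed by 784 pixel columns; the file name & split ratios are assumptions.

# A minimal sketch of the split & reshape step (file name & ratios are assumptions)
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv('A_Z Handwritten Data.csv').astype('float32')
X = data.drop('0', axis=1).values    # 784 pixel columns
y = data['0'].values                 # numeric labels: 0 -> 'A', ..., 25 -> 'Z'

X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, y, test_size=0.2)
X_Train, X_Validation, Y_Train, Y_Validation = train_test_split(X_Train, Y_Train, test_size=0.2)

# Convert each flat 784-column row into a 28x28 single-channel image
X_Train = X_Train.reshape(-1, 28, 28)
X_Test = X_Test.reshape(-1, 28, 28)
X_Validation = X_Validation.reshape(-1, 28, 28)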
The following snippet will plot the character-equivalent numbers in a matplotlib chart & showcase the overall distribution trend after splitting.
# Count how many samples exist for each letter
Y_Train_Num = np.int0(y)    # np.int0 is an alias for np.intp (np.int64 also works)
count = np.zeros(numOfClasses, dtype='int')
for i in Y_Train_Num:
    count[i] += 1

# Translate the numeric class labels back to their letters
alphabets = []
for i in word_dict.values():
    alphabets.append(i)

# Plot the distribution of samples per letter
fig, ax = plt.subplots(1, 1, figsize=(7, 7))
ax.barh(alphabets, count)
plt.xlabel("Number of elements")
plt.ylabel("Alphabets")
plt.grid()
plt.show(block=False)
plt.pause(sleepTime)
plt.close()
Note that we have called plt.show with block=False. This setting lets execution continue without human intervention after the initial pause.
# Model reshaping the training & test dataset
X_Train = X_Train.reshape(X_Train.shape[0],X_Train.shape[1],X_Train.shape[2],1)
print("Shape of Train Data: ", X_Train.shape)
X_Test = X_Test.reshape(X_Test.shape[0], X_Test.shape[1], X_Test.shape[2],1)
print("Shape of Test Data: ", X_Test.shape)
X_Validation = X_Validation.reshape(X_Validation.shape[0], X_Validation.shape[1], X_Validation.shape[2],1)
print("Shape of Validation data: ", X_Validation.shape)
# Converting the labels to categorical values
Y_Train_Catg = to_categorical(Y_Train, num_classes = numOfClasses, dtype='int')
print("Shape of Train Labels: ", Y_Train_Catg.shape)
Y_Test_Catg = to_categorical(Y_Test, num_classes = numOfClasses, dtype='int')
print("Shape of Test Labels: ", Y_Test_Catg.shape)
Y_Validation_Catg = to_categorical(Y_Validation, num_classes = numOfClasses, dtype='int')
print("Shape of validation labels: ", Y_Validation_Catg.shape)
In the above snippet, the application reshaped all three categories of data before calling the primary CNN function.
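The original model-definition snippet lives in the full class; a minimal sketch consistent with it might look like this. The layer counts & sizes are illustrative assumptions.

# An illustrative CNN definition (layer sizes are assumptions)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPool2D(pool_size=(2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPool2D(pool_size=(2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(numOfClasses, activation='softmax'),   # 26 output classes, from clsConfig
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

history = model.fit(X_Train, Y_Train_Catg, epochs=8,
                    validation_data=(X_Validation, Y_Validation_Catg))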
In the above snippet, the convolution layers are followed by maxpool layers, which reduce the number of extracted features. The outputs of the convolution & maxpool layers are flattened into a single-dimension vector & supplied as input to the Dense layer. The CNN model is then compiled for training with the training dataset.
We have used optimizers like Adam & RMSProp, & we trained the model for eight epochs for better accuracy & predictions.
# Displaying the accuracies & losses for train & validation set
print("Validation Accuracy :", history.history['val_accuracy'])
print("Training Accuracy :", history.history['accuracy'])
print("Validation Loss :", history.history['val_loss'])
print("Training Loss :", history.history['loss'])
# Displaying the Loss Graph
plt.figure(1)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.legend(['training','validation'])
plt.title('Loss')
plt.xlabel('epoch')
plt.show(block=False)
plt.pause(sleepTime1)
plt.close()
# Displaying the Accuracy Graph
plt.figure(2)
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.legend(['training','validation'])
plt.title('Accuracy')
plt.xlabel('epoch')
plt.show(block=False)
plt.pause(sleepTime1)
plt.close()
Also, we have captured the validation accuracy & loss & plotted them in two separate graphs for better understanding.
The application also evaluates the accuracy of the model that we trained & validated with the training & validation data. This time, we have used the test data to derive the confidence score, as sketched below.
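A minimal sketch of that evaluation step; variable names follow the earlier snippets.

# Evaluate the trained model on the held-out test set
test_loss, test_accuracy = model.evaluate(X_Test, Y_Test_Catg)
print('Test Loss: ', test_loss)
print('Test Accuracy: ', test_accuracy)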
# Displaying some of the test images & their predicted labels
fig, ax = plt.subplots(3, 3, figsize=(8, 9))
axes = ax.flatten()

for i in range(9):
    # Reshape the flat test row back to a 28x28 image for display
    axes[i].imshow(np.reshape(X_Test[i], reshapeVal1), cmap="Greys")
    pred = word_dict[np.argmax(Y_Test_Catg[i])]
    print('Prediction: ', pred)
    axes[i].set_title("Test Prediction: " + pred)
    axes[i].grid()

plt.show(block=False)
plt.pause(sleepTime1)
plt.close()
Finally, the application tested with some random test data & plotted the output & the prediction assessment.
As a part of the last step, the application saves the generated model using the pickle package under a specific location, which the reader application will use (a sketch follows below).
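A minimal sketch of that save step; the paths are hypothetical. Note that the training history pickles cleanly as a plain dict, while the Keras model itself can also be saved natively with model.save.

import pickle

# Persist the trained artifacts for the reader application (paths are hypothetical)
model.save('model/cnn_alphabet.h5')          # native Keras save

with open('model/train_history.pkl', 'wb') as f:
    pickle.dump(history.history, f)          # training history as a plain dict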
3. trainingVisualDataRead.py (Main application that will invoke the training class to predict alphabets through the WebCam using a Convolutional Neural Network (CNN).)
The Python application will invoke the class & capture the returned value inside the r1 variable, as sketched below.
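A minimal sketch of that driver script; the class & method names follow the post’s naming, but the exact signatures are assumptions.

# trainingVisualDataRead.py (illustrative sketch; method name is an assumption)
import clsAlphabetReading as ar

trainer = ar.clsAlphabetReading()
r1 = trainer.trainModel()    # assumed to return 0 on success

if r1 == 0:
    print('Training completed successfully.')
else:
    print('Training failed!')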
4. readingVisualData.py (Reading the model to predict Alphabet using WebCAM.)
We initially cloned the original video frame & then converted it from BGR to grayscale, applying a threshold on it for better prediction outcomes. Then the image was resized & reshaped for the model input. Finally, the np.argmax function extracted the class index with the highest predicted probability, which is translated to an alphabet using the word_dict dictionary & displayed on top of the Live View.
The application also derives the confidence score of that probability & displays it on top of the Live View. A minimal sketch of this loop follows.
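This sketch of the per-frame loop assumes the trained model & word_dict have already been loaded; the threshold value & window name are illustrative.

# An illustrative per-frame prediction loop (model & word_dict loaded earlier)
import cv2
import numpy as np

cap = cv2.VideoCapture(0)    # the external WebCAM
while True:
    ret, frame = cap.read()
    if not ret:
        break
    img_copy = frame.copy()    # clone the original frame for display
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, thresh = cv2.threshold(gray, 100, 255, cv2.THRESH_BINARY_INV)
    roi = cv2.resize(thresh, (28, 28))        # resize for the model input
    roi = roi.reshape(1, 28, 28, 1) / 255.0   # reshape & normalize
    probs = model.predict(roi)
    pred = word_dict[np.argmax(probs)]                 # class index -> letter
    conf = round(float(np.max(probs)) * 100, 2)        # confidence score
    cv2.putText(img_copy, 'Prediction: ' + pred + ' (' + str(conf) + '%)',
                (20, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow('Live View', img_copy)
    # the exit-key check shown in the next snippet closes this loop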
# Exit the live loop when 'q' is pressed
if cv2.waitKey(1) & 0xFF == ord('q'):
    r1 = 0
    break
The above code lets the developer exit the application by pressing the “q” key on the keyboard, & the program will terminate.
So, we’ve done it.
You will get the complete codebase from the following GitHub link.
I’ll bring some more exciting topics from the Python verse in the coming days. Please share & subscribe to my post & let me know your feedback.
Till then, Happy Avenging! 😀
Note: All the data & scenarios posted here are representational & available over the internet for educational purposes only. Some of the images (except my photo) that we’ve used are available over the net; we don’t claim ownership of these images. There is always room for improvement, especially in the prediction quality of the alphabets.