
Real-time video summary assistance App – Part 2

Posted on April 21, 2025 by SatyakiDe in ai, anthropic, api, Azure, call, cloud, code, computing, Crossplatform, Data Science, design, function, gpt3, IoT, json, LangChain, machine-learning, mcpprotocol, Microsoft, natural-language, objects, openai, Performance, Python, Real-time, sarvam, sql, Technology, video, youtubedataapi

As a continuation of the previous post, I would like to pick up the discussion on implementing the MCP protocol among agents. But before that, let me share the quick demo one more time to recap our objectives.

Let us recap the process flow –

Also, let us revisit how the scripts are grouped, as shown in the previous post:

Message-Chaining Protocol (MCP) Implementation:

    clsMCPMessage.py
    clsMCPBroker.py

YouTube Transcript Extraction:

    clsYouTubeVideoProcessor.py

Language Detection:

    clsLanguageDetector.py

Translation Services & Agents:

    clsTranslationAgent.py
    clsTranslationService.py

Documentation Agent:

    clsDocumentationAgent.py
    
Research Agent:

    clsResearchAgent.py

Great! Now, we’ll continue with the main discussion.

CODE:

clsYouTubeVideoProcessor.py (This class processes the transcripts from YouTube. It may also translate them into English for non-native speakers.)

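import re

from youtube_transcript_api import YouTubeTranscriptApi  # third-party: youtube-transcript-api

def extract_youtube_id(youtube_url):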
    """Extract YouTube video ID from URL"""
    youtube_id_match = re.search(r'(?:v=|\/)([0-9A-Za-z_-]{11}).*', youtube_url)
    if youtube_id_match:
        return youtube_id_match.group(1)
    return None

def get_youtube_transcript(youtube_url):
    """Get transcript from YouTube video"""
    video_id = extract_youtube_id(youtube_url)
    if not video_id:
        return {"error": "Invalid YouTube URL or ID"}
    
    try:
        transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
        
        # First try to get manual transcripts
        try:
            transcript = transcript_list.find_manually_created_transcript(["en"])
            transcript_data = transcript.fetch()
            print(f"Debug - Manual transcript format: {type(transcript_data)}")
            if transcript_data and len(transcript_data) > 0:
                print(f"Debug - First item type: {type(transcript_data[0])}")
                print(f"Debug - First item sample: {transcript_data[0]}")
            return {"text": transcript_data, "language": "en", "auto_generated": False}
        except Exception as e:
            print(f"Debug - No manual transcript: {str(e)}")
            # If no manual English transcript, try any available transcript
            try:
                available_transcripts = list(transcript_list)
                if available_transcripts:
                    transcript = available_transcripts[0]
                    print(f"Debug - Using transcript in language: {transcript.language_code}")
                    transcript_data = transcript.fetch()
                    print(f"Debug - Auto transcript format: {type(transcript_data)}")
                    if transcript_data and len(transcript_data) > 0:
                        print(f"Debug - First item type: {type(transcript_data[0])}")
                        print(f"Debug - First item sample: {transcript_data[0]}")
                    return {
                        "text": transcript_data, 
                        "language": transcript.language_code, 
                        "auto_generated": transcript.is_generated
                    }
                else:
                    return {"error": "No transcripts available for this video"}
            except Exception as e:
                return {"error": f"Error getting transcript: {str(e)}"}
    except Exception as e:
        return {"error": f"Error getting transcript list: {str(e)}"}

# ----------------------------------------------------------------------------------
# YouTube Video Processor
# ----------------------------------------------------------------------------------

class clsYouTubeVideoProcessor:
    """Process YouTube videos using the agent system"""
    
    def __init__(self, documentation_agent, translation_agent, research_agent):
        self.documentation_agent = documentation_agent
        self.translation_agent = translation_agent
        self.research_agent = research_agent
    
    def process_youtube_video(self, youtube_url):
        """Process a YouTube video"""
        print(f"Processing YouTube video: {youtube_url}")
        
        # Extract transcript
        transcript_result = get_youtube_transcript(youtube_url)
        
        if "error" in transcript_result:
            return {"error": transcript_result["error"]}
        
        # Start a new conversation
        conversation_id = self.documentation_agent.start_processing()
        
        # Process transcript segments
        transcript_data = transcript_result["text"]
        transcript_language = transcript_result["language"]
        
        print(f"Debug - Type of transcript_data: {type(transcript_data)}")
        
        # For each segment, detect language and translate if needed
        processed_segments = []
        
        try:
            # Make sure transcript_data is a list of dictionaries with text and start fields
            if isinstance(transcript_data, list):
                for idx, segment in enumerate(transcript_data):
                    print(f"Debug - Processing segment {idx}, type: {type(segment)}")
                    
                    # Extract text properly based on the type
                    if isinstance(segment, dict) and "text" in segment:
                        text = segment["text"]
                        start = segment.get("start", 0)
                    else:
                        # Try to access attributes for non-dict types
                        try:
                            text = segment.text
                            start = getattr(segment, "start", 0)
                        except AttributeError:
                            # If all else fails, convert to string
                            text = str(segment)
                            start = idx * 5  # Arbitrary timestamp
                    
                    print(f"Debug - Extracted text: {text[:30]}...")
                    
                    # Create a standardized segment
                    std_segment = {
                        "text": text,
                        "start": start
                    }
                    
                    # Process through translation agent
                    translation_result = self.translation_agent.process_text(text, conversation_id)
                    
                    # Update segment with translation information
                    segment_with_translation = {
                        **std_segment,
                        "translation_info": translation_result
                    }
                    
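                    # Use translated text for documentation; set it on the record that is
                    # appended below, because std_segment was already copied into that record
                    if "final_text" in translation_result and translation_result["final_text"] != text:
                        segment_with_translation["processed_text"] = translation_result["final_text"]
                    else:
                        segment_with_translation["processed_text"] = text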
                    
                    processed_segments.append(segment_with_translation)
            else:
                # If transcript_data is not a list, treat it as a single text block
                print("Debug - Transcript is not a list, treating as single text")
                text = str(transcript_data)
                std_segment = {
                    "text": text,
                    "start": 0
                }
                
                translation_result = self.translation_agent.process_text(text, conversation_id)
                segment_with_translation = {
                    **std_segment,
                    "translation_info": translation_result
                }
                
                # Set processed_text on the record that is appended below
                if "final_text" in translation_result and translation_result["final_text"] != text:
                    segment_with_translation["processed_text"] = translation_result["final_text"]
                else:
                    segment_with_translation["processed_text"] = text
                
                processed_segments.append(segment_with_translation)
                
        except Exception as e:
            print(f"Debug - Error processing transcript: {str(e)}")
            return {"error": f"Error processing transcript: {str(e)}"}
        
        # Process the transcript with the documentation agent
        documentation_result = self.documentation_agent.process_transcript(
            processed_segments,
            conversation_id
        )
        
        return {
            "youtube_url": youtube_url,
            "transcript_language": transcript_language,
            "processed_segments": processed_segments,
            "documentation": documentation_result,
            "conversation_id": conversation_id
        }

Let us understand this step-by-step:

Part 1: Getting the YouTube Transcript

def extract_youtube_id(youtube_url):
    ...

This extracts the unique video ID from any YouTube link.
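
To see what the pattern accepts, here is a small check of the same function against a few URL shapes. The sample URLs are my own illustrative assumptions and do not come from the application itself.

```python
import re

def extract_youtube_id(youtube_url):
    """Extract YouTube video ID from URL (same pattern as in the class file)."""
    match = re.search(r'(?:v=|\/)([0-9A-Za-z_-]{11}).*', youtube_url)
    return match.group(1) if match else None

# Illustrative inputs (assumed for this demo)
samples = [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/dQw4w9WgXcQ",
    "https://www.youtube.com/embed/dQw4w9WgXcQ",
    "dQw4w9WgXcQ",                              # bare ID: no "v=" or "/" in front
    "https://example.com/some-long-path-here",  # not a YouTube link at all
]
for url in samples:
    print(f"{url!r} -> {extract_youtube_id(url)!r}")
```

Two things stand out. A bare 11-character ID returns None because the pattern expects "v=" or "/" in front of it, and any URL with a path segment of 11 or more allowed characters also matches. If that matters for your use case, validating the host name before calling the function is one option.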

def get_youtube_transcript(youtube_url):
    ...

This gets the actual spoken content of the video.
It first looks for a manually created English transcript (written by humans).
If none exists, it falls back to the first transcript available for the video, in any language, which may be auto-generated by YouTube’s AI.
If nothing is found, it gives back an error message like: “No transcripts available for this video.”
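
The selection order can be tried without any network call. The sketch below is a simplified stand-in: TranscriptInfo and pick_transcript are hypothetical names of mine, not part of the transcript library, and only the two attributes the code above reads (language_code and is_generated) are modelled.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TranscriptInfo:
    """Hypothetical stand-in for a transcript entry (not the library's class)."""
    language_code: str
    is_generated: bool

def pick_transcript(available: List[TranscriptInfo]) -> Optional[TranscriptInfo]:
    """Prefer a manually created English transcript, else the first one listed."""
    for entry in available:
        if entry.language_code == "en" and not entry.is_generated:
            return entry
    return available[0] if available else None

listing = [TranscriptInfo("hi", True), TranscriptInfo("en", False)]
chosen = pick_transcript(listing)
print(chosen)                                         # the manual English entry
print(pick_transcript([TranscriptInfo("bn", True)]))  # falls back to the first entry
print(pick_transcript([]))                            # nothing available
```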

Part 2: Processing the Video with Agents

class clsYouTubeVideoProcessor:
    ...

This is like the control center that tells each intelligent agent what to do with the transcript. Here are the detailed steps:

1. Start the Process

def process_youtube_video(self, youtube_url):
    ...

The system starts with a YouTube video link.
It prints a message like: “Processing YouTube video: [link]”

2. Extract the Transcript

The system runs the get_youtube_transcript() function.
If it fails, it returns an error (e.g., invalid link or no subtitles available).

3. Start a “Conversation”

The documentation agent begins a new session, tracked by a unique conversation ID.
Think of this like opening a new folder in a shared team workspace to store everything related to this video.

4. Go Through Each Segment of the Transcript

The spoken text is often broken into small parts (segments), like subtitles.
For each part:
- It checks the text.
- It finds out the time that part was spoken.
- It sends it to the translation agent to clean up or translate the text.
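
Since a transcript item may arrive as a dictionary, as an object with attributes, or as something else entirely, the extraction step is worth isolating. Here is a minimal sketch of that normalization; the helper name normalize_segment and the sample inputs are assumptions for illustration.

```python
from types import SimpleNamespace

def normalize_segment(segment, idx):
    """Return {"text", "start"} for dict-style, attribute-style, or plain values."""
    if isinstance(segment, dict) and "text" in segment:
        return {"text": segment["text"], "start": segment.get("start", 0)}
    try:
        return {"text": segment.text, "start": getattr(segment, "start", 0)}
    except AttributeError:
        # Last resort: stringify and invent a timestamp (5 seconds per segment)
        return {"text": str(segment), "start": idx * 5}

raw = [
    {"text": "Hello", "start": 0.0},           # dictionary item
    SimpleNamespace(text="world", start=2.5),  # object with attributes
    "plain string",                            # neither of the above
]
normalized = [normalize_segment(seg, i) for i, seg in enumerate(raw)]
print(normalized)
```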

5. Translate (if needed)

If the translation agent returns a translated version that differs from the original, that version is recorded as the processed text, and the original is kept alongside it.
Otherwise, the original text is used as is.
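
To make this step concrete, here is a minimal sketch of how a per-segment record can carry both the original and the processed text. The helper name attach_processed_text and the sample values are my assumptions for illustration. Note that {**std_segment, ...} builds a shallow copy, so a key added to std_segment afterwards does not show up in the copy.

```python
def attach_processed_text(std_segment, translation_result):
    """Build the per-segment record, keeping the original text and adding processed_text."""
    text = std_segment["text"]
    record = {**std_segment, "translation_info": translation_result}  # shallow copy
    # Use the translation only when it exists and differs from the original
    if "final_text" in translation_result and translation_result["final_text"] != text:
        record["processed_text"] = translation_result["final_text"]
    else:
        record["processed_text"] = text
    return record

segment = {"text": "Bonjour", "start": 0}
translated = attach_processed_text(segment, {"final_text": "Hello"})
unchanged = attach_processed_text({"text": "Hello", "start": 3}, {"final_text": "Hello"})

segment["late_key"] = True  # added after the copy was made
print(translated["text"], "->", translated["processed_text"])
print("late_key" in translated)
```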

6. Prepare for Documentation

Once every segment has been through translation, the full list of segments is passed to the documentation agent.
This agent might:
- Summarize the content,
- Highlight important terms,
- Structure it into a readable format.

7. Return the Final Result

The system gives back a structured package with:

The video link
The original language
The transcript in parts (processed and translated)
A documentation summary
The conversation ID (for tracking or further updates)
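
Because the method returns either an error dictionary or the full package, a caller should check for the "error" key first. The following is a small sketch of such a consumer; describe_result and the sample dictionaries are hypothetical and only mirror the keys listed above.

```python
def describe_result(result):
    """Turn the processor's return value into a one-line status string."""
    if "error" in result:
        return f"FAILED: {result['error']}"
    count = len(result["processed_segments"])
    return (
        f"OK: {count} segment(s), language={result['transcript_language']}, "
        f"conversation={result['conversation_id']}"
    )

# Made-up values shaped like the two possible return packages
failure = {"error": "Invalid YouTube URL or ID"}
success = {
    "youtube_url": "https://youtu.be/dQw4w9WgXcQ",
    "transcript_language": "en",
    "processed_segments": [{"text": "Hello", "start": 0}],
    "documentation": {"summary": "A greeting."},
    "conversation_id": "c1",
}
print(describe_result(failure))
print(describe_result(success))
```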

clsDocumentationAgent.py (This is the main class that implements the documentation agent.)

class clsDocumentationAgent:
    """Documentation Agent built with LangChain"""
    
    def __init__(self, agent_id: str, broker: clsMCPBroker):
        self.agent_id = agent_id
        self.broker = broker
        self.broker.register_agent(agent_id)
        
        # Initialize LangChain components
        self.llm = ChatOpenAI(
            model="gpt-4-0125-preview",
            temperature=0.1,
            api_key=OPENAI_API_KEY
        )
        
        # Create tools
        self.tools = [
            clsSendMessageTool(sender_id=self.agent_id, broker=self.broker)
        ]
        
        # Set up LLM with tools
        self.llm_with_tools = self.llm.bind(
            tools=[tool.tool_config for tool in self.tools]
        )
        
        # Setup memory
        self.memory = ConversationBufferMemory(
            memory_key="chat_history",
            return_messages=True
        )
        
        # Create prompt
        self.prompt = ChatPromptTemplate.from_messages([
            ("system", """You are a Documentation Agent for YouTube video transcripts. Your responsibilities include:
                1. Process YouTube video transcripts
                2. Identify key points, topics, and main ideas
                3. Organize content into a coherent and structured format
                4. Create concise summaries
                5. Request research information when necessary
                
                When you need additional context or research, send a request to the Research Agent.
                Always maintain a professional tone and ensure your documentation is clear and organized.
            """),
            MessagesPlaceholder(variable_name="chat_history"),
            ("human", "{input}"),
            MessagesPlaceholder(variable_name="agent_scratchpad"),
        ])
        
        # Create agent
        self.agent = (
            {
                "input": lambda x: x["input"],
                "chat_history": lambda x: self.memory.load_memory_variables({})["chat_history"],
                "agent_scratchpad": lambda x: format_to_openai_tool_messages(x["intermediate_steps"]),
            }
            | self.prompt
            | self.llm_with_tools
            | OpenAIToolsAgentOutputParser()
        )
        
        # Create agent executor
        self.agent_executor = AgentExecutor(
            agent=self.agent,
            tools=self.tools,
            verbose=True,
            memory=self.memory
        )
        
        # Video data
        self.current_conversation_id = None
        self.video_notes = {}
        self.key_points = []
        self.transcript_segments = []
        
    def start_processing(self) -> str:
        """Start processing a new video"""
        self.current_conversation_id = str(uuid.uuid4())
        self.video_notes = {}
        self.key_points = []
        self.transcript_segments = []
        
        return self.current_conversation_id
    
    def process_transcript(self, transcript_segments, conversation_id=None):
        """Process a YouTube transcript"""
        if not conversation_id:
            conversation_id = self.start_processing()
        self.current_conversation_id = conversation_id
        
        # Store transcript segments
        self.transcript_segments = transcript_segments
        
        # Process segments
        processed_segments = []
        for segment in transcript_segments:
            processed_result = self.process_segment(segment)
            processed_segments.append(processed_result)
        
        # Generate summary
        summary = self.generate_summary()
        
        return {
            "processed_segments": processed_segments,
            "summary": summary,
            "conversation_id": conversation_id
        }
    
    def process_segment(self, segment):
        """Process individual transcript segment"""
        text = segment.get("text", "")
        start = segment.get("start", 0)
        
        # Use LangChain agent to process the segment
        result = self.agent_executor.invoke({
            "input": f"Process this video transcript segment at timestamp {start}s: {text}. If research is needed, send a request to the research_agent."
        })
        
        # Update video notes
        timestamp = start
        self.video_notes[timestamp] = {
            "text": text,
            "analysis": result["output"]
        }
        
        return {
            "timestamp": timestamp,
            "text": text,
            "analysis": result["output"]
        }
    
    def handle_mcp_message(self, message: clsMCPMessage) -> Optional[clsMCPMessage]:
        """Handle an incoming MCP message"""
        if message.message_type == "research_response":
            # Process research information received from Research Agent
            research_info = message.content.get("text", "")
            
            result = self.agent_executor.invoke({
                "input": f"Incorporate this research information into video analysis: {research_info}"
            })
            
            # Send acknowledgment back to Research Agent
            response = clsMCPMessage(
                sender=self.agent_id,
                receiver=message.sender,
                message_type="acknowledgment",
                content={"text": "Research information incorporated into video analysis."},
                reply_to=message.id,
                conversation_id=message.conversation_id
            )
            
            self.broker.publish(response)
            return response
        
        elif message.message_type == "translation_response":
            # Process translation response from Translation Agent
            translation_result = message.content
            
            # Process the translated text
            if "final_text" in translation_result:
                text = translation_result["final_text"]
                original_text = translation_result.get("original_text", "")
                language_info = translation_result.get("language", {})
                
                result = self.agent_executor.invoke({
                    "input": f"Process this translated text: {text}\nOriginal language: {language_info.get('language', 'unknown')}\nOriginal text: {original_text}"
                })
                
                # Update notes with translation information
                for timestamp, note in self.video_notes.items():
                    if note["text"] == original_text:
                        note["translated_text"] = text
                        note["language"] = language_info
                        break
            
            return None
        
        return None
    
    def run(self):
        """Run the agent to listen for MCP messages"""
        print(f"Documentation Agent {self.agent_id} is running...")
        while True:
            message = self.broker.get_message(self.agent_id, timeout=1)
            if message:
                self.handle_mcp_message(message)
            time.sleep(0.1)
    
    def generate_summary(self) -> str:
        """Generate a summary of the video"""
        if not self.video_notes:
            return "No video data available to summarize."
        
        all_notes = "\n".join([f"{ts}: {note['text']}" for ts, note in self.video_notes.items()])
        
        result = self.agent_executor.invoke({
            "input": f"Generate a concise summary of this YouTube video, including key points and topics:\n{all_notes}"
        })
        
        return result["output"]

Let us understand the key methods in a step-by-step manner:

The Documentation Agent is like a smart assistant that watches a YouTube video, takes notes, pulls out important ideas, and creates a summary — almost like a professional note-taker trained to help educators, researchers, and content creators. It works with a team of other assistants, like a Translator Agent and a Research Agent, and they all talk to each other through a messaging system.

1. Starting to Work on a New Video

def start_processing(self) -> str

When a new video is being processed:

A new conversation ID is created.
Old notes and transcripts are cleared to start fresh.
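
The reset behaviour is easy to verify in isolation. Below is a trimmed-down sketch; the NoteBook class is a hypothetical stand-in that keeps only the state relevant to this step.

```python
import uuid

class NoteBook:
    """Hypothetical, trimmed-down holder for per-video state."""

    def __init__(self):
        self.current_conversation_id = None
        self.video_notes = {}

    def start_processing(self) -> str:
        """Create a fresh conversation ID and clear notes from the previous video."""
        self.current_conversation_id = str(uuid.uuid4())
        self.video_notes = {}
        return self.current_conversation_id

notebook = NoteBook()
first = notebook.start_processing()
notebook.video_notes[0] = {"text": "intro"}
second = notebook.start_processing()  # the notes for the first video are gone now
print(first != second, notebook.video_notes)
```

Since the state lives on the instance, one agent instance works on one video at a time; processing two videos in parallel would need two instances.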

2. Processing the Whole Transcript

def process_transcript(...)

This is where the assistant:

Takes in the full transcript (what was said in the video), already split into small parts (like subtitles).
Sends each part to the LLM-backed agent for analysis.
Collects the results.
Finally, creates a summary of all the main ideas.
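
The overall flow can be sketched without calling any model. In the example below, the two lambda functions are stand-ins for the LLM calls, and the free-standing process_transcript function is my simplification of the method, not the class itself.

```python
def process_transcript(segments, analyze, summarize):
    """Analyze each segment, keep notes keyed by timestamp, then summarize the notes."""
    notes, processed = {}, []
    for seg in segments:
        text, start = seg.get("text", ""), seg.get("start", 0)
        analysis = analyze(text)
        notes[start] = {"text": text, "analysis": analysis}
        processed.append({"timestamp": start, "text": text, "analysis": analysis})
    all_notes = "\n".join(f"{ts}: {note['text']}" for ts, note in notes.items())
    return {"processed_segments": processed, "summary": summarize(all_notes)}

# Stand-ins for the LLM calls (assumptions for this demo)
fake_analyze = lambda text: f"{len(text.split())} word(s)"
fake_summarize = lambda notes: f"{len(notes.splitlines())} note line(s)"

result = process_transcript(
    [{"text": "Welcome to the talk", "start": 0}, {"text": "Agenda", "start": 4.2}],
    fake_analyze,
    fake_summarize,
)
print(result["summary"])
```

One design detail to keep in mind: the notes are keyed by timestamp, so two segments with the same start value would overwrite each other in the notes, although both still appear in the processed list.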

3. Processing One Transcript Segment at a Time

def process_segment(self, segment)

For each chunk of the video:

The assistant reads the text and timestamp.
It asks GPT-4 to analyze it and suggest important insights.
It saves that insight along with the original text and timestamp.

4. Handling Incoming Messages from Other Agents

def handle_mcp_message(self, message)

The assistant can also receive messages from teammates (other agents):

If the message is from the Research Agent:

It reads new information and adds it to its notes.
It replies with a thank-you message to say it got the research.

If the message is from the Translation Agent:

It takes the translated version of a transcript.
Updates its notes to reflect the translated text and its language.

This is like a team of assistants emailing back and forth to make sure the notes are complete and accurate.
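
Here is a rough sketch of that back-and-forth. The Message and Broker classes are hypothetical stand-ins for clsMCPMessage and clsMCPBroker from the previous post; I have only assumed the fields and methods that the agent code above actually uses, with one in-memory queue per agent.

```python
import queue
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass
class Message:
    """Hypothetical stand-in for clsMCPMessage."""
    sender: str
    receiver: str
    message_type: str
    content: Dict[str, Any]
    reply_to: Optional[str] = None
    conversation_id: Optional[str] = None
    id: str = field(default_factory=lambda: str(uuid.uuid4()))

class Broker:
    """Hypothetical stand-in for clsMCPBroker: one inbox queue per agent."""

    def __init__(self):
        self.inboxes: Dict[str, queue.Queue] = {}

    def register_agent(self, agent_id):
        self.inboxes.setdefault(agent_id, queue.Queue())

    def publish(self, message):
        self.inboxes[message.receiver].put(message)

    def get_message(self, agent_id, timeout=1):
        try:
            return self.inboxes[agent_id].get(timeout=timeout)
        except queue.Empty:
            return None

def handle(message, agent_id, broker, notes):
    """Dispatch on message_type, in the same spirit as the agent's handler."""
    if message.message_type == "research_response":
        notes.append(message.content.get("text", ""))
        ack = Message(agent_id, message.sender, "acknowledgment",
                      {"text": "Research information incorporated."},
                      reply_to=message.id, conversation_id=message.conversation_id)
        broker.publish(ack)
        return ack
    return None  # other message types are ignored in this sketch

broker = Broker()
for name in ("doc_agent", "research_agent"):
    broker.register_agent(name)

notes = []
broker.publish(Message("research_agent", "doc_agent", "research_response",
                       {"text": "MCP background"}, conversation_id="c1"))
incoming = broker.get_message("doc_agent", timeout=0.1)
ack = handle(incoming, "doc_agent", broker, notes)
received_ack = broker.get_message("research_agent", timeout=0.1)
print(received_ack.message_type, received_ack.reply_to == incoming.id)
```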

5. Summarizing the Whole Video

def generate_summary(self)

After going through all the transcript parts, the agent asks GPT-4 to create a short, clean summary — identifying:

Main ideas
Key talking points
Structure of the content

The final result is clear, professional, and usable in learning materials or documentation.

clsResearchAgent.py (This is the main class that implements the research agent.)

class clsResearchAgent:
    """Research Agent built with AutoGen"""
    
    def __init__(self, agent_id: str, broker: clsMCPBroker):
        self.agent_id = agent_id
        self.broker = broker
        self.broker.register_agent(agent_id)
        
        # Configure AutoGen directly with API key
        if not OPENAI_API_KEY:
            print("Warning: OPENAI_API_KEY not set for ResearchAgent")
            
        # Create config list directly instead of loading from file
        config_list = [
            {
                "model": "gpt-4-0125-preview",
                "api_key": OPENAI_API_KEY
            }
        ]
        # Create AutoGen assistant for research
        self.assistant = AssistantAgent(
            name="research_assistant",
            system_message="""You are a Research Agent for YouTube videos. Your responsibilities include:
                1. Research topics mentioned in the video
                2. Find relevant information, facts, references, or context
                3. Provide concise, accurate information to support the documentation
                4. Focus on delivering high-quality, relevant information
                
                Respond directly to research requests with clear, factual information.
            """,
            llm_config={"config_list": config_list, "temperature": 0.1}
        )
        
        # Create user proxy to handle message passing
        self.user_proxy = UserProxyAgent(
            name="research_manager",
            human_input_mode="NEVER",
            code_execution_config={"work_dir": "coding", "use_docker": False},
            default_auto_reply="Working on the research request..."
        )
        
        # Current conversation tracking
        self.current_requests = {}
    
    def handle_mcp_message(self, message: clsMCPMessage) -> Optional[clsMCPMessage]:
        """Handle an incoming MCP message"""
        if message.message_type == "request":
            # Process research request from Documentation Agent
            request_text = message.content.get("text", "")
            
            # Use AutoGen to process the research request
            def research_task():
                self.user_proxy.initiate_chat(
                    self.assistant,
                    message=f"Research request for YouTube video content: {request_text}. Provide concise, factual information."
                )
                # Return last assistant message
                return self.assistant.chat_messages[self.user_proxy.name][-1]["content"]
            
            # Execute research task
            research_result = research_task()
            
            # Send research results back to Documentation Agent
            response = clsMCPMessage(
                sender=self.agent_id,
                receiver=message.sender,
                message_type="research_response",
                content={"text": research_result},
                reply_to=message.id,
                conversation_id=message.conversation_id
            )
            
            self.broker.publish(response)
            return response
        
        return None
    
    def run(self):
        """Run the agent to listen for MCP messages"""
        print(f"Research Agent {self.agent_id} is running...")
        while True:
            message = self.broker.get_message(self.agent_id, timeout=1)
            if message:
                self.handle_mcp_message(message)
            time.sleep(0.1)

Let us understand the key methods in detail.

1. Receiving and Responding to Research Requests

def handle_mcp_message(self, message)

When the Research Agent gets a message (like a question or request for info), it:

Reads the message to see what needs to be researched.
Asks GPT-4 to find helpful, accurate info about that topic.
Sends the answer back to whoever asked the question (usually the Documentation Agent).
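
The run() loop in the class polls forever. If you want to try the request and response cycle in a test, a stoppable variant is handy. The sketch below is my own simplification: messages are plain dictionaries, fake_research stands in for the AutoGen chat call, and the threading.Event used for shutdown is an addition that the original class does not have.

```python
import queue
import threading

def run_agent(inbox, outbox, research_fn, stop_event, poll_seconds=0.05):
    """Poll the inbox until stop_event is set; answer each 'request' message."""
    while not stop_event.is_set():
        try:
            message = inbox.get(timeout=poll_seconds)
        except queue.Empty:
            continue
        if message.get("message_type") == "request":
            text = message.get("content", {}).get("text", "")
            outbox.put({
                "message_type": "research_response",
                "content": {"text": research_fn(text)},
                "reply_to": message.get("id"),
                "conversation_id": message.get("conversation_id"),
            })

inbox, outbox = queue.Queue(), queue.Queue()
stop = threading.Event()
# Stand-in for the AutoGen chat call (assumption for this demo)
fake_research = lambda topic: f"Background on: {topic}"

worker = threading.Thread(
    target=run_agent, args=(inbox, outbox, fake_research, stop), daemon=True
)
worker.start()
inbox.put({"id": "m1", "message_type": "request",
           "content": {"text": "vector databases"}, "conversation_id": "c1"})
reply = outbox.get(timeout=2)  # wait for the worker's answer
stop.set()                     # ask the loop to finish
worker.join(timeout=2)
print(reply["content"]["text"])
```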

clsTranslationAgent.py (This is the main class that represents the translation agent)

class clsTranslationAgent:
    """Agent for language detection and translation"""
    
    def __init__(self, agent_id: str, broker: clsMCPBroker):
        self.agent_id = agent_id
        self.broker = broker
        self.broker.register_agent(agent_id)
        
        # Initialize language detector
        self.language_detector = clsLanguageDetector()
        
        # Initialize translation service
        self.translation_service = clsTranslationService()
    
    def process_text(self, text, conversation_id=None):
        """Process text: detect language and translate if needed, handling mixed language content"""
        if not conversation_id:
            conversation_id = str(uuid.uuid4())
        
        # Detect language with support for mixed language content
        language_info = self.language_detector.detect(text)
        
        # Decide if translation is needed
        needs_translation = True
        
        # Pure English content doesn't need translation
        if language_info["language_code"] == "en-IN" or language_info["language_code"] == "unknown":
            needs_translation = False
        
        # For mixed language, check if it's primarily English
        if language_info.get("is_mixed", False) and language_info.get("languages", []):
            english_langs = [
                lang for lang in language_info.get("languages", []) 
                if lang["language_code"] == "en-IN" or lang["language_code"].startswith("en-")
            ]
            
            # If the first English entry in the list has confidence above 0.6, don't translate
            if english_langs and english_langs[0].get("confidence", 0) > 0.6:
                needs_translation = False
        
        if needs_translation:
            # Translate using the appropriate service based on language detection
            translation_result = self.translation_service.translate(text, language_info)
            
            return {
                "original_text": text,
                "language": language_info,
                "translation": translation_result,
                "final_text": translation_result.get("translated_text", text),
                "conversation_id": conversation_id
            }
        else:
            # Already English or unknown language, return as is
            return {
                "original_text": text,
                "language": language_info,
                "translation": {"provider": "none"},
                "final_text": text,
                "conversation_id": conversation_id
            }
    
    def handle_mcp_message(self, message: clsMCPMessage) -> Optional[clsMCPMessage]:
        """Handle an incoming MCP message"""
        if message.message_type == "translation_request":
            # Process translation request from Documentation Agent
            text = message.content.get("text", "")
            
            # Process the text
            result = self.process_text(text, message.conversation_id)
            
            # Send translation results back to requester
            response = clsMCPMessage(
                sender=self.agent_id,
                receiver=message.sender,
                message_type="translation_response",
                content=result,
                reply_to=message.id,
                conversation_id=message.conversation_id
            )
            
            self.broker.publish(response)
            return response
        
        return None
    
    def run(self):
        """Run the agent to listen for MCP messages"""
        print(f"Translation Agent {self.agent_id} is running...")
        while True:
            message = self.broker.get_message(self.agent_id, timeout=1)
            if message:
                self.handle_mcp_message(message)
            time.sleep(0.1)

Let us understand the key methods in a step-by-step manner:

1. Understanding and Translating Text:

def process_text(...)

This is the core job of the agent. Here’s what it does with any piece of text:

Step 1: Detect the Language

It tries to figure out the language of the input text.
It can handle cases where more than one language is mixed together, which is common in casual speech or subtitles.

Step 2: Decide Whether to Translate

If the text is clearly in English, or it’s unclear what the language is, it decides not to translate.
For mixed-language text, it also skips translation when English is detected with more than 60% confidence.
In all other cases, it translates the text into English.
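
The decision rule is compact enough to test on its own. Below is a minimal sketch that mirrors the logic in process_text; the function name needs_translation and the sample language_info dictionaries are my assumptions, shaped after the keys the agent code reads.

```python
def needs_translation(language_info):
    """Return True when the text should be sent to the translation service."""
    code = language_info["language_code"]
    if code in ("en-IN", "unknown"):
        return False
    if language_info.get("is_mixed", False) and language_info.get("languages", []):
        english = [lang for lang in language_info["languages"]
                   if lang["language_code"].startswith("en-")]
        # Skip translation when the first English entry is confident enough
        if english and english[0].get("confidence", 0) > 0.6:
            return False
    return True

def mixed(english_confidence):
    """Build a made-up mixed-language detection result for the demo."""
    return {
        "language_code": "hi-IN",
        "is_mixed": True,
        "languages": [
            {"language_code": "hi-IN", "confidence": 1 - english_confidence},
            {"language_code": "en-IN", "confidence": english_confidence},
        ],
    }

print(needs_translation({"language_code": "en-IN"}))    # English: no translation
print(needs_translation({"language_code": "unknown"}))  # unclear: no translation
print(needs_translation({"language_code": "hi-IN"}))    # Hindi: translate
print(needs_translation(mixed(0.7)))                    # mostly English: no translation
print(needs_translation(mixed(0.5)))                    # not confident enough: translate
```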

Step 3: Translate (if needed)

If translation is required, it uses the translation service to do the job.
Then it packages all the information: the original text, detected language, the translated version, and a unique conversation ID.

Step 4: Return the Results

If no translation is needed, it returns the original text and a note saying “no translation was applied.”

2. Receiving Messages and Responding

def handle_mcp_message(...)

The agent listens for messages from other agents. When someone asks it to translate something:

It takes the text from the message.
Runs it through the process_text function (as explained above).
Sends the translated (or original) result back to the agent that asked.

clsTranslationService.py (This class does the actual translation work on behalf of the agent.)

class clsTranslationService:
    """Translation service using multiple providers with support for mixed languages"""
    
    def __init__(self):
        # Initialize Sarvam AI client
        self.sarvam_api_key = SARVAM_API_KEY
        self.sarvam_url = "https://api.sarvam.ai/translate"
        
        # Initialize Google Cloud Translation client using simple HTTP requests
        self.google_api_key = GOOGLE_API_KEY
        self.google_translate_url = "https://translation.googleapis.com/language/translate/v2"
    
    def translate_with_sarvam(self, text, source_lang, target_lang="en-IN"):
        """Translate text using Sarvam AI (for Indian languages)"""
        if not self.sarvam_api_key:
            return {"error": "Sarvam API key not set"}
        
        headers = {
            "Content-Type": "application/json",
            "api-subscription-key": self.sarvam_api_key
        }
        
        payload = {
            "input": text,
            "source_language_code": source_lang,
            "target_language_code": target_lang,
            "speaker_gender": "Female",
            "mode": "formal",
            "model": "mayura:v1"
        }
        
        try:
            response = requests.post(self.sarvam_url, headers=headers, json=payload)
            if response.status_code == 200:
                return {"translated_text": response.json().get("translated_text", ""), "provider": "sarvam"}
            else:
                return {"error": f"Sarvam API error: {response.text}", "provider": "sarvam"}
        except Exception as e:
            return {"error": f"Error calling Sarvam API: {str(e)}", "provider": "sarvam"}
    
    def translate_with_google(self, text, target_lang="en"):
        """Translate text using Google Cloud Translation API with direct HTTP request"""
        if not self.google_api_key:
            return {"error": "Google API key not set"}
        
        try:
            # Using the translation API v2 with API key
            params = {
                "key": self.google_api_key,
                "q": text,
                "target": target_lang
            }
            
            response = requests.post(self.google_translate_url, params=params)
            if response.status_code == 200:
                data = response.json()
                translation = data.get("data", {}).get("translations", [{}])[0]
                return {
                    "translated_text": translation.get("translatedText", ""),
                    "detected_source_language": translation.get("detectedSourceLanguage", ""),
                    "provider": "google"
                }
            else:
                return {"error": f"Google API error: {response.text}", "provider": "google"}
        except Exception as e:
            return {"error": f"Error calling Google Translation API: {str(e)}", "provider": "google"}
    
    def translate(self, text, language_info):
        """Translate text to English based on language detection info"""
        # If already English or unknown language, return as is
        if language_info["language_code"] == "en-IN" or language_info["language_code"] == "unknown":
            return {"translated_text": text, "provider": "none"}
        
        # Handle mixed language content
        if language_info.get("is_mixed", False) and language_info.get("languages", []):
            # Strategy for mixed language: 
            # 1. If one of the languages is English, send the full text to Google, which handles code-mixing well
            # 2. If no English but contains Indian languages, use Sarvam as it handles code-mixing better
            # 3. Otherwise, use Google Translate for the primary detected language
            
            has_english = False
            has_indian = False
            
            for lang in language_info.get("languages", []):
                if lang["language_code"] == "en-IN" or lang["language_code"].startswith("en-"):
                    has_english = True
                if lang.get("is_indian", False):
                    has_indian = True
            
            if has_english:
                # Contains English - use Google for full text as it handles code-mixing well
                return self.translate_with_google(text)
            elif has_indian:
                # Contains Indian languages - use Sarvam
                # Use the highest confidence Indian language as source
                indian_langs = [lang for lang in language_info.get("languages", []) if lang.get("is_indian", False)]
                if indian_langs:
                    # Sort by confidence
                    indian_langs.sort(key=lambda x: x.get("confidence", 0), reverse=True)
                    source_lang = indian_langs[0]["language_code"]
                    return self.translate_with_sarvam(text, source_lang)
                else:
                    # Fallback to primary language
                    if language_info["is_indian"]:
                        return self.translate_with_sarvam(text, language_info["language_code"])
                    else:
                        return self.translate_with_google(text)
            else:
                # No English, no Indian languages - use Google for primary language
                return self.translate_with_google(text)
        else:
            # Not mixed language - use standard approach
            if language_info["is_indian"]:
                # Use Sarvam AI for Indian languages
                return self.translate_with_sarvam(text, language_info["language_code"])
            else:
                # Use Google for other languages
                return self.translate_with_google(text)

This Translation Service is like a smart translator that knows how to:

Use the language-detection details it is given for the text,
Choose the best translation provider depending on the language (especially for Indian languages),
And then translate the text into English.

It supports mixed-language content (such as Hindi-English in one sentence) and uses either Google Translate or Sarvam AI, a translation service designed for Indian languages.

Now, let us understand the key methods in a step-by-step manner:

1. Translating Using Google Translate

def translate_with_google(...)

This function uses Google Translate:

It sends the text, asks for English as the target language, and gets a translation back.
It also detects the source language automatically.
If successful, it returns the translated text and the detected original language.
If there’s an error, it returns a message saying what went wrong.

Best For: Non-Indian languages (like Spanish, French, Chinese) and mixed content that includes English.
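To make the response handling concrete, here is a minimal sketch of the parsing step on its own, with no network call. The sample payload is hand-written to mirror the fields that translate_with_google reads (data, translations, translatedText, detectedSourceLanguage), and the helper name parse_google_translation is hypothetical, not part of the original class.

```python
def parse_google_translation(data: dict) -> dict:
    """Pull the first translation out of a v2-style response body."""
    # "or [{}]" also covers an empty translations list, which would
    # otherwise raise IndexError on the [0] lookup.
    translations = data.get("data", {}).get("translations") or [{}]
    first = translations[0]
    return {
        "translated_text": first.get("translatedText", ""),
        "detected_source_language": first.get("detectedSourceLanguage", ""),
        "provider": "google",
    }


# Hand-written sample shaped like the fields the method reads
sample = {
    "data": {
        "translations": [
            {"translatedText": "Good morning", "detectedSourceLanguage": "es"}
        ]
    }
}

result = parse_google_translation(sample)
empty = parse_google_translation({})
```

Keeping the parsing separate from the HTTP call makes it possible to check the field handling without an API key.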

2. Main Translation Logic

def translate(self, text, language_info)

This is the decision-maker. Here’s how it works:

Case 1: No Translation Needed

If the text is already in English or the language is unknown, it simply returns the original text.

Case 2: Mixed Language (e.g., Hindi + English)

If the text contains more than one language:

✅ If one part is English → use Google Translate (it’s good with mixed languages).
✅ If it includes Indian languages only → use Sarvam AI (better at handling Indian content).
✅ If it’s neither English nor Indian → use Google Translate.

For the Indian-language case, the service sorts the detected Indian languages by confidence and uses the most likely one as the source language.

Case 3: Single Language

If the text is only in one language:

✅ If it’s an Indian language (like Bengali, Tamil, or Marathi), use Sarvam AI.
✅ If it’s any other language, use Google Translate.
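The three cases above can be condensed into a small routing function. This is only a sketch of the decision logic: it returns the provider name instead of calling any API, and the function name choose_provider is hypothetical. The language_info keys are the ones used by translate().

```python
def choose_provider(language_info: dict) -> str:
    """Return 'none', 'google', or 'sarvam' for a language_info dict."""
    code = language_info.get("language_code", "unknown")

    # Case 1: already English, or language unknown
    if code in ("en-IN", "unknown"):
        return "none"

    languages = language_info.get("languages", [])

    # Case 2: mixed-language content
    if language_info.get("is_mixed", False) and languages:
        if any(lang["language_code"].startswith("en-") for lang in languages):
            return "google"
        if any(lang.get("is_indian", False) for lang in languages):
            return "sarvam"
        return "google"

    # Case 3: single language
    return "sarvam" if language_info.get("is_indian", False) else "google"


hindi = {"language_code": "hi-IN", "is_indian": True}
spanish = {"language_code": "es-ES", "is_indian": False}
hinglish = {
    "language_code": "hi-IN",
    "is_indian": True,
    "is_mixed": True,
    "languages": [
        {"language_code": "hi-IN", "is_indian": True, "confidence": 0.6},
        {"language_code": "en-IN", "is_indian": False, "confidence": 0.4},
    ],
}
```

Here choose_provider(hindi) picks Sarvam, choose_provider(spanish) picks Google, and choose_provider(hinglish) picks Google because one of the detected languages is English.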

So, we’ve done it.

I’ve included the complete working solutions for you in the GitHub Link.

We’ll cover detailed performance testing, optimized configurations & many other useful details in our next post.

Till then, Happy Avenging! 🙂

Note: All the data & scenarios posted here are representational data & scenarios & available over the internet & for educational purposes only. There is always room for improvement in this kind of model & the solution associated with it. I’ve shown the basic ways to achieve the same for educational purposes only.

Real-time video summary assistance App – Part 1

Posted on March 31, 2025April 21, 2025 by SatyakiDe in agents, ai, anthropic, api, Azure, call, cloud, code, computing, Data Science, design, json, LangChain, machine-learning, mcpprotocol, objects, openai, Python, Real-time, snippet, Technology, video

Today, we’ll discuss another topic in our two-part series. We will understand the importance of the MCP protocol for communicating between agents.

This will be an in-depth, highly technical post that also explains the ideas with easy-to-understand visuals.

But, before that, let us understand the demo first.

Isn’t it exciting?

MCP Protocol:

Let us first understand in easy language about the MCP protocol.

MCP (Multi-Agent Communication Protocol) is a custom message exchange system that facilitates structured and scalable communication among multiple AI agents operating within an application. These agents collaborate asynchronously or in real-time to complete complex tasks by sharing results, context, and commands through a common messaging layer.

How MCP Protocol Helps:

Feature	Benefit
Agent-Oriented Architecture	Each agent handles a focused task, improving modularity and scalability.
Event-Driven Message Passing	Agents communicate based on triggers, not polling—leading to faster and efficient responses.
Structured Communication Format	All messages follow a standard format (e.g., JSON) with metadata for sender, recipient, type, and payload.
State Preservation	Agents maintain context across messages using memory (e.g., `ConversationBufferMemory`) to ensure coherence.
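As a rough illustration of the "structured communication format" row, the sketch below builds one message as a plain dictionary and round-trips it through JSON. The agent names and the payload are made up for the example; only the general field set (sender, receiver, type, payload) comes from the table.

```python
import json
import time
import uuid

# One message with routing metadata plus a payload (values are illustrative)
message = {
    "id": str(uuid.uuid4()),
    "timestamp": time.time(),
    "sender": "translation_agent",
    "receiver": "documentation_agent",
    "message_type": "response",
    "content": {"translated_text": "Hello, world"},
}

wire = json.dumps(message)   # what would travel between agents
decoded = json.loads(wire)   # what the receiving agent would read
```

Because every message carries the same keys, a receiving agent can route on message_type and sender without inspecting the payload first.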

How It Works (Step-by-Step):

📥 User uploads or streams a video.
🧑‍💻 MCP Protocol triggers the Transcription Agent to start converting audio into text.
🌐 Translation Agent receives this text (if a different language is needed).
🧾 Summarization Agent receives the translated or original transcript and generates a concise summary.
📚 Research Agent checks for references or terminology used in the video.
📄 Documentation Agent compiles the output into a structured report.
🔁 All communication between agents flows through MCP, ensuring consistent message delivery and coordination.

Now, let us understand the solution that we intend to implement:

This app provides live summarization and contextual insights from videos such as webinars, interviews, or YouTube recordings using multiple cooperating AI agents. These agents may include:

Transcription Agent: Converts spoken words to text.
Translation Agent: Translates text to different languages (if needed).
Summarization Agent: Generates concise summaries.
Research Agent: Finds background or supplementary data related to the discussion.
Documentation Agent: Converts outputs into structured reports or learning materials.

We need to understand one more thing before diving deep into the code. Part of a conversation may be mixed, for example part Hindi & part English. In that case, the app breaks the sentences into chunks & then converts all of them into the same language. Hence, the following rules are applied while translating the sentences –

Now, we will go through the basic frame of the system & try to understand how it fits all the principles that we discussed above for this particular solution mapped against the specific technology –

Documentation Agent built with the LangChain framework
Research Agent built with the AutoGen framework
MCP Broker for seamless communication between agents

Process Flow:

Let us understand from the given picture the flow of the process that our app is trying to implement –

Great! So, now, we’ll focus on some of the key Python scripts & go through their key features.

But, before that, we share the group of scripts that belong to specific tasks.

Message-Chaining Protocol (MCP) Implementation:

clsMCPMessage.py
clsMCPBroker.py

YouTube Transcript Extraction:

clsYouTubeVideoProcessor.py

Language Detection:

clsLanguageDetector.py

Translation Services & Agents:

clsTranslationAgent.py
clsTranslationService.py

Documentation Agent:

clsDocumentationAgent.py

Research Agent:

clsResearchAgent.py

Now, we’ll review some of the scripts in this post & continue with the rest in the next post.

CODE:

clsMCPMessage.py (This is one of the main or key scripts that will help enable implementation of the MCP protocols)

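import queue
import time
import uuid
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field


class clsMCPMessage(BaseModel):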
    """Message format for MCP protocol"""
    id: str = Field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: float = Field(default_factory=time.time)
    sender: str
    receiver: str
    message_type: str  # "request", "response", "notification"
    content: Dict[str, Any]
    reply_to: Optional[str] = None
    conversation_id: str
    metadata: Dict[str, Any] = {}
    
class clsMCPBroker:
    """Message broker for MCP protocol communication between agents"""
    
    def __init__(self):
        self.message_queues: Dict[str, queue.Queue] = {}
        self.subscribers: Dict[str, List[str]] = {}
        self.conversation_history: Dict[str, List[clsMCPMessage]] = {}
    
    def register_agent(self, agent_id: str) -> None:
        """Register an agent with the broker"""
        if agent_id not in self.message_queues:
            self.message_queues[agent_id] = queue.Queue()
            self.subscribers[agent_id] = []
    
    def subscribe(self, subscriber_id: str, publisher_id: str) -> None:
        """Subscribe an agent to messages from another agent"""
        if publisher_id in self.subscribers:
            if subscriber_id not in self.subscribers[publisher_id]:
                self.subscribers[publisher_id].append(subscriber_id)
    
    def publish(self, message: clsMCPMessage) -> None:
        """Publish a message to its intended receiver"""
        # Store in conversation history
        if message.conversation_id not in self.conversation_history:
            self.conversation_history[message.conversation_id] = []
        self.conversation_history[message.conversation_id].append(message)
        
        # Deliver to direct receiver
        if message.receiver in self.message_queues:
            self.message_queues[message.receiver].put(message)
        
        # Deliver to subscribers of the sender
        for subscriber in self.subscribers.get(message.sender, []):
            # Skip the direct receiver (avoid duplicates) and any unregistered subscriber
            if subscriber != message.receiver and subscriber in self.message_queues:
                self.message_queues[subscriber].put(message)
    
    def get_message(self, agent_id: str, timeout: Optional[float] = None) -> Optional[clsMCPMessage]:
        """Get a message for the specified agent"""
        try:
            return self.message_queues[agent_id].get(timeout=timeout)
        except (queue.Empty, KeyError):
            return None
    
    def get_conversation_history(self, conversation_id: str) -> List[clsMCPMessage]:
        """Get the history of a conversation"""
        return self.conversation_history.get(conversation_id, [])

Imagine a system where different virtual agents (like robots or apps) need to talk to each other. To do that, they send messages back and forth—kind of like emails or text messages. This code is responsible for:

Making sure those messages are properly written (like filling out all parts of a form).
Making sure messages are delivered to the right people.
Keeping a record of conversations so you can go back and review what was said.

This part (clsMCPMessage) is like a template or a form that every message needs to follow. Each message has:

ID: A unique identifier (a UUID) so every message can be told apart (like a serial number).
Time Sent: When the message was created.
Sender & Receiver: Who sent the message and who is supposed to receive it.
Type of Message: Is it a request, a response, or just a notification?
Content: The actual information or question the message is about.
Reply To: If this message is answering another one, this tells which one.
Conversation ID: So we know which group of messages belongs to the same conversation.
Extra Info (Metadata): Any other small details that might help explain the message.
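To see how Reply To and Conversation ID tie messages together, here is a small sketch of a request and its response. It uses a standard-library dataclass as a stand-in for clsMCPMessage (the original is a pydantic model), and the class name SketchMessage and the agent names are assumptions made for the example.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class SketchMessage:
    """Dataclass stand-in for clsMCPMessage (the original uses pydantic)."""
    sender: str
    receiver: str
    message_type: str  # "request", "response", "notification"
    content: Dict[str, Any]
    conversation_id: str
    reply_to: Optional[str] = None
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: float = field(default_factory=time.time)
    metadata: Dict[str, Any] = field(default_factory=dict)


request = SketchMessage(
    sender="user",
    receiver="translator",
    message_type="request",
    content={"text": "नमस्ते"},
    conversation_id="conv-1",
)

# The response points back at the request and stays in the same conversation
reply = SketchMessage(
    sender="translator",
    receiver="user",
    message_type="response",
    content={"translated_text": "Hello"},
    conversation_id=request.conversation_id,
    reply_to=request.id,
)
```

Each message gets its own ID automatically, while reply_to and conversation_id are what let a reader reconstruct who answered what.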

This (clsMCPBroker) is the system (or “post office”) that makes sure messages get to where they’re supposed to go. Here’s what it does:

1. Registering an Agent

Think of this like signing up a new user in the system.
Each agent gets their own personal mailbox (called a “message queue”) so others can send them messages.

2. Subscribing to Another Agent

If Agent A wants to receive copies of messages from Agent B, they can “subscribe” to B.
This is like signing up for B’s newsletter—whenever B sends something, A gets a copy.

3. Sending a Message

When someone sends a message:
- It is saved into a conversation history (like keeping emails in your inbox).
- It is delivered to the main person it was meant for.
- And, if anyone subscribed to the sender, they get a copy too—unless they’re already the main receiver (to avoid sending duplicates).

4. Receiving Messages

Each agent can check their personal mailbox to see if they got any new messages.
If there are no messages, they’ll either wait for some time or move on.

5. Viewing Past Conversations

You can look up all messages that were part of a specific conversation.
This is helpful for remembering what was said earlier.
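The five operations above can be tried end to end with a stripped-down broker. The sketch below is a simplified stand-in, not the clsMCPBroker class itself: it passes plain dictionaries instead of clsMCPMessage objects, and the name MiniBroker and the agent names are hypothetical.

```python
import queue
from typing import Dict, List, Optional


class MiniBroker:
    """Simplified stand-in for clsMCPBroker that passes plain dicts."""

    def __init__(self) -> None:
        self.queues: Dict[str, queue.Queue] = {}
        self.subscribers: Dict[str, List[str]] = {}
        self.history: Dict[str, List[dict]] = {}

    def register_agent(self, agent_id: str) -> None:
        self.queues.setdefault(agent_id, queue.Queue())
        self.subscribers.setdefault(agent_id, [])

    def subscribe(self, subscriber_id: str, publisher_id: str) -> None:
        followers = self.subscribers.get(publisher_id)
        if followers is not None and subscriber_id not in followers:
            followers.append(subscriber_id)

    def publish(self, message: dict) -> None:
        # Keep a per-conversation record, then deliver
        self.history.setdefault(message["conversation_id"], []).append(message)
        targets = [message["receiver"]] + [
            s for s in self.subscribers.get(message["sender"], [])
            if s != message["receiver"]
        ]
        for target in targets:
            if target in self.queues:
                self.queues[target].put(message)

    def get_message(self, agent_id: str, timeout: Optional[float] = None):
        try:
            return self.queues[agent_id].get(timeout=timeout)
        except (queue.Empty, KeyError):
            return None


broker = MiniBroker()
for agent in ("researcher", "documenter", "logger"):
    broker.register_agent(agent)
broker.subscribe("logger", "researcher")  # logger gets copies of researcher's messages

msg = {"sender": "researcher", "receiver": "documenter",
       "conversation_id": "conv-1", "content": {"note": "sources found"}}
broker.publish(msg)

direct = broker.get_message("documenter", timeout=0.01)
copy = broker.get_message("logger", timeout=0.01)
nothing = broker.get_message("researcher", timeout=0.01)
```

The documenter receives the message directly, the logger receives a copy through its subscription, and the researcher's own mailbox stays empty. Passing a timeout matters here, because a queue read without one waits until a message arrives.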

In systems where many different smart tools or services need to work together and communicate, this kind of communication system makes sure everything is:

Organized
Delivered correctly
Easy to trace back when needed

So, we’ll finish this post here. We’ll cover the remaining scripts in the next post.

I’ll bring some more exciting topics in the coming days from the Python verse.

Till then, Happy Avenging! 🙂

Monitoring & evaluating the leading LLMs (both the established & new) by Python-based evaluator

Posted on January 5, 2025January 5, 2025 by SatyakiDe in ai, anthropic, api, Azure, bharatgpt, cloud, code, CPU, Crossplatform, Data Science, deepseek, GPU, HuggingFace, json, llm, natural-language, numpy, objects, openai, Pandas, Python, Real-time, Silicon, snippet, Technology, Torch

As we leap further into the field of Generative AI, one of the questions people raise more & more often is how to evaluate performance & other quality factors. These factors eventually determine whether this technology bears fruit; otherwise, you will end up with technical debt.

This post will discuss the key snippets of the monitoring app based on the Python-based AI app. But before that, let us first view the demo.

Isn’t it exciting?

Let us deep dive into it. But, here is the flow this solution will follow.

So, the current application will invoke the industry bigshots and some relatively unknown or new LLMs.

In this case, we’ll evaluate Anthropic, Open AI, DeepSeek, and Bharat GPT’s various models. However, Bharat GPT is open source, so we’ll use the Huggingface library and execute it locally against my MacBook Pro M4 Max.

The following are the KPIs we’re going to evaluate:

Package Installation:

Here is the list of dependent Python packages required to run this application –

pip install certifi==2024.8.30
pip install anthropic==0.42.0
pip install huggingface-hub==0.27.0
pip install nltk==3.9.1
pip install numpy==2.2.1
pip install moviepy==2.1.1
pip install numpy==2.1.3
pip install openai==1.59.3
pip install pandas==2.2.3
pip install pillow==11.1.0
pip install pip==24.3.1
pip install psutil==6.1.1
pip install requests==2.32.3
pip install rouge_score==0.1.2
pip install scikit-learn==1.6.0
pip install setuptools==70.2.0
pip install tokenizers==0.21.0
pip install torch==2.6.0.dev20250104
pip install torchaudio==2.6.0.dev20250104
pip install torchvision==0.22.0.dev20250104
pip install tqdm==4.67.1
pip install transformers==4.47.1

CODE:

clsComprehensiveLLMEvaluator.py (This is the main Python class that will apply all the logic to collect stats involving important KPIs. Note that we’re only going to discuss a few important functions here.)

    @retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
    def get_claude_response(self, prompt: str) -> str:
        response = self.anthropic_client.messages.create(
            model=anthropic_model,
            max_tokens=maxToken,
            messages=[{"role": "user", "content": prompt}]
        )
        return response.content[0].text

The Retry Mechanism
- The @retry line means this function will automatically try again if it fails.
- It will stop retrying after 3 attempts (stop_after_attempt(3)).
- It will wait longer between retries, starting at 4 seconds and increasing up to 10 seconds (wait_exponential(multiplier=1, min=4, max=10)).
The Function Purpose
- The function takes a message, called prompt, as input (a string of text).
- It uses a service (likely an AI system like Claude) to generate a response to this prompt.
Sending the Message
- Inside the function, the code self.anthropic_client.messages.create is the part that actually sends the prompt to the AI.
- It specifies:
  - Which AI model to use (e.g., anthropic_model).
  - The maximum length of the response (controlled by maxToken).
  - The input message for the AI, which has a “role” (user) as well as the content of the prompt.
Getting the Response
- Once the AI generates a response, it’s saved as response.
- The code retrieves the first part of the response (response.content[0].text) and sends it back to whoever called the function.
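To show what "retry up to 3 times with a growing wait" means in practice, here is a hand-rolled sketch using only the standard library. It is a stand-in for illustration, not the retry library used in the post: the doubling schedule in backoff_delay is an assumption, and the names retry_with_backoff and flaky are hypothetical. The sleep function is injectable so the example records the waits instead of actually pausing.

```python
import functools
import time


def backoff_delay(attempt: int, multiplier: float = 1,
                  min_wait: float = 4, max_wait: float = 10) -> float:
    """Wait after a failed 1-based attempt: doubling, clamped to [min_wait, max_wait]."""
    return max(min_wait, min(multiplier * 2 ** (attempt - 1), max_wait))


def retry_with_backoff(max_attempts: int = 3, sleep=time.sleep):
    """Retry the wrapped function on any exception, up to max_attempts tries."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    sleep(backoff_delay(attempt))
        return wrapper
    return decorator


waits = []            # recorded instead of sleeping
calls = {"count": 0}


@retry_with_backoff(max_attempts=3, sleep=waits.append)
def flaky() -> str:
    """Fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


outcome = flaky()
```

With this schedule, the minimum of 4 seconds dominates the first few waits, and the 10-second cap only comes into play from the fifth attempt onward, so a 3-attempt limit never reaches it.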

Similarly, it will work for Open AI as well.

    @retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
    def get_deepseek_response(self, prompt: str) -> str:
        deepseek_api_key = self.deepseek_api_key

        headers = {
            "Authorization": f"Bearer {deepseek_api_key}",
            "Content-Type": "application/json"
            }
        
        payload = {
            "model": deepseek_model,  
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": maxToken
            }
        
        response = requests.post(DEEPSEEK_API_URL, headers=headers, json=payload)

        if response.status_code == 200:
            res = response.json()["choices"][0]["message"]["content"]
        else:
            res = "API request failed with status code " + str(response.status_code) + ":" + str(response.text)

        return res

Retry Mechanism:
- The @retry line ensures the function will try again if it fails.
- It will stop retrying after 3 attempts (stop_after_attempt(3)).
- It waits between retries, starting at 4 seconds and increasing up to 10 seconds (wait_exponential(multiplier=1, min=4, max=10)).

What the Function Does:
- The function takes one input, prompt, which is the message or question you want to send to the AI.
- It returns the AI’s response or an error message.

Preparing to Communicate with the API:
- API Key: It gets the API key for the DeepSeek service from self.deepseek_api_key.
- Headers: These tell the API that the request will use the API key (for security) and that the data format is JSON (structured text).
- Payload: This is the information sent to the AI. It includes:
  - Model: Specifies which version of the AI to use (deepseek_model).
  - Messages: The input message with the role “user” and your prompt.
  - Max Tokens: Defines the maximum size of the AI’s response (maxToken).

Sending the Request:
- It uses the requests.post() method to send the payload and headers to the DeepSeek API using the URL DEEPSEEK_API_URL.

Processing the Response:
- If the API responds successfully (status_code == 200):
  - It extracts the AI’s reply from the response data.
  - Specifically, it gets the first choice’s message content: response.json()["choices"][0]["message"]["content"].
- If there’s an error:
  - It constructs an error message with the status code and detailed error text from the API.

Returning the Result:
- The function outputs either the AI’s response or the error message.
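The payload and response handling can be exercised without calling the API. The sketch below separates the two steps into small helpers; the helper names, the model name "example-model", and the sample body are all hypothetical. One deliberate difference from the function above: extract_reply raises on a non-200 status rather than returning an error string, so that a retry decorator would have an exception to react to.

```python
def build_chat_payload(model: str, prompt: str, max_tokens: int) -> dict:
    """Assemble the JSON body for a chat-style request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def extract_reply(status_code: int, body: dict) -> str:
    """Return the first choice's text, or raise if the call failed."""
    if status_code != 200:
        raise RuntimeError(f"API request failed with status code {status_code}")
    return body["choices"][0]["message"]["content"]


payload = build_chat_payload("example-model", "Say hello", max_tokens=50)

# Hand-written body shaped like the fields the original code indexes into
sample_body = {"choices": [{"message": {"role": "assistant", "content": "Hi there"}}]}
reply = extract_reply(200, sample_body)
```

Whether to raise or to return the error text is a design choice: returning text keeps the evaluation loop running, while raising lets the retry logic see the failure.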

    @retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
    def get_bharatgpt_response(self, prompt: str) -> str:
        try:
            messages = [[{"role": "user", "content": prompt}]]
            
            response = pipe(messages, max_new_tokens=maxToken,)

            # Extract 'content' field safely
            res = next((entry.get("content", "")
                        for entry in response[0][0].get("generated_text", [])
                        if isinstance(entry, dict) and entry.get("role") == "assistant"
                        ),
                        None,
                        )
            
            return res
        except Exception as e:
            x = str(e)
            print('Error: ', x)

            return ""

Retry Mechanism:
- The @retry decorator makes the function try again if it raises an error. Note that the try/except inside this function catches errors and returns an empty string, so errors raised inside that block do not trigger a retry.
- It will stop retrying after 3 attempts (stop_after_attempt(3)).
- The waiting time between retries starts at 4 seconds and increases exponentially up to 10 seconds (wait_exponential(multiplier=1, min=4, max=10)).

What the Function Does:
- The function takes one input, prompt, which is the message or question you want to send to BharatGPT.
- It returns the AI’s response or an empty string if something goes wrong.

Sending the Prompt:
- Messages Structure: The function wraps the user’s prompt in a format that the BharatGPT AI understands: messages = [[{"role": "user", "content": prompt}]]
- This tells the AI that the prompt is coming from the “user.”
- Pipe Function: It uses a pipe() method to send the messages to the AI system.
- max_new_tokens=maxToken: Limits how long the AI’s response can be.

Extracting the Response:
- The response from the AI is in a structured format. The code looks for the first piece of text where:
  - The role is “assistant” (meaning it’s the AI’s reply).
  - The text is in the “content” field.
- The next() function safely extracts this “content” field or returns None if it can’t find it.

Error Handling:
- If something goes wrong (e.g., the AI doesn’t respond or there’s a technical issue), the code:
  - Captures the error message in e.
  - Prints the error message: print('Error: ', x).
  - Returns an empty string ("") instead of crashing.

Returning the Result:
- If everything works, the function gives you the AI’s response as plain text.
- If there’s an error, it gives you an empty string, indicating no response was received.
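The extraction step is easier to follow on a concrete value. In the sketch below, the nested sample is hand-written to match the shape implied by the indexing in the code above (response[0][0]["generated_text"] holding a list of role/content entries); the real pipeline output may carry more fields. The helper name extract_assistant_text is hypothetical.

```python
def extract_assistant_text(response):
    """Return the first assistant message's content, or None if there is none."""
    generated = response[0][0].get("generated_text", [])
    return next(
        (
            entry.get("content", "")
            for entry in generated
            if isinstance(entry, dict) and entry.get("role") == "assistant"
        ),
        None,
    )


# Hand-written sample in the assumed shape: the prompt followed by the reply
sample_response = [[{
    "generated_text": [
        {"role": "user", "content": "What is MCP?"},
        {"role": "assistant", "content": "A message protocol between agents."},
    ]
}]]

answer = extract_assistant_text(sample_response)
missing = extract_assistant_text([[{"generated_text": [{"role": "user", "content": "Hi"}]}]])
```

The second call shows the fallback: with no assistant entry present, next() returns the default None instead of raising StopIteration.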

    def get_model_response(self, model_name: str, prompt: str) -> ModelResponse:
        """Get response from specified model with metrics"""
        start_time = time.time()
        start_memory = psutil.Process(os.getpid()).memory_info().rss / 1024 / 1024

        try:
            if model_name == "claude-3":
                response_content = self.get_claude_response(prompt)
            elif model_name == "gpt4":
                response_content = self.get_gpt4_response(prompt)
            elif model_name == "deepseek-chat":
                response_content = self.get_deepseek_response(prompt)
            elif model_name == "bharat-gpt":
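                response_content = self.get_bharatgpt_response(prompt)
            else:
                # Fail clearly on an unknown model name instead of hitting an unbound variable
                raise ValueError(f"Unsupported model: {model_name}")

            # Count tokens in the response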
            token_count = len(self.bert_tokenizer.encode(response_content))
            
            end_memory = psutil.Process(os.getpid()).memory_info().rss / 1024 / 1024
            memory_usage = end_memory - start_memory
            
            return ModelResponse(
                content=response_content,
                response_time=time.time() - start_time,
                token_count=token_count,
                memory_usage=memory_usage
            )
        except Exception as e:
            logging.error(f"Error getting response from {model_name}: {str(e)}")
            return ModelResponse(
                content="",
                response_time=0,
                token_count=0,
                memory_usage=0,
                error=str(e)
            )

Start Tracking Time and Memory:

The function starts a timer (start_time) to measure how long it takes to get a response.
It also checks how much memory is being used at the beginning (start_memory).

Choose the AI Model:

Based on the model_name provided, the function selects the appropriate method to get a response:
- "claude-3" → Calls get_claude_response(prompt).
- "gpt4" → Calls get_gpt4_response(prompt).
- "deepseek-chat" → Calls get_deepseek_response(prompt).
- "bharat-gpt" → Calls get_bharatgpt_response(prompt).

Process the Response:

Once the response is received, the function calculates:
- Token Count: The number of tokens (small chunks of text) in the response using a tokenizer.
- Memory Usage: The difference between memory usage after the response (end_memory) and before it (start_memory).

Return the Results:

The function bundles all the information into a ModelResponse object:
- The AI’s reply (content).
- How long the response took (response_time).
- The number of tokens in the reply (token_count).
- How much memory was used (memory_usage).

Handle Errors:

If something goes wrong (e.g., the AI doesn’t respond), the function:
- Logs the error message.
- Returns an empty response with default values and the error message.
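The measure-then-bundle pattern can be sketched with the standard library alone. Two substitutions are made here and should be read as assumptions: tracemalloc stands in for the psutil process-memory reading (it tracks Python allocations only, so the numbers are not comparable), and SketchResponse is a simplified stand-in for ModelResponse without the token count.

```python
import time
import tracemalloc
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SketchResponse:
    """Simplified stand-in for ModelResponse."""
    content: str
    response_time: float
    memory_usage_mb: float
    error: Optional[str] = None


def measure_call(fn: Callable[[str], str], prompt: str) -> SketchResponse:
    """Run fn(prompt) and record elapsed time and peak Python allocations."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        content = fn(prompt)
        _, peak = tracemalloc.get_traced_memory()
        return SketchResponse(content, time.perf_counter() - start, peak / 1024 / 1024)
    except Exception as exc:
        # Mirror the original: return an empty response carrying the error text
        return SketchResponse("", 0.0, 0.0, error=str(exc))
    finally:
        tracemalloc.stop()


def failing(prompt: str) -> str:
    raise ValueError("boom")


ok = measure_call(lambda p: p.upper(), "hello")
bad = measure_call(failing, "hello")
```

Returning an object with an error field, rather than raising, lets the caller compare successful and failed runs in the same loop.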

    def evaluate_text_quality(self, generated: str, reference: str) -> Dict[str, float]:
        """Evaluate text quality metrics"""
        # BERTScore
        gen_embedding = self.sentence_model.encode([generated])
        ref_embedding = self.sentence_model.encode([reference])
        bert_score = cosine_similarity(gen_embedding, ref_embedding)[0][0]

        # BLEU Score
        generated_tokens = word_tokenize(generated.lower())
        reference_tokens = word_tokenize(reference.lower())
        bleu = sentence_bleu([reference_tokens], generated_tokens)

        # METEOR Score
        meteor = meteor_score([reference_tokens], generated_tokens)

        return {
            'bert_score': bert_score,
            'bleu_score': bleu,
            'meteor_score': meteor
        }

Inputs:

generated: The text produced by the AI.
reference: The correct or expected version of the text.

Calculating BERTScore:

Converts the generated and reference texts into numerical embeddings (mathematical representations) using a pre-trained model (self.sentence_model.encode).
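Measures the similarity between the two embeddings using cosine similarity. This gives the bert_score, which ranges from -1 (completely different) to 1 (very similar). Strictly speaking, this is an embedding cosine similarity rather than the token-level BERTScore metric.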

Calculating BLEU Score:

Breaks the generated and reference texts into individual words (tokens) using word_tokenize.
Converts both texts to lowercase for consistent comparison.
Calculates the BLEU Score (sentence_bleu), which checks how many words or phrases in the generated text overlap with the reference. BLEU values range from 0 (no match) to 1 (perfect match).

Calculating METEOR Score:

Also uses the tokenized versions of generated and reference texts.
Calculates the METEOR Score (meteor_score), which considers exact matches, synonyms, and word order. Scores range from 0 (no match) to 1 (perfect match).

Returning the Results:

Combines the three scores into a dictionary with the keys 'bert_score', 'bleu_score', and 'meteor_score'.
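For intuition about what these scores measure, here is a small pure-Python sketch of two ingredients: cosine similarity between vectors, and clipped unigram precision, which is one building block of BLEU (not the full BLEU score, which also combines longer n-grams and a brevity penalty). The toy vectors and sentences are made up, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def unigram_precision(generated: List[str], reference: List[str]) -> float:
    """Share of generated words that also appear in the reference (counts clipped)."""
    if not generated:
        return 0.0
    overlap = Counter(generated) & Counter(reference)
    return sum(overlap.values()) / len(generated)


same = cosine([1, 0], [1, 0])          # identical direction
unrelated = cosine([1, 0], [0, 1])     # orthogonal
opposite = cosine([1, 0], [-1, 0])     # opposite direction

precision = unigram_precision("the cat sat".split(), "the cat slept".split())
```

In the precision example, two of the three generated words ("the" and "cat") appear in the reference, so the score is two thirds.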

Similarly, other functions are developed.

    def run_comprehensive_evaluation(self, evaluation_data: List[Dict]) -> pd.DataFrame:
        """Run comprehensive evaluation on all metrics"""
        results = []
        
        for item in evaluation_data:
            prompt = item['prompt']
            reference = item['reference']
            task_criteria = item.get('task_criteria', {})
            
            for model_name in self.model_configs.keys():
                # Get multiple responses to evaluate reliability
                responses = [
                    self.get_model_response(model_name, prompt)
                    for _ in range(3)  # Get 3 responses for reliability testing
                ]
                
                # Use the best response for other evaluations
                best_response = max(responses, key=lambda x: len(x.content) if not x.error else 0)
                
                if best_response.error:
                    logging.error(f"Error in model {model_name}: {best_response.error}")
                    continue
                
                # Gather all metrics
                metrics = {
                    'model': model_name,
                    'prompt': prompt,
                    'response': best_response.content,
                    **self.evaluate_text_quality(best_response.content, reference),
                    **self.evaluate_factual_accuracy(best_response.content, reference),
                    **self.evaluate_task_performance(best_response.content, task_criteria),
                    **self.evaluate_technical_performance(best_response),
                    **self.evaluate_reliability(responses),
                    **self.evaluate_safety(best_response.content)
                }
                
                # Add business impact metrics using task performance
                metrics.update(self.evaluate_business_impact(
                    best_response,
                    metrics['task_completion']
                ))
                
                results.append(metrics)
        
        return pd.DataFrame(results)

Input:
- evaluation_data: A list of test cases, where each case is a dictionary containing:
  - prompt: The question or input to the AI model.
  - reference: The ideal or expected answer.
  - task_criteria (optional): Additional rules or requirements for the task.
Initialize Results:
- An empty list results is created to store the evaluation metrics for each model and test case.
Iterate Through Test Cases:
- For each item in the evaluation_data:
  - Extract the prompt, reference, and task_criteria.
Evaluate Each Model:
- Loop through all available AI models (self.model_configs.keys()).
- Generate three responses for each model to test reliability.
Select the Best Response:
- Out of the three responses, pick the one with the most content (best_response), ignoring responses with errors.
Handle Errors:
- If even the best response carries an error (which happens only when none of the three attempts produced usable content), log the issue and skip further evaluation for that model.
Evaluate Metrics:
- Using the best_response, calculate a variety of metrics, including:
  - Text Quality: How similar the response is to the reference.
  - Factual Accuracy: Whether the response is factually correct.
  - Task Performance: How well it meets task-specific criteria.
  - Technical Performance: Evaluate time, memory, or other system-related metrics.
  - Reliability: Check consistency across multiple responses.
  - Safety: Ensure the response is safe and appropriate.
Evaluate Business Impact:
- Add metrics for business impact (e.g., how well the task was completed, using task_completion as a key factor).
Store Results:
- Add the calculated metrics for this model and prompt to the results list.
Return Results as a DataFrame:
- Convert the results list into a structured table (a pandas DataFrame) for easy analysis and visualization.
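
The "Select the Best Response" and "Handle Errors" steps above can be tried in isolation with a minimal sketch. The `ModelResponse` dataclass and the `pick_best_response` helper are hypothetical stand-ins that assume only the two attributes the loop uses (`content` and `error`); the real response class in the repository may carry more fields.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelResponse:
    """Hypothetical stand-in: only the attributes used by the loop above."""
    content: str = ""
    error: Optional[str] = None


def pick_best_response(responses: List[ModelResponse]) -> ModelResponse:
    """Return the longest error-free response; failed calls score 0."""
    return max(responses, key=lambda r: len(r.content) if not r.error else 0)


responses = [
    ModelResponse(content="Short answer."),
    ModelResponse(error="timeout"),
    ModelResponse(content="A longer, more detailed answer."),
]

best = pick_best_response(responses)
print(best.content)

# If every call failed, the "best" response still carries an error,
# which is the case the `continue` branch in the loop guards against.
all_failed = [ModelResponse(error="timeout") for _ in range(3)]
print(pick_best_response(all_failed).error)
```

Length is only a rough proxy for quality; it is simply the heuristic the loop above relies on.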

Great! So, now, we’ve explained the code.

Observation from the result:

Let us understand the final outcome of this run & what we can conclude from that.

BERT Score (Semantic Understanding):
- GPT4 leads slightly at 0.8322 (83.22%)
- Bharat-GPT close second at 0.8118 (81.18%)
- Claude-3 at 0.8019 (80.19%)
- DeepSeek-Chat at 0.7819 (78.19%)
Think of this as a “comprehension score”: how well the models understand the context. All models show strong understanding, with only about 5 percentage points between best and worst.
BLEU Score (Word-for-Word Accuracy):
- Bharat-GPT leads at 0.0567 (5.67%)
- Claude-3 at 0.0344 (3.44%)
- GPT4 at 0.0306 (3.06%)
- DeepSeek-Chat lowest at 0.0189 (1.89%)
These low scores suggest the models use different wording than the references, which isn’t necessarily bad.
METEOR Score (Meaning Preservation):
- Bharat-GPT leads at 0.4684 (46.84%)
- Claude-3 close second at 0.4507 (45.07%)
- GPT4 at 0.2960 (29.60%)
- DeepSeek-Chat at 0.2652 (26.52%)
This shows how well the models maintain meaning while using different words.
Response Time (Speed):
- Claude-3 fastest: 4.40 seconds
- Bharat-GPT: 6.35 seconds
- GPT4: 6.43 seconds
- DeepSeek-Chat slowest: 8.52 seconds
Safety and Reliability:
- Error Rate: Perfect 0.0 for all models
- Toxicity: All very safe (below 0.15%)
  - Claude-3 safest at 0.0007
  - GPT4 at 0.0008
  - Bharat-GPT at 0.0012
  - DeepSeek-Chat at 0.0014
Cost Efficiency:
- Claude-3 most economical: $0.0019 per response
- Bharat-GPT close: $0.0021
- GPT4: $0.0038
- DeepSeek-Chat highest: $0.0050
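
To see why the BLEU scores listed above can be so low even when the meaning is preserved, here is a minimal sketch of clipped n-gram precision, the building block of BLEU. This is not the evaluation code from this post and not a full BLEU implementation (it has no brevity penalty and no geometric mean over n-gram orders). The two sentences are made-up examples, and `clipped_precision` is a hypothetical helper name.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate: str, reference: str, n: int) -> float:
    """Fraction of candidate n-grams that also occur in the reference,
    with each n-gram's credit capped at its count in the reference."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# Made-up sentences: same idea, different wording.
reference = "the model summarizes the video transcript in english"
paraphrase = "this system produces an english summary of the clip"

for n in (1, 2):
    print(n, clipped_precision(paraphrase, reference, n))
```

The paraphrase shares only two single words with the reference and no two-word sequence at all, so any score built on exact n-gram overlap stays low, while embedding-based scores such as BERT Score can still rate the pair as close.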

Key Takeaways by Model:

Claude-3: ✓ Fastest responses ✓ Most cost-effective ✓ Excellent meaning preservation ✓ Lowest toxicity
Bharat-GPT: ✓ Best BLEU and METEOR scores ✓ Strong semantic understanding ✓ Cost-effective ✗ Moderate response time
GPT4: ✓ Best semantic understanding ✓ Good safety metrics ✗ Higher cost ✗ Moderate response time
DeepSeek-Chat: ✗ Generally lower performance ✗ Slowest responses ✗ Highest cost ✗ Slightly higher toxicity

Reliability of These Statistics:

Strong Points:

Comprehensive metric coverage
Consistent patterns across evaluations
Zero error rates show reliability
Clear differentiation between models

Limitations:

BLEU scores are quite low across all models
Doesn’t measure creative or innovative responses
May not reflect specific use case performance
Single snapshot rather than long-term performance

Final Observation:

Best Overall Value: Claude-3
- Fast, cost-effective, safe, good performance
Best for Accuracy: Bharat-GPT
- Highest meaning preservation and precision
Best for Understanding: GPT4
- Strongest semantic comprehension
Consider Your Priorities:
- Speed → Choose Claude-3
- Cost → Choose Claude-3 or Bharat-GPT
- Accuracy → Choose Bharat-GPT
- Understanding → Choose GPT4

These statistics provide reliable comparative data but should be part of a broader decision-making process that includes your specific needs, budget, and use cases.
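
As a rough illustration of the "Consider Your Priorities" list, the sketch below picks a model per priority from the figures reported above. The mapping from each priority to a single metric (for example, "accuracy" to METEOR) is my assumption for this example, and `best_model` is a hypothetical helper, not part of the evaluation code.

```python
# Figures copied from the results reported above.
results = {
    "Claude-3":      {"bert": 0.8019, "bleu": 0.0344, "meteor": 0.4507, "seconds": 4.40, "cost": 0.0019},
    "Bharat-GPT":    {"bert": 0.8118, "bleu": 0.0567, "meteor": 0.4684, "seconds": 6.35, "cost": 0.0021},
    "GPT4":          {"bert": 0.8322, "bleu": 0.0306, "meteor": 0.2960, "seconds": 6.43, "cost": 0.0038},
    "DeepSeek-Chat": {"bert": 0.7819, "bleu": 0.0189, "meteor": 0.2652, "seconds": 8.52, "cost": 0.0050},
}

# Assumed mapping: the metric to look at and whether higher is better.
priorities = {
    "speed": ("seconds", False),
    "cost": ("cost", False),
    "accuracy": ("meteor", True),
    "understanding": ("bert", True),
}


def best_model(priority: str) -> str:
    """Return the model name that wins on the metric tied to a priority."""
    metric, higher_is_better = priorities[priority]
    pick = max if higher_is_better else min
    return pick(results, key=lambda name: results[name][metric])


for priority in priorities:
    print(f"{priority:>13} -> {best_model(priority)}")

# Gap between the best and worst BERT scores.
bert_scores = [m["bert"] for m in results.values()]
print(f"BERT spread: {max(bert_scores) - min(bert_scores):.4f}")
```

A single-metric pick like this ignores trade-offs, so a weighted combination of metrics may fit a real use case better.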

For the Bharat GPT model, I tested it locally on my MacBook Pro 4 Max, & the configuration is as follows –

I’ve also tried the API version, & its performance was similar to the stats that we received from the local run. Unfortunately, they haven’t made the API version public yet.

So, apart from Anthropic & OpenAI, I’ll watch this new LLM (Bharat GPT) for overall stats in the coming days.

So, we’ve done it.

You can find the detailed code at the GitHub link.

I’ll bring some more exciting topics in the coming days from the Python verse.

Till then, Happy Avenging! 🙂


1. Registering an Agent

2. Subscribing to Another Agent

3. Sending a Message

4. Receiving Messages

5. Viewing Past Conversations
