Real-time video summary assistance App – Part 2

Posted on April 21, 2025 by SatyakiDe in ai, anthropic, api, Azure, call, cloud, code, computing, Crossplatform, Data Science, design, function, gpt3, IoT, json, LangChain, machine-learning, mcpprotocol, Microsoft, natural-language, objects, openai, Performance, Python, Real-time, sarvam, sql, Technology, video, youtubedataapi

Continuing from the previous post, I will carry on the discussion of how MCP is implemented among the agents. Before that, here is the quick demo one more time to recap our objectives.

Let us recap the process flow –

Also, recall how the scripts are grouped, as listed in the previous post –

Message-Chaining Protocol (MCP) Implementation:

    clsMCPMessage.py
    clsMCPBroker.py

YouTube Transcript Extraction:

    clsYouTubeVideoProcessor.py

Language Detection:

    clsLanguageDetector.py

Translation Services & Agents:

    clsTranslationAgent.py
    clsTranslationService.py

Documentation Agent:

    clsDocumentationAgent.py
    
Research Agent:

    clsResearchAgent.py

Great! Now, we’ll continue with the main discussion.

CODE:

clsYouTubeVideoProcessor.py (This class processes the transcripts from YouTube. It may also translate them into English for non-native speakers.)

import re

from youtube_transcript_api import YouTubeTranscriptApi

def extract_youtube_id(youtube_url):
    """Extract YouTube video ID from URL"""
    youtube_id_match = re.search(r'(?:v=|\/)([0-9A-Za-z_-]{11}).*', youtube_url)
    if youtube_id_match:
        return youtube_id_match.group(1)
    return None

def get_youtube_transcript(youtube_url):
    """Get transcript from YouTube video"""
    video_id = extract_youtube_id(youtube_url)
    if not video_id:
        return {"error": "Invalid YouTube URL or ID"}
    
    try:
        transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
        
        # First try to get manual transcripts
        try:
            transcript = transcript_list.find_manually_created_transcript(["en"])
            transcript_data = transcript.fetch()
            print(f"Debug - Manual transcript format: {type(transcript_data)}")
            if transcript_data and len(transcript_data) > 0:
                print(f"Debug - First item type: {type(transcript_data[0])}")
                print(f"Debug - First item sample: {transcript_data[0]}")
            return {"text": transcript_data, "language": "en", "auto_generated": False}
        except Exception as e:
            print(f"Debug - No manual transcript: {str(e)}")
            # If no manual English transcript, try any available transcript
            try:
                available_transcripts = list(transcript_list)
                if available_transcripts:
                    transcript = available_transcripts[0]
                    print(f"Debug - Using transcript in language: {transcript.language_code}")
                    transcript_data = transcript.fetch()
                    print(f"Debug - Auto transcript format: {type(transcript_data)}")
                    if transcript_data and len(transcript_data) > 0:
                        print(f"Debug - First item type: {type(transcript_data[0])}")
                        print(f"Debug - First item sample: {transcript_data[0]}")
                    return {
                        "text": transcript_data, 
                        "language": transcript.language_code, 
                        "auto_generated": transcript.is_generated
                    }
                else:
                    return {"error": "No transcripts available for this video"}
            except Exception as e:
                return {"error": f"Error getting transcript: {str(e)}"}
    except Exception as e:
        return {"error": f"Error getting transcript list: {str(e)}"}

# ----------------------------------------------------------------------------------
# YouTube Video Processor
# ----------------------------------------------------------------------------------

class clsYouTubeVideoProcessor:
    """Process YouTube videos using the agent system"""
    
    def __init__(self, documentation_agent, translation_agent, research_agent):
        self.documentation_agent = documentation_agent
        self.translation_agent = translation_agent
        self.research_agent = research_agent
    
    def process_youtube_video(self, youtube_url):
        """Process a YouTube video"""
        print(f"Processing YouTube video: {youtube_url}")
        
        # Extract transcript
        transcript_result = get_youtube_transcript(youtube_url)
        
        if "error" in transcript_result:
            return {"error": transcript_result["error"]}
        
        # Start a new conversation
        conversation_id = self.documentation_agent.start_processing()
        
        # Process transcript segments
        transcript_data = transcript_result["text"]
        transcript_language = transcript_result["language"]
        
        print(f"Debug - Type of transcript_data: {type(transcript_data)}")
        
        # For each segment, detect language and translate if needed
        processed_segments = []
        
        try:
            # Make sure transcript_data is a list of dictionaries with text and start fields
            if isinstance(transcript_data, list):
                for idx, segment in enumerate(transcript_data):
                    print(f"Debug - Processing segment {idx}, type: {type(segment)}")
                    
                    # Extract text properly based on the type
                    if isinstance(segment, dict) and "text" in segment:
                        text = segment["text"]
                        start = segment.get("start", 0)
                    else:
                        # Try to access attributes for non-dict types
                        try:
                            text = segment.text
                            start = getattr(segment, "start", 0)
                        except AttributeError:
                            # If all else fails, convert to string
                            text = str(segment)
                            start = idx * 5  # Arbitrary timestamp
                    
                    print(f"Debug - Extracted text: {text[:30]}...")
                    
                    # Create a standardized segment
                    std_segment = {
                        "text": text,
                        "start": start
                    }
                    
                    # Process through translation agent
                    translation_result = self.translation_agent.process_text(text, conversation_id)

                    # Use translated text for documentation. Set it before copying
                    # std_segment below, so that the copy also carries processed_text.
                    if "final_text" in translation_result and translation_result["final_text"] != text:
                        std_segment["processed_text"] = translation_result["final_text"]
                    else:
                        std_segment["processed_text"] = text

                    # Update segment with translation information
                    segment_with_translation = {
                        **std_segment,
                        "translation_info": translation_result
                    }

                    processed_segments.append(segment_with_translation)
            else:
                # If transcript_data is not a list, treat it as a single text block
                print(f"Debug - Transcript is not a list, treating as single text")
                text = str(transcript_data)
                std_segment = {
                    "text": text,
                    "start": 0
                }
                
                translation_result = self.translation_agent.process_text(text, conversation_id)

                # Set processed_text before copying std_segment, so the copy carries it
                if "final_text" in translation_result and translation_result["final_text"] != text:
                    std_segment["processed_text"] = translation_result["final_text"]
                else:
                    std_segment["processed_text"] = text

                segment_with_translation = {
                    **std_segment,
                    "translation_info": translation_result
                }

                processed_segments.append(segment_with_translation)
                
        except Exception as e:
            print(f"Debug - Error processing transcript: {str(e)}")
            return {"error": f"Error processing transcript: {str(e)}"}
        
        # Process the transcript with the documentation agent
        documentation_result = self.documentation_agent.process_transcript(
            processed_segments,
            conversation_id
        )
        
        return {
            "youtube_url": youtube_url,
            "transcript_language": transcript_language,
            "processed_segments": processed_segments,
            "documentation": documentation_result,
            "conversation_id": conversation_id
        }

Let us understand this step-by-step:

Part 1: Getting the YouTube Transcript

def extract_youtube_id(youtube_url):
    ...

This extracts the unique video ID from any YouTube link.
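
The regular expression above accepts the first 11-character run that follows `v=` or a slash, so some non-video links (a channel URL, for example) may also yield an ID. Below is a minimal sketch of a stricter variant, assuming that only the common `watch`, `youtu.be`, `embed`, `shorts`, and `live` URL shapes matter. The helper name `extract_youtube_id_strict` is hypothetical and not part of the original script.

```python
import re
from urllib.parse import urlparse, parse_qs

# A YouTube video ID is assumed to be exactly 11 URL-safe characters.
_ID_RE = re.compile(r"^[0-9A-Za-z_-]{11}$")


def extract_youtube_id_strict(youtube_url):
    """Return the video ID for known URL shapes, or None (hypothetical helper)."""
    value = youtube_url.strip()
    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    candidate = None

    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            # Path-based links: /embed/<id>, /shorts/<id>, /live/<id>
            parts = [p for p in parsed.path.split("/") if p]
            if len(parts) >= 2 and parts[0] in ("embed", "shorts", "live"):
                candidate = parts[1]
    elif not parsed.scheme and not parsed.netloc:
        # No URL structure at all: treat the input as a bare ID
        candidate = value

    if candidate and _ID_RE.match(candidate):
        return candidate
    return None


video_id = extract_youtube_id_strict("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
channel_id = extract_youtube_id_strict("https://www.youtube.com/channel/UC1234567890abcdefghijkl")
```

Here `video_id` holds the 11-character ID, while `channel_id` is `None` because a channel path is not one of the accepted shapes.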

def get_youtube_transcript(youtube_url):
    ...

This gets the actual spoken content of the video.
It first looks for a manually created English transcript (written by humans).
If none exists, it falls back to the first transcript available in any language, which may be auto-generated by YouTube’s AI.
If nothing is found, it gives back an error message like: “No transcripts available for this video.”
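
The selection order can be sketched without touching the network. This is only an illustration: `TranscriptInfo` is a stand-in dataclass that mirrors the `language_code` and `is_generated` attributes used in the code above, and `pick_transcript` is a hypothetical helper, not a function from the transcript library.

```python
from dataclasses import dataclass


@dataclass
class TranscriptInfo:
    """Stand-in for a transcript entry (assumption: only these two fields matter)."""
    language_code: str
    is_generated: bool


def pick_transcript(available, preferred_langs=("en",)):
    """Prefer a manual transcript in a preferred language, else take the first one."""
    # Pass 1: manually created transcript in a preferred language
    for transcript in available:
        if not transcript.is_generated and transcript.language_code in preferred_langs:
            return transcript
    # Pass 2: whatever comes first, manual or auto-generated, in any language
    return available[0] if available else None


candidates = [
    TranscriptInfo("hi", True),
    TranscriptInfo("en", True),
    TranscriptInfo("en", False),
]
chosen = pick_transcript(candidates)          # manual English entry
fallback = pick_transcript(candidates[:2])    # first available entry
```

Note that the fallback takes the first entry in the list, so the language that reaches the translation agent depends on the order in which the transcripts are listed.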

Part 2: Processing the Video with Agents

class clsYouTubeVideoProcessor:
    ...

This is like the control center that tells each intelligent agent what to do with the transcript. Here are the detailed steps:

1. Start the Process

def process_youtube_video(self, youtube_url):
    ...

The system starts with a YouTube video link.
It prints a message like: “Processing YouTube video: [link]”

2. Extract the Transcript

The system runs the get_youtube_transcript() function.
If it fails, it returns an error (e.g., invalid link or no subtitles available).

3. Start a “Conversation”

The documentation agent begins a new session, tracked by a unique conversation ID.
Think of this like opening a new folder in a shared team workspace to store everything related to this video.

4. Go Through Each Segment of the Transcript

The spoken text is often broken into small parts (segments), like subtitles.
For each part:
- It extracts the text, whether the segment arrives as a dictionary or as an object.
- It reads the time at which that part was spoken.
- It sends the text to the translation agent, which detects the language and translates it if needed.
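
The segment handling above boils down to a small normalization step. A minimal sketch follows, assuming segments arrive either as dictionaries, as objects with `text` and `start` attributes, or as something else entirely. `normalize_segment` and `fallback_step` are illustrative names that do not appear in the original class.

```python
from types import SimpleNamespace


def normalize_segment(segment, idx, fallback_step=5):
    """Return a {"text", "start"} dict regardless of the segment's original shape."""
    if isinstance(segment, dict) and "text" in segment:
        return {"text": segment["text"], "start": segment.get("start", 0)}
    if hasattr(segment, "text"):
        # Object-style segment: read attributes instead of keys
        return {"text": segment.text, "start": getattr(segment, "start", 0)}
    # Unknown shape: keep its string form and invent an evenly spaced timestamp
    return {"text": str(segment), "start": idx * fallback_step}


raw_segments = [
    {"text": "Hello", "start": 0.0},
    SimpleNamespace(text="namaste", start=4.2),
    "plain string",
]
normalized = [normalize_segment(seg, i) for i, seg in enumerate(raw_segments)]
```

Keeping this logic in one function means the translation and documentation steps only ever see one segment shape.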

5. Translate (if needed)

If the translation agent returns a translated version that differs from the original, that version is stored as the segment’s processed text.
Otherwise, the original text is kept as the processed text.
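
One detail worth illustrating: a copy made with `{**d}` or `dict(d)` is a snapshot, so keys added to the source dictionary afterwards do not appear in the copy. The sketch below writes the processed text onto the copy itself. `attach_translation` and the `was_translated` flag are hypothetical additions for illustration only.

```python
def attach_translation(std_segment, translation_result):
    """Return a new segment dict enriched with the translation outcome."""
    original = std_segment["text"]
    final_text = translation_result.get("final_text", original)

    enriched = dict(std_segment)  # shallow copy; std_segment itself is left unchanged
    enriched["processed_text"] = final_text
    enriched["was_translated"] = final_text != original  # illustrative flag
    enriched["translation_info"] = translation_result
    return enriched


segment = {"text": "namaste duniya", "start": 12.0}
translated = attach_translation(segment, {"final_text": "hello world"})
untouched = attach_translation(segment, {"final_text": "namaste duniya"})
```

In this sketch `translated["processed_text"]` carries the English text, while `untouched["processed_text"]` equals the original.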

6. Prepare for Documentation

Once every segment has been translated, the full list of segments is passed to the documentation agent.
This agent might:
- Summarize the content,
- Highlight important terms,
- Structure it into a readable format.

7. Return the Final Result

The system gives back a structured package with:

The video link
The original language
The transcript in parts (processed and translated)
A documentation summary
The conversation ID (for tracking or further updates)
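
A caller has to handle two result shapes: a dictionary with an `error` key, or the full package listed above. A small sketch of that check follows, where `describe_result` is a hypothetical helper and the sample values are made up for illustration.

```python
def describe_result(result):
    """Build a one-line status string from the processor's return value."""
    if "error" in result:
        return f"Failed: {result['error']}"
    segment_count = len(result.get("processed_segments", []))
    return (
        f"{result['youtube_url']} [{result['transcript_language']}]: "
        f"{segment_count} segments, conversation {result['conversation_id']}"
    )


sample = {
    "youtube_url": "https://youtu.be/dQw4w9WgXcQ",
    "transcript_language": "en",
    "processed_segments": [{}, {}],
    "documentation": {"summary": "placeholder"},
    "conversation_id": "c1",
}
status = describe_result(sample)
```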

clsDocumentationAgent.py (This is the main class that implements the documentation agent.)

class clsDocumentationAgent:
    """Documentation Agent built with LangChain"""
    
    def __init__(self, agent_id: str, broker: clsMCPBroker):
        self.agent_id = agent_id
        self.broker = broker
        self.broker.register_agent(agent_id)
        
        # Initialize LangChain components
        self.llm = ChatOpenAI(
            model="gpt-4-0125-preview",
            temperature=0.1,
            api_key=OPENAI_API_KEY
        )
        
        # Create tools
        self.tools = [
            clsSendMessageTool(sender_id=self.agent_id, broker=self.broker)
        ]
        
        # Set up LLM with tools
        self.llm_with_tools = self.llm.bind(
            tools=[tool.tool_config for tool in self.tools]
        )
        
        # Setup memory
        self.memory = ConversationBufferMemory(
            memory_key="chat_history",
            return_messages=True
        )
        
        # Create prompt
        self.prompt = ChatPromptTemplate.from_messages([
            ("system", """You are a Documentation Agent for YouTube video transcripts. Your responsibilities include:
                1. Process YouTube video transcripts
                2. Identify key points, topics, and main ideas
                3. Organize content into a coherent and structured format
                4. Create concise summaries
                5. Request research information when necessary
                
                When you need additional context or research, send a request to the Research Agent.
                Always maintain a professional tone and ensure your documentation is clear and organized.
            """),
            MessagesPlaceholder(variable_name="chat_history"),
            ("human", "{input}"),
            MessagesPlaceholder(variable_name="agent_scratchpad"),
        ])
        
        # Create agent
        self.agent = (
            {
                "input": lambda x: x["input"],
                "chat_history": lambda x: self.memory.load_memory_variables({})["chat_history"],
                "agent_scratchpad": lambda x: format_to_openai_tool_messages(x["intermediate_steps"]),
            }
            | self.prompt
            | self.llm_with_tools
            | OpenAIToolsAgentOutputParser()
        )
        
        # Create agent executor
        self.agent_executor = AgentExecutor(
            agent=self.agent,
            tools=self.tools,
            verbose=True,
            memory=self.memory
        )
        
        # Video data
        self.current_conversation_id = None
        self.video_notes = {}
        self.key_points = []
        self.transcript_segments = []
        
    def start_processing(self) -> str:
        """Start processing a new video"""
        self.current_conversation_id = str(uuid.uuid4())
        self.video_notes = {}
        self.key_points = []
        self.transcript_segments = []
        
        return self.current_conversation_id
    
    def process_transcript(self, transcript_segments, conversation_id=None):
        """Process a YouTube transcript"""
        if not conversation_id:
            conversation_id = self.start_processing()
        self.current_conversation_id = conversation_id
        
        # Store transcript segments
        self.transcript_segments = transcript_segments
        
        # Process segments
        processed_segments = []
        for segment in transcript_segments:
            processed_result = self.process_segment(segment)
            processed_segments.append(processed_result)
        
        # Generate summary
        summary = self.generate_summary()
        
        return {
            "processed_segments": processed_segments,
            "summary": summary,
            "conversation_id": conversation_id
        }
    
    def process_segment(self, segment):
        """Process individual transcript segment"""
        text = segment.get("text", "")
        start = segment.get("start", 0)
        
        # Use LangChain agent to process the segment
        result = self.agent_executor.invoke({
            "input": f"Process this video transcript segment at timestamp {start}s: {text}. If research is needed, send a request to the research_agent."
        })
        
        # Update video notes
        timestamp = start
        self.video_notes[timestamp] = {
            "text": text,
            "analysis": result["output"]
        }
        
        return {
            "timestamp": timestamp,
            "text": text,
            "analysis": result["output"]
        }
    
    def handle_mcp_message(self, message: clsMCPMessage) -> Optional[clsMCPMessage]:
        """Handle an incoming MCP message"""
        if message.message_type == "research_response":
            # Process research information received from Research Agent
            research_info = message.content.get("text", "")
            
            result = self.agent_executor.invoke({
                "input": f"Incorporate this research information into video analysis: {research_info}"
            })
            
            # Send acknowledgment back to Research Agent
            response = clsMCPMessage(
                sender=self.agent_id,
                receiver=message.sender,
                message_type="acknowledgment",
                content={"text": "Research information incorporated into video analysis."},
                reply_to=message.id,
                conversation_id=message.conversation_id
            )
            
            self.broker.publish(response)
            return response
        
        elif message.message_type == "translation_response":
            # Process translation response from Translation Agent
            translation_result = message.content
            
            # Process the translated text
            if "final_text" in translation_result:
                text = translation_result["final_text"]
                original_text = translation_result.get("original_text", "")
                language_info = translation_result.get("language", {})
                
                result = self.agent_executor.invoke({
                    "input": f"Process this translated text: {text}\nOriginal language: {language_info.get('language', 'unknown')}\nOriginal text: {original_text}"
                })
                
                # Update notes with translation information
                for timestamp, note in self.video_notes.items():
                    if note["text"] == original_text:
                        note["translated_text"] = text
                        note["language"] = language_info
                        break
            
            return None
        
        return None
    
    def run(self):
        """Run the agent to listen for MCP messages"""
        print(f"Documentation Agent {self.agent_id} is running...")
        while True:
            message = self.broker.get_message(self.agent_id, timeout=1)
            if message:
                self.handle_mcp_message(message)
            time.sleep(0.1)
    
    def generate_summary(self) -> str:
        """Generate a summary of the video"""
        if not self.video_notes:
            return "No video data available to summarize."
        
        all_notes = "\n".join([f"{ts}: {note['text']}" for ts, note in self.video_notes.items()])
        
        result = self.agent_executor.invoke({
            "input": f"Generate a concise summary of this YouTube video, including key points and topics:\n{all_notes}"
        })
        
        return result["output"]

Let us understand the key methods in a step-by-step manner:

The Documentation Agent is like a smart assistant that watches a YouTube video, takes notes, pulls out important ideas, and creates a summary — almost like a professional note-taker trained to help educators, researchers, and content creators. It works with a team of other assistants, like a Translator Agent and a Research Agent, and they all talk to each other through a messaging system.

1. Starting to Work on a New Video

def start_processing(self) -> str

When a new video is being processed:

A new project ID is created.
Old notes and transcripts are cleared to start fresh.

2. Processing the Whole Transcript

def process_transcript(...)

This is where the assistant:

Takes in the full transcript as a list of segments (what was said in the video).
Sends each segment to the LLM-backed agent for analysis.
Collects the results.
Finally, creates a summary of all the main ideas.
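
The same loop can be tried out without an API key by swapping the LLM calls for plain callables. This is a sketch under that assumption: `analyze` and `summarize` are hypothetical stand-ins for the agent executor, and the function is a simplified free-standing version of the method, not the class itself.

```python
def process_transcript(segments, analyze, summarize):
    """Analyze each segment, keep notes by timestamp, then summarize all notes."""
    notes = {}
    processed = []
    for segment in segments:
        text = segment.get("text", "")
        start = segment.get("start", 0)
        analysis = analyze(text, start)
        # Notes are keyed by timestamp, so two segments with the
        # same start time would overwrite each other here.
        notes[start] = {"text": text, "analysis": analysis}
        processed.append({"timestamp": start, "text": text, "analysis": analysis})

    all_notes = "\n".join(f"{ts}: {note['text']}" for ts, note in notes.items())
    return {"processed_segments": processed, "summary": summarize(all_notes)}


result = process_transcript(
    [{"text": "intro", "start": 0}, {"text": "main topic", "start": 5}],
    analyze=lambda text, start: text.upper(),                      # stub analysis
    summarize=lambda notes: f"{len(notes.splitlines())} notes",    # stub summary
)
```

Passing the model calls in as arguments is a design choice made for this sketch; it keeps the control flow testable in isolation.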

3. Processing One Transcript Segment at a Time

def process_segment(self, segment)

For each chunk of the video:

The assistant reads the text and timestamp.
It asks GPT-4 to analyze it and suggest important insights.
It saves that insight along with the original text and timestamp.

4. Handling Incoming Messages from Other Agents

def handle_mcp_message(self, message)

The assistant can also receive messages from teammates (other agents):

If the message is from the Research Agent:

It passes the new information to the LLM agent so that it can be incorporated into the video analysis.
It replies with an acknowledgment message to confirm that it received the research.

If the message is from the Translation Agent:

It takes the translated version of a transcript.
Updates its notes to reflect the translated text and its language.

This is like a team of assistants emailing back and forth to make sure the notes are complete and accurate.
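
The message handling above is essentially a dispatch on `message_type`. A minimal sketch with a lookup table instead of an `if`/`elif` chain follows. The `Message` dataclass only mirrors the fields used in this post and is not the real `clsMCPMessage`; `NoteKeeper` is likewise a hypothetical stand-in for the agent.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Message:
    """Hypothetical message type with the fields referenced in this post."""
    sender: str
    receiver: str
    message_type: str
    content: Dict[str, Any]
    reply_to: Optional[str] = None
    conversation_id: Optional[str] = None
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


class NoteKeeper:
    """Stand-in agent that routes incoming messages by their type."""

    def __init__(self, agent_id):
        self.agent_id = agent_id
        self.notes = {}
        self.research = []
        self._handlers = {
            "research_response": self._on_research,
            "translation_response": self._on_translation,
        }

    def handle(self, message):
        handler = self._handlers.get(message.message_type)
        return handler(message) if handler else None  # unknown types are ignored

    def _on_research(self, message):
        self.research.append(message.content.get("text", ""))
        return Message(
            sender=self.agent_id,
            receiver=message.sender,
            message_type="acknowledgment",
            content={"text": "Research received."},
            reply_to=message.id,
            conversation_id=message.conversation_id,
        )

    def _on_translation(self, message):
        original = message.content.get("original_text", "")
        for note in self.notes.values():
            if note["text"] == original:
                note["translated_text"] = message.content.get("final_text", original)
                break
        return None


keeper = NoteKeeper("documentation_agent")
keeper.notes[0] = {"text": "namaste"}
incoming = Message("research_agent", "documentation_agent", "research_response", {"text": "fact"})
ack = keeper.handle(incoming)
```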

5. Summarizing the Whole Video

def generate_summary(self)

After going through all the transcript parts, the agent asks GPT-4 to create a short, clean summary — identifying:

Main ideas
Key talking points
Structure of the content

The final result is clear, professional, and usable in learning materials or documentation.

clsResearchAgent.py (This is the main class that implements the research agent.)

class clsResearchAgent:
    """Research Agent built with AutoGen"""
    
    def __init__(self, agent_id: str, broker: clsMCPBroker):
        self.agent_id = agent_id
        self.broker = broker
        self.broker.register_agent(agent_id)
        
        # Configure AutoGen directly with API key
        if not OPENAI_API_KEY:
            print("Warning: OPENAI_API_KEY not set for ResearchAgent")
            
        # Create config list directly instead of loading from file
        config_list = [
            {
                "model": "gpt-4-0125-preview",
                "api_key": OPENAI_API_KEY
            }
        ]
        # Create AutoGen assistant for research
        self.assistant = AssistantAgent(
            name="research_assistant",
            system_message="""You are a Research Agent for YouTube videos. Your responsibilities include:
                1. Research topics mentioned in the video
                2. Find relevant information, facts, references, or context
                3. Provide concise, accurate information to support the documentation
                4. Focus on delivering high-quality, relevant information
                
                Respond directly to research requests with clear, factual information.
            """,
            llm_config={"config_list": config_list, "temperature": 0.1}
        )
        
        # Create user proxy to handle message passing
        self.user_proxy = UserProxyAgent(
            name="research_manager",
            human_input_mode="NEVER",
            code_execution_config={"work_dir": "coding", "use_docker": False},
            default_auto_reply="Working on the research request..."
        )
        
        # Current conversation tracking
        self.current_requests = {}
    
    def handle_mcp_message(self, message: clsMCPMessage) -> Optional[clsMCPMessage]:
        """Handle an incoming MCP message"""
        if message.message_type == "request":
            # Process research request from Documentation Agent
            request_text = message.content.get("text", "")
            
            # Use AutoGen to process the research request
            def research_task():
                self.user_proxy.initiate_chat(
                    self.assistant,
                    message=f"Research request for YouTube video content: {request_text}. Provide concise, factual information."
                )
                # Return last assistant message (AutoGen keys chat_messages by agent object, not by name)
                return self.assistant.chat_messages[self.user_proxy][-1]["content"]
            
            # Execute research task
            research_result = research_task()
            
            # Send research results back to Documentation Agent
            response = clsMCPMessage(
                sender=self.agent_id,
                receiver=message.sender,
                message_type="research_response",
                content={"text": research_result},
                reply_to=message.id,
                conversation_id=message.conversation_id
            )
            
            self.broker.publish(response)
            return response
        
        return None
    
    def run(self):
        """Run the agent to listen for MCP messages"""
        print(f"Research Agent {self.agent_id} is running...")
        while True:
            message = self.broker.get_message(self.agent_id, timeout=1)
            if message:
                self.handle_mcp_message(message)
            time.sleep(0.1)

Let us understand the key methods in detail.

1. Receiving and Responding to Research Requests

def handle_mcp_message(self, message)

When the Research Agent gets a message (like a question or request for info), it:

Reads the message to see what needs to be researched.
Asks the AutoGen assistant (backed by GPT-4) to find helpful, accurate info about that topic.
Sends the answer back to whoever asked the question (usually the Documentation Agent).
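
The request and response round trip can be sketched with an in-memory broker. Everything here is a stand-in: `InMemoryBroker` only imitates the three broker calls used in this post (`register_agent`, `publish`, `get_message`), messages are plain dictionaries, and `lookup` replaces the AutoGen chat with a stub.

```python
import queue
import uuid


class InMemoryBroker:
    """Hypothetical broker: one FIFO queue per registered agent."""

    def __init__(self):
        self._queues = {}

    def register_agent(self, agent_id):
        self._queues.setdefault(agent_id, queue.Queue())

    def publish(self, message):
        self._queues[message["receiver"]].put(message)

    def get_message(self, agent_id, timeout=1):
        try:
            return self._queues[agent_id].get(timeout=timeout)
        except queue.Empty:
            return None


def make_message(sender, receiver, message_type, content, reply_to=None, conversation_id=None):
    return {
        "id": str(uuid.uuid4()),
        "sender": sender,
        "receiver": receiver,
        "message_type": message_type,
        "content": content,
        "reply_to": reply_to,
        "conversation_id": conversation_id,
    }


def handle_research_request(broker, agent_id, message, lookup):
    """Answer a 'request' message and publish the reply to the original sender."""
    if message["message_type"] != "request":
        return None
    answer = lookup(message["content"].get("text", ""))
    response = make_message(
        agent_id, message["sender"], "research_response", {"text": answer},
        reply_to=message["id"], conversation_id=message["conversation_id"],
    )
    broker.publish(response)
    return response


broker = InMemoryBroker()
for name in ("documentation_agent", "research_agent"):
    broker.register_agent(name)

request = make_message("documentation_agent", "research_agent", "request",
                       {"text": "What is MCP?"}, conversation_id="conv-1")
broker.publish(request)

incoming = broker.get_message("research_agent", timeout=0.1)
handle_research_request(broker, "research_agent", incoming,
                        lookup=lambda q: f"Stub answer for: {q}")
reply = broker.get_message("documentation_agent", timeout=0.1)
```

The `reply_to` field ties the response to the request that triggered it, and `conversation_id` keeps both inside the same video-processing session.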

clsTranslationAgent.py (This is the main class that represents the translation agent)

class clsTranslationAgent:
    """Agent for language detection and translation"""
    
    def __init__(self, agent_id: str, broker: clsMCPBroker):
        self.agent_id = agent_id
        self.broker = broker
        self.broker.register_agent(agent_id)
        
        # Initialize language detector
        self.language_detector = clsLanguageDetector()
        
        # Initialize translation service
        self.translation_service = clsTranslationService()
    
    def process_text(self, text, conversation_id=None):
        """Process text: detect language and translate if needed, handling mixed language content"""
        if not conversation_id:
            conversation_id = str(uuid.uuid4())
        
        # Detect language with support for mixed language content
        language_info = self.language_detector.detect(text)
        
        # Decide if translation is needed
        needs_translation = True
        
        # Pure English content doesn't need translation
        if language_info["language_code"] == "en-IN" or language_info["language_code"] == "unknown":
            needs_translation = False
        
        # For mixed language, check if it's primarily English
        if language_info.get("is_mixed", False) and language_info.get("languages", []):
            english_langs = [
                lang for lang in language_info.get("languages", []) 
                if lang["language_code"] == "en-IN" or lang["language_code"].startswith("en-")
            ]
            
            # If the first English entry is detected with > 60% confidence, don't translate
            if english_langs and english_langs[0].get("confidence", 0) > 0.6:
                needs_translation = False
        
        if needs_translation:
            # Translate using the appropriate service based on language detection
            translation_result = self.translation_service.translate(text, language_info)
            
            return {
                "original_text": text,
                "language": language_info,
                "translation": translation_result,
                "final_text": translation_result.get("translated_text", text),
                "conversation_id": conversation_id
            }
        else:
            # Already English or unknown language, return as is
            return {
                "original_text": text,
                "language": language_info,
                "translation": {"provider": "none"},
                "final_text": text,
                "conversation_id": conversation_id
            }
    
    def handle_mcp_message(self, message: clsMCPMessage) -> Optional[clsMCPMessage]:
        """Handle an incoming MCP message"""
        if message.message_type == "translation_request":
            # Process translation request from Documentation Agent
            text = message.content.get("text", "")
            
            # Process the text
            result = self.process_text(text, message.conversation_id)
            
            # Send translation results back to requester
            response = clsMCPMessage(
                sender=self.agent_id,
                receiver=message.sender,
                message_type="translation_response",
                content=result,
                reply_to=message.id,
                conversation_id=message.conversation_id
            )
            
            self.broker.publish(response)
            return response
        
        return None
    
    def run(self):
        """Run the agent to listen for MCP messages"""
        print(f"Translation Agent {self.agent_id} is running...")
        while True:
            message = self.broker.get_message(self.agent_id, timeout=1)
            if message:
                self.handle_mcp_message(message)
            time.sleep(0.1)

Let us understand the key methods in a step-by-step manner:

1. Understanding and Translating Text:

def process_text(...)

This is the core job of the agent. Here’s what it does with any piece of text:

Step 1: Detect the Language

It tries to figure out the language of the input text.
It can handle cases where more than one language is mixed together, which is common in casual speech or subtitles.

Step 2: Decide Whether to Translate

If the text is clearly in English, or it’s unclear what the language is, it decides not to translate.
For mixed-language text, it also skips translation when English is detected with more than 60% confidence.
In every other case, it translates the text into English.
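
The decision can be isolated into a pure function, which makes the 60% rule easy to check. The sketch below mirrors the logic of `process_text` and assumes the `language_info` shape used in the code above (`language_code`, `is_mixed`, and a `languages` list with `confidence` values). The free-standing `needs_translation` function and its `threshold` parameter are illustrative.

```python
def needs_translation(language_info, threshold=0.6):
    """Return False for English/unknown input or confidently English mixed text."""
    code = language_info.get("language_code", "unknown")
    if code in ("en-IN", "unknown"):
        return False

    if language_info.get("is_mixed", False) and language_info.get("languages", []):
        english_langs = [
            lang for lang in language_info["languages"]
            if lang["language_code"].startswith("en-")
        ]
        # Only the first English entry is inspected, as in the original method
        if english_langs and english_langs[0].get("confidence", 0) > threshold:
            return False

    return True


mostly_english = {
    "language_code": "hi-IN",
    "is_mixed": True,
    "languages": [
        {"language_code": "hi-IN", "confidence": 0.3},
        {"language_code": "en-IN", "confidence": 0.7},
    ],
}
mostly_hindi = {
    "language_code": "hi-IN",
    "is_mixed": True,
    "languages": [
        {"language_code": "hi-IN", "confidence": 0.6},
        {"language_code": "en-IN", "confidence": 0.4},
    ],
}
decisions = (needs_translation(mostly_english), needs_translation(mostly_hindi))
```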

Step 3: Translate (if needed)

If translation is required, it uses the translation service to do the job.
Then it packages all the information: the original text, detected language, the translated version, and a unique conversation ID.

Step 4: Return the Results

If no translation is needed, it returns the original text and a note saying “no translation was applied.”

2. Receiving Messages and Responding

def handle_mcp_message(...)

The agent listens for messages from other agents. When someone asks it to translate something:

It takes the text from the message.
Runs it through the process_text function (as explained above).
Sends the translated (or original) result back to the agent that asked.

clsTranslationService.py (This class does the actual translation work on behalf of the agent.)

class clsTranslationService:
    """Translation service using multiple providers with support for mixed languages"""
    
    def __init__(self):
        # Initialize Sarvam AI client
        self.sarvam_api_key = SARVAM_API_KEY
        self.sarvam_url = "https://api.sarvam.ai/translate"
        
        # Initialize Google Cloud Translation client using simple HTTP requests
        self.google_api_key = GOOGLE_API_KEY
        self.google_translate_url = "https://translation.googleapis.com/language/translate/v2"
    
    def translate_with_sarvam(self, text, source_lang, target_lang="en-IN"):
        """Translate text using Sarvam AI (for Indian languages)"""
        if not self.sarvam_api_key:
            return {"error": "Sarvam API key not set"}
        
        headers = {
            "Content-Type": "application/json",
            "api-subscription-key": self.sarvam_api_key
        }
        
        payload = {
            "input": text,
            "source_language_code": source_lang,
            "target_language_code": target_lang,
            "speaker_gender": "Female",
            "mode": "formal",
            "model": "mayura:v1"
        }
        
        try:
            response = requests.post(self.sarvam_url, headers=headers, json=payload)
            if response.status_code == 200:
                return {"translated_text": response.json().get("translated_text", ""), "provider": "sarvam"}
            else:
                return {"error": f"Sarvam API error: {response.text}", "provider": "sarvam"}
        except Exception as e:
            return {"error": f"Error calling Sarvam API: {str(e)}", "provider": "sarvam"}
    
    def translate_with_google(self, text, target_lang="en"):
        """Translate text using Google Cloud Translation API with direct HTTP request"""
        if not self.google_api_key:
            return {"error": "Google API key not set"}
        
        try:
            # Using the translation API v2 with API key
            params = {
                "key": self.google_api_key,
                "q": text,
                "target": target_lang
            }
            
            # A timeout keeps the agent from hanging if the provider does not respond
            response = requests.post(self.google_translate_url, params=params, timeout=30)
            if response.status_code == 200:
                data = response.json()
                translation = data.get("data", {}).get("translations", [{}])[0]
                return {
                    "translated_text": translation.get("translatedText", ""),
                    "detected_source_language": translation.get("detectedSourceLanguage", ""),
                    "provider": "google"
                }
            else:
                return {"error": f"Google API error: {response.text}", "provider": "google"}
        except Exception as e:
            return {"error": f"Error calling Google Translation API: {str(e)}", "provider": "google"}
    
    def translate(self, text, language_info):
        """Translate text to English based on language detection info"""
        # If already English or unknown language, return as is
        if language_info["language_code"] == "en-IN" or language_info["language_code"] == "unknown":
            return {"translated_text": text, "provider": "none"}
        
        # Handle mixed language content
        if language_info.get("is_mixed", False) and language_info.get("languages", []):
            # Strategy for mixed language:
            # 1. If one of the languages is English, send the full text to Google Translate
            # 2. If no English but contains Indian languages, use Sarvam with the highest-confidence Indian language as source
            # 3. Otherwise, use Google Translate and let it detect the source language
            
            has_english = False
            has_indian = False
            
            for lang in language_info.get("languages", []):
                if lang["language_code"] == "en-IN" or lang["language_code"].startswith("en-"):
                    has_english = True
                if lang.get("is_indian", False):
                    has_indian = True
            
            if has_english:
                # Contains English - use Google for full text as it handles code-mixing well
                return self.translate_with_google(text)
            elif has_indian:
                # Contains Indian languages - use Sarvam
                # Use the highest confidence Indian language as source
                indian_langs = [lang for lang in language_info.get("languages", []) if lang.get("is_indian", False)]
                if indian_langs:
                    # Sort by confidence
                    indian_langs.sort(key=lambda x: x.get("confidence", 0), reverse=True)
                    source_lang = indian_langs[0]["language_code"]
                    return self.translate_with_sarvam(text, source_lang)
                else:
                    # Fallback to primary language
                    if language_info["is_indian"]:
                        return self.translate_with_sarvam(text, language_info["language_code"])
                    else:
                        return self.translate_with_google(text)
            else:
                # No English, no Indian languages - use Google for primary language
                return self.translate_with_google(text)
        else:
            # Not mixed language - use standard approach
            if language_info["is_indian"]:
                # Use Sarvam AI for Indian languages
                return self.translate_with_sarvam(text, language_info["language_code"])
            else:
                # Use Google for other languages
                return self.translate_with_google(text)

This Translation Service is like a smart translator that knows how to:

Read the language-detection result that is passed in along with the text,
Choose the best translation provider depending on the language (especially for Indian languages),
And then translate the text into English.

It supports mixed-language content (such as Hindi-English in one sentence) and uses either Google Translate or Sarvam AI, a translation service designed for Indian languages.

Now, let us understand the key methods in a step-by-step manner:

1. Translating Using Google Translate

def translate_with_google(...)

This function uses Google Translate:

It sends the text, asks for English as the target language, and gets a translation back.
It also detects the source language automatically.
If successful, it returns the translated text and the detected original language.
If there’s an error, it returns a message saying what went wrong.

Best For: Non-Indian languages (like Spanish, French, Chinese) and mixed content that includes English.
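The response handling described above can be tried without calling the API. The sketch below parses a response body shaped the way translate_with_google expects it (data, translations, translatedText, detectedSourceLanguage). The helper name parse_google_v2_response and the sample payload are made up for illustration, and the extra guard for an empty translations list is a small hardening that the original method does not have.

```python
def parse_google_v2_response(data):
    """Pull the first translation out of a v2-style response body."""
    # "or [{}]" also covers an empty list, which would otherwise raise IndexError
    translations = data.get("data", {}).get("translations") or [{}]
    first = translations[0]
    return {
        "translated_text": first.get("translatedText", ""),
        "detected_source_language": first.get("detectedSourceLanguage", ""),
        "provider": "google",
    }


# Illustrative payload only, not a captured API response
sample = {
    "data": {
        "translations": [
            {"translatedText": "Good morning", "detectedSourceLanguage": "es"}
        ]
    }
}
parsed = parse_google_v2_response(sample)
empty = parse_google_v2_response({"data": {"translations": []}})
print(parsed["translated_text"], parsed["detected_source_language"])
```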

2. Main Translation Logic

def translate(self, text, language_info)

This is the decision-maker. Here’s how it works:

Case 1: No Translation Needed

If the text is already in English or the language is unknown, it simply returns the original text.

Case 2: Mixed Language (e.g., Hindi + English)

If the text contains more than one language:

✅ If one part is English → use Google Translate on the full text (it’s good with mixed languages).
✅ If there is no English but it includes Indian languages → use Sarvam AI (better at handling Indian content).
✅ If it’s neither English nor Indian → use Google Translate.

On the Sarvam AI path, the service sorts the detected Indian languages by confidence and uses the most likely one as the source language.

Case 3: Single Language

If the text is only in one language:

✅ If it’s an Indian language (like Bengali, Tamil, or Marathi), use Sarvam AI.
✅ If it’s any other language, use Google Translate.
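The three cases can be condensed into one small routing function. This is only a sketch of the decision, not the service itself: choose_provider is a hypothetical helper that returns the provider name and the source language instead of calling any API, and the language codes in the usage lines are illustrative.

```python
def choose_provider(language_info):
    """Return (provider, source_language) following the three cases above."""
    code = language_info.get("language_code", "unknown")

    # Case 1: already English or unknown, nothing to translate
    if code in ("en-IN", "unknown"):
        return ("none", None)

    # Case 2: mixed-language content
    languages = language_info.get("languages", [])
    if language_info.get("is_mixed", False) and languages:
        if any(lang["language_code"].startswith("en-") for lang in languages):
            return ("google", None)
        indian = [lang for lang in languages if lang.get("is_indian", False)]
        if indian:
            best = max(indian, key=lambda lang: lang.get("confidence", 0))
            return ("sarvam", best["language_code"])
        return ("google", None)

    # Case 3: single language
    if language_info.get("is_indian", False):
        return ("sarvam", code)
    return ("google", None)


mixed_hi_bn = {
    "language_code": "hi-IN",
    "is_indian": True,
    "is_mixed": True,
    "languages": [
        {"language_code": "bn-IN", "is_indian": True, "confidence": 0.3},
        {"language_code": "hi-IN", "is_indian": True, "confidence": 0.6},
    ],
}
print(choose_provider(mixed_hi_bn))
print(choose_provider({"language_code": "fr-FR", "is_indian": False}))
```

Keeping the decision separate from the HTTP calls like this also makes the routing rules easy to unit test.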

So, we’ve done it.

I’ve included the complete working solutions for you in the GitHub Link.

We’ll cover detailed performance testing, optimized configurations & many other useful details in our next post.

Till then, Happy Avenging! 🙂

Note: All the data & scenarios posted here are representational data & scenarios & available over the internet & for educational purposes only. There is always room for improvement in this kind of model & the solution associated with it. I’ve shown the basic ways to achieve the same for educational purposes only.

Building solutions using LLM AutoGen in Python – Part 3

Posted on October 28, 2024 by SatyakiDe in api, Azure, cloud, code, Data Science, design, json, objects, openai, Pandas, Performance, Python, sql

Before we dive into the details of this post, here are the links to the two previous posts in this series.

Building solutions using LLM AutoGen in Python – Part 1

Building solutions using LLM AutoGen in Python – Part 2

For reference, we’ll share the demo once more before we dive into the follow-up analysis in the section below –

In this post, we will look at the initial code that AutoGen generated & then the revised code, and compare them to understand the impact of the revised prompts.

But before that, let us broadly understand the communication types between the agents.

Direct Communication:

Agents Involved: Agent1, Agent2
Flow:
- Agent1 sends a request directly to Agent2.
- Agent2 processes the request and sends the response back to Agent1.
Use Case: Simple query-response interactions without intermediaries.

Mediator-Based Communication:

Agents Involved: UserAgent, Mediator, SpecialistAgent1, SpecialistAgent2
Flow:
- UserAgent sends input to Mediator.
- Mediator delegates tasks to SpecialistAgent1 and SpecialistAgent2.
- Specialists process tasks and return results to Mediator.
- Mediator consolidates results and sends them back to UserAgent.
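The mediator flow above can be sketched in a few lines of plain Python. The class and the two specialist functions below are hypothetical stand-ins for illustration; they are not AutoGen classes.

```python
def specialist_word_count(text):
    """SpecialistAgent1 stand-in: counts the words in the input."""
    return len(text.split())


def specialist_shout(text):
    """SpecialistAgent2 stand-in: upper-cases the input."""
    return text.upper()


class Mediator:
    """Delegates one input to every specialist and consolidates the results."""

    def __init__(self, specialists):
        self.specialists = specialists  # mapping of name -> callable

    def handle(self, user_input):
        return {name: work(user_input) for name, work in self.specialists.items()}


mediator = Mediator({"counter": specialist_word_count, "shouter": specialist_shout})
consolidated = mediator.handle("build a snake game")
print(consolidated)
```

The user agent only ever talks to the mediator, so specialists can be added or removed without changing the caller.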

Broadcast Communication:

Agents Involved: Broadcaster, AgentA, AgentB, AgentC
Flow:
- Broadcaster sends a message to multiple agents simultaneously.
- Agents that find the message relevant (AgentA, AgentC) acknowledge or respond.
Use Case: System-wide notifications or alerts.
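Here is a small sketch of the broadcast flow, assuming each agent decides for itself whether a message is relevant. The Agent class, the topic names and the broadcast helper are invented for this example.

```python
class Agent:
    """Hypothetical agent that only reacts to topics it cares about."""

    def __init__(self, name, interests):
        self.name = name
        self.interests = set(interests)

    def receive(self, message):
        if message["topic"] in self.interests:
            return f"{self.name} acknowledged {message['topic']}"
        return None  # not relevant to this agent


def broadcast(agents, message):
    """Send the same message to every agent and collect the responses."""
    replies = [agent.receive(message) for agent in agents]
    return [reply for reply in replies if reply is not None]


agents = [
    Agent("AgentA", ["shutdown"]),
    Agent("AgentB", ["billing"]),
    Agent("AgentC", ["shutdown", "billing"]),
]
acks = broadcast(agents, {"topic": "shutdown", "body": "Maintenance at 10 PM"})
print(acks)
```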

Hierarchical Communication:

Agents Involved: Supervisor, Worker1, Worker2
Flow:
- Supervisor assigns tasks to Worker1 and Worker2.
- Workers execute tasks and report progress back to Supervisor.
Use Case: Task delegation in structured organizations.

Publish/Subscribe Communication:

Agents Involved: Publisher, Subscriber1, Topic
Flow:
- Publisher publishes an event or message to a Topic.
- Subscriber1, who is subscribed to the Topic, receives the event.
Use Case: Decoupled systems where publishers and subscribers do not need direct knowledge of each other.
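The decoupling is easier to see in code. Below is a minimal in-memory sketch; TopicBus and the topic names are assumptions for illustration, and a real system would typically use a message broker instead.

```python
from collections import defaultdict


class TopicBus:
    """Tiny in-memory publish/subscribe hub (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        """Deliver the event to every subscriber of the topic; return the count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(event)
        return len(callbacks)


received = []
bus = TopicBus()
bus.subscribe("code.generated", received.append)  # Subscriber1

delivered = bus.publish("code.generated", {"file": "simple_snake.py"})
ignored = bus.publish("code.deleted", {"file": "simple_snake.py"})
print(delivered, ignored, received)
```

The publisher never references the subscriber directly; both only know the topic name.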

Event-Driven Communication:

Agents Involved: TriggerEvent, ReactiveAgent, NextStep
Flow:
- An event occurs (TriggerEvent).
- ReactiveAgent detects the event and acts.
- The action leads to the NextStep in the process.
Use Case: Systems that need to respond to asynchronous events or changes in the environment.
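The event-driven flow can be sketched as a small event loop where a handler may schedule a follow-up event. The function name run_event_loop and the handler table are hypothetical; only the event names come from the description above.

```python
from collections import deque


def run_event_loop(events, handlers):
    """Process queued events in order; a handler may return a follow-up event."""
    queue = deque(events)
    trace = []
    while queue:
        event = queue.popleft()
        handler = handlers.get(event)
        if handler is None:
            continue  # nobody reacts to this event
        trace.append(event)
        follow_up = handler()
        if follow_up is not None:
            queue.append(follow_up)
    return trace


handlers = {
    "TriggerEvent": lambda: "NextStep",  # ReactiveAgent reacts and schedules NextStep
    "NextStep": lambda: None,            # final step, nothing further to schedule
}
trace = run_event_loop(["TriggerEvent"], handlers)
print(trace)
```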

Now that we understand the basic communication types, let us look at the first code that AutoGen generated & the last code (the one that satisfies our need) –

# filename: simple_snake.py (Generated by AutoGen)

import pygame
import time
import random
 
snake_speed = 15
 
# Window color
white = pygame.Color(255, 255, 255)
 
# Snake color
green = pygame.Color(0, 255, 0)
 
snake_position = [100, 50]
 
# defining first 4 blocks 
# of snake body
snake_body = [ [100, 50], 
               [90, 50],
               [80, 50],
               [70, 50]
            ]
# fruit position
fruit_position = [random.randrange(1, (1000//10)) * 10, 
                  random.randrange(1, (600//10)) * 10]
fruit_spawn = True
 
direction = 'RIGHT'
change_to = direction
 
score = 0
 
# Initialising pygame
pygame.init()
 
# Initialise game window
win = pygame.display.set_mode((1000, 600))
pygame.display.set_caption("Snake game for kids")
 
# FPS (frames per second) controller
fps_controller = pygame.time.Clock()
 
  
while True:
    # handling key events
    for event in pygame.event.get():
        if event.type == pygame.KEYDOWN:
            if event.key == pygame.K_UP:
                change_to = 'UP'
            if event.key == pygame.K_DOWN:
                change_to = 'DOWN'
            if event.key == pygame.K_LEFT:
                change_to = 'LEFT'
            if event.key == pygame.K_RIGHT:
                change_to = 'RIGHT'

    # If two keys pressed simultaneously
    # we don't want snake to move into two
    # directions simultaneously
    if change_to == 'UP' and direction != 'DOWN':
        direction = 'UP'
    if change_to == 'DOWN' and direction != 'UP':
        direction = 'DOWN'
    if change_to == 'LEFT' and direction != 'RIGHT':
        direction = 'LEFT'
    if change_to == 'RIGHT' and direction != 'LEFT':
        direction = 'RIGHT'
 
    # Moving the snake
    if direction == 'UP':
        snake_position[1] -= 10
    if direction == 'DOWN':
        snake_position[1] += 10
    if direction == 'LEFT':
        snake_position[0] -= 10
    if direction == 'RIGHT':
        snake_position[0] += 10
 
    # Snake body growing mechanism
    # if fruits and snakes collide then scores
    # will increase by 10
    snake_body.insert(0, list(snake_position))
    if snake_position[0] == fruit_position[0] and snake_position[1] == fruit_position[1]:
        score += 10
        fruit_spawn = False
    else:
        snake_body.pop()
         
    if not fruit_spawn:
        fruit_position = [random.randrange(1, (1000//10)) * 10, 
                          random.randrange(1, (600//10)) * 10]
         
    fruit_spawn = True
    win.fill(white)
    
    for pos in snake_body:
        pygame.draw.rect(win, green, pygame.Rect(pos[0], pos[1], 10, 10))
    pygame.draw.rect(win, white, pygame.Rect(
    fruit_position[0], fruit_position[1], 10, 10))
 
    # Game Over conditions
    if snake_position[0] < 0 or snake_position[0] > 1000-10:
        break
    if snake_position[1] < 0 or snake_position[1] > 600-10:
        break
 
    # Touching the snake body
    for block in snake_body[1:]:
        if snake_position[0] == block[0] and snake_position[1] == block[1]:
            break
    
    # refresh game screen
    pygame.display.update()

    # Frame Per Second /Refresh rate
    fps_controller.tick(snake_speed)

# displaying final score after game over
print(f"You scored {score} in the game.")

Now, let us remember the prompt refinement that we’ve done (We’ve already posted that in our last post as well) –

I want to place the snake within the boundary of the visual wall & it should bounce each wall & then hit the target. And, it should increase the size. And, every time when it hits the target it will increase the score by 10. And, the application will run for 2 minutes, which will be displayed on the screen. This will exit once 2 minute is over.

# filename: simple_snake.py (Generated by AutoGen)

import pygame
import time
import random
import math

pygame.init()

white = (255, 255, 255)
yellow = (255, 255, 102)
green = (0, 255, 0)
red = (255, 0, 0)
black = (0, 0, 0)
blue = (0, 0, 255)

dis_width = 800
dis_height = 600

dis = pygame.display.set_mode((dis_width, dis_height))
pygame.display.set_caption('Snake Game')

clock = pygame.time.Clock()
snake_block = 10
snake_speed = 30
font_style = pygame.font.SysFont(None, 50)
score_font = pygame.font.SysFont(None, 35)

def our_snake(snake_block, snake_List):
    for x in snake_List:
        pygame.draw.rect(dis, green, [x[0], x[1], snake_block, snake_block])

def message(msg,color):
    mesg = font_style.render(msg, True, color)
    dis.blit(mesg, [dis_width / 3, dis_height / 3])

def gameLoop():  # creating a function
    game_over = False
    game_close = False

    # snake starting coordinates
    x1 = dis_width / 2
    y1 = dis_height / 2

    # snake initial movement direction
    x1_change = 0
    y1_change = 0

    # initialize snake length and list of coordinates
    snake_List = []
    Length_of_snake = 1

    # random starting point for the food
    foodx = round(random.randrange(0, dis_width - snake_block) / 10.0) * 10.0
    foody = round(random.randrange(0, dis_height - snake_block) / 10.0) * 10.0

    # initialize score
    score = 0

    # store starting time
    start_time = time.time()

    while not game_over:

        # Remaining time
        elapsed_time = time.time() - start_time
        remaining_time = 120 - elapsed_time  # 2 minutes game
        if remaining_time <= 0:
            game_over = True

        # event handling loop
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                game_over = True  # when closing window
            if event.type == pygame.MOUSEBUTTONUP:
                # get mouse click coordinates
                pos = pygame.mouse.get_pos()

                # calculate new direction vector from snake to click position
                x1_change = pos[0] - x1
                y1_change = pos[1] - y1

                # normalize direction vector
                norm = math.sqrt(x1_change ** 2 + y1_change ** 2)
                if norm != 0:
                    x1_change /= norm
                    y1_change /= norm

                # multiply direction vector by step size
                x1_change *= snake_block
                y1_change *= snake_block

        x1 += x1_change
        y1 += y1_change
        dis.fill(white)
        pygame.draw.rect(dis, red, [foodx, foody, snake_block, snake_block])
        pygame.draw.rect(dis, green, [x1, y1, snake_block, snake_block])
        snake_Head = []
        snake_Head.append(x1)
        snake_Head.append(y1)
        snake_List.append(snake_Head)
        if len(snake_List) > Length_of_snake:
            del snake_List[0]

        our_snake(snake_block, snake_List)

        # Bounces the snake back if it hits the edge
        if x1 < 0 or x1 > dis_width:
            x1_change *= -1
        if y1 < 0 or y1 > dis_height:
            y1_change *= -1

        # Display score
        value = score_font.render("Your Score: " + str(score), True, black)
        dis.blit(value, [0, 0])

        # Display remaining time
        time_value = score_font.render("Remaining Time: " + str(int(remaining_time)), True, blue)
        dis.blit(time_value, [0, 30])

        pygame.display.update()

        # Increase score and length of snake when snake gets the food
        if abs(x1 - foodx) < snake_block and abs(y1 - foody) < snake_block:
            foodx = round(random.randrange(0, dis_width - snake_block) / 10.0) * 10.0
            foody = round(random.randrange(0, dis_height - snake_block) / 10.0) * 10.0
            Length_of_snake += 1
            score += 10

        # Snake movement speed
        clock.tick(snake_speed)

    pygame.quit()
    quit()

gameLoop()

Now, let us understand the difference here –

The first program is a snake game controlled by the arrow keys that ends when the snake hits a wall; it also checks whether the snake touches its own body, although that check only breaks out of the inner loop. The second program uses mouse clicks for control, bounces off the walls instead of ending, includes a 2-minute timer, and displays the remaining time.
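The two behaviours that the revised prompt introduced, bouncing and the countdown, can be isolated from pygame and tried on their own. The helper names below (bounce_step, remaining_seconds) are made up for this sketch; the logic mirrors the second listing, which moves first and then flips the direction once the position has crossed an edge.

```python
def bounce_step(pos, vel, width, height):
    """Advance one frame, then reverse the velocity on any axis that left the window."""
    x, y = pos[0] + vel[0], pos[1] + vel[1]
    vx, vy = vel
    if x < 0 or x > width:
        vx = -vx
    if y < 0 or y > height:
        vy = -vy
    return (x, y), (vx, vy)


def remaining_seconds(start_time, now, limit=120):
    """Whole seconds left in the game, never below zero."""
    return max(0, int(limit - (now - start_time)))


pos, vel = bounce_step((795, 300), (10, 0), 800, 600)
print(pos, vel)
print(remaining_seconds(0, 30.5))
```

By contrast, the first listing treats the same boundary condition as a game-over and simply breaks out of the main loop.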

So, we’ve done it. 🙂

You can find the detailed code in the following GitHub link.

I’ll bring some more exciting topics in the coming days from the Python verse.

Till then, Happy Avenging! 🙂

Building solutions using LLM AutoGen in Python – Part 1

Posted on February 29, 2024 (updated October 28, 2024) by SatyakiDe in ai, api, Azure, cloud, code, Data Science, design, Model, natural-language, objects, openai, Pandas, Python

Today, I’ll be publishing a series of posts on LLM agents and how they can help you improve your delivery capabilities for various tasks.

Also, we’re providing the demo here –

Isn’t it exciting?

Process Flow:

The application will interact with the AutoGen agents, use the underlying OpenAI APIs to follow the instructions, generate the steps, and then follow that path to generate the desired code. Finally, it will execute the generated scripts if the first outcome of the demo satisfies users.

CODE:

Let us understand some of the key snippets –

Creating the Assistant Agent:

# Create the assistant agent
assistant = autogen.AssistantAgent(
    name="AI_Assistant",
    llm_config={
        "config_list": config_list,
    }
)

Purpose: This line creates an AI assistant agent named “AI_Assistant”.

Function: It uses a language model configuration provided in config_list to define how the assistant behaves.

Role: The assistant serves as the primary agent who will coordinate with other agents to solve problems.
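The config_list itself is not shown in this post. As a rough sketch, it is a list of dictionaries, one per model endpoint. The model name and the environment variable below are assumptions, so please replace them with your own values.

```python
import os

# Assumed values for illustration; the actual configuration is not shown in this post
config_list = [
    {
        "model": "gpt-4",                                 # assumed model name
        "api_key": os.environ.get("OPENAI_API_KEY", ""),  # read the key from the environment
    }
]

# This is the dictionary shape passed as llm_config in the snippets of this post
llm_config = {"config_list": config_list}
print(llm_config["config_list"][0]["model"])
```

Reading the key from the environment keeps the secret out of the source file.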

Creating the User Proxy Agent:

user_proxy = autogen.UserProxyAgent(
    name="Admin",
    system_message=templateVal_1,
    human_input_mode="TERMINATE",
    max_consecutive_auto_reply=10,
    is_termination_msg=lambda x: x.get("content", "").rstrip().endswith("TERMINATE"),
    code_execution_config={
        "work_dir": WORK_DIR,
        "use_docker": False,
    },
)

Purpose: This code creates a user proxy agent named “Admin”.

Function:

System Message: Uses templateVal_1 as its initial message to set the context.
Human Input Mode: Set to "TERMINATE", meaning the agent asks for human input only when a termination message is received or the auto-reply limit is reached.
Auto-Reply Limit: Can automatically reply up to 10 times without human intervention.
Termination Condition: A message is considered a termination message if it ends with the word “TERMINATE”.
Code Execution: Configured to execute code in the directory specified by WORK_DIR without using Docker.

Role: Acts as an intermediary between the user and the assistant, handling interactions and managing the conversation flow.
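The termination condition is plain Python, so it can be tried outside AutoGen. The sketch below restates the lambda as a named function; the extra guard for a missing or None content value is my own addition and is not part of the original snippet.

```python
def is_termination_msg(message):
    """True when the message content ends with the word TERMINATE."""
    content = message.get("content") or ""  # guard against a missing or None content
    return content.rstrip().endswith("TERMINATE")


print(is_termination_msg({"content": "All done. TERMINATE\n"}))
print(is_termination_msg({"content": "TERMINATE is coming later"}))
```

Trailing whitespace is stripped first, so a final newline after TERMINATE still counts as a termination message.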

Creating the Engineer Agent:

engineer = autogen.AssistantAgent(
    name="Engineer",
    llm_config={
        "config_list": config_list,
    },
    system_message=templateVal_2,
)

Purpose: Creates an assistant agent named “Engineer”.

Function: Uses templateVal_2 as its system message to define its expertise in engineering matters.

Role: Specializes in technical and engineering aspects of the problem.

Creating the Game Designer Agent:

game_designer = autogen.AssistantAgent(
    name="GameDesigner",
    llm_config={
        "config_list": config_list,
    },
    system_message=templateVal_3,
)

Purpose: Creates an assistant agent named “GameDesigner”.

Function: Uses templateVal_3 to set its focus on game design.

Role: Provides insights and solutions related to game design aspects.

Creating the Planner Agent:

planner = autogen.AssistantAgent(
    name="Planer",
    llm_config={
        "config_list": config_list,
    },
    system_message=templateVal_4,
)

Purpose: Creates an assistant agent named “Planer” (likely intended to be “Planner”).

Function: Uses templateVal_4 to define its role in planning.

Role: Responsible for organizing and planning tasks to solve the problem.

Creating the Critic Agent:

critic = autogen.AssistantAgent(
    name="Critic",
    llm_config={
        "config_list": config_list,
    },
    system_message=templateVal_5,
)

Purpose: Creates an assistant agent named “Critic”.

Function: Uses templateVal_5 to set its function as a critic.

Role: Provides feedback, critiques solutions, and helps improve the overall response.

Setting Up Logging:

logging.basicConfig(level=logging.ERROR)
logger = logging.getLogger(__name__)

Purpose: Configures the logging system.

Function: Sets the logging level to only capture error messages to avoid cluttering the output.

Role: Helps in debugging by capturing and displaying error messages.
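To see the effect of the ERROR level, here is a small self-contained sketch. It uses a separate logger that writes into an in-memory buffer, purely so that the output can be inspected; the logger name autogen_demo is made up for this example.

```python
import io
import logging

stream = io.StringIO()

demo_logger = logging.getLogger("autogen_demo")  # hypothetical logger name
demo_logger.setLevel(logging.ERROR)              # same level as in the snippet above
demo_logger.propagate = False                    # keep the demo isolated from the root logger
demo_logger.addHandler(logging.StreamHandler(stream))

demo_logger.info("agent started")   # below ERROR, so it is dropped
demo_logger.error("chat failed")    # ERROR and above are kept

captured = stream.getvalue()
print(captured)
```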

Defining the buildAndPlay Method:

def buildAndPlay(self, inputPrompt):
    try:
        user_proxy.initiate_chat(
            assistant,
            message=f"We need to solve the following problem: {inputPrompt}. "
                    "Please coordinate with the admin, engineer, game_designer, planner and critic to provide a comprehensive solution. "
        )

        return 0
    except Exception as e:
        x = str(e)
        print('Error: <<Real-time Translation>>: ', x)

        return 1

Purpose: Defines a method to initiate the problem-solving process.

Function:

Parameters: Takes inputPrompt, which is the problem to be solved.
Action:
- Calls user_proxy.initiate_chat() to start a conversation between the user proxy agent and the assistant agent.
- Sends a message requesting coordination among all agents to provide a comprehensive solution to the problem.
Error Handling: If an exception occurs, it prints an error message and returns 1.

Role: Initiates collaboration among all agents to solve the provided problem.
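The return-code pattern of buildAndPlay can be exercised without any API key by passing in a stand-in for user_proxy.initiate_chat. The function build_and_play and the two fake chat functions below are hypothetical and exist only for this sketch.

```python
def build_and_play(initiate_chat, input_prompt):
    """Start the chat through the supplied callable; return 0 on success, 1 on failure."""
    try:
        initiate_chat(
            message=f"We need to solve the following problem: {input_prompt}. "
                    "Please coordinate with the admin, engineer, game_designer, "
                    "planner and critic to provide a comprehensive solution. "
        )
        return 0
    except Exception as e:
        print("Error:", e)
        return 1


sent = []


def fake_chat(message):
    """Stand-in that just records the outgoing message."""
    sent.append(message)


def failing_chat(message):
    """Stand-in that simulates a failing API call."""
    raise RuntimeError("no API key")


ok = build_and_play(fake_chat, "build a snake game")
failed = build_and_play(failing_chat, "build a snake game")
print(ok, failed)
```

A caller can then branch on the returned value, for example to show a friendly error when the chat could not be started.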

Summary of the Workflow:

Agents Setup: Multiple agents with specialized roles are created.
Initiating Conversation: The buildAndPlay method starts a conversation, asking agents to collaborate.
Problem Solving: Agents communicate and coordinate to provide a comprehensive solution to the input problem.
Error Handling: The system catches any errors that occur during execution, prints them, and returns 1.

We’ll continue to discuss this topic in the upcoming post.

I’ll bring some more exciting topics in the coming days from the Python verse.

Till then, Happy Avenging! 🙂
