<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[LangDB]]></title><description><![CDATA[The Fastest Enterprise AI Gateway. Secure, govern, and optimize AI Traffic across 250+ LLMs using OpenAI-Compatible APIs]]></description><link>https://blog.langdb.ai</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1734672041277/766b819f-87d0-4bb4-a8be-f5c6359803fa.png</url><title>LangDB</title><link>https://blog.langdb.ai</link></image><generator>RSS for Node</generator><lastBuildDate>Wed, 08 Apr 2026 01:08:42 GMT</lastBuildDate><atom:link href="https://blog.langdb.ai/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Deep Integration: Building Multi-Agent Workflows with CrewAI & LangDB AI Gateway]]></title><description><![CDATA[In the world of AI agent development, the challenge isn't just building individual agents—it's creating systems where agents can seamlessly collaborate, access the right tools, and leverage the best models for each task. 
What if you could orchestrate...]]></description><link>https://blog.langdb.ai/deep-integration-building-multi-agent-workflows-with-crewai-and-langdb-ai-gateway</link><guid isPermaLink="true">https://blog.langdb.ai/deep-integration-building-multi-agent-workflows-with-crewai-and-langdb-ai-gateway</guid><category><![CDATA[CrewAI]]></category><category><![CDATA[LangDB]]></category><category><![CDATA[llm]]></category><category><![CDATA[AI]]></category><category><![CDATA[agentic AI]]></category><dc:creator><![CDATA[Mrunmay Shelar]]></dc:creator><pubDate>Wed, 30 Jul 2025 11:29:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1753874550719/a0d6d555-40c4-4882-a12e-9c9506111a7f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the world of AI agent development, the challenge isn't just building individual agents—it's creating systems where agents can seamlessly collaborate, access the right tools, and leverage the best models for each task. What if you could orchestrate complex multi-agent workflows with built-in model access, dynamic tool management, and complete observability?</p>
<p>In this post, we'll show you how CrewAI and LangDB work together to create a deeply integrated agent development platform.</p>
<h2 id="heading-tldr">TL;DR:</h2>
<p>This guide demonstrates the deep integration between <a target="_blank" href="https://www.crewai.com/">CrewAI</a> and the <a target="_blank" href="https://langdb.ai/">LangDB AI Gateway</a>, showing how to build sophisticated multi-agent workflows with seamless model access, dynamic tool management, and built-in observability. We'll use a report generation system as a practical example to showcase how these technologies work together as a unified platform, giving you access to 350+ LLMs.</p>
<p><a target="_blank" href="https://app.langdb.ai/sharing/threads/3becbfed-a1be-ae84-ea3c-4942867a3e22"><img src="https://docs.langdb.ai/~gitbook/image?url=https%3A%2F%2F2821389904-files.gitbook.io%2F%7E%2Ffiles%2Fv0%2Fb%2Fgitbook-x-prod.appspot.com%2Fo%2Fspaces%252FjwXvc8GvLP6kUvOASaLP%252Fuploads%252Fs2mLVieySjZEiEe0Vk7N%252FScreenshot%2520from%25202025-06-16%252020-50-02.png%3Falt%3Dmedia%26token%3D32c732db-a610-4447-81ff-fbee27209d62&amp;width=768&amp;dpr=4&amp;quality=100&amp;sign=3f229ad9&amp;sv=2" alt="Final LangDB Thread view for the CrewAI workflow" /></a></p>
<blockquote>
<p>Full Conversation: <a target="_blank" href="https://app.langdb.ai/sharing/threads/3becbfed-a1be-ae84-ea3c-4942867a3e22">https://app.langdb.ai/sharing/threads/3becbfed-a1be-ae84-ea3c-4942867a3e22</a></p>
</blockquote>
<p>This team of agents collaborates to deliver comprehensive reports by combining web research for current information with analysis and professional writing. You can see a full trace of the entire workflow execution.</p>
<h2 id="heading-the-code">The Code</h2>
<p>You can find the complete source code for this project on GitHub:</p>
<ul>
<li><strong>LangDB Samples</strong>: <a target="_blank" href="https://github.com/langdb/langdb-samples/tree/main/examples/crewai/report-writing-agent">https://github.com/langdb/langdb-samples/tree/main/examples/crewai/report-writing-agent</a></li>
</ul>
<h2 id="heading-the-integration-crewai-langdb-ai-gateway-as-a-unified-platform">The Integration: CrewAI + LangDB AI Gateway as a Unified Platform</h2>
<p>Our system demonstrates how CrewAI and the LangDB AI Gateway work together as a deeply integrated platform:</p>
<ul>
<li><p><strong>CrewAI</strong>: Provides the orchestration framework for multi-agent workflows</p>
</li>
<li><p><strong>LangDB AI Gateway</strong>: Delivers the AI Gateway capabilities, including access to 350+ models, model management, tool management, and observability</p>
</li>
<li><p><strong>Unified Experience</strong>: Both technologies work seamlessly together, appearing as a single development platform</p>
</li>
</ul>
<p>This integration enables you to build sophisticated agent systems without worrying about the underlying infrastructure complexity.</p>
<blockquote>
<p>Check out: <a target="_blank" href="https://docs.langdb.ai/guides/building-agents/building-reporting-writing-agent-using-crewai">https://docs.langdb.ai/guides/building-agents/building-reporting-writing-agent-using-crewai</a> and <a target="_blank" href="https://docs.langdb.ai/features/tracing">https://docs.langdb.ai/features/tracing</a></p>
</blockquote>
<h2 id="heading-seamless-integration-setup">Seamless Integration Setup</h2>
<p>The integration between CrewAI and the LangDB AI Gateway is designed to be as simple as possible. With just a few lines of code, you get access to over 350 models, dynamic tooling, and complete observability.</p>
<h3 id="heading-initialize-the-integration">Initialize the Integration</h3>
<p>The first step is to initialize the LangDB integration with CrewAI. This single call enables all the advanced features:</p>
<pre><code class="lang-python"><span class="hljs-comment"># main.py</span>
<span class="hljs-keyword">from</span> pylangdb.crewai <span class="hljs-keyword">import</span> init
<span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv

<span class="hljs-comment"># Load environment variables and initialize LangDB integration</span>
load_dotenv()
init()  <span class="hljs-comment"># This enables deep integration with the LangDB AI Gateway</span>
</code></pre>
<h3 id="heading-configure-the-langdb-ai-gateway">Configure the LangDB AI Gateway</h3>
<p>Set up your LangDB AI Gateway credentials to enable model access and tool management:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">export</span> LANGDB_API_KEY=<span class="hljs-string">"&lt;your_langdb_api_key&gt;"</span>
<span class="hljs-built_in">export</span> LANGDB_PROJECT_ID=<span class="hljs-string">"&lt;your_langdb_project_id&gt;"</span>
<span class="hljs-built_in">export</span> LANGDB_API_BASE_URL=<span class="hljs-string">'https://api.us-east-1.langdb.ai'</span>
</code></pre>
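<p>Because a missing credential only surfaces later as an authentication error, it can help to fail fast at startup. Here is a small convenience helper of our own (a sketch, not part of the <code>pylangdb</code> SDK):</p>

```python
import os

# The three settings the gateway integration expects, per the exports above
REQUIRED_VARS = ["LANGDB_API_KEY", "LANGDB_PROJECT_ID", "LANGDB_API_BASE_URL"]

def check_langdb_env(env=os.environ):
    """Raise early if any LangDB setting is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing LangDB settings: {', '.join(missing)}")
```

<p>Calling <code>check_langdb_env()</code> right after <code>load_dotenv()</code> turns a confusing runtime failure into a clear configuration error.</p>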
<h3 id="heading-create-langdb-ai-gateway-enabled-llms">Create LangDB AI Gateway-Enabled LLMs</h3>
<p>Define a helper function that creates LLMs with full LangDB AI Gateway integration:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> crewai <span class="hljs-keyword">import</span> LLM
<span class="hljs-keyword">import</span> os

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">create_llm</span>(<span class="hljs-params">model</span>):</span>
    <span class="hljs-keyword">return</span> LLM(
        model=model,
        api_key=os.environ.get(<span class="hljs-string">"LANGDB_API_KEY"</span>),
        base_url=os.environ.get(<span class="hljs-string">"LANGDB_API_BASE_URL"</span>),
        extra_headers={
            <span class="hljs-string">"x-project-id"</span>: os.environ.get(<span class="hljs-string">"LANGDB_PROJECT_ID"</span>)
        }
    )
</code></pre>
<h2 id="heading-deep-integration-features">Deep Integration Features</h2>
<h3 id="heading-model-selection-across-350-models">Model Selection across 350+ Models</h3>
<p>The LangDB AI Gateway's model access capabilities are fully integrated with CrewAI. You can specify any of the 350+ supported models, and the LangDB AI Gateway will route the request for you:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Each agent can use different models seamlessly</span>
<span class="hljs-meta">@agent</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">researcher</span>(<span class="hljs-params">self</span>) -&gt; Agent:</span>
    <span class="hljs-keyword">return</span> Agent(
        config=self.agents_config[<span class="hljs-string">'researcher'</span>],
        llm=create_llm(<span class="hljs-string">"openai/langdb/reportresearcher_9wzgx5n5"</span>) <span class="hljs-comment"># Virtual Model with tools</span>
    )

<span class="hljs-meta">@agent</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">analyst</span>(<span class="hljs-params">self</span>) -&gt; Agent:</span>
    <span class="hljs-keyword">return</span> Agent(
        config=self.agents_config[<span class="hljs-string">'analyst'</span>],
        llm=create_llm(<span class="hljs-string">"openai/anthropic/claude-3.7-sonnet"</span>) <span class="hljs-comment"># Direct model access</span>
    )

<span class="hljs-meta">@agent</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">report_writer</span>(<span class="hljs-params">self</span>) -&gt; Agent:</span>
    <span class="hljs-keyword">return</span> Agent(
        config=self.agents_config[<span class="hljs-string">'report_writer'</span>],
        llm=create_llm(<span class="hljs-string">"openai/gpt-4o"</span>) <span class="hljs-comment"># Another model provider</span>
    )
</code></pre>
<h3 id="heading-dynamic-tool-management">Dynamic Tool Management</h3>
<p>The LangDB AI Gateway's <a target="_blank" href="https://docs.langdb.ai/concepts/virtual-models">Virtual Models</a> and <a target="_blank" href="https://docs.langdb.ai/concepts/virtual-mcp-servers">Virtual MCPs</a> integrate seamlessly with CrewAI agents. Tools are managed centrally in the LangDB AI Gateway but appear natively to your CrewAI agents:</p>
<pre><code class="lang-python"><span class="hljs-comment"># The researcher agent automatically gets access to web search tools</span>
<span class="hljs-comment"># through the LangDB Virtual Model, without any additional configuration</span>
<span class="hljs-meta">@agent</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">researcher</span>(<span class="hljs-params">self</span>) -&gt; Agent:</span>
    <span class="hljs-keyword">return</span> Agent(
        config=self.agents_config[<span class="hljs-string">'researcher'</span>],
        llm=create_llm(<span class="hljs-string">"openai/langdb/reportresearcher_9wzgx5n5"</span>) <span class="hljs-comment"># Tools included automatically</span>
    )
</code></pre>
<h3 id="heading-built-in-observability">Built-in Observability</h3>
<p>Every interaction is automatically traced and observable through the LangDB AI Gateway's integrated tracing system:</p>
<pre><code class="lang-python"><span class="hljs-comment"># No additional tracing code needed - it's all automatic</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">generate_report</span>(<span class="hljs-params">topic</span>):</span>
    crew_instance = ReportGenerationCrew()
    result = crew_instance.crew().kickoff(inputs={<span class="hljs-string">"topic"</span>: topic})
    <span class="hljs-keyword">return</span> result  <span class="hljs-comment"># Full trace automatically captured in the LangDB AI Gateway</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753873041808/81dc834e-b175-46ae-9e74-cb4645287494.png" alt="Screenshot of interface displaying a task management dashboard. It shows a list of processes with their durations, a visual timeline of task execution, and detailed metadata about a specific run, including trace and thread IDs and timestamps. The background is dark, and the layout is divided into sections for easy navigation." class="image--center mx-auto" /></p>
<h2 id="heading-advanced-integration-capabilities">Advanced Integration Capabilities</h2>
<h3 id="heading-virtual-model-integration">Virtual Model Integration</h3>
<p>LangDB AI Gateway <a target="_blank" href="https://docs.langdb.ai/concepts/virtual-models">Virtual Models</a> work seamlessly with CrewAI agents. You can create models with specific capabilities and use them directly:</p>
<pre><code class="lang-python"><span class="hljs-comment"># This agent automatically gets web search capabilities</span>
<span class="hljs-comment"># through the LangDB Virtual Model configuration</span>
<span class="hljs-meta">@agent</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">researcher</span>(<span class="hljs-params">self</span>) -&gt; Agent:</span>
    <span class="hljs-keyword">return</span> Agent(
        config=self.agents_config[<span class="hljs-string">'researcher'</span>],
        llm=create_llm(<span class="hljs-string">"openai/langdb/reportresearcher_9wzgx5n5"</span>)
    )
</code></pre>
<p>The Virtual Model is configured in the LangDB AI Gateway UI to include:</p>
<ul>
<li><p>Base model (e.g., GPT-4.1)</p>
</li>
<li><p>Attached MCP tools (e.g., Tavily Search)</p>
</li>
<li><p>Custom instructions and parameters</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753872721466/a3678e9f-a7fd-4a54-ad76-362feea24b19.png" alt="LangDB UI showing the Virtual Model configuration with an attached MCP Server" class="image--center mx-auto" /></p>
<h3 id="heading-mcp-tool-integration">MCP Tool Integration</h3>
<p>The LangDB AI Gateway's <a target="_blank" href="https://docs.langdb.ai/concepts/virtual-mcp-servers">MCP (Model Context Protocol)</a> tools integrate with Virtual Models:</p>
<p><img src="https://docs.langdb.ai/~gitbook/image?url=https%3A%2F%2F2821389904-files.gitbook.io%2F%7E%2Ffiles%2Fv0%2Fb%2Fgitbook-x-prod.appspot.com%2Fo%2Fspaces%252FjwXvc8GvLP6kUvOASaLP%252Fuploads%252Fma2UJXqlZwy2NY7hb9bQ%252FSettingVM.gif%3Falt%3Dmedia%26token%3D2a69ebbd-89ba-4b7f-b67f-488607fdbfb6&amp;width=768&amp;dpr=4&amp;quality=100&amp;sign=8261b999&amp;sv=2" alt="Setting Up Virtual MCP Server With Virtual Model" /></p>
<p>This means:</p>
<ul>
<li><p><strong>No Tool Configuration</strong>: Tools are managed in the LangDB AI Gateway UI</p>
</li>
<li><p><strong>Dynamic Updates</strong>: Change tools without redeploying agents</p>
</li>
<li><p><strong>Automatic Tracing</strong>: All tool calls are traced in the LangDB AI Gateway</p>
</li>
<li><p><strong>Cost Tracking</strong>: Tool usage costs are tracked automatically</p>
</li>
</ul>
<h3 id="heading-multi-provider-model-access">Multi-Provider Model Access</h3>
<p>The LangDB AI Gateway's model access capabilities are fully integrated with CrewAI, giving you access to 350+ models across providers:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Each agent can use different models seamlessly</span>
<span class="hljs-meta">@agent</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">researcher</span>(<span class="hljs-params">self</span>) -&gt; Agent:</span>
    <span class="hljs-keyword">return</span> Agent(
        config=self.agents_config[<span class="hljs-string">'researcher'</span>],
        llm=create_llm(<span class="hljs-string">"openai/langdb/reportresearcher_9wzgx5n5"</span>) <span class="hljs-comment"># OpenAI + tools</span>
    )

<span class="hljs-meta">@agent</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">analyst</span>(<span class="hljs-params">self</span>) -&gt; Agent:</span>
    <span class="hljs-keyword">return</span> Agent(
        config=self.agents_config[<span class="hljs-string">'analyst'</span>],
        llm=create_llm(<span class="hljs-string">"openai/anthropic/claude-3.7-sonnet"</span>) <span class="hljs-comment"># Anthropic</span>
    )

<span class="hljs-meta">@agent</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">report_writer</span>(<span class="hljs-params">self</span>) -&gt; Agent:</span>
    <span class="hljs-keyword">return</span> Agent(
        config=self.agents_config[<span class="hljs-string">'report_writer'</span>],
        llm=create_llm(<span class="hljs-string">"openai/gpt-4o"</span>) <span class="hljs-comment"># OpenAI</span>
    )
</code></pre>
<h2 id="heading-integrated-workflow-management">Integrated Workflow Management</h2>
<h3 id="heading-crew-definition-with-langdb-ai-gateway-integration">Crew Definition with LangDB AI Gateway Integration</h3>
<p>The crew definition remains simple while leveraging all LangDB AI Gateway capabilities:</p>
<pre><code class="lang-python"><span class="hljs-meta">@crew</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">crew</span>(<span class="hljs-params">self</span>) -&gt; Crew:</span>
    <span class="hljs-keyword">return</span> Crew(
        agents=[self.researcher(), self.analyst(), self.report_writer()],
        tasks=[self.research_task(), self.analysis_task(), self.report_writing_task()],
        process=Process.sequential
    )
</code></pre>
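<p>Under <code>Process.sequential</code>, tasks run in order and each task's output becomes context for the next. Stripped of the framework, the control flow amounts to something like this (a plain-Python illustration with stand-in task functions, not CrewAI internals):</p>

```python
def run_sequential(tasks, topic):
    """Run tasks in order, feeding each output into the next task's context."""
    context = ""
    for task in tasks:
        context = task(topic, context)
    return context

# Stand-ins for the researcher, analyst, and report-writer steps
research = lambda topic, ctx: f"findings on {topic}"
analyze = lambda topic, ctx: f"analysis of ({ctx})"
write = lambda topic, ctx: f"report based on ({ctx})"

result = run_sequential([research, analyze, write], "AI marketing")
```

<p>The real crew adds model calls, tool access, and tracing at every hop, but the sequential handoff is the same shape.</p>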
<h3 id="heading-task-execution-with-built-in-observability">Task Execution with Built-in Observability</h3>
<p>Tasks execute with full LangDB AI Gateway integration:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">generate_report</span>(<span class="hljs-params">topic</span>):</span>
    crew_instance = ReportGenerationCrew()
    result = crew_instance.crew().kickoff(inputs={<span class="hljs-string">"topic"</span>: topic})
    <span class="hljs-keyword">return</span> result  <span class="hljs-comment"># Full trace automatically available in the LangDB AI Gateway</span>
</code></pre>
<h2 id="heading-integration-benefits">Integration Benefits</h2>
<h3 id="heading-simplified-development">Simplified Development</h3>
<ul>
<li><p><strong>Single Setup</strong>: One initialization call enables all features</p>
</li>
<li><p><strong>No Tool Management</strong>: Tools are managed centrally in the LangDB AI Gateway</p>
</li>
<li><p><strong>Automatic Tracing</strong>: No additional observability code needed</p>
</li>
</ul>
<h3 id="heading-dynamic-capabilities">Dynamic Capabilities</h3>
<ul>
<li><p><strong>Model Switching</strong>: Switch among 350+ models without code changes</p>
</li>
<li><p><strong>Tool Updates</strong>: Add/remove tools through the LangDB AI Gateway UI</p>
</li>
<li><p><strong>Model Flexibility</strong>: Mix and match models from different providers seamlessly</p>
</li>
</ul>
<h3 id="heading-production-ready">Production Ready</h3>
<ul>
<li><p><strong>Built-in Observability</strong>: Complete traces for every execution</p>
</li>
<li><p><strong>Cost Tracking</strong>: Automatic cost and usage monitoring</p>
</li>
<li><p><strong>Performance Monitoring</strong>: Latency and performance metrics</p>
</li>
<li><p><strong>Error Handling</strong>: Integrated error tracking and debugging</p>
</li>
</ul>
<h2 id="heading-running-the-integrated-system">Running the Integrated System</h2>
<p>Execute the workflow with full LangDB integration:</p>
<pre><code class="lang-python"><span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    generate_report(<span class="hljs-string">"The Impact of AI on Social Media Marketing in 2024"</span>)
</code></pre>
<p>The system automatically:</p>
<ul>
<li><p>Provides access to the appropriate models from 350+ choices</p>
</li>
<li><p>Manages tool access</p>
</li>
<li><p>Captures complete traces</p>
</li>
<li><p>Tracks costs and performance</p>
</li>
</ul>
<h2 id="heading-real-world-integration-example">Real-World Integration Example</h2>
<p>Here's what the integration looks like in practice:</p>
<pre><code class="lang-markdown"><span class="hljs-section"># The Impact of AI on Social Media Marketing in 2024</span>

<span class="hljs-section">## Executive Summary</span>

Artificial Intelligence has fundamentally transformed social media marketing in 2024, creating new opportunities and challenges for businesses worldwide. This report examines the current state of AI integration in social media marketing, key trends, and strategic implications for marketers.

<span class="hljs-section">## Key Findings</span>

<span class="hljs-section">### 1. AI-Powered Content Creation</span>
<span class="hljs-bullet">-</span> <span class="hljs-strong">**Automated Content Generation**</span>: 73% of marketers now use AI tools for content creation
<span class="hljs-bullet">-</span> <span class="hljs-strong">**Personalization at Scale**</span>: AI enables hyper-personalized content delivery to specific audience segments
<span class="hljs-bullet">-</span> <span class="hljs-strong">**Real-time Optimization**</span>: Dynamic content adjustment based on performance metrics

<span class="hljs-section">### 2. Advanced Analytics and Insights</span>
<span class="hljs-bullet">-</span> <span class="hljs-strong">**Predictive Analytics**</span>: AI models forecast campaign performance with 85% accuracy
<span class="hljs-bullet">-</span> <span class="hljs-strong">**Sentiment Analysis**</span>: Real-time brand sentiment monitoring across platforms
<span class="hljs-bullet">-</span> <span class="hljs-strong">**Competitive Intelligence**</span>: Automated tracking of competitor strategies and performance
</code></pre>
<p>Every step of this report generation was powered by the deep integration between CrewAI and LangDB, with complete observability into the process.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>The integration between CrewAI and the LangDB AI Gateway creates a unified platform for building sophisticated multi-agent systems. This deep integration provides:</p>
<ul>
<li><p><strong>Seamless Development</strong>: Build complex workflows with simple, clean code</p>
</li>
<li><p><strong>Dynamic Capabilities</strong>: Change models and tools without redeployment</p>
</li>
<li><p><strong>Built-in Observability</strong>: Complete visibility into every aspect of your workflows</p>
</li>
<li><p><strong>Production Ready</strong>: Enterprise-grade monitoring and management</p>
</li>
</ul>
<p>This architecture enables rapid development and iteration, allowing you to build truly powerful and intelligent agentic systems for any domain.</p>
<p>Ready to build your own? Start building for free on the <a target="_blank" href="https://docs.langdb.ai/quick-start">LangDB AI Gateway</a> or explore <a target="_blank" href="https://docs.crewai.com/">CrewAI</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Kimi K2 vs Grok 4: Open-Source Challenger vs Premium Powerhouse]]></title><description><![CDATA[Choosing the right large-language model (LLM) has moved beyond “GPT-4o or bust.” In 2025, Kimi K2 (Moonshot AI) and Grok 4 (xAI) give developers two very different yet highly capable options: an open-source trillion-parameter Mixture-of-Experts model...]]></description><link>https://blog.langdb.ai/kimi-k2-vs-grok-4-open-source-challenger-vs-premium-powerhouse</link><guid isPermaLink="true">https://blog.langdb.ai/kimi-k2-vs-grok-4-open-source-challenger-vs-premium-powerhouse</guid><category><![CDATA[LangDB]]></category><category><![CDATA[grok]]></category><category><![CDATA[Kimi]]></category><category><![CDATA[llm]]></category><dc:creator><![CDATA[Mrunmay Shelar]]></dc:creator><pubDate>Thu, 24 Jul 2025 05:59:46 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1753336692381/f02c8b84-25c0-4450-a2b5-69b6f5537777.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Choosing the right large-language model (LLM) has moved beyond “GPT-4o or bust.” In 2025, <a target="_blank" href="https://langdb.ai/app/providers/openrouter/kimi-k2"><strong>Kimi K2</strong></a> (Moonshot AI) and <a target="_blank" href="https://langdb.ai/app/providers/xai/grok-4"><strong>Grok 4</strong></a> (xAI) give developers two very different yet highly capable options: an open-source trillion-parameter Mixture-of-Experts model on one side and a premium, real-time, multi-agent powerhouse on the other. This article walks through their architectures, benchmark results, practical use cases, and how you can access both through LangDB AI gateway.</p>
<h2 id="heading-tldr"><strong>TL;DR:</strong></h2>
<p>In a nutshell, <strong>Kimi K2</strong> is an open-source MoE with 1 T parameters and a 128 K token context, self-hostable and priced at just $0.15/$2.50 per million tokens—ideal for high-volume or agentic workflows—while <strong>Grok 4</strong> is a proprietary dense model with 1.7 T parameters, a 256 K token window plus live web/X hooks, costing $3/$15 per million tokens and excelling at deep reasoning and real-time data. Benchmarks show Grok leading on live-execution and toughest reasoning tasks, with Kimi matching on static coding and general-knowledge tests at one-tenth the cost. In a real-world LangGraph run, Kimi K2 completed the pipeline in half the time (86 s vs 168 s) at one-tenth the cost ($0.012 vs $0.128).</p>
<h2 id="heading-architecture">Architecture</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Model</td><td>Core design</td><td>Params (total / active)</td><td>Context window</td><td>Stand-out features</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Kimi K2</strong></td><td>Mixture-of-Experts</td><td>1 T / 32 B active</td><td>128 K tokens (up to 1 M offline)</td><td>MuonClip optimizer, open weights</td></tr>
<tr>
<td><strong>Grok 4</strong></td><td>Dense + RL-tuned; “Heavy” = multi-agent</td><td>≈ 1.7 T</td><td>256 K via API</td><td>Real-time X/Twitter &amp; web search, Colossus-scale training</td></tr>
</tbody>
</table>
</div><h3 id="heading-kimi-k2">Kimi K2</h3>
<p>Moonshot’s MoE activates just 32 B parameters per token, giving near-GPT-4o performance at far lower compute. The open Apache 2.0 license plus 128 K context makes it attractive for self-hosting and agentic workflows.</p>
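<p>The compute savings are easy to sanity-check: with 32 B of 1 T parameters active, only a small slice of the network fires per token (back-of-envelope arithmetic that ignores routing overhead):</p>

```python
total_params = 1_000_000_000_000  # 1 T parameters in total
active_params = 32_000_000_000    # 32 B active per token

fraction = active_params / total_params
print(f"{fraction:.1%} of parameters active per token")
```
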
<h3 id="heading-grok-4">Grok 4</h3>
<p>xAI trained Grok 4 on 200 K H100 GPUs; the Heavy variant federates multiple Groks that “debate” their answers, boosting deep reasoning. Real-time data hooks mean answers stay current without extra retrieval plumbing.</p>
<h2 id="heading-benchmarks">Benchmarks</h2>
<p><img src="https://ppl-ai-code-interpreter-files.s3.amazonaws.com/web/direct-files/11b5068937ec17eed2f955ede19e7ac3/f128064b-6c9d-44c2-b6b8-7741ca5012df/c95f5002.png" alt="Bar chart titled &quot;LLM Benchmarks Comparison&quot; showing performance of four models: Kimi K2, Grok 4, Claude 4 Sonnet, and Gemini 2.5 Pro. Benchmarks include GPQA Diamond, MMLU Score, LiveCodeBench, and SWE-bench Verified, with scores ranging from about 55% to 95%." /></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Suite</td><td>Kimi K2</td><td>Grok 4</td><td>Notes</td></tr>
</thead>
<tbody>
<tr>
<td><strong>SWE-bench Verified</strong></td><td>65.8 % (71.6 % w/ parallel)</td><td><strong>73 %</strong></td><td>Real-world GitHub bug-fixing</td></tr>
<tr>
<td><strong>LiveCodeBench</strong></td><td>53.7 %</td><td><strong>79.4 %</strong></td><td>Code must compile &amp; run</td></tr>
<tr>
<td><strong>MMLU</strong></td><td><strong>89.5 %</strong></td><td>86.6 %</td><td>General knowledge</td></tr>
<tr>
<td><strong>GPQA Diamond</strong></td><td>75.1 %</td><td><strong>88.4 %</strong></td><td>Grad-level physics</td></tr>
</tbody>
</table>
</div><p><em>Take-away:</em> Grok 4 dominates the hardest reasoning and live-execution tasks; Kimi stays neck-and-neck on static coding and actually wins broad knowledge tests—all while being roughly an order of magnitude cheaper.</p>
<h2 id="heading-use-cases">Use Cases</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Scenario</td><td>Best fit</td><td>Rationale</td><td>Self-hostable?</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Autonomous agents &amp; CI/CD</strong></td><td><strong>Kimi K2</strong></td><td>Native sandboxed tool-calling + open plugin ecosystem</td><td>✅ Yes</td></tr>
<tr>
<td><strong>Whole-repo deep debugging</strong></td><td><strong>Grok 4 Heavy</strong></td><td>256 K context + multi-agent reasoning spots elusive bugs</td><td>❌ No</td></tr>
<tr>
<td><strong>Budget-constrained startups</strong></td><td><strong>Kimi K2</strong></td><td>$0.15 / $2.50 per M tokens vs $3 / $15 per M tokens; self-host option</td><td>✅ Yes</td></tr>
<tr>
<td><strong>Regulated enterprise, live data</strong></td><td><strong>Grok 4</strong></td><td>SOC 2/GDPR compliance; real-time search; enterprise support</td><td>❌ No</td></tr>
</tbody>
</table>
</div><p>Both models provide correct solutions, but Kimi K2’s open-source nature and lower cost make it more accessible for high-volume or repetitive tasks, while Grok 4’s premium features justify its higher price when you need complex reasoning or real-time data.</p>
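<p>Those per-token prices compound quickly at volume. A quick cost model using the prices from the table above (USD per million tokens; the workload numbers are purely illustrative):</p>

```python
def cost_usd(input_tokens, output_tokens, in_price, out_price):
    """Cost in USD given per-million-token input and output prices."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Illustrative workload: 1 M input tokens, 200 K output tokens
kimi = cost_usd(1_000_000, 200_000, 0.15, 2.50)
grok = cost_usd(1_000_000, 200_000, 3.00, 15.00)
print(f"Kimi K2: ${kimi:.2f}  Grok 4: ${grok:.2f}  ratio: {grok / kimi:.1f}x")
```

<p>At this workload the list prices put Grok 4 at roughly nine times the spend, before factoring in Kimi K2's self-hosting option.</p>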
<h2 id="heading-accessibility-through-langdb">Accessibility through LangDB</h2>
<p>Both models (alongside Claude 4, Gemini 2.5 Pro, and 300+ others) are available through <strong>LangDB’s OpenAI-compatible API</strong>.</p>
<p><strong>LangDB</strong> is the fastest enterprise AI gateway—fully built in Rust—to <strong>secure, govern, and optimize</strong> AI traffic across <strong>250+ LLMs</strong> via a single OpenAI-compatible API. Key features include:</p>
<ul>
<li><p><strong>Unified access</strong> to Kimi K2, Grok 4, Claude 4, Gemini 2.5 Pro, and hundreds more</p>
</li>
<li><p><strong>Observability &amp; tracing</strong> for every request and agent step</p>
</li>
<li><p><strong>Guardrails</strong> to enforce policy and compliance</p>
</li>
<li><p><strong>Cost control</strong> without changing your code</p>
</li>
<li><p><strong>Framework-agnostic</strong>—works seamlessly with LangChain, LangGraph, and any OpenAI-compatible library</p>
</li>
</ul>
<p>Integrate in minutes and let LangDB handle model management, metrics, and governance so you can focus on building.</p>
<h2 id="heading-real-world-langgraph-performance">Real-World LangGraph Performance</h2>
<p>To see these differences in action, we ran the same LangGraph data-extraction pipeline against both models (full traces linked below):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753331774748/a4a58bcf-87a2-4e7c-a747-487bf925fc0b.png" alt="Interface showing processing details of a complex meeting transcript. The screen displays task names, their execution times, and a visual timeline of activities. On the right, detailed logs provide trace and run IDs, start and finish times, and JSON input/output data." class="image--center mx-auto" /></p>
<p>Grok 4: <a target="_blank" href="https://app.langdb.ai/sharing/threads/4d25db11-e011-41be-b7bc-c12f7edee2fb">https://app.langdb.ai/sharing/threads/4d25db11-e011-41be-b7bc-c12f7edee2fb</a></p>
<p>Kimi K2: <a target="_blank" href="https://app.langdb.ai/sharing/threads/82403cde-533a-41b5-bf03-92abceb2b018">https://app.langdb.ai/sharing/threads/82403cde-533a-41b5-bf03-92abceb2b018</a></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Model</td><td>Cost (USD)</td><td>Time Taken (s)</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Grok 4</strong></td><td>0.128</td><td>167.87</td></tr>
<tr>
<td><strong>Kimi K2</strong></td><td>0.012</td><td>86.00</td></tr>
</tbody>
</table>
</div><blockquote>
<p><strong>See it in action:</strong></p>
<ul>
<li><p>LangGraph data-extraction guide → <a target="_blank" href="https://docs.langdb.ai/guides/building-agents/building-complex-data-extraction-with-langgraph">https://docs.langdb.ai/guides/building-agents/building-complex-data-extraction-with-langgraph</a></p>
</li>
<li><p>Full code examples → <a target="_blank" href="https://github.com/langdb/langdb-samples/tree/main/examples/langchain/langchain-data-extraction">https://github.com/langdb/langdb-samples/tree/main/examples/langchain/langchain-data-extraction</a></p>
</li>
</ul>
</blockquote>
<p>On the same LangGraph pipeline, Kimi K2 ran in roughly half the time and at one-tenth the cost of Grok 4. This real-world test underlines the cost-efficiency and speed advantages of an open-source MoE model for typical data-extraction workflows.</p>
<p>However, if your pipeline demands the deepest reasoning chains or the freshest real-time web results, Grok 4’s premium features may still be worth the extra spend and latency. Evaluate your throughput and SLA requirements to pick the best fit.</p>
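<p>A quick back-of-the-envelope projection helps with that evaluation. The figures below come straight from the benchmark table above (USD and seconds per pipeline run); the monthly-spend helper is illustrative arithmetic, not a pricing API.</p>

```python
# Per-run cost and wall-clock time from the LangGraph comparison above.
runs = {
    "grok-4": {"cost_usd": 0.128, "seconds": 167.87},
    "kimi-k2": {"cost_usd": 0.012, "seconds": 86.00},
}

def monthly_cost(model, runs_per_day, days=30):
    """Project monthly spend for a given daily pipeline volume."""
    return runs[model]["cost_usd"] * runs_per_day * days

cost_ratio = runs["grok-4"]["cost_usd"] / runs["kimi-k2"]["cost_usd"]
speedup = runs["grok-4"]["seconds"] / runs["kimi-k2"]["seconds"]
print(round(cost_ratio, 1), round(speedup, 1))
```

<p>At 100 runs per day the gap compounds quickly, which is why per-run numbers matter more than per-token sticker prices for batch workloads.</p>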
<h2 id="heading-conclusion">Conclusion</h2>
<p>AI’s future isn’t one-size-fits-all. <strong>Kimi K2</strong> democratizes near-SOTA coding for pennies and full control, while <strong>Grok 4</strong> pushes the reasoning ceiling and keeps answers current—at a premium. With LangDB, you can seamlessly plug both into your stack and choose the right model per task, without rewriting your integration. Pick your path, optimize your costs, and get building!</p>
]]></content:encoded></item><item><title><![CDATA[Designing Smart Multi-Agent Workflows with Agno & LangDB]]></title><description><![CDATA[Build a multi-agent financial analysis team with LangDB and Agno that can reason, research, and report on complex financial data.
In the world of finance, staying ahead requires more than just data; it requires deep analysis, contextual awareness, an...]]></description><link>https://blog.langdb.ai/designing-smart-multi-agent-workflows-with-agno-and-langdb</link><guid isPermaLink="true">https://blog.langdb.ai/designing-smart-multi-agent-workflows-with-agno-and-langdb</guid><category><![CDATA[LangDB]]></category><category><![CDATA[Agno]]></category><category><![CDATA[llm]]></category><category><![CDATA[agents]]></category><category><![CDATA[mcp]]></category><dc:creator><![CDATA[Mrunmay Shelar]]></dc:creator><pubDate>Thu, 17 Jul 2025 11:32:22 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1753073265965/6627505a-2b00-4c2d-bfc9-c310e68f9902.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Build a multi-agent financial analysis team with <a target="_blank" href="https://langdb.ai/">LangDB</a> and <a target="_blank" href="https://www.agno.com?utm_source=langdb&amp;utm_medium=partner-content&amp;utm_campaign=partner-technical&amp;utm_content=langdb">Agno</a> that can reason, research, and report on complex financial data.</p>
<p>In the world of finance, staying ahead requires more than just data; it requires deep analysis, contextual awareness, and collaborative reasoning. What if you could build a team of AI agents to do this for you? In this post, we'll show you how to build a sophisticated, multi-agent financial analysis team using LangDB and Agno.</p>
<h2 id="heading-tldr">TL;DR:</h2>
<p>This guide walks you through building a multi-agent workflow using Agno for orchestration and LangDB as the AI Gateway. We'll use a financial analysis team as a practical example to show how you can build sophisticated agent systems that are easy to manage and debug, thanks to LangDB's end-to-end tracing, dynamic tooling, and access to over 350 LLMs.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1752745323378/85a6a9c0-3fcf-4a98-a5e3-800de54e674c.png" alt="Full Conversation with Agno" class="image--center mx-auto" /></p>
<p>This team of agents collaborates to deliver in-depth insights on publicly traded companies by combining web research for market sentiment with hard financial data analysis. You can see a <a target="_blank" href="https://app.langdb.ai/sharing/threads/630b2ded-15ae-43d9-8a7a-d6dd9d649655">full trace of the final agent's execution</a>.</p>
<h2 id="heading-the-code">The Code</h2>
<p>You can find the complete source code for this project on GitHub:</p>
<ul>
<li><strong>LangDB Samples</strong>: <a target="_blank" href="https://github.com/langdb/langdb-samples/tree/main/examples/agno/reasoning-finance-team">https://github.com/langdb/langdb-samples/tree/main/examples/agno/reasoning-finance-team</a></li>
</ul>
<h2 id="heading-the-architecture-a-trio-of-financial-experts">The Architecture: A Trio of Financial Experts</h2>
<p>Our system is composed of two specialist agents orchestrated by a coordinating team:</p>
<ol>
<li><p><strong>Web Search Agent</strong>: Gathers the latest news and market sentiment from the internet.</p>
</li>
<li><p><strong>Finance Agent</strong>: Equipped with <code>YFinanceTools</code> to fetch and analyze quantitative stock data, including pricing, fundamentals, and analyst recommendations.</p>
</li>
<li><p><strong>Reasoning Finance Team</strong>: A coordinator that directs the two agents, synthesizes their findings, and produces a final, comprehensive report.</p>
</li>
</ol>
<p>LangDB provides the backbone for this system. As an AI Gateway, it enables seamless access to over 350 LLMs, simplifies tool integration, and provides full end-to-end tracing and observability into each agent's actions and the team's collaborative process.</p>
<blockquote>
<p>Check out: <a target="_blank" href="https://docs.agno.com/models/langdb">https://docs.agno.com/models/langdb</a> and <a target="_blank" href="https://docs.agno.com/observability/langdb">https://docs.agno.com/observability/langdb</a></p>
</blockquote>
<h2 id="heading-enhanced-tracing-with-pylangdbagnoinit">Enhanced Tracing with <code>pylangdb.agno.init()</code></h2>
<p>While you can use LangDB as a provider in Agno directly, calling <code>pylangdb.agno.init()</code> unlocks deeper, end-to-end tracing. This function provides additional metadata and observability by automatically instrumenting the entire Agno framework, giving you complete visibility into your agent's workflows.</p>
<pre><code class="lang-python"><span class="hljs-comment"># main.py</span>
<span class="hljs-keyword">from</span> pylangdb.agno <span class="hljs-keyword">import</span> init

<span class="hljs-comment"># Initialize LangDB for enhanced tracing *before* importing any Agno modules.</span>
init()

<span class="hljs-keyword">from</span> agno.agent <span class="hljs-keyword">import</span> Agent
<span class="hljs-keyword">from</span> agno.team <span class="hljs-keyword">import</span> Team
<span class="hljs-comment"># ... other imports</span>
</code></pre>
<p>As an official provider integrated with Agno, LangDB requires you to set up your credentials. You'll need to export your LangDB API key and Project ID as environment variables. You can find these in your <a target="_blank" href="https://app.langdb.ai/settings/api_keys">LangDB project settings</a>.</p>
<pre><code class="lang-bash"><span class="hljs-built_in">export</span> LANGDB_API_KEY=<span class="hljs-string">"&lt;your_langdb_api_key&gt;"</span>
<span class="hljs-built_in">export</span> LANGDB_PROJECT_ID=<span class="hljs-string">"&lt;your_langdb_project_id&gt;"</span>
</code></pre>
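<p>A missing credential tends to surface as a confusing failure deep inside a run, so it can help to fail fast at startup. This small helper is our own defensive sketch, not part of <code>pylangdb</code>; call it before <code>init()</code>.</p>

```python
import os

def require_env(*names):
    """Raise a clear error at startup if any required variable is missing."""
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {n: os.environ[n] for n in names}

# Example (run before pylangdb.agno.init() so a missing key fails loudly):
# creds = require_env("LANGDB_API_KEY", "LANGDB_PROJECT_ID")
```
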
<h2 id="heading-code-walkthrough-building-the-team">Code Walkthrough: Building the Team</h2>
<p>Let's look at how the agents and the team are defined.</p>
<h3 id="heading-the-web-search-agent-decoupled-and-dynamic">The Web Search Agent: Decoupled and Dynamic</h3>
<p>Instead of hard-coding a search tool, we assign the <code>web_agent</code> a LangDB Virtual Model. This decouples the agent's logic from the specific tools it uses.</p>
<pre><code class="lang-python">web_agent = Agent(
    name=<span class="hljs-string">"Web Search Agent"</span>,
    role=<span class="hljs-string">"Search the web for the information"</span>,
    model=LangDB(id=<span class="hljs-string">"langdb/search_agent_xmf4v5jk"</span>),
    instructions=<span class="hljs-string">"Always include sources"</span>
)
</code></pre>
<p>This virtual model is configured in the LangDB UI to provide search capabilities, which we'll cover in the next section.</p>
<h3 id="heading-the-finance-agent-the-quantitative-analyst">The Finance Agent: The Quantitative Analyst</h3>
<p>This agent is equipped with <code>YFinanceTools</code> to access a wide range of financial data. It runs on a capable model like Grok 4 and carries specific instructions to format its output professionally.</p>
<pre><code class="lang-python">finance_agent = Agent(
    name=<span class="hljs-string">"Finance AI Agent"</span>,
    role=<span class="hljs-string">"Analyse the given stock"</span>,
    model=LangDB(id=<span class="hljs-string">"xai/grok-4"</span>),
    tools=[YFinanceTools(
        stock_price=<span class="hljs-literal">True</span>,
        stock_fundamentals=<span class="hljs-literal">True</span>,
        analyst_recommendations=<span class="hljs-literal">True</span>,
        company_info=<span class="hljs-literal">True</span>,
        company_news=<span class="hljs-literal">True</span>
    )],
    instructions=[
        <span class="hljs-string">"Use tables to display stock prices, fundamentals (P/E, Market Cap), and recommendations."</span>,
        <span class="hljs-string">"Clearly state the company name and ticker symbol."</span>,
        <span class="hljs-string">"Focus on delivering actionable financial insights."</span>
    ]
)
</code></pre>
<h3 id="heading-the-coordinating-team-the-orchestrator">The Coordinating Team: The Orchestrator</h3>
<p>The <a target="_blank" href="https://docs.agno.com/teams/introduction"><code>ReasoningFinanceTeam</code></a> orchestrates the two specialist agents. It operates in <code>coordinate</code> mode, allowing it to delegate tasks, synthesize information, and ensure the final output is a comprehensive report.</p>
<pre><code class="lang-python">reasoning_finance_team = Team(
    name=<span class="hljs-string">"Reasoning Finance Team"</span>,
    mode=<span class="hljs-string">"coordinate"</span>,
    model=LangDB(id=<span class="hljs-string">"xai/grok-4"</span>),
    members=[web_agent, finance_agent],
    tools=[ReasoningTools(add_instructions=<span class="hljs-literal">True</span>)],
    instructions=[
        <span class="hljs-string">"Collaborate to provide comprehensive financial and investment insights"</span>,
        <span class="hljs-string">"Consider both fundamental analysis and market sentiment"</span>,
        <span class="hljs-string">"Present findings in a structured, easy-to-follow format"</span>,
    ],
    success_criteria=<span class="hljs-string">"The team has provided a complete financial analysis with data, visualizations, risk assessment, and actionable investment recommendations."</span>
)
</code></pre>
<h2 id="heading-dynamic-tooling-with-virtual-models-and-virtual-mcpshttpsdocslangdbaiconceptsvirtual-models"><a target="_blank" href="https://docs.langdb.ai/concepts/virtual-models">Dynamic Tooling with Virtual Models and Virtual MCPs</a></h2>
<p>To empower the <code>web_agent</code> with live web search capabilities without hard-coding tools, we configure a Virtual Model in LangDB. This model is backed by a Virtual MCP Server that provides the actual search functionality.</p>
<ol>
<li><p><strong>Create a Virtual MCP Server</strong>: In the LangDB UI, create a new Virtual MCP Server named <code>web-search-mcp</code> that uses the Tavily Search MCP.</p>
</li>
<li><p><strong>Create and Configure the Virtual Model</strong>: Create a new virtual model (e.g., <code>search-agent</code>) and attach the <code>web-search-mcp</code> to it.</p>
</li>
<li><p><strong>Use the Virtual Model ID</strong>: Copy the ID of your new virtual model and use it in the <code>web_agent</code> definition.</p>
</li>
</ol>
<p>This setup allows you to change the tools and models your agents use on the fly from the LangDB UI, without changing a single line of code.</p>
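<p>You can push the same idea into the one remaining hard-coded value, the virtual model ID itself, by reading it from configuration. This is a hypothetical helper of our own (the environment-variable naming scheme is invented); the default is the ID used in this post.</p>

```python
import os

def model_id_for(agent_name, default):
    """Resolve an agent's model ID from the environment, falling back to a default."""
    env_key = agent_name.upper().replace(" ", "_") + "_MODEL_ID"
    return os.environ.get(env_key, default)

# Falls back to the virtual model ID from this post unless overridden, e.g. by
# exporting WEB_SEARCH_AGENT_MODEL_ID before starting the process.
web_model_id = model_id_for("web search agent", "langdb/search_agent_xmf4v5jk")
print(web_model_id)
```

<p>With this in place, both the tools (via the Virtual MCP) and the model routing can change without a redeploy.</p>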
<h2 id="heading-running-the-team-and-observing-the-results">Running the Team and Observing the Results</h2>
<p>To run the team, simply call the <code>print_response</code> method with a detailed prompt:</p>
<pre><code class="lang-python">reasoning_finance_team.print_response(
    <span class="hljs-string">"""Compare the tech sector giants (AAPL, GOOGL, MSFT) performance:\n
    1. Get financial data for all three companies\n
    2. Analyze recent news affecting the tech sector\n
    3. Calculate comparative metrics and correlations\n
    4. Recommend portfolio allocation weights"""</span>
)
</code></pre>
<p>Every execution is captured in LangDB, providing a complete trace of the team's operations. This includes the initial prompt, each agent's contributions, the tools they used, and the final synthesized output. You can explore the <a target="_blank" href="https://app.langdb.ai/sharing/threads/73c91c58-eab7-4c6b-afe1-5ab6324f1ada">full, shareable trace</a>.</p>
<p>Here is a snippet of the final report generated by the agent team:</p>
<pre><code class="lang-text">┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                   Comprehensive Comparative Analysis of Tech Giants: AAPL, GOOGL, and MSFT                    ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

As the Reasoning Finance Team, we've conducted a thorough analysis of Apple Inc. (AAPL), Alphabet Inc. (GOOGL), and Microsoft Corporation (MSFT) based on the user's request. This includes financial data retrieval, recent news analysis, comparative metrics and correlations, and portfolio allocation recommendations. Our evaluation incorporates fundamental analysis (e.g., valuations, growth), market sentiment (e.g., news and analyst views), quantitative metrics (e.g., betas, correlations), and risk assessments. Data is current as of July 2025.

─────────────────────────────────────────────────────────────────────────────────────────────────────────────────

                                             1. Financial Data Overview                                             

Key financial data for each company, sourced from reliable APIs.                                                  

                                                   Stock Prices                                                   

   Metric               AAPL      GOOGL     MSFT                                                                   
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                                                               
   Current Price        $210.16   $182.97   $505.62                                                                
   52-Week High         $260.10   $207.05   $508.30                                                                
   52-Week Low          $169.21   $140.53   $344.79                                                                
   50-Day Moving Avg    $203.87   $170.88   $472.41                                                                
   200-Day Moving Avg   $222.55   $173.43   $427.18
</code></pre>
<p>And here's a view of the full trace in the LangDB UI, showing how the agents collaborated to produce the report.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1752745257507/743526c0-6437-4602-b21d-6e7fde8cbe82.png" alt="LangDB trace view for the financial agent" class="image--center mx-auto" /></p>
<h2 id="heading-full-observability-with-langdb-tracing">Full Observability with LangDB Tracing</h2>
<p>The "full observability" promised in our subtitle is delivered through LangDB's detailed tracing capabilities. When you run your Agno team, every action is captured, providing a transparent, hierarchical view of the entire workflow. Here’s what you can see in the trace:</p>
<ul>
<li><p><strong>Hierarchical Span View</strong>: The trace isn't a flat list of events but a tree of "spans." The top-level span represents the entire team's execution, with child spans for each agent's turn, tool call, and model invocation. This shows the exact flow of control and delegation.</p>
</li>
<li><p><strong>Input/Output for Each Step</strong>: For every span, you can inspect the exact inputs and outputs. This means you can see the precise query sent to the <code>Web Search Agent</code>, the articles it returned, the data requested by the <code>Finance Agent</code>, and the final synthesized response from the team. This level of detail is crucial for debugging.</p>
</li>
<li><p><strong>Latency and Performance</strong>: Each span is timestamped and includes latency information, allowing you to instantly identify bottlenecks. You can see exactly how long each tool call, model response, or agent deliberation took.</p>
</li>
<li><p><strong>Cost and Token Usage</strong>: For every LLM call, the trace displays the number of input and output tokens and the associated cost, giving you full transparency into your operational expenses.</p>
</li>
<li><p><strong>Automatic Metadata</strong>: Thanks to <code>pylangdb.agno.init()</code>, traces are automatically enriched with metadata, including agent names, the team name (<code>Reasoning Finance Team</code>), and the models used, making it easy to filter and search for specific traces in the LangDB UI.</p>
</li>
</ul>
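<p>To make the span hierarchy concrete, here is a small sketch of how cost and token totals roll up from child spans to the root. The span records are hypothetical and simplified; LangDB's actual trace schema will differ.</p>

```python
# Hypothetical span records: each span points at its parent, mirroring the
# team -> agent -> tool tree described above.
spans = [
    {"id": "team", "parent": None, "cost": 0.004, "tokens": 1200},
    {"id": "web_agent", "parent": "team", "cost": 0.001, "tokens": 300},
    {"id": "finance_agent", "parent": "team", "cost": 0.006, "tokens": 2100},
    {"id": "yfinance_tool", "parent": "finance_agent", "cost": 0.0, "tokens": 0},
]

def subtree_total(spans, root_id, field):
    """Sum a numeric field over a span and all of its descendants."""
    children = [s for s in spans if s["parent"] == root_id]
    own = next(s[field] for s in spans if s["id"] == root_id)
    return own + sum(subtree_total(spans, c["id"], field) for c in children)

print(subtree_total(spans, "team", "tokens"))  # total tokens for the whole run
print(subtree_total(spans, "team", "cost"))    # total spend for the whole run
```

<p>This is exactly the aggregation the trace UI performs for you: the root span's totals are the sum of everything beneath it.</p>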
<p>This granular, end-to-end visibility is what makes building, debugging, and managing complex agentic workflows with LangDB and Agno so powerful.</p>
<p>You can check out the full conversation with tracing here: <a target="_blank" href="https://app.langdb.ai/sharing/threads/630b2ded-15ae-43d9-8a7a-d6dd9d649655">https://app.langdb.ai/sharing/threads/630b2ded-15ae-43d9-8a7a-d6dd9d649655</a></p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>By combining Agno with the LangDB AI Gateway, we've built a financial analysis team that is:</p>
<ul>
<li><p><strong>Modular</strong>: Each agent has a specific, firewalled responsibility.</p>
</li>
<li><p><strong>Dynamic</strong>: We can change models and grant new tools on the fly from the LangDB UI without redeploying our agent.</p>
</li>
<li><p><strong>Observable</strong>: We get detailed traces of every interaction, making debugging and performance analysis straightforward.</p>
</li>
</ul>
<p>This architecture allows for rapid development and iteration, enabling you to build truly powerful and intelligent agentic systems for any domain.</p>
<p>Ready to build your own? <a target="_blank" href="https://app.langdb.ai/">Start building for free on LangDB</a> or Explore <a target="_blank" href="https://www.agno.com?utm_source=langdb&amp;utm_medium=partner-content&amp;utm_campaign=partner-technical&amp;utm_content=langdb">Agno</a> to orchestrate your agent workflows.</p>
]]></content:encoded></item><item><title><![CDATA[Grok 4: Fast Start Guide for Developers]]></title><description><![CDATA[Elon Musk’s xAI just dropped Grok 4, its most powerful large-language model yet. With perfect or near-perfect scores on top academic benchmarks, Grok 4 redefines what “general-purpose assistant” means for developers.
Grok 4 is xAI’s flagship LLM, opt...]]></description><link>https://blog.langdb.ai/grok-4-fast-start-guide-for-developers</link><guid isPermaLink="true">https://blog.langdb.ai/grok-4-fast-start-guide-for-developers</guid><category><![CDATA[Agno]]></category><category><![CDATA[grok]]></category><category><![CDATA[LangDB]]></category><category><![CDATA[tavily]]></category><dc:creator><![CDATA[Mrunmay Shelar]]></dc:creator><pubDate>Sun, 13 Jul 2025 18:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1752585732059/af8ab824-b6d7-4872-be11-77ac7326cca4.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Elon Musk’s xAI just dropped <strong>Grok 4</strong>, its most powerful large-language model yet. With perfect or near-perfect scores on top academic benchmarks, Grok 4 redefines what “general-purpose assistant” means for developers.</p>
<p>Grok 4 is xAI’s flagship LLM, optimized for deep reasoning, long-context understanding, and agentic workflows.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1752584797092/b88bdfc2-f19a-4747-a7eb-542038636b82.webp" alt class="image--center mx-auto" /></p>
<h2 id="heading-why-grok-4-matters">Why Grok 4 matters</h2>
<ul>
<li><p><strong>Graduate-level reasoning</strong> across STEM &amp; humanities</p>
</li>
<li><p><strong>Ultra-long context (256K tokens)</strong> — more than Anthropic Claude 4 Sonnet &amp; Opus (200K), o3 (200K), and DeepSeek R1 0528 (128K), but below Google Gemini 2.5 Pro (1M tokens) — ideal for large codebases and documents</p>
</li>
<li><p><strong>Multi-agent “Heavy” tier</strong> that coordinates five Grok instances for tough problems (roughly doubling accuracy on the hardest tests)</p>
</li>
</ul>
<p>The benchmark results speak for themselves, with Grok-4 Heavy achieving perfect and near-perfect scores, outperforming its best rivals in several categories.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Benchmark</td><td>Grok 4</td><td>Grok 4 Heavy</td><td>Best rival*</td></tr>
</thead>
<tbody>
<tr>
<td><strong>AIME 25</strong> (math)</td><td>91.7%</td><td><strong>100%</strong></td><td>88.9% (OpenAI o3)</td></tr>
<tr>
<td><strong>HMMT 25</strong> (math)</td><td>90.0%</td><td><strong>96.7%</strong></td><td>82.5% (Gemini 2.5)</td></tr>
<tr>
<td><strong>GPQA</strong> (grad QA)</td><td>87.5%</td><td>88.9%</td><td>86.4% (Gemini 2.5)</td></tr>
<tr>
<td><strong>Humanity’s Last Exam</strong> (HLE)</td><td>25.4%</td><td><strong>44.4%</strong></td><td>≈22% (GPT-4 / Gemini)</td></tr>
<tr>
<td><strong>ARC-AGI-2</strong> (reasoning)</td><td>16.2%</td><td>—</td><td>≈8% (Claude Opus 4)</td></tr>
</tbody>
</table>
</div><blockquote>
<p>Capability highlight: Grok 4 Heavy’s multi-agent architecture doubles down on complex problem solving at scale.</p>
</blockquote>
<p>Grok 4’s combination of a large context window, multi‑agent “Heavy” tier, and tool integration consistently places it at or near the top across a spectrum of advanced reasoning tasks.</p>
<h2 id="heading-building-with-grok-4-the-developers-stack">Building with Grok-4: The Developer's Stack</h2>
<p>A powerful model like Grok-4 is a fantastic tool, but building reliable, scalable, and observable AI applications requires a robust development stack. This is where frameworks like Agno and observability platforms like LangDB come into play.</p>
<p><strong>Agno</strong>: An open-source Python framework for building AI agents. It provides a clean, composable, and "Pythonic" way to structure your agent's logic, tools, and memory. Instead of wrestling with boilerplate code, you can declaratively define what your agent can do.</p>
<p><strong>LangDB</strong>: An AI gateway that acts as a unified control panel for over 350+ LLMs. With a single line of code, you can instrument your entire agent workflow for complete observability.</p>
<h2 id="heading-example-multi-agent-financial-reasoning-with-grok-4-amp-langdb">Example: Multi-Agent Financial Reasoning with Grok 4 &amp; LangDB</h2>
<p>Here's how you can build a real-world financial analysis team using Agno, with Grok 4 as your core reasoning model and LangDB for observability:</p>
<blockquote>
<p>The <code>Web Search Agent</code> below uses a LangDB Virtual Model with Tavily search built-in. No custom search integration or setup needed—just reference your Virtual Model. <a target="_blank" href="https://docs.langdb.ai/concepts/virtual-models">Learn more about Virtual Models.</a></p>
</blockquote>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os
<span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv

<span class="hljs-keyword">from</span> pylangdb.agno <span class="hljs-keyword">import</span> init
init()

<span class="hljs-keyword">from</span> agno.agent <span class="hljs-keyword">import</span> Agent
<span class="hljs-keyword">from</span> agno.team.team <span class="hljs-keyword">import</span> Team
<span class="hljs-keyword">from</span> agno.tools.yfinance <span class="hljs-keyword">import</span> YFinanceTools
<span class="hljs-keyword">from</span> agno.models.langdb <span class="hljs-keyword">import</span> LangDB

load_dotenv()

<span class="hljs-comment"># Web Search Agent with Tavily via LangDB Virtual Model</span>
web_agent = Agent(
    name=<span class="hljs-string">"Web Search Agent"</span>,
    role=<span class="hljs-string">"Search the web for the information"</span>,
    model=LangDB(id=<span class="hljs-string">"langdb/search_agent_xmf4v5jk"</span>),
    instructions=<span class="hljs-string">"Always include sources"</span>
)

<span class="hljs-comment"># Finance Agent powered by Grok 4</span>
finance_agent = Agent(
    name=<span class="hljs-string">"Finance AI Agent"</span>,
    role=<span class="hljs-string">"Analyse the given stock"</span>,
    model=LangDB(id=<span class="hljs-string">"xai/grok-4"</span>),
    tools=[YFinanceTools(
        stock_price=<span class="hljs-literal">True</span>,
        stock_fundamentals=<span class="hljs-literal">True</span>,
        analyst_recommendations=<span class="hljs-literal">True</span>,
        company_info=<span class="hljs-literal">True</span>,
        company_news=<span class="hljs-literal">True</span>
    )],
    instructions=[
        <span class="hljs-string">"Use tables to display stock prices, fundamentals (P/E, Market Cap), and recommendations."</span>,
        <span class="hljs-string">"Clearly state the company name and ticker symbol."</span>,
        <span class="hljs-string">"Focus on delivering actionable financial insights."</span>
    ]
)

<span class="hljs-comment"># Multi-agent team for collaborative financial analysis</span>
reasoning_finance_team = Team(
    name=<span class="hljs-string">"Reasoning Finance Team"</span>,
    mode=<span class="hljs-string">"coordinate"</span>,
    model=LangDB(id=<span class="hljs-string">"xai/grok-4"</span>),
    members=[web_agent, finance_agent],
    instructions=[
        <span class="hljs-string">"Collaborate to provide comprehensive financial and investment insights"</span>,
        <span class="hljs-string">"Consider both fundamental analysis and market sentiment"</span>,
        <span class="hljs-string">"Use tables and charts to display data clearly and professionally"</span>,
        <span class="hljs-string">"Present findings in a structured, easy-to-follow format"</span>,
        <span class="hljs-string">"Only output the final consolidated analysis, not individual agent responses"</span>
    ],
    markdown=<span class="hljs-literal">True</span>,
    show_members_responses=<span class="hljs-literal">True</span>,
    success_criteria=<span class="hljs-string">"The team has provided a complete financial analysis with data, visualizations, risk assessment, and actionable investment recommendations supported by quantitative analysis and market research."</span>
)

reasoning_finance_team.print_response(
    <span class="hljs-string">"""Compare the tech sector giants (AAPL, GOOGL, MSFT) performance:\n    1. Get financial data for all three companies\n    2. Analyze recent news affecting the tech sector\n    3. Calculate comparative metrics and correlations\n    4. Recommend portfolio allocation weights"""</span>
)
</code></pre>
<h2 id="heading-observability-in-action-what-langdb-adds">Observability in Action: What LangDB Adds</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1752660116636/796e4bc8-8d1f-47d7-9dd1-cb55b608bfc2.png" alt class="image--center mx-auto" /></p>
<p>With LangDB, every part of your multi-agent workflow becomes transparent and easy to debug:</p>
<ul>
<li><p><strong>Visualize each step in your workflow:</strong> Instantly see how the prompt flows through every agent and tool. Whether it’s Tavily search, YFinance, or Grok 4 itself, you get a single unified trace.</p>
</li>
<li><p><strong>Pinpoint latency and costs:</strong> Track response time and token usage for every call at every layer. No more guesswork. Easily spot bottlenecks and unexpected cost spikes.</p>
</li>
<li><p><strong>Troubleshoot faster:</strong> Errors and slowdowns are highlighted with detailed step-by-step spans. You can optimize your pipeline without digging through logs.</p>
</li>
</ul>
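<p>For instance, pinpointing a bottleneck is just a matter of sorting spans by latency. The data below is hypothetical (the span names and field layout are illustrative, not LangDB's actual export format), but it shows the shape of the analysis the trace view gives you:</p>

```python
# Hypothetical latency figures for one run of the finance team.
trace_spans = [
    {"name": "tavily_search", "latency_ms": 840},
    {"name": "yfinance.stock_price", "latency_ms": 310},
    {"name": "xai/grok-4 completion", "latency_ms": 4120},
    {"name": "team synthesis", "latency_ms": 1960},
]

def slowest(spans, top_n=3):
    """Rank spans by latency so bottlenecks stand out at a glance."""
    return sorted(spans, key=lambda s: s["latency_ms"], reverse=True)[:top_n]

for span in slowest(trace_spans):
    print(span["name"], span["latency_ms"], "ms")
```

<p>In a typical agentic run the model calls dominate, which is why per-span latency (rather than end-to-end totals) is what tells you whether to swap models, parallelize agents, or cache tool results.</p>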
<p>Check out the full conversation: <a target="_blank" href="https://app.langdb.ai/sharing/threads/73c91c58-eab7-4c6b-afe1-5ab6324f1ada">https://app.langdb.ai/sharing/threads/73c91c58-eab7-4c6b-afe1-5ab6324f1ada</a></p>
<h2 id="heading-wrap-up">Wrap-up</h2>
<p>Grok 4 sets a new bar for reasoning, math, and large-context tasks. Paired with Agno for flexible agent design and LangDB as your AI gateway, developers can easily build, debug, and scale high-performance LLM-powered applications. Drop Grok 4 into your own agents or start from the template above, and benefit from full workflow visibility and model management from day one.</p>
<p>Happy building!</p>
<h2 id="heading-further-reading-amp-references">Further Reading &amp; References</h2>
<ul>
<li><p><strong>LangDB Documentation</strong>: Getting started guide and API reference: <a target="_blank" href="https://docs.langdb.ai/getting-started/quick-start">https://docs.langdb.ai/getting-started/quick-start</a></p>
</li>
<li><p><strong>LangDB Virtual Models</strong>: Concept guide: <a target="_blank" href="https://docs.langdb.ai/concepts/virtual-models">https://docs.langdb.ai/concepts/virtual-models</a></p>
</li>
<li><p><strong>Agno Official Documentation</strong>: <a target="_blank" href="https://docs.agno.com">https://docs.agno.com</a></p>
</li>
<li><p><strong>Agno GitHub Repository</strong>: <a target="_blank" href="https://github.com/agno-agi/agno">https://github.com/agno-agi/agno</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Discover End-to-End Tracing on Google ADK with LangDB]]></title><description><![CDATA[Before diving into the code, watch this 2-minute video to see a complete demonstration of what we'll be building. You'll learn how to integrate LangDB tracing into the Google ADK Travel Concierge sample with no code changes.
https://youtu.be/5tyDi5xzO...]]></description><link>https://blog.langdb.ai/discover-end-to-end-tracing-on-google-adk-with-langdb</link><guid isPermaLink="true">https://blog.langdb.ai/discover-end-to-end-tracing-on-google-adk-with-langdb</guid><category><![CDATA[google adk]]></category><category><![CDATA[agentic AI]]></category><category><![CDATA[gemini]]></category><category><![CDATA[LangDB]]></category><category><![CDATA[tracing]]></category><dc:creator><![CDATA[Mrunmay Shelar]]></dc:creator><pubDate>Thu, 03 Jul 2025 06:43:35 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1751522824240/ce710310-012f-40e8-967d-ac3d1c11cc77.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Before diving into the code, watch this 2-minute video to see a complete demonstration of what we'll be building. You'll learn how to integrate LangDB tracing into the Google ADK Travel Concierge sample with no code changes.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://youtu.be/5tyDi5xzOUE">https://youtu.be/5tyDi5xzOUE</a></div>
<p> </p>
<p>In this quick demo you’ll see:</p>
<ul>
<li><p>How to install and initialize the <code>pylangdb[adk]</code> package.</p>
</li>
<li><p>The single line of code that enables full observability for every ADK agent and tool.</p>
</li>
<li><p>Running a sample prompt like “Find me flights from JFK to London”.</p>
</li>
<li><p>Inspecting your workflow in the LangDB AI Gateway dashboard, including:</p>
<ul>
<li><p>Threads view for step-by-step conversation logs.</p>
</li>
<li><p>Traces view for Gantt charts, cost &amp; token breakdowns, and dependency graphs.</p>
</li>
</ul>
</li>
<li><p>Drilling into any agent or tool (like the <code>planning_agent</code> on Claude 3 Sonnet) for full observability.</p>
</li>
</ul>
<p>In this tutorial, we'll walk through the architecture of a sophisticated Travel Concierge agent built with Google's Agent Development Kit (ADK). We'll explore how to leverage the LangDB AI Gateway to use any LLM—from OpenAI, Google, Anthropic, and more—and harness powerful features like Virtual Models and Virtual MCPs (Model Context Protocol) to create a dynamic, observable, and easily maintainable agent system.</p>
<p>Our <code>travel_concierge</code> agent is not just a single agent; it's a hierarchy of specialized sub-agents that handle everything from vacation inspiration to booking and in-trip assistance. Here's a look at the overall architecture:</p>
<p><img src="https://github.com/google/adk-samples/raw/main/python/agents/travel-concierge/travel-concierge-arch.png" alt="Travel Concierge's Multi-Agent Architecture" /></p>
<p>This project is based on the official <a target="_blank" href="https://github.com/google/adk-samples/tree/main/python/agents/travel-concierge">Google ADK Travel Concierge sample</a> and has been modified to showcase the integration with the LangDB AI Gateway.</p>
<blockquote>
<p>You can find the complete source code for this agent on GitHub: <a target="_blank" href="https://github.com/langdb/langdb-samples/tree/main/examples/google-adk/travel-concierge">LangDB Samples</a></p>
</blockquote>
<h2 id="heading-the-magic-behind-the-curtain-pylangdbadkinit">The Magic Behind the Curtain: <code>pylangdb.adk.init()</code></h2>
<p>First, let's talk about the most important line of code in this integration:</p>
<pre><code class="lang-python"><span class="hljs-comment"># travel_concierge/agent.py</span>
<span class="hljs-keyword">from</span> pylangdb.adk <span class="hljs-keyword">import</span> init
<span class="hljs-comment"># Initialize LangDB *before* importing any ADK modules.</span>
init()
</code></pre>
<p>This single function call is the key to unlocking the LangDB AI Gateway's observability features. By placing it at the very top of our script, before any <code>google.adk</code> modules are imported, we enable automatic instrumentation for the entire agent framework.</p>
<p>Here’s what <code>init()</code> does automatically:</p>
<ul>
<li><p><strong>Discovers Agents</strong>: It recursively finds all agent and sub-agent definitions within your project.</p>
</li>
<li><p><strong>Patches Runtimes</strong>: It automatically patches the necessary ADK components to emit traces.</p>
</li>
<li><p><strong>Links Sessions</strong>: It intelligently links all the interactions—from the root agent's initial processing to the deepest sub-agent and tool calls—into a single, cohesive trace in the LangDB tracing view.</p>
</li>
</ul>
<p>This "zero-instrumentation" approach means you get complete, end-to-end visibility into your agent's complex workflows just by adding that one line of code.</p>
<h2 id="heading-the-architecture-root-agent-and-sub-agents">The Architecture: Root Agent and Sub-Agents</h2>
<p>Our <code>travel_concierge</code> is a hierarchical agent. At the top is the <code>root_agent</code>, which acts as a smart router or orchestrator. Its job is not to answer queries directly, but to delegate them to a specialized sub-agent.</p>
<p>Here's its actual definition:</p>
<pre><code class="lang-python"><span class="hljs-comment"># travel_concierge/agent.py</span>
root_agent = Agent(
    model=<span class="hljs-string">"openai/gpt-4.1"</span>,
    name=<span class="hljs-string">"root_agent"</span>,
    description=<span class="hljs-string">"A Travel Concierge using the services of multiple sub-agents"</span>,
    instruction=prompt.ROOT_AGENT_INSTR,
    sub_agents=[
        inspiration_agent,
        planning_agent,
        <span class="hljs-comment"># ... and other sub-agents</span>
    ],
    <span class="hljs-comment"># ...</span>
)
</code></pre>
<p>As you can see, it uses a standard model (<code>"openai/gpt-4.1"</code>) and has a list of <code>sub_agents</code>. It doesn't have any tools of its own. The real power comes from the sub-agents.</p>
<h2 id="heading-dynamic-tooling-with-virtual-models-and-virtual-mcps">Dynamic Tooling with Virtual Models and Virtual MCPs</h2>
<p>A LangDB Virtual Model is a powerful abstraction that decouples your agent's code from its runtime configuration. It acts as a pointer to a configuration that you can manage entirely from the LangDB UI.</p>
<p>This is where the <strong>Model Context Protocol (MCP)</strong> comes in. MCP is a standard that allows language models to interact with external tools and services in a uniform way. However, managing connections to multiple MCP-enabled tools can be complex.</p>
<p>The LangDB AI Gateway simplifies this with <a target="_blank" href="https://blog.langdb.ai/what-are-virtual-mcp-servers"><strong>Virtual MCP Servers</strong></a>. A Virtual MCP is a single, managed endpoint that you configure in the UI. It can bundle multiple tools (like Google Maps, Tavily Search, or your own custom APIs), handle their authentication securely, and lock them to specific versions.</p>
<p>You then connect this Virtual MCP to your agent's Virtual Model. This is how you can dynamically grant new capabilities to your agents without changing a single line of code.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751521868673/8fb8fc6f-8bd7-4745-bda4-5bf690f50e36.png" alt class="image--center mx-auto" /></p>
<p>Here are all the virtual models for our project, as seen in the LangDB AI Gateway dashboard. You can see the <code>inspiration_agent</code>, <code>google_search_agent</code>, and <code>planning_agent</code> all configured here, ready to be assigned to our agents.</p>
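<p>Because the gateway exposes OpenAI-compatible APIs, a virtual model is addressed like any other model name. A sketch of what such a request looks like; the base URL and header names are assumptions based on typical OpenAI-compatible gateways, and the payload is only constructed here, not sent:</p>

```python
import json

# Hypothetical values: substitute your own gateway URL, API key, and the
# virtual model name shown in your LangDB dashboard.
BASE_URL = "https://api.us-east-1.langdb.ai/v1"  # assumed endpoint shape
API_KEY = "YOUR_LANGDB_API_KEY"

def build_chat_request(model: str, user_message: str) -> dict:
    """Build an OpenAI-style chat-completion request for a virtual model."""
    return {
        "url": f"{BASE_URL}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            # The virtual model name stands in for a concrete provider model.
            "model": model,
            "messages": [{"role": "user", "content": user_message}],
        }),
    }

req = build_chat_request("langdb/inspiration_agent_z73m3wmd",
                         "Suggest museums to visit in Paris")
print(req["url"])
```

<p>Swapping the underlying provider model or attached tools then happens in the LangDB UI; the request shape above never changes.</p>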
<h3 id="heading-example-the-inspirationagent-and-google-maps">Example: The <code>inspiration_agent</code> and Google Maps</h3>
<p>Let's look at our <code>inspiration_agent</code>. It needs access to location data to give travel ideas. Instead of hardcoding a <a target="_blank" href="https://app.langdb.ai/mcp-servers/google-maps">Google Maps MCP</a>, we use a Virtual Model.</p>
<p>Here's the agent's definition:</p>
<pre><code class="lang-python"><span class="hljs-comment"># travel_concierge/sub_agents/inspiration/agent.py</span>
inspiration_agent = Agent(
    model=<span class="hljs-string">"langdb/inspiration_agent_z73m3wmd"</span>,
    name=<span class="hljs-string">"inspiration_agent"</span>,
    description=<span class="hljs-string">"A travel inspiration agent..."</span>,
    <span class="hljs-comment"># ...</span>
)
</code></pre>
<p>Notice its model is <code>langdb/inspiration_agent_z73m3wmd</code>. In the LangDB AI Gateway UI, we've configured this virtual model to use a <strong>Virtual MCP server</strong> that has the Google Maps API attached as a tool. Now, when the <code>inspiration_agent</code> is active, it can seamlessly query Google Maps, even though the tool isn't explicitly listed in its code.</p>
<h3 id="heading-example-grounding-with-google-search">Example: Grounding with Google Search</h3>
<p>We also have a specialized agent tool for web searches, <code>google_search_grounding</code>.</p>
<pre><code class="lang-python"><span class="hljs-comment"># travel_concierge/tools/search.py</span>
_search_agent = Agent(
    model=<span class="hljs-string">"langdb/google_search_agent_hsz7lf9q"</span>,
    name=<span class="hljs-string">"google_search_grounding"</span>,
    description=<span class="hljs-string">"An agent providing Google-search grounding capability"</span>,
    <span class="hljs-comment"># ... instruction ...</span>
)

google_search_grounding = AgentTool(agent=_search_agent)
</code></pre>
<p>Just like our <code>inspiration_agent</code>, the <code>_search_agent</code> uses a virtual model, <code>langdb/google_search_agent_hsz7lf9q</code>. We've attached a <strong>Virtual MCP server</strong> that provides the Tavily Search tool to this model in LangDB.</p>
<h3 id="heading-example-the-planningagent-for-flights-and-hotels">Example: The <code>planning_agent</code> for Flights and Hotels</h3>
<p>Finally, let's look at the <code>planning_agent</code>, which handles the core booking tasks.</p>
<pre><code class="lang-python"><span class="hljs-comment"># travel_concierge/sub_agents/planning/agent.py</span>
planning_agent = Agent(
    model=<span class="hljs-string">"langdb/planning_agent_w1l8sygt"</span>,
    name=<span class="hljs-string">"planning_agent"</span>,
    description=<span class="hljs-string">"Helps users with travel planning..."</span>,
    <span class="hljs-comment"># ...</span>
)
</code></pre>
<p>This agent's virtual model, <code>langdb/planning_agent_w1l8sygt</code>, is connected to a Virtual MCP that provides an Airbnb search tool. This allows the agent to handle complex booking-related queries by leveraging this external service, all without having the tool logic hardcoded in the agent's definition.</p>
<h2 id="heading-the-flow-from-query-to-answer">The Flow: From Query to Answer</h2>
<ol>
<li><p>A user asks the <code>travel_concierge</code>: "What are some good museums to visit in Paris?"</p>
</li>
<li><p>The <code>root_agent</code> receives the query and, based on its instructions, delegates the task to the <code>inspiration_agent</code>.</p>
</li>
<li><p>The <code>inspiration_agent</code> is activated. Its virtual model configuration is loaded from the LangDB AI Gateway.</p>
</li>
<li><p>The agent now knows it has access to the Google Maps tool (via its Virtual MCP).</p>
</li>
<li><p>It uses the tool to find museums in Paris and provides a list to the user.</p>
</li>
<li><p>All of these steps—the delegation, the model calls, the tool usage—are automatically captured as traces in the LangDB AI Gateway, giving us complete observability into our agent's behavior.</p>
</li>
</ol>
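<p>The delegation step above can be sketched as a simple router: the root agent matches a query against sub-agent descriptions and hands the work off. This is a toy illustration of the pattern, not ADK's actual routing logic, and the keyword matching stands in for the LLM's instruction-driven delegation:</p>

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class SubAgent:
    name: str
    keywords: List[str]           # crude stand-in for the LLM's routing decision
    handle: Callable[[str], str]  # the sub-agent's own model + tools

def make_root_agent(sub_agents: List[SubAgent]) -> Callable[[str], str]:
    """Return a router that delegates to the first matching sub-agent."""
    def route(query: str) -> str:
        q = query.lower()
        for agent in sub_agents:
            if any(keyword in q for keyword in agent.keywords):
                return agent.handle(query)
        return "Sorry, no sub-agent can handle that."
    return route

inspiration = SubAgent("inspiration_agent", ["museum", "visit", "ideas"],
                       lambda q: "inspiration_agent: here are some museums...")
planning = SubAgent("planning_agent", ["flight", "hotel", "book"],
                    lambda q: "planning_agent: searching options...")

root = make_root_agent([inspiration, planning])
print(root("What are some good museums to visit in Paris?"))
```

<p>In the real system, each <code>handle</code> is itself a model call through the gateway, which is why every hop shows up as a span in the trace.</p>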
<p>You can explore a complete, shareable trace of a conversation with this agent here: <a target="_blank" href="https://app.langdb.ai/sharing/threads/8425e068-77de-4f41-8aa9-d1111fc7d2b7">https://app.langdb.ai/sharing/threads/8425e068-77de-4f41-8aa9-d1111fc7d2b7</a></p>
<p>When you open the trace, you'll see a detailed breakdown of the entire workflow. This includes:</p>
<ul>
<li><p><strong>A Gantt chart</strong> visualizing the sequence and duration of each agent and tool invocation.</p>
</li>
<li><p><strong>Cost and token counts</strong> for every LLM call, helping you monitor usage and optimize performance.</p>
</li>
<li><p><strong>Detailed input/output payloads</strong> for each step, allowing you to inspect the exact data being passed between components.</p>
</li>
<li><p><strong>A dependency graph</strong> showing how agents and tools are interconnected, making it easy to debug complex interactions.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751524184116/8d140ee7-d2d2-482e-a008-a01ff928ba2a.png" alt="Screenshot of a LangDB trace showing the root_agent delegating to inspiration_agent and the tool call to Google Maps" class="image--center mx-auto" /></p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>By combining Google ADK with the LangDB AI Gateway's virtual models and MCPs, we've built a <code>travel_concierge</code> agent that is:</p>
<ul>
<li><p><strong>Modular</strong>: Each sub-agent has a specific responsibility.</p>
</li>
<li><p><strong>Dynamic</strong>: We can change models and grant new tools on the fly from the LangDB UI without redeploying our agent.</p>
</li>
<li><p><strong>Observable</strong>: We get detailed traces of every interaction, making debugging and performance analysis easy.</p>
</li>
</ul>
<p>This architecture allows for rapid development and iteration, enabling us to build truly powerful and intelligent agentic systems.</p>
<hr />
<p><em>Ready to build your own?</em> <a target="_blank" href="https://docs.langdb.ai/"><em>Check out the LangDB AI Gateway documentation to get started</em></a></p>
]]></content:encoded></item><item><title><![CDATA[Empower Your AI Agents with Virtual MCP Servers]]></title><description><![CDATA[Integrating AI agents with external tools such as GitHub, Slack, or ClickHouse often requires custom integration code, careful credential management, and handling differences between multiple APIs. The Model Context Protocol (MCP), initiated by Anthr...]]></description><link>https://blog.langdb.ai/what-are-virtual-mcp-servers</link><guid isPermaLink="true">https://blog.langdb.ai/what-are-virtual-mcp-servers</guid><category><![CDATA[mcp]]></category><category><![CDATA[#anthropic]]></category><category><![CDATA[agents]]></category><category><![CDATA[tools]]></category><category><![CDATA[mcp server]]></category><category><![CDATA[windsurf]]></category><category><![CDATA[claude.ai]]></category><category><![CDATA[cursor]]></category><dc:creator><![CDATA[Mrunmay Shelar]]></dc:creator><pubDate>Fri, 02 May 2025 18:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1746380842071/b36cd792-e45d-45fa-8e17-0acc5f06daba.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Integrating AI agents with external tools such as GitHub, Slack, or ClickHouse often requires custom integration code, careful credential management, and handling differences between multiple APIs. The <strong>Model Context Protocol (MCP)</strong>, initiated by Anthropic, addresses this by providing a standard protocol that lets language models interact with external services uniformly.</p>
<p>However, using MCP across multiple services introduces additional complexity; each tool has its own endpoint, potentially different versions, and varying credential requirements. Ensuring consistent behavior across these tools can become challenging.</p>
<p><strong>Virtual MCP Servers</strong> from LangDB streamline this by allowing you to consolidate multiple MCP-based tools into a single interface. You choose only the tools you need, configure their authentication, specify exact versions to avoid unexpected changes, and gain visibility into all interactions through comprehensive logs.</p>
<p><img src="https://docs.langdb.ai/~gitbook/image?url=https%3A%2F%2F2130779299-files.gitbook.io%2F%7E%2Ffiles%2Fv0%2Fb%2Fgitbook-x-prod.appspot.com%2Fo%2Fspaces%252F0peUeEHQMLSxYq8BpQGt%252Fuploads%252FHu9iztwWZNn98u7BtCD1%252Fscreenrun-04-27-2025-20-20-30%2520%281%29.gif%3Falt%3Dmedia%26token%3Dac3c3da5-1cb8-42b5-9ae5-06bcc1e750eb&amp;width=768&amp;dpr=1&amp;quality=100&amp;sign=6e11f01d&amp;sv=2" alt class="image--center mx-auto" /></p>
<h2 id="heading-what-is-an-mcp-server">What is an MCP Server?</h2>
<p>The <strong>Model Context Protocol (MCP)</strong> defines a clear client-server architecture to standardize interactions between language models and external services:</p>
<ul>
<li><p><strong>Host</strong>: LLM applications like Cursor or Claude.</p>
</li>
<li><p><strong>Client</strong>: Built-in connectors within the host application communicating with MCP servers.</p>
</li>
<li><p><strong>Server</strong>: Service providers that expose tools, resources, or actions.</p>
</li>
</ul>
<h1 id="heading-what-are-virtual-mcp-servers">What are Virtual MCP Servers</h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746383097199/84319e46-4bc5-431f-8ddc-7909d0b13de7.png" alt="Creating a Virtual MCP server on LangDB AI Gateway" class="image--center mx-auto" /></p>
<p><strong>Virtual MCP Servers</strong> from LangDB enhance MCP usage by providing a customizable, unified access point to external tools tailored specifically for your workflow. They simplify tool management by:</p>
<ul>
<li><p>Merging multiple tools (e.g., Slack, GitHub, Airtable) into one endpoint.</p>
</li>
<li><p>Ensuring compatibility with various platforms like Windsurf, Cursor, Claude, Python, and TypeScript.</p>
</li>
</ul>
<p>For instance, if your agent needs to create GitHub issues, send Slack messages, and retrieve Airtable records, traditionally, you would set up and maintain three separate MCP connections—each with its credentials, endpoint, and potential for incompatibility. With a Virtual MCP Server, these tools integrate into a single endpoint with centralized configuration, streamlined authentication, and stable versioning.</p>
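<p>Conceptually, a Virtual MCP is a curated bundle: selected tools, per-tool credentials, and pinned versions behind one endpoint. A rough sketch of that configuration as data; the field names and values are illustrative, not LangDB's actual schema:</p>

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ToolBinding:
    server: str    # upstream MCP server the tool comes from
    tool: str      # the specific tool exposed from it
    version: str   # pinned, so upstream changes can't silently break the agent
    api_key: Optional[str] = None  # credential scoped to this tool only

# One endpoint bundling three tools from three different upstream MCPs.
virtual_mcp = {
    "endpoint": "https://api.langdb.ai/mcp/your-virtual-server",  # illustrative
    "tools": [
        ToolBinding("github", "create_issue", "1.4.0", api_key="YOUR_GITHUB_KEY"),
        ToolBinding("slack", "send_message", "2.0.1", api_key="YOUR_SLACK_KEY"),
        ToolBinding("airtable", "list_records", "0.9.2"),  # stateless, no key
    ],
}

exposed = [f"{t.server}.{t.tool}" for t in virtual_mcp["tools"]]
print(exposed)
```

<p>The agent sees only the three <code>server.tool</code> names; credentials and version pins stay on the gateway side.</p>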
<h2 id="heading-why-you-need-virtual-mcp-servers">Why You Need Virtual MCP Servers</h2>
<p>While MCPs standardize tool access, LangDB's <strong>Virtual MCP Servers</strong> offer a higher-level abstraction to help developers manage these tools in production.</p>
<h3 id="heading-selective-tool-exposure">Selective Tool Exposure</h3>
<p>With Virtual MCPs, you don’t expose an entire tool suite. You pick the specific tools you need for your use case:</p>
<ul>
<li><p><code>airtable.list_records</code></p>
</li>
<li><p><code>github.create_issue</code></p>
</li>
<li><p><code>slack.send_message</code></p>
</li>
</ul>
<p>This keeps your agent’s surface area small and secure, and avoids unnecessary access.</p>
<p>Additionally, some environments like <strong>Windsurf</strong> enforce hard limits on the number of tools an agent can use (e.g., 50 tools). Virtual MCPs let you <strong>curate and trim</strong> your toolset down to just what's needed — ensuring compatibility and efficiency without sacrificing capabilities.</p>
<h3 id="heading-clean-auth-handling">Clean Auth Handling</h3>
<p>LangDB allows you to:</p>
<ul>
<li><p>Add API keys <em>only</em> for selected tools</p>
</li>
<li><p>Use built-in OAuth where supported</p>
</li>
<li><p>Leave other tools stateless</p>
</li>
</ul>
<p>You avoid global tokens and minimize the risk of credential leaks.</p>
<h3 id="heading-version-lock-in-poisoning-protection">Version Lock-in + Poisoning Protection</h3>
<p>Breakages happen when upstream tools change. Virtual MCPs prevent this:</p>
<ul>
<li><p><strong>Pinned versions</strong> ensure consistent behavior</p>
</li>
<li><p><strong>Frozen tool definitions</strong> guard against future mutations</p>
</li>
<li><p><strong>Poisoning safety</strong> prevents LLMs from ingesting unsafe or malicious schema updates</p>
</li>
</ul>
<p>You control exactly what the model sees and uses.</p>
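<p>Freezing a tool definition can be thought of as recording a fingerprint of its schema at deploy time and rejecting anything that drifts. A simplified sketch of that idea, not the gateway's actual mechanism:</p>

```python
import hashlib
import json

def fingerprint(tool_schema: dict) -> str:
    """Stable hash of a tool's JSON schema (sorted keys for determinism)."""
    canonical = json.dumps(tool_schema, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

# Schema captured when the Virtual MCP was created (hypothetical example).
pinned = {"name": "create_issue", "params": {"title": "string", "body": "string"}}
pinned_hash = fingerprint(pinned)

def validate_upstream(upstream_schema: dict) -> bool:
    """Reject upstream definitions that no longer match the pinned version."""
    return fingerprint(upstream_schema) == pinned_hash

# A mutated upstream definition (e.g. an injected parameter) is blocked.
tampered = {"name": "create_issue",
            "params": {"title": "string", "body": "string", "exfiltrate": "string"}}
print("unchanged accepted:", validate_upstream(pinned))
print("tampered accepted:", validate_upstream(tampered))
```

<p>The same check covers both accidental upstream breakage and deliberate schema poisoning: anything the model would see must match what was reviewed at pin time.</p>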
<h2 id="heading-why-use-a-virtual-mcp">Why Use a Virtual MCP?</h2>
<blockquote>
<p>Full Tracing: Every input/output, latency, and error is logged in your LangDB dashboard.</p>
<p>Poisoning Safety: Prevents injection or override by malicious or unverified tool definitions from source MCPs.</p>
</blockquote>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Feature</td><td>What It Solves</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Selective Tools</strong></td><td>Use only what’s needed from broader MCPs</td></tr>
<tr>
<td><strong>Scoped Auth</strong></td><td>Modular and safe credential management</td></tr>
<tr>
<td><strong>Full Tracing</strong></td><td>Inputs, outputs, latencies, and failures are logged</td></tr>
<tr>
<td><strong>Plug-and-Play</strong></td><td>Compatible with Cursor, Claude, Windsurf, and more</td></tr>
<tr>
<td><strong>Version Control</strong></td><td>Prevents breaking changes</td></tr>
<tr>
<td><strong>Poisoning Safety</strong></td><td>Blocks tampered or unsafe tool definitions</td></tr>
</tbody>
</table>
</div><h2 id="heading-how-to-set-up-a-virtual-mcp-server">How to Set Up a Virtual MCP Server</h2>
<p>Creating your own Virtual MCP is simple:</p>
<ol>
<li><p>Open your <a target="_blank" href="https://app.langdb.ai/mcp-servers">LangDB MCP Servers</a> Listing</p>
</li>
<li><p>Choose the Server you want to start with</p>
</li>
<li><p>Click “<strong>Deploy MCP Server</strong>” and select the <strong>Project</strong>.</p>
</li>
<li><p>Select the tools you need, or add more MCP Servers from the list</p>
</li>
<li><p>(Optional) Add API keys or let LangDB manage auth</p>
</li>
<li><p>Click on “<strong>Create Virtual MCP Server</strong>”</p>
</li>
<li><p>Click <strong>Generate MCP URL</strong></p>
</li>
</ol>
<p>Use the URL directly in Cursor, Claude, Windsurf, or any other MCP-compatible client.</p>
<p>Here is a quick two-minute video on how to use Virtual MCP Servers on LangDB:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://youtu.be/P7aIpi9kS28">https://youtu.be/P7aIpi9kS28</a></div>
<p> </p>
<p>For usage check out: <a target="_blank" href="https://docs.langdb.ai/features/mcp-support#using-virtual-mcps">Virtual MCP Usage Docs</a></p>
<h2 id="heading-who-is-this-for">Who Is This For?</h2>
<p>Virtual MCP Servers are ideal for:</p>
<ul>
<li><p>Cursor / Claude / Windsurf users tired of tool setup</p>
</li>
<li><p>Developers managing complex toolchains</p>
</li>
<li><p>Teams needing secure, scoped, auditable MCP workflows</p>
</li>
<li><p>Builders of AI agents using Slack, GitHub, Airtable, and more</p>
</li>
</ul>
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>Virtual MCP Servers are designed to solve three core challenges faced in AI tool orchestration:</p>
<ul>
<li><p><strong>Scoped Authentication</strong>: Add API keys only for the tools you use. Everything else stays stateless and secure.</p>
</li>
<li><p><strong>Tool Safety &amp; Poisoning Protection</strong>: Lock in tool versions and guard against malicious or unverified definitions.</p>
</li>
<li><p><strong>Selective Exposure</strong>: Curate your toolset by combining just what your workflow needs across multiple MCPs.</p>
</li>
</ul>
<p>This is how you build tool-aware agents that are safe, reliable, and production-ready.</p>
<h2 id="heading-explore-more">Explore More</h2>
<ul>
<li><p>📄 <a target="_blank" href="https://app.langdb.ai/mcp-servers">LangDB MCP Servers</a></p>
</li>
<li><p>📄 <a target="_blank" href="https://docs.langdb.ai/concepts/virtual-mcp-servers">LangDB Docs: Virtual MCPs</a></p>
</li>
<li><p>📖 <a target="_blank" href="https://blog.langdb.ai/smarter-coding-workflows-with-context7-sequential-thinking">Previous Blog: Smarter Coding Workflows with Context7 + Sequential Thinking</a></p>
</li>
<li><p>📄 <a target="_blank" href="https://docs.anthropic.com/en/docs/agents-and-tools/mcp">Anthropic's MCP Specification</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Smarter Coding Workflows with Context7 + Sequential Thinking]]></title><description><![CDATA[Imagine you're building a Next.js 14 app. You want to implement a "New Todo" form using modern features like Server Actions, useFormState, and form validation — but you're not sure how to wire everything together cleanly.
https://youtu.be/90XzHuJETEk...]]></description><link>https://blog.langdb.ai/smarter-coding-workflows-with-context7-sequential-thinking</link><guid isPermaLink="true">https://blog.langdb.ai/smarter-coding-workflows-with-context7-sequential-thinking</guid><category><![CDATA[mcp server]]></category><category><![CDATA[mcp]]></category><category><![CDATA[cursor]]></category><category><![CDATA[windsurf]]></category><category><![CDATA[#anthropic]]></category><category><![CDATA[coding]]></category><dc:creator><![CDATA[Mrunmay Shelar]]></dc:creator><pubDate>Wed, 30 Apr 2025 18:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1746118121636/6f8ced46-18e3-456c-8989-05a9414da455.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Imagine you're building a <strong>Next.js 14</strong> app. You want to implement a "New Todo" form using modern features like <strong>Server Actions</strong>, <code>useFormState</code>, and form validation — but you're not sure how to wire everything together cleanly.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://youtu.be/90XzHuJETEk">https://youtu.be/90XzHuJETEk</a></div>
<p> </p>
<p>Normally, you'd have to:</p>
<ul>
<li><p>Search documentation manually.</p>
</li>
<li><p>Context-switch between browser and editor.</p>
</li>
<li><p>Piece together examples from random blog posts.</p>
</li>
</ul>
<p>Instead, using <strong>Context7</strong> and <strong>Sequential Thinking</strong> inside your LLM-native code editor (like <strong>Cursor</strong> or <strong>Windsurf</strong>), you simply ask:</p>
<pre><code class="lang-plaintext">I’m adding a “New Todo” form in this Next.js 14 app.  
Please implement:

1. A `createTodo` server action (with Zod validation) that returns `{ id, title, completed: false }`.
2. A `toggleTodo` server action to flip the `completed` boolean.
3. In `NewTodoForm.tsx`, wire up `useFormState(createTodo)`:
   • Render an `&lt;input name="title"&gt;` and “Add Todo” button.
   • Display the todo list with a checkbox next to each.
   • Strikethrough the todo title when `completed === true`, updating when clicked.
   • On checkbox change, call `toggleTodo` with optimistic UI updates.
   • Show inline validation errors from Zod.

Break the solution into clear steps — sequentialThinking — and pull fresh docs on Next.js Server Actions, `use server`, and `useFormState` — context7.
</code></pre>
<p><strong>What happens behind the scenes:</strong></p>
<ul>
<li><p><strong>Context7</strong> injects the latest, version-specific documentation for Next.js Server Actions and React form utilities directly into your prompt.</p>
</li>
<li><p><strong>Sequential Thinking</strong> structures the AI's response into clear steps:</p>
<ul>
<li><p>What each server action should do</p>
</li>
<li><p>How to validate using Zod</p>
</li>
<li><p>How to wire up the form with <code>useFormState</code></p>
</li>
<li><p>Best practices for optimistic UI and error handling</p>
</li>
<li><p>A complete implementation with ready-to-use code</p>
</li>
</ul>
</li>
</ul>
<h2 id="heading-how-to-set-it-up">How to Set It Up</h2>
<ol>
<li><p>Go to your LangDB Project.</p>
</li>
<li><p>Create a <strong>Virtual MCP Server</strong>.</p>
</li>
<li><p>Add these two MCPs to it:</p>
<ul>
<li><p><a target="_blank" href="https://app.langdb.ai/mcp-servers/context7">Context7 MCP</a> — injects live documentation.</p>
</li>
<li><p><a target="_blank" href="https://app.langdb.ai/mcp-servers/sequential-thinking">Sequential Thinking MCP</a> — enables structured step-by-step reasoning.</p>
</li>
</ul>
</li>
<li><p>Choose the MCP client of your choice and generate secure MCP URL.</p>
</li>
<li><p>After running the command, start using the MCP Server in your LLM-native editor by mentioning <code>use context7 and sequentialThinking</code> in your prompt.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745951867592/16e139ac-00d2-477b-923c-460371e23006.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-tracing-the-workflow">Tracing the Workflow</h2>
<p>Every tool call — from fetching documentation to reasoning through logic — is fully traceable in LangDB:</p>
<ul>
<li><p>See inputs and outputs.</p>
</li>
<li><p>View each MCP server call (Context7, Sequential Thinking) as a distinct trace event.</p>
</li>
<li><p>Debug, inspect, and optimize tool chains just like you would trace API pipelines.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745951922321/f32af9c7-b42b-471b-aa63-6b727dc24a72.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-why-this-workflow-matters">Why This Workflow Matters</h2>
<ul>
<li><p><strong>Prevents hallucinations</strong>: By pulling live, versioned documentation into your coding context.</p>
</li>
<li><p><strong>Builds cleaner logic</strong>: Step-by-step structured reasoning makes complex implementations manageable.</p>
</li>
<li><p><strong>Keeps you focused</strong>: No need to context-switch for documentation or architecture planning.</p>
</li>
</ul>
<hr />
<h2 id="heading-why-langdb-mcps-are-needed">Why LangDB + MCPs Are Needed</h2>
<p>Today's LLMs are powerful, but they often hallucinate, miss subtle API changes, or lose track of reasoning across steps. Developers need a system that can:</p>
<ul>
<li><p>Inject fresh knowledge dynamically into prompts.</p>
</li>
<li><p>Guide structured thinking, not just code generation.</p>
</li>
<li><p>Track and debug every tool invocation like a real API pipeline.</p>
</li>
</ul>
<p>LangDB's Model Context Protocol (MCP) architecture and full tracing support provide exactly this foundation. With Virtual MCPs, you can stitch together best-in-class tools like Context7 and Sequential Thinking.</p>
<h2 id="heading-try-it-out-yourself">Try It Out Yourself</h2>
<p>Want to see this in action?</p>
<ul>
<li><p>👉 <strong>Demo Repo</strong>: <a target="_blank" href="https://github.com/langdb/langdb-samples/tree/main/examples/mcp/nextjs-server-actions-demo">nextjs-server-actions-demo</a></p>
</li>
<li><p>🔌 <strong>MCPs to Add</strong>:</p>
<ul>
<li><p><a target="_blank" href="https://app.langdb.ai/mcp-servers/context7">Context7 MCP</a></p>
</li>
<li><p><a target="_blank" href="https://app.langdb.ai/mcp-servers/sequential-thinking">Sequential Thinking MCP</a></p>
</li>
</ul>
</li>
</ul>
<p>Once installed in <strong>Cursor</strong>, <strong>Claude</strong>, or <strong>Windsurf</strong>, paste the prompt, and let your AI editor reason, implement, and patch your repo with fully traceable steps.</p>
]]></content:encoded></item><item><title><![CDATA[March Product Update]]></title><description><![CDATA[In March, we focused on making production-grade LLM workflows faster, smarter, and easier to manage. We rolled out major updates to the LangDB AI Gateway: a redesigned tracing view with span-level insights, robust cost controls for scaling safely, an...]]></description><link>https://blog.langdb.ai/march-2025-product-update</link><guid isPermaLink="true">https://blog.langdb.ai/march-2025-product-update</guid><category><![CDATA[product update]]></category><category><![CDATA[LangDB]]></category><category><![CDATA[gateway]]></category><category><![CDATA[openai]]></category><category><![CDATA[llm]]></category><category><![CDATA[Rust]]></category><dc:creator><![CDATA[Mrunmay Shelar]]></dc:creator><pubDate>Fri, 04 Apr 2025 11:31:37 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1743763330107/36a43d8f-ae31-4217-9956-e150a65388a1.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In March, we focused on making production-grade LLM workflows faster, smarter, and easier to manage. We rolled out major updates to the <a target="_blank" href="https://langdb.ai/">LangDB AI Gateway</a>: a redesigned tracing view with span-level insights, robust cost controls for scaling safely, and a smarter models interface for better deployment decisions. We also expanded our model catalog with releases like Gemini 2.5 Pro and Claude 3.7 Sonnet, and added native integrations with ClickHouse, Vercel AI SDK, and Langroid. On the community front, we connected with developers at events in Delhi and Mumbai.</p>
<h2 id="heading-langdb-ai-gateway">🚀 LangDB AI Gateway</h2>
<h3 id="heading-tracing-view-revamp">Tracing View Revamp</h3>
<p>The tracing view has been fully redesigned to give you more visibility into what's happening under the hood:</p>
<ul>
<li><p><strong>Dependency Graph</strong>: Visually map the flow of your entire request, including nested calls, tool usage, and internal routing logic</p>
</li>
<li><p><strong>Timeline View</strong>: See every span and model call across time with accurate duration and sequence</p>
</li>
<li><p><strong>Details Panel</strong>: Click any span to explore headers, messages, tool definitions, usage &amp; cost, outputs, and raw JSON—all in one place</p>
</li>
<li><p><strong>Token, Cost, Duration Overview</strong>: Get high-level metrics across the top of every trace session</p>
</li>
</ul>
<p>This makes it dramatically easier to debug multi-step agent flows, identify bottlenecks, and understand how your models are behaving in production.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743759093214/5d417c78-22ee-4fbd-bb94-684b7a086d42.png" alt /></p>
<h3 id="heading-cost-control-ui-enhancements">Cost Control UI Enhancements</h3>
<p>Budget management is now easier than ever with LangDB's upgraded cost control experience:</p>
<ul>
<li><p><strong>Project-Level Controls</strong>: Set daily, monthly, or total cost limits per project</p>
</li>
<li><p><strong>Per-User &amp; Group Limits</strong>: Restrict usage for individual users or teams (e.g., devs group) to prevent surprise bills.</p>
</li>
<li><p><strong>Real-Time Visibility</strong>: Instantly view how much has been used, what remains, and what limits are active</p>
</li>
<li><p><strong>No-Code Setup</strong>: Everything can be configured from the UI.</p>
</li>
</ul>
<p>This gives teams peace of mind while scaling, and keeps infra costs transparent and under control.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743759098694/644a6c9c-6a8e-4219-a4bc-cde070ac1da3.png" alt /></p>
<h3 id="heading-redesigned-models-section">Redesigned Models Section</h3>
<p>The <strong>Models Section</strong> now includes:</p>
<ul>
<li><p>Filters by provider, type, input/output format, and capabilities</p>
</li>
<li><p>Visible pricing, context length, and input types</p>
</li>
<li><p>Searchable, sortable layout with inline model details</p>
</li>
</ul>
<p>Whether you're deploying high-efficiency models like <code>o3-mini</code> or multi-modal powerhouses like <code>gemini-2.5-pro</code>, this updated section helps you pick the best tool for the job with confidence.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743759103122/35b907ab-89d1-4f56-a7fc-91f2b686618f.png" alt /></p>
<h3 id="heading-ai-gateway-v021">AI Gateway v0.2.1</h3>
<p>Our gateway has been updated to v0.2.1 with:</p>
<ul>
<li><p>Improved tracing integration</p>
</li>
<li><p>Enhanced config validation</p>
</li>
<li><p>Better support for routing across private and public LLMs</p>
</li>
</ul>
<p><a target="_blank" href="https://github.com/langdb/ai-gateway">🔗 Explore on GitHub</a></p>
<h3 id="heading-7-day-professional-tier-trial">7-Day Professional Tier Trial</h3>
<p>You can now activate a free 7-day trial of LangDB's Professional Tier from the dashboard. Get access to full tracing, usage visibility, routing controls, and more—no commitment required.</p>
<h2 id="heading-new-models">🤖 New Models</h2>
<p>We've expanded LangDB's model lineup to include:</p>
<ul>
<li><p><strong>Gemma 3</strong> variants for lightweight tasks</p>
</li>
<li><p><strong>Gemini 2.5 Pro</strong> for advanced multi-modal reasoning</p>
</li>
<li><p><strong>Claude 3.7 Sonnet</strong> with high accuracy and long context support</p>
</li>
</ul>
<p>All models are available instantly via LangDB's unified API.</p>
<p><a target="_blank" href="https://app.langdb.ai/models">🔗 Explore All Models</a></p>
<h2 id="heading-integrations">🔌 Integrations</h2>
<h3 id="heading-new-integrations">New Integrations</h3>
<ul>
<li><p><strong>ClickHouse</strong>: LangDB uses ClickHouse as its data store for traces, metrics, and analytics. Native UDF support lets you run LLM calls directly from SQL. <a target="_blank" href="https://docs.langdb.ai/self-hosted/clickhouse-udfs">Learn about ClickHouse UDFs</a></p>
</li>
<li><p><strong>Vercel AI SDK</strong>: LangDB is now available as a provider via the Vercel AI SDK. <a target="_blank" href="https://sdk.vercel.ai/providers/community-providers/langdb">Try it on Vercel AI SDK</a></p>
</li>
<li><p><strong>Langroid</strong>: Use LangDB with Langroid to build multi-agent applications with routing, memory, and modular agent logic. <a target="_blank" href="https://github.com/langroid/langroid/tree/main/examples/langdb">View examples</a></p>
</li>
<li><p><strong>OpenAI Agents SDK Support</strong>: Learn how to integrate Gemini, Claude, and DeepSeek into the Agents SDK by OpenAI. <a target="_blank" href="https://blog.langdb.ai/integrate-gemini-claude-deepseek-into-agents-sdk-by-openai">Read the article</a></p>
</li>
</ul>
<h2 id="heading-community">👥 Community</h2>
<h3 id="heading-delhi-meetup-with-clickhouse">Delhi Meetup with ClickHouse</h3>
<p>We co-hosted a developer meetup in Delhi alongside ClickHouse to showcase how LangDB powers scalable AI infrastructure using real-time trace analytics.</p>
<p>ClickHouse also recognized LangDB co-founder <strong>Matteo Pelati</strong> for his contributions to AI infrastructure. Matteo presented LangDB at the ClickHouse Singapore meetup, highlighting how LangDB enables secure, observable LLM deployments while leveraging ClickHouse for storage and analytics.</p>
<p><a target="_blank" href="https://www.linkedin.com/posts/matteopelati_databaseengineer-dataengineer-observability-activity-7309190568838844416-kPgk?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC7bga4BsktY_mG6rNe2Ukzo0s1rKKZa29I">🔗 See the post</a></p>
<h3 id="heading-global-ai-bootcamp-mumbai">Global AI Bootcamp (Mumbai)</h3>
<p>We participated in the Global AI Bootcamp in Mumbai and hosted a talk titled <strong>"Introduction to AI Gateway"</strong>, where we covered how LangDB helps teams manage multi-model routing, observability, and real-time debugging with minimal config.</p>
<p><a target="_blank" href="https://www.linkedin.com/posts/hackerspace-mumbai_gaib25mum-mumbai-ai-activity-7313183900246147072-Z92f?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAC7bga4BsktY_mG6rNe2Ukzo0s1rKKZa29I">🔗 See the post</a></p>
<h2 id="heading-stay-connected">Stay Connected</h2>
<p>Join our community to stay updated on the latest LangDB features, best practices, and AI infrastructure insights.</p>
<p><a target="_blank" href="https://join.slack.com/t/langdbcommunity/shared_invite/zt-2haf5kj6a-d7NX6TFJUPX45w~Ag4dzlg">Join us on Slack</a></p>
]]></content:encoded></item><item><title><![CDATA[Integrate Gemini, Claude, DeepSeek into Agents SDK by OpenAI]]></title><description><![CDATA[OpenAI recently introduced the Agents SDK, a lightweight, Python-first toolkit for building agentic AI apps. It’s built around three primitives:

Agents: LLMs paired with tools and instructions to complete tasks autonomously.

Handoffs: Let agents de...]]></description><link>https://blog.langdb.ai/integrate-gemini-claude-deepseek-into-agents-sdk-by-openai</link><guid isPermaLink="true">https://blog.langdb.ai/integrate-gemini-claude-deepseek-into-agents-sdk-by-openai</guid><category><![CDATA[agents]]></category><category><![CDATA[openai]]></category><category><![CDATA[agentic workflow]]></category><category><![CDATA[tracing]]></category><category><![CDATA[gemini]]></category><category><![CDATA[#anthropic]]></category><category><![CDATA[grok]]></category><category><![CDATA[langchain]]></category><category><![CDATA[CrewAI]]></category><category><![CDATA[Deepseek]]></category><dc:creator><![CDATA[Mrunmay Shelar]]></dc:creator><pubDate>Mon, 31 Mar 2025 15:00:54 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1743426457443/f6cfb629-dd94-4e28-ac92-a29b4e13805c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>OpenAI recently introduced the <strong>Agents SDK</strong>, a lightweight, Python-first toolkit for building agentic AI apps. It’s built around three primitives:</p>
<ul>
<li><p><strong>Agents</strong>: LLMs paired with tools and instructions to complete tasks autonomously.</p>
</li>
<li><p><strong>Handoffs</strong>: Let agents delegate tasks to other agents.</p>
</li>
<li><p><strong>Guardrails</strong>: Validate inputs/outputs to keep workflows safe and reliable.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743421022887/50f3946e-198a-4c57-ab4a-0a43d9ff3e26.gif" alt class="image--center mx-auto" /></p>
<h3 id="heading-tldr"><strong>TLDR;</strong></h3>
<p>OpenAI’s Agents SDK is great for building tool-using agents with handoffs. But what if you want to use Gemini's latest reasoning model or take advantage of DeepSeek’s 90 percent lower cost?</p>
<p>With LangDB, you can run the same agent logic across more than 350 models, including Claude, Gemini, DeepSeek, and Grok, without changing your code. It works out of the box with frameworks like CrewAI and LangChain, and provides full trace visibility into every agent, model, and tool call.</p>
<h2 id="heading-building-a-multi-agent-customer-support-system">Building a Multi-Agent Customer Support System</h2>
<p>Let’s create a production-grade AI agent system using the <strong>OpenAI Agents SDK</strong>, with <strong>LangDB</strong> providing multi-model flexibility and runtime control.</p>
<p>We’ll build four specialized agents, each powered by a different model to demonstrate LangDB's multi-model capabilities:</p>
<ul>
<li><p><strong>OrderAgent</strong> (claude-3.7-sonnet): checks order status</p>
</li>
<li><p><strong>FAQAgent</strong> (gemini-2.5-pro): answers common customer questions</p>
</li>
<li><p><strong>ComplaintAgent</strong> (grok-2): handles complaints with empathy</p>
</li>
<li><p><strong>ReservationAgent</strong> (gpt-4o): manages table bookings</p>
</li>
</ul>
<p>Then, we use a <strong>Classifier Agent</strong> to route user queries to the appropriate sub-agent using built-in handoffs.</p>
<h3 id="heading-overview">Overview</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743404929236/6422b5d4-728d-438c-8ff6-8941f856a95e.png" alt class="image--center mx-auto" /></p>
<p>Each leaf agent is powered by a tool — a simple Python function decorated with <code>@function_tool</code>.</p>
<h4 id="heading-example-tool-for-order-status">Example: Tool for Order Status</h4>
<pre><code class="lang-python"><span class="hljs-meta">@function_tool</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">check_order_status</span>(<span class="hljs-params">order_id: str</span>):</span>
    order_statuses = {
        <span class="hljs-string">"12345"</span>: <span class="hljs-string">"Being prepared."</span>,
        <span class="hljs-string">"67890"</span>: <span class="hljs-string">"Dispatched."</span>,
        <span class="hljs-string">"11121"</span>: <span class="hljs-string">"Still processing."</span>
    }
    <span class="hljs-keyword">return</span> order_statuses.get(order_id, <span class="hljs-string">"Order ID not found."</span>)
</code></pre>
<h3 id="heading-defining-the-agent">Defining the agent:</h3>
<pre><code class="lang-python">order_agent = Agent(
    name=<span class="hljs-string">"OrderAgent"</span>,
    model=<span class="hljs-string">"anthropic/claude-3.7-sonnet"</span>,
    instructions=<span class="hljs-string">"Help customers with their order status."</span>,
    tools=[check_order_status]
)
</code></pre>
<p>Each of the other agents follows the same structure, varying only in tools, instructions, and model selection.</p>
<h3 id="heading-routing-user-queries-with-handoffs">Routing User Queries with Handoffs</h3>
<pre><code class="lang-python">classifier_agent = Agent(
    name=<span class="hljs-string">"User Interface Agent"</span>,
    model=<span class="hljs-string">"openai/gpt-4o-mini"</span>,
    instructions=<span class="hljs-string">"You are a restaurant customer support agent. Handoff to the appropriate agent based on the user query."</span>,
    handoffs=[order_agent, faq_agent, complaint_agent, reservation_agent]
)
</code></pre>
<p>This agent functions as a controller, deciding which specialized agent should handle the user's request.</p>
<h2 id="heading-running-multi-model-agents-with-langdb">Running Multi-Model Agents with LangDB</h2>
<p>To run the agents through LangDB and switch between different providers, first configure the OpenAI-compatible client like this:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> agents <span class="hljs-keyword">import</span> Agent, set_default_openai_client
<span class="hljs-keyword">from</span> openai <span class="hljs-keyword">import</span> AsyncOpenAI
<span class="hljs-keyword">from</span> uuid <span class="hljs-keyword">import</span> uuid4

langDBProjectID = <span class="hljs-string">"your-project-id"</span>  <span class="hljs-comment"># from the LangDB dashboard</span>

client = AsyncOpenAI(
    api_key=<span class="hljs-string">"your-langdb-api-key"</span>,
    base_url=<span class="hljs-string">f"https://api.us-east-1.langdb.ai/<span class="hljs-subst">{langDBProjectID}</span>/v1"</span>,
    default_headers={<span class="hljs-string">"x-thread-id"</span>: str(uuid4()), <span class="hljs-string">"x-run-id"</span>: str(uuid4())})
set_default_openai_client(client, use_for_tracing=<span class="hljs-literal">False</span>)
</code></pre>
<p>LangDB allows you to test your agents using models such as <strong>GPT</strong>, <strong>Claude</strong>, <strong>Gemini</strong>, or <strong>Grok</strong>, while keeping the agent logic unchanged.</p>
<p>To switch models, simply change the configuration:</p>
<pre><code class="lang-python">faq_agent = Agent(
    name=<span class="hljs-string">"FAQAgent"</span>,
    model=<span class="hljs-string">"gemini/gemini-2.5-pro-exp-03-25"</span>, <span class="hljs-comment"># or claude-3.7-sonnet, gpt-4o,</span>
    instructions=<span class="hljs-string">"Answer common customer questions about hours, menu, and location."</span>
    ...)
</code></pre>
<p>This approach enables you to evaluate multiple providers and optimize for quality, cost, or latency.</p>
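<p>Because the model is just a string, a provider comparison can be driven from a single lookup table. The sketch below is purely illustrative (plain Python, no SDK calls; the table and helper are hypothetical, with model IDs taken from this post):</p>

```python
# Illustrative only: centralizing each agent's model choice makes swapping
# providers a one-line change instead of editing every Agent(...) call.
AGENT_MODELS = {
    "OrderAgent": "anthropic/claude-3.7-sonnet",
    "FAQAgent": "gemini/gemini-2.5-pro-exp-03-25",
    "ComplaintAgent": "grok-2",
    "ReservationAgent": "gpt-4o",
}

def agent_kwargs(name: str, instructions: str) -> dict:
    """Build the keyword arguments you would pass to Agent(...) for a role."""
    return {"name": name, "model": AGENT_MODELS[name], "instructions": instructions}

print(agent_kwargs("FAQAgent", "Answer common customer questions.")["model"])
# gemini/gemini-2.5-pro-exp-03-25
```

<p>Swapping <code>FAQAgent</code> to another provider then means editing one entry in the table, leaving the agent logic untouched.</p>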
<h3 id="heading-observing-the-agents-execution">Observing the Agents Execution</h3>
<p>LangDB provides a unified trace view that helps developers inspect agent flows across different models. Each span in the trace shows the model used, any tools called, and the time taken at each step.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743422195740/fcb3cd12-1261-4f1a-ab87-22f164ed0ca9.png" alt="Each row in the timeline corresponds to a different model and agent interaction." class="image--center mx-auto" /></p>
<p>In the example above, you can see:</p>
<ul>
<li><p>Multiple agents running on different models in a single request</p>
</li>
<li><p>Visual breakdown of agent handoffs and tool calls</p>
</li>
<li><p>Timeline and cost details for each model involved</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743424176537/134459a1-8dd1-407a-800e-c09272955d24.png" alt class="image--center mx-auto" /></p>
<p>This trace graph shows a single user query routed through multiple agents, each using a different model. It captures the classifier decision, tool calls, and model usage in one place, making it easy to inspect and debug the full interaction flow.</p>
<p>The visibility helps in debugging behavior, verifying tool usage, and understanding model performance across complex workflows.</p>
<h2 id="heading-why-this-matters">Why this matters?</h2>
<p>In real-world applications, different models can excel depending on the type of task or interaction style required:</p>
<ul>
<li><p>Some are better suited for fast, low-latency tasks</p>
</li>
<li><p>Others handle nuanced, empathetic, or creative responses well</p>
</li>
<li><p>Certain models are optimized for summarization or structured formatting</p>
</li>
<li><p>Others provide strong performance for general-purpose conversations</p>
</li>
</ul>
<p>LangDB lets you assign the most suitable model to each agent, giving you task-specific control while maintaining a unified development experience.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>The OpenAI Agents SDK provides a clean way to define agent workflows. Paired with LangDB, it becomes possible to run the same agent setup across multiple model providers without changing your application code.</p>
<p>LangDB gives you visibility into agent execution through trace views and lets you switch between over 350 supported models using a consistent interface. This makes it easier to compare performance, debug behavior, and adapt to evolving requirements.</p>
<h2 id="heading-try-it-yourself">Try It Yourself</h2>
<p>To explore this setup, check out the sample project in the repository. It contains the necessary code to run agents locally, modify model configurations, and observe how requests are routed across different models using LangDB.</p>
<p>Visit <a target="_blank" href="https://langdb.ai/">LangDB</a> and sign up to get $10 in credits.</p>
<p>View the sample code: <a target="_blank" href="https://github.com/langdb/langdb-samples/tree/main/examples/openai/customer-support">GitHub Repository</a>.</p>
<hr />
<h2 id="heading-references">References</h2>
<ul>
<li><p><a target="_blank" href="https://openai.github.io/openai-agents-python/">OpenAI Agents SDK Documentation</a></p>
</li>
<li><p><a target="_blank" href="https://docs.langdb.ai/">LangDB Documentation</a></p>
</li>
<li><p><a target="_blank" href="https://app.langdb.ai/models">LangDB Model List</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/langdb/langdb-samples">LangDB Samples Repository</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Update: Cost Control UI, Models Section & Open Source v0.2.1]]></title><description><![CDATA[We've made some improvements to the platform:

Cost Control UI Enhancements – Managing budgets is now more intuitive. You can set limits for projects, users, and groups with a simple click and direct input. The UI now also provides clear usage breakd...]]></description><link>https://blog.langdb.ai/update-cost-control-ui</link><guid isPermaLink="true">https://blog.langdb.ai/update-cost-control-ui</guid><category><![CDATA[opensource]]></category><category><![CDATA[Cost Control]]></category><category><![CDATA[Gemma3]]></category><category><![CDATA[gemma]]></category><dc:creator><![CDATA[Mrunmay Shelar]]></dc:creator><pubDate>Fri, 21 Mar 2025 12:12:20 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1742559900835/929f70d9-f938-494a-b010-99e6dd2a2f0c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We've made some improvements to the platform:</p>
<ul>
<li><p><strong>Cost Control UI Enhancements</strong> – Managing budgets is now more intuitive. You can set limits for projects, users, and groups with a simple click and direct input. The UI now also provides clear usage breakdowns and remaining budget at a glance. Learn more in the <a target="_blank" href="https://docs.langdb.ai/features/cost-control">docs</a>.</p>
</li>
<li><p><strong>Models Section Improvement</strong> – Easily navigate through models with a redesigned layout. The left-side filters allow you to refine searches by provider, model type, input/output types, capabilities, and more. Explore the <a target="_blank" href="https://app.langdb.ai/models">models section</a>.</p>
</li>
<li><p><strong>7-Day Trial for Professional Tier</strong> – Get a hands-on experience with advanced features before committing.</p>
</li>
<li><p><strong>New Models Added</strong> – The model roster now includes <strong>Gemma 3 variants</strong> and additional models, providing more options for different use cases.</p>
</li>
</ul>
<p>Also, <strong>Open Source v0.2.1</strong> is out! Check it here → <a target="_blank" href="https://github.com/langdb/ai-gateway/releases/tag/0.2.1">ai-gateway</a></p>
<p>Check out the updated UI and feel free to share your feedback!</p>
]]></content:encoded></item><item><title><![CDATA[AI Without Downtime]]></title><description><![CDATA[Recently, the DeepSeek-R1 model gained massive popularity, leading to an overwhelming surge in traffic. As demand skyrocketed, the model began experiencing slowdowns and even outages, making it unreliable for production applications.
LangDB’s Fallbac...]]></description><link>https://blog.langdb.ai/ai-without-downtime</link><guid isPermaLink="true">https://blog.langdb.ai/ai-without-downtime</guid><category><![CDATA[Deepseek]]></category><category><![CDATA[openai]]></category><category><![CDATA[llm]]></category><category><![CDATA[routing]]></category><category><![CDATA[fallback mechanism]]></category><category><![CDATA[Zero Downtime]]></category><category><![CDATA[AI]]></category><dc:creator><![CDATA[Mrunmay Shelar]]></dc:creator><pubDate>Thu, 06 Mar 2025 07:04:45 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1741244646784/86d38d94-45fc-42d9-be86-ed65e2efb434.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Recently, the <strong>DeepSeek-R1</strong> model gained massive popularity, leading to an overwhelming surge in traffic. As demand skyrocketed, the model began experiencing slowdowns and even outages, making it unreliable for production applications.</p>
<p>LangDB’s <strong>Fallback Routing</strong> ensures uninterrupted AI service by automatically switching to a backup model when the primary model struggles with high traffic. Instead of facing downtime, applications leveraging DeepSeek-Reasoner could seamlessly reroute requests to alternative models like GPT-4o or even other providers of DeepSeek-R1, ensuring smooth operations even during peak demand.</p>
<h2 id="heading-why-fallback-matters">Why Fallback Matters</h2>
<p>AI reliability is crucial, but high traffic, model failures, or server outages can cause disruptions. Instead of leaving applications vulnerable, LangDB’s <strong>Fallback Routing</strong> ensures continuity by rerouting requests to a backup model in real time.</p>
<h3 id="heading-how-it-works">How it Works</h3>
<p>When a request is sent to LangDB’s routing system, it first attempts to process it using the preferred model. If that model is down, experiencing delays, or overloaded, the system seamlessly reroutes the request to a predefined backup model. This <strong>prevents downtime, reduces latency issues, and improves reliability</strong>.</p>
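<p>The routing semantics described above amount to a try-then-fall-back loop. Here is a minimal plain-Python sketch of that idea (an illustration of the behavior, not LangDB's implementation, which runs server-side in the gateway):</p>

```python
# Toy model of fallback routing: try each target in order and return the
# first successful response; raise only if every target fails.
def route_with_fallback(request, targets):
    errors = []
    for call_model in targets:
        try:
            return call_model(request)
        except Exception as exc:  # timeout, overload, outage, ...
            errors.append(exc)
    raise RuntimeError(f"all {len(targets)} targets failed: {errors}")

# Stub "models": the primary is overloaded, the backup answers.
def primary(req):
    raise TimeoutError("deepseek-reasoner overloaded")

def backup(req):
    return f"gpt-4o answer to: {req}"

print(route_with_fallback("hello", [primary, backup]))
# gpt-4o answer to: hello
```

<p>The caller never sees the primary's failure; it simply receives the backup model's response.</p>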
<h2 id="heading-setting-up-fallback-routing-in-langdb">Setting Up Fallback Routing in LangDB</h2>
<p>LangDB provides an easy way to configure <strong>Fallback Routing</strong> via the UI or API.</p>
<p>Here’s how you can use the UI to set it up:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1741243449470/a157381d-2ec7-47d5-9eaf-d8161ee56a10.gif" alt class="image--center mx-auto" /></p>
<p>Here’s how you can set it up programmatically:</p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"model"</span>: <span class="hljs-string">"router/dynamic"</span>,
    <span class="hljs-attr">"router"</span>: {
        <span class="hljs-attr">"name"</span>: <span class="hljs-string">"fallback-router"</span>,
        <span class="hljs-attr">"type"</span>: <span class="hljs-string">"fallback"</span>,
        <span class="hljs-attr">"targets"</span>: [
            { <span class="hljs-attr">"model"</span>: <span class="hljs-string">"deepseek-reasoner"</span>, <span class="hljs-attr">"temperature"</span>: <span class="hljs-number">0.7</span>, <span class="hljs-attr">"max_tokens"</span>: <span class="hljs-number">400</span> },
            { <span class="hljs-attr">"model"</span>: <span class="hljs-string">"gpt-4o"</span>, <span class="hljs-attr">"temperature"</span>: <span class="hljs-number">0.8</span>, <span class="hljs-attr">"max_tokens"</span>: <span class="hljs-number">500</span> }
        ]
    }
}
</code></pre>
<p>This configuration ensures that if <strong>DeepSeek-Reasoner</strong> is overloaded or unavailable, requests automatically switch to <strong>GPT-4o</strong>, maintaining uninterrupted service.</p>
<h3 id="heading-fallback-router-with-percentage">Fallback router with Percentage</h3>
<p>In the previous example, we implemented a simple fallback mechanism. However, a more robust solution would be to <strong>distribute queries across multiple providers of DeepSeek-R1</strong> while maintaining a fallback to GPT-4o if both providers fail. This method helps balance traffic efficiently while ensuring uninterrupted AI services.</p>
<p>Here’s how you can configure <strong>Fallback Routing with Percentage-Based Load Balancing</strong>:</p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"model"</span>: <span class="hljs-string">"router/dynamic"</span>,
    <span class="hljs-attr">"router"</span>: {
        <span class="hljs-attr">"name"</span>: <span class="hljs-string">"fallback-percentage-router"</span>,
        <span class="hljs-attr">"type"</span>: <span class="hljs-string">"fallback"</span>,
        <span class="hljs-attr">"targets"</span>: [
            {
                <span class="hljs-attr">"model"</span>: <span class="hljs-string">"router/dynamic"</span>,
                <span class="hljs-attr">"router"</span>: {
                    <span class="hljs-attr">"name"</span>: <span class="hljs-string">"percentage-balanced"</span>,
                    <span class="hljs-attr">"type"</span>: <span class="hljs-string">"percentage"</span>,
                    <span class="hljs-attr">"model_a"</span>: [
                        { <span class="hljs-attr">"model"</span>: <span class="hljs-string">"fireworksai/deepseek-r1"</span>, <span class="hljs-attr">"temperature"</span>: <span class="hljs-number">0.7</span>, <span class="hljs-attr">"max_tokens"</span>: <span class="hljs-number">400</span> },
                        <span class="hljs-number">0.5</span>
                    ],
                    <span class="hljs-attr">"model_b"</span>: [
                        { <span class="hljs-attr">"model"</span>: <span class="hljs-string">"deepseek/deepseek-reasoner"</span>, <span class="hljs-attr">"temperature"</span>: <span class="hljs-number">0.7</span>, <span class="hljs-attr">"max_tokens"</span>: <span class="hljs-number">400</span> },
                        <span class="hljs-number">0.5</span>
                    ]
                }
            },
            { <span class="hljs-attr">"model"</span>: <span class="hljs-string">"gpt-4o"</span>, <span class="hljs-attr">"temperature"</span>: <span class="hljs-number">0.8</span>, <span class="hljs-attr">"max_tokens"</span>: <span class="hljs-number">500</span> }
        ]
    }
}
</code></pre>
<h3 id="heading-how-this-works">How This Works:</h3>
<ul>
<li><p><strong>Primary Route:</strong> The system distributes requests <strong>evenly (50-50%)</strong> between two providers of DeepSeek-R1 to balance the load.</p>
</li>
<li><p><strong>Fallback Route:</strong> If both DeepSeek-R1 providers are unavailable or fail, all requests are automatically rerouted to <strong>GPT-4o</strong>, ensuring continuous service.</p>
</li>
</ul>
<p>This approach provides <strong>load balancing and reliable failover protection</strong>, making it ideal for AI applications facing high demand and occasional model unavailability.</p>
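<p>The percentage split itself is just a weighted random choice over the inner targets. A self-contained sketch of that selection step (illustrative only, mirroring the 50/50 weights in the config above):</p>

```python
import random

# Toy weighted selection between two DeepSeek-R1 providers; in the config
# above, the gateway falls back to gpt-4o only if the chosen provider fails.
PROVIDERS = [("fireworksai/deepseek-r1", 0.5), ("deepseek/deepseek-reasoner", 0.5)]

def pick_provider(rng: random.Random) -> str:
    r, cumulative = rng.random(), 0.0
    for model, weight in PROVIDERS:
        cumulative += weight
        if r < cumulative:
            return model
    return PROVIDERS[-1][0]  # guard against floating-point rounding

rng = random.Random(0)  # seeded for reproducibility
counts = {model: 0 for model, _ in PROVIDERS}
for _ in range(1000):
    counts[pick_provider(rng)] += 1
print(counts)  # close to a 50/50 split
```

<p>Adjusting the weights (say, 0.8/0.2) shifts traffic toward the cheaper or faster provider without any change to the fallback behavior.</p>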
<p>In more complex scenarios, you can configure a <strong>multi-level fallback system</strong> with percentage-based distribution. This approach allows requests to be routed dynamically based on pricing, performance, or reliability, ensuring efficiency while preventing downtime. Check out <a target="_blank" href="https://docs.langdb.ai/features/routing#routing-strategies">Routing Strategies</a> for more details.</p>
<p>By leveraging <strong>dynamic routing</strong>, you can:</p>
<ul>
<li><p><strong>Prevent downtime</strong> by automatically switching to backup models.</p>
</li>
<li><p><strong>Optimize performance and cost</strong> with smart load balancing.</p>
</li>
<li><p><strong>Ensure scalability</strong> without manual intervention.</p>
</li>
</ul>
<p>With LangDB’s flexible and powerful routing capabilities, <strong>you can build AI applications that are not only intelligent but also robust and fail-safe</strong>.</p>
<h3 id="heading-get-started-today">Get Started Today</h3>
<p>Ready to implement <strong>fallback routing</strong> in your AI stack? Check out the <a target="_blank" href="https://docs.langdb.ai/features/routing">LangDB Routing Docs</a> and deploy your AI applications on <a target="_blank" href="https://app.langdb.ai/">LangDB</a> to ensure reliability, scalability, and seamless failover.</p>
]]></content:encoded></item><item><title><![CDATA[Empowering LLMs with Real-Time Search]]></title><description><![CDATA[Traditional LLMs rely on static training data, making them prone to outdated responses, hallucinations, and missing critical context. LangDB's built-in Search tool solves this by fetching real-time data, improving accuracy and contextual relevance.
T...]]></description><link>https://blog.langdb.ai/empowering-llms-with-real-time-search</link><guid isPermaLink="true">https://blog.langdb.ai/empowering-llms-with-real-time-search</guid><category><![CDATA[Model Context Protocol]]></category><category><![CDATA[search]]></category><category><![CDATA[llm]]></category><category><![CDATA[openai]]></category><category><![CDATA[agentic AI]]></category><category><![CDATA[tools]]></category><dc:creator><![CDATA[Mrunmay Shelar]]></dc:creator><pubDate>Fri, 28 Feb 2025 08:25:42 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1740731055196/470a106c-ab53-41e3-95a0-6185718b5ea6.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Traditional LLMs rely on static training data, making them prone to outdated responses, hallucinations, and missing critical context. LangDB's <strong>built-in Search tool</strong> solves this by fetching real-time data, improving accuracy and contextual relevance.</p>
<h2 id="heading-the-challenge-stale-or-incomplete-knowledge">The Challenge: Stale or Incomplete Knowledge</h2>
<ul>
<li><p><strong>Static Corpus</strong>: Most LLMs are trained on large datasets, but that training is typically a snapshot in time. Once trained, the model doesn’t automatically update its knowledge.</p>
</li>
<li><p><strong>Inaccurate or Outdated Information</strong>: Without a method to query current data, an LLM may provide answers that were correct at the time of training but are no longer valid.</p>
</li>
<li><p><strong>Limited Context</strong>: Even if the model has relevant data, it might not surface the best context without a guided search mechanism.</p>
</li>
</ul>
<h2 id="heading-introducing-langdb-search-tool">Introducing LangDB Search Tool</h2>
<p>LangDB’s built-in Search tool addresses these challenges by allowing real-time querying of databases, documents, or external sources:</p>
<ul>
<li><p><strong>On-Demand Queries</strong>: Instead of relying solely on the LLM’s training data, the Search tool can fetch the latest information at query time.</p>
</li>
<li><p><strong>Integrated with LangDB</strong>: The search functionality is seamlessly woven into the LangDB, ensuring that developers can use it without additional overhead.</p>
</li>
<li><p><strong>API-Ready</strong>: LangDB’s search tool can be accessed via API too.</p>
</li>
</ul>
<h2 id="heading-search-vs-no-search">Search vs No-Search</h2>
<p>Below is a side-by-side comparison of using LangDB’s search tool versus relying on static model knowledge. The left image shows results <strong>with search enabled</strong>, pulling real-time, up-to-date information. The right image shows the <strong>same query without search</strong>, leading to more generic and potentially outdated responses.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1740713259619/6567a076-df61-43a6-9205-e9bbdd92e132.png" alt class="image--center mx-auto" /></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Feature</strong></td><td><strong>No Search</strong></td><td><strong>With LangDB Search</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>Data Freshness</strong></td><td>Static, based on training corpus</td><td>Dynamic, fetches real-time information</td></tr>
<tr>
<td><strong>Accuracy</strong></td><td>Prone to outdated or incorrect responses</td><td>Pulls from latest sources, improving reliability</td></tr>
<tr>
<td><strong>Context Depth</strong></td><td>Limited by internal model memory</td><td>Integrates external sources for better insights</td></tr>
<tr>
<td><strong>Hallucination Risk</strong></td><td>Higher</td><td>Lower, as responses are backed by retrieved data</td></tr>
</tbody>
</table>
</div><h2 id="heading-using-search-through-api">Using Search through API</h2>
<p>LangDB’s search tool can be integrated via API to fetch real-time data. The following call enables real-time search for an LLM request:</p>
<pre><code class="lang-bash">curl <span class="hljs-string">'https://api.us-east-1.langdb.ai/{LangDB_ProjectID}/v1/chat/completions'</span> \
-H <span class="hljs-string">'authorization: Bearer LangDBAPIKey'</span> \
-H <span class="hljs-string">'Content-Type: application/json'</span> \
-d <span class="hljs-string">'{
  "model": "openai/gpt-4o-mini",
  "mcp_servers": [{ "name": "websearch", "type": "in-memory"}],
  "messages": [
    {
      "role": "user",
      "content": "what are the latest models by anthropic and what are its standout features?"
    }
  ]
}'</span>
</code></pre>
<p>This allows the LLM to enhance responses with live data, ensuring greater accuracy and relevance.</p>
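<p>For Python applications, the same request can be issued with just the standard library. The sketch below mirrors the curl call above; the <code>PROJECT_ID</code> and <code>API_KEY</code> placeholders are assumptions you replace with your own values, and the payload shape is taken directly from the curl example:</p>

```python
import json
import os
import urllib.request

# Placeholders -- substitute your own LangDB project ID and API key.
PROJECT_ID = os.getenv("LANGDB_PROJECT_ID", "your-project-id")
API_KEY = os.getenv("LANGDB_API_KEY", "your-api-key")


def build_search_payload(question: str) -> dict:
    """Same body as the curl example: attach the in-memory websearch
    MCP server so the model can pull live results."""
    return {
        "model": "openai/gpt-4o-mini",
        "mcp_servers": [{"name": "websearch", "type": "in-memory"}],
        "messages": [{"role": "user", "content": question}],
    }


def ask_with_search(question: str) -> str:
    """POST the payload to the LangDB chat-completions endpoint and
    return the assistant's reply text."""
    req = urllib.request.Request(
        f"https://api.us-east-1.langdb.ai/{PROJECT_ID}/v1/chat/completions",
        data=json.dumps(build_search_payload(question)).encode(),
        headers={
            "authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

<p>Because the gateway exposes an OpenAI-compatible API, the same request body should also work through the official OpenAI SDK by pointing its <code>base_url</code> at your LangDB project endpoint.</p>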
<h2 id="heading-conclusion">Conclusion</h2>
<p>LangDB’s built-in Search tool eliminates the limitations of static LLMs by integrating <strong>real-time web search</strong>, ensuring your AI retrieves the most relevant, up-to-date, and accurate information. Whether you're building <strong>chatbots, research tools, or automation systems</strong>, dynamic search enhances responses with verifiable data, reducing hallucinations and improving decision-making.</p>
<p>With <strong>seamless API integration and chat playground support</strong>, LangDB makes it easy to <strong>query, validate, and optimize</strong> your LLM calls.</p>
<p>🔹 <strong>Want to build smarter LLM-powered applications?</strong> Start using <a target="_blank" href="https://langdb.ai/">LangDB</a> Search tool today and take your AI to the next level.</p>
]]></content:encoded></item><item><title><![CDATA[AI-Powered Banking Chatbot: Build with LangChain, LangDB.ai & RAG (Part 2)]]></title><description><![CDATA[In Part 1 of our series, we built a LangChain-powered conversational AI for banking FAQs using LangDB AI Gateway. Now, in Part 2, we'll integrate ChromaDB for Retrieval-Augmented Generation (RAG), enhancing the chatbot's ability to provide precise an...]]></description><link>https://blog.langdb.ai/ai-powered-banking-chatbot-build-with-langchain-langdbai-and-rag-part-2</link><guid isPermaLink="true">https://blog.langdb.ai/ai-powered-banking-chatbot-build-with-langchain-langdbai-and-rag-part-2</guid><category><![CDATA[openai]]></category><category><![CDATA[Deepseek]]></category><category><![CDATA[LangDB]]></category><category><![CDATA[langchain]]></category><category><![CDATA[RAG ]]></category><category><![CDATA[AI]]></category><category><![CDATA[chatbot]]></category><dc:creator><![CDATA[Dishant Gandhi]]></dc:creator><pubDate>Wed, 26 Feb 2025 08:44:18 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1740417643224/a27b8354-3d8a-48a1-b78a-ac541db68d27.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In Part 1 of our series, we built a LangChain-powered conversational AI for banking FAQs using <a target="_blank" href="https://langdb.ai/">LangDB AI Gateway</a>. Now, in <strong>Part 2</strong>, we'll integrate <strong>ChromaDB</strong> for Retrieval-Augmented Generation (RAG), enhancing the chatbot's ability to provide precise answers based on uploaded documents.</p>
<p>🚀 <strong>What We'll Cover:</strong></p>
<ol>
<li><p>Understanding the RAG Pipeline.</p>
</li>
<li><p>Setting up ChromaDB for vector storage.</p>
</li>
<li><p>Embedding and storing documents.</p>
</li>
<li><p>Retrieving context from documents for accurate responses.</p>
</li>
<li><p>Querying the chatbot for contextually rich answers.</p>
</li>
</ol>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text"><strong>Find all source code and starter pack by clicking </strong><a target="_self" href="https://github.com/langdb/langdb-samples/blob/main/examples/langchain/langchain-rag-bot/main.py"><strong>here</strong></a><strong>.</strong></div>
</div>

<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text"><strong>Don’t forget to star us⭐</strong></div>
</div>

<p>Alternatively, you can follow our <strong>YouTube tutorial</strong></p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://youtu.be/jTZ00qz-O3A">https://youtu.be/jTZ00qz-O3A</a></div>
<hr />
<h2 id="heading-what-is-retrieval-augmented-generation-rag">🤖 <strong>What is Retrieval-Augmented Generation (RAG)?</strong></h2>
<p><strong>Retrieval-Augmented Generation (RAG)</strong> is a powerful approach that enhances the capabilities of large language models (LLMs) by providing them with <strong>external knowledge</strong>. Instead of relying solely on pre-trained knowledge, RAG retrieves relevant documents from a vector database and uses them as context to generate accurate and informed responses.</p>
<h3 id="heading-key-components-of-rag"><strong>Key Components of RAG:</strong></h3>
<ol>
<li><p><strong>Retriever:</strong> Searches for relevant documents based on user queries.</p>
</li>
<li><p><strong>Generator:</strong> Generates a response using both the retrieved context and the LLM's internal knowledge.</p>
</li>
<li><p><strong>Memory:</strong> Retains conversation history for continuity.</p>
</li>
</ol>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">This approach ensures that the chatbot can answer user queries with up-to-date, domain-specific information, such as banking FAQs or interest rates.</div>
</div>

<hr />
<h2 id="heading-what-are-vector-databases"><strong>What Are Vector Databases?</strong></h2>
<p>A <strong>vector database</strong> stores information as <strong>vector embeddings</strong>—numerical representations of text, images, or other data. These embeddings allow for <strong>efficient similarity searches</strong>, enabling the retriever to find the most relevant documents based on user queries.</p>
<h3 id="heading-why-use-a-vector-database-like-chromadb"><strong>Why Use a Vector Database like ChromaDB?</strong></h3>
<ol>
<li><p><strong>Fast Retrieval:</strong> Quickly finds relevant information, even in large datasets.</p>
</li>
<li><p><strong>Contextual Matching:</strong> Retrieves documents based on semantic meaning, not just keywords.</p>
</li>
<li><p><strong>Efficient Storage:</strong> Compact and scalable storage of embeddings.</p>
</li>
</ol>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">In our project, we'll use <strong>ChromaDB</strong> to store and retrieve banking-related documents.</div>
</div>

<hr />
<h2 id="heading-how-does-embedding-work"><strong>How Does Embedding Work?</strong></h2>
<p><strong>Embedding</strong> converts text into <strong>high-dimensional vectors</strong> that capture semantic meaning. For example, the phrases <em>"home loan interest rate"</em> and <em>"mortgage rate"</em> would have <strong>similar vector representations</strong>, enabling efficient retrieval.</p>
<h3 id="heading-embedding-process"><strong>Embedding Process:</strong></h3>
<ol>
<li><p><strong>Text Input:</strong> Extracted from uploaded documents (e.g., PDF FAQs).</p>
</li>
<li><p><strong>Vectorization:</strong> Text is converted into embeddings using a model like <code>sentence-transformers</code>.</p>
</li>
<li><p><strong>Storage:</strong> Embeddings are stored in ChromaDB for future retrieval.</p>
</li>
</ol>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">This process ensures that the chatbot can <strong>search and find relevant information</strong> based on user queries.</div>
</div>
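<p>To make the “similar vector representations” idea concrete, here is a minimal, self-contained sketch using toy 3-dimensional vectors. Real embedding models such as <code>all-MiniLM-L6-v2</code> produce 384 dimensions; the numbers below are purely illustrative:</p>

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: values near 1.0 mean
    the same direction (semantically similar), near 0 mean unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings (illustrative values, not produced by a real model).
home_loan = [0.90, 0.80, 0.10]   # "home loan interest rate"
mortgage  = [0.85, 0.75, 0.20]   # "mortgage rate"
weather   = [0.05, 0.10, 0.95]   # "weather forecast"

# Semantically close phrases score much higher than unrelated ones,
# which is exactly what the retriever exploits at query time.
assert cosine_similarity(home_loan, mortgage) > cosine_similarity(home_loan, weather)
```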

<hr />
<h2 id="heading-setting-up-chromadb"><strong>Setting Up ChromaDB</strong></h2>
<p>First, ensure ChromaDB and PyPDF are installed:</p>
<pre><code class="lang-bash">pip install chromadb pypdf
</code></pre>
<p>Import necessary modules and initialize ChromaDB:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> langchain_community.vectorstores <span class="hljs-keyword">import</span> Chroma
<span class="hljs-keyword">from</span> langchain_community.embeddings <span class="hljs-keyword">import</span> HuggingFaceEmbeddings

CHROMA_DB_DIR = <span class="hljs-string">"chroma"</span>

<span class="hljs-comment"># Initialize ChromaDB and Embeddings</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">initialize_chromadb</span>():</span>
    embeddings = HuggingFaceEmbeddings(model_name=<span class="hljs-string">"sentence-transformers/all-MiniLM-L6-v2"</span>, model_kwargs={<span class="hljs-string">'device'</span>: <span class="hljs-string">'cpu'</span>})
    vector_store = Chroma(persist_directory=CHROMA_DB_DIR, embedding_function=embeddings)
    <span class="hljs-keyword">return</span> vector_store

<span class="hljs-comment"># Initialize ChromaDB</span>
vector_db = initialize_chromadb()
</code></pre>
<p><strong>What This Does:</strong></p>
<ul>
<li><p><strong>Embeddings:</strong> Converts text into searchable vectors.</p>
</li>
<li><p><strong>Vector Store:</strong> Stores these vectors for retrieval.</p>
</li>
<li><p><strong>Persistence:</strong> Ensures data isn't lost after restarting the app.</p>
</li>
</ul>
<hr />
<h2 id="heading-uploading-and-processing-pdf"><strong>Uploading and Processing PDF</strong></h2>
<p>Let's allow users to upload a PDF containing interest rates and banking FAQs.</p>
<pre><code class="lang-python">st.sidebar.title(<span class="hljs-string">"Options"</span>)
uploaded_file = st.sidebar.file_uploader(<span class="hljs-string">"Upload PDF"</span>, type=<span class="hljs-string">"pdf"</span>)
</code></pre>
<p><strong>File Upload:</strong> Users upload PDFs via Streamlit's sidebar.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> langchain_community.document_loaders <span class="hljs-keyword">import</span> PyPDFLoader
<span class="hljs-keyword">from</span> langchain_text_splitters <span class="hljs-keyword">import</span> RecursiveCharacterTextSplitter
<span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> tempfile

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">process_pdf</span>(<span class="hljs-params">file</span>):</span>
    <span class="hljs-keyword">with</span> tempfile.TemporaryDirectory() <span class="hljs-keyword">as</span> temp_dir:
        temp_file_path = os.path.join(temp_dir, file.name)
        <span class="hljs-keyword">with</span> open(temp_file_path, <span class="hljs-string">"wb"</span>) <span class="hljs-keyword">as</span> temp_file:
            temp_file.write(file.getbuffer())

        pdf_loader = PyPDFLoader(temp_file_path)
        documents = pdf_loader.load()

        text_splitter = RecursiveCharacterTextSplitter(chunk_size=<span class="hljs-number">800</span>, chunk_overlap=<span class="hljs-number">80</span>, length_function=len)
        chunks = text_splitter.split_documents(documents)

        <span class="hljs-keyword">return</span> chunks
</code></pre>
<p><strong>Explanation:</strong></p>
<ul>
<li><p><strong>PDF Loading:</strong> <code>PyPDFLoader</code> extracts the text content from the uploaded PDF.</p>
</li>
<li><p><strong>Text Splitting:</strong> <code>RecursiveCharacterTextSplitter</code> splits the documents into 800-character chunks with an 80-character overlap, keeping each chunk small enough to embed while preserving context across boundaries.</p>
</li>
</ul>
<h2 id="heading-storing-document-in-chromadb"><strong>Storing Document in ChromaDB</strong></h2>
<p>Once the document is processed, we'll convert it into vector embeddings and store it in ChromaDB.</p>
<pre><code class="lang-python"><span class="hljs-keyword">if</span> uploaded_file:
    user_chunks = process_pdf(uploaded_file)
    vector_db.add_documents(user_chunks)
    st.sidebar.success(<span class="hljs-string">f"Processed <span class="hljs-subst">{len(user_chunks)}</span> chunks from uploaded PDF."</span>)
</code></pre>
<ul>
<li><p><strong>Embedding:</strong> <code>add_documents</code> converts each chunk into a vector embedding using the configured embedding model.</p>
</li>
<li><p><strong>Vector Storage:</strong> The embeddings are persisted in ChromaDB so they can be retrieved later by similarity search.</p>
</li>
</ul>
<hr />
<h2 id="heading-querying-chromadb-for-contextual-answers"><strong>Querying ChromaDB for Contextual Answers</strong></h2>
<p>Now, let's update the chatbot to search the vector store for relevant context when a user asks a question.</p>
<pre><code class="lang-python"><span class="hljs-keyword">if</span> send_button:
    user_input = st.session_state.user_input.strip()  <span class="hljs-comment"># Ensure the input is not empty or just whitespace    </span>
    <span class="hljs-keyword">if</span> user_input:
        context = <span class="hljs-string">""</span>
        <span class="hljs-comment"># Retrieve relevant context from ChromaDB</span>
        <span class="hljs-keyword">try</span>:
            search_results = vector_db.similarity_search(user_input, k=<span class="hljs-number">3</span>)
            <span class="hljs-keyword">for</span> result <span class="hljs-keyword">in</span> search_results:
                context += result.page_content + <span class="hljs-string">"\n\n"</span>
        <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
            st.error(<span class="hljs-string">f"Error retrieving context from ChromaDB: <span class="hljs-subst">{e}</span>"</span>)
</code></pre>
<p><strong>What This Does:</strong></p>
<ul>
<li><p><strong>Similarity Search:</strong> Retrieves the top 3 relevant document chunks from ChromaDB.</p>
</li>
<li><p><strong>Contextual Response:</strong> Uses retrieved content to generate a precise, context-aware answer.</p>
</li>
</ul>
<hr />
<h3 id="heading-now-we-will-be-able-to-chat-with-our-banking-assistant-freely">Now we can chat freely with our banking assistant</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1740252953817/aac08ae9-389c-411e-98cd-3641a6107dee.gif" alt class="image--center mx-auto" /></p>
<h2 id="heading-whole-code-snippet-with-rag">Whole code snippet with RAG</h2>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> tempfile
<span class="hljs-keyword">from</span> os <span class="hljs-keyword">import</span> getenv

<span class="hljs-keyword">import</span> streamlit <span class="hljs-keyword">as</span> st
<span class="hljs-keyword">from</span> langchain.chains <span class="hljs-keyword">import</span> LLMChain
<span class="hljs-keyword">from</span> langchain.prompts <span class="hljs-keyword">import</span> PromptTemplate
<span class="hljs-keyword">from</span> langchain.memory <span class="hljs-keyword">import</span> ConversationBufferMemory
<span class="hljs-keyword">import</span> requests
<span class="hljs-keyword">from</span> langchain_community.document_loaders <span class="hljs-keyword">import</span> PyPDFLoader
<span class="hljs-keyword">from</span> langchain_community.embeddings <span class="hljs-keyword">import</span> HuggingFaceEmbeddings
<span class="hljs-keyword">from</span> langchain_community.vectorstores <span class="hljs-keyword">import</span> Chroma
<span class="hljs-keyword">from</span> langchain_openai <span class="hljs-keyword">import</span> ChatOpenAI
<span class="hljs-keyword">from</span> langchain_text_splitters <span class="hljs-keyword">import</span> RecursiveCharacterTextSplitter

<span class="hljs-comment"># Constants</span>
PROMPT_TEMPLATE = <span class="hljs-string">"""
You are a banking assistant specializing in answering FAQs about loans, interest rates, and general banking services.
If the user greets, respond with a greeting. If the user asks a question, provide an answer.
Use the following context too for answering questions:

{context}

Conversation History: 
{history}

---


Answer the question based on the above context: {query}

"""</span>

CHROMA_DB_DIR = <span class="hljs-string">"chroma"</span>
LANGDB_API_URL = <span class="hljs-string">"https://api.us-east-1.langdb.ai/your-project-id/v1"</span>  <span class="hljs-comment"># Replace with your LANGDB project id</span>
os.environ[<span class="hljs-string">"LANGDB_API_KEY"</span>] = <span class="hljs-string">"your-api-key"</span>

st.set_page_config(page_title=<span class="hljs-string">"Banking Assistant"</span>, layout=<span class="hljs-string">"wide"</span>)
st.title(<span class="hljs-string">"Banking FAQ Assistant"</span>)
st.write(<span class="hljs-string">"Ask questions about banking services, loan options, and interest rates!"</span>)

<span class="hljs-comment"># Initialize ChromaDB and Embeddings</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">initialize_chromadb</span>():</span>
    embeddings = HuggingFaceEmbeddings(model_name=<span class="hljs-string">"sentence-transformers/all-MiniLM-L6-v2"</span>, model_kwargs={<span class="hljs-string">'device'</span>: <span class="hljs-string">'cpu'</span>})
    vector_store = Chroma(persist_directory=CHROMA_DB_DIR, embedding_function=embeddings)
    <span class="hljs-keyword">return</span> vector_store

<span class="hljs-comment"># Initialize ChromaDB and LangChain LLM</span>
vector_db = initialize_chromadb()
<span class="hljs-comment"># Initialize LangChain LLM</span>
llm = ChatOpenAI(
    base_url=LANGDB_API_URL,
    api_key=getenv(<span class="hljs-string">"LANGDB_API_KEY"</span>),
    model=<span class="hljs-string">"gpt-3.5-turbo"</span>,  <span class="hljs-comment"># Replace with the specific model name you are using</span>
    timeout=<span class="hljs-number">10</span>  <span class="hljs-comment"># Add a timeout of 10 seconds</span>
)

<span class="hljs-comment"># Memory for conversation history</span>
memory = ConversationBufferMemory(
    memory_key=<span class="hljs-string">"history"</span>,
    return_messages=<span class="hljs-literal">True</span>,
    input_key=<span class="hljs-string">"query"</span>,
)

<span class="hljs-comment"># Prompt Template for LangChain</span>
prompt_template = PromptTemplate(
    input_variables=[<span class="hljs-string">"context"</span>, <span class="hljs-string">"history"</span>, <span class="hljs-string">"query"</span>],
    template=PROMPT_TEMPLATE
)

<span class="hljs-comment"># LangChain LLM Chain</span>
chain = LLMChain(llm=llm, prompt=prompt_template, memory=memory)

st.sidebar.title(<span class="hljs-string">"Options"</span>)
uploaded_file = st.sidebar.file_uploader(<span class="hljs-string">"Upload PDF"</span>, type=<span class="hljs-string">"pdf"</span>)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">process_pdf</span>(<span class="hljs-params">file</span>):</span>
    <span class="hljs-keyword">with</span> tempfile.TemporaryDirectory() <span class="hljs-keyword">as</span> temp_dir:
        temp_file_path = os.path.join(temp_dir, file.name)
        <span class="hljs-keyword">with</span> open(temp_file_path, <span class="hljs-string">"wb"</span>) <span class="hljs-keyword">as</span> temp_file:
            temp_file.write(file.getbuffer())

        pdf_loader = PyPDFLoader(temp_file_path)
        documents = pdf_loader.load()

        text_splitter = RecursiveCharacterTextSplitter(chunk_size=<span class="hljs-number">800</span>, chunk_overlap=<span class="hljs-number">80</span>, length_function=len)
        chunks = text_splitter.split_documents(documents)

        <span class="hljs-keyword">return</span> chunks

<span class="hljs-keyword">if</span> uploaded_file:
    user_chunks = process_pdf(uploaded_file)
    vector_db.add_documents(user_chunks)
    st.sidebar.success(<span class="hljs-string">f"Processed <span class="hljs-subst">{len(user_chunks)}</span> chunks from uploaded PDF."</span>)

<span class="hljs-comment"># Chatbox implementation</span>
st.subheader(<span class="hljs-string">"Chatbox"</span>)

<span class="hljs-comment"># Container for chat messages</span>
chat_container = st.container()

<span class="hljs-comment"># Function to display chat messages</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">display_message</span>(<span class="hljs-params">message, is_user=True</span>):</span>
    <span class="hljs-keyword">if</span> is_user:
        chat_container.markdown(<span class="hljs-string">f"&lt;div style='text-align: right; padding: 10px; border-radius: 10px; margin: 5px;'&gt;<span class="hljs-subst">{message}</span>&lt;/div&gt;"</span>, unsafe_allow_html=<span class="hljs-literal">True</span>)
    <span class="hljs-keyword">else</span>:
        chat_container.markdown(<span class="hljs-string">f"&lt;div style='text-align: left; padding: 10px; border-radius: 10px; margin: 5px;'&gt;<span class="hljs-subst">{message}</span>&lt;/div&gt;"</span>, unsafe_allow_html=<span class="hljs-literal">True</span>)

<span class="hljs-comment"># Initialize chat history in session state</span>
<span class="hljs-keyword">if</span> <span class="hljs-string">"messages"</span> <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> st.session_state:
    st.session_state.messages = []

<span class="hljs-comment"># Display chat history</span>
<span class="hljs-keyword">with</span> chat_container:
    <span class="hljs-keyword">for</span> chat <span class="hljs-keyword">in</span> st.session_state.messages:
        display_message(chat[<span class="hljs-string">'content'</span>], is_user=chat[<span class="hljs-string">'is_user'</span>])

<span class="hljs-comment"># User Input Section</span>
user_input = st.text_input(<span class="hljs-string">"Enter your query:"</span>, key=<span class="hljs-string">"user_input"</span>)
send_button = st.button(<span class="hljs-string">"Send"</span>)

<span class="hljs-keyword">if</span> send_button:
    user_input = st.session_state.user_input.strip()  <span class="hljs-comment"># Ensure the input is not empty or just whitespace</span>
    <span class="hljs-keyword">if</span> user_input:
        context = <span class="hljs-string">""</span>
        <span class="hljs-comment"># Retrieve relevant context from ChromaDB</span>
        <span class="hljs-keyword">try</span>:
            search_results = vector_db.similarity_search(user_input, k=<span class="hljs-number">3</span>)
            <span class="hljs-keyword">for</span> result <span class="hljs-keyword">in</span> search_results:
                context += result.page_content + <span class="hljs-string">"\n\n"</span>
        <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
            st.error(<span class="hljs-string">f"Error retrieving context from ChromaDB: <span class="hljs-subst">{e}</span>"</span>)
        <span class="hljs-keyword">try</span>:
            response = chain.run(context=context, query=user_input)
            <span class="hljs-comment"># Update conversation memory</span>
            st.session_state.messages.append({<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: user_input, <span class="hljs-string">"is_user"</span>:<span class="hljs-literal">True</span>})
            st.session_state.messages.append({<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: response, <span class="hljs-string">"is_user"</span>:<span class="hljs-literal">False</span>})
            st.rerun()
        <span class="hljs-keyword">except</span> requests.exceptions.Timeout:
            st.error(<span class="hljs-string">"The request to the LLM timed out. Please try again."</span>)
        <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
            st.error(<span class="hljs-string">f"Error generating response: <span class="hljs-subst">{e}</span>"</span>)
    <span class="hljs-keyword">else</span>:
        st.warning(<span class="hljs-string">"Please enter a valid query."</span>)
</code></pre>
<h2 id="heading-final-thoughts-smarter-banking-faq-chatbot-with-rag"><strong>Final Thoughts: Smarter Banking FAQ Chatbot with RAG</strong></h2>
<p>With ChromaDB integrated, our chatbot can now answer questions based on uploaded documents, ensuring accurate, contextually relevant responses. This powerful RAG pipeline makes the chatbot adaptable for real-world banking use cases.</p>
<p>💡 <strong>Key Takeaways:</strong></p>
<ol>
<li><p><strong>Enhanced Accuracy:</strong> Queries are answered based on real-time context from documents.</p>
</li>
<li><p><strong>Efficient Retrieval:</strong> ChromaDB ensures fast and relevant search results.</p>
</li>
<li><p><strong>Seamless User Experience:</strong> Users receive precise answers without delays.</p>
</li>
</ol>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text"><strong>Bonus:</strong> Get started with the complete <a target="_self" href="https://github.com/langdb/langdb-samples/blob/main/examples/langchain/langchain-rag-bot/main.py">source code</a> and experiment with LangChain’s advanced features!</div>
</div>

<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text"><strong>Don’t forget to star our GitHub repo!</strong></div>
</div>]]></content:encoded></item><item><title><![CDATA[AI-Powered Banking Chatbot: Build with LangChain, LangDB.ai & RAG (Part 1)]]></title><description><![CDATA[In the fast-paced world of AI innovation, crafting a Chat Assistant isn’t just about coding - it’s about engineering an intelligent ecosystem that delivers dynamic responses, integrates seamlessly with vector databases, and maintains conversational m...]]></description><link>https://blog.langdb.ai/ai-powered-banking-chatbot-build-with-langchain-langdbai-and-rag-part-1</link><guid isPermaLink="true">https://blog.langdb.ai/ai-powered-banking-chatbot-build-with-langchain-langdbai-and-rag-part-1</guid><category><![CDATA[Banking Bot]]></category><category><![CDATA[langchain]]></category><category><![CDATA[LangDB]]></category><category><![CDATA[RAG ]]></category><category><![CDATA[vector database]]></category><category><![CDATA[openai]]></category><category><![CDATA[Deepseek]]></category><category><![CDATA[chromadb]]></category><category><![CDATA[streamlit]]></category><dc:creator><![CDATA[Dishant Gandhi]]></dc:creator><pubDate>Wed, 19 Feb 2025 13:32:19 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1739815385854/42ac04e7-e677-4c49-86c1-b7136ea89ea3.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the fast-paced world of AI innovation, crafting a Chat Assistant isn’t just about coding - it’s about engineering an intelligent ecosystem that delivers dynamic responses, integrates seamlessly with vector databases, and maintains conversational memory for enhanced user interactions. Today, we’re diving into how you can build a <strong>LangChain-powered RAG (Retrieval-Augmented Generation) Conversational AI</strong> using <strong>Streamlit, LangChain,</strong> <a target="_blank" href="https://langdb.ai/"><strong>LangDB</strong></a><strong>, and ChromaDB</strong>.</p>
<p><strong>This is just the beginning!</strong> Stay tuned for this two-part series, where we will guide you step-by-step in building a robust AI-powered Chat Assistant. And yes—you get a <strong>free starter pack</strong> with all source code ready to go!</p>
<h2 id="heading-whats-on-our-agenda">What’s on Our Agenda?</h2>
<p>Here’s a sneak peek at what we’ll cover today:</p>
<ul>
<li><p><strong>Installation and Setup</strong>: Get your environment ready with the necessary dependencies.</p>
</li>
<li><p><strong>Building a Simple Chatbot</strong>: Integrate <strong>LangChain, LangDB.ai, and ChromaDB</strong> for intelligent responses.</p>
</li>
<li><p><strong>Adding Memory &amp; Enhancing User Experience</strong>: Implement <strong>conversation history</strong> for a more natural flow.</p>
</li>
<li><p><strong>Deploying with Streamlit</strong>: Run your chatbot with an intuitive UI.</p>
</li>
</ul>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text"><strong>Find all source code and starter pack by clicking </strong><a target="_self" href="https://github.com/langdb/langdb-samples"><strong>here</strong></a><strong>.</strong></div>
</div>

<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text"><strong>Don’t forget to star us⭐</strong></div>
</div>

<p>Alternatively, you can follow our <strong>YouTube tutorial</strong></p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://youtu.be/Q-TnQ23RgMU">https://youtu.be/Q-TnQ23RgMU</a></div>
<hr />
<h2 id="heading-whats-the-theme-of-our-ai">What’s the Theme of Our AI?</h2>
<p>To keep things practical, we are building a Banking FAQ Assistant chatbot that answers user queries about loan options, interest rates, and general banking FAQs.</p>
<p>🏦 <strong>Conversational AI Theme</strong>: Banking FAQ &amp; Loan Inquiry Bot</p>
<p><strong>Capabilities:</strong></p>
<ul>
<li><p>Answer frequently asked questions about banking services</p>
</li>
<li><p>Provide details on various loan types and interest rates</p>
</li>
<li><p>Retain conversational memory for personalized banking guidance</p>
</li>
</ul>
<hr />
<h2 id="heading-step-by-step-guide-to-get-started">Step-by-Step Guide to Get Started</h2>
<h3 id="heading-installation-and-setup"><strong>Installation and Setup</strong></h3>
<p>Before we dive in, let’s set up our development environment.</p>
<h3 id="heading-install-dependencies"><strong>Install Dependencies</strong></h3>
<p>Ensure you have Python installed, then proceed with the following:</p>
<pre><code class="lang-bash">pip install streamlit langchain openai langchain-community requests
</code></pre>
<hr />
<h2 id="heading-building-the-langchain-conversational-ai"><strong>Building the LangChain Conversational AI</strong></h2>
<h3 id="heading-setting-up-the-core-components"><strong>Setting Up the Core Components</strong></h3>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> streamlit <span class="hljs-keyword">as</span> st
<span class="hljs-keyword">from</span> langchain.chains <span class="hljs-keyword">import</span> LLMChain
<span class="hljs-keyword">from</span> langchain.prompts <span class="hljs-keyword">import</span> PromptTemplate
<span class="hljs-keyword">from</span> langchain_openai <span class="hljs-keyword">import</span> ChatOpenAI
<span class="hljs-keyword">from</span> langchain.memory <span class="hljs-keyword">import</span> ConversationBufferMemory
<span class="hljs-keyword">import</span> requests
</code></pre>
<p>Here, we import the necessary modules:</p>
<ul>
<li><p><strong>Streamlit</strong> for the assistant’s UI.</p>
</li>
<li><p><strong>LangChain components</strong> (LLMChain, PromptTemplate) to manage the AI model and prompts.</p>
</li>
<li><p><strong>ChatOpenAI</strong> as the OpenAI-compatible client for routing requests through the LangDB.ai AI Gateway.</p>
</li>
<li><p><strong>ConversationBufferMemory</strong> for maintaining chat history.</p>
</li>
</ul>
<hr />
<h3 id="heading-load-environment-variables">Load Environment Variables</h3>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os

os.environ[<span class="hljs-string">"LANGDB_API_KEY"</span>] = <span class="hljs-string">"your-langdb-api-key"</span>
</code></pre>
<ul>
<li><p>You will also need your <strong>project-id</strong> in a later step.</p>
</li>
<li><p>Since we are using LangDB, set <strong>LANGDB_API_KEY</strong> to your LangDB API key instead of setting <strong>OPENAI_API_KEY</strong>.</p>
</li>
</ul>
<h3 id="heading-creating-the-prompt-template"><strong>Creating the Prompt Template</strong></h3>
<pre><code class="lang-python">PROMPT_TEMPLATE = <span class="hljs-string">"""
You are a banking assistant specializing in answering FAQs about loans, interest rates, and general banking services.
If the user greets, respond with a greeting. If the user asks a question, provide an answer.
Use the following context too for answering questions:

{context}

Conversation History: 
{history}

---

Answer the question based on the above context: {query}
"""</span>
</code></pre>
<p>The Prompt Template provides structure to the assistant’s responses:</p>
<ul>
<li><p>It greets users when necessary.</p>
</li>
<li><p>It uses contextual memory to fetch relevant banking information.</p>
</li>
<li><p>It provides structured responses based on the query and available context.</p>
</li>
</ul>
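<p>As a quick sanity check, you can render the template locally before wiring it into LangChain. The sketch below uses plain <code>str.format</code> in place of <code>PromptTemplate</code>; the sample context, history, and query are made up for illustration:</p>

```python
# Minimal sketch: fill the template's three placeholders with str.format.
# (LangChain's PromptTemplate performs the same substitution at runtime.)
PROMPT_TEMPLATE = """
You are a banking assistant specializing in answering FAQs about loans, interest rates, and general banking services.
If the user greets, respond with a greeting. If the user asks a question, provide an answer.
Use the following context too for answering questions:

{context}

Conversation History:
{history}

---

Answer the question based on the above context: {query}
"""

rendered = PROMPT_TEMPLATE.format(
    context="Home loans start at 6.5% APR.",  # hypothetical context snippet
    history="User: Hi\nAssistant: Hello, how may I help you?",
    query="What are your home loan rates?",
)
print(rendered)
```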
<hr />
<h3 id="heading-initializing-the-core-ai-components"><strong>Initializing the Core AI Components</strong></h3>
<pre><code class="lang-python">LANGDB_API_URL = <span class="hljs-string">"https://api.us-east-1.langdb.ai/your-project-id/v1"</span>

llm = ChatOpenAI(
    base_url=LANGDB_API_URL,
    api_key=getenv(<span class="hljs-string">"LANGDB_API_KEY"</span>),
    model=<span class="hljs-string">"gpt-4o-mini"</span>,  <span class="hljs-comment"># Replace with the specific model name you are using</span>
    timeout=<span class="hljs-number">10</span>  <span class="hljs-comment"># Add a timeout of 10 seconds</span>
)

memory = ConversationBufferMemory(
    memory_key=<span class="hljs-string">"history"</span>, 
    return_messages=<span class="hljs-literal">True</span>,
    input_key=<span class="hljs-string">"query"</span>,
)
</code></pre>
<p>Here’s what each component does:</p>
<ul>
<li><p><strong>ChatOpenAI (</strong><code>llm</code>): The chat model client, configured to send requests through the LangDB AI Gateway.</p>
</li>
<li><p><strong>Memory (</strong><code>ConversationBufferMemory</code>): Retains chat history for continuity.</p>
</li>
</ul>
<p>Replace <strong><em>your-project-id</em></strong> with your actual project ID. Follow the GIF below to fetch it:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739945911570/53271703-90c7-4b42-8aa2-d68779433bca.gif" alt class="image--center mx-auto" /></p>
<hr />
<h3 id="heading-building-the-chatbot-chain"><strong>Building the Chatbot Chain</strong></h3>
<pre><code class="lang-python">prompt_template = PromptTemplate(
    input_variables=[<span class="hljs-string">"context"</span>, <span class="hljs-string">"history"</span>, <span class="hljs-string">"query"</span>],
    template=PROMPT_TEMPLATE
)

chain = LLMChain(llm=llm, prompt=prompt_template, memory=memory)
</code></pre>
<p>Here we chain the model with the prompt template and memory, allowing it to generate responses dynamically.</p>
<hr />
<h2 id="heading-deploying-with-streamlit"><strong>Deploying with Streamlit</strong></h2>
<h3 id="heading-setting-up-the-ui"><strong>Setting Up the UI</strong></h3>
<pre><code class="lang-python">st.set_page_config(page_title=<span class="hljs-string">"Banking Assistant"</span>, layout=<span class="hljs-string">"wide"</span>)
st.title(<span class="hljs-string">"Banking FAQ Assistant"</span>)
st.write(<span class="hljs-string">"Ask questions about banking services, loan options, and interest rates!"</span>)
</code></pre>
<p>The code above sets up our Streamlit UI with a title and a short description.</p>
<hr />
<h3 id="heading-handling-user-queries"><strong>Handling User Queries</strong></h3>
<pre><code class="lang-python">user_input = st.text_input(<span class="hljs-string">"Enter your query:"</span>)
send_button = st.button(<span class="hljs-string">"Send"</span>)
</code></pre>
<p>Users can input their banking questions, and responses are triggered by clicking the Send button.</p>
<hr />
<h3 id="heading-processing-the-query"><strong>Processing the Query</strong></h3>
<pre><code class="lang-python"><span class="hljs-keyword">if</span> send_button:
    <span class="hljs-keyword">if</span> user_input:
        <span class="hljs-keyword">try</span>:
            context = <span class="hljs-string">""</span>
            response = chain.run(context=context, query=user_input)

            st.session_state.messages.append({<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: user_input, <span class="hljs-string">"is_user"</span>:<span class="hljs-literal">True</span>})
            st.session_state.messages.append({<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: response, <span class="hljs-string">"is_user"</span>:<span class="hljs-literal">False</span>})
            st.rerun()
        <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
            st.error(<span class="hljs-string">f"Error generating response: <span class="hljs-subst">{e}</span>"</span>)
    <span class="hljs-keyword">else</span>:
        st.warning(<span class="hljs-string">"Please enter a valid query."</span>)
</code></pre>
<p>The snippet above does the following:</p>
<ul>
<li><p>Generates a response using LangChain.</p>
</li>
<li><p>Updates chat history in Streamlit’s session state.</p>
</li>
</ul>
<p>Let’s look at the complete code:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os
<span class="hljs-keyword">from</span> os <span class="hljs-keyword">import</span> getenv

<span class="hljs-keyword">import</span> streamlit <span class="hljs-keyword">as</span> st
<span class="hljs-keyword">from</span> langchain.chains <span class="hljs-keyword">import</span> LLMChain
<span class="hljs-keyword">from</span> langchain.prompts <span class="hljs-keyword">import</span> PromptTemplate
<span class="hljs-keyword">from</span> langchain_openai <span class="hljs-keyword">import</span> ChatOpenAI
<span class="hljs-keyword">from</span> langchain.memory <span class="hljs-keyword">import</span> ConversationBufferMemory
<span class="hljs-keyword">import</span> requests

<span class="hljs-comment"># Constants</span>
PROMPT_TEMPLATE = <span class="hljs-string">"""
You are a banking assistant specializing in answering FAQs about loans, interest rates, and general banking services.
If the user greets, respond with a greeting. If the user asks a question, provide an answer.
Use the following context too for answering questions:

{context}

Conversation History: 
{history}

---


Answer the question based on the above context: {query}

"""</span>

LANGDB_API_URL = <span class="hljs-string">"https://api.us-east-1.langdb.ai/your-project-id/v1"</span>  <span class="hljs-comment"># Replace with your LangDB project id</span>
os.environ[<span class="hljs-string">"LANGDB_API_KEY"</span>] = <span class="hljs-string">"your-api-key"</span>

st.set_page_config(page_title=<span class="hljs-string">"Banking Assistant"</span>, layout=<span class="hljs-string">"wide"</span>)
st.title(<span class="hljs-string">"Banking FAQ Assistant"</span>)
st.write(<span class="hljs-string">"Ask questions about banking services, loan options, and interest rates!"</span>)

<span class="hljs-comment"># Initialize LangChain LLM</span>
llm = ChatOpenAI(
    base_url=LANGDB_API_URL,
    api_key=getenv(<span class="hljs-string">"LANGDB_API_KEY"</span>),
    model=<span class="hljs-string">"gpt-4o-mini"</span>,  <span class="hljs-comment"># Replace with the specific model name you are using</span>
    timeout=<span class="hljs-number">10</span>  <span class="hljs-comment"># Add a timeout of 10 seconds</span>
)

<span class="hljs-comment"># Memory for conversation history</span>
memory = ConversationBufferMemory(
    memory_key=<span class="hljs-string">"history"</span>, 
    return_messages=<span class="hljs-literal">True</span>,
    input_key=<span class="hljs-string">"query"</span>,
)

<span class="hljs-comment"># Prompt Template for LangChain</span>
prompt_template = PromptTemplate(
    input_variables=[<span class="hljs-string">"context"</span>, <span class="hljs-string">"history"</span>, <span class="hljs-string">"query"</span>],
    template=PROMPT_TEMPLATE
)

<span class="hljs-comment"># LangChain LLM Chain</span>
chain = LLMChain(llm=llm, prompt=prompt_template, memory=memory)

<span class="hljs-comment"># Chatbox implementation</span>
st.subheader(<span class="hljs-string">"Chatbox"</span>)

<span class="hljs-comment"># Container for chat messages</span>
chat_container = st.container()

<span class="hljs-comment"># Function to display chat messages</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">display_message</span>(<span class="hljs-params">message, is_user=True</span>):</span>
    <span class="hljs-keyword">if</span> is_user:
        chat_container.markdown(<span class="hljs-string">f"&lt;div style='text-align: right; padding: 10px; border-radius: 10px; margin: 5px;'&gt;<span class="hljs-subst">{message}</span>&lt;/div&gt;"</span>, unsafe_allow_html=<span class="hljs-literal">True</span>)
    <span class="hljs-keyword">else</span>:
        chat_container.markdown(<span class="hljs-string">f"&lt;div style='text-align: left; padding: 10px; border-radius: 10px; margin: 5px;'&gt;<span class="hljs-subst">{message}</span>&lt;/div&gt;"</span>, unsafe_allow_html=<span class="hljs-literal">True</span>)

<span class="hljs-comment"># Initialize chat history in session state</span>
<span class="hljs-keyword">if</span> <span class="hljs-string">"messages"</span> <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> st.session_state:
    st.session_state.messages = []

<span class="hljs-comment"># Display chat history</span>
<span class="hljs-keyword">with</span> chat_container:
    <span class="hljs-keyword">for</span> chat <span class="hljs-keyword">in</span> st.session_state.messages:
        display_message(chat[<span class="hljs-string">'content'</span>], is_user=chat[<span class="hljs-string">'is_user'</span>])

<span class="hljs-comment"># User Input Section</span>
user_input = st.text_input(<span class="hljs-string">"Enter your query:"</span>, key=<span class="hljs-string">"user_input"</span>)
send_button = st.button(<span class="hljs-string">"Send"</span>)

<span class="hljs-keyword">if</span> send_button:
    user_input = st.session_state.user_input.strip()  <span class="hljs-comment"># Ensure the input is not empty or just whitespace</span>
    <span class="hljs-keyword">if</span> user_input:
        <span class="hljs-keyword">try</span>:
            context = <span class="hljs-string">""</span> <span class="hljs-comment"># to be used in next tutorial</span>
            response = chain.run(context=context, query=user_input)
            <span class="hljs-comment"># Update conversation memory</span>
            st.session_state.messages.append({<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: user_input, <span class="hljs-string">"is_user"</span>:<span class="hljs-literal">True</span>})
            st.session_state.messages.append({<span class="hljs-string">"role"</span>: <span class="hljs-string">"assistant"</span>, <span class="hljs-string">"content"</span>: response, <span class="hljs-string">"is_user"</span>:<span class="hljs-literal">False</span>})
            st.rerun()
        <span class="hljs-keyword">except</span> requests.exceptions.Timeout:
            st.error(<span class="hljs-string">"The request to the LLM timed out. Please try again."</span>)
        <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
            st.error(<span class="hljs-string">f"Error generating response: <span class="hljs-subst">{e}</span>"</span>)
    <span class="hljs-keyword">else</span>:
        st.warning(<span class="hljs-string">"Please enter a valid query."</span>)
</code></pre>
<hr />
<h2 id="heading-final-thoughts-scale-your-ai-with-rag"><strong>Final Thoughts: Scale Your AI with RAG!</strong></h2>
<p>Building a Banking FAQ chatbot with LangChain and LangDB enables users to access essential banking information effortlessly. By integrating memory and contextual awareness, this conversational AI delivers clear and helpful responses.</p>
<p>🚀 What’s Next? In Part 2 of this series, we’ll dive into <strong>Building a RAG pipeline</strong> for more refined banking FAQs.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text"><strong>Bonus:</strong> Get started for free with the complete source code and experiment with LangChain’s advanced features! <strong>Don’t forget to star</strong>⭐<strong> our </strong><a target="_self" href="https://github.com/langdb/langdb-samples"><strong>GitHub repo</strong></a><strong>!</strong></div>
</div>]]></content:encoded></item><item><title><![CDATA[How to Build an Epic Conversational AI with LangDB.ai and Rasa-pro (and Get $10 Free Credit!)]]></title><description><![CDATA[In the ever-evolving landscape of AI analytical innovations, building a next-level chatbot is more than just coding, it’s about creating an ecosystem of AI integration solutions that drive enterprise AI governance, scaling AI, and superior AI observa...]]></description><link>https://blog.langdb.ai/how-to-build-an-epic-conversational-ai-with-langdbai-and-rasa-pro-and-get-10-free-credit</link><guid isPermaLink="true">https://blog.langdb.ai/how-to-build-an-epic-conversational-ai-with-langdbai-and-rasa-pro-and-get-10-free-credit</guid><category><![CDATA[rasa]]></category><category><![CDATA[llm]]></category><category><![CDATA[Deepseek]]></category><category><![CDATA[openai]]></category><category><![CDATA[chatbot]]></category><category><![CDATA[conversational-ai]]></category><category><![CDATA[LangDB]]></category><dc:creator><![CDATA[Dishant Gandhi]]></dc:creator><pubDate>Wed, 12 Feb 2025 12:02:20 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1739426421251/447b2688-f9f2-4eee-b664-814633e7c83b.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the ever-evolving landscape of AI analytical innovations, building a next-level chatbot is more than just coding, it’s about creating an ecosystem of AI integration solutions that drive enterprise AI governance, scaling AI, and superior AI observability. Today, we’re diving deep into how you can merge the formidable powers of <a target="_blank" href="https://langdb.ai">LangDB</a> and Rasa-pro to create a conversational AI system that’s both smart and scalable—all while enjoying a $10 free credit to kickstart your journey for free!</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Find all source code and starter pack by clicking <a target="_self" href="http://github.com/langdb/rasa-starter-pack">here</a>.</div>
</div>

<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Don’t forget to star us⭐</div>
</div>

<h2 id="heading-alternatively-you-can-also-follow-our-youtube-tutorial">Alternatively, you can follow our YouTube tutorial</h2>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://youtu.be/Se_pEedNe_4">https://youtu.be/Se_pEedNe_4</a></div>
<h3 id="heading-whats-the-buzz-about-langdb-ai-gateway">What’s the Buzz About LangDB AI Gateway?</h3>
<p>LangDB AI Gateway is a powerful AI platform designed to help enterprises securely govern, optimize, and scale their AI solutions. As an AI gateway, it provides a seamless connection with over 150 large language models (LLMs) using OpenAI-compatible APIs.</p>
<p>By offering AI governance, cost efficiency, and enterprise AI solutions, LangDB ensures businesses can deploy AI models with enhanced security, performance, and reliability.</p>
<p>With LangDB, organizations benefit from AI as a service, enabling smooth AI for business intelligence operations while optimizing resources through intelligent model routing and observability. Whether you're building AI software solutions or integrating AI for enterprises, LangDB AI Gateway is the best artificial intelligence platform to streamline AI deployments at scale.</p>
<h3 id="heading-what-is-rasa">What is Rasa?</h3>
<p>Rasa is a leading open-source framework for building conversational AI solutions. It is celebrated for its ability to empower developers with <strong>AI governance tools</strong> and <strong>ai management</strong> capabilities that streamline the process of creating engaging, intelligent chatbots. By incorporating Rasa into your workflow, you gain access to advanced policies and pipelines—ideal for <strong>enterprise AI governance</strong> and <strong>AI integration</strong>—that help manage and scale AI applications effectively.</p>
<h3 id="heading-whats-on-our-agenda">What’s on Our Agenda?</h3>
<p>Here’s a sneak peek into the roadmap of our ultimate conversational AI guide:</p>
<ul>
<li><p><strong>Installation and Setup:</strong> Learn how to create a Conda environment and install all the necessary packages.</p>
</li>
<li><p><strong>Building Your Chatbot:</strong> Discover how to add engaging flows, configure responses, and fine-tune your Rasa project.</p>
</li>
<li><p><strong>Integrating LangDB</strong>: Set up LangDB to bring powerful AI infrastructure tools into your project</p>
</li>
<li><p><strong>Training and Deployment:</strong> Train your model and launch your AI server for a live demo</p>
</li>
</ul>
<h2 id="heading-step-by-step-guide-to-get-started">Step-by-Step Guide to Get Started</h2>
<h3 id="heading-installation">Installation</h3>
<p>Kick off by setting up a dedicated Conda environment to keep your project clean and dependencies in check:</p>
<ul>
<li><p>Download and Install <a target="_blank" href="https://www.anaconda.com/download/">Miniconda</a>.</p>
</li>
<li><p>Once installed, follow the steps below to install Rasa in your environment.</p>
</li>
<li><p>Create your conda environment with</p>
</li>
</ul>
<pre><code class="lang-bash">conda create -n rasa-env python=3.10
</code></pre>
<ul>
<li>Activate your conda environment</li>
</ul>
<pre><code class="lang-bash">conda activate rasa-env
</code></pre>
<ul>
<li><strong>Turbocharge your setup:</strong> speed up installation by installing <code>uv</code>:</li>
</ul>
<pre><code class="lang-bash">pip install uv
</code></pre>
<ul>
<li>Now install Rasa-pro:</li>
</ul>
<pre><code class="lang-bash">uv pip install rasa-pro --extra-index-url=https://europe-west3-python.pkg.dev/rasa-releases/rasa-pro-python/simple/
</code></pre>
<h2 id="heading-setting-up-your-rasa-project">Setting Up Your Rasa Project</h2>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Omit this step if you already have Rasa project and Rasa-pro key</div>
</div>

<p>If you’re new to Rasa or need a Rasa-pro license key, no sweat: grab yours from Rasa’s <a target="_blank" href="https://rasa.com/docs/rasa-pro/developer-edition">developer portal</a> and set it in your environment.</p>
<p>You will receive an email from Rasa containing your license key, similar to the screenshot below:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738918314536/ff1b83ad-b351-4dcb-b40b-600f8cfb7105.png" alt class="image--center mx-auto" /></p>
<p>Open your terminal and activate your conda environment <code>rasa-env</code></p>
<p>In the same terminal, set your Rasa Pro license as follows (use <code>export</code> instead of <code>set</code> on macOS/Linux):</p>
<pre><code class="lang-bash"><span class="hljs-built_in">set</span> RASA_PRO_LICENSE=your-key
</code></pre>
<p>Then, initialize your Rasa CALM project:</p>
<pre><code class="lang-bash">rasa init --template calm
</code></pre>
<p>This command scaffolds your project structure, priming it for some serious AI integration and AI application governance.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738918642976/59247e73-1c58-4a88-80cd-f3e3bb7f8929.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-building-your-rasa-chatbot-nerd-out-with-cool-flows-and-responses">Building Your Rasa Chatbot: Nerd Out with Cool Flows and Responses</h2>
<h3 id="heading-adding-flows">Adding flows</h3>
<ul>
<li>Let’s create a flow that greets users with some serious nerd cred. In the <code>data/flows</code> folder, create a file called <code>greet.yml</code>:</li>
</ul>
<pre><code class="lang-yaml"><span class="hljs-attr">flows:</span>
  <span class="hljs-attr">greet:</span>
    <span class="hljs-attr">description:</span> <span class="hljs-string">always</span> <span class="hljs-string">run</span> <span class="hljs-string">when</span> <span class="hljs-string">the</span> <span class="hljs-string">user</span> <span class="hljs-string">first</span> <span class="hljs-string">greets</span> <span class="hljs-string">the</span> <span class="hljs-string">bot</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">greet</span>
    <span class="hljs-attr">steps:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">action:</span> <span class="hljs-string">utter_greet</span>
</code></pre>
<ul>
<li>This simple yet effective flow ensures your bot springs into action the moment someone drops a “Hi.”</li>
</ul>
<h3 id="heading-crafting-witty-responses">Crafting Witty Responses</h3>
<ul>
<li>Next, update the <code>shared.yml</code> file in the domain folder to add your greeting response:</li>
</ul>
<pre><code class="lang-yaml"><span class="hljs-attr">version:</span> <span class="hljs-string">"3.1"</span>

<span class="hljs-attr">slots:</span>
  <span class="hljs-attr">return_value:</span>
    <span class="hljs-attr">type:</span> <span class="hljs-string">any</span>
    <span class="hljs-attr">mappings:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">type:</span> <span class="hljs-string">custom</span>
        <span class="hljs-attr">action:</span> <span class="hljs-string">add_contact</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">type:</span> <span class="hljs-string">custom</span>
        <span class="hljs-attr">action:</span> <span class="hljs-string">remove_contact</span>

<span class="hljs-attr">responses:</span>
  <span class="hljs-attr">utter_greet:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">text:</span> <span class="hljs-string">"Hello, how may I help you?"</span>
</code></pre>
<h3 id="heading-configuring-rasa-for-langdb-integration">Configuring Rasa for LangDB Integration</h3>
<ul>
<li>Integrate your Rasa project with LangDB by updating your <code>config.yml</code>:</li>
</ul>
<pre><code class="lang-yaml"><span class="hljs-attr">recipe:</span> <span class="hljs-string">default.v1</span>
<span class="hljs-attr">language:</span> <span class="hljs-string">en</span>
<span class="hljs-attr">pipeline:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">SingleStepLLMCommandGenerator</span>
  <span class="hljs-attr">llm:</span>
    <span class="hljs-attr">model_group:</span> <span class="hljs-string">openai-gpt-4</span>

<span class="hljs-attr">policies:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">FlowPolicy</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">IntentlessPolicy</span>
<span class="hljs-attr">assistant_id:</span> <span class="hljs-number">20250207</span>
</code></pre>
<ul>
<li><p>Next step is to add this model group inside <code>endpoints.yml</code> with our LangDB Integration</p>
</li>
<li><p>Add below configuration at the very end of your <code>endpoints.yml</code></p>
</li>
</ul>
<pre><code class="lang-yaml"><span class="hljs-attr">model_groups:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">id:</span> <span class="hljs-string">openai-gpt-4</span>
    <span class="hljs-attr">models:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">provider:</span> <span class="hljs-string">openai</span>
        <span class="hljs-attr">model:</span> <span class="hljs-string">gpt-4o</span>
        <span class="hljs-attr">api-base:</span> <span class="hljs-string">"https://api.us-east-1.langdb.ai/your-project-id/v1"</span>
        <span class="hljs-attr">request_timeout:</span> <span class="hljs-number">7</span>
        <span class="hljs-attr">max_tokens:</span> <span class="hljs-number">256</span>
</code></pre>
<ul>
<li><p><strong>id:</strong> your custom name for the model group (make sure it matches the one in your <code>config.yml</code>)</p>
</li>
<li><p><strong>model</strong>: add your LangDB model name</p>
</li>
<li><p>Remember to replace <code>your-project-id</code> with your actual project id to enjoy seamless AI integration and smart AI functionalities.</p>
</li>
</ul>
<p><strong>Follow below steps to fetch Project-id:</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739012815936/b512b956-fbc5-4389-8006-98b6c21255c3.gif" alt class="image--center mx-auto" /></p>
<h3 id="heading-training-your-rasa-model-with-langdb">Training Your Rasa Model with LangDB</h3>
<ul>
<li><p>Before you train, set these environment variables to direct all your API calls through LangDB, ensuring robust AI operational visibility:</p>
</li>
<li><p>Use <code>set</code> (Command Prompt) or <code>$env:</code> (PowerShell) on Windows and <code>export</code> on macOS/Linux</p>
</li>
</ul>
<pre><code class="lang-bash">OPENAI_API_KEY=your-api-key
OPENAI_BASE_URL=https://api.us-east-1.langdb.ai/your-project-id/v1
RASA_PRO_LICENSE=your-rasa-pro-license-key
</code></pre>
<ul>
<li><p>Make sure to update your <strong>project-id</strong></p>
</li>
<li><p>Why have we used OPENAI_BASE_URL environment variable?</p>
<ul>
<li><p>Rasa sends all LLM queries to the OpenAI API URL by default.</p>
</li>
<li><p>Since we are using LangDB, we override <strong>OPENAI_BASE_URL</strong> with LangDB’s API URL.</p>
</li>
</ul>
</li>
<li><p>Everything is set; let’s train the model by running <code>rasa train</code></p>
</li>
</ul>
<h3 id="heading-running-our-conversational-ai">Running our Conversational AI</h3>
<ul>
<li>Time to see your creation in action! Open two terminal windows:</li>
</ul>
<pre><code class="lang-bash">OPENAI_API_KEY=your-api-key
OPENAI_BASE_URL=https://api.us-east-1.langdb.ai/your-project-id/v1
RASA_PRO_LICENSE=your-rasa-pro-license-key
</code></pre>
<ul>
<li><p>Ensure both terminals have the necessary environment variables</p>
</li>
<li><p><strong>Terminal 1:</strong> <strong>Actions server</strong> by running <code>rasa run actions</code></p>
</li>
<li><p><strong>Terminal 2:</strong> open Rasa’s interactive chat UI by running <code>rasa inspect --debug</code></p>
</li>
<li><p>Watch as your Chat UI lights up with AI integration that delivers unparalleled AI and analytics performance.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739014632119/38b7b8a4-dd10-4b83-898e-be2e0c38bee1.gif" alt class="image--center mx-auto" /></p>
<h2 id="heading-final-thoughts-nerd-out-and-optimize-your-ai">Final Thoughts: Nerd Out and Optimize Your AI!</h2>
<p>Integrating LangDB with Rasa-pro is more than just building a chatbot; it's about constructing a dynamic, analytical AI ecosystem that brings AI observability, AI governance tools, and end-to-end AI integrations to life. Whether you're diving into AI for managers, tinkering with AI in management, or simply passionate about AI integrations, this guide sets you up for success.</p>
<p><strong>Bonus Alert:</strong> Get started for Free! with a <strong>$10 free credit</strong> to experiment with these cutting-edge tools without any upfront cost! Embrace your inner tech nerd, explore the endless possibilities of AI management tools and AI infrastructure tools, and join the revolution in AI solutions observability.</p>
<p>Don’t forget to check out our <a target="_blank" href="http://github.com/langdb/rasa-starter-pack">source code and starter pack</a>, and if you love what you see, hit that star button on GitHub. Happy coding, and may your AI always be as smart, scalable, and observable as possible!</p>
]]></content:encoded></item><item><title><![CDATA[How Your LLM Interaction Data is a Goldmine for AI Innovation]]></title><description><![CDATA[If you’re deploying AI agents or LLM-powered applications in your business, you might be sitting on an untapped goldmine—one that could revolutionize your product, customer experience, and competitive edge. The secret? The raw, unstructured data gene...]]></description><link>https://blog.langdb.ai/how-your-llm-interaction-data-is-a-goldmine-for-ai-innovation</link><guid isPermaLink="true">https://blog.langdb.ai/how-your-llm-interaction-data-is-a-goldmine-for-ai-innovation</guid><category><![CDATA[llm]]></category><category><![CDATA[agents]]></category><category><![CDATA[finetuning]]></category><dc:creator><![CDATA[Matteo Pelati]]></dc:creator><pubDate>Wed, 12 Feb 2025 08:02:54 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1739347236401/24f3be3b-251b-4623-9281-67851714a386.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you’re deploying AI agents or LLM-powered applications in your business, you might be sitting on an untapped goldmine—one that could revolutionize your product, customer experience, and competitive edge. The secret? The raw, unstructured data generated every time users interact with your AI.</p>
<p>Unlike traditional software, where user inputs are confined to buttons and dropdown menus, LLM applications invite free-form conversations. This creates a treasure trove of unfiltered insights into what users truly want, need, and struggle with. Add to this the layer of explicit feedback (e.g., thumbs up/down on responses), and you have a proprietary dataset that most companies overlook—but forward-thinking AI leaders are already leveraging to dominate their markets.</p>
<p>Here’s why this data matters, how to use it, and real-world examples to inspire your strategy.</p>
<h2 id="heading-the-goldmine-youre-sitting-on"><strong>The Goldmine You’re Sitting On</strong></h2>
<p>When users interact with an LLM-powered chatbot or a virtual assistant, they ask questions in their own words. This organic, unstructured data reveals:</p>
<ul>
<li><p><strong>Pain points</strong> (e.g., “How do I reset my device without losing data?”)</p>
</li>
<li><p><strong>Intent</strong> (e.g., “Is this product compatible with X?”)</p>
</li>
<li><p><strong>Sentiment</strong> (e.g., frustration, confusion, satisfaction)</p>
</li>
</ul>
<p>But the real magic happens when you combine this with <strong>preference data</strong>—explicit or inferred signals about what users value. This data can be captured in two powerful ways:</p>
<ol>
<li><p><strong>Direct User Feedback</strong>: (e.g., thumbs-down rating on a chatbot response)</p>
</li>
<li><p><strong>LLM-as-Analyst</strong>: Use a secondary LLM to “read” conversations and infer preferences, even when users don’t explicitly rate responses.</p>
<ul>
<li><p><em>Example</em>: An LLM flags frustration in a user’s message: <em>“I’ve tried this three times already!”</em></p>
</li>
<li><p><em>Example</em>: An LLM detects unresolved issues: <em>“Wait, that didn’t fix it.”</em></p>
</li>
</ul>
</li>
</ol>
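<p>To make this concrete, here is a tiny illustrative sketch (the function and field names are our own, not from any particular SDK) of how the two signal sources can be combined: an explicit rating, when present, takes precedence over the LLM-inferred score:</p>

```python
# Toy sketch: prefer direct user feedback; fall back to LLM-inferred sentiment.
# Both scores use a hypothetical 1-5 scale (1 = very negative).
def effective_score(direct_rating, inferred_score):
    """Return the preference signal to aggregate on."""
    return direct_rating if direct_rating is not None else inferred_score

# A user who clicked thumbs-down (rating 1) overrides any inference;
# a silent but frustrated user falls back to the inferred score.
print(effective_score(1, 4))     # explicit rating wins
print(effective_score(None, 2))  # inferred sentiment used
```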
<p>Together, these datasets let you:</p>
<ol>
<li><p><strong>Identify patterns</strong> (What do users <em>actually</em> care about?)</p>
</li>
<li><p><strong>Improve your AI</strong> (Fine-tune models to fix weaknesses)</p>
</li>
<li><p><strong>Drive business decisions</strong> (Which features are missing? Where is documentation lacking?)</p>
</li>
</ol>
<h2 id="heading-how-it-works-combining-direct-feedback-llm-inferred-signals"><strong>How It Works: Combining Direct Feedback + LLM-Inferred Signals</strong></h2>
<p>Let’s break down a real conversation and its structured analytical output:</p>
<p><strong>Raw Chat</strong>:</p>
<p><em>User</em>: “Your instructions for resetting the AirPurifier Pro didn’t work. The red light is still on. What now?”</p>
<p><em>Chatbot</em>: “Try unplugging the device for 60 seconds.”</p>
<p><em>User</em>: “Still no luck. This is a waste of time.”</p>
<p><strong>Structured Data Extracted</strong>:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"product"</span>: <span class="hljs-string">"AirPurifier Pro"</span>,
  <span class="hljs-attr">"issue_type"</span>: <span class="hljs-string">"Hardware Troubleshooting / Reset"</span>,
  <span class="hljs-attr">"resolution_status"</span>: <span class="hljs-string">"No"</span>,
  <span class="hljs-attr">"direct_feedback"</span>: <span class="hljs-literal">null</span>,  <span class="hljs-comment">// User didn’t click a rating</span>
  <span class="hljs-attr">"inferred_sentiment"</span>: {
    <span class="hljs-attr">"score"</span>: <span class="hljs-number">1</span>,  <span class="hljs-comment">// LLM-as-analyst assigned based on tone</span>
    <span class="hljs-attr">"triggers"</span>: [<span class="hljs-string">"“Didn’t work”"</span>, <span class="hljs-string">"“Waste of time”"</span>]
  }
}
</code></pre>
<p><strong>Why This Matters</strong>:</p>
<ul>
<li><p>Even without a direct rating, the LLM-as-analyst identified failure and frustration.</p>
</li>
<li><p>Aggregating these signals shows that “Hardware Troubleshooting / Reset” issues have a 70% unresolved rate, prompting a product team investigation.</p>
</li>
</ul>
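<p>Aggregating these structured signals is straightforward once each chat is reduced to a record. A small sketch with toy records — the counts are illustrative, chosen to reproduce the 70% figure:</p>

```python
from collections import Counter

# Toy records in the structured shape shown above; counts are illustrative.
records = (
    [{"issue_type": "Hardware Troubleshooting / Reset", "resolution_status": "No"}] * 7
    + [{"issue_type": "Hardware Troubleshooting / Reset", "resolution_status": "Yes"}] * 3
    + [{"issue_type": "Filter Replacement", "resolution_status": "Yes"}] * 4
)

total = Counter(r["issue_type"] for r in records)
unresolved = Counter(
    r["issue_type"] for r in records if r["resolution_status"] == "No"
)

# Percentage of chats per issue type that the bot failed to resolve.
unresolved_rate = {
    issue: round(100 * unresolved[issue] / total[issue]) for issue in total
}
print(unresolved_rate)
# {'Hardware Troubleshooting / Reset': 70, 'Filter Replacement': 0}
```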
<h2 id="heading-an-example-how-a-smart-devices-manufacturer-scaled-ai-powered-support"><strong>An Example: How a Smart Device Manufacturer Scaled AI-Powered Support</strong></h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739345089031/c43c812a-e539-4b78-ba32-4875502b78d2.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-phase-1-uncovering-hidden-patterns-in-chat-data"><strong>Phase 1: Uncovering Hidden Patterns in Chat Data</strong></h3>
<p>A smart device manufacturer noticed that 40% of customer support chats were being escalated to human agents. Their chatbot struggled with ambiguous queries, and user feedback averaged just 2.8/5.</p>
<h3 id="heading-step-1-post-process-chats-with-an-llm-analyst"><strong>Step 1: Post-Process Chats with an LLM Analyst</strong></h3>
<p>They ran 10,000+ historical human-operated chats through an LLM to extract structured data:</p>
<ul>
<li><p><strong>Product</strong>: Device mentioned (thermostat, camera, etc.)</p>
</li>
<li><p><strong>Issue Type</strong>: Connectivity, documentation, hardware, etc.</p>
</li>
<li><p><strong>Sentiment</strong>: Inferred frustration/confusion from phrases like <em>“This isn’t working!”</em></p>
</li>
<li><p><strong>Resolution Status</strong>: Resolved by bot? (Yes/No)</p>
</li>
</ul>
<p><strong>Sample Insights from the Dashboard</strong>:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Product</td><td>Top Issue</td><td>Escalation Rate</td><td>Avg Sentiment</td></tr>
</thead>
<tbody>
<tr>
<td>SmartThermo X3</td><td>Wi-Fi Setup</td><td>55%</td><td>2.1</td></tr>
<tr>
<td>AirPurifier Pro</td><td>Filter Replacement</td><td>35%</td><td>3.4</td></tr>
<tr>
<td>SecureCam 4K</td><td>Night Vision</td><td>60%</td><td>1.9</td></tr>
</tbody>
</table>
</div><p><strong>Key Findings</strong>:</p>
<ul>
<li><p><strong>Wi-Fi Setup</strong> for thermostats had the highest escalation rate (55%) and lowest sentiment.</p>
</li>
<li><p>Users asking about <strong>filter replacements</strong> often complained about <em>“unclear instructions.”</em></p>
</li>
</ul>
<h3 id="heading-phase-2-fine-tuning-a-specialized-support-model"><strong>Phase 2: Fine-Tuning a Specialized Support Model</strong></h3>
<p>Using the structured data, they built a tailored dataset to fix their chatbot’s weaknesses.</p>
<p><strong>Step 1: Create Training Data from Failed Interactions</strong></p>
<p>For high-escalation queries like <em>“Why won’t my thermostat connect to Wi-Fi?”</em>, they paired raw chats with ideal resolutions from human agents:</p>
<p><strong>Training Example</strong>:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"prompt"</span>: <span class="hljs-string">"User: My SmartThermo X3 won’t connect to Wi-Fi. I’ve rebooted it twice. Chatbot: Have you checked your router? User: Yes, other devices work. This is so frustrating!"</span>,
  <span class="hljs-attr">"completion"</span>: <span class="hljs-string">"Apologies for the trouble! The ‘network unavailable’ error often occurs due to a firmware mismatch. Go to Settings &gt; About &gt; Check Updates. If none appear, manually download the latest firmware [here]."</span>
}
</code></pre>
<p><strong>Step 2: Fine-Tune a Domain-Specific Model</strong></p>
<p>They fine-tuned GPT-3.5 on 2,000+ examples like the above, focusing on:</p>
<ul>
<li><p><strong>Troubleshooting clarity</strong>: Step-by-step fixes instead of generic advice.</p>
</li>
<li><p><strong>Proactive solutions</strong>: Linking to firmware updates or video guides.</p>
</li>
<li><p><strong>Tone adjustments</strong>: Acknowledging frustration (e.g., <em>“I understand this is urgent—let’s fix it quickly.”</em>).</p>
</li>
</ul>
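<p>For reference, prompt/completion pairs like the one above are typically converted to chat-format JSONL before fine-tuning a chat model. A minimal sketch — the system message and file name are assumptions, and shortened text stands in for the full transcripts:</p>

```python
import json

# Shortened stand-ins for the full prompt/completion pair shown above.
examples = [
    {
        "prompt": "User: My SmartThermo X3 won’t connect to Wi-Fi. I’ve rebooted it twice.",
        "completion": "The ‘network unavailable’ error often comes from a firmware "
                      "mismatch. Go to Settings > About > Check Updates.",
    }
]

def to_chat_record(example: dict) -> dict:
    """Wrap one pair in the messages format used for chat-model fine-tuning."""
    return {
        "messages": [
            {"role": "system", "content": "You are a device-support assistant."},
            {"role": "user", "content": example["prompt"]},
            {"role": "assistant", "content": example["completion"]},
        ]
    }

# One JSON object per line: each line is a self-contained training example.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(to_chat_record(example), ensure_ascii=False) + "\n")

lines = open("train.jsonl", encoding="utf-8").read().splitlines()
print(len(lines))  # 1
```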
<h3 id="heading-phase-3-results-amp-impact"><strong>Phase 3: Results &amp; Impact</strong></h3>
<p>After deploying the fine-tuned model:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Metric</td><td>Before Fine-Tuning</td><td>After Fine-Tuning</td></tr>
</thead>
<tbody>
<tr>
<td>Escalation Rate</td><td>40%</td><td>12%</td></tr>
<tr>
<td>Avg. Feedback Score</td><td>2.8</td><td>4.5</td></tr>
<tr>
<td>Avg. Resolution Time</td><td>8.2 mins</td><td>3.1 mins</td></tr>
</tbody>
</table>
</div><p><strong>Real Conversations Post-Fine-Tuning</strong>:</p>
<p><strong>User</strong>: <em>“Night vision on my SecureCam 4K stopped working after the update.”</em></p>
<p><strong>Old Bot</strong>: <em>“Check your settings. If issues persist, contact support.”</em></p>
<p><strong>Fine-Tuned Bot</strong>:</p>
<p>“This is a known issue with firmware v2.1. Let’s fix it now:</p>
<ol>
<li><p>Go to Settings &gt; Night Vision &gt; Toggle ‘Infrared Mode.’</p>
</li>
<li><p>If the issue persists, roll back to v2.0 [guide here].</p>
</li>
<li><p>Our team will notify you when v2.2 (fixing this bug) launches next week.”</p>
</li>
</ol>
<p><strong>User Feedback</strong>: <em>“Finally, a bot that actually solves problems!”</em> (Rated 5/5)</p>
<h3 id="heading-why-this-worked"><strong>Why This Worked</strong></h3>
<ol>
<li><p><strong>Closed-Loop Learning</strong>: Failed chats became training data.</p>
</li>
<li><p><strong>Preference Data Fusion</strong>: Used both direct ratings (“thumbs down”) and LLM-inferred frustration (“This is so frustrating!”).</p>
</li>
<li><p><strong>Business Alignment</strong>: The product team fixed recurring firmware bugs flagged by the chatbot’s analytics (e.g., night vision errors).</p>
</li>
</ol>
<h3 id="heading-your-playbook-for-success"><strong>Your Playbook for Success</strong></h3>
<ol>
<li><p><strong>Mine Raw Chats</strong>: Use an LLM to structure conversations (product, issue, sentiment).</p>
</li>
<li><p><strong>Build a “Failure Dataset”</strong>: Focus on escalated queries and low-rated responses.</p>
</li>
<li><p><strong>Fine-Tune Iteratively</strong>: Start with 500–1,000 high-impact examples.</p>
</li>
<li><p><strong>Measure Religiously</strong>: Track escalations, feedback, and resolution time.</p>
</li>
</ol>
<p>By treating user interactions as a feedback loop—not just a cost center—they turned their chatbot into a competitive asset. The result? Happier customers, fewer support tickets, and a model that keeps improving itself.</p>
]]></content:encoded></item><item><title><![CDATA[Working with MCP Servers on LangDB]]></title><description><![CDATA[In the evolving landscape of AI-powered applications, integrating multiple sources of data efficiently is key to building robust and intelligent systems. Model Context Protocol (MCP) provides a standardized framework that enables AI models to connect...]]></description><link>https://blog.langdb.ai/working-with-mcp-servers-on-langdb</link><guid isPermaLink="true">https://blog.langdb.ai/working-with-mcp-servers-on-langdb</guid><category><![CDATA[smithery]]></category><category><![CDATA[Model Context Protocol]]></category><category><![CDATA[#anthropic]]></category><category><![CDATA[openai]]></category><category><![CDATA[LangDB]]></category><dc:creator><![CDATA[Mrunmay Shelar]]></dc:creator><pubDate>Mon, 10 Feb 2025 04:40:14 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1738777996514/ef0ba587-5e8e-489c-bb11-9d162c86cb2f.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the evolving landscape of AI-powered applications, integrating multiple sources of data efficiently is key to building robust and intelligent systems. <strong>Model Context Protocol (MCP)</strong> provides a standardized framework that enables AI models to connect to various external services seamlessly while maintaining <strong>flexibility, security, and scalability</strong>.</p>
<p>LangDB makes it incredibly simple to work with MCP servers by enabling OpenAI-compatible API access to different LLMs and external search providers with minimal configuration.</p>
<h2 id="heading-what-is-mcp">What is MCP?</h2>
<p>MCP (Model Context Protocol) provides a framework for AI models to connect to multiple external services through a standardized protocol. With native tool integrations, MCP connects AI models to APIs, databases, local files, automation tools, and remote services. This allows developers to effortlessly integrate MCP with IDEs, business workflows, and cloud platforms, while retaining the flexibility to switch between LLM providers. This enables the creation of intelligent, multi-modal workflows where AI securely interacts with real-world data and tools.</p>
<h2 id="heading-where-is-mcp-used">Where is MCP Used?</h2>
<p>MCP is widely used across various fields to enhance AI functionality:</p>
<ul>
<li><p><strong>Data and Storage</strong>: Enables structured data management and retrieval.</p>
</li>
<li><p><strong>Cloud &amp; Infrastructure</strong>: Supports distributed computing and secure content delivery.</p>
</li>
<li><p><strong>Development Tools</strong>: Automates repository management and error tracking.</p>
</li>
<li><p><strong>Content and Search</strong>: Enhances search, indexing, and geolocation services.</p>
</li>
<li><p><strong>AI &amp; Memory</strong>: Powers vector search and machine learning applications.</p>
</li>
<li><p><strong>Productivity</strong>: Integrates AI into collaboration and document management tools.</p>
</li>
<li><p><strong>System &amp; Utilities</strong>: Improves system automation and workflow efficiency.</p>
</li>
</ul>
<p>LangDB simplifies MCP integration by sending requests to the appropriate MCP server based on the LLM's response, ensuring seamless retrieval of relevant data while enabling full tracing of interactions. This allows developers to monitor requests, analyze performance, and optimize responses efficiently.</p>
<h2 id="heading-using-mcp-with-langdb">Using MCP with LangDB</h2>
<p>By specifying an MCP server in the request, developers can ensure seamless interactions between AI models and external data sources.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os
<span class="hljs-keyword">from</span> openai <span class="hljs-keyword">import</span> OpenAI
client = OpenAI(
    api_key=os.getenv(<span class="hljs-string">"LANGDB_API_KEY"</span>),
    base_url=os.getenv(<span class="hljs-string">"LANGDB_API_URL"</span>)
)
extra_body = {
    <span class="hljs-string">"mcp_servers"</span>: [
        {
            <span class="hljs-string">"server_url"</span>: <span class="hljs-string">"wss://your-mcp-server.com/ws?config=your_encoded_config"</span>,
            <span class="hljs-string">"type"</span>: <span class="hljs-string">"ws"</span>
        }
    ]
}
response = client.chat.completions.create(
    model=<span class="hljs-string">"gpt-4o-mini"</span>,
    messages=[{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"what is langdb?"</span>}],
    extra_body=extra_body
)
</code></pre>
<h3 id="heading-1-exa-search-integration">1. EXA Search Integration</h3>
<p>EXA Search is a powerful search engine that provides structured and contextualized search results. Using LangDB, we can integrate EXA Search with minimal configuration.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738777903777/60e9831a-5d4c-4112-9c4b-77cfe0f68cca.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-2-github">2. GitHub</h3>
<p>GitHub MCP Server enables seamless access to the GitHub API, allowing developers to perform file operations, repository management, search functionality, and more.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738777907924/1398cefc-bc3d-44a6-937b-2ed4ab7c6299.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-mcp-tracing-and-observability-in-langdb">MCP Tracing and Observability in LangDB</h2>
<p>One of the key benefits of using LangDB for MCP integration is full tracing and observability. With a simple change, developers can track requests, optimize response times, and analyze usage across different providers.</p>
<p>LangDB automatically manages these integrations, ensuring seamless transitions between different MCP servers without additional infrastructure overhead.</p>
<h2 id="heading-using-smithery-for-mcp-deployments"><strong>Using Smithery for MCP Deployments</strong></h2>
<p>We leveraged <strong>Smithery</strong> to streamline MCP deployments. To learn more:</p>
<ul>
<li><p>Visit the <a target="_blank" href="https://github.com/langdb/langdb-samples/tree/main/examples/mcp"><strong>LangDB Samples</strong></a> Repository for setup instructions and examples.</p>
</li>
<li><p>Check out <a target="_blank" href="https://docs.langdb.ai/integrations/smithery">Smithery</a> in the documentation.</p>
</li>
</ul>
<p>For more details, visit the <a target="_blank" href="https://modelcontextprotocol.io/introduction"><strong>Model Context Protocol official page</strong></a> and explore <a target="_blank" href="https://docs.anthropic.com/en/docs/build-with-claude/mcp"><strong>Anthropic MCP documentation</strong></a>.</p>
]]></content:encoded></item><item><title><![CDATA[Building a Specialized RAG-Based AI Agent with LangDB and LlamaIndex]]></title><description><![CDATA[Introduction
In today's GenAI-driven landscape, enterprises increasingly integrate multiple AI agents to optimize decision-making, automate workflows, and enhance productivity. A key challenge is managing diverse AI models and libraries in a seamless...]]></description><link>https://blog.langdb.ai/building-a-specialized-rag-based-ai-agent-with-langdbai-and-llamaindex</link><guid isPermaLink="true">https://blog.langdb.ai/building-a-specialized-rag-based-ai-agent-with-langdbai-and-llamaindex</guid><category><![CDATA[LangDB]]></category><category><![CDATA[LlamaIndex]]></category><category><![CDATA[RAG ]]></category><category><![CDATA[AI Workflow]]></category><category><![CDATA[openai]]></category><category><![CDATA[ai gateway]]></category><category><![CDATA[VectorSearch]]></category><category><![CDATA[llm]]></category><dc:creator><![CDATA[Dishant Gandhi]]></dc:creator><pubDate>Tue, 04 Feb 2025 05:13:13 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1738327708102/917d2ff1-3a47-4113-87c4-012c37b2abcb.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction"><strong>Introduction</strong></h2>
<p>In today's GenAI-driven landscape, enterprises increasingly integrate multiple AI agents to optimize decision-making, automate workflows, and enhance productivity. A key challenge is managing diverse AI models and libraries in a seamless, scalable way.</p>
<p>LangDB AI Gateway simplifies this by offering an intelligent <strong>AI Gateway</strong> that integrates effortlessly with popular AI frameworks.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Learn more about AI Gateway and LangDB in our <a target="_self" href="https://blog.langdb.ai/introduction-to-ai-gateway">blog</a>.</div>
</div>

<p>In this guide, we'll explore how to leverage <strong>LangDB with LlamaIndex</strong> to build a powerful, enterprise-ready <strong>AI solution</strong> that can store and retrieve knowledge efficiently.</p>
<p>By the end of this blog, you will have a fully functional Python application that queries a <strong>Large Language Model (LLM)</strong> and retrieves structured responses, paving the way for advanced <strong>enterprise AI</strong> workflows.</p>
<h2 id="heading-table-of-contents"><strong>Table of Contents:</strong></h2>
<ul>
<li><p>Pre-requisites</p>
</li>
<li><p>Installation</p>
</li>
<li><p>Building a LlamaIndex-based Knowledge Store</p>
</li>
<li><p>Configuring LangDB for Scalable AI Workflows</p>
</li>
<li><p>Do More with LangDB</p>
</li>
</ul>
<p>Feel free to jump to any section as needed. Let's dive in!</p>
<h2 id="heading-pre-requisites"><strong>Pre-requisites</strong></h2>
<p>To follow along, ensure you have:</p>
<ul>
<li><p>A <strong>LangDB account</strong> (<a target="_blank" href="https://app.langdb.ai/login">Sign up here</a>)</p>
</li>
<li><p>A <strong>LangDB API Key</strong></p>
</li>
<li><p>Basic understanding of <strong>LlamaIndex</strong></p>
</li>
</ul>
<h2 id="heading-installation"><strong>Installation</strong></h2>
<p>First, install the necessary dependencies:</p>
<pre><code class="lang-bash">pip install llama-index
pip install openai
</code></pre>
<h2 id="heading-building-a-llamaindex-based-knowledge-store"><strong>Building a LlamaIndex-based Knowledge Store</strong></h2>
<h3 id="heading-step-1-set-up-your-data-storage"><strong>Step 1: Set Up Your Data Storage</strong></h3>
<p>Create a data directory in your root folder and add relevant documents.</p>
<h3 id="heading-step-2-import-required-libraries"><strong>Step 2: Import Required Libraries</strong></h3>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> llama_index.core <span class="hljs-keyword">import</span> VectorStoreIndex, SimpleDirectoryReader
<span class="hljs-keyword">from</span> llama_index.core <span class="hljs-keyword">import</span> Settings
<span class="hljs-keyword">from</span> llama_index.llms.openai <span class="hljs-keyword">import</span> OpenAI
<span class="hljs-keyword">import</span> os
</code></pre>
<p><strong>Step 3: Load and Index Documents</strong></p>
<pre><code class="lang-python">documents = SimpleDirectoryReader(<span class="hljs-string">"data"</span>).load_data()
index = VectorStoreIndex.from_documents(documents)
index.storage_context.persist(<span class="hljs-string">'storage'</span>)
query_engine = index.as_query_engine()
response = query_engine.query(<span class="hljs-string">"What are the features of LangDB?"</span>)
</code></pre>
<p><strong>Explanation:</strong></p>
<ul>
<li><p>The <strong>SimpleDirectoryReader</strong> loads all files from the data directory.</p>
</li>
<li><p>Documents are transformed into vector embeddings and stored persistently.</p>
</li>
<li><p><strong>Query Engine</strong> enables us to retrieve relevant documents using an <strong>LLM-powered AI gateway</strong>.</p>
</li>
</ul>
<p>Now, let’s supercharge this setup with <strong>LangDB.ai, one of the best artificial intelligence software solutions</strong>!</p>
<h2 id="heading-configuring-langdb-for-scalable-ai-workflows"><strong>Configuring LangDB for Scalable AI Workflows</strong></h2>
<p>LangDB acts as a <strong>high-performance AI governance</strong> system, ensuring that our AI agents have structured and persistent memory.</p>
<h3 id="heading-step-1-create-a-langdb-project"><strong>Step 1: Create a LangDB Project</strong></h3>
<ol>
<li>Log in to LangDB and <strong>create a new project</strong>.</li>
</ol>
<p><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXewrfOQVOjLph-2DLe0zxI3ovlixHmDdi3tXPXbSuY1EHtEgMUzDxKrUOh5ijhkmoLmAwsWUwlqGdX3GwXJZSc6RqTwwxrxzvz4eQ9ZxAD-QSfMU_O9rmDkn7zFZrRRNr4a0Irt?key=3JTjkKczjoU2OTwiC9VX_1af" alt /></p>
<ol start="2">
<li>Navigate to <strong>Manage API Keys</strong> and generate a key.</li>
</ol>
<p><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXeR05TvGaNPEQVAqrQLoTrLdEp92OzNbz_QOv1EWLiqvT5IK3CGDJxW3L0MjfGzALWRv2j60dGXt040QFG2Imql9ALUGz30hOGaBKynW4IaOR9h2A4qOFGDATTDkMKI2-c83H_y?key=3JTjkKczjoU2OTwiC9VX_1af" alt /></p>
<h3 id="heading-step-2-load-environment-variables-in-python"><strong>Step 2: Load Environment Variables in Python</strong></h3>
<pre><code class="lang-python">os.environ[<span class="hljs-string">"OPENAI_API_KEY"</span>] = <span class="hljs-string">"your-langdb-api-key"</span>
os.environ[<span class="hljs-string">"OPENAI_API_BASE"</span>] = <span class="hljs-string">"https://api.us-east-1.langdb.ai"</span>
</code></pre>
<h3 id="heading-step-3-configure-langdb-with-llamaindex"><strong>Step 3: Configure LangDB with LlamaIndex</strong></h3>
<pre><code class="lang-python">Settings.llm = OpenAI(
   base_url=os.getenv(<span class="hljs-string">"OPENAI_API_BASE"</span>),  <span class="hljs-comment"># fetching variables declared on step 2</span>
   api_key=os.getenv(<span class="hljs-string">"OPENAI_API_KEY"</span>),  <span class="hljs-comment"># fetching variables declared on step 2</span>
   model=<span class="hljs-string">"gpt-4o-mini"</span>  <span class="hljs-comment"># Your preferred model</span>
)
</code></pre>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">A newer version of LlamaIndex uses Settings to add custom OpenAI model configurations.</div>
</div>

<p>Our connection and configuration are complete.</p>
<ul>
<li>Let's have a look at the whole code</li>
</ul>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os
<span class="hljs-keyword">from</span> llama_index.core <span class="hljs-keyword">import</span> VectorStoreIndex, SimpleDirectoryReader
<span class="hljs-keyword">from</span> llama_index.core <span class="hljs-keyword">import</span> Settings
<span class="hljs-keyword">from</span> llama_index.llms.openai <span class="hljs-keyword">import</span> OpenAI

langdb_api_base = os.getenv(<span class="hljs-string">"LANGDB_URL"</span>, <span class="hljs-string">"https://api.us-east-1.langdb.ai"</span>) <span class="hljs-comment">### LangDB API base URL</span>
LANGDB_API_KEY = os.getenv(<span class="hljs-string">"LANGDB_API_KEY"</span>)

Settings.llm = OpenAI(
   base_url=langdb_api_base,
   api_key=LANGDB_API_KEY,
   model=<span class="hljs-string">"gpt-4o-mini"</span>
)

documents = SimpleDirectoryReader(<span class="hljs-string">"data"</span>).load_data()
index = VectorStoreIndex.from_documents(documents)
index.storage_context.persist(<span class="hljs-string">'storage'</span>)
query_engine = index.as_query_engine()
response = query_engine.query(<span class="hljs-string">"what are features of langdb?"</span>)
print(response)
</code></pre>
<h3 id="heading-output">Output:</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738234938448/d30ec317-106e-40ab-8c91-a73d4bd6d6e7.gif" alt class="image--center mx-auto" /></p>
<h2 id="heading-for-multiple-projects">For multiple projects</h2>
<p>Use the project ID in your <code>langdb_api_base</code> URL if you have multiple projects in your LangDB account.<br />Below is an example:</p>
<pre><code class="lang-python">langdb_api_base = <span class="hljs-string">"https://api.us-east-1.langdb.ai/your-project-id/v1"</span>
</code></pre>
<h2 id="heading-do-more-with-langdb"><strong>Do more with LangDB</strong></h2>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">🔗 LangDB AI Gateway is Open Source! Check out the repo &amp; contribute: <a target="_self" href="https://github.com/langdb/ai-gateway">click here</a></div>
</div>

<h3 id="heading-checkout-our-youtube-video">Check out our YouTube video</h3>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://youtu.be/YIhqgcPKJFw?si=fPZQC8wJk9n1Cd55">https://youtu.be/YIhqgcPKJFw?si=fPZQC8wJk9n1Cd55</a></div>
<p> </p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">💬 Join the conversation in our <a target="_self" href="https://langdbcommunity.slack.com">Slack community!</a></div>
</div>

<p>Now that we have a <strong>structured AI-powered Specialized RAG-Based AI Agent</strong>, here’s what we will bring next:</p>
<ul>
<li><p><strong>Build a multi-agent system</strong> with LlamaIndex + LangDB Dynamic Routing</p>
</li>
<li><p>Integrate LangChain <strong>and build a chat application with</strong> LlamaIndex and LangDB</p>
</li>
</ul>
<p>You can also read our next guide: <a target="_blank" href="https://blog.langdb.ai/how-to-integrate-langchain-with-langdb"><strong>How to use LangChain with LangDB</strong></a></p>
]]></content:encoded></item><item><title><![CDATA[Tracing Multiple Agents with LangDB]]></title><description><![CDATA[When working with AI-powered workflows, efficiency, scalability, and cost control are. In many scenarios, a single-agent architecture often fails to meet all operational requirements, leading developers to adopt multi-agent workflows.
LangDB simplifi...]]></description><link>https://blog.langdb.ai/tracing-multiple-agents-with-langdb</link><guid isPermaLink="true">https://blog.langdb.ai/tracing-multiple-agents-with-langdb</guid><category><![CDATA[AI]]></category><category><![CDATA[Multi-Agent Systems (MAS)]]></category><category><![CDATA[openai]]></category><category><![CDATA[langchain]]></category><category><![CDATA[langgraph]]></category><category><![CDATA[llm]]></category><category><![CDATA[gemini]]></category><dc:creator><![CDATA[Mrunmay Shelar]]></dc:creator><pubDate>Wed, 29 Jan 2025 04:13:32 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1738047399116/a4a5e995-ecdd-4bf4-8a8a-22561eded93e.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When working with AI-powered workflows, efficiency, scalability, and cost control are. In many scenarios, a single-agent architecture often fails to meet all operational requirements, leading developers to adopt multi-agent workflows.</p>
<p>LangDB simplifies this by offering seamless tracing, analytics, and optimization tools for multi-agent workflows.</p>
<p><strong>TL;DR:</strong> Combining Gemini and GPT-4o offers the lowest cost, though it’s slightly slower than using Gemini alone.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Tag</strong></td><td><strong>Total Cost</strong></td><td><strong>Total Requests</strong></td><td><strong>Avg Duration (ms)</strong></td><td><strong>Total Duration (ms)</strong></td><td><strong>Total Input Tokens</strong></td><td><strong>Total Output Tokens</strong></td><td><strong>Avg TTFT (ms)</strong></td><td><strong>TPS</strong></td><td><strong>TPOT</strong></td></tr>
</thead>
<tbody>
<tr>
<td>[gemini]</td><td>0.037731</td><td>4</td><td>2390.0</td><td>9559.802</td><td>11197</td><td>345</td><td>2390.0</td><td>1207.35</td><td>0.03</td></tr>
<tr>
<td>[gpt-4o]</td><td>0.057021</td><td>4</td><td>4574.9</td><td>18299.686</td><td>17235</td><td>443</td><td>4574.9</td><td>966.03</td><td>0.04</td></tr>
<tr>
<td>[combined]</td><td>0.034887</td><td>4</td><td>2765.5</td><td>11061.997</td><td>10009</td><td>405</td><td>2765.5</td><td>941.42</td><td>0.03</td></tr>
</tbody>
</table>
</div><p>Let’s break down a typical <strong>multi-agent workflow</strong>:</p>
<h4 id="heading-example-generating-a-chart-for-a-given-question">Example: Generating a Chart for a Given Question</h4>
<ul>
<li><p><strong>User Request</strong>: <em>"Generate a chart of average temperature in Alaska over the past decade."</em></p>
</li>
<li><p><strong>Agent Roles</strong>:</p>
<ol>
<li><p><strong>Researcher Agent</strong>: Gathers raw temperature data by calling a search tool.</p>
</li>
<li><p><strong>Chart Generator Agent</strong>: Processes the data and creates a visualization using a code execution tool.</p>
</li>
</ol>
</li>
</ul>
<p>Here’s how this workflow is structured across multiple agents.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1737990703446/10e71bbb-859a-4fe5-a3b6-94462d54f7cf.png" alt class="image--center mx-auto" /></p>
<p>To evaluate the efficiency of multi-agent workflows, we ran three different setups in LangDB. These scenarios illustrate how using the same model across agents compares to combining models.</p>
<h3 id="heading-scenarios-we-tested">Scenarios We Tested</h3>
<p>To evaluate the efficiency of multi-agent workflows, we tested three setups. In the first scenario, both the Researcher Agent and Chart Generator Agent used Gemini, focusing on speed and cost. In the second scenario, both agents used GPT-4o, prioritizing accuracy but incurring higher costs. Finally, in the third scenario, we combined models: the Researcher Agent used Gemini for data gathering, while the Chart Generator Agent used GPT-4o for visualization. This allowed us to balance speed, cost, and accuracy.</p>
<p><img src="https://www.gitbook.com/cdn-cgi/image/dpr=2,width=760,onerror=redirect,format=auto/https%3A%2F%2Ffiles.gitbook.com%2Fv0%2Fb%2Fgitbook-x-prod.appspot.com%2Fo%2Fspaces%252FjwXvc8GvLP6kUvOASaLP%252Fuploads%252FyzCMqXVLcpDJLYXPMzRj%252Fimage.png%3Falt%3Dmedia%26token%3D769b7d97-e4c9-418d-a9de-03291c5c7369" alt /></p>
<h3 id="heading-tracing-and-analytics-with-langdb">Tracing and Analytics with LangDB</h3>
<p>After running these scenarios, we used LangDB's tracing capabilities to monitor task routing, response times, and token usage. The tracing interface allowed us to see exactly how tasks were distributed between agents and measure the efficiency of each workflow.</p>
<p>Using LangDB's <strong>analytics</strong>, we evaluated the performance data from these runs to generate the table presented earlier. This included:</p>
<ul>
<li><p><strong>Cost Analysis</strong>: Tracking how much each agent contributed to the overall expense.</p>
</li>
<li><p><strong>Time Analysis</strong>: Measuring the average response time across different workflows.</p>
</li>
<li><p><strong>Number of Requests</strong>: Recording how many requests each run processed.</p>
</li>
<li><p><strong>Average Time to First Token (TTFT)</strong>: Capturing the time taken for the first token to appear.</p>
</li>
<li><p><strong>Tokens Per Second (TPS)</strong>: Analyzing the efficiency of token generation per second.</p>
</li>
<li><p><strong>Time Per Output Token (TPOT)</strong>: Evaluating the time taken per output token to assess response efficiency.</p>
</li>
</ul>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Tag</strong></td><td><strong>Total Cost</strong></td><td><strong>Total Requests</strong></td><td><strong>Avg Duration (ms)</strong></td><td><strong>Total Duration (ms)</strong></td><td><strong>Total Input Tokens</strong></td><td><strong>Total Output Tokens</strong></td><td><strong>Avg TTFT (ms)</strong></td><td><strong>TPS</strong></td><td><strong>TPOT</strong></td></tr>
</thead>
<tbody>
<tr>
<td>[gemini]</td><td>0.037731</td><td>4</td><td>2390.0</td><td>9559.802</td><td>11197</td><td>345</td><td>2390.0</td><td>1207.35</td><td>0.03</td></tr>
<tr>
<td>[gpt-4o]</td><td>0.057021</td><td>4</td><td>4574.9</td><td>18299.686</td><td>17235</td><td>443</td><td>4574.9</td><td>966.03</td><td>0.04</td></tr>
<tr>
<td>[combined]</td><td>0.034887</td><td>4</td><td>2765.5</td><td>11061.997</td><td>10009</td><td>405</td><td>2765.5</td><td>941.42</td><td>0.03</td></tr>
</tbody>
</table>
</div><h3 id="heading-key-takeaways-from-the-scenarios">Key Takeaways from the Scenarios</h3>
<h4 id="heading-1-gemini-only-workflow"><strong>1. Gemini-Only Workflow</strong></h4>
<ul>
<li><p><strong>Cost</strong>: Slightly higher than combined but cheaper than GPT-4o.</p>
</li>
<li><p><strong>Speed</strong>: Fastest average duration (2390 ms) and highest TPS (1207.35).</p>
</li>
</ul>
<h4 id="heading-2-gpt-4o-only-workflow"><strong>2. GPT-4o-Only Workflow</strong></h4>
<ul>
<li><p><strong>Cost</strong>: Highest overall.</p>
</li>
<li><p><strong>Speed</strong>: Slowest duration (4574.9 ms) and lower TPS (966.03).</p>
</li>
</ul>
<h4 id="heading-3-combined-workflow"><strong>3. Combined Workflow</strong></h4>
<ul>
<li><p><strong>Cost</strong>: Lowest of all workflows.</p>
</li>
<li><p><strong>Speed</strong>: Moderate duration (2765.5 ms) and TPS (941.42).</p>
</li>
</ul>
<hr />
<h3 id="heading-why-tracing-matters-for-multi-agent-workflows">Why Tracing Matters for Multi-Agent Workflows</h3>
<p>In a multi-agent workflow, different models handle specialized tasks. Without proper observability, <strong>you don't know how efficiently each model is performing</strong>. This lack of visibility can lead to:</p>
<ul>
<li><p><strong>Escalating Costs</strong>: Premium models being overused for trivial tasks.</p>
</li>
<li><p><strong>Performance Bottlenecks</strong>: Delays caused by slow or inefficient task execution.</p>
</li>
<li><p><strong>Operational Inefficiency</strong>: No clear insights on how to optimize workflows.</p>
</li>
</ul>
<h3 id="heading-why-langdb-is-essential-for-multi-agent-workflows">Why LangDB is Essential for Multi-Agent Workflows</h3>
<p>With LangDB, you can:</p>
<ol>
<li><p><strong>Monitor Key Metrics</strong>: Track costs, latency, number of requests, and token usage for each workflow.</p>
</li>
<li><p><strong>Analyze Performance Trends</strong>: Understand response times, time to first token, and processing speeds across different models.</p>
</li>
<li><p><strong>Gain Deeper Insights</strong>: Use detailed analytics to identify inefficiencies and optimize workflows accordingly.</p>
</li>
</ol>
<hr />
<h3 id="heading-next-steps">Next Steps</h3>
<p>To explore multi-agent workflows with LangDB:</p>
<ol>
<li><p>Visit the <a target="_blank" href="https://github.com/langdb/langdb-samples">LangDB Samples</a> Repository for setup instructions and examples.</p>
</li>
<li><p>Try out <a target="_blank" href="https://app.langdb.ai/">LangDB</a> to monitor and optimize your workflows in real time.</p>
</li>
<li><p>Experiment with combining different models to achieve the best results for your unique use case.</p>
</li>
</ol>
<p>Explore LangDB’s capabilities by visiting the <a target="_blank" href="https://github.com/langdb/langdb-samples">LangDB Samples Repository</a> or <a target="_blank" href="https://app.langdb.ai/">trying the platform</a> for multi-agent tracing and analytics.</p>
]]></content:encoded></item><item><title><![CDATA[Choosing the Right LLM for the Job with LangDB’s Dynamic Routing]]></title><description><![CDATA[In 2025, LLMs range from powerful reasoning models to lightweight, cost-efficient options. Choosing the best fit within a complex workflow can be challenging, especially when you want to balance quality and cost.
TL...]]></description><link>https://blog.langdb.ai/choosing-the-right-llm-for-the-job-with-langdb</link><guid isPermaLink="true">https://blog.langdb.ai/choosing-the-right-llm-for-the-job-with-langdb</guid><category><![CDATA[AI]]></category><category><![CDATA[openai]]></category><category><![CDATA[Google]]></category><category><![CDATA[llm]]></category><category><![CDATA[routing]]></category><category><![CDATA[cost-optimisation]]></category><category><![CDATA[agentic AI]]></category><category><![CDATA[workflow]]></category><dc:creator><![CDATA[Mrunmay Shelar]]></dc:creator><pubDate>Wed, 22 Jan 2025 06:51:33 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1737527255786/66f3b0b8-615b-4b1e-8368-b4870ff3fe85.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We got different kinds of LLMs in 2025 ranging from powerful reasoning models to lightweight, cost-efficient option. It can be challenging to one best fit in an complex workflow, especially when you want to have a balance between quality and cost.</p>
<p><strong>TLDR</strong>; Using dynamic routing saves costs upto 55% in a Multi-agent RAG workflow</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Metrics</td><td>Cost for 10 Queries</td><td>Cost for 20 Queries</td></tr>
</thead>
<tbody>
<tr>
<td><strong>GPT-4o (No Routing)</strong></td><td>$0.2085</td><td>$0.6143</td></tr>
<tr>
<td><strong>GPT-4o + Gemini-1.5-flash-8b (with Routing)</strong></td><td>$0.1517</td><td>$0.2738</td></tr>
<tr>
<td><strong>Cost Difference</strong></td><td><strong>27.25%</strong></td><td><strong>55.42%</strong></td></tr>
</tbody>
</table>
</div><h2 id="heading-the-challenge-of-balancing-cost-and-performance">The Challenge of Balancing Cost and Performance</h2>
<p>In a multi-agent workflow, tasks can vary significantly in complexity:</p>
<ul>
<li><p><strong>Complex questions</strong>: Require deep reasoning and nuanced outputs, best suited for high-powered models like <code>gpt-4o</code>.</p>
</li>
<li><p><strong>Simpler questions</strong>: Involve basic text generation or classification, which lightweight models like <code>gemini-1.5-flash-8b</code> can handle efficiently.</p>
</li>
</ul>
<p>The challenge is ensuring that complex tasks get the power they need while avoiding unnecessary costs for simpler tasks. Over-reliance on a single model can either inflate costs or compromise quality.</p>
<h2 id="heading-how-langdbs-dynamic-routing-helps">How LangDB’s Dynamic Routing Helps</h2>
<p>Instead of manually deciding which model to use for each query, LangDB lets you define a routing strategy that automates these decisions. For example, you can prioritize:</p>
<ul>
<li><p><strong>Cost-efficiency</strong>: Using lighter models whenever possible to reduce expenses.</p>
</li>
<li><p><strong>Quality-first workflows</strong>: Using high-powered models for critical or complex tasks.</p>
</li>
<li><p><strong>Balanced approaches</strong>: Combining cost and quality priorities by dynamically assessing task requirements.</p>
</li>
</ul>
<h3 id="heading-example">Example:</h3>
<p>Here’s an example of a simple cost-optimization strategy using LangDB:</p>
<pre><code class="lang-python">routing_body = {
    <span class="hljs-string">"extra"</span>: {
        <span class="hljs-string">"strategy"</span>: {
            <span class="hljs-string">"type"</span>: <span class="hljs-string">"cost"</span>,
            <span class="hljs-string">"willingness_to_pay"</span>: <span class="hljs-number">0.5</span>
        },
        <span class="hljs-string">"models"</span>: [<span class="hljs-string">"gpt-4o"</span>, <span class="hljs-string">"gemini-1.5-flash-8b"</span>],
    },
}
</code></pre>
<p><strong>Key Points</strong>:</p>
<ul>
<li><p>The <code>"type": "cost"</code> strategy ensures that LangDB evaluates each query for its complexity and routes it to the least expensive model that can handle it.</p>
</li>
<li><p>The <code>"willingness_to_pay"</code> parameter allows you to define your tolerance for higher costs. For instance, <code>0.5</code> means the system will only select the more expensive <code>gpt-4o</code> model if the complexity justifies the additional cost.</p>
</li>
</ul>
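<p>To see how this strategy plugs into an actual request, here is a minimal sketch that merges the routing configuration into an OpenAI-style chat payload. The helper name <code>routed_chat_payload</code> is hypothetical, and the exact shape LangDB expects for the <code>extra</code> block should be confirmed against LangDB's documentation:</p>

```python
import json

def routed_chat_payload(messages, models, willingness_to_pay=0.5):
    """Build an OpenAI-style chat payload carrying LangDB's
    cost-routing strategy (hypothetical helper, not a LangDB API)."""
    return {
        "messages": messages,
        "extra": {
            "strategy": {
                "type": "cost",
                "willingness_to_pay": willingness_to_pay,
            },
            "models": models,
        },
    }

payload = routed_chat_payload(
    [{"role": "user", "content": "Classify the sentiment of: 'Great product!'"}],
    ["gpt-4o", "gemini-1.5-flash-8b"],
)
print(json.dumps(payload, indent=2))
```

<p>With an OpenAI-compatible client pointed at the gateway, the <code>extra</code> block would typically be passed through the client's <code>extra_body</code> parameter; the endpoint URL and model naming are specific to your LangDB project.</p>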
<h2 id="heading-experiment-comparing-direct-and-routed-workflows">Experiment: Comparing Direct and Routed Workflows</h2>
<p>To demonstrate the impact of LangDB’s Dynamic Routing, we conducted two tests using a Retrieval-Augmented Generation (RAG) workflow:</p>
<ol>
<li><p><strong>Direct Workflow</strong>: All queries were routed to the high-powered <code>gpt-4o</code> model.</p>
</li>
<li><p><strong>Routed Workflow</strong>: Queries were dynamically routed between <code>gpt-4o</code> and <code>gemini-1.5-flash-8b</code> based on a cost-optimization strategy.</p>
</li>
</ol>
<p>The tests included two sets of queries:</p>
<ul>
<li><p><strong>10 questions</strong> (Test 1)</p>
</li>
<li><p><strong>20 questions</strong> (Test 2)</p>
</li>
</ul>
<h2 id="heading-results-cost-comparison">Results: Cost Comparison</h2>
<h3 id="heading-run-1-10-questions">Run 1: 10 Questions</h3>
<p>For the first test, we compared the total costs of processing 10 questions using the direct and routed workflows.</p>
<p><strong>Insights</strong>:</p>
<ul>
<li><p>The <strong>Direct Workflow</strong> consistently used the high-powered <code>gpt-4o</code> model for all queries, resulting in higher overall costs.</p>
</li>
<li><p>The <strong>Routed Workflow</strong> leveraged <code>gemini-1.5-flash-8b</code> for simpler queries, reducing costs significantly while still using <code>gpt-4o</code> for complex tasks.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1737524826874/13e99244-8def-4374-959f-a095405d6931.png" alt class="image--center mx-auto" /></p>
<hr />
<h3 id="heading-run-2-20-questions">Run 2: 20 Questions</h3>
<p>In the second test with 20 questions, the cost differences became even more pronounced.</p>
<p><strong>Insights</strong>:</p>
<ul>
<li><p>The <strong>Direct Workflow</strong> incurred a linear cost increase, as every query was routed to the expensive <code>gpt-4o</code> model.</p>
</li>
<li><p>The <strong>Routed Workflow</strong> scaled far more efficiently by handling a larger proportion of queries with the cost-effective <code>gemini-1.5-flash-8b</code> model.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1737524855502/225416c5-ce8d-4825-b897-78a80e380f78.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-quality-analysis">Quality Analysis</h3>
<p>To ensure that cost savings did not compromise the quality of the responses, we evaluated the outputs using an LLM-based scoring scheme. Each answer was rated on a scale of 1-10 using the following criteria:</p>
<ul>
<li><p><strong>Accuracy and correctness</strong> (4 points)</p>
</li>
<li><p><strong>Completeness and depth</strong> (3 points)</p>
</li>
<li><p><strong>Clarity and coherence</strong> (3 points)</p>
</li>
</ul>
<p><strong>Results</strong>:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Test</strong></td><td><strong>Mean Score (Direct Workflow)</strong></td><td><strong>Mean Score (Routed Workflow)</strong></td><td><strong>Cases Routed Scored Higher</strong></td><td><strong>Cases No Routing Scored Higher</strong></td><td><strong>Cases with Equal Scores</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>Test 1 (10 questions)</strong></td><td>9.0</td><td>8.5</td><td>0</td><td>4</td><td>6</td></tr>
<tr>
<td><strong>Test 2 (20 questions)</strong></td><td>8.65</td><td>8.9</td><td>8</td><td>3</td><td>9</td></tr>
</tbody>
</table>
</div><p>These results demonstrate that LangDB’s routing strategy did not compromise the overall quality of responses, even as it significantly reduced costs.</p>
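<p>The per-question comparison in the table above reduces to a few lines of bookkeeping, given two parallel lists of scores. The per-question scores below are made up — chosen only so the summary reproduces the Test 1 row — since the actual experiment data is not published here:</p>

```python
def compare_scores(direct, routed):
    """Summarize two parallel lists of 1-10 answer-quality scores."""
    assert len(direct) == len(routed)
    return {
        "mean_direct": sum(direct) / len(direct),
        "mean_routed": sum(routed) / len(routed),
        "routed_higher": sum(r > d for d, r in zip(direct, routed)),
        "direct_higher": sum(d > r for d, r in zip(direct, routed)),
        "equal": sum(d == r for d, r in zip(direct, routed)),
    }

# Illustrative scores, constructed to match the Test 1 summary
summary = compare_scores(
    direct=[9, 9, 8, 10, 9, 9, 8, 9, 10, 9],
    routed=[7, 8, 8, 10, 8, 9, 8, 9, 9, 9],
)
print(summary)
```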
<hr />
<p>Whether you’re managing small-scale projects or scaling to thousands of queries, LangDB’s Dynamic Routing ensures the best balance between cost and quality. It allows you to optimize workflows, achieve scalability, and focus resources where they’re needed most.</p>
<p>The experiment clearly shows that automating model selection can not only reduce costs but also maintain high standards of performance, making LangDB’s Dynamic Routing an invaluable tool for any organization leveraging LLMs.</p>
<p>The detailed experiment notebook will be published soon, offering deeper insights and allowing you to replicate our results.</p>
<p>Start optimizing your workflows today with <a target="_blank" href="https://app.langdb.ai/">LangDB AI</a> and make smarter decisions with every query!</p>
]]></content:encoded></item></channel></rss>