How to Combine LangChain, Google Gemini, and FastAPI into a Scalable, API-Driven Application
The world of financial data is a sea of numbers, charts, and jargon. For most people, extracting a simple insight — like “which tech stocks are most popular right now?” — can be a daunting task. What if you could just ask?
That’s the premise behind a project I recently developed: a Stock Market AI Assistant that allows users to ask complex questions in plain English and receive clear, formatted answers. This isn’t just another chatbot; it’s a robust, scalable application built on a modern, API-driven architecture.
In this article, I’ll take you on an architectural tour of how this system was built. We’ll explore why separating the AI logic from the data layer is crucial and how technologies like LangChain, Google’s Gemini, FastAPI, and Gradio come together to create a powerful and maintainable solution.
The Vision: From Raw Data to Conversational Insights
The goal was simple: create an intuitive interface where a user can ask questions like:
- “What are the top 10 most traded stocks?”
- “Tell me about Apple’s stock price (AAPL).”
- “Show me technology companies with a dividend yield over 3%.”
The application should understand the user’s intent, fetch the relevant data from a database, and present it in a clean, human-readable format.
The Architectural Blueprint: A Three-Tiered Approach
To build a system that is both powerful and maintainable, I followed a core software engineering principle: Separation of Concerns. Instead of a monolithic script, the application is broken down into three distinct, independent layers that communicate via a well-defined interface.
At a high level, the flow looks like this: a Gradio UI sends the user’s question to a LangChain agent, which calls a FastAPI backend over HTTP; the backend queries PostgreSQL and returns JSON.
Let’s break down each component.
Tier 1: The User Interface (Gradio)
The frontend is a simple web interface built with Gradio. I chose Gradio for its ability to create clean, interactive demos for machine learning models with minimal code. It provides a textbox for user input and a space to display the formatted output from our AI agent.
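To give a sense of how little code this takes, here’s a minimal Gradio sketch. The handler is a stub standing in for the call to the agent; the real app wires it to the LangChain agent described below.

```python
import gradio as gr

def ask_assistant(question: str) -> str:
    # Stub: the real app forwards the question to the LangChain agent
    # and returns its formatted answer.
    return f"(agent answer for: {question})"

demo = gr.Interface(
    fn=ask_assistant,
    inputs=gr.Textbox(label="Ask about the stock market", lines=2),
    outputs=gr.Textbox(label="Answer"),
    title="Stock Market AI Assistant",
)

if __name__ == "__main__":
    demo.launch()
```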
Tier 2: The Brains (LangChain ReAct Agent with Gemini)
This is where the magic happens. The core logic is handled by a LangChain ReAct Agent. “ReAct” stands for “Reasoning and Acting.”
- Reasoning: The agent uses a powerful Large Language Model (LLM) — in this case, Google’s Gemini 1.5 Flash — to understand the user’s question and reason about the steps needed to answer it.
- Acting: The agent doesn’t access the database directly. Instead, it has access to a set of predefined “tools.” Each tool is a Python function designed to perform a specific action, like get_stock_information or get_sector_performance.
The agent’s job is to select the right tool and provide it with the correct input based on the user’s query.
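To make this concrete, here’s a minimal sketch of how a tool and a ReAct agent can be wired together with LangChain and Gemini. The tool body is stubbed and the prompt is pulled from LangChain’s public hub; the project’s actual prompt and tool set will differ.

```python
from langchain import hub
from langchain.agents import AgentExecutor, create_react_agent
from langchain_core.tools import tool
from langchain_google_genai import ChatGoogleGenerativeAI

@tool
def get_stock_information(ticker: str) -> str:
    """Fetch details for a single stock ticker from the backend API."""
    # Stubbed here; the real tool makes an HTTP call to the FastAPI backend.
    return f"(details for {ticker})"

# Requires a GOOGLE_API_KEY environment variable.
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")

# Pull a standard ReAct prompt (needs the langchainhub package).
prompt = hub.pull("hwchase17/react")

tools = [get_stock_information]
agent = create_react_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

result = executor.invoke({"input": "Tell me about Apple stock (AAPL)"})
print(result["output"])
```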
Tier 3: The Data Powerhouse (FastAPI & PostgreSQL)
This is the most critical part of our architecture. Instead of letting the AI agent connect directly to the database, all data operations are handled by a dedicated, standalone FastAPI server.
This API-driven approach is a best practice for several reasons:
- Security: The agent never has direct database credentials. It can only access data through the specific, secure endpoints we define.
- Centralized Logic: All SQL queries and data processing logic are centralized in the API. If we need to optimize a query, we only have to change it in one place.
- Scalability: If our application gets popular, we can scale the API by running multiple instances behind a load balancer, without touching the AI agent code.
- Reusability: This API can now be used by other clients — a mobile app, a trading bot, or another internal service — completely independently of our Gradio UI.
The FastAPI server uses Pydantic for data validation and SQLAlchemy to interact with our PostgreSQL database, which stores all the stock market data.
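As an illustration, here’s a minimal sketch of what such an endpoint could look like. The Pydantic field names and the hard-coded response are assumptions for the example; the real endpoint runs a SQLAlchemy query against PostgreSQL.

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

class StockInfo(BaseModel):
    ticker: str
    company: str
    sector: str
    current_price: float

@app.get("/stocks/{ticker}", response_model=StockInfo)
def get_stock(ticker: str) -> StockInfo:
    # The real endpoint runs a SQLAlchemy query against PostgreSQL;
    # a hard-coded row keeps this sketch self-contained.
    if ticker.upper() != "AAPL":
        raise HTTPException(status_code=404, detail="Ticker not found")
    return StockInfo(ticker="AAPL", company="Apple Inc.",
                     sector="Technology", current_price=175.25)
```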
The Journey of a User Query
To see how these layers work together, let’s trace a simple user query: “Tell me about Apple stock (AAPL)”.
1. User Input: The user types the question into the Gradio interface.
2. Agent Analysis: The question is sent to the LangChain agent. The Gemini LLM analyzes the text and thinks: “The user wants details for a specific stock. The ticker is ‘AAPL’. I should use the get_stock_information tool.”
3. Tool Selection: The agent decides to call the get_stock_information tool with the argument “AAPL”.
4. HTTP API Call: The Python code inside the tool doesn’t run a SQL query. Instead, it makes a simple, clean HTTP request to our backend: GET http://127.0.0.1:8000/stocks/AAPL. (A sketch of such a tool appears after this walkthrough.)
5. Database Query: The FastAPI server receives this request. The corresponding endpoint function executes a pre-written, optimized SQL query to fetch Apple’s data from the PostgreSQL database.
6. JSON Response: The server packages the data into a clean JSON format and sends it back as the HTTP response.
7. Observation: The agent’s tool receives this JSON. It then formats the data into a friendly, human-readable string:
```
Stock Details for AAPL:
Company: Apple Inc.
Sector: Technology
Current Price: $175.25
...
```
8. Final Answer: The agent receives this formatted string as its “Observation.” It concludes that it now has the final answer and presents this text back to the user through the Gradio interface.
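To tie steps 4 through 7 together, here’s a sketch of what a tool like get_stock_information might look like internally. The JSON field names are assumptions that mirror the example output above.

```python
import requests
from langchain_core.tools import tool

API_BASE = "http://127.0.0.1:8000"

@tool
def get_stock_information(ticker: str) -> str:
    """Fetch details for a single stock ticker from the backend API."""
    resp = requests.get(f"{API_BASE}/stocks/{ticker}", timeout=10)
    resp.raise_for_status()
    data = resp.json()
    # Format the raw JSON into the friendly string the agent observes.
    return (
        f"Stock Details for {data['ticker']}:\n"
        f"Company: {data['company']}\n"
        f"Sector: {data['sector']}\n"
        f"Current Price: ${data['current_price']:.2f}"
    )
```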
Key Implementation Highlight: The Resilient HTTP Client
A critical piece of glue between the agent and the API is the synchronous HTTP client. In a production system, network requests can fail. Our SyncHTTPStockMCPClient is designed for resilience. It uses a requests.Session object and is configured with an automatic retry strategy. If the API server is temporarily busy and returns a 503 error, the client will wait a moment and try again automatically. This small detail makes the entire application more robust.
```python
# From sync_http_client.py
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry up to 3 times on rate-limit and server errors,
# waiting progressively longer between attempts.
retry_strategy = Retry(
    total=3,
    status_forcelist=[429, 500, 502, 503, 504],
    backoff_factor=1,
)
adapter = HTTPAdapter(max_retries=retry_strategy)
self.session.mount("http://", adapter)
```
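Put together, a minimal version of such a client might look like the sketch below. The method name is illustrative; the retry configuration and the endpoint path mirror the snippets above.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

class SyncHTTPStockMCPClient:
    """Sketch of a client whose session retries transient failures automatically."""

    def __init__(self, base_url: str = "http://127.0.0.1:8000"):
        self.base_url = base_url
        self.session = requests.Session()
        retry_strategy = Retry(
            total=3,
            status_forcelist=[429, 500, 502, 503, 504],
            backoff_factor=1,
        )
        self.session.mount("http://", HTTPAdapter(max_retries=retry_strategy))

    def get_stock(self, ticker: str) -> dict:
        # Transient errors are retried by the adapter; anything else raises.
        resp = self.session.get(f"{self.base_url}/stocks/{ticker}", timeout=10)
        resp.raise_for_status()
        return resp.json()
```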
Why This Architecture Is the Right Choice
Building the application this way might seem like more work upfront than a simple script, but the long-term benefits are immense:
- Maintainability: Each component can be developed, tested, and updated independently. A bug in the UI doesn’t require a change to the database schema.
- Testability: We can write unit tests for our API endpoints completely separately from the complexities of the AI agent.
- Flexibility: We can swap out any component without rewriting the entire system. If we want to switch from Gradio to a React frontend, we can. If we find a better LLM than Gemini, we just update the agent. The API backend remains untouched.
Conclusion
By combining the natural language power of LangChain and LLMs with the structured, secure, and scalable world of API backends, we can build truly remarkable applications. This Stock Market AI Assistant is more than just a demo; it’s a blueprint for creating production-ready, AI-powered systems that are built to last.
The future of software is not just about making applications more powerful, but also about making them more intuitive. This architectural approach provides a solid foundation for that future.
