LlamaTalks
Language(s): Java
LlamaTalks is a Spring Boot chatbot application that brings advanced AI on-premises by running local LLMs through Ollama. It integrates LangChain4j to orchestrate conversational flows and implements a Retrieval-Augmented Generation (RAG) pipeline, so users can chat with their own data and receive responses grounded in the provided context rather than only the model's training data.
Designed for performance and scalability, the application supports fully reactive, real-time response streaming using Reactor's Flux and Server-Sent Events (SSE), producing the incremental "typing" effect familiar from commercial LLM interfaces.
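A minimal sketch of such an endpoint, assuming a hypothetical `ChatService` that adapts the model's token callbacks into a `Flux` (the route and names are illustrative, not taken from the project):

```java
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
public class ChatStreamController {

    // Hypothetical service that adapts the LLM's token callbacks into a Flux.
    interface ChatService {
        Flux<String> streamAnswer(long conversationId, String userMessage);
    }

    private final ChatService chatService;

    public ChatStreamController(ChatService chatService) {
        this.chatService = chatService;
    }

    // TEXT_EVENT_STREAM makes Spring emit each Flux element as a Server-Sent
    // Event, so the client renders tokens as they arrive: the "typing" effect.
    @GetMapping(value = "/api/conversations/{id}/stream",
                produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> stream(@PathVariable long id, @RequestParam String message) {
        return chatService.streamAnswer(id, message);
    }
}
```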
Technical Architecture
The system follows a modular architecture whose components could later be split into separate services:
- Core Backend: Built with Spring Boot 3, utilizing JPA/Hibernate for structured data persistence (conversations, messages).
- AI Orchestration: LangChain4j handles LLM calls, conversation memory, and the RAG pipeline (sketched below the diagram).
- Data Ingestion: Apache Tika parses and extracts text from a variety of document formats for the vector store (see the ingestion sketch below).
- Vector Store: Embeddings are generated locally and stored in a vector database to enable semantic search capabilities.
[ User Request ] --> [ Spring Boot API ] --> [ LangChain4j Orchestrator ]
                                                   |               |
                                                   v               v
                                           [ Vector Store ]   [ Ollama (Local LLM) ]
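The wiring of the orchestration layer might look roughly like the following. The `Assistant` interface, model name, and memory window are illustrative assumptions, and LangChain4j method and package names vary between versions (this sketch follows the 0.3x API):

```java
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.model.ollama.OllamaStreamingChatModel;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.TokenStream;

public class OrchestratorSketch {

    // AI Service interface: LangChain4j generates the implementation.
    interface Assistant {
        TokenStream chat(String userMessage); // TokenStream enables token-by-token streaming
    }

    public static Assistant buildAssistant() {
        // Points at a locally running Ollama server; swapping modelName is all
        // it takes to move between Llama 3, Mistral, Gemma, etc.
        OllamaStreamingChatModel model = OllamaStreamingChatModel.builder()
                .baseUrl("http://localhost:11434")
                .modelName("llama3")
                .build();

        return AiServices.builder(Assistant.class)
                .streamingChatLanguageModel(model)
                // Keep the last N messages so the model sees conversation context.
                .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
                .build();
    }
}
```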
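The ingestion path could look like this sketch: Tika parses the upload, the text is split into chunks, embedded locally, and stored for semantic search. The MiniLM embedding model and in-memory store are stand-ins; the project may well use a different embedding model or a persistent store such as pgvector:

```java
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.document.parser.apache.tika.ApacheTikaDocumentParser;
import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.data.segment.TextSegment;
// Package path of the bundled embedding model varies by LangChain4j version.
import dev.langchain4j.model.embedding.onnx.allminilml6v2.AllMiniLmL6V2EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

import java.nio.file.Path;

public class IngestionSketch {

    public static void ingest(Path file) {
        // Apache Tika extracts plain text from PDF, DOCX, HTML, etc.
        Document document =
                FileSystemDocumentLoader.loadDocument(file, new ApacheTikaDocumentParser());

        InMemoryEmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();

        EmbeddingStoreIngestor.builder()
                // Split into overlapping chunks so retrieval stays focused.
                .documentSplitter(DocumentSplitters.recursive(300, 30))
                // Embeddings are computed locally; no external API call.
                .embeddingModel(new AllMiniLmL6V2EmbeddingModel())
                .embeddingStore(store)
                .build()
                .ingest(document);
    }
}
```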
Key Features
- Context-Aware RAG: Dynamically retrieves relevant information from uploaded documents to answer user queries accurately.
- Streaming API: Built on Spring WebFlux for non-blocking, real-time token streaming (see the controller sketch above).
- Flexible Model Support: Easily switch between different open-source models (Llama 3, Mistral, Gemma) hosted via Ollama.
- Persistent Chat History: Full conversation history is stored in PostgreSQL via JPA, retaining context across sessions (a rough entity sketch follows this list).
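For the persistence side, a minimal sketch of how conversations and messages might be modeled with JPA; all entity and field names here are hypothetical:

```java
import jakarta.persistence.*;
import java.time.Instant;
import java.util.List;

@Entity
public class Conversation {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String title;

    // Messages live and die with their conversation.
    @OneToMany(mappedBy = "conversation", cascade = CascadeType.ALL, orphanRemoval = true)
    private List<ChatMessage> messages;
}

@Entity
class ChatMessage {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @ManyToOne(fetch = FetchType.LAZY)
    private Conversation conversation;

    private String role; // "user" or "assistant"

    @Column(columnDefinition = "text")
    private String content;

    private Instant createdAt = Instant.now();
}
```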
Technologies Used
- Java
- Spring Boot
- LangChain4j
- Ollama
- RAG
- SSE