How SensAI Works
A look under the hood at how SensAI routes your requests, manages knowledge bases, and delivers real-time AI responses.
You do not need to understand any of this to use SensAI — but if you are curious about what happens when you send a message, this page explains it clearly.
Platform Overview
When you send a message in SensAI, it travels through a short chain of components and comes back to you as a streaming response — token by token, in real time.
- SensAI Frontend — the browser interface you interact with, built with Next.js
- SensAI Backend — a FastAPI server that handles authentication, routing, and streaming
- AI Models — external APIs from providers like OpenAI, Anthropic, Google, and Meta
The backend acts as an intelligent router and security layer. It validates your session, applies your settings (model choice, tone, custom instructions), and forwards your request to the appropriate AI provider.
How a Request Flows
Every message you send follows the same path: your browser sends an authenticated request to the SensAI backend, which validates your session, enriches the request with your preferences (model, tone, custom instructions), and forwards it to the chosen AI provider. The provider streams its response back through the backend to your browser, where it is displayed token by token. You typically see the first tokens within a second or two of sending your message.
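This flow can be sketched in a few lines of Python. Everything here is an illustrative stand-in, not SensAI's actual code: the session store, the preference fields, and the provider call are all hypothetical.

```python
# Hypothetical session store; in reality sessions live server-side.
SESSIONS = {"sess-123": {"user": "ada", "model": "gpt-4o", "tone": "concise"}}

def validate_session(token):
    """Reject the request before any work happens if the session is unknown."""
    session = SESSIONS.get(token)
    if session is None:
        raise PermissionError("invalid or expired session")
    return session

def enrich_request(session, message):
    """Attach the user's saved preferences to the outgoing request."""
    return {"model": session["model"], "tone": session["tone"], "message": message}

def stream_from_provider(request):
    """Stand-in for the provider call: yield tokens one at a time."""
    for token in ["Hello", " ", "world", "!"]:
        yield token  # each token is forwarded to the browser as it arrives

def handle_message(token, message):
    session = validate_session(token)
    request = enrich_request(session, message)
    return "".join(stream_from_provider(request))

print(handle_message("sess-123", "Hi"))  # -> Hello world!
```

In the real system the final join never happens on the server; each yielded token is pushed straight to the browser, which is what the streaming section below describes.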
AI Model Routing
SensAI connects to multiple AI providers simultaneously. When you select a model in the settings panel, the backend routes your request to the correct provider's API.
Why Multiple Models?
Different AI models have different strengths. OpenAI's GPT-4o excels at instruction-following and broad general knowledge. Anthropic's Claude models are known for careful reasoning and long-context tasks. Google's Gemini models perform strongly on multimodal and factual tasks. Meta's Llama models are highly capable open-weight models suited to a wide range of applications.
By giving you access to all of them from a single interface, SensAI lets you pick the right tool for each task — without needing separate accounts or API keys for each provider.
How Routing Works
When you send a message, the backend inspects the model you have selected and routes the request to the corresponding provider's API using SensAI's own credentials. You never need to manage API keys yourself. The routing layer also handles retries and transient errors, so temporary provider outages are absorbed gracefully rather than surfaced to you unnecessarily.
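A minimal sketch of such a routing table with retries and backoff; the model names, provider labels, and retry policy here are assumptions for illustration, not SensAI's real configuration:

```python
import time

# Hypothetical model -> provider mapping; real names and endpoints differ.
MODEL_PROVIDERS = {
    "gpt-4o": "openai",
    "claude-3-5-sonnet": "anthropic",
    "gemini-1.5-pro": "google",
    "llama-3-70b": "meta",
}

def call_provider(provider, message, attempts=3):
    """Retry transient failures a few times before giving up."""
    for attempt in range(attempts):
        try:
            return fake_provider_api(provider, message)
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(2 ** attempt)  # simple exponential backoff

def route(model, message):
    provider = MODEL_PROVIDERS.get(model)
    if provider is None:
        raise ValueError(f"unknown model: {model}")
    return call_provider(provider, message)

def fake_provider_api(provider, message):
    """Stand-in for the real HTTP call, made with SensAI's own credentials."""
    return f"[{provider}] response to: {message}"

print(route("claude-3-5-sonnet", "Hi"))
```

The key design point is that the model-to-provider mapping lives entirely server-side, which is why your browser never sees or stores provider credentials.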
Knowledge Base and RAG Pipeline
The Knowledge Agent uses a technique called Retrieval-Augmented Generation (RAG) to answer questions from your uploaded documents. Rather than sending your entire document to the model, RAG finds only the most relevant passages and sends those.
Upload and Processing
When you upload a PDF, DOCX, or TXT file, SensAI immediately begins processing it. The text is extracted from the file, cleaned, and split into overlapping chunks so that each piece is focused and self-contained. Each chunk is then converted into a vector embedding — a numerical representation that captures the semantic meaning of the text. These embeddings are stored in a vector database, indexed and ready for fast similarity search the moment you ask a question.
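Overlapping chunking can be sketched like this. The chunk size and overlap values are illustrative, and real pipelines usually split on sentence or paragraph boundaries rather than raw character offsets:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping chunks so context isn't lost at boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

doc = ("word " * 100).strip()    # stand-in for extracted document text
chunks = chunk_text(doc, chunk_size=120, overlap=30)
print(len(chunks), "chunks; each neighbour shares 30 characters of overlap")
```

The overlap is what keeps each chunk self-contained: a sentence that straddles a boundary appears whole in at least one of the two neighbouring chunks.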
Querying Your Documents
When you type a question in the Knowledge Agent, your question is also converted into a vector embedding using the same model that was used for your documents. The vector database then performs a similarity search — finding the chunks whose embeddings are closest in meaning to your question. Those top-matching chunks, along with your original question, are sent to the AI model, which reads them and produces an answer grounded in your documents.
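The similarity search can be sketched with toy vectors. Real embeddings have hundreds or thousands of dimensions and come from a dedicated embedding model, but the ranking logic is the same idea:

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-dimensional "embeddings" for three document chunks (illustrative).
chunk_embeddings = {
    "Invoices are due within 30 days.":      [0.9, 0.1, 0.0],
    "The office closes at 6 pm on Fridays.": [0.0, 0.8, 0.2],
    "Late payments incur a 2% monthly fee.": [0.8, 0.0, 0.2],
}

def top_chunks(query_embedding, k=2):
    """Return the k chunks whose embeddings are closest to the query's."""
    ranked = sorted(
        chunk_embeddings.items(),
        key=lambda item: cosine(query_embedding, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# A question about payment terms embeds near the payment-related chunks.
print(top_chunks([0.85, 0.05, 0.1]))
```

A production vector database does this same ranking with approximate nearest-neighbour indexes so it stays fast over millions of chunks.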
How RAG Works Step by Step
- Text Extraction — When you upload a PDF, DOCX, or TXT file, SensAI extracts the raw text from it.
- Chunking — The text is split into smaller overlapping sections (chunks) so each piece is focused and manageable.
- Embedding — Each chunk is converted into a numerical representation (a vector embedding) that captures its meaning.
- Vector Storage — The embeddings are stored in a vector database, indexed and ready for fast similarity search.
- Query and Retrieval — When you ask a question, your question is also embedded, and the system finds the chunks whose embeddings are most similar to your question. Those chunks, along with your question, are sent to the AI model, which generates an answer and cites the sources.
This is why the Knowledge Agent can point you to exact passages — it always knows which chunks contributed to each answer.
Real-Time Streaming
SensAI displays AI responses as they are generated, word by word, rather than waiting for the full response to complete. This uses a web standard called Server-Sent Events (SSE).
How Streaming Works
Each "token" is roughly a word or part of a word. The AI model generates them sequentially, and each one is pushed to your browser over a persistent SSE connection the moment it is produced. The frontend appends each token to the response in real time, so you see the answer build up word by word rather than receiving a single large block of text after a delay.
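The SSE wire format itself is plain text: each event is a "data:" line followed by a blank line, and the browser fires once per frame as it arrives. A minimal sketch of how a backend might frame tokens (the end-of-stream sentinel is a common convention, not a guaranteed part of SensAI's protocol):

```python
def sse_events(tokens):
    """Format each token as a Server-Sent Events frame.

    An SSE frame is one or more 'data:' lines followed by a blank
    line; the browser's EventSource (or a fetch-based reader) fires
    an event per frame as each one arrives over the open connection.
    """
    for token in tokens:
        yield f"data: {token}\n\n"
    yield "data: [DONE]\n\n"  # conventional end-of-stream sentinel

frames = list(sse_events(["The", " answer", " is", " 42."]))
print("".join(frames))
```

Because each frame is pushed as soon as its token exists, the perceived latency is the time to the first token, not the time to the full answer.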
Why Streaming Matters
Streaming makes the experience feel fast and responsive even for long answers, because you start reading while the model is still writing. It also gives you early signal — if the response is going in the wrong direction, you can stop it and rephrase your question without waiting for the full output. For longer reasoning tasks or detailed explanations, streaming is the difference between a tool that feels alive and one that feels sluggish.
Documentation AI Assistant
Every page in this documentation has an Ask AI button. Clicking it opens an AI assistant whose answers are grounded specifically in SensAI's own documentation.
How It Works
This assistant uses the same RAG pipeline described above. When SensAI is deployed, the documentation is automatically indexed — all pages are processed into chunks, embedded, and stored in a vector database. When you ask the assistant a question, it retrieves the most relevant documentation passages and uses them to answer. The assistant only draws from what is actually written in the docs, so its answers are grounded and verifiable rather than speculative.
Automatic Updates
Because the indexing happens at deploy time, the documentation assistant always reflects the current state of the docs. When a page is updated or a new feature is documented, the next deployment re-indexes everything automatically. You do not need to manually refresh or rebuild the assistant — it stays in sync with the documentation without any extra steps.
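A deploy-time indexing step can be sketched as a small build script; this is a hypothetical illustration, not SensAI's actual pipeline, and a real version would also embed each chunk and write it into the vector store:

```python
import tempfile
from pathlib import Path

def build_docs_index(docs_dir, chunk_size=500):
    """Walk the docs tree at deploy time and collect chunks to embed."""
    index = []
    for page in sorted(Path(docs_dir).glob("**/*.md")):
        text = page.read_text(encoding="utf-8")
        for start in range(0, len(text), chunk_size):
            index.append({"page": page.name, "chunk": text[start:start + chunk_size]})
    return index

# Demonstrate with a throwaway docs tree.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "intro.md").write_text("SensAI overview page.", encoding="utf-8")
    (Path(tmp) / "rag.md").write_text("RAG pipeline page.", encoding="utf-8")
    index = build_docs_index(tmp)

print(len(index), "chunks indexed at build time")
```

Running this on every deployment is what keeps the assistant in sync: the index is rebuilt from whatever the docs say at that moment, with no manual refresh step.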
Security and Authentication
SensAI uses session-based authentication. When you sign in, a secure session is created and stored in your browser. Every request you make to the backend is validated against your session before anything is processed.
Session-Based Authentication
When you sign in, SensAI creates a secure, time-limited session tied to your account. This session token is sent automatically with every request you make — you never need to handle it manually. The backend validates the session on every call before processing anything, ensuring that only authenticated users can access the AI, their chat history, or their Knowledge Bases. Sessions expire after a period of inactivity, and you will be prompted to sign in again to continue.
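Signed, time-limited tokens are one common way to implement sessions like this. The sketch below uses an HMAC signature plus an expiry timestamp; it illustrates the general pattern, not SensAI's actual implementation, and the secret shown would never be hard-coded in practice:

```python
import hashlib
import hmac
import time

SECRET = b"server-side-secret"   # illustrative only; keep real secrets out of code
SESSION_TTL = 30 * 60            # 30 minutes of validity

def issue_session(user, now=None):
    """Create a signed, time-limited session token for a signed-in user."""
    expires = int(now if now is not None else time.time()) + SESSION_TTL
    payload = f"{user}:{expires}"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{sig}"

def check_session(token, now=None):
    """Return the user if the token is authentic and unexpired, else None."""
    try:
        user, expires, sig = token.rsplit(":", 2)
    except ValueError:
        return None
    payload = f"{user}:{expires}"
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None               # tampered token
    if int(expires) < (now if now is not None else time.time()):
        return None               # session expired: user must sign in again
    return user

token = issue_session("ada")
print(check_session(token))       # -> ada
```

The signature means the server can verify a token without a database lookup on every call, and the embedded expiry is what produces the "inactive sessions expire" behaviour described above.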
Data Privacy
- Your data is private — Knowledge Bases, chat history, and settings are associated with your account only. No other user can access them.
- No cross-user sharing — There is no shared memory or state between users. Each account is fully isolated.
- Sessions expire — If you are inactive for an extended period, your session will expire and you will need to sign in again.
You do not need to manage API keys or any credentials beyond your SensAI account login.