Transform scattered documents, emails, and files into an intelligent knowledge system. AI-generated answers with verifiable citations, powered by your data, on your infrastructure.
Studies estimate knowledge workers spend roughly 20% of their time searching for information. Traditional tools can't connect the dots.
Searching "Q4 decisions" doesn't find the email that says "we agreed to prioritize latency"
AI chatbots hallucinate confidently. No way to verify where information came from
Sensitive documents leak into AI responses. HR data mixed with public content
Information trapped in emails, Slack, Drive, and wikis. No unified view
Hybrid vector + keyword search finds answers by meaning, not just matching words
Click any claim to see the original source. Frozen snapshots preserve provenance
Permissions checked before content reaches the AI. 13 red-team tests verify isolation
One system ingests all sources and builds an auto-generated wiki organized by topic
A complete system, not a library. Ingest, search, synthesize, and govern your organization's knowledge.
Vector embeddings + full-text search fused via Reciprocal Rank Fusion. Graph-enhanced retrieval connects entities across documents.
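The fusion step is simple enough to sketch in a few lines. This is a minimal illustration of Reciprocal Rank Fusion, not the actual implementation; the function name and the conventional `k=60` constant are assumptions:

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked result lists with Reciprocal Rank Fusion.

    Each ranking is an ordered list of document IDs; a document's
    fused score is the sum of 1 / (k + rank) over every list it
    appears in, so agreement between rankers is rewarded.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Vector search and keyword search each return their own ordering;
# documents that rank well in both float to the top.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_a", "doc_b", "doc_d"]
fused = rrf_fuse([vector_hits, keyword_hits])
```

Because RRF only uses ranks, it needs no score normalization between the vector and keyword retrievers.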
Fine-grained ACLs, sensitivity labels, pre-ranking permission filtering. GDPR-ready with tombstone propagation for right-to-forget.
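Tombstone propagation can be pictured with a toy store: erasing a source document leaves a tombstone, and every derived artifact is purged rather than re-served. A hedged sketch with made-up names, not the real data model:

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeStore:
    """Toy store illustrating tombstone propagation for right-to-forget."""
    chunks: dict = field(default_factory=dict)    # chunk_id -> (doc_id, text)
    tombstones: set = field(default_factory=set)  # erased doc_ids

    def forget(self, doc_id: str) -> None:
        # Record the erasure, then push it through derived data
        self.tombstones.add(doc_id)
        self.propagate()

    def propagate(self) -> None:
        # Delete every derived chunk whose source is tombstoned
        dead = [cid for cid, (doc, _) in self.chunks.items()
                if doc in self.tombstones]
        for cid in dead:
            del self.chunks[cid]

    def search(self, term: str):
        # Belt and suspenders: tombstoned sources never surface
        return [cid for cid, (doc, text) in self.chunks.items()
                if term in text and doc not in self.tombstones]
```

The tombstone itself persists after the content is gone, so late-arriving copies of the same document can be suppressed on re-ingestion.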
Email (Gmail, Outlook), cloud drives, PDFs, web pages, Markdown. Thread-aware email processing with quote detection and delta extraction.
Every AI-generated answer links to source chunks with frozen snapshots. Click to verify. Provenance tracking from ingestion to answer.
AI synthesizes clean, topic-organized wiki pages from raw documents. Taxonomy with 5 categories, bidirectional linking, and staleness detection.
Automatic entity extraction (11 types), relationship mapping, and interactive graph visualization. Multi-hop reasoning across documents.
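Multi-hop reasoning over the graph amounts to finding a chain of relationships that links two entities mentioned in different documents. A minimal breadth-first sketch (the entities and edges below are invented examples):

```python
from collections import deque

def multi_hop(graph, start, target, max_hops=3):
    """Find a chain of relationships linking two entities.

    `graph` maps each entity to its related entities; BFS returns
    the shortest path within `max_hops`, or None.
    """
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        if len(path) > max_hops:
            continue
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [neighbor]))
    return None

# Entities extracted from different documents, linked by relations
graph = {
    "Alice": ["Project Apollo"],
    "Project Apollo": ["Q4 Launch", "Alice"],
    "Q4 Launch": ["Acme Corp"],
}
path = multi_hop(graph, "Alice", "Acme Corp")
```

No single document connects Alice to Acme Corp; the chain only emerges once entities from all three documents land in one graph.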
A production-grade pipeline that processes, enriches, and indexes your documents for instant retrieval.
Connect your sources. Upload documents, link email accounts, paste URLs. Connectors handle authentication and incremental sync.
Documents are parsed with format-aware extractors, then split into semantic chunks that preserve context and structure.
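The chunking idea can be sketched for Markdown-like text: paragraphs are the atomic unit, and the most recent heading is carried into each chunk so retrieval keeps context. A simplified illustration, far leaner than a real format-aware extractor:

```python
def chunk_document(text: str, max_chars: int = 800):
    """Split text on paragraph boundaries into bounded chunks,
    prepending the current heading to preserve context."""
    chunks, current, heading = [], "", ""

    def flush():
        nonlocal current
        if current:
            chunks.append(f"{heading}\n{current}".strip())
            current = ""

    for block in text.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        if block.startswith("#"):       # new section: close the old chunk
            flush()
            heading = block
            continue
        candidate = f"{current}\n\n{block}".strip() if current else block
        if current and len(candidate) > max_chars:
            flush()                     # chunk full: start a new one
            current = block
        else:
            current = candidate
    flush()
    return chunks
```

Splitting on paragraph boundaries rather than fixed character offsets avoids cutting a sentence in half, which would poison both the embedding and the citation snippet.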
LLMs extract entities, assign topic tags, generate summaries, and classify content. All enrichments run in parallel for speed.
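The parallelism is plain `asyncio.gather`: independent enrichments fire concurrently, so total latency is the slowest call rather than the sum. A sketch with stub enrichers standing in for real LLM calls:

```python
import asyncio

# Stub enrichers standing in for real LLM calls (illustrative only)
async def extract_entities(text):
    return [w for w in text.split() if w.istitle()]

async def tag_topics(text):
    return ["general"]

async def summarize(text):
    return text[:50]

async def enrich(chunk: str) -> dict:
    # All three enrichments run concurrently
    entities, topics, summary = await asyncio.gather(
        extract_entities(chunk), tag_topics(chunk), summarize(chunk)
    )
    return {"entities": entities, "topics": topics, "summary": summary}

result = asyncio.run(enrich("Acme signed the Q4 contract"))
```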
Vector embeddings stored in PostgreSQL with pgvector HNSW index. Full-text search via GIN index. No separate vector database needed.
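The single-database layout boils down to two indexes on one table. The DDL below is illustrative (table and column names are assumptions, not the actual schema), using standard pgvector and PostgreSQL full-text syntax:

```python
# Illustrative DDL for the single-database layout
CHUNKS_TABLE = """
CREATE TABLE chunks (
    id        bigserial PRIMARY KEY,
    doc_id    bigint NOT NULL,
    body      text   NOT NULL,
    embedding vector(512)          -- pgvector column, 512-dim
);
"""

# Approximate nearest-neighbor search over embeddings
HNSW_INDEX = """
CREATE INDEX chunks_embedding_idx ON chunks
    USING hnsw (embedding vector_cosine_ops);
"""

# Full-text search over the same rows -- no second database to sync
FTS_INDEX = """
CREATE INDEX chunks_fts_idx ON chunks
    USING gin (to_tsvector('english', body));
"""
```

Keeping both indexes on one table means a chunk and its embedding can never drift out of sync the way a separate vector store can.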
Hybrid search finds relevant chunks, ACL filtering enforces permissions, reranking prioritizes quality, and an LLM generates cited answers.
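The query path reads naturally as a four-stage function. A hedged sketch with injected stand-ins; every name here is illustrative, not the real API — the important property is that the ACL gate sits before anything reaches the LLM:

```python
def answer_query(query, user, *, search, allowed, rerank, generate):
    """Query path: retrieve -> permission-filter -> rerank -> synthesize."""
    candidates = search(query)                             # hybrid retrieval
    visible = [c for c in candidates if allowed(user, c)]  # ACL gate first
    top = rerank(query, visible)[:5]                       # quality ordering
    return generate(query, top), top                       # answer + sources

# Toy stand-ins for the real components
chunks = [
    {"text": "latency is the Q4 priority", "acl": {"alice", "bob"}},
    {"text": "confidential salary bands",  "acl": {"hr"}},
]
answer, sources = answer_query(
    "Q4 decisions", "alice",
    search=lambda q: chunks,
    allowed=lambda u, c: u in c["acl"],
    rerank=lambda q, cs: cs,
    generate=lambda q, cs: " / ".join(c["text"] for c in cs),
)
```

Filtering before reranking and generation means restricted content can neither influence the answer nor leak into it, regardless of how the LLM behaves.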
Proven technologies that your team already knows. Everything runs in Docker Compose on a single machine.
Async Python backend
Vectors + relations + graph
Cache + rate limiting
Durable orchestration
React 19 frontend
Full stack in 5 containers
Embeddings (512-dim)
LLM gateway + fallbacks
Not a library, not a vector database. A complete, self-hosted knowledge system.
| Capability | Hippocortex | RAG Libraries | Commercial RAG | Vector DBs |
|---|---|---|---|---|
| Self-hosted / data control | ✓ | ✓ | ✗ | ✓ |
| Production-ready application | ✓ | ✗ | ✓ | ✗ |
| ACL / governance | ✓ | ✗ | ✓ | ✗ |
| Knowledge graph | ✓ | ✗ | ✗ | ✗ |
| Citation provenance | ✓ | ✗ | Partial | ✗ |
| Auto wiki synthesis | ✓ | ✗ | ✗ | ✗ |
| No vendor lock-in | ✓ | ✓ | ✗ | Partial |
| Single-DB architecture | ✓ | ✗ | ✗ | ✗ |
Deploy on your infrastructure with Docker Compose. No vendor lock-in, no usage-based pricing, full data control.