Quick Start · Key Features · Web UI · How it Works · FAQ
Agentic Search · Knowledge Clustering · Monte Carlo Evidence Sampling
Indexless Retrieval · Self-Evolving Knowledge Base · Real-time Chat
Intelligence pipelines built upon vector-based retrieval can be rigid and brittle. They rely on static vector embeddings that are expensive to compute, blind to real-time changes, and detached from the raw context. We introduce Sirchmunk to usher in a more agile paradigm, where data is no longer treated as a snapshot, and insights can evolve together with the data.
Sirchmunk works directly with raw data -- bypassing the heavy overhead of squeezing your rich files into fixed-dimensional vectors.
- Instant Search: No complex pre-processing pipelines or hours-long indexing; just drop your files and search immediately.
- Full Fidelity: Zero information loss; stay true to your data without vector approximation.
Data is a stream, not a snapshot. Sirchmunk is dynamic by design, while a vector DB can become obsolete the moment your data changes.
- Context-Aware: Evolves in real-time with your data context.
- LLM-Powered Autonomy: Designed for Agents that perceive data as it lives, utilizing token-efficient reasoning that triggers LLM inference only when necessary to maximize intelligence while minimizing cost.
Sirchmunk bridges massive local repositories and the web with high-scale throughput and real-time awareness.
It serves as a unified intelligent hub for AI agents, delivering deep insights across vast datasets at the speed of thought.
| Dimension | Traditional RAG | Sirchmunk |
|---|---|---|
| Setup Cost | High overhead (VectorDB, GraphDB, complex document parsers...) | **Zero Infrastructure**: direct-to-data retrieval without vector silos |
| Data Freshness | Stale (batch re-indexing) | **Instant & Dynamic**: self-evolving index that reflects live changes |
| Scalability | Linear cost growth | **Extremely Low RAM/CPU**: native elastic support; efficiently handles large-scale datasets |
| Accuracy | Approximate vector matches | **Deterministic & Contextual**: hybrid logic ensuring semantic precision |
| Workflow | Complex ETL pipelines | **Drop-and-Search**: zero-config integration for rapid deployment |
- Feb 5, 2026: Release v0.0.2 – MCP Support, CLI Commands & Knowledge Persistence!
  - MCP Integration: Full Model Context Protocol support; works seamlessly with Claude Desktop and Cursor IDE.
  - CLI Commands: New `sirchmunk` CLI with `init`, `config`, `serve`, and `search` commands.
  - KnowledgeCluster Persistence: DuckDB-powered storage with Parquet export for efficient knowledge management.
  - Knowledge Reuse: Semantic similarity-based cluster retrieval for faster searches via embedding vectors.
- Jan 22, 2026: Introducing Sirchmunk: Initial Release v0.0.1 Now Available!
- Python 3.10+
- LLM API Key (OpenAI-compatible endpoint, local or remote)
- Node.js 18+ (Optional, for web interface)
# Create virtual environment (recommended)
conda create -n sirchmunk python=3.13 -y && conda activate sirchmunk
pip install sirchmunk
# Or via UV:
uv pip install sirchmunk
# Alternatively, install from source:
git clone https://github.com/modelscope/sirchmunk.git && cd sirchmunk
pip install -e .

import asyncio
from sirchmunk import AgenticSearch
from sirchmunk.llm import OpenAIChat
llm = OpenAIChat(
api_key="your-api-key",
base_url="your-base-url", # e.g., https://api.openai.com/v1
model="your-model-name" # e.g., gpt-4o
)
async def main():
searcher = AgenticSearch(llm=llm)
result: str = await searcher.search(
query="How does transformer attention work?",
search_paths=["/path/to/documents"],
)
print(result)
asyncio.run(main())

- Upon initialization, `AgenticSearch` automatically checks whether `ripgrep-all` and `ripgrep` are installed. If they are missing, it attempts to install them automatically; if the automatic installation fails, please install them manually.
- Replace `"your-api-key"`, `"your-base-url"`, `"your-model-name"`, and `/path/to/documents` with your actual values.
Sirchmunk provides a powerful CLI for server management and search operations.
pip install "sirchmunk[web]"
# or install via UV
uv pip install "sirchmunk[web]"

# Initialize Sirchmunk with default settings (default work path: ~/.sirchmunk/)
sirchmunk init
# Initialize with WebUI frontend build (requires Node.js 18+)
sirchmunk init --ui
# Alternatively, initialize with custom work path
sirchmunk init --work-path /path/to/workspace

# Show current configuration
sirchmunk config
# Regenerate configuration file if needed (Default config file: ~/.sirchmunk/.env)
sirchmunk config --generate

# Start backend API server only
sirchmunk serve
# Start with WebUI on a single port (requires prior `sirchmunk init --ui`)
sirchmunk serve --ui
# Development mode: backend + Next.js dev server with hot-reload
sirchmunk serve --ui --dev
# Custom host and port
sirchmunk serve --host 0.0.0.0 --port 8000

# Search in current directory
sirchmunk search "How does authentication work?"
# Search in specific paths
sirchmunk search "find all API endpoints" ./src ./docs
# Quick filename search
sirchmunk search "config" --mode FILENAME_ONLY
# Output as JSON
sirchmunk search "database schema" --output json
# Use API server (requires running server)
sirchmunk search "query" --api --api-url http://localhost:8584

| Command | Description |
|---|---|
| `sirchmunk init` | Initialize working directory and configuration |
| `sirchmunk init --ui` | Initialize with WebUI frontend build |
| `sirchmunk config` | Show or generate configuration |
| `sirchmunk serve` | Start the API server (backend only) |
| `sirchmunk serve --ui` | Start with embedded WebUI (single port) |
| `sirchmunk serve --ui --dev` | Start with Next.js dev server (hot-reload) |
| `sirchmunk search` | Perform search queries |
| `sirchmunk version` | Show version information |
Sirchmunk provides a Model Context Protocol (MCP) server that exposes its intelligent search capabilities as MCP tools. This enables seamless integration with AI assistants like Claude Desktop and Cursor IDE.
# Install MCP package
pip install sirchmunk-mcp
# Initialize and configure
sirchmunk-mcp init
sirchmunk-mcp config --generate
# Edit ~/.sirchmunk/.mcp_env with your LLM API key
# Test with MCP Inspector
npx @modelcontextprotocol/inspector sirchmunk-mcp serve

- Multi-Mode Search: DEEP mode for comprehensive analysis, FILENAME_ONLY for fast file discovery
- Knowledge Cluster Management: Automatic extraction, storage, and reuse of knowledge
- Standard MCP Protocol: Works with stdio and Streamable HTTP transports
For detailed documentation, see the Sirchmunk MCP README.
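If you prefer to drive the MCP server from code rather than the Inspector, a minimal client sketch using the official `mcp` Python SDK might look like the following. It assumes the server runs over stdio via `sirchmunk-mcp serve` (as configured above) and discovers the exposed tools at runtime instead of hard-coding their names, since those are defined by the Sirchmunk MCP package.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main():
    # Launch the Sirchmunk MCP server as a stdio subprocess.
    params = StdioServerParameters(command="sirchmunk-mcp", args=["serve"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # List whatever tools the server exposes rather than assuming their names.
            tools = await session.list_tools()
            for tool in tools.tools:
                print(tool.name, "-", tool.description)


asyncio.run(main())
```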
The web UI is built for fast, transparent workflows: chat, knowledge analytics, and system monitoring in one place.
Build the frontend once, then serve everything from a single port; no Node.js needed at runtime.
# Initialize with WebUI build (requires Node.js 18+ at build time)
sirchmunk init --ui
# Start server with embedded WebUI
sirchmunk serve --ui

Access: http://localhost:8584 (API + WebUI on the same port)
For frontend development with hot-reload:
# Start backend + Next.js dev server
sirchmunk serve --ui --dev

Access:
- Frontend (hot-reload): http://localhost:8585
- Backend APIs: http://localhost:8584/docs
# Start frontend and backend via script
python scripts/start_web.py
# Stop all services
python scripts/stop_web.py

Configuration:
- Access Settings → Environment Variables to configure the LLM API and other parameters.
| Component | Description |
|---|---|
| AgenticSearch | Search orchestrator with LLM-enhanced retrieval capabilities |
| KnowledgeBase | Transforms raw results into structured knowledge clusters with supporting evidence |
| EvidenceProcessor | Evidence processing based on Monte Carlo importance sampling |
| GrepRetriever | High-performance indexless file search with parallel processing |
| OpenAIChat | Unified LLM interface supporting streaming and usage tracking |
| MonitorTracker | Real-time system and application metrics collection |
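To make the EvidenceProcessor row more concrete, here is a minimal, self-contained sketch of Monte Carlo importance sampling applied to evidence selection: candidate snippets are weighted by a relevance score (for example, keyword-hit counts from the grep stage) and sampled in proportion to those weights, so high-signal passages dominate the evidence set without exhaustively re-reading every match. The class and scoring scheme below are illustrative assumptions, not Sirchmunk's actual internals.

```python
import random
from dataclasses import dataclass


@dataclass
class Snippet:
    text: str
    score: float  # relevance weight, e.g. keyword-hit count from the grep stage


def sample_evidence(snippets: list[Snippet], k: int = 5, seed: int = 0) -> list[Snippet]:
    """Draw k snippets with probability proportional to their relevance score."""
    rng = random.Random(seed)
    total = sum(s.score for s in snippets)
    # Importance sampling: high-scoring snippets are drawn more often, so the
    # evidence set concentrates on the most relevant passages. Fall back to
    # uniform sampling when no snippet has a positive score.
    weights = [s.score / total for s in snippets] if total > 0 else None
    return rng.choices(snippets, weights=weights, k=k)


if __name__ == "__main__":
    candidates = [
        Snippet("attention weights are computed via softmax(QK^T / sqrt(d))", 9.0),
        Snippet("the changelog for release 0.0.2", 1.0),
        Snippet("multi-head attention concatenates per-head outputs", 7.0),
    ]
    for s in sample_evidence(candidates, k=2):
        print(s.text)
```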
All persistent data is stored in the configured SIRCHMUNK_WORK_PATH (default: ~/.sirchmunk/):
{SIRCHMUNK_WORK_PATH}/
└── .cache/
    ├── history/                      # Chat session history (DuckDB)
    │   └── chat_history.db
    ├── knowledge/                    # Knowledge clusters (Parquet)
    │   └── knowledge_clusters.parquet
    └── settings/                     # User settings (DuckDB)
        └── settings.db
When the server is running (sirchmunk serve or sirchmunk serve --ui), the Search API is accessible via any HTTP client.
API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/v1/search` | Execute a search query |
| `GET` | `/api/v1/search/status` | Check server and LLM configuration status |
Interactive Docs: http://localhost:8584/docs (Swagger UI)
cURL Examples
# Basic search (DEEP mode)
curl -X POST http://localhost:8584/api/v1/search \
-H "Content-Type: application/json" \
-d '{
"query": "How does authentication work?",
"search_paths": ["/path/to/project"],
"mode": "DEEP"
}'
# Filename search (fast, no LLM required)
curl -X POST http://localhost:8584/api/v1/search \
-H "Content-Type: application/json" \
-d '{
"query": "config",
"search_paths": ["/path/to/project"],
"mode": "FILENAME_ONLY"
}'
# Full parameters
curl -X POST http://localhost:8584/api/v1/search \
-H "Content-Type: application/json" \
-d '{
"query": "database connection pooling",
"search_paths": ["/path/to/project/src"],
"mode": "DEEP",
"max_depth": 10,
"top_k_files": 20,
"keyword_levels": 3,
"include_patterns": ["*.py", "*.java"],
"exclude_patterns": ["*test*", "*__pycache__*"],
"return_cluster": true
}'
# Check server status
curl http://localhost:8584/api/v1/search/status

Python Client Examples
Using requests:
import requests
response = requests.post(
"http://localhost:8584/api/v1/search",
json={
"query": "How does authentication work?",
"search_paths": ["/path/to/project"],
"mode": "DEEP"
},
timeout=300 # DEEP mode may take a while
)
data = response.json()
if data["success"]:
print(data["data"]["result"])

Using httpx (async):
import httpx
import asyncio
async def search():
async with httpx.AsyncClient(timeout=300) as client:
resp = await client.post(
"http://localhost:8584/api/v1/search",
json={
"query": "find all API endpoints",
"search_paths": ["/path/to/project"],
"mode": "DEEP"
}
)
data = resp.json()
print(data["data"]["result"])
asyncio.run(search())

JavaScript Client Example
const response = await fetch("http://localhost:8584/api/v1/search", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
query: "How does authentication work?",
search_paths: ["/path/to/project"],
mode: "DEEP"
})
});
const data = await response.json();
if (data.success) {
console.log(data.data.result);
}

Request Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `query` | `string` | required | Search query or question |
| `search_paths` | `string[]` | required | Directories or files to search (min 1) |
| `mode` | `string` | `"DEEP"` | `DEEP` or `FILENAME_ONLY` |
| `max_depth` | `int` | `null` | Maximum directory depth |
| `top_k_files` | `int` | `null` | Number of top files to return |
| `keyword_levels` | `int` | `null` | Keyword granularity levels |
| `include_patterns` | `string[]` | `null` | File glob patterns to include |
| `exclude_patterns` | `string[]` | `null` | File glob patterns to exclude |
| `return_cluster` | `bool` | `false` | Return full KnowledgeCluster object |
Note: `FILENAME_ONLY` mode does not require an LLM API key. `DEEP` mode requires a configured LLM.
How is this different from traditional RAG systems?
Sirchmunk takes an indexless approach:
- No pre-indexing: Direct file search without vector database setup
- Self-evolving: Knowledge clusters evolve based on search patterns
- Multi-level retrieval: Adaptive keyword granularity for better recall
- Evidence-based: Monte Carlo sampling for precise content extraction
What LLM providers are supported?
Any OpenAI-compatible API endpoint, including (but not limited to):
- OpenAI (GPT-4, GPT-4o, GPT-3.5)
- Local models served via Ollama, llama.cpp, vLLM, SGLang, etc. (see the example below)
- Claude via API proxy
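For example, pointing Sirchmunk at a local model only changes the endpoint and model name. The sketch below assumes an Ollama instance serving its OpenAI-compatible API at the default http://localhost:11434/v1; the model name is a placeholder for whatever you have pulled locally.

```python
from sirchmunk import AgenticSearch
from sirchmunk.llm import OpenAIChat

# Hypothetical local setup: Ollama exposes an OpenAI-compatible API under /v1.
llm = OpenAIChat(
    api_key="ollama",                      # local servers usually accept any non-empty key
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    model="qwen2.5:14b",                   # placeholder: use a model you have pulled
)

searcher = AgenticSearch(llm=llm)
```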
How do I add documents to search?
Simply specify the path in your search query:
result = await searcher.search(
query="Your question",
search_paths=["/path/to/folder", "/path/to/file.pdf"]
)

No pre-processing or indexing required!
Where are knowledge clusters stored?
Knowledge clusters are persisted in Parquet format at:
{SIRCHMUNK_WORK_PATH}/.cache/knowledge/knowledge_clusters.parquet
You can query them using DuckDB or the KnowledgeManager API.
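For instance, an ad-hoc inspection with the `duckdb` Python package could look like the sketch below; the cluster schema is defined by Sirchmunk, so the example describes it with `DESCRIBE` and prints a few rows rather than assuming specific column names.

```python
import duckdb
from pathlib import Path

# Default work path; adjust if you set SIRCHMUNK_WORK_PATH elsewhere.
parquet_path = Path.home() / ".sirchmunk" / ".cache" / "knowledge" / "knowledge_clusters.parquet"

# Inspect the schema first, since the exact columns are defined by Sirchmunk.
print(duckdb.sql(f"DESCRIBE SELECT * FROM read_parquet('{parquet_path}')"))

# Peek at a few stored knowledge clusters.
for row in duckdb.sql(f"SELECT * FROM read_parquet('{parquet_path}') LIMIT 5").fetchall():
    print(row)
```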
How do I monitor LLM token usage?
- Web Dashboard: Visit the Monitor page for real-time statistics
- API: `GET /api/v1/monitor/llm` returns usage metrics
- Code: Access `searcher.llm_usages` after search completion (see the snippet below)
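In code, that last option looks roughly like the sketch below; it reuses the quick-start setup, and the exact structure of `llm_usages` depends on the Sirchmunk version you run.

```python
import asyncio

from sirchmunk import AgenticSearch
from sirchmunk.llm import OpenAIChat


async def main():
    llm = OpenAIChat(api_key="your-api-key", base_url="your-base-url", model="your-model-name")
    searcher = AgenticSearch(llm=llm)
    await searcher.search(
        query="How does transformer attention work?",
        search_paths=["/path/to/documents"],
    )
    # Token usage accumulated during the search; shape may vary by version.
    print(searcher.llm_usages)


asyncio.run(main())
```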
- Text-retrieval from raw files
- Knowledge structuring & persistence
- Real-time chat with RAG
- Web UI support
- Web search integration
- Multi-modal support (images, videos)
- Distributed search across nodes
- Knowledge visualization and deep analytics
- More file type support
We welcome contributions!
This project is licensed under the Apache License 2.0.
ModelScope · Star us · Report a bug · Discussions
Sirchmunk: Raw data to self-evolving intelligence, real-time.



