Ambient voice intelligence
for AI agents.
Not just transcription. Percept builds a knowledge graph from your conversations — entity extraction, speaker identification, relationship mapping, semantic search — so your agent actually understands what's being said.
Open source · Local-first · Works with 🦞 OpenClaw or any agent framework
Speak. It happens.
Seven action types by voice — email, text, reminder, search, calendar, note, and order.
Context Intelligence Layer
Transcription is a commodity. What happens after is what matters. The CIL transforms raw speech into structured, actionable context — so "email the client" actually works because your agent knows who the client is.
- Two-pass entity extraction — fast regex + LLM semantic pass
- Relationship graph with weighted edges and linear decay
- 5-tier entity resolution — exact, fuzzy, contextual, recency, semantic
- NVIDIA NIM embeddings + LanceDB vector search
- FTS5 full-text search with porter stemming
- Context packets — single JSON blob with everything an agent needs
- SQLite persistence — single file, zero deps, WAL mode
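The context packet format isn't published yet, but the idea is simple: one JSON blob per event that any agent can slice with standard tools. A rough sketch — every field name below is an assumption, not the real schema:

```shell
# Hypothetical context packet -- field names are illustrative, not the real schema.
cat > packet.json <<'EOF'
{
  "type": "context_packet",
  "transcript": "email the client about the proposal",
  "intent": "email",
  "entities": [
    {"name": "Acme Corp", "kind": "org", "role": "client"}
  ],
  "speaker": "sarah"
}
EOF

# Any agent that reads JSON can pull out what it needs with jq.
jq -r '.intent' packet.json            # -> email
jq -r '.entities[0].name' packet.json  # -> Acme Corp
```

The point of the single-blob design: the consumer never has to query the graph itself — the packet already carries the resolved entities.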
Everything that's working today.
Shipped in 5 days. Dogfooded daily. Not a roadmap — a product.
Entity Extraction
Two-pass pipeline extracts people, orgs, locations, projects, and topics. Maps relationships automatically from co-occurrence.
Knowledge Graph
Relationship graph with weighted edges and linear decay. 5-tier entity resolution — exact, fuzzy, contextual, recency, semantic. Your agent knows who "the client" is.
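Linear decay just means an edge's weight drops by a fixed amount per day since the last co-occurrence, floored at zero. A toy sketch — the rate and floor here are made-up numbers, not Percept's internal values:

```shell
# Edge weight after N days of linear decay, floored at zero.
# weight(t) = max(0, w0 - rate * days); the rate is illustrative.
decayed_weight() {
  awk -v w0="$1" -v rate="$2" -v days="$3" \
    'BEGIN { w = w0 - rate * days; print (w > 0 ? w : 0) }'
}

decayed_weight 1.0 0.01 30   # -> 0.7
decayed_weight 1.0 0.01 200  # -> 0 (edge has fully decayed)
```

Linear (rather than exponential) decay gives edges a hard expiry, which pairs naturally with the TTL purge windows below.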
Speaker ID
Knows who's talking. Resolves contacts, builds per-speaker analytics. "That was Sarah" teaches it new voices.
Semantic Search
NVIDIA NIM embeddings + LanceDB vectors, with FTS5 keyword fallback. Search what anyone said, anytime.
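The keyword fallback leans on SQLite's FTS5 porter stemmer, so morphological variants match each other — "deadlines" finds "deadline". A standalone sketch with the sqlite3 CLI (the table name is illustrative):

```shell
rm -f demo.db
sqlite3 demo.db <<'EOF'
CREATE VIRTUAL TABLE utterances USING fts5(text, tokenize='porter');
INSERT INTO utterances(text) VALUES ('Sarah asked about the proposal deadline');
-- 'deadlines' stems to the same token as 'deadline', so this matches.
SELECT text FROM utterances WHERE utterances MATCH 'deadlines';
EOF
```

Stemming is what makes keyword search a credible fallback when the embedding service is unavailable.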
Dashboard
Full management UI — conversation history, speaker management, contacts, settings, analytics, search, data export and purge.
Three-Tier Transcriber
Local (faster-whisper) → NVIDIA Riva → Cloud. Privacy by default. Your audio never leaves your machine unless you choose.
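The tier order is ordinary short-circuit logic: try local first, escalate only on failure. A sketch with stub functions — these stand in for the real faster-whisper, Riva, and cloud backends, which aren't shown here:

```shell
# Placeholder tiers -- stand-ins for faster-whisper, NVIDIA Riva, and a cloud API.
tier_local() { return 1; }                 # simulate the local tier failing
tier_riva()  { echo "riva: hello world"; } # next tier succeeds
tier_cloud() { echo "cloud: hello world"; }

# Each tier only runs if the previous one failed.
transcribe() {
  tier_local "$1" || tier_riva "$1" || tier_cloud "$1"
}

transcribe clip.wav   # -> riva: hello world
```

Because escalation only happens on failure, audio reaches the cloud tier only when both local options are down — that's the "privacy by default" claim in mechanical terms.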
TTL Auto-Purge
Configurable retention — utterances 30d, summaries 90d, relationships 180d. Your data, your rules. Export anytime.
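Those retention windows might be expressed in a config file along these lines — purely illustrative, since the real key names aren't published yet:

```shell
# Hypothetical retention config -- section and key names are assumptions.
cat > percept.toml <<'EOF'
[retention]
utterances_days    = 30
summaries_days     = 90
relationships_days = 180
EOF
```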
Works with what you wear.
Omi pendant for ambient intelligence. Apple Watch for push-to-talk. More coming.
Omi Pendant
Apple Watch
Any Webhook Source
Percept Protocol
A framework-agnostic JSON schema for voice → intent → action handoff. Six event types. Three transports. Unix-composable.
Your agent framework doesn't matter. LangChain, CrewAI, AutoGen, 🦞 OpenClaw, or a bash script — if it reads JSON, it works with Percept.
# Pipe voice events to any consumer
percept listen \
| jq 'select(.type == "intent")' \
| my-agent --stdin
# Or use webhooks
percept serve \
--webhook https://my-agent.com/voice
# Or WebSocket
percept serve --ws
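"A bash script" is meant literally: a consumer only has to read newline-delimited JSON from stdin. A minimal sketch — the event fields (`type`, `action`) are assumptions about the protocol, not its published schema:

```shell
# Minimal Percept consumer: one JSON event per line on stdin.
# Field names ('type', 'action') are assumed, not the published schema.
cat > consumer.sh <<'EOF'
#!/bin/sh
while IFS= read -r event; do
  type=$(printf '%s' "$event" | jq -r '.type')
  if [ "$type" = "intent" ]; then
    action=$(printf '%s' "$event" | jq -r '.action')
    echo "handling intent: $action"
  fi
done
EOF
chmod +x consumer.sh

# Wire it up: percept listen | ./consumer.sh
```

No SDK, no client library — the transport is the contract.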
Built for 🦞 OpenClaw
Percept is designed as a native 🦞 OpenClaw skill — your agent gets ears, context intelligence, and ambient awareness out of the box.
Every voice command flows through your agent's full context: memory, tools, integrations, relationship graph. "Email the client" just works because the system knows.
Works standalone too. Any framework that consumes JSON can integrate via the Percept Protocol.
Self-host in 5 minutes.
MIT license. Your audio stays on your machine. Single SQLite file — nothing to configure.
# Install
pip install getpercept
# Start (receiver + dashboard + CIL)
percept serve
# Dashboard at :8960 · Receiver at :8900
# Your agent can hear now.
Self-host free. Forever.
Open source is the product, not a teaser. Run it on your hardware, no strings attached.
Don't want to self-host? Join the waitlist — we'll notify you when the managed cloud is ready.
Get early access.
Be first to know when the hosted API launches and the repo goes public.