nVoice

BUILT | Open Source

A 3-second pause feels like an eternity to a caller. They start talking again. The agent gets confused. The conversation falls apart.

nVoice is the voice infrastructure that doesn't pause. Sub-1.5s end-to-end latency from caller speech to agent response. STT, LLM, and TTS in one pipeline. Multilingual, including Hinglish. Built on Pipecat and LiveKit. Open source.

Latency: < 1.5s

SIGNAL_FLOW

Architecture

Inputs

CALLER_AUDIO
nMEMORY_CONTEXT
nPULSE_SIGNALS

Engine

STT_PROCESSING
LLM_REASONING
TTS_SYNTHESIS

Outputs

VOICE_OUTPUT
TRANSCRIPT
LATENCY_METRICS
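
The signal flow above can be sketched as a single turn through three stages. This is a minimal illustration in plain Python, not nVoice's actual code and not the Pipecat or LiveKit API: `run_turn`, `TurnResult`, and the three stub stages are hypothetical names, and the sketch assumes the transcript output is the caller's recognized speech.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class TurnResult:
    """The three outputs listed above (hypothetical container)."""
    voice_output: bytes
    transcript: str
    latency_metrics: Dict[str, float] = field(default_factory=dict)


def run_turn(
    caller_audio: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str, str, dict], str],
    tts: Callable[[str, dict], bytes],
    memory_context: str = "",
    pulse_signals: Optional[dict] = None,
) -> TurnResult:
    """Run STT -> LLM -> TTS once and time each stage in seconds."""
    pulse_signals = pulse_signals or {}
    metrics: Dict[str, float] = {}
    start = time.perf_counter()

    t0 = time.perf_counter()
    transcript = stt(caller_audio)
    metrics["stt_s"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    reply = llm(transcript, memory_context, pulse_signals)
    metrics["llm_s"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    audio = tts(reply, pulse_signals)
    metrics["tts_s"] = time.perf_counter() - t0

    metrics["total_s"] = time.perf_counter() - start
    return TurnResult(audio, transcript, metrics)


# Stub stages so the sketch runs without any external service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str, memory: str, pulse: dict) -> str:
    return f"You said: {text}"


def fake_tts(text: str, pulse: dict) -> bytes:
    return text.encode("utf-8")


result = run_turn(b"hello", fake_stt, fake_llm, fake_tts)
```

A real deployment would stream audio between stages rather than wait for each to finish, so the per-stage timings here only show where the metrics would be collected.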

SYSTEM_CAPABILITIES

Full-stack voice infrastructure.

01

Sub-1.5s Latency

End-to-end from caller speech to agent response in under 1.5 seconds.
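
One way to reason about the 1.5-second target is as a budget split across stages. The sketch below assumes a simple additive model; the stage names and the millisecond figures are illustrative placeholders, not measurements of nVoice, and `check_budget` is a hypothetical helper.

```python
from typing import Dict, Tuple

BUDGET_MS = 1500  # the end-to-end target stated above


def check_budget(
    stage_ms: Dict[str, int], budget_ms: int = BUDGET_MS
) -> Tuple[int, int, bool]:
    """Return (total, headroom, within_budget) for sequential stages."""
    total = sum(stage_ms.values())
    return total, budget_ms - total, total < budget_ms


# Illustrative numbers only (assumed, not measured).
example = {"stt": 300, "llm": 600, "tts": 350, "transport": 150}
total, headroom, ok = check_budget(example)
```

Summing the stages gives an upper bound: when stages stream into one another, their work overlaps and the observed latency can be lower than the sum.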

02

Multilingual

Supports English, Hindi, and Hinglish, and is extensible to other languages.

03

Pulse-Modulated Output

TTS output adapts pace, tone, and emotion based on real-time nPulse signals.
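
As a rough illustration of pulse modulation, the sketch below maps a few signals to pace, tone, and emotion settings. Everything in it is assumed for the example: the signal names (`frustration`, `urgency`), their 0 to 1 range, the thresholds, and the `ProsodySettings` type are hypothetical and are not taken from nPulse or any TTS provider.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ProsodySettings:
    rate: float   # 1.0 is the normal speaking pace
    tone: str
    emotion: str


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def modulate(signals: Dict[str, float]) -> ProsodySettings:
    """Map hypothetical pulse signals in [0, 1] to prosody settings."""
    frustration = clamp(signals.get("frustration", 0.0), 0.0, 1.0)
    urgency = clamp(signals.get("urgency", 0.0), 0.0, 1.0)

    # Slow down for frustrated callers, speed up slightly for urgent ones.
    rate = clamp(1.0 - 0.2 * frustration + 0.1 * urgency, 0.8, 1.1)

    if frustration >= 0.5:
        tone, emotion = "calm", "empathetic"
    elif urgency >= 0.5:
        tone, emotion = "direct", "focused"
    else:
        tone, emotion = "neutral", "friendly"
    return ProsodySettings(round(rate, 2), tone, emotion)


settings = modulate({"frustration": 0.9})
```

Frustration is checked before urgency so that a caller who is both upset and in a hurry gets the calmer delivery; that ordering is a design choice of this sketch.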

04

STT Integration

Pluggable speech-to-text. Deepgram, Whisper, or custom models.

05

TTS Integration

Pluggable text-to-speech. ElevenLabs, Resemble AI, or custom voices.
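
Pluggable STT and TTS, as described in the two capabilities above, could be modeled as small interfaces plus a registry of providers. This is a sketch under stated assumptions: the `SpeechToText` and `TextToSpeech` protocols, the registries, and the `echo` and `bytes` providers are hypothetical, and no real Deepgram, Whisper, ElevenLabs, or Resemble AI client is called.

```python
from typing import Callable, Dict, Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes, language: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str, language: str) -> bytes: ...


# Provider name -> zero-argument factory (hypothetical registries).
STT_REGISTRY: Dict[str, Callable[[], SpeechToText]] = {}
TTS_REGISTRY: Dict[str, Callable[[], TextToSpeech]] = {}


def register(registry: dict, name: str):
    """Class decorator that adds a provider to a registry under `name`."""
    def decorator(cls):
        registry[name] = cls
        return cls
    return decorator


def build(registry: dict, name: str):
    """Instantiate a provider by name, with a readable error if unknown."""
    try:
        return registry[name]()
    except KeyError:
        known = sorted(registry)
        raise ValueError(f"unknown provider {name!r}; known: {known}") from None


# Stand-in providers; a real one would wrap a vendor SDK or custom model.
@register(STT_REGISTRY, "echo")
class EchoSTT:
    def transcribe(self, audio: bytes, language: str) -> str:
        return audio.decode("utf-8")


@register(TTS_REGISTRY, "bytes")
class BytesTTS:
    def synthesize(self, text: str, language: str) -> bytes:
        return text.encode("utf-8")


stt = build(STT_REGISTRY, "echo")
tts = build(TTS_REGISTRY, "bytes")
```

Swapping a provider then means registering a new class and changing one name in configuration, while the rest of the pipeline depends only on the two protocols.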

06

Transport Layer

Built on Pipecat and LiveKit for reliable, low-latency WebRTC streaming.

Open Source

nVoice is fully open source. Inspect every line of the voice pipeline. Contribute, fork, or self-host.
