nVoice

BUILTOpen Source

A 3-second pause feels like an eternity to a caller. They start talking again. The agent gets confused. The conversation falls apart.

nVoice is the voice infrastructure that doesn't pause. Sub-1.5s end-to-end latency from caller speech to agent response. STT, LLM, and TTS in one pipeline. Multilingual including Hinglish. Built on Pipecat and LiveKit. Open source.

Let's Talk View Source

Latency: <1.5s

SIGNAL_FLOW

Architecture

Inputs

CALLER_AUDIO

nMEMORY_CONTEXT

nPULSE_SIGNALS

→

Engine

STT_PROCESSING

LLM_REASONING

TTS_SYNTHESIS

→

Outputs

VOICE_OUTPUT

TRANSCRIPT

LATENCY_METRICS

SYSTEM_CAPABILITIES

Full-stack voice infrastructure.

Sub-1.5s Latency

End-to-end from caller speech to agent response in under 1.5 seconds.

Multilingual

Supports English, Hindi, Hinglish, and extensible to other languages.

Pulse-Modulated Output

TTS output adapts pace, tone, and emotion based on real-time nPulse signals.

STT Integration

Pluggable speech-to-text. Deepgram, Whisper, or custom models.

TTS Integration

Pluggable text-to-speech. ElevenLabs, Resemble AI, or custom voices.

Transport Layer

Built on Pipecat and LiveKit for reliable, low-latency WebRTC streaming.

Open Source

nVoice is fully open source. Inspect every line of the voice pipeline. Contribute, fork, or self-host.

View on GitHub →

Works with

nPulse

Emotion signals modulate voice output

nMemory

Conversation context feeds the LLM

nGuard

Production readiness audit for the voice pipeline