nForge
SIGNAL_FLOW
Architecture
Inputs → Engine → Outputs
SYSTEM_CAPABILITIES
Voice agents pass demos. They break in production.
01
AGENTNET
30+ simulated callers: Ananya the Hinglish learner, James the skeptical buyer, the angry complainer, the executive in a hurry. Each is a structured identity with traits, voice mappings, and stable demographic profiles for ID and data tests.
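A structured identity of this kind might be modeled as a small record. The sketch below is illustrative only: the class name, field names, voice key, and demographic values are assumptions made for the example, not nForge's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Persona:
    """Illustrative caller identity; every field name here is hypothetical."""
    name: str
    archetype: str
    traits: list[str]
    voice_id: str  # assumed key into a TTS voice mapping
    # Fixed values, so ID and data checks see the same profile on every run.
    demographics: dict[str, str] = field(default_factory=dict)


ananya = Persona(
    name="Ananya",
    archetype="Hinglish learner",
    traits=["eager", "code-switches Hindi-English"],
    voice_id="voice-hi-en-01",  # placeholder, not a real voice identifier
    demographics={"date_of_birth": "1998-04-12", "city": "Pune"},  # made-up values
)
```

Keeping the demographics constant is what would let an identity-verification test assert on exact values from one run to the next.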
02
Auto-Suite Generation
Describe your bot. nForge auto-classifies the pattern (sales closer, intake, identity verifier, support...) and generates a tailored test plan.
03
3-Layer Test Catalog
Agent × Scenario × Test Case. Separate WHO calls from WHERE they call from and WHAT you're asserting. One persona × ten scenarios = ten distinct tests with no duplicated work.
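The three layers combine as a Cartesian product. A minimal sketch, assuming each layer is simply a list of names (the scenario and test-case names below are placeholders, not entries from nForge's catalog):

```python
from itertools import product

agents = ["Ananya"]                                   # WHO calls
scenarios = [f"scenario_{i}" for i in range(1, 11)]   # WHERE they call from
test_cases = ["greets_caller"]                        # WHAT is asserted

# Each combination is one concrete test; no layer is written out twice.
catalog = [
    {"agent": a, "scenario": s, "test_case": t}
    for a, s, t in product(agents, scenarios, test_cases)
]

print(len(catalog))  # 10
```

Adding a second persona to `agents` would double the catalog without touching the scenario or test-case definitions.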
04
2-Layer Evaluation
Layer one is deterministic voice metrics (responsiveness, conciseness, efficiency, delivery), measured from audio and timestamps. Layer two is an LLM-as-judge for semantic quality. Both feed a composite VoiceScore, graded A to F, per call and per suite.
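One way such a composite could be computed is sketched below. The page does not state the formula, so everything here is an assumption: all scores are taken to be normalized to the range 0 to 1, the deterministic metrics are averaged with equal weight, the judge score is blended in at 50%, and the grade cutoffs are invented for illustration.

```python
from statistics import mean

# Hypothetical cutoffs; the real grade boundaries are not given in the source.
GRADE_CUTOFFS = ((0.9, "A"), (0.8, "B"), (0.7, "C"), (0.6, "D"))


def voice_score(metrics: dict[str, float], judge: float,
                judge_weight: float = 0.5) -> tuple[float, str]:
    """Blend deterministic metrics with an LLM-judge score (all in [0, 1])."""
    deterministic = mean(metrics.values())
    composite = (1 - judge_weight) * deterministic + judge_weight * judge
    for cutoff, grade in GRADE_CUTOFFS:
        if composite >= cutoff:
            return composite, grade
    return composite, "F"


call_metrics = {
    "responsiveness": 0.8,
    "conciseness": 0.8,
    "efficiency": 0.8,
    "delivery": 0.8,
}
score, grade = voice_score(call_metrics, judge=0.9)
```

With these example inputs the deterministic layer averages 0.8, the blend lands near 0.85, and the call falls in the "B" band under the assumed cutoffs. A per-suite score could be produced the same way by averaging per-call composites.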
05
Gap Analysis Loop
Every question your bot couldn't answer accumulates as a knowledge_gap. Resolve it once and the next run gets smarter. The bot improves between releases without rewriting the prompt by hand.
06
Connect to Anything
Connects over WebSocket, Twilio Media Streams, Daily, OpenAI Realtime, Pipecat Protobuf, and raw PCM. 15+ provider integrations, including Anthropic, OpenAI, Cartesia, ElevenLabs, and Deepgram, plus any OpenAI-compatible self-hosted model.
BEHAVIORAL_PERSONAS
Test Personas
Ananya, Hinglish Learner
Eager beginner who code-switches Hindi-English mid-sentence. Tests STT robustness on accented input, agent patience, and graceful repair behavior when the model mishears.
James, Skeptical Buyer
Pushes back on every claim, asks for receipts, threatens to leave. Tests objection handling, pricing recovery, and whether your bot can hold a sales conversation against resistance.