nForge

HARDENING
AGENTNET dials your bot in real audio. VoiceScore grades every call A to F. Every question your bot missed loops back into project knowledge automatically.
[Figure: an AGENTNET caller dials your bot over real audio; sample VoiceScore of B+ across Responsive, Efficiency, Delivery, and Goal. Real calls · graded A to F · gap loop. AGENTNET: 30+]

SIGNAL_FLOW

Architecture

Inputs

PRODUCT_DESC
BOT_ENDPOINT
AGENT_SELECT

Engine

AUTO_SUITE_GEN
AGENTNET_CALL
VOICE_METRICS
LLM_AS_JUDGE

Outputs

VOICESCORE_AF
KNOWLEDGE_GAPS
TRANSCRIPTS_WAV
RUN_HISTORY

SYSTEM_CAPABILITIES

Voice agents pass demos. They break in production.

01

AGENTNET

30+ simulated callers: Ananya the Hinglish learner, James the skeptical buyer, the angry complainer, the executive in a hurry. Each is a structured identity with traits, voice mappings, and stable demographic profiles for ID and data tests.
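As a rough illustration of what a "structured identity" could look like, here is a minimal sketch. The `Persona` class, its field names, the voice identifier, and the demographic values are all hypothetical placeholders, not nForge's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Persona:
    """Hypothetical shape for one simulated caller (not nForge's real schema)."""
    name: str
    traits: Tuple[str, ...]
    voice: str  # assumed: an identifier that maps the persona to a TTS voice
    demographics: Dict[str, str] = field(default_factory=dict)


# Placeholder values. The point is that the profile is fixed, so an
# identity-verification test can assert against the same data on every run.
ananya = Persona(
    name="Ananya",
    traits=("hinglish", "eager beginner", "code-switches mid-sentence"),
    voice="example-voice-id",
    demographics={"date_of_birth": "1998-04-12", "city": "Pune"},
)
```

Because the demographic profile never changes between runs, a failed ID check points at the bot rather than at the test data.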

02

Auto-Suite Generation

Describe your bot. nForge auto-classifies the pattern (sales closer, intake, identity verifier, support...) and generates a tailored test plan.

03

3-Layer Test Catalog

Agent × Scenario × Test Case. Separate WHO calls from WHERE they call from and WHAT you're asserting. One persona × ten scenarios = ten distinct tests with no duplicated work.
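The three-layer split is essentially a Cartesian product. The sketch below assumes each layer is just a list of names; the `expand` helper and the scenario and test-case labels are hypothetical, chosen only to show how one persona and ten scenarios yield ten tests.

```python
from itertools import product

# Each layer is defined once and independently of the others.
agents = ["James, Skeptical Buyer"]                   # WHO calls
scenarios = [f"scenario_{i}" for i in range(1, 11)]   # WHERE they call from (placeholders)
test_cases = ["objection_handling"]                   # WHAT is asserted (placeholder)


def expand(agents, scenarios, test_cases):
    """Combine the three layers into concrete test definitions."""
    return [
        {"agent": a, "scenario": s, "test_case": t}
        for a, s, t in product(agents, scenarios, test_cases)
    ]


suite = expand(agents, scenarios, test_cases)
print(len(suite))  # 10
```

Adding a second persona to `agents` would double the suite without touching any scenario or test-case definition.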

04

2-Layer Evaluation

Layer one: deterministic voice metrics (responsiveness, conciseness, efficiency, delivery), measured from audio and timestamps. Layer two: LLM-as-judge for semantic quality. Both feed a composite VoiceScore A to F per call and per suite.
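One way two such layers could be combined into a letter grade is sketched below. The equal weighting, the 0-to-1 score scale, and the grade cut-offs are assumptions made for illustration; the page does not state how nForge actually computes VoiceScore.

```python
from statistics import mean

# Assumed cut-offs on a 0-to-1 scale; not taken from nForge.
GRADE_FLOORS = [(0.9, "A"), (0.8, "B"), (0.7, "C"), (0.6, "D")]


def voice_score(metrics, judge, judge_weight=0.5):
    """Blend deterministic metrics with an LLM-judge score and map to a letter.

    metrics: dict of metric name -> score in [0, 1] (layer one)
    judge:   semantic-quality score in [0, 1] (layer two)
    """
    deterministic = mean(metrics.values())
    composite = (1 - judge_weight) * deterministic + judge_weight * judge
    for floor, grade in GRADE_FLOORS:
        if composite >= floor:
            return grade
    return "F"


call_metrics = {
    "responsiveness": 0.9,
    "conciseness": 0.8,
    "efficiency": 0.7,
    "delivery": 0.8,
}
grade = voice_score(call_metrics, judge=0.9)
```

A per-suite grade could be derived the same way from the averaged composites of its calls, though that aggregation is likewise an assumption.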

05

Gap Analysis Loop

Every question your bot couldn't answer accumulates as a knowledge_gap. Resolve it once and the next run gets smarter. The bot improves between releases without rewriting the prompt by hand.
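The loop described above can be pictured as a small log of unanswered questions that turns into knowledge once resolved. This is a minimal sketch under stated assumptions: the `GapLog` class, its method names, and the whitespace-and-case normalization are hypothetical, and only the `knowledge_gap` concept comes from the page.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class KnowledgeGap:
    question: str
    answer: Optional[str] = None  # None until someone resolves the gap


class GapLog:
    """Hypothetical store of questions the bot could not answer."""

    def __init__(self) -> None:
        self._gaps: Dict[str, KnowledgeGap] = {}

    @staticmethod
    def _key(question: str) -> str:
        # Assumed normalization so repeated questions collapse into one gap.
        return " ".join(question.lower().split())

    def record(self, question: str) -> None:
        self._gaps.setdefault(self._key(question), KnowledgeGap(question))

    def resolve(self, question: str, answer: str) -> None:
        self._gaps[self._key(question)].answer = answer

    def open_gaps(self) -> List[str]:
        return [g.question for g in self._gaps.values() if g.answer is None]

    def knowledge(self) -> Dict[str, str]:
        """Resolved gaps, ready to be supplied to the bot on the next run."""
        return {g.question: g.answer for g in self._gaps.values() if g.answer is not None}


log = GapLog()
log.record("Do you ship to Canada?")
log.record("do you ship to canada?  ")  # same question, recorded once
log.resolve("Do you ship to Canada?", "Yes, within 5 business days.")  # placeholder answer
```

After the single `resolve` call, `open_gaps()` is empty and `knowledge()` holds one entry that the next run can draw on.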

06

Connect to Anything

WebSocket, Twilio Media Streams, Daily, OpenAI Realtime, Pipecat Protobuf, raw PCM. 15+ provider integrations including Anthropic, OpenAI, Cartesia, ElevenLabs, Deepgram. Plus any OpenAI-compatible self-hosted model.

BEHAVIORAL_PERSONAS

Test Personas

A

Ananya, Hinglish Learner

AGENTNET · MULTILINGUAL · L2

Eager beginner who code-switches Hindi-English mid-sentence. Tests STT robustness on accented input, agent patience, and graceful repair behavior when the model mishears.

J

James, Skeptical Buyer

AGENTNET · SALES · OBJECTIONS

Pushes back on every claim, asks for receipts, threatens to leave. Tests objection handling, pricing recovery, and whether your bot can hold a sales conversation against resistance.