Can I use my own LLM?

No. Voxworks is a calibrated system. To maintain the best experience for our users, we preselect the best available AI models for our overall stack.

How do you handle "Barge-in"?

We use an aggressive VAD (Voice Activity Detection) model on the incoming audio stream. If speech is detected while the AI is speaking, we send a "Clear Buffer" signal to cut audio instantly.
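As a rough illustration of that flow, here is a minimal Python sketch. The class and method names (PlaybackBuffer, BargeInController, on_vad_frame) are hypothetical and not part of the Voxworks API; the sketch only assumes that the VAD emits one speech/no-speech decision per audio frame.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlaybackBuffer:
    """Holds TTS audio chunks queued for the caller (hypothetical helper)."""
    chunks: List[bytes] = field(default_factory=list)

    def clear(self) -> int:
        """Drop all queued audio and return how many chunks were discarded."""
        dropped = len(self.chunks)
        self.chunks.clear()
        return dropped


class BargeInController:
    """Clears queued audio when the caller speaks over the AI."""

    def __init__(self, buffer: PlaybackBuffer) -> None:
        self.buffer = buffer
        self.ai_speaking = False

    def on_vad_frame(self, speech_detected: bool) -> bool:
        """Feed one VAD decision; return True if it triggered a barge-in."""
        if speech_detected and self.ai_speaking:
            self.buffer.clear()       # the "Clear Buffer" step
            self.ai_speaking = False  # hand the turn back to the caller
            return True
        return False
```

Speech detected while the AI is silent is ordinary caller input, so it does not trigger a clear.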

Does it support function calling?

Yes. You can define "Tools" (e.g., checkCalendar, lookUpPricing). The Engine will pause, execute your API call, and use the result to formulate the answer.
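A minimal Python sketch of the idea, assuming tools are plain functions looked up by name. The registry, the decorator, and the return shapes are hypothetical illustrations rather than the Voxworks tool schema, and the calendar slots and prices are placeholder values.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping tool names to the functions that implement them.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function as a callable tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def checkCalendar(date: str) -> dict:
    # Stand-in for a real calendar API call; the slots are placeholders.
    return {"date": date, "available": ["09:00", "14:30"]}


@tool
def lookUpPricing(plan: str) -> dict:
    # Stand-in for a real pricing lookup; the prices are placeholders.
    prices = {"starter": 49, "pro": 199}
    return {"plan": plan, "price": prices.get(plan)}


def execute_tool_call(name: str, arguments: dict) -> dict:
    """Run the requested tool and wrap its result for the model to use."""
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    return {"result": TOOLS[name](**arguments)}
```

Returning an error object for an unknown tool, instead of raising, lets the conversation continue with a fallback answer.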

Voxworks is more expensive than other standard AI voice agents because it is a higher quality voice experience. We charge per minute of conversation. There are no setup fees. See our Pricing page for volume discounts.

Do you support other languages?

The voice engine natively supports any language. However, please contact support if you want to use a language other than English.

Voxworks Voice Engine

Australia's fastest AI voice engine for phone calls.

Built for ultra-low latency calling on Australian networks. Voxworks delivers natural conversational turn handling at speed, so your calls feel human, stay on-script, and stay safe.

Core Capabilities

Engineered for the millisecond.

Most LLMs are too slow for phone calls. We optimized the entire stack (VAD, STT, LLM, and TTS) to shave off every millisecond of delay.

Sub-800ms Response

Optimised for real-time calling. We host inference at the edge in Sydney to minimize network travel time for local calls.

True Barge-In

Users can interrupt the AI mid-sentence. Our Voice Activity Detection (VAD) stops audio playback instantly, just like a human would.

Telco Noise Robustness

Tuned for 8kHz phone audio. It filters out background noise, static, and poor reception to understand intent clearly.

Safety Guardrails

Define strict guardrails that control what the AI says about your products or competitors, rather than letting the AI improvise.

Smart Endpointing

The engine distinguishes between a "pause for thought" and "end of turn," reducing those awkward moments where the AI cuts you off.
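The simplest way to picture endpointing is a silence timeout: the turn ends only after the caller has been quiet for long enough. The Python sketch below shows that baseline. It is not a description of how the engine actually decides, and the Endpointer class and the 20ms frame size are assumptions made for illustration.

```python
class Endpointer:
    """Silence-timeout endpointing over per-frame VAD decisions (a sketch)."""

    def __init__(self, timeout_ms: int, frame_ms: int = 20) -> None:
        self.timeout_ms = timeout_ms  # silence needed to end the turn
        self.frame_ms = frame_ms      # assumed duration of one audio frame
        self.silence_ms = 0
        self.heard_speech = False

    def push(self, is_speech: bool) -> bool:
        """Feed one VAD decision; return True once the turn has ended."""
        if is_speech:
            self.heard_speech = True
            self.silence_ms = 0       # any speech resets the silence clock
            return False
        if not self.heard_speech:
            return False              # silence before any speech is not a turn
        self.silence_ms += self.frame_ms
        return self.silence_ms >= self.timeout_ms
```

A short timeout (for example 400ms) gives snappy turn-taking, while a long one (for example 1200ms) tolerates a pause for thought at the cost of slower replies.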

Why Developers Choose Voxworks

Latency is the feature.

In voice, speed = intelligence. Slow responses make users hang up. Our infrastructure is peered directly with major AU carriers to ensure the fastest possible packet transit.

[Diagram: Voxworks Voice Engine (Sydney). I/O Channels: Text, HTTP / Webhooks, Audio Out, Contact. Pipeline: Gateway, STT, LLM, TTS. Integrations: Custom Business Logic, Calendar, CRM.]

Pre-configured Agents

Don't start from zero. Clone these battle-tested JSON configurations.

The "Listener" Config

High endpointing timeout (1200ms) for consultative calls where users speak in long paragraphs.

Therapy, Support, Advice

The "Rapid Fire" Config

Aggressive turn-taking (400ms endpointing) and concise answers for fast-paced qualifying calls.

Leads, Qualifying, Dispatch

The "Secure" Config

PII scrubbing enabled, strict state enforcement, and zero data retention mode.

Finance, Health, Enterprise
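To give a feel for what cloning a preset might look like, here is a small Python sketch that copies a preset and emits JSON. Every key name here (endpointing_ms, concise_answers, pii_scrubbing, and so on) is a hypothetical stand-in, not the real Voxworks configuration schema; only the 1200ms and 400ms endpointing values and the feature descriptions come from the presets above.

```python
import json

# Hypothetical preset shapes; key names are illustrative assumptions.
PRESETS = {
    "listener": {"endpointing_ms": 1200},
    "rapid_fire": {"endpointing_ms": 400, "concise_answers": True},
    "secure": {
        "pii_scrubbing": True,
        "strict_state_enforcement": True,
        "zero_data_retention": True,
    },
}


def clone_preset(name: str, **overrides) -> str:
    """Copy a preset, apply overrides, and return it as a JSON string."""
    config = {**PRESETS[name], **overrides}  # the preset itself is not mutated
    return json.dumps(config, indent=2, sort_keys=True)
```

For example, clone_preset("listener", endpointing_ms=1500) produces a copy with a longer timeout and leaves the original preset unchanged.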

Hear the Engine

Listen to raw audio output demonstrating edge-case handling.

Interruption Handling

User: "Actually, wait, stop." -> AI stops instantly and asks for clarification.

Fast Turn-Taking

A rapid back-and-forth conversation checking name, address, and date without pauses.

Aussie Slang Test

User uses terms like "Arvo", "Rego", and "Ute" -> AI understands and responds appropriately.

Architecture

How the Voice Engine pipeline works.

Audio Ingest (WebSocket)

Your telephony provider sends µ-law 8kHz audio via WebSocket. We typically buffer and process it in under 50ms.
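For readers unfamiliar with the format, µ-law (G.711) packs each audio sample into one byte, so the first step on ingest is expanding it back to 16-bit linear PCM. The Python sketch below implements the standard G.711 expansion; the function names and the 20ms frame size are our own assumptions, not part of the Voxworks API.

```python
from typing import List


def ulaw_to_pcm16(byte: int) -> int:
    """Expand one G.711 µ-law byte into a signed 16-bit PCM sample."""
    u = ~byte & 0xFF               # µ-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample


# Precompute all 256 values so decoding a frame is a simple table lookup.
ULAW_TABLE = [ulaw_to_pcm16(b) for b in range(256)]


def decode_frame(payload: bytes) -> List[int]:
    """Decode one µ-law payload into a list of PCM samples."""
    return [ULAW_TABLE[b] for b in payload]
```

At 8kHz with one byte per sample, an assumed 20ms frame is 160 bytes.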

VAD & Transcription

Our Voice Activity Detector flags speech vs noise. The transcription model converts speech to text, optimized for Australian accents.
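To make "speech vs noise" concrete, here is a deliberately simple Python stand-in: an energy threshold over one frame of PCM samples. Production VADs are usually trained models, so treat this as a baseline swapped in for illustration, not the Voxworks detector. The function names and the threshold value are assumptions.

```python
import math
from typing import Sequence


def rms(samples: Sequence[int]) -> float:
    """Root-mean-square level of one frame of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_speech(samples: Sequence[int], threshold: float = 500.0) -> bool:
    """Flag a frame as speech when its energy reaches the threshold.

    The default threshold is an illustrative value for 16-bit audio and
    would need tuning against real line noise.
    """
    return rms(samples) >= threshold
```

An energy gate like this is cheap but is fooled by loud background noise, which is one reason trained detectors are preferred on phone audio.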

Reasoning & State Check

The LLM determines the next action based on your defined State Graph. It checks guardrails before generating a single token.
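One way to picture this step is a lookup against an explicit graph of allowed transitions, with the guardrail check running first. In the Python sketch below, the state names, the blocked topic, and the decide function are all hypothetical examples, not the Voxworks State Graph format.

```python
from typing import Dict, Set, Tuple

# Hypothetical state graph: each state maps to the states it may move to.
STATE_GRAPH: Dict[str, Set[str]] = {
    "greeting": {"collect_details"},
    "collect_details": {"collect_details", "confirm"},
    "confirm": {"done"},
    "done": set(),
}

# Hypothetical guardrail: topics the agent must not discuss.
BLOCKED_TOPICS = {"competitor pricing"}


def decide(current: str, proposed: str, topic: str) -> Tuple[str, str]:
    """Return (next_state, action) before any reply text is generated."""
    if topic in BLOCKED_TOPICS:
        return current, "deflect"       # guardrail wins over everything else
    if proposed not in STATE_GRAPH[current]:
        return current, "reprompt"      # transition is not an edge in the graph
    return proposed, "respond"
```

Because the check runs before generation, a blocked topic or an illegal jump never reaches the text-producing step.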

Streaming Synthesis

The TTS engine begins streaming audio bytes back to the caller before the full sentence is even generated.
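The Python sketch below shows one way such streaming can work: flush text to the synthesiser at clause boundaries instead of waiting for the full sentence. The helper names are hypothetical, and fake_tts simply encodes text as bytes in place of a real TTS call.

```python
from typing import Iterable, Iterator

CLAUSE_BREAKS = (",", ";", ":", ".", "?", "!")


def clauses(tokens: Iterable[str]) -> Iterator[str]:
    """Group streamed LLM tokens into clauses, yielding each as it completes."""
    pending = ""
    for token in tokens:
        pending += token
        if pending.rstrip().endswith(CLAUSE_BREAKS):
            yield pending.strip()
            pending = ""
    if pending.strip():
        yield pending.strip()  # flush whatever trails the last break


def fake_tts(text: str) -> bytes:
    """Stand-in for a real synthesiser call; returns the text as bytes."""
    return text.encode("utf-8")


def stream_audio(tokens: Iterable[str]) -> Iterator[bytes]:
    """Yield audio for each clause as soon as that clause is complete."""
    for clause in clauses(tokens):
        yield fake_tts(clause)
```

Because both functions are generators, the first audio chunk is produced after the first clause, while later tokens are still arriving.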

Connectivity

Works with your stack.

Connect via standard protocols. No proprietary hardware required.

Twilio Media Streams

One-click XML configuration to fork audio from Twilio Programmable Voice.
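For reference, the XML in question is TwiML. The Python sketch below builds a minimal document: Twilio's Start verb with a nested Stream forks call audio one way, while Connect with a nested Stream opens a bidirectional stream. The WebSocket URL in the usage note is a placeholder, not a real Voxworks endpoint, and the helper function is our own.

```python
import xml.etree.ElementTree as ET


def media_stream_twiml(ws_url: str, bidirectional: bool = True) -> str:
    """Build TwiML that streams call audio to the given WebSocket URL."""
    response = ET.Element("Response")
    # <Connect><Stream> is two-way; <Start><Stream> is a one-way fork.
    verb = ET.SubElement(response, "Connect" if bidirectional else "Start")
    ET.SubElement(verb, "Stream", {"url": ws_url})
    return ET.tostring(response, encoding="unicode")
```

Calling media_stream_twiml("wss://voice.example.com/stream") returns a Response document that can be served from the voice webhook for a Twilio number.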

Telnyx / SignalWire

Native support for VXML and WebSocket audio forks.

SIP Trunking

Direct SIP-in capabilities for high-volume enterprise diallers.

Voxworks API

Control the call logic from your own backend code in real-time.

Infrastructure

Enterprise uptime & security.

Check our Status Page for real-time latency metrics across all Australian capital cities.

99.95% Uptime.
Servers located in Sydney (AWS ap-southeast-2) for minimum latency.
ISO27001 aligned infrastructure.
Ephemeral processing mode (no audio stored to disk).

Technical Questions

On Australian networks, we aim for <800ms "Voice-to-Voice" latency. This includes network transit, transcription, LLM token generation, and TTS synthesis.
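The budget is easiest to reason about as a sum of stages. The Python sketch below adds up a per-stage breakdown and checks it against the 800ms target. The stage names mirror the list above, but the individual millisecond figures are illustrative placeholders, not measured Voxworks numbers.

```python
from typing import Dict

BUDGET_MS = 800  # the voice-to-voice target stated above


def voice_to_voice_ms(stages: Dict[str, int]) -> int:
    """Total latency is the sum of every stage on the critical path."""
    return sum(stages.values())


def within_budget(stages: Dict[str, int], budget_ms: int = BUDGET_MS) -> bool:
    """True when the summed stage latencies stay under the budget."""
    return voice_to_voice_ms(stages) < budget_ms


# Illustrative placeholder figures only, chosen to show the arithmetic.
example = {
    "network_transit": 60,
    "transcription": 150,
    "llm_first_token": 300,
    "tts_first_audio": 150,
}
```

With these placeholder values the total is 660ms, which leaves 140ms of headroom. Raising any single stage by more than that would break the target.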