Ears
The ears of an Intelligence Lab — a single speech-to-text model that turns spoken audio into text, so an agent can hear what's being said in chat or on a phone call. It is the input half of the lab's audio pipeline; mouth speaks, brain thinks.
Working with it
Selecting an Ears reveals its settings in the properties panel; it has no dedicated full-screen workbench.
How it appears
The same element type rendered as a definition, a circle instance, and a live workspace card.
When to use / not
When to use
- Giving an agent a voice front door — letting it hear mic input in chat or caller audio on a SIP phone line, with brain reasoning over the transcript.
- Pinning a specific STT model (provider + model_id) and its expected audio format — sample rate, encoding, channels, language hint — inside a lab.
- Routing transcription to an in-cluster realtime server (vllm-realtime / openai-realtime) via a per-element base_url while sibling transcribers keep the lab's hosted API.
- Batch-transcribing a stored recording into text with the transcribe op, separate from any live stream.
When not to use
- Synthesizing text back into speech — that is the lab's mouth element; ears only goes audio to text.
- Driving real-time mic capture or phone audio yourself — the live streaming path is handled by the audio gateway directly, not the batch transcribe op.
- Configuring an ears as a free-standing element — it must live nested inside a lab, which supplies the API endpoint and credentials.
Topology
Lives nested inside a parent element rather than standing alone — it is created in the context of its container.
Properties
- provider (string): STT provider dispatch key. `mistral` / `openai` / `google` / `custom` hit their respective hosted APIs using the parent lab's credentials. `vllm-realtime` (and the alias `openai-realtime`) route through an in-cluster server that implements the OpenAI Realtime API schema; use the element-level `base_url` to point at it (e.g. http://voxtral-2602.intelligence-production.svc.cluster.local:8007).
- model_id (string): Model identifier sent to the provider API (e.g. voxtral-mini-transcribe-realtime-2602).
- display_name (string): Human label for this transcriber.
- base_url (string): Optional per-element endpoint override. Wins over the parent lab's `base_url`. Use for routing one ears to an internal deployment (e.g. `http://voxtral-2602.intelligence-production.svc.cluster.local:8007`) while keeping the lab's hosted-API URL for siblings. Leave empty to inherit from the parent lab.
- credential_ref (string): Reference to a secret element holding the provider API key. Optional; falls back to the platform MISTRAL_API_KEY (or provider-equivalent) when unset.
- sample_rate (integer): Expected input sample rate in Hz. 16kHz recommended for Voxtral.
- encoding (string): Expected input encoding. pcm_s16le matches browser MediaRecorder and 46elks RTP.
- channels (integer): Mono (1) or stereo (2). Speech recognition typically uses mono.
- chunk_duration_ms (integer): Duration of each audio chunk sent to the provider, in ms. Smaller = more responsive, larger = less chatty.
- language_hint (string): ISO language code hint (e.g. "en", "sv"). Leave empty for auto-detect.
- target_streaming_delay_ms (integer): How long the provider waits before emitting partial transcripts. Lower = more responsive, higher = more accurate. 240ms is "fast", 2400ms is "slow/accurate".
- endpoint_silence_ms (integer): How long a silence must last to mark the end of an utterance (for turn-taking in voice agents).
- pricing (object): Cost reference (USD).
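The audio-format properties above jointly determine how many bytes each chunk carries. The following is a minimal sketch of that arithmetic, assuming uncompressed PCM input; the function name `chunk_size_bytes` and the bytes-per-sample table are illustrative and not part of the element's API.

```python
# Bytes per sample for the uncompressed PCM encodings named in this reference.
# opus is compressed, so its chunk size cannot be derived from the sample count.
BYTES_PER_SAMPLE = {"pcm_s16le": 2, "pcm_s16be": 2, "pcm_f32le": 4}


def chunk_size_bytes(sample_rate: int, encoding: str, channels: int,
                     chunk_duration_ms: int) -> int:
    """Return the size in bytes of one audio chunk of raw PCM (hypothetical helper)."""
    if encoding not in BYTES_PER_SAMPLE:
        raise ValueError(f"cannot derive a fixed chunk size for encoding {encoding!r}")
    samples_per_channel = sample_rate * chunk_duration_ms // 1000
    return samples_per_channel * channels * BYTES_PER_SAMPLE[encoding]


# Defaults from this reference: 16 kHz, pcm_s16le, mono, 480 ms chunks.
print(chunk_size_bytes(16000, "pcm_s16le", 1, 480))  # 15360
```

With the documented defaults, each chunk holds 7680 samples, or 15360 bytes. Halving chunk_duration_ms halves the chunk size and doubles the number of chunks sent.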
Capabilities
Defined for this element
- Observe
Operations
- activity (GET)
- attachments (GET)
- batch_stats (GET)
- compose (POST)
- context (GET)
- create (POST)
- delete (DELETE)
- disable (POST)
- enable (POST)
- export_bundle (GET)
- get (GET)
- import_bundle (POST)
- info (GET)
- intention (GET)
- promote (POST)
- readme (GET)
- readme_update (POST)
- remove-modifier (POST)
- restore (POST)
- schema (GET)
- source (GET)
- source_branches (GET)
- source_promote (POST)
- source_repair (POST)
- source_status (GET)
- source_validate (POST)
- stats (GET)
- test (POST)
- transcribe (POST)
- tree (GET)
- update (PATCH)
- update_meta (PATCH)
- version (GET)
Ports
Inputs
- request (request)
- info (request)
- result (event)
Composition
Validation rules
- Ears model id required
Ears (ears)
Category: intelligence | Form: atom | Symbol: Er
A speech-to-text transcriber within an Intelligence Lab
An Ears is a specific STT model (e.g. Mistral Voxtral Mini Transcribe Realtime) configured with sample rate, language hint, and streaming delay. Agents with ears can hear what’s being said — in chat via the mic, or on phone calls via the SIP audio pipeline. Select ears by ID; the runtime finds the parent lab and handles the transcription.
Guide
What It Does
An Ears represents a single speech-to-text (STT) model within a Lab (provider) — for example Mistral Voxtral Mini Transcribe Realtime. It defines the transcriber’s identity (provider + model_id), the expected audio input format (sample rate, encoding, channels, chunk duration), an optional language hint, streaming behavior, and a cost reference. When something needs to turn audio into text, it references an ears — the runtime resolves which lab the ears belongs to and uses that lab’s connection details to transcribe incoming audio.
Ears are atoms with nested residence: they have no children and live inside a Lab element, which provides the API endpoint and credentials. An ears can also carry a per-element base_url that wins over the parent lab’s base_url — useful for routing one ears to an internal deployment (e.g. an in-cluster vllm-realtime server implementing the OpenAI Realtime API schema) while siblings keep the lab’s hosted-API URL.
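The override rule for base_url can be sketched as a small resolution function. This is an illustration of the precedence described above, not the runtime's actual code; `resolve_base_url` and the dict-shaped specs are assumptions, and the hosted URL in the usage lines is a placeholder.

```python
from typing import Optional


def resolve_base_url(ears_spec: dict, lab_spec: dict) -> Optional[str]:
    """Pick the endpoint for one ears: element-level base_url wins, else the lab's."""
    ears_url = (ears_spec.get("base_url") or "").strip()
    if ears_url:
        return ears_url
    # Empty or missing on the ears: inherit from the parent lab.
    return lab_spec.get("base_url") or None


lab = {"base_url": "https://hosted.example.invalid"}  # placeholder hosted-API URL
internal = {"base_url": "http://voxtral-2602.intelligence-production.svc.cluster.local:8007"}
sibling = {"base_url": ""}

print(resolve_base_url(internal, lab))  # the in-cluster URL
print(resolve_base_url(sibling, lab))   # the lab's hosted URL
```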
Within the Lab, ears is the input (transcription) half of the audio pipeline: ears turns speech into text, while its sibling mouth turns text into speech (synthesis). Both compose with brain (the LLM) so a lab can hear (ears), think (brain), and speak (mouth). The element sets supports_streaming to true: it advertises a streaming capability and endpoint detection for turn-taking, with the real-time path handled by the audio gateway directly.
Element Definition
| Property | Value |
|---|---|
| Type | ears |
| Category | intelligence |
| Form | atom |
| Residence | nested |
| Symbol | Er / #3B82F6 (icon hearing) |
| Activity type | resource |
| Streaming | true |
| Handler | EarsHandler |
| Allowed visibility | collaborator |
| States | draft (initial) → active → error |
Properties
| Field | Type | Default | Description |
|---|---|---|---|
| provider | string (enum) | mistral | STT provider dispatch key. mistral / openai / google / custom hit hosted APIs with the parent lab’s credentials; vllm-realtime (alias openai-realtime) routes through an in-cluster OpenAI Realtime API server via base_url |
| model_id | string | - | Model identifier sent to the provider API (e.g. voxtral-mini-transcribe-realtime-2602). Max 256 chars |
| display_name | string | - | Human label for this transcriber. Max 128 chars |
| base_url | string (url) | - | Optional per-element endpoint override. Wins over the parent lab’s base_url. Leave empty to inherit. Max 512 chars |
| credential_ref | string (secret picker) | - | Reference to a secret element with the provider API key. Optional; falls back to platform MISTRAL_API_KEY (or provider-equivalent) when unset |
| sample_rate | integer (enum) | 16000 | Expected input sample rate in Hz. One of 8000 / 16000 / 22050 / 24000 / 44100 / 48000. 16kHz recommended for Voxtral |
| encoding | string (enum) | pcm_s16le | Expected input encoding. One of pcm_s16le / pcm_s16be / pcm_f32le / opus. pcm_s16le matches browser MediaRecorder and 46elks RTP |
| channels | integer (enum) | 1 | Mono (1) or stereo (2). Speech recognition typically uses mono |
| chunk_duration_ms | integer (slider) | 480 | Duration of each audio chunk sent to the provider in ms (20–1000). Smaller = more responsive, larger = less chatty |
| language_hint | string | - | ISO language code hint (e.g. en, sv). Leave empty for auto-detect. Max 8 chars |
| target_streaming_delay_ms | integer (slider) | 480 | How long the provider waits before emitting partial transcripts (0–10000). Lower = more responsive, higher = more accurate |
| endpoint_silence_ms | integer (slider) | 800 | How long a silence marks the end of an utterance for turn-taking (200–5000) |
| pricing | object | - | Cost reference (USD): input_per_mtok (cost per million input audio tokens) and/or per_minute (cost per minute of audio) |
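The enums, ranges, and length limits in the table can be checked client-side before a spec is submitted. The sketch below is one way to do that, assuming the listed values are exhaustive; `validate_ears_spec` is a hypothetical helper, not the platform's validator.

```python
from typing import List

# Allowed values, ranges, and length limits as listed in the Properties table.
ENUMS = {
    "provider": {"mistral", "openai", "google", "custom", "vllm-realtime", "openai-realtime"},
    "sample_rate": {8000, 16000, 22050, 24000, 44100, 48000},
    "encoding": {"pcm_s16le", "pcm_s16be", "pcm_f32le", "opus"},
    "channels": {1, 2},
}
RANGES = {
    "chunk_duration_ms": (20, 1000),
    "target_streaming_delay_ms": (0, 10000),
    "endpoint_silence_ms": (200, 5000),
}
MAX_LEN = {"model_id": 256, "display_name": 128, "base_url": 512, "language_hint": 8}


def validate_ears_spec(spec: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means the spec looks valid."""
    problems = []
    if not spec.get("model_id"):
        problems.append("model_id is required")
    for field, allowed in ENUMS.items():
        if field in spec and spec[field] not in allowed:
            problems.append(f"{field}={spec[field]!r} is not an allowed value")
    for field, (low, high) in RANGES.items():
        if field in spec and not low <= spec[field] <= high:
            problems.append(f"{field}={spec[field]!r} is outside {low}..{high}")
    for field, limit in MAX_LEN.items():
        value = spec.get(field)
        if isinstance(value, str) and len(value) > limit:
            problems.append(f"{field} is longer than {limit} characters")
    return problems


print(validate_ears_spec({"provider": "mistral", "sample_rate": 11025}))
```

The usage line reports two problems: the missing model_id and the unsupported sample rate.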
Ports
| Direction | Port | Schema | Required | Description |
|---|---|---|---|---|
| Input | request | TranscribeRequest | no | Audio to transcribe |
| Output | info | EarsInfo | yes | Ears metadata for the STT picker UI |
| Output | result | TranscribeResponse (event) | no | Transcription result |
Capabilities
| Capability | Description |
|---|---|
| speech-to-text | Transcribe speech to text |
| streaming | Incremental transcription over a live audio stream |
| endpoint-detection | Detect end of utterance via silence for turn-taking |
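To make the endpoint-detection idea concrete, here is a simplified sketch of silence-based end-of-utterance detection. It is purely illustrative: the real-time path is handled by the audio gateway, and the per-chunk voiced/unvoiced flags, the function name, and the counting logic are all assumptions rather than the provider's algorithm.

```python
from typing import Iterable, Optional


def find_utterance_end(voiced: Iterable[bool],
                       chunk_duration_ms: int = 480,
                       endpoint_silence_ms: int = 800) -> Optional[int]:
    """Return the index of the chunk at which the utterance is considered over.

    `voiced` holds one flag per audio chunk (True = speech detected in that chunk).
    The utterance ends once accumulated trailing silence reaches endpoint_silence_ms.
    Silence before any speech is ignored. Returns None if no endpoint is reached.
    """
    silence_ms = 0
    heard_speech = False
    for index, is_voiced in enumerate(voiced):
        if is_voiced:
            heard_speech = True
            silence_ms = 0  # speech resets the silence counter
        elif heard_speech:
            silence_ms += chunk_duration_ms
            if silence_ms >= endpoint_silence_ms:
                return index
    return None


# Two voiced chunks, then silence: 480 ms is not enough, 960 ms crosses 800 ms.
print(find_utterance_end([True, True, False, False, True]))  # 3
```

Under this model, a larger endpoint_silence_ms tolerates longer pauses inside an utterance at the cost of slower turn-taking.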
Attaches / Modifiers
- Accepts modifier: rate-limit
Error Codes
| Code | Class | Retryable | Description |
|---|---|---|---|
| EARS_UNAVAILABLE | internal | yes | Ears’ parent lab is unreachable or credentials missing |
| EARS_CREDENTIAL_MISSING | auth | no | Provider API key not set |
| EARS_AUDIO_UNSUPPORTED | validation | no | Input audio format/sample rate not supported by the provider |
| EARS_TRANSCRIPTION_FAILED | internal | yes | Provider returned an error during transcription |
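A caller can use the Retryable column to decide whether to try again. The following sketch encodes the table as a lookup and pairs it with exponential backoff; the helper names, the attempt limit, and the backoff constants are assumptions chosen for illustration.

```python
# Retryable flags copied from the Error Codes table.
RETRYABLE = {
    "EARS_UNAVAILABLE": True,
    "EARS_CREDENTIAL_MISSING": False,
    "EARS_AUDIO_UNSUPPORTED": False,
    "EARS_TRANSCRIPTION_FAILED": True,
}


def should_retry(code: str, attempt: int, max_attempts: int = 3) -> bool:
    """Retry only documented-retryable codes, and only while attempts remain."""
    return RETRYABLE.get(code, False) and attempt < max_attempts


def backoff_seconds(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff: 0.5 s, 1 s, 2 s, ... capped at `cap` (attempt starts at 1)."""
    return min(cap, base * 2 ** (attempt - 1))


print(should_retry("EARS_TRANSCRIPTION_FAILED", attempt=1))  # True
print(should_retry("EARS_AUDIO_UNSUPPORTED", attempt=1))     # False
print(backoff_seconds(3))                                    # 2.0
```

Unknown codes are treated as non-retryable here, which is a conservative choice rather than documented behavior.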
Operations
| Operation | Method + Path | Auth | Description |
|---|---|---|---|
| info | GET info | read | Get ears metadata |
| transcribe | POST transcribe | execute | Transcribe an audio blob (batch) |
| test | POST test | execute | Verify the STT connection |
info
GET info (auth read). Returns provider, model, and expected audio format — used by the STT picker UI. Output fields: provider, model_id, display_name, sample_rate, encoding, channels, language_hint, target_streaming_delay_ms.
transcribe
POST transcribe (auth execute). One-shot transcription of a complete audio buffer. For real-time use (mic capture, phone calls), prefer the streaming route handled by the audio gateway directly — this op is for stored recordings.
- Input: `audio_data_b64` (required, base64-encoded audio), `content_type` (optional MIME-type override), `language` (optional ISO language code hint).
- Output: `text`, `language`, `duration_ms`, `cost_au`.
test
POST test (auth execute). Transcribes a short built-in test clip to verify the connection works. Output fields: success, latency_ms, text, error.
Quick Start
Creating via API
An ears is created inside its parent lab element:
    POST /api/{circle}/{lab-element}/
    Content-Type: application/json

    {
      "element_type": "ears",
      "slug": "voxtral-ears",
      "name": "Voxtral Mini Transcribe",
      "spec": {
        "provider": "vllm-realtime",
        "model_id": "voxtral-mini-transcribe-realtime-2602",
        "display_name": "Voxtral Mini Transcribe",
        "base_url": "http://voxtral-2602.intelligence-production.svc.cluster.local:8007",
        "sample_rate": 16000,
        "encoding": "pcm_s16le",
        "channels": 1,
        "language_hint": "en"
      }
    }
Transcribing a stored recording
    POST /api/{circle}/{ears}/ops/transcribe
    Content-Type: application/json

    {
      "audio_data_b64": "<base64-encoded audio>",
      "content_type": "audio/wav",
      "language": "en"
    }
Returns { "text": "...", "language": "en", "duration_ms": 1234, "cost_au": 5 }.
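The request body for transcribe can be assembled from raw audio bytes with the standard library. This is a minimal sketch, assuming the caller already has the recording in memory; `build_transcribe_body` is a hypothetical helper, and sending the request (HTTP client, auth headers) is left out because those details are not covered here.

```python
import base64
import json


def build_transcribe_body(audio: bytes, content_type: str = "", language: str = "") -> str:
    """Build the JSON body for the transcribe op from raw audio bytes.

    Only audio_data_b64 is required; the optional fields are omitted when empty.
    """
    body = {"audio_data_b64": base64.b64encode(audio).decode("ascii")}
    if content_type:
        body["content_type"] = content_type
    if language:
        body["language"] = language
    return json.dumps(body)


# In practice the bytes would come from a stored recording, e.g. open(path, "rb").read().
payload = build_transcribe_body(b"RIFF....WAVE", content_type="audio/wav", language="en")
print(sorted(json.loads(payload)))  # ['audio_data_b64', 'content_type', 'language']
```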
Verifying the connection
POST /api/{circle}/{ears}/ops/test
Transcribes a short built-in clip and returns success, latency_ms, text, and any error — useful for confirming credentials and connectivity before the ears goes active.
Common Mistakes
Missing model_id. Validation warns when spec.model_id is empty: an ears without a model_id can’t transcribe. Set it before the ears goes active.
Expecting transcribe to do real-time. The transcribe op is batch-only — one-shot over a complete audio buffer, for stored recordings. Real-time use (mic capture, phone calls) goes through the streaming route handled by the audio gateway directly, not this op.
Mismatched audio format. The provider must support the configured sample_rate/encoding/channels, or transcription fails with EARS_AUDIO_UNSUPPORTED. The defaults (16kHz, pcm_s16le, mono) match browser MediaRecorder and 46elks RTP; override the encoding per request with content_type when sending a different blob.
Assuming base_url is required for hosted providers. For mistral / openai / google / custom, the parent lab supplies the endpoint and credentials. Set base_url only to point one ears at an internal deployment (e.g. a vllm-realtime in-cluster server); leave it empty to inherit from the lab.
Forgetting the parent lab. An ears must live inside a lab element (nested residence) — the lab is what provides the connection details the runtime resolves to. Without a credential_ref, the provider key falls back to the platform MISTRAL_API_KEY (or provider-equivalent); if neither is set, calls fail with EARS_CREDENTIAL_MISSING.
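The credential fallback described above can be sketched as follows. The secret lookup is passed in as a callable because how secret elements are read is not specified here; `resolve_api_key`, `EarsCredentialMissing`, and the choice to fail when a set credential_ref resolves to nothing are all assumptions.

```python
import os
from typing import Callable, Mapping, Optional


class EarsCredentialMissing(Exception):
    """Raised when no provider API key can be found (maps to EARS_CREDENTIAL_MISSING)."""
    code = "EARS_CREDENTIAL_MISSING"


def resolve_api_key(spec: dict,
                    read_secret: Callable[[str], Optional[str]],
                    env: Mapping[str, str] = os.environ,
                    platform_env_var: str = "MISTRAL_API_KEY") -> str:
    """Use credential_ref when set; otherwise fall back to the platform key."""
    ref = spec.get("credential_ref")
    key = read_secret(ref) if ref else env.get(platform_env_var)
    if not key:
        raise EarsCredentialMissing("provider API key not set")
    return key


# Hypothetical in-memory secret store for the example.
secrets = {"secrets/mistral": "key-from-secret"}
print(resolve_api_key({"credential_ref": "secrets/mistral"}, secrets.get, env={}))
print(resolve_api_key({}, secrets.get, env={"MISTRAL_API_KEY": "platform-key"}))
```

For other providers, `platform_env_var` would be set to the provider-equivalent variable, whose name is not given in this reference.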
Relationships
- Attaches to: rate-limit
Capabilities
- speech-to-text: Transcribe speech to text
- streaming: Incremental transcription over a live audio stream
- endpoint-detection: Detect end of utterance via silence for turn-taking
Properties
| Property | Type | Default | Description |
|---|---|---|---|
| provider | string | "mistral" | STT provider dispatch key. mistral / openai / google / custom hit their respective hosted APIs using the parent lab’s credentials. vllm-realtime (and the alias openai-realtime) route through an in-cluster server that implements the OpenAI Realtime API schema; use the element-level base_url to point at it (e.g. http://voxtral-2602.intelligence-production.svc.cluster.local:8007). |
| model_id | string | - | Model identifier sent to the provider API (e.g. voxtral-mini-transcribe-realtime-2602) |
| display_name | string | - | Human label for this transcriber |
| base_url | string | - | Optional per-element endpoint override. Wins over the parent lab’s base_url. Use for routing one ears to an internal deployment (e.g. http://voxtral-2602.intelligence-production.svc.cluster.local:8007) while keeping the lab’s hosted-API URL for siblings. Leave empty to inherit from the parent lab. |
| credential_ref | string | - | Reference to a secret element holding the provider API key. Optional; falls back to the platform MISTRAL_API_KEY (or provider-equivalent) when unset. |
| sample_rate | integer | 16000 | Expected input sample rate in Hz. 16kHz recommended for Voxtral. |
| encoding | string | "pcm_s16le" | Expected input encoding. pcm_s16le matches browser MediaRecorder and 46elks RTP. |
| channels | integer | 1 | Mono (1) or stereo (2). Speech recognition typically uses mono. |
| chunk_duration_ms | integer | 480 | Duration of each audio chunk sent to the provider in ms. Smaller = more responsive, larger = less chatty. |
| language_hint | string | - | ISO language code hint (e.g. “en”, “sv”). Leave empty for auto-detect. |
| target_streaming_delay_ms | integer | 480 | How long the provider waits before emitting partial transcripts. Lower = more responsive, higher = more accurate. 240ms is “fast”, 2400ms is “slow/accurate”. |
| endpoint_silence_ms | integer | 800 | How long a silence must last to mark the end of an utterance (for turn-taking in voice agents). |
| pricing | object | - | Cost reference (USD) |
Operations
activity
Get /ops/activity | Auth: Read
Get activity events for this element
Scope depends on element capabilities: individual elements query by element_id, project-form elements with activity-scope-members include member activities, circle-level elements with activity-scope-all query the entire circle. Gracefully returns empty list if activities table is missing (old circles).
attachments
Get /ops/attachments | Auth: Read
List all modifiers and resources attached to this element
Returns both modifiers (policy enforcement) and resources (data injection) with is_modifier flag to distinguish. Items in the generated MODIFIER_TYPES list are modifiers; everything else is a resource. Includes cascade_policy and version pin info.
batch_stats
Get /ops/batch_stats | Auth: Read
Get per-element statistics for all children of this element
Returns per-child stats plus an aggregate. Most meaningful on compound or manifest form elements (repositories, circles, projects); atoms have no children so the result is an empty children array with a zeroed aggregate. Uses efficient GROUP BY SQL. Weighted averages for eval scores.
compose
Post /ops/compose | Auth: Execute
Batch add and remove modifiers on this element in a single call
Declarative composition: add modifiers by ref path (slug or path@version) and remove by attachment ID, all in one atomic call on the target element. Each ‘add’ entry resolves the source element, validates topology, attaches with optional priority and cascade policy. Each ‘remove’ entry deletes the attachment row. Returns a summary of what was added and removed. Example: `compose({ add: [{ref: "my-prompt"}, {ref: "rate-limit/api@v2", priority: 50}], remove: [{attachment_id: "uuid"}] })`
context
Get /ops/context | Auth: Read
Get connected elements (graph traversal)
Graph traversal showing all connected elements with their relationship type (contains, contained_by, references, referenced_by, attaches, etc.). Use ?depth=N to control traversal depth (default 1) and ?types=actor,data to filter by element types.
create
Post /ops/create | Auth: Write
Create child element
POST to the parent path — element_type goes in the request body, NOT the URL. Both element_type and slug are required and must be non-empty. Name is derived from slug if omitted. Writes to both Git and PostgreSQL. All elements are stored flat under the circle — no intermediate library wrapper rows.
delete
Delete /ops/delete | Auth: Admin
Delete element (soft delete)
Soft delete — sets state to ‘deleted’ but retains the record. Cannot delete elements that have children (has_no_bond precondition) or active runs. Requires admin auth and confirmation.
disable
Post /ops/disable | Auth: Admin
Disable element (hides and prevents use)
Idempotent — safe to call on already-disabled elements. Optionally pass a reason string. Disabled elements cannot be invoked or executed. Inverse of enable.
enable
Post /ops/enable | Auth: Admin
Enable element (makes usable and visible)
Idempotent — safe to call on already-enabled elements. Transitions element to ready/enabled state. Cannot enable deleted elements. Inverse of disable.
export_bundle
Get /ops/export/bundle | Auth: Read
Export element as downloadable git bundle
On non-root-namespace elements, returns a binary git bundle. On root-namespace (circle) elements, dispatch hands off to the circle’s own export_bundle op, which returns a multi-element JSON envelope with one base64 bundle per child element — this is intentional, not an error.
get
Get /ops/get | Auth: Read
Get element details
Element is already resolved by the routing layer — this returns the cached element, not a fresh DB query. Use the path /api/{circle}/{slug} to address elements.
import_bundle
Post /ops/import/bundle | Auth: Write
Import git bundle into element
Accepts a base64-encoded git bundle in the JSON bundle_base64 field. Use overwrite=true to replace existing elements with same slug (default skips duplicates). Imported elements get new UUIDs. Returns counts of imported/skipped elements and any errors.
info
Get /ops/info | Auth: Read
Get ears metadata
Returns provider, model, and expected audio format — used by the STT picker UI.
intention
Get /ops/intention | Auth: Read
Get element intention with full inheritance chain
Returns three levels: direct (this element’s intention), inherited (from category and root), and resolved (final merged intention). Useful for understanding an element’s purpose in context of its hierarchy.
promote
Post /ops/promote | Auth: Admin
Promote element configuration to a target environment
Only for manifest-form elements (projects). Environments advance: dev → demo → live. dev→demo requires member+ role, demo→live requires admin. Freezes member versions at promotion time (creates snapshot). Persists environment config to spec.environments.
readme
Get /ops/readme | Auth: Read
Get element README.md content
Reads README.md from the element’s git repository. Returns empty content (not an error) if no README exists. Always returns markdown format.
readme_update
Post /ops/readme_update | Auth: Write
Update element README.md content
Creates or overwrites README.md in the element’s git repo. Commits to the draft branch. Content must be provided as a markdown string.
remove-modifier
Post /ops/remove-modifier | Auth: Execute
Remove an attached modifier from this element by attachment ID
Removes a modifier/resource attachment by its row ID. The ID comes from the attachments or context API. This is the reverse of attach — called on the target element, not the source.
restore
Post /ops/restore | Auth: Admin
Restore element to a specific version
Automatically snapshots the current state before restoring (creates a ‘Before restore to vN’ version entry). Writes restored spec to git as .triform/spec.yaml. Git failures warn but don’t fail the operation — DB state is authoritative. Cannot restore deleted elements.
schema
Get /ops/schema | Auth: Read
Get element input/output schema (MCP tools/list compatible)
Returns type-level port schemas from the TypeRegistry — not instance-specific overrides. Includes direction (input/output), required flag, and JSON schema per port. Useful for understanding what data an element accepts and produces.
source
Get /ops/source | Auth: Read
Get any file’s content from the element’s git repository
Reads an arbitrary file from the element’s CAS-backed git tree by its relative path. Same store as `readme`, just generalized. Path safety: rejects `..` traversal, leading `/`, and null bytes. Use this to view `main.py` for action elements, asset files for SPAs, etc. Returns empty content (not an error) if the file doesn’t exist.
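The three path-safety rules can be illustrated with a small predicate. This is a sketch of the stated rules only, not the server's implementation; `is_safe_source_path` is a hypothetical name, and rejecting the empty path is an extra assumption.

```python
def is_safe_source_path(path: str) -> bool:
    """Apply the documented checks: no null bytes, no leading '/', no '..' segments."""
    if not path:            # assumption: an empty path is not a valid file reference
        return False
    if "\x00" in path:      # null bytes
        return False
    if path.startswith("/"):  # leading slash (absolute path)
        return False
    # Reject '..' only as a whole path segment, so names like 'app..js' still pass.
    return ".." not in path.split("/")


for candidate in ["main.py", "assets/app..js", "../secrets", "/etc/passwd", "a/../b"]:
    print(candidate, is_safe_source_path(candidate))
```

The first two candidates pass and the last three are rejected.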
source_branches
Get /ops/source/branches | Auth: Read
List Source branches for this element
Returns the standard draft/demo/live Source branches, their current commits, and promotion relationships. Use GET /api/{element_path}/ops/source/branches.
source_promote
Post /ops/source/promote | Auth: Write
Promote Source branch forward
Promotes draft to demo or demo to live through the generated element op path. Direct Git pushes to demo/live are blocked by Source policy.
source_repair
Post /ops/source/repair | Auth: Write
Inspect or repair the element Source index
Runs Source repair through the element operation path. Defaults to dry_run=true; set dry_run=false only after reviewing a dry-run report.
source_status
Get /ops/source/status | Auth: Read
Get Source control status for this element
Returns the branch-aware clone URL, checkout commands, current draft commit, child source-link count, portable export summary, Source health, warnings, and auth hints for the addressed element. Use the element-first path: GET /api/{element_path}/ops/source/status.
source_validate
Post /ops/source/validate | Auth: Read
Validate Source branch contents
Validates a Source branch before accepting local Git workflow changes or promotion. Defaults to branch=draft and rejects runtime data, generated output, secret material, and unreadable CAS refs.
stats
Get /ops/stats | Auth: Read
Get aggregate statistics for this element
Health status is computed: error if errors_per_day > 5 or success_rate < 0.8, warning if errors_per_day > 0 or success_rate < 0.95. Firing alerts escalate health to error/warning. Default period is ‘day’. Returns runs_per_day, success_rate, avg_duration_ms, and more.
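The health thresholds above translate directly into a small function. In this sketch the label for the non-degraded case ("healthy") is an assumption, and escalation from firing alerts is not modeled.

```python
def compute_health(errors_per_day: float, success_rate: float) -> str:
    """Classify element health from the documented thresholds."""
    if errors_per_day > 5 or success_rate < 0.8:
        return "error"
    if errors_per_day > 0 or success_rate < 0.95:
        return "warning"
    return "healthy"  # assumed label; the reference does not name this state


print(compute_health(0, 0.99))  # healthy
print(compute_health(2, 0.99))  # warning
print(compute_health(0, 0.75))  # error
```

The error check runs first, so an element that meets both an error and a warning condition is reported as error.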
test
Post /ops/test | Auth: Execute
Verify the STT connection
Transcribes a short built-in test clip to verify the connection works.
transcribe
Post /ops/transcribe | Auth: Execute
Transcribe an audio blob (batch)
One-shot transcription of a complete audio buffer. For real-time use (mic capture, phone calls), prefer the streaming route handled by the audio gateway directly — this op is for stored recordings.
tree
Get /ops/tree | Auth: Read
Get the element’s position in the graph — ancestors, children, references, and subtree statistics
Uses per-circle ElementGraph cache for O(1) lookups. Returns ancestors (containment chain), children (direct), members (references), referenced_by (reverse refs), attachments, and subtree stats. Default depth is 3, max is 10. Pass ?include_metadata=true for name/state on each node.
update
Patch /ops/update | Auth: Write
Update element
Partial update: send only the fields you want to change. `spec`, `name`, and `intention` are all independently optional. `spec` MUST be a JSON object when present; it is deep-merged into the existing spec by default. An empty `{"spec":{}}` preserves existing spec content but still records a new version (no-op for content, not for version state). To clear or replace the entire spec wholesale, send `{"spec":{...},"deep":false}`. List-typed spec fields use replace semantics (the patch list replaces the existing list, no array merging). Coordinates Git + DB writes. Slug cannot be changed after creation.
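The merge semantics of update can be modeled in a few lines: nested objects merge recursively, lists are replaced, and deep=false swaps the whole spec. This is a sketch of the described behavior with hypothetical helper names, not the server's merge code.

```python
import copy


def deep_merge(existing: dict, patch: dict) -> dict:
    """Merge patch into existing: dicts merge recursively, everything else is replaced."""
    merged = copy.deepcopy(existing)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists use replace semantics (no array merging).
            merged[key] = copy.deepcopy(value)
    return merged


def apply_spec_patch(existing: dict, patch: dict, deep: bool = True) -> dict:
    """deep=True merges; deep=False replaces the spec wholesale."""
    return deep_merge(existing, patch) if deep else copy.deepcopy(patch)


spec = {"model_id": "voxtral-mini-transcribe-realtime-2602",
        "pricing": {"per_minute": 0.01, "input_per_mtok": 1.0}}
print(apply_spec_patch(spec, {"pricing": {"per_minute": 0.02}}))
print(apply_spec_patch(spec, {"model_id": "other-model"}, deep=False))
```

In the first call only pricing.per_minute changes and the other fields are preserved; in the second, the spec is replaced and pricing is dropped. The pricing numbers are made-up example values.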
update_meta
Patch /ops/update_meta | Auth: Write
Update element metadata (lightweight merge — does NOT bump version or snapshot spec)
Shallow JSONB merge into element.meta. Top-level keys in the provided value replace existing meta values; other keys are preserved. Used for UI metadata like canvas positions, panel state, viewer preferences. Wire-shape op_name is `update_meta` (distinct from `update`) so SSE subscribers + the cache auto-invalidator can distinguish lightweight metadata changes from spec edits without inspecting the payload. The MutatingElementStore wrapper stamps this op_name on the lifecycle event emitted by `update_element_meta` storage calls.
version
Get /ops/version | Auth: Read
Get current version or full history
Returns current version by default. Pass ?history=true for full version history (up to ?limit=N, default 50). Versions are backed by the element_versions table. Every spec update creates a new version entry.
Observability
Defined for this element
Metrics
- ears_transcribe_total
- ears_transcribe_latency_ms
- ears_stream_active
Pricing / cost
Inherited from intelligence
Operation costs
- invoke: 10000 micro-AU
Set it up
- Name (string): A label for this transcriber
- Provider (string): STT provider
- Model (string): Model ID (e.g. voxtral-mini-transcribe-realtime-2602)
- Language hint (string): ISO code (e.g. en, sv). Leave blank for auto-detect.
- Streaming delay (string): Provider wait time before emitting partials. 240ms = fast, 2400ms = accurate.
- Sample rate (string): Input sample rate (Hz)