
Ears

The ears of an Intelligence Lab: a single speech-to-text model that turns spoken audio into text, so an agent can hear what's being said in chat or on a phone call. It is the input half of the lab's audio pipeline; ears listens, brain thinks, and mouth speaks.

Working with it

Selecting an Ears reveals its settings in the properties panel; it has no dedicated full-screen workbench.

How it appears

The same element type rendered as a definition, a circle instance, and a live workspace card.
Definition card: Er, Ears (type), "A speech-to-text transcriber within an Intelligence Lab"; tags: intelligence, atom, definition.

When to use / not

When to use

  • Giving an agent a voice front door — letting it hear mic input in chat or caller audio on a SIP phone line, with brain reasoning over the transcript.
  • Pinning a specific STT model (provider + model_id) and its expected audio format — sample rate, encoding, channels, language hint — inside a lab.
  • Routing transcription to an in-cluster realtime server (vllm-realtime / openai-realtime) via a per-element base_url while sibling transcribers keep the lab's hosted API.
  • Batch-transcribing a stored recording into text with the transcribe op, separate from any live stream.

When not to use

  • Synthesizing text back into speech — that is the lab's mouth element; ears only goes audio to text.
  • Driving real-time mic capture or phone audio yourself — the live streaming path is handled by the audio gateway directly, not the batch transcribe op.
  • Configuring an ears as a free-standing element — it must live nested inside a lab, which supplies the API endpoint and credentials.

Topology

Lives nested inside a parent lab element rather than standing alone; it is created in the context of that lab.

Properties

provider (string)
STT provider dispatch key. `mistral` / `openai` / `google` / `custom` hit their respective hosted APIs using the parent lab's credentials. `vllm-realtime` (and the alias `openai-realtime`) route through an in-cluster server that implements the OpenAI Realtime API schema; use element-level `base_url` to point at it (e.g. http://voxtral-2602.intelligence-production.svc.cluster.local:8007).
model_id (string)
Model identifier sent to the provider API (e.g. voxtral-mini-transcribe-realtime-2602)
display_name (string)
Human label for this transcriber
base_url (string)
Optional per-element endpoint override. Wins over the parent lab's `base_url`. Use it to route one ears to an internal deployment (e.g. `http://voxtral-2602.intelligence-production.svc.cluster.local:8007`) while sibling transcribers keep the lab's hosted-API URL. Leave empty to inherit from the parent lab.
credential_ref (string)
Reference to a secret element holding the provider API key. Optional; falls back to the platform MISTRAL_API_KEY (or provider-equivalent) when unset.
sample_rate (integer)
Expected input sample rate in Hz. 16kHz recommended for Voxtral.
encoding (string)
Expected input encoding. pcm_s16le matches browser MediaRecorder and 46elks RTP.
channels (integer)
Mono (1) or stereo (2). Speech recognition typically uses mono.
chunk_duration_ms (integer)
Duration of each audio chunk sent to the provider in ms. Smaller = more responsive, larger = less chatty.
language_hint (string)
ISO language code hint (e.g. "en", "sv"). Leave empty for auto-detect.
target_streaming_delay_ms (integer)
How long the provider waits before emitting partial transcripts. Lower = more responsive, higher = more accurate. 240ms is "fast", 2400ms is "slow/accurate".
endpoint_silence_ms (integer)
How long a silence must last to mark the end of an utterance (for turn-taking in voice agents).
pricing (object)
Cost reference (USD)
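The format fields above determine how many bytes each streamed chunk carries, which is useful when sizing buffers on the sending side. This is a minimal Python sketch. It assumes uncompressed PCM with a fixed number of bytes per sample; opus is compressed and is therefore rejected. `chunk_bytes` is a hypothetical helper, not a platform API. With the defaults from the full property table (16000 Hz, pcm_s16le, mono, 480 ms), one chunk is 7680 samples at 2 bytes each, or 15360 bytes.

```python
# Bytes per sample for the uncompressed encodings in the spec enum.
# opus is compressed, so its chunk size cannot be derived from the format alone.
BYTES_PER_SAMPLE = {"pcm_s16le": 2, "pcm_s16be": 2, "pcm_f32le": 4}


def chunk_bytes(sample_rate: int, encoding: str, channels: int, chunk_duration_ms: int) -> int:
    """Size in bytes of one audio chunk (hypothetical helper, not a platform API)."""
    if encoding not in BYTES_PER_SAMPLE:
        raise ValueError(f"no fixed chunk size for encoding {encoding!r}")
    # Samples per channel in one chunk, e.g. 16000 Hz * 480 ms / 1000 = 7680.
    samples = sample_rate * chunk_duration_ms // 1000
    return samples * channels * BYTES_PER_SAMPLE[encoding]


print(chunk_bytes(16000, "pcm_s16le", 1, 480))  # 15360
```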

Capabilities

Defined for this element
  • Observe

Operations

  • activity (GET)
  • attachments (GET)
  • batch_stats (GET)
  • compose (POST)
  • context (GET)
  • create (POST)
  • delete (DELETE)
  • disable (POST)
  • enable (POST)
  • export_bundle (GET)
  • get (GET)
  • import_bundle (POST)
  • info (GET)
  • intention (GET)
  • promote (POST)
  • readme (GET)
  • readme_update (POST)
  • remove-modifier (POST)
  • restore (POST)
  • schema (GET)
  • source (GET)
  • source_branches (GET)
  • source_promote (POST)
  • source_repair (POST)
  • source_status (GET)
  • source_validate (POST)
  • stats (GET)
  • test (POST)
  • transcribe (POST)
  • tree (GET)
  • update (PATCH)
  • update_meta (PATCH)
  • version (GET)

Ports

Input

  • request (request)

Outputs

  • info (request)
  • result (event)

Composition

Attaches
Referenced by

Validation rules

  • Ears model id required

Ears (ears)

Category: intelligence | Form: atom | Symbol: Er

A speech-to-text transcriber within an Intelligence Lab

An Ears is a specific STT model (e.g. Mistral Voxtral Mini Transcribe Realtime) configured with sample rate, language hint, and streaming delay. Agents with ears can hear what’s being said — in chat via the mic, or on phone calls via the SIP audio pipeline. Select ears by ID; the runtime finds the parent lab and handles the transcription.

Guide


What It Does

An Ears represents a single speech-to-text (STT) model within a Lab (provider) — for example Mistral Voxtral Mini Transcribe Realtime. It defines the transcriber’s identity (provider + model_id), the expected audio input format (sample rate, encoding, channels, chunk duration), an optional language hint, streaming behavior, and a cost reference. When something needs to turn audio into text, it references an ears — the runtime resolves which lab the ears belongs to and uses that lab’s connection details to transcribe incoming audio.

Ears are atoms with nested residence: they have no children and live inside a Lab element, which provides the API endpoint and credentials. An ears can also carry a per-element base_url that wins over the parent lab’s base_url — useful for routing one ears to an internal deployment (e.g. an in-cluster vllm-realtime server implementing the OpenAI Realtime API schema) while siblings keep the lab’s hosted-API URL.
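The base_url precedence can be sketched as a small resolver. This is an illustration under stated assumptions, not the runtime's code. Specs are treated as plain dicts keyed by the property names on this page, and `resolve_endpoint` is hypothetical. The resolver raises when a realtime provider has no base_url of its own, or when the lab has none to inherit; that behavior is an assumption based on the guidance to point vllm-realtime at an in-cluster server.

```python
# Providers that, per this page, talk to an in-cluster OpenAI Realtime API server.
REALTIME_PROVIDERS = {"vllm-realtime", "openai-realtime"}


def resolve_endpoint(ears_spec: dict, lab_spec: dict) -> str:
    """Return the URL an ears should call (hypothetical helper)."""
    own = (ears_spec.get("base_url") or "").strip()
    if own:
        return own  # element-level base_url wins over the lab's
    provider = ears_spec.get("provider", "mistral")
    if provider in REALTIME_PROVIDERS:
        # Assumption: realtime providers have no hosted default to inherit.
        raise ValueError(f"provider {provider!r} needs an element-level base_url")
    inherited = (lab_spec.get("base_url") or "").strip()
    if not inherited:
        # Assumption: the lab must define base_url for hosted providers.
        raise ValueError("parent lab has no base_url to inherit")
    return inherited


lab = {"base_url": "https://stt.example.invalid"}  # placeholder hosted-API URL
in_cluster = {
    "provider": "vllm-realtime",
    "base_url": "http://voxtral-2602.intelligence-production.svc.cluster.local:8007",
}
print(resolve_endpoint({"provider": "mistral"}, lab))  # https://stt.example.invalid
print(resolve_endpoint(in_cluster, lab))
```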

Within the Lab, ears is the input/transcription half of the audio pipeline: ears turns speech into text, while its sibling mouth turns text into speech (synthesis). Both compose with brain (the LLM) so a lab can hear (ears), think (brain), and speak (mouth). The element sets supports_streaming to true. It advertises the streaming and endpoint-detection capabilities for turn-taking, but the real-time path itself is handled directly by the audio gateway rather than by the transcribe op.

Element Definition

Property | Value
Type | ears
Category | intelligence
Form | atom
Residence | nested
Symbol | Er / #3B82F6 (icon: hearing)
Activity type | resource
Streaming | true
Handler | EarsHandler
Allowed visibility | collaborator
States | draft (initial) → active, error

Properties

Field | Type | Default | Description
provider | string (enum) | mistral | STT provider dispatch key. mistral / openai / google / custom hit hosted APIs with the parent lab’s credentials; vllm-realtime (alias openai-realtime) routes through an in-cluster OpenAI Realtime API server via base_url
model_id | string | (none) | Model identifier sent to the provider API (e.g. voxtral-mini-transcribe-realtime-2602). Max 256 chars
display_name | string | (none) | Human label for this transcriber. Max 128 chars
base_url | string (url) | (none) | Optional per-element endpoint override. Wins over the parent lab’s base_url. Leave empty to inherit. Max 512 chars
credential_ref | string (secret picker) | (none) | Reference to a secret element with the provider API key. Optional; falls back to the platform MISTRAL_API_KEY (or provider-equivalent) when unset
sample_rate | integer (enum) | 16000 | Expected input sample rate in Hz. One of 8000 / 16000 / 22050 / 24000 / 44100 / 48000. 16kHz recommended for Voxtral
encoding | string (enum) | pcm_s16le | Expected input encoding. One of pcm_s16le / pcm_s16be / pcm_f32le / opus. pcm_s16le matches browser MediaRecorder and 46elks RTP
channels | integer (enum) | 1 | Mono (1) or stereo (2). Speech recognition typically uses mono
chunk_duration_ms | integer (slider) | 480 | Duration of each audio chunk sent to the provider in ms (20–1000). Smaller = more responsive, larger = less chatty
language_hint | string | (none) | ISO language code hint (e.g. en, sv). Leave empty for auto-detect. Max 8 chars
target_streaming_delay_ms | integer (slider) | 480 | How long the provider waits before emitting partial transcripts, in ms (0–10000). Lower = more responsive, higher = more accurate
endpoint_silence_ms | integer (slider) | 800 | How long a silence (in ms) must last to mark the end of an utterance for turn-taking (200–5000)
pricing | object | (none) | Cost reference (USD): input_per_mtok (cost per million input audio tokens) and/or per_minute (cost per minute of audio)
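The enums, ranges, and length limits in the table can be checked client-side before creating or updating an ears. This is a minimal sketch. It assumes the six provider values listed above make up the provider enum and that unset fields take the listed defaults. `validate_ears_spec` is hypothetical and does not replace the platform's own validation, which requires model_id.

```python
from typing import List
from urllib.parse import urlparse

PROVIDERS = {"mistral", "openai", "google", "custom", "vllm-realtime", "openai-realtime"}
SAMPLE_RATES = {8000, 16000, 22050, 24000, 44100, 48000}
ENCODINGS = {"pcm_s16le", "pcm_s16be", "pcm_f32le", "opus"}
DEFAULTS = {"provider": "mistral", "sample_rate": 16000, "encoding": "pcm_s16le",
            "channels": 1, "chunk_duration_ms": 480,
            "target_streaming_delay_ms": 480, "endpoint_silence_ms": 800}
# Inclusive slider ranges and maximum string lengths from the property table.
RANGES = {"chunk_duration_ms": (20, 1000), "target_streaming_delay_ms": (0, 10000),
          "endpoint_silence_ms": (200, 5000)}
MAX_LEN = {"model_id": 256, "display_name": 128, "base_url": 512, "language_hint": 8}


def validate_ears_spec(spec: dict) -> List[str]:
    """Return human-readable problems; an empty list means the spec looks valid."""
    s = {**DEFAULTS, **spec}  # unset fields fall back to the documented defaults
    problems = []
    if not s.get("model_id"):
        problems.append("model_id is required")
    if s["provider"] not in PROVIDERS:
        problems.append(f"unknown provider {s['provider']!r}")
    if s["sample_rate"] not in SAMPLE_RATES:
        problems.append(f"unsupported sample_rate {s['sample_rate']}")
    if s["encoding"] not in ENCODINGS:
        problems.append(f"unsupported encoding {s['encoding']!r}")
    if s["channels"] not in (1, 2):
        problems.append("channels must be 1 (mono) or 2 (stereo)")
    for field, (low, high) in RANGES.items():
        if not low <= s[field] <= high:
            problems.append(f"{field} must be between {low} and {high}")
    for field, limit in MAX_LEN.items():
        if len(s.get(field) or "") > limit:
            problems.append(f"{field} exceeds {limit} characters")
    if s.get("base_url"):
        url = urlparse(s["base_url"])
        if url.scheme not in ("http", "https") or not url.netloc:
            problems.append("base_url must be an absolute http(s) URL")
    return problems
```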

Ports

Direction | Port | Schema | Required | Description
Input | request | TranscribeRequest | no | Audio to transcribe
Output | info | EarsInfo | yes | Ears metadata for the STT picker UI
Output | result | TranscribeResponse (event) | no | Transcription result

Capabilities

Capability | Description
speech-to-text | Transcribe speech to text
streaming | Incremental transcription over a live audio stream
endpoint-detection | Detect end of utterance via silence for turn-taking

Attaches / Modifiers

  • Accepts modifier: rate-limit

Error Codes

Code | Class | Retryable | Description
EARS_UNAVAILABLE | internal | yes | Ears’ parent lab is unreachable or credentials missing
EARS_CREDENTIAL_MISSING | auth | no | Provider API key not set
EARS_AUDIO_UNSUPPORTED | validation | no | Input audio format/sample rate not supported by the provider
EARS_TRANSCRIPTION_FAILED | internal | yes | Provider returned an error during transcription
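The Retryable column suggests a simple client policy: back off and retry EARS_UNAVAILABLE and EARS_TRANSCRIPTION_FAILED, and fail fast on the auth and validation errors. This sketch assumes the client surfaces errors as an exception carrying the code. `EarsError` and `call_with_retry` are hypothetical names, and the backoff parameters are arbitrary.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Retryable flags copied from the error-code table.
RETRYABLE = {
    "EARS_UNAVAILABLE": True,
    "EARS_CREDENTIAL_MISSING": False,
    "EARS_AUDIO_UNSUPPORTED": False,
    "EARS_TRANSCRIPTION_FAILED": True,
}


class EarsError(Exception):
    """Hypothetical client-side exception carrying one of the codes above."""

    def __init__(self, code: str, message: str = "") -> None:
        super().__init__(f"{code}: {message}" if message else code)
        self.code = code


def call_with_retry(fn: Callable[[], T], attempts: int = 3, base_delay_s: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying only retryable ears errors with jittered exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except EarsError as err:
            # Unknown codes are treated as non-retryable; the last attempt re-raises.
            if not RETRYABLE.get(err.code, False) or attempt == attempts - 1:
                raise
            sleep(base_delay_s * 2 ** attempt * (1 + random.random()))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the policy testable without real delays.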

Operations

Operation | Method + Path | Auth | Description
info | GET info | read | Get ears metadata
transcribe | POST transcribe | execute | Transcribe an audio blob (batch)
test | POST test | execute | Verify the STT connection

info

GET info (auth read). Returns provider, model, and expected audio format — used by the STT picker UI. Output fields: provider, model_id, display_name, sample_rate, encoding, channels, language_hint, target_streaming_delay_ms.

transcribe

POST transcribe (auth execute). One-shot transcription of a complete audio buffer. For real-time use (mic capture, phone calls), prefer the streaming route handled by the audio gateway directly — this op is for stored recordings.

  • Input: audio_data_b64 (required, base64-encoded audio), content_type (optional MIME-type override), language (optional ISO language code hint).
  • Output: text, language, duration_ms, cost_au.

test

POST test (auth execute). Transcribes a short built-in test clip to verify the connection works. Output fields: success, latency_ms, text, error.

Quick Start

Creating via API

An ears is created inside its parent lab element:

POST /api/{circle}/{lab-element}/
Content-Type: application/json

{
  "element_type": "ears",
  "slug": "voxtral-ears",
  "name": "Voxtral Mini Transcribe",
  "spec": {
    "provider": "vllm-realtime",
    "model_id": "voxtral-mini-transcribe-realtime-2602",
    "display_name": "Voxtral Mini Transcribe",
    "base_url": "http://voxtral-2602.intelligence-production.svc.cluster.local:8007",
    "sample_rate": 16000,
    "encoding": "pcm_s16le",
    "channels": 1,
    "language_hint": "en"
  }
}
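The same create call can be assembled with Python's standard library. This is a sketch only: the API root `https://platform.example.invalid` and the `my-circle` / `speech-lab` slugs are placeholders. Authentication headers are omitted because this page does not describe them. The request is built but not sent.

```python
import json
import urllib.request


def build_create_request(api_root: str, circle: str, lab: str, slug: str,
                         name: str, spec: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST that creates an ears inside a lab."""
    body = {"element_type": "ears", "slug": slug, "name": name, "spec": spec}
    return urllib.request.Request(
        url=f"{api_root}/api/{circle}/{lab}/",  # POST to the parent lab's path
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_create_request(
    "https://platform.example.invalid",  # placeholder API root
    "my-circle", "speech-lab",            # placeholder circle and lab slugs
    "voxtral-ears", "Voxtral Mini Transcribe",
    {"provider": "mistral", "model_id": "voxtral-mini-transcribe-realtime-2602",
     "sample_rate": 16000, "encoding": "pcm_s16le", "channels": 1, "language_hint": "en"},
)
# urllib.request.urlopen(req) would send it once authentication is added.
```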

Transcribing a stored recording

POST /api/{circle}/{ears}/ops/transcribe
Content-Type: application/json

{
  "audio_data_b64": "<base64-encoded audio>",
  "content_type": "audio/wav",
  "language": "en"
}

Returns { "text": "...", "language": "en", "duration_ms": 1234, "cost_au": 5 }.
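Producing the audio_data_b64 field from a recording on disk is a plain base64 encode. This is a minimal sketch; `build_transcribe_payload` and the file name in the comment are hypothetical. Optional fields are left out of the body when unset, which is assumed to be equivalent to omitting them.

```python
import base64
from typing import Optional


def build_transcribe_payload(audio: bytes, content_type: Optional[str] = None,
                             language: Optional[str] = None) -> dict:
    """Body for POST .../ops/transcribe; optional fields are sent only when set."""
    payload = {"audio_data_b64": base64.b64encode(audio).decode("ascii")}
    if content_type:
        payload["content_type"] = content_type  # MIME-type override, e.g. audio/wav
    if language:
        payload["language"] = language  # ISO language code hint
    return payload


# Typical use with a stored recording (the file name is a placeholder):
#   from pathlib import Path
#   payload = build_transcribe_payload(Path("call.wav").read_bytes(), "audio/wav", "en")
print(build_transcribe_payload(b"abc", "audio/wav", "en"))
```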

Verifying the connection

POST /api/{circle}/{ears}/ops/test

Transcribes a short built-in clip and returns success, latency_ms, text, and any error — useful for confirming credentials and connectivity before the ears goes active.

Common Mistakes

Missing model_id. Validation warns when spec.model_id is empty: an ears without a model_id can’t transcribe. Set it before the ears goes active.

Expecting transcribe to do real-time. The transcribe op is batch-only — one-shot over a complete audio buffer, for stored recordings. Real-time use (mic capture, phone calls) goes through the streaming route handled by the audio gateway directly, not this op.

Mismatched audio format. The provider must support the configured sample_rate/encoding/channels, or transcription fails with EARS_AUDIO_UNSUPPORTED. The defaults (16kHz, pcm_s16le, mono) match browser MediaRecorder and 46elks RTP; override the encoding per request with content_type when sending a different blob.
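For WAV recordings, the format can be checked against the ears configuration before uploading, using Python's built-in `wave` module. This sketch assumes an ears configured for pcm_s16le. WAV stores PCM samples little-endian, so 16-bit WAV data lines up with pcm_s16le. `check_wav_format` is a hypothetical helper; other containers need a different reader.

```python
import io
import wave


def check_wav_format(wav_bytes: bytes, sample_rate: int = 16000, channels: int = 1) -> list:
    """Compare a WAV file with an ears configured for pcm_s16le; return mismatches."""
    problems = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getframerate() != sample_rate:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected {sample_rate} Hz")
        if wav.getnchannels() != channels:
            problems.append(f"{wav.getnchannels()} channel(s), expected {channels}")
        if wav.getsampwidth() != 2:  # 2 bytes per sample = 16-bit PCM
            problems.append(f"{8 * wav.getsampwidth()}-bit samples, expected 16-bit")
    return problems


# Build a 10 ms silent 16 kHz mono clip in memory to exercise the check.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 160)
print(check_wav_format(buf.getvalue()))  # []
print(check_wav_format(buf.getvalue(), sample_rate=8000))
```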

Assuming base_url is required for hosted providers. For mistral / openai / google / custom, the parent lab supplies the endpoint and credentials. Set base_url only to point one ears at an internal deployment (e.g. a vllm-realtime in-cluster server); leave it empty to inherit from the lab.

Forgetting the parent lab. An ears must live inside a lab element (nested residence) — the lab is what provides the connection details the runtime resolves to. Without a credential_ref, the provider key falls back to the platform MISTRAL_API_KEY (or provider-equivalent); if neither is set, calls fail with EARS_CREDENTIAL_MISSING.
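The key lookup order described above can be sketched as follows. MISTRAL_API_KEY comes from this page; OPENAI_API_KEY and GOOGLE_API_KEY are hypothetical stand-ins for the "provider-equivalent" keys. The sketch also assumes that a credential_ref pointing at a missing secret fails rather than falling back. `resolve_api_key` and `CredentialMissing` are illustrative names.

```python
import os
from typing import Mapping, Optional

# MISTRAL_API_KEY is named on this page; the other variable names are
# hypothetical stand-ins for the "provider-equivalent" platform keys.
PLATFORM_KEY_VARS = {"mistral": "MISTRAL_API_KEY", "openai": "OPENAI_API_KEY",
                     "google": "GOOGLE_API_KEY"}


class CredentialMissing(Exception):
    """Hypothetical client-side counterpart of EARS_CREDENTIAL_MISSING."""
    code = "EARS_CREDENTIAL_MISSING"


def resolve_api_key(provider: str, credential_ref: Optional[str],
                    secrets: Mapping[str, str],
                    env: Mapping[str, str] = os.environ) -> str:
    """credential_ref wins; otherwise fall back to the platform key for the provider."""
    if credential_ref:
        key = secrets.get(credential_ref)
        if key:
            return key
        # Assumption: a dangling credential_ref fails rather than falling back.
        raise CredentialMissing(f"secret {credential_ref!r} not found")
    var = PLATFORM_KEY_VARS.get(provider)
    key = env.get(var) if var else None
    if not key:
        raise CredentialMissing(f"no credential_ref and no platform key for {provider!r}")
    return key
```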

Relationships

  • Attaches to: rate-limit



Operations

activity

Get /ops/activity | Auth: Read

Get activity events for this element

Scope depends on element capabilities: individual elements query by element_id, project-form elements with activity-scope-members include member activities, and circle-level elements with activity-scope-all query the entire circle. Gracefully returns an empty list if the activities table is missing (old circles).

attachments

Get /ops/attachments | Auth: Read

List all modifiers and resources attached to this element

Returns both modifiers (policy enforcement) and resources (data injection) with is_modifier flag to distinguish. Items in the generated MODIFIER_TYPES list are modifiers; everything else is a resource. Includes cascade_policy and version pin info.

batch_stats

Get /ops/batch_stats | Auth: Read

Get per-element statistics for all children of this element

Returns per-child stats plus an aggregate. Most meaningful on compound or manifest form elements (repositories, circles, projects); atoms have no children so the result is an empty children array with a zeroed aggregate. Uses efficient GROUP BY SQL. Weighted averages for eval scores.

compose

Post /ops/compose | Auth: Execute

Batch add and remove modifiers on this element in a single call

Declarative composition: add modifiers by ref path (slug or path@version) and remove by attachment ID, all in one atomic call on the target element. Each ‘add’ entry resolves the source element, validates topology, attaches with optional priority and cascade policy. Each ‘remove’ entry deletes the attachment row. Returns a summary of what was added and removed. Example: compose({ add: [{ref: "my-prompt"}, {ref: "rate-limit/api@v2", priority: 50}], remove: [{attachment_id: "uuid"}] })
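A compose body can also be generated programmatically, for example to attach the rate-limit modifier that ears accepts. This is a minimal sketch using the field names from the compose example (ref, priority, attachment_id). `compose_payload` is hypothetical, and the cascade-policy field is left out because its name is not given here.

```python
from typing import Iterable, Optional, Tuple


def compose_payload(add: Iterable[Tuple[str, Optional[int]]] = (),
                    remove: Iterable[str] = ()) -> dict:
    """Body for POST /ops/compose: (ref, priority) pairs to add, attachment IDs to remove."""
    adds = []
    for ref, priority in add:
        entry = {"ref": ref}  # slug or path@version of the modifier to attach
        if priority is not None:
            entry["priority"] = priority
        adds.append(entry)
    return {"add": adds, "remove": [{"attachment_id": a} for a in remove]}


# Attach a rate-limit modifier at priority 50 (ref taken from the example above).
print(compose_payload(add=[("rate-limit/api@v2", 50)]))
```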

context

Get /ops/context | Auth: Read

Get connected elements (graph traversal)

Graph traversal showing all connected elements with their relationship type (contains, contained_by, references, referenced_by, attaches, etc.). Use ?depth=N to control traversal depth (default 1) and ?types=actor,data to filter by element types.

create

Post /ops/create | Auth: Write

Create child element

POST to the parent path — element_type goes in the request body, NOT the URL. Both element_type and slug are required and must be non-empty. Name is derived from slug if omitted. Writes to both Git and PostgreSQL. All elements are stored flat under the circle — no intermediate library wrapper rows.

delete

Delete /ops/delete | Auth: Admin

Delete element (soft delete)

Soft delete — sets state to ‘deleted’ but retains the record. Cannot delete elements that have children (has_no_bond precondition) or active runs. Requires admin auth and confirmation.

disable

Post /ops/disable | Auth: Admin

Disable element (hides and prevents use)

Idempotent — safe to call on already-disabled elements. Optionally pass a reason string. Disabled elements cannot be invoked or executed. Inverse of enable.

enable

Post /ops/enable | Auth: Admin

Enable element (makes usable and visible)

Idempotent — safe to call on already-enabled elements. Transitions element to ready/enabled state. Cannot enable deleted elements. Inverse of disable.

export_bundle

Get /ops/export/bundle | Auth: Read

Export element as downloadable git bundle

On non-root-namespace elements, returns a binary git bundle. On root-namespace (circle) elements, dispatch hands off to the circle’s own export_bundle op, which returns a multi-element JSON envelope with one base64 bundle per child element — this is intentional, not an error.

get

Get /ops/get | Auth: Read

Get element details

Element is already resolved by the routing layer — this returns the cached element, not a fresh DB query. Use the path /api/{circle}/{slug} to address elements.

import_bundle

Post /ops/import/bundle | Auth: Write

Import git bundle into element

Accepts a base64-encoded git bundle in the JSON bundle_base64 field. Use overwrite=true to replace existing elements with same slug (default skips duplicates). Imported elements get new UUIDs. Returns counts of imported/skipped elements and any errors.

info

Get /ops/info | Auth: Read

Get ears metadata

Returns provider, model, and expected audio format — used by the STT picker UI.

intention

Get /ops/intention | Auth: Read

Get element intention with full inheritance chain

Returns three levels: direct (this element’s intention), inherited (from category and root), and resolved (final merged intention). Useful for understanding an element’s purpose in context of its hierarchy.

promote

Post /ops/promote | Auth: Admin

Promote element configuration to a target environment

Only for manifest-form elements (projects). Environments advance: dev → demo → live. dev→demo requires member+ role, demo→live requires admin. Freezes member versions at promotion time (creates snapshot). Persists environment config to spec.environments.

readme

Get /ops/readme | Auth: Read

Get element README.md content

Reads README.md from the element’s git repository. Returns empty content (not an error) if no README exists. Always returns markdown format.

readme_update

Post /ops/readme_update | Auth: Write

Update element README.md content

Creates or overwrites README.md in the element’s git repo. Commits to the draft branch. Content must be provided as a markdown string.

remove-modifier

Post /ops/remove-modifier | Auth: Execute

Remove an attached modifier from this element by attachment ID

Removes a modifier/resource attachment by its row ID. The ID comes from the attachments or context API. This is the reverse of attach — called on the target element, not the source.

restore

Post /ops/restore | Auth: Admin

Restore element to a specific version

Automatically snapshots the current state before restoring (creates a ‘Before restore to vN’ version entry). Writes restored spec to git as .triform/spec.yaml. Git failures warn but don’t fail the operation — DB state is authoritative. Cannot restore deleted elements.

schema

Get /ops/schema | Auth: Read

Get element input/output schema (MCP tools/list compatible)

Returns type-level port schemas from the TypeRegistry — not instance-specific overrides. Includes direction (input/output), required flag, and JSON schema per port. Useful for understanding what data an element accepts and produces.

source

Get /ops/source | Auth: Read

Get any file’s content from the element’s git repository

Reads an arbitrary file from the element’s CAS-backed git tree by its relative path. Same store as readme, just generalized. Path safety: rejects .. traversal, leading /, and null bytes. Use this to view main.py for action elements, asset files for SPAs, etc. Returns empty content (not an error) if the file doesn’t exist.
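The path-safety rules translate into a short predicate that clients can apply before calling the op. This is a sketch, not the server's implementation. `is_safe_source_path` is hypothetical, and treating empty paths as unsafe and normalizing backslashes are assumptions. A name like `notes..txt` is allowed, since only whole `..` segments count as traversal.

```python
def is_safe_source_path(path: str) -> bool:
    """Apply the documented checks: no '..' segments, no leading '/', no null bytes."""
    if not path or "\x00" in path or path.startswith("/"):
        return False
    # Assumption: backslashes are normalized so Windows-style traversal is caught too.
    segments = path.replace("\\", "/").split("/")
    return ".." not in segments


print(is_safe_source_path("main.py"), is_safe_source_path("../secrets"))
```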

source_branches

Get /ops/source/branches | Auth: Read

List Source branches for this element

Returns the standard draft/demo/live Source branches, their current commits, and promotion relationships. Use GET /api/{element_path}/ops/source/branches.

source_promote

Post /ops/source/promote | Auth: Write

Promote Source branch forward

Promotes draft to demo or demo to live through the generated element op path. Direct Git pushes to demo/live are blocked by Source policy.

source_repair

Post /ops/source/repair | Auth: Write

Inspect or repair the element Source index

Runs Source repair through the element operation path. Defaults to dry_run=true; set dry_run=false only after reviewing a dry-run report.

source_status

Get /ops/source/status | Auth: Read

Get Source control status for this element

Returns the branch-aware clone URL, checkout commands, current draft commit, child source-link count, portable export summary, Source health, warnings, and auth hints for the addressed element. Use the element-first path: GET /api/{element_path}/ops/source/status.

source_validate

Post /ops/source/validate | Auth: Read

Validate Source branch contents

Validates a Source branch before accepting local Git workflow changes or promotion. Defaults to branch=draft and rejects runtime data, generated output, secret material, and unreadable CAS refs.

stats

Get /ops/stats | Auth: Read

Get aggregate statistics for this element

Health status is computed: error if errors_per_day > 5 or success_rate < 0.8, warning if errors_per_day > 0 or success_rate < 0.95. Firing alerts escalate health to error/warning. Default period is ‘day’. Returns runs_per_day, success_rate, avg_duration_ms, and more.
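The health thresholds can be written out directly. This is a minimal sketch with two assumptions: "ok" is used as the label for the healthy state, and alert escalation is modeled as taking the more severe of the computed level and the alert level. `compute_health` is hypothetical.

```python
from typing import Optional

RANK = {"ok": 0, "warning": 1, "error": 2}


def compute_health(errors_per_day: float, success_rate: float,
                   firing_alert: Optional[str] = None) -> str:
    """Apply the documented error/warning thresholds, then alert escalation."""
    if errors_per_day > 5 or success_rate < 0.8:
        health = "error"
    elif errors_per_day > 0 or success_rate < 0.95:
        health = "warning"
    else:
        health = "ok"  # assumed label for the healthy state
    # A firing alert can only raise the level, never lower it.
    if firing_alert in RANK and RANK[firing_alert] > RANK[health]:
        health = firing_alert
    return health
```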

test

Post /ops/test | Auth: Execute

Verify the STT connection

Transcribes a short built-in test clip to verify the connection works.

transcribe

Post /ops/transcribe | Auth: Execute

Transcribe an audio blob (batch)

One-shot transcription of a complete audio buffer. For real-time use (mic capture, phone calls), prefer the streaming route handled by the audio gateway directly — this op is for stored recordings.

tree

Get /ops/tree | Auth: Read

Get the element’s position in the graph — ancestors, children, references, and subtree statistics

Uses per-circle ElementGraph cache for O(1) lookups. Returns ancestors (containment chain), children (direct), members (references), referenced_by (reverse refs), attachments, and subtree stats. Default depth is 3, max is 10. Pass ?include_metadata=true for name/state on each node.

update

Patch /ops/update | Auth: Write

Update element

Partial update — send only the fields you want to change. spec, name, and intention are all independently optional. spec MUST be a JSON object when present; deep-merged into the existing spec by default. Empty {"spec":{}} preserves existing spec content but still records a new version (no-op for content, not for version state). To clear/replace the entire spec wholesale send {"spec":{...},"deep":false}. List-typed spec fields use replace semantics (the patch list replaces the existing list, no array merging). Coordinates Git + DB writes. Slug cannot be changed after creation.
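The merge semantics for update can be modeled in a few lines: objects deep-merge, lists and scalars replace, and deep=false swaps the whole spec. This sketch follows only those stated rules. `apply_spec_patch` is hypothetical, the pricing numbers are placeholders, and the server's handling of null values is not covered.

```python
import copy


def apply_spec_patch(existing: dict, patch: dict, deep: bool = True) -> dict:
    """Return the new spec; neither input is mutated."""
    if not isinstance(patch, dict):
        raise TypeError("spec must be a JSON object")
    if not deep:
        return copy.deepcopy(patch)  # "deep": false replaces the spec wholesale
    merged = copy.deepcopy(existing)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_spec_patch(merged[key], value)  # recurse into objects
        else:
            merged[key] = copy.deepcopy(value)  # scalars and lists replace outright
    return merged


current = {"model_id": "voxtral-mini-transcribe-realtime-2602",
           "pricing": {"per_minute": 0.001}}  # placeholder price
patched = apply_spec_patch(current, {"pricing": {"input_per_mtok": 2.0}, "language_hint": "sv"})
print(patched)
```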

update_meta

Patch /ops/update_meta | Auth: Write

Update element metadata (lightweight merge — does NOT bump version or snapshot spec)

Shallow JSONB merge into element.meta. Top-level keys in the provided value replace existing meta values; other keys are preserved. Used for UI metadata like canvas positions, panel state, viewer preferences. Wire-shape op_name is update_meta (distinct from update) so SSE subscribers + the cache auto-invalidator can distinguish lightweight metadata changes from spec edits without inspecting the payload. The MutatingElementStore wrapper stamps this op_name on the lifecycle event emitted by update_element_meta storage calls.

version

Get /ops/version | Auth: Read

Get current version or full history

Returns current version by default. Pass ?history=true for full version history (up to ?limit=N, default 50). Versions are backed by the element_versions table. Every spec update creates a new version entry.


Observability

Defined for this element

Metrics

  • ears_transcribe_total
  • ears_transcribe_latency_ms
  • ears_stream_active

Pricing / cost

Inherited from intelligence

Operation costs

  • invoke: 10000 micro-AU

Set it up

Name (string)
A label for this transcriber
Provider (string)
STT provider
Model (string)
Model ID (e.g. voxtral-mini-transcribe-realtime-2602)
Language hint (string)
ISO code (e.g. en, sv). Leave blank for auto-detect.
Streaming delay (string)
Provider wait time before emitting partials. 240ms = fast, 2400ms = accurate.
Sample rate (string)
Input sample rate (Hz)