Filter Words

A response-phase guard that scans what your actors say on the way out — redacting forbidden terms and anonymizing sensitive ones — so an agent's tool output is scrubbed before it ever reaches a caller, without ever failing the request.

Working with it

Selecting a Filter Words modifier reveals its settings in the properties panel; it has no dedicated full-screen workbench.

When to use / not

When to use

  • Keeping profanity, PII, or regulated data patterns (card numbers, ID formats) out of agent responses — redaction happens in-place, the response still succeeds.
  • Blocking competitor names, internal codenames, or restricted terminology from anything an actor returns to a caller.
  • Adding a last-line defence alongside auth-policy and prompt — scrubbing the output even when upstream controls let something through.

When not to use

  • Validating or rejecting request input — filter-words only runs on the response phase; use validation for input schema checking.
  • Hard-failing a request when something matches — filter-words redacts and lets the response through; reach for auth-policy when you need to abort.
  • Filtering anything other than actor output — it applies to actors only, not arbitrary elements.

Topology

Attaches to another element as a modifier, shaping that element's behaviour rather than running on its own.

Properties

  • forbidden (array): Forbidden words with block or redact actions
  • anonymized (array): Words to anonymize with consistent placeholders ([ANON-1], [ANON-2], etc.)
  • case_sensitive (boolean): Whether word matching is case-sensitive
  • words (array): Words or phrases to filter (used with action and replacement)
  • action (string): Action when word is found: redact (replace), reject (block), or warn (log and continue)
  • replacement (string): Replacement string when masking words

Capabilities

Inherited from modifiers
  • Evaluate
  • Observe

Operations

  • attach (POST)
  • delete (DELETE)
  • detach (POST)
  • disable (POST)
  • enable (POST)
  • evaluate (POST)
  • get (GET)
  • get_attached_modifiers (GET)
  • intention (GET)
  • list_attachments (GET)
  • readme_update (POST)
  • schema (GET)
  • status (GET)
  • update (PATCH)

Ports

Inputs

  • forbidden (config)
  • anonymized (config)
  • case_sensitive (config)

Composition

Attaches
Referenced by

Errors / when it fails

Each forbidden word must have action 'block' or 'redact'
Fails unless: all(rule.action in ['block', 'redact'] for rule in forbidden if forbidden)
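
The rule above can be checked locally before a spec is saved. The following is a minimal Python sketch, assuming each forbidden entry is a mapping with `word` and `action` keys; `validate_forbidden` is a hypothetical helper for illustration, not a platform function.

```python
ALLOWED_ACTIONS = {"block", "redact"}


def validate_forbidden(forbidden):
    """Return a list of problems; an empty list means the rule passes."""
    problems = []
    # An empty or missing list has nothing to check, so it passes.
    for index, rule in enumerate(forbidden or []):
        action = rule.get("action")
        if action not in ALLOWED_ACTIONS:
            problems.append(
                f"forbidden[{index}]: action must be 'block' or 'redact', got {action!r}"
            )
    return problems


print(validate_forbidden([{"word": "secret", "action": "redact"}]))  # → []
print(validate_forbidden([{"word": "secret", "action": "warn"}]))
```

Returning a list of problems rather than raising on the first one makes it easier to report every invalid entry at once.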

Validation rules

  • No filter rules configured — modifier has no effect

Filter Words (filter-words)

Category: modifiers | Form: | Symbol: Fw

Filter forbidden and anonymize sensitive words in agent tool outputs

Filters forbidden words and anonymizes sensitive terms in agent tool outputs. Phase: response (processes output after execution). Evaluation order 50. Applies to actors only. Cascade behavior: union — inherited and local filter lists are merged (all forbidden words from all levels apply). Fail action: redact (replaces matched words rather than rejecting the entire response). Spec defines forbidden (blocked term rules), anonymized (words replaced with placeholders), and case_sensitive flag. Use filter-words for content safety in agent outputs; use validation for input schema checking. Common mistake: expecting filter-words to work on request input — it only runs on response output (phase: response).

Guide

Overview

A Filter Words modifier defines lists of prohibited or sensitive words and phrases that are scanned in actor responses before they are returned to callers. Matched content is redacted or replaced rather than causing a hard failure.

Why Filter Words Exists

  • Content Safety: Prevent sensitive or offensive terms from appearing in responses
  • Compliance: Redact regulated data patterns (e.g., PII, profanity) before delivery
  • Brand Protection: Block competitor names or restricted terminology
  • Layered Defense: Works alongside auth-policy and protection for defense in depth

Configuration

Basic Example

element_type: filter-words
slug: content-filter
name: Content Filter

spec:
  # Exact words or phrases to redact
  words:
    - "internal-codename"
    - "confidential"

  # Regex patterns to redact
  patterns:
    - "\\b\\d{4}-\\d{4}-\\d{4}-\\d{4}\\b"  # Credit card numbers
    - "\\b[A-Z]{2}\\d{6}\\b"                  # Passport-style IDs

  # Replacement string (the properties table lists the default as "[FILTERED]")
  replacement: "[REDACTED]"

  # Match case-insensitively (case_sensitive defaults to false)
  case_sensitive: false
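
To make the effect of these fields concrete, here is a rough Python sketch of how such a spec might be applied to a response string. It assumes literal substring matching for `words` and Python-compatible regular expressions for `patterns`; the `redact` function is hypothetical and the platform's actual matching rules may differ.

```python
import re


def redact(text, words, patterns, replacement="[FILTERED]", case_sensitive=False):
    """Replace literal words and regex pattern matches with `replacement`."""
    # Note: with IGNORECASE, a class such as [A-Z] also matches lowercase letters.
    flags = 0 if case_sensitive else re.IGNORECASE
    # Literal words are escaped; patterns are used as regular expressions.
    sources = [re.escape(word) for word in words] + list(patterns)
    if not sources:
        return text
    combined = re.compile("|".join(f"(?:{source})" for source in sources), flags)
    # A function replacement keeps any backslashes in `replacement` literal.
    return combined.sub(lambda _match: replacement, text)


spec = {
    "words": ["internal-codename", "confidential"],
    "patterns": [r"\b\d{4}-\d{4}-\d{4}-\d{4}\b", r"\b[A-Z]{2}\d{6}\b"],
    "replacement": "[REDACTED]",
    "case_sensitive": False,
}
print(redact("Internal-Codename invoice, card 1234-5678-9012-3456", **spec))
# → [REDACTED] invoice, card [REDACTED]
```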

Category Presets

spec:
  # Enable built-in category presets
  presets:
    - pii           # Names, emails, phone numbers, SSNs
    - credit_cards  # Card number patterns
    - profanity     # Common profanity list

  replacement: "***"

Allow List

spec:
  words:
    - "password"

  # Never redact these even if they match a pattern
  allowlist:
    - "reset-password-guide"
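
How the allow list interacts with matching is not spelled out here. One plausible reading, sketched in Python below, is that a match is left alone when it falls entirely inside an allowlisted phrase. The function name and the span-based approach are assumptions made for illustration.

```python
import re


def redact_with_allowlist(text, words, allowlist, replacement="[FILTERED]"):
    """Redact `words`, except where a match lies inside an allowlisted phrase."""
    if not words:
        return text
    # Character spans covered by allowlisted phrases in the original text.
    protected = [
        match.span()
        for phrase in allowlist
        for match in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE)
    ]

    def substitute(match):
        start, end = match.span()
        is_protected = any(lo <= start and end <= hi for lo, hi in protected)
        return match.group(0) if is_protected else replacement

    # One pass over the original text keeps the recorded spans valid.
    pattern = re.compile("|".join(re.escape(word) for word in words), re.IGNORECASE)
    return pattern.sub(substitute, text)


print(redact_with_allowlist(
    "See reset-password-guide; your password is hunter2",
    words=["password"],
    allowlist=["reset-password-guide"],
))
# → See reset-password-guide; your [FILTERED] is hunter2
```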

Cascade Behavior

Filter word lists from all attached scopes are unioned — every word and pattern from every scope applies. There is no way for a child element to remove a word added by a parent scope.
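
The union rule can be pictured with a small Python sketch. The scope names and the case-insensitive de-duplication are assumptions for illustration; the source only states that lists from all scopes are merged and that nothing can be removed by a child.

```python
def merge_filter_lists(*scopes, case_sensitive=False):
    """Union word lists from several scopes, keeping first-seen order."""
    merged, seen = [], set()
    for words in scopes:
        for word in words:
            key = word if case_sensitive else word.lower()
            if key not in seen:
                seen.add(key)
                merged.append(word)
    return merged


parent_words = ["confidential", "internal-codename"]
child_words = ["Confidential", "project-x"]
print(merge_filter_lists(parent_words, child_words))
# → ['confidential', 'internal-codename', 'project-x']
```

Because the merge only ever adds entries, an empty child list leaves the parent's words fully in force.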

Response Transformation

When matches are found, the response body is rewritten with replacements before delivery. The HTTP status code is not changed — the response succeeds but with redacted content. This is different from middleware modifiers that can abort a request.

Files

  • README.md - Documentation
  • .triform/definition.yaml - Element type definition
  • .triform/properties.yaml - Configurable properties
  • .triform/contract.yaml - Bonds and capabilities
  • .triform/ops.yaml - Operations

Runtime Behavior

Property | Value
Cascade | union: all word lists from all scopes are combined
Eval Order | 50
Phase | response
Fail Action | redact matched content (no HTTP error; the response succeeds with replacements)
Applies To | actors

Relationships

  • Attaches to: circle

Capabilities

  • word-blocking: Block tool output containing forbidden words
  • word-redaction: Redact forbidden words from tool output
  • anonymization: Replace sensitive words with consistent anonymous placeholders

Properties

Property | Type | Default | Description
forbidden | array | [] | Forbidden words with block or redact actions
anonymized | array | [] | Words to anonymize with consistent placeholders ([ANON-1], [ANON-2], etc.)
case_sensitive | boolean | false | Whether word matching is case-sensitive
words | array | [] | Words or phrases to filter (used with action and replacement)
action | string | "redact" | Action when word is found: redact (replace), reject (block), or warn (log and continue)
replacement | string | "[FILTERED]" | Replacement string when masking words

Operations

attach

POST /ops/attach | Auth: Read

Attach this modifier to a target element

Attaches this modifier to a target element. The target_id must be a UUID of an existing element that supports this modifier type (check applies_to in definition.yaml). Priority controls evaluation order when multiple modifiers of the same type are attached — lower priority runs first. The attachment is stored in element_modifiers table. Cascade resolution runs at bond-time to merge this modifier into the target’s resolved config. Common mistake: attaching to an incompatible element type — check topology rules first.
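
The priority rule ("lower priority runs first") amounts to an ascending sort. A small Python sketch follows, with a hypothetical `Attachment` record and made-up modifier names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attachment:
    modifier: str   # hypothetical label for the attached modifier
    priority: int   # lower values are evaluated first


def evaluation_order(attachments):
    # sorted() is stable, so ties keep their original list order; how the
    # platform breaks ties is not stated in the source.
    return [a.modifier for a in sorted(attachments, key=lambda a: a.priority)]


attached = [
    Attachment("brand-filter", priority=20),
    Attachment("pii-filter", priority=5),
    Attachment("profanity-filter", priority=10),
]
print(evaluation_order(attached))
# → ['pii-filter', 'profanity-filter', 'brand-filter']
```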

delete

DELETE /ops/delete | Auth: Admin

Delete element (soft delete)

Soft delete — sets state to ‘deleted’ but retains the record. Cannot delete elements that have children (has_no_bond precondition) or active runs. Requires admin auth and confirmation.

detach

POST /ops/detach | Auth: Read

Detach this modifier from a target element

Removes this modifier from a target element. Requires the target_id. Pervasive modifiers (audit, policy) can only be detached at the level they were originally attached — inherited pervasive modifiers cannot be detached by child elements. After detach, cascade resolution re-runs to remove this modifier’s effect from the resolved config.

disable

POST /ops/disable | Auth: Admin

Disable element (hides and prevents use)

Idempotent — safe to call on already-disabled elements. Optionally pass a reason string. Disabled elements cannot be invoked or executed. Inverse of enable.

enable

POST /ops/enable | Auth: Admin

Enable element (makes usable and visible)

Idempotent — safe to call on already-enabled elements. Transitions element to ready/enabled state. Cannot enable deleted elements. Inverse of disable.

evaluate

POST /ops/evaluate | Auth: Read

Evaluate text against the filter — return matches, blocked status, and sanitized text

Pass {text: "…"} to check the text against the configured forbidden and anonymized word lists. Returns evaluation ("pass" or "fail"; fail when any forbidden word with action "block"/"reject" matched), the matched words, and a sanitized version of the text with forbidden words redacted and anonymized terms replaced by [ANON-N] placeholders. Used by automation pipelines to gate or scrub agent tool outputs before they reach the LLM.
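
The evaluation described above can be approximated locally. In this Python sketch only the `evaluation` field name comes from the description; the `matches` and `sanitized` keys, the numbering of placeholders by position in the anonymized list, and the `evaluate` function itself are assumptions rather than the service's actual response shape.

```python
import re


def evaluate(text, forbidden, anonymized, replacement="[FILTERED]", case_sensitive=False):
    """Return a dict with the verdict, the matched words, and sanitized text."""
    # Each entry: (word, substitute text, whether a match blocks the output).
    entries = [(r["word"], replacement, r["action"] in ("block", "reject")) for r in forbidden]
    entries += [(word, f"[ANON-{i}]", False) for i, word in enumerate(anonymized, start=1)]
    if not entries:
        return {"evaluation": "pass", "matches": [], "sanitized": text}
    # Longest words first, so a longer phrase wins over a shorter prefix.
    entries.sort(key=lambda entry: len(entry[0]), reverse=True)
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile("|".join(f"({re.escape(e[0])})" for e in entries), flags)
    matches, blocked = [], False

    def substitute(match):
        nonlocal blocked
        # lastindex identifies which alternative (capture group) matched.
        word, substitute_text, blocks = entries[match.lastindex - 1]
        if word not in matches:
            matches.append(word)
        blocked = blocked or blocks
        return substitute_text

    sanitized = pattern.sub(substitute, text)
    return {"evaluation": "fail" if blocked else "pass", "matches": matches, "sanitized": sanitized}


result = evaluate(
    "Ask Alice about Project-X and the secret plan",
    forbidden=[{"word": "project-x", "action": "block"}, {"word": "secret", "action": "redact"}],
    anonymized=["Alice"],
)
print(result["evaluation"])  # → fail
print(result["sanitized"])   # → Ask [ANON-1] about [FILTERED] and the [FILTERED] plan
```

A caller gating tool output might drop the text entirely when `evaluation` is "fail" and forward `sanitized` otherwise.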

get

GET /ops/get | Auth: Read

Get element details

Element is already resolved by the routing layer — this returns the cached element, not a fresh DB query. Use the path /api/{circle}/{slug} to address elements.

get_attached_modifiers

GET /ops/attached/{target_id} | Auth: Read

Get all modifiers attached to a target element

Lists all modifiers attached to a specific target element, including modifier_id, type, subcategory, and priority. Useful for debugging cascade resolution or understanding which policies apply to an element before invoking it.

intention

GET /ops/intention | Auth: Read

Get element intention with full inheritance chain

Returns three levels: direct (this element’s intention), inherited (from category and root), and resolved (final merged intention). Useful for understanding an element’s purpose in context of its hierarchy.

list_attachments

GET /ops/targets | Auth: Read

List all elements this modifier is attached to

Returns all target elements where this modifier is currently applied. Shows target_id, target_type, priority, and cascade_policy.

readme_update

POST /ops/readme_update | Auth: Write

Update element README.md content

Creates or overwrites README.md in the element’s git repo. Commits to the draft branch. Content must be provided as a markdown string.

schema

GET /ops/schema | Auth: Read

Get element input/output schema (MCP tools/list compatible)

Returns type-level port schemas from the TypeRegistry — not instance-specific overrides. Includes direction (input/output), required flag, and JSON schema per port. Useful for understanding what data an element accepts and produces.

status

GET /ops/status | Auth: Read

Get current filter words configuration summary

Returns a summary of the filter configuration: forbidden_count (blocked words), anonymized_count (words replaced with placeholders), and case_sensitive flag. Use to verify the filter is configured correctly before attaching to an actor.

update

PATCH /ops/update | Auth: Write

Update element

Partial update — send only the fields you want to change. spec, name, and intention are all independently optional. spec MUST be a JSON object when present; deep-merged into the existing spec by default. Empty {"spec":{}} preserves existing spec content but still records a new version (no-op for content, not for version state). To clear/replace the entire spec wholesale send {"spec":{...},"deep":false}. List-typed spec fields use replace semantics (the patch list replaces the existing list, no array merging). Coordinates Git + DB writes. Slug cannot be changed after creation.
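
The merge semantics described above (deep merge for objects, replace for lists, wholesale replace when `deep` is false) can be modelled in a few lines of Python. This is an illustrative sketch of the stated rules, not the server's code; `deep_merge` and `apply_update` are hypothetical names, and versioning and Git writes are left out.

```python
import copy


def deep_merge(existing, patch):
    """Merge nested dicts; any non-dict value (including lists) replaces the old one."""
    merged = copy.deepcopy(existing)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def apply_update(element, body):
    """Apply a partial update body to an element record and return the result."""
    updated = dict(element)
    for field in ("name", "intention"):
        if field in body:
            updated[field] = body[field]
    if "spec" in body:
        if body.get("deep", True):
            updated["spec"] = deep_merge(element.get("spec", {}), body["spec"])
        else:
            # deep=false: the supplied spec replaces the stored one wholesale.
            updated["spec"] = copy.deepcopy(body["spec"])
    return updated


element = {
    "name": "Content Filter",
    "spec": {"forbidden": [{"word": "a", "action": "redact"}], "case_sensitive": False},
}
patched = apply_update(element, {"spec": {"forbidden": [{"word": "b", "action": "block"}]}})
print(patched["spec"])
# → {'forbidden': [{'word': 'b', 'action': 'block'}], 'case_sensitive': False}
```

Note that the patched `forbidden` list replaces the stored one, so adding a single word means sending the full list.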

Error Codes

Code | Class | Retryable | Description
FILTER_WORDS_BLOCKED | limit | no | Tool output blocked due to forbidden word
FILTER_WORDS_CONFIG_INVALID | validation | no | Invalid filter words configuration

Lifecycle / runtime

Inherited from modifiers

Execution model: async

Observability

Defined for this element

Metrics

  • evaluation_count
  • block_count
  • redaction_count

Events

  • filter-words.evaluated
  • filter-words.blocked
  • filter-words.redacted

Pricing / cost

Platform default

Operation costs

  • create: free
  • update: free
  • delete: free
  • get: free
  • list: free
  • invoke: 10000 micro-AU
  • tool_use: free

Set it up

  • Forbidden Words (string): Words to block or redact. Each entry is an object: {word: 'string', action: 'block'|'redact'}. Simple strings are auto-wrapped as {word: value, action: 'redact'}.
  • Anonymize (string): Words to replace with consistent placeholders ([ANON-1], [ANON-2], etc.)
  • Case Sensitive (string): Whether word matching is case-sensitive (default: false)
  • Words (string): List of words for word-based operations
  • Action (string): Action to take: redact (replace), reject (block), or warn (log and continue)
  • Replacement (string): Replacement string for redacted words
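
The auto-wrapping described for Forbidden Words can be sketched as a small normalization step in Python. `normalize_forbidden` is a hypothetical helper; it mirrors only the stated rule that a bare string becomes a redact rule, and it copies object entries unchanged.

```python
def normalize_forbidden(entries):
    """Wrap bare strings as redact rules; copy object entries unchanged."""
    normalized = []
    for entry in entries:
        if isinstance(entry, str):
            normalized.append({"word": entry, "action": "redact"})
        else:
            normalized.append(dict(entry))
    return normalized


print(normalize_forbidden(["secret", {"word": "project-x", "action": "block"}]))
# → [{'word': 'secret', 'action': 'redact'}, {'word': 'project-x', 'action': 'block'}]
```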