Filter Words
A response-phase guard that scans what your actors say on the way out — redacting forbidden terms and anonymizing sensitive ones — so an agent's tool output is scrubbed before it ever reaches a caller, without ever failing the request.
Working with it
Selecting a Filter Words element reveals its settings in the properties panel; it has no dedicated full-screen workbench.
How it appears
The same element type rendered as a definition, a circle instance, and a live workspace card.
When to use / not
When to use
- Keeping profanity, PII, or regulated data patterns (card numbers, ID formats) out of agent responses — redaction happens in-place, the response still succeeds.
- Blocking competitor names, internal codenames, or restricted terminology from anything an actor returns to a caller.
- Adding a last-line defence alongside auth-policy and prompt — scrubbing the output even when upstream controls let something through.
When not to use
- Validating or rejecting request input — filter-words only runs on the response phase; use validation for input schema checking.
- Hard-failing a request when something matches — filter-words redacts and lets the response through; reach for auth-policy when you need to abort.
- Filtering anything other than actor output — it applies to actors only, not arbitrary elements.
Topology
Attaches to another element as a modifier, shaping that element's behaviour rather than running on its own.
Properties
- forbidden (array): Forbidden words with block or redact actions
- anonymized (array): Words to anonymize with consistent placeholders ([ANON-1], [ANON-2], etc.)
- case_sensitive (boolean): Whether word matching is case-sensitive
- words (array): Words or phrases to filter (used with action and replacement)
- action (string): Action when word is found: redact (replace), reject (block), or warn (log and continue)
- replacement (string): Replacement string when masking words
Capabilities
Inherited from modifiers
- Evaluate
- Observe
Operations
- attach (POST)
- delete (DELETE)
- detach (POST)
- disable (POST)
- enable (POST)
- evaluate (POST)
- get (GET)
- get_attached_modifiers (GET)
- intention (GET)
- list_attachments (GET)
- readme_update (POST)
- schema (GET)
- status (GET)
- update (PATCH)
Ports
Inputs
- forbidden (config)
- anonymized (config)
- case_sensitive (config)
Composition
Errors / when it fails
- Each forbidden word must have action 'block' or 'redact'
- Fails unless:
all(rule.action in ['block', 'redact'] for rule in forbidden if forbidden)
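The precondition above reads naturally as a Python check. The following is a minimal sketch, assuming each rule is a dict with an `action` key; the function name `forbidden_rules_valid` is hypothetical and not part of the platform.

```python
ALLOWED_ACTIONS = {"block", "redact"}


def forbidden_rules_valid(forbidden):
    """Return True when every forbidden rule carries an allowed action.

    An empty or missing list passes trivially, mirroring the `if forbidden`
    guard in the original expression.
    """
    return all(rule.get("action") in ALLOWED_ACTIONS for rule in (forbidden or []))
```

A configuration that fails this check would presumably surface as FILTER_WORDS_CONFIG_INVALID, though the source does not spell out that mapping.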
Validation rules
- No filter rules configured — modifier has no effect
Filter Words (filter-words)
Category: modifiers | Form: | Symbol: Fw
Filter forbidden and anonymize sensitive words in agent tool outputs
Filters forbidden words and anonymizes sensitive terms in agent tool outputs. Phase: response (processes output after execution). Evaluation order 50. Applies to actors only. Cascade behavior: union — inherited and local filter lists are merged (all forbidden words from all levels apply). Fail action: redact (replaces matched words rather than rejecting the entire response). Spec defines forbidden (blocked term rules), anonymized (words replaced with placeholders), and case_sensitive flag. Use filter-words for content safety in agent outputs; use validation for input schema checking. Common mistake: expecting filter-words to work on request input — it only runs on response output (phase: response).
Guide
Overview
A Filter Words modifier defines lists of prohibited or sensitive words and phrases that are scanned in actor responses before they are returned to callers. Matched content is redacted or replaced rather than causing a hard failure.
Why Filter Words Exists
- Content Safety: Prevent sensitive or offensive terms from appearing in responses
- Compliance: Redact regulated data patterns (e.g., PII, profanity) before delivery
- Brand Protection: Block competitor names or restricted terminology
- Layered Defense: Works alongside auth-policy and protection for defense in depth
Configuration
Basic Example
element_type: filter-words
slug: content-filter
name: Content Filter
spec:
  # Exact words or phrases to redact
  words:
    - "internal-codename"
    - "confidential"
  # Regex patterns to redact
  patterns:
    - "\\b\\d{4}-\\d{4}-\\d{4}-\\d{4}\\b"  # Credit card numbers
    - "\\b[A-Z]{2}\\d{6}\\b"  # Passport-style IDs
  # Replacement string (the Properties table lists "[FILTERED]" as the default)
  replacement: "[REDACTED]"
  # false means case-insensitive matching, which is the default
  case_sensitive: false
Category Presets
spec:
  # Enable built-in category presets
  presets:
    - pii           # Names, emails, phone numbers, SSNs
    - credit_cards  # Card number patterns
    - profanity     # Common profanity list
  replacement: "***"
Allow List
spec:
  words:
    - "password"
  # Never redact these even if they match a pattern
  allowlist:
    - "reset-password-guide"
Cascade Behavior
Filter word lists from all attached scopes are unioned — every word and pattern from every scope applies. There is no way for a child element to remove a word added by a parent scope.
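The union rule can be pictured with a short sketch. It assumes scopes are passed from outermost to innermost and that duplicates are collapsed case-insensitively; both points, and the helper name `merge_filter_lists`, are assumptions rather than documented behaviour.

```python
def merge_filter_lists(*scopes):
    """Union word lists from several scopes, keeping first-seen order.

    Nothing is ever removed: a child scope can only add to what it inherits.
    """
    merged = []
    seen = set()
    for words in scopes:
        for word in words:
            key = word.lower()  # assumption: de-duplicate case-insensitively
            if key not in seen:
                seen.add(key)
                merged.append(word)
    return merged
```

For example, merging a parent list `["confidential"]` with a child list `["internal-codename", "Confidential"]` yields both distinct terms, and passing an empty child list still leaves every parent term in force.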
Response Transformation
When matches are found, the response body is rewritten with replacements before delivery. The HTTP status code is not changed — the response succeeds but with redacted content. This is different from middleware modifiers that can abort a request.
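The rewrite-but-do-not-fail behaviour might look like the sketch below. The `Response` type and `redact_response` function are hypothetical stand-ins, matching is plain substring matching (an assumption; the platform may use word boundaries), and the default replacement follows the Properties table.

```python
import re
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Response:
    status: int
    body: str


def redact_response(response, words, replacement="[FILTERED]", case_sensitive=False):
    """Return a copy of the response with matched words replaced.

    Only the body changes; the status code is carried over untouched.
    """
    # Longest terms first so a longer phrase wins over a shorter one it contains.
    terms = sorted((w for w in words if w), key=len, reverse=True)
    if not terms:
        return response
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile("|".join(re.escape(t) for t in terms), flags)
    # A function replacement avoids interpreting backslashes in `replacement`.
    new_body = pattern.sub(lambda _match: replacement, response.body)
    return replace(response, body=new_body)
```

Calling `redact_response(Response(200, "The Confidential plan"), ["confidential"])` returns a response whose status is still 200 and whose body reads "The [FILTERED] plan".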
Files
- README.md - Documentation
- .triform/definition.yaml - Element type definition
- .triform/properties.yaml - Configurable properties
- .triform/contract.yaml - Bonds and capabilities
- .triform/ops.yaml - Operations
Runtime Behavior
| Property | Value |
|---|---|
| Cascade | union — all word lists from all scopes are combined |
| Eval Order | 50 |
| Phase | response |
| Fail Action | redact matched content (no HTTP error — response succeeds with replacements) |
| Applies To | actors |
Relationships
- Attaches to: circle
Capabilities
- word-blocking: Block tool output containing forbidden words
- word-redaction: Redact forbidden words from tool output
- anonymization: Replace sensitive words with consistent anonymous placeholders
Properties
| Property | Type | Default | Description |
|---|---|---|---|
| forbidden | array | [] | Forbidden words with block or redact actions |
| anonymized | array | [] | Words to anonymize with consistent placeholders ([ANON-1], [ANON-2], etc.) |
| case_sensitive | boolean | false | Whether word matching is case-sensitive |
| words | array | [] | Words or phrases to filter (used with action and replacement) |
| action | string | "redact" | Action when word is found: redact (replace), reject (block), or warn (log and continue) |
| replacement | string | "[FILTERED]" | Replacement string when masking words |
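To make "consistent placeholders" concrete, here is a minimal sketch of one plausible reading: the same term always maps to the same [ANON-N] within a piece of text. Numbering by first appearance in the text is an assumption, as is the function name `anonymize`.

```python
import re


def anonymize(text, anonymized, case_sensitive=False):
    """Replace each listed term with a stable [ANON-N] placeholder."""
    terms = sorted((t for t in anonymized if t), key=len, reverse=True)
    if not terms:
        return text
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile("|".join(re.escape(t) for t in terms), flags)
    placeholders = {}

    def substitute(match):
        found = match.group(0)
        key = found if case_sensitive else found.lower()
        if key not in placeholders:
            # Assumption: placeholders are numbered in order of first appearance.
            placeholders[key] = f"[ANON-{len(placeholders) + 1}]"
        return placeholders[key]

    return pattern.sub(substitute, text)
```

With the terms `["Alice", "Bob"]`, the text "Alice met Bob; alice left." becomes "[ANON-1] met [ANON-2]; [ANON-1] left.", so a reader can still tell that the first and third mentions refer to the same entity.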
Operations
attach
POST /ops/attach | Auth: Read
Attach this modifier to a target element
Attaches this modifier to a target element. The target_id must be a UUID of an existing element that supports this modifier type (check applies_to in definition.yaml). Priority controls evaluation order when multiple modifiers of the same type are attached — lower priority runs first. The attachment is stored in element_modifiers table. Cascade resolution runs at bond-time to merge this modifier into the target’s resolved config. Common mistake: attaching to an incompatible element type — check topology rules first.
delete
DELETE /ops/delete | Auth: Admin
Delete element (soft delete)
Soft delete — sets state to ‘deleted’ but retains the record. Cannot delete elements that have children (has_no_bond precondition) or active runs. Requires admin auth and confirmation.
detach
POST /ops/detach | Auth: Read
Detach this modifier from a target element
Removes this modifier from a target element. Requires the target_id. Pervasive modifiers (audit, policy) can only be detached at the level they were originally attached — inherited pervasive modifiers cannot be detached by child elements. After detach, cascade resolution re-runs to remove this modifier’s effect from the resolved config.
disable
POST /ops/disable | Auth: Admin
Disable element (hides and prevents use)
Idempotent — safe to call on already-disabled elements. Optionally pass a reason string. Disabled elements cannot be invoked or executed. Inverse of enable.
enable
POST /ops/enable | Auth: Admin
Enable element (makes usable and visible)
Idempotent — safe to call on already-enabled elements. Transitions element to ready/enabled state. Cannot enable deleted elements. Inverse of disable.
evaluate
POST /ops/evaluate | Auth: Read
Evaluate text against the filter — return matches, blocked status, and sanitized text
Pass {text: “…”} to check the text against the configured forbidden and anonymized word lists. Returns evaluation (“pass” or “fail” — fail when any forbidden word with action “block”/“reject” matched), the matched words, and a sanitized version of the text with forbidden words redacted and anonymized terms replaced by [ANON-N] placeholders. Used by automation pipelines to gate or scrub agent tool outputs before they reach the LLM.
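The pass/fail rule can be sketched locally, without calling the platform. This is an illustration of the described semantics only: the result keys `matches` and `sanitized` are assumed names, anonymized terms are left out for brevity, and matching is plain substring matching.

```python
import re

BLOCKING_ACTIONS = ("block", "reject")


def evaluate(text, forbidden, case_sensitive=False, replacement="[FILTERED]"):
    """Report matches, decide pass/fail, and return a redacted copy of the text."""
    flags = 0 if case_sensitive else re.IGNORECASE
    matches = []
    blocked = False
    sanitized = text
    for rule in forbidden:
        pattern = re.compile(re.escape(rule["word"]), flags)
        if not pattern.search(text):
            continue
        matches.append(rule["word"])
        # A single blocking match is enough to fail the whole evaluation.
        if rule["action"] in BLOCKING_ACTIONS:
            blocked = True
        sanitized = pattern.sub(lambda _match: replacement, sanitized)
    return {
        "evaluation": "fail" if blocked else "pass",
        "matches": matches,
        "sanitized": sanitized,
    }
```

A caller that gates on the result would stop on "fail" and otherwise forward the sanitized text.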
get
GET /ops/get | Auth: Read
Get element details
Element is already resolved by the routing layer — this returns the cached element, not a fresh DB query. Use the path /api/{circle}/{slug} to address elements.
get_attached_modifiers
GET /ops/attached/{target_id} | Auth: Read
Get all modifiers attached to a target element
Lists all modifiers attached to a specific target element, including modifier_id, type, subcategory, and priority. Useful for debugging cascade resolution or understanding which policies apply to an element before invoking it.
intention
GET /ops/intention | Auth: Read
Get element intention with full inheritance chain
Returns three levels: direct (this element’s intention), inherited (from category and root), and resolved (final merged intention). Useful for understanding an element’s purpose in context of its hierarchy.
list_attachments
GET /ops/targets | Auth: Read
List all elements this modifier is attached to
Returns all target elements where this modifier is currently applied. Shows target_id, target_type, priority, and cascade_policy.
readme_update
POST /ops/readme_update | Auth: Write
Update element README.md content
Creates or overwrites README.md in the element’s git repo. Commits to the draft branch. Content must be provided as a markdown string.
schema
GET /ops/schema | Auth: Read
Get element input/output schema (MCP tools/list compatible)
Returns type-level port schemas from the TypeRegistry — not instance-specific overrides. Includes direction (input/output), required flag, and JSON schema per port. Useful for understanding what data an element accepts and produces.
status
GET /ops/status | Auth: Read
Get current filter words configuration summary
Returns a summary of the filter configuration: forbidden_count (blocked words), anonymized_count (words replaced with placeholders), and case_sensitive flag. Use to verify the filter is configured correctly before attaching to an actor.
update
PATCH /ops/update | Auth: Write
Update element
Partial update — send only the fields you want to change.
spec, name, and intention are all independently optional. spec MUST be a JSON object when present; it is deep-merged into the existing spec by default. An empty {"spec":{}} preserves existing spec content but still records a new version (no-op for content, not for version state). To clear or replace the entire spec wholesale, send {"spec":{...},"deep":false}. List-typed spec fields use replace semantics (the patch list replaces the existing list, no array merging). Coordinates Git + DB writes. Slug cannot be changed after creation.
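The merge semantics described for spec can be sketched as follows. This illustrates the stated rules (nested objects merge, lists and scalars replace, `deep: false` swaps the whole spec); the function names are hypothetical and versioning is not modelled.

```python
def deep_merge(existing, patch):
    """Merge `patch` into a copy of `existing`; lists and scalars replace."""
    merged = dict(existing)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Lists are not concatenated: the patch list wins outright.
            merged[key] = value
    return merged


def apply_spec_patch(spec, patch, deep=True):
    """Apply a spec patch using deep-merge (default) or wholesale replacement."""
    return deep_merge(spec, patch) if deep else dict(patch)
```

One practical consequence: to add a single forbidden word through update, the patch has to carry the full forbidden list, because the existing list is replaced rather than extended.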
Error Codes
| Code | Class | Retryable | Description |
|---|---|---|---|
| FILTER_WORDS_BLOCKED | limit | no | Tool output blocked due to forbidden word |
| FILTER_WORDS_CONFIG_INVALID | validation | no | Invalid filter words configuration |
Lifecycle / runtime
Inherited from modifiers
Execution model: async
Observability
Defined for this element
Metrics
- evaluation_count
- block_count
- redaction_count
Events
- filter-words.evaluated
- filter-words.blocked
- filter-words.redacted
Pricing / cost
Platform default
Operation costs
- create: free
- update: free
- delete: free
- get: free
- list: free
- invoke: 10000 micro-AU
- tool_use: free
Set it up
- Forbidden Words (string): Words to block or redact. Each entry is an object: {word: 'string', action: 'block'|'redact'}. Simple strings are auto-wrapped as {word: value, action: 'redact'}.
- Anonymize (string): Words to replace with consistent placeholders ([ANON-1], [ANON-2], etc.)
- Case Sensitive (string): Whether word matching is case-sensitive (default: false)
- Words (string): List of words for word-based operations
- Action (string): Action to take: redact (replace), reject (block), or warn (log and continue)
- Replacement (string): Replacement string for redacted words
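The auto-wrapping of simple strings described for Forbidden Words amounts to a small normalization step. A minimal sketch, assuming entries arrive as either plain strings or dicts; `normalize_forbidden` is a hypothetical helper name.

```python
def normalize_forbidden(entries):
    """Turn a mixed list of strings and rule objects into rule objects only."""
    normalized = []
    for entry in entries:
        if isinstance(entry, str):
            # Bare strings default to the non-failing action.
            normalized.append({"word": entry, "action": "redact"})
        else:
            normalized.append(dict(entry))
    return normalized
```

So an input of `["confidential", {"word": "falcon", "action": "block"}]` ends up as two rule objects, with the bare string defaulting to redact.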