Rate Limit
A throttling modifier you attach to any element to cap how fast it can be called — the cascade middleware enforces the limit on every incoming request, returns HTTP 429 when a key runs over, and resolves the strictest limit when inherited and local rules conflict.
Working with it
Selecting a Rate Limit reveals its settings in the properties panel; it has no dedicated full-screen workbench.
How it appears
The same element type rendered as a definition, a circle instance, and a live workspace card.
When to use / not
When to use
- Protecting an element from abuse or traffic spikes — cap requests-per-window so a runaway caller can't overwhelm it.
- Guarding expensive operations (LLM calls, file uploads, image generation) where each request carries real cost.
- Enforcing fair-usage or tier-based quotas per user, org, or API key across everything attached below it.
- Showing callers their remaining quota before they hit a 429, via the live `status` operation keyed on user/IP.
When not to use
- Deciding *who* may call an element at all — that is authentication/authorization; use auth-policy, which evaluates first (order 50) before rate-limit (order 100).
- Metering spend or enforcing a wallet balance — rate-limit caps request frequency, not cost; billing is handled by the AU ledger, not this modifier.
- Smoothing or retrying work that overflows — rate-limit rejects with 429; queue the overflow with a queue element instead of expecting the limiter to buffer it.
Topology
Attaches to another element as a modifier, shaping that element's behaviour rather than running on its own.
Properties
limit (integer) - Simple format: max requests per window (use with window_seconds)
window_seconds (integer) - Simple format: time window in seconds (use with limit)
limits (array) - Rate limit rules (array format). Prefer simple format (limit + window_seconds) for single rules.
key (object) - Rate limit key configuration
scope (object) - Rate limit scope
response (object) - Response when rate limited
exemptions (array) - Exempted identities
Capabilities
Defined for this element
- Evaluate
- Storage
- Observe
Operations
- attach (POST)
- delete (DELETE)
- detach (POST)
- disable (POST)
- enable (POST)
- evaluate (POST)
- get (GET)
- get_attached_modifiers (GET)
- intention (GET)
- list_attachments (GET)
- readme_update (POST)
- reset (POST)
- schema (GET)
- status (GET)
- update (PATCH)
Ports
Inputs
- limits (config)
- burst (config)
- response (config)
Errors / when it fails
- Rate limit by header requires key.header_name
- At least one rate limit rule must be defined
- Fails unless: `len(limits) > 0`
Validation rules
- IP-based rate limiting may affect users behind NAT/proxies
- No exemptions configured - all traffic is rate limited
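The errors and warnings listed above can be sketched as a small validator. This is an illustration only, not the platform's implementation: the function name `validate_spec` is hypothetical, and so are the assumptions that a header- or IP-keyed limit is marked by a `key.type` field and that the simple format counts as one rule. Only `limits`, `limit`, `window_seconds`, `key.header_name`, and `exemptions` come from this page.

```python
def validate_spec(spec: dict) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for a rate-limit spec dict."""
    errors, warnings = [], []

    # ASSUMPTION: the simple format (limit + window_seconds) counts as one rule.
    rules = list(spec.get("limits") or [])
    if "limit" in spec and "window_seconds" in spec:
        rules.append({"requests": spec["limit"],
                      "window_seconds": spec["window_seconds"]})
    if len(rules) == 0:
        errors.append("At least one rate limit rule must be defined")

    key = spec.get("key") or {}
    # ASSUMPTION: the key object carries a "type" field such as "header" or "ip".
    if key.get("type") == "header" and not key.get("header_name"):
        errors.append("Rate limit by header requires key.header_name")
    if key.get("type") == "ip":
        warnings.append("IP-based rate limiting may affect users behind NAT/proxies")

    if not spec.get("exemptions"):
        warnings.append("No exemptions configured - all traffic is rate limited")
    return errors, warnings
```

Errors would block the configuration, while warnings are advisory and leave it usable.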
Rate Limit (rate-limit)
Category: modifiers | Form: | Symbol: Rl
Control request rates and protect against abuse
Limits request throughput on attached elements. Cascade strategy: restrictive. When inherited and local rate-limits conflict, the lower (more restrictive) requests_per_minute wins. Evaluation order 100 (after auth-policy at 50). Applies to all element types. Fails with HTTP 429. Spec accepts two formats: array format `{limits: [{requests: N, window_seconds: N, burst: N}]}` or simple format `{limit: N, window_seconds: N}`. Both are normalized to requests_per_minute during cascade resolution. Use the `reset` operation to clear counters. Use `status` to check remaining quota for a key. Note: direct `evaluate` calls are stateless preview checks; actual rate limiting enforcement happens at the cascade middleware layer on incoming requests to attached elements. Common mistake: setting window_seconds to 0 (this would cause a division by zero, so normalization defaults to multiplying limit by 60).
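The normalization and the restrictive cascade described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's code: the function names are hypothetical, the formula `requests * 60 / window_seconds` is inferred from the phrase "normalized to requests_per_minute", and a missing window is assumed to mean 60 seconds.

```python
def to_requests_per_minute(rule: dict) -> float:
    """Normalize either spec format to requests per minute."""
    # Array-format rules use "requests"; the simple format uses "limit".
    requests = rule.get("requests", rule.get("limit"))
    window = rule.get("window_seconds", 60)  # ASSUMPTION: default window is 60s
    if not window:
        # window_seconds == 0: per the note above, limit is multiplied by 60.
        return float(requests * 60)
    return requests * 60 / window


def resolve_restrictive(inherited: dict, local: dict) -> dict:
    """Restrictive cascade: the rule with the lower requests_per_minute wins."""
    return min(inherited, local, key=to_requests_per_minute)
```

For example, an inherited `{limit: 1000, window_seconds: 60}` (1000 per minute) and a local `{requests: 10, window_seconds: 1}` (600 per minute) resolve to the local rule.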
Guide
Overview
A Rate Limit resource defines request throttling rules to protect services from abuse, ensure fair usage, and enforce billing tier limits. Rate limits can be applied globally, per-endpoint, per-user, or per-organization.
Why Rate Limits Exist
- Protection: Prevent DDoS and abuse
- Fair Usage: Ensure equitable resource distribution
- Billing Enforcement: Enforce tier-based usage limits
- Cost Control: Limit expensive operations (LLM calls, etc.)
- Stability: Prevent cascading failures from traffic spikes
Directory Structure
api-limits/
├── README.md
├── .triform/
│   ├── triform.yaml    # kind: config/rate-limit
│   ├── contract.yaml
│   ├── spec.yaml
│   └── schema.json
└── examples/
    └── tiered-limits.yaml
Creating a Rate Limit
$ triform create config/rate-limit api-limits
Configuration
Basic Rate Limit
kind: config/rate-limit
slug: api-limits
spec:
  # Default limit for all endpoints
  default:
    requests: 1000
    window: "1m"
    burst: 50
Per-Endpoint Limits
spec:
  default:
    requests: 1000
    window: "1m"
  endpoints:
    # Expensive LLM endpoints
    "/api/v1/llm/*":
      requests: 100
      window: "1m"
      cost_weight: 10  # Counts as 10 requests for quota
    # File uploads
    "/api/v1/files/upload":
      requests: 50
      window: "1h"
      max_size_mb: 100
    # Public read endpoints (more generous)
    "/api/v1/public/*":
      requests: 10000
      window: "1m"
Tier-Based Limits
spec:
  default:
    requests: 100
    window: "1m"
  tiers:
    free:
      requests: 100
      window: "1m"
      daily_cap: 1000
    pro:
      requests: 1000
      window: "1m"
      daily_cap: 50000
    enterprise:
      requests: 10000
      window: "1m"
      daily_cap: null  # Unlimited
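A tier lookup over the configuration above might look like the sketch below. The helper names `limits_for_tier` and `within_daily_cap` are hypothetical, and treating an unknown tier as the `default` rule (with no daily cap) is an assumption, not documented behaviour.

```python
# Values copied from the tier configuration above.
TIERS = {
    "free": {"requests": 100, "window": "1m", "daily_cap": 1000},
    "pro": {"requests": 1000, "window": "1m", "daily_cap": 50000},
    "enterprise": {"requests": 10000, "window": "1m", "daily_cap": None},
}
DEFAULT = {"requests": 100, "window": "1m"}


def limits_for_tier(tier: str) -> dict:
    """Return the rule for a tier, falling back to the default rule."""
    return TIERS.get(tier, DEFAULT)


def within_daily_cap(tier: str, used_today: int) -> bool:
    """A daily_cap of None (YAML null) means unlimited."""
    cap = limits_for_tier(tier).get("daily_cap")
    return cap is None or used_today < cap
```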
User and Org Limits
spec:
  # Per-user limits
  per_user:
    requests: 100
    window: "1m"
  # Per-organization limits (shared across all org members)
  per_org:
    requests: 5000
    window: "1m"
  # Per-API-key limits
  per_api_key:
    requests: 1000
    window: "1m"
Burst Handling
spec:
  default:
    requests: 100
    window: "1m"
  # Allow short bursts above limit
  burst:
    size: 50         # Extra requests allowed
    refill_rate: 10  # Tokens/second added back
  # Token bucket algorithm
  algorithm: "token-bucket"
Cost-Weighted Limits
Some operations should count more toward the limit:
spec:
  default:
    requests: 1000
    window: "1m"
  cost_weights:
    # LLM completions cost more
    "/api/v1/llm/complete":
      weight: 10
      # Also consider token count
      dynamic_weight:
        header: "X-Token-Count"
        multiplier: 0.001  # 1000 tokens = 1 extra unit
    # Image generation is expensive
    "/api/v1/images/generate":
      weight: 50
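The arithmetic implied by the weights above can be sketched as follows. This is an assumption-laden illustration: the function name `request_cost` is hypothetical, unweighted endpoints are assumed to cost one unit, paths are matched exactly (no wildcard handling), and the dynamic weight is assumed to be added to the static weight, as the "1 extra unit" comment suggests.

```python
def request_cost(path: str, headers: dict, cost_weights: dict) -> float:
    """Return how many quota units one request to `path` consumes."""
    rule = cost_weights.get(path)
    if rule is None:
        return 1.0  # ASSUMPTION: endpoints without a weight count as one request
    cost = float(rule.get("weight", 1))
    dynamic = rule.get("dynamic_weight")
    if dynamic:
        # Header value scaled by the multiplier, e.g. 2000 tokens * 0.001 = 2 units.
        tokens = float(headers.get(dynamic["header"], 0))
        cost += tokens * dynamic["multiplier"]
    return cost
```

With the configuration above, an LLM completion reporting `X-Token-Count: 2000` would cost 10 + 2 = 12 units under these assumptions.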
Response Headers
Rate limit info included in responses:
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1704631200
X-RateLimit-Policy: "1000;w=60"
Retry-After: 42
Exceeded Behavior
spec:
  default:
    requests: 100
    window: "1m"
  on_exceed:
    # HTTP status code
    status: 429
    # Response body
    body:
      error: "rate_limit_exceeded"
      message: "Too many requests. Please retry after {retry_after} seconds."
      retry_after: "{seconds_until_reset}"
    # Optional: queue instead of reject
    queue:
      enabled: false
      max_wait_seconds: 30
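How the placeholders in `on_exceed.body` might be expanded is sketched below. The function name `build_exceeded_response` is hypothetical, and the use of Python `str.format` for the `{retry_after}` and `{seconds_until_reset}` placeholders is an assumption about the templating, not documented behaviour.

```python
def build_exceeded_response(on_exceed: dict, seconds_until_reset: int):
    """Return (status, headers, body) for a rejected request."""
    body = dict(on_exceed["body"])
    # Expand "{seconds_until_reset}" first, then reuse the value in the message.
    retry_after = int(
        str(body["retry_after"]).format(seconds_until_reset=seconds_until_reset)
    )
    body["retry_after"] = retry_after
    body["message"] = body["message"].format(retry_after=retry_after)
    headers = {"Retry-After": str(retry_after)}
    return on_exceed["status"], headers, body
```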
Rate Limit Strategies
Sliding Window (Default)
More accurate but slightly more expensive:
spec:
  algorithm: "sliding-window"
  # Considers requests in rolling time window
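For intuition, a sliding window can be sketched as a log of request timestamps. This is a generic textbook variant (sliding-window log), not necessarily what the platform runs; the class name and the explicit `now` argument are choices made for this example.

```python
from collections import deque


class SlidingWindowLimiter:
    """Sliding-window log: keep timestamps of accepted requests in the window."""

    def __init__(self, requests: int, window_seconds: float):
        self.requests = requests
        self.window = window_seconds
        self.hits = deque()

    def allow(self, now: float) -> bool:
        # Drop timestamps that have left the rolling window.
        while self.hits and self.hits[0] <= now - self.window:
            self.hits.popleft()
        if len(self.hits) < self.requests:
            self.hits.append(now)
            return True
        return False
```

Storing one timestamp per accepted request is what makes this approach more accurate but more expensive than a single counter.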
Fixed Window
Simpler, resets at window boundary:
spec:
  algorithm: "fixed-window"
  # Resets every minute on the minute
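A fixed window needs only one counter per window, as in this generic sketch (class name and explicit `now` argument are choices made for this example, not platform API).

```python
class FixedWindowLimiter:
    """Fixed window: one counter that resets at each window boundary."""

    def __init__(self, requests: int, window_seconds: float):
        self.requests = requests
        self.window = window_seconds
        self.current_window = None
        self.count = 0

    def allow(self, now: float) -> bool:
        # Windows are aligned to multiples of window_seconds.
        window_index = int(now // self.window)
        if window_index != self.current_window:
            self.current_window = window_index
            self.count = 0
        if self.count < self.requests:
            self.count += 1
            return True
        return False
```

The trade-off is that a caller can spend a full quota just before a boundary and another full quota just after it.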
Token Bucket
Best for bursty traffic:
spec:
  algorithm: "token-bucket"
  bucket:
    capacity: 100
    refill_rate: 10  # tokens/second
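The `capacity` and `refill_rate` settings map onto the classic token bucket, sketched generically below. The class name, the `cost` parameter, and the explicit `now` argument are choices made for this example, not platform API.

```python
class TokenBucket:
    """Token bucket: requests spend tokens; tokens refill at a steady rate."""

    def __init__(self, capacity: float, refill_rate: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # start full, so an initial burst is allowed
        self.last = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill for the elapsed time, never exceeding capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A full bucket absorbs a burst of up to `capacity` requests, after which throughput settles at `refill_rate` per second.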
Leaky Bucket
Smooth output rate:
spec:
  algorithm: "leaky-bucket"
  bucket:
    capacity: 100
    leak_rate: 10  # requests/second processed
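A leaky bucket can be sketched in its "meter" form: each request raises the level, the level drains at `leak_rate`, and requests that would overflow `capacity` are rejected. This is one common variant chosen for brevity; the platform may instead hold requests and release them at the leak rate. The names are hypothetical.

```python
class LeakyBucket:
    """Leaky bucket (meter variant): level drains steadily, overflow is rejected."""

    def __init__(self, capacity: float, leak_rate: float, now: float = 0.0):
        self.capacity = capacity
        self.leak_rate = leak_rate  # requests drained per second
        self.level = 0.0
        self.last = now

    def allow(self, now: float) -> bool:
        # Drain the bucket for the elapsed time, never below empty.
        elapsed = max(0.0, now - self.last)
        self.level = max(0.0, self.level - elapsed * self.leak_rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False
```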
SDK Usage
Checking Limits Programmatically
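import time

from triform import rate_limits

# Check before making a request.
# current_user and make_request() are placeholders supplied by your application.
limit_info = rate_limits.check("api-limits", user=current_user)
if limit_info.remaining > 0:
    make_request()
else:
    # Back off for the reported interval (assumed to be in seconds)
    time.sleep(limit_info.retry_after)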
Custom Keys
# Rate limit by custom key (e.g., IP address)
rate_limits.check("api-limits", key=f"ip:{request.ip}")
# Rate limit by resource
rate_limits.check("api-limits", key=f"resource:{resource_id}")
Monitoring
Rate limits emit metrics:
- rate_limit_requests_total - Total requests checked
- rate_limit_exceeded_total - Requests that exceeded limit
- rate_limit_remaining - Current remaining quota
- rate_limit_queue_size - Requests waiting in queue
Runtime Behavior
| Property | Value |
|---|---|
| Cascade | restrictive — strictest limit wins across scopes |
| Eval Order | 100 |
| Phase | request |
| Fail Action | HTTP 429 (Too Many Requests) with Retry-After header |
| Applies To | actors, frontend |
Files in This Resource
- README.md - Documentation
- .triform/triform.yaml - Metadata
- .triform/contract.yaml - Capabilities
- .triform/spec.yaml - Configuration
Capabilities
- sliding-window: Sliding window rate limiting
- multi-key: Rate limit by IP, user, API key
- burst: Burst allowance
- headers: Rate limit headers (X-RateLimit-*)
Properties
| Property | Type | Default | Description |
|---|---|---|---|
| limit | integer | - | Simple format: max requests per window (use with window_seconds) |
| window_seconds | integer | - | Simple format: time window in seconds (use with limit) |
| limits | array | [{"burst":20,"requests":100,"window_seconds":60}] | Rate limit rules (array format). Prefer simple format (limit + window_seconds) for single rules. |
| key | object | - | Rate limit key configuration |
| scope | object | - | Rate limit scope |
| response | object | - | Response when rate limited |
| exemptions | array | [] | Exempted identities |
Operations
attach
POST /ops/attach | Auth: Read
Attach this modifier to a target element
Attaches this modifier to a target element. The target_id must be a UUID of an existing element that supports this modifier type (check applies_to in definition.yaml). Priority controls evaluation order when multiple modifiers of the same type are attached — lower priority runs first. The attachment is stored in element_modifiers table. Cascade resolution runs at bond-time to merge this modifier into the target’s resolved config. Common mistake: attaching to an incompatible element type — check topology rules first.
delete
DELETE /ops/delete | Auth: Admin
Delete element (soft delete)
Soft delete — sets state to ‘deleted’ but retains the record. Cannot delete elements that have children (has_no_bond precondition) or active runs. Requires admin auth and confirmation.
detach
POST /ops/detach | Auth: Read
Detach this modifier from a target element
Removes this modifier from a target element. Requires the target_id. Pervasive modifiers (audit, policy) can only be detached at the level they were originally attached — inherited pervasive modifiers cannot be detached by child elements. After detach, cascade resolution re-runs to remove this modifier’s effect from the resolved config.
disable
POST /ops/disable | Auth: Admin
Disable element (hides and prevents use)
Idempotent — safe to call on already-disabled elements. Optionally pass a reason string. Disabled elements cannot be invoked or executed. Inverse of enable.
enable
POST /ops/enable | Auth: Admin
Enable element (makes usable and visible)
Idempotent — safe to call on already-enabled elements. Transitions element to ready/enabled state. Cannot enable deleted elements. Inverse of disable.
evaluate
POST /ops/evaluate | Auth: Read
Preview rate-limit evaluation (stateless — does NOT affect actual counters)
IMPORTANT: This is a PREVIEW-ONLY operation. It performs a stateless check against the current rate-limit config (given the supplied context) and returns what the decision WOULD be if this request hit the real middleware. It does NOT increment counters, consume quota, or affect the live rate-limit state in any way. Two requests in quick succession will both see the same `remaining` value. Actual enforcement happens in the cascade middleware layer on incoming requests to attached elements, NOT here. If this rate-limit is not yet attached to any target, calling `evaluate` will still succeed (stateless preview), but nothing is actually being throttled in production. The response will include a `_note` field when `delivered_to == 0`, warning that the modifier is unattached. To test real enforcement end-to-end:
- POST /ops/attach with target_id to actually wire the modifier to an element
- Deploy / reload the target element
- Send real requests to the target element’s operations
- Observe 429 responses when limits are exceeded
- Use `status` to inspect live counters, `reset` to clear them
Common confusion: users think `evaluate` consumes their quota. It does not. Users also think `evaluate` reflects live counter state. It does not: it previews the decision based on the provided context, not the current counter.
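The difference between a stateless preview and live enforcement can be illustrated with a toy model. Everything here is hypothetical: `LiveCounters` stands in for the middleware's counter store, `evaluate_preview` for the `evaluate` operation, and the `current_count` context field is invented for this example.

```python
class LiveCounters:
    """Toy stand-in for the counters the cascade middleware maintains."""

    def __init__(self, limit: int):
        self.limit = limit
        self.used = {}

    def enforce(self, key: str) -> dict:
        """What enforcement does: consume quota, or reject when exhausted."""
        used = self.used.get(key, 0)
        if used >= self.limit:
            return {"allowed": False, "remaining": 0}
        self.used[key] = used + 1
        return {"allowed": True, "remaining": self.limit - used - 1}

    def status(self, key: str) -> dict:
        """Read the live counter without changing it."""
        return {"remaining": max(0, self.limit - self.used.get(key, 0))}


def evaluate_preview(limit: int, context: dict) -> dict:
    """Stateless preview: decide from the supplied context only."""
    used = context.get("current_count", 0)  # hypothetical context field
    return {"allowed": used < limit, "remaining": max(0, limit - used)}
```

Calling `evaluate_preview` twice with the same context returns the same answer, while each `enforce` call lowers what `status` reports.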
get
GET /ops/get | Auth: Read
Get element details
Element is already resolved by the routing layer — this returns the cached element, not a fresh DB query. Use the path /api/{circle}/{slug} to address elements.
get_attached_modifiers
GET /ops/attached/{target_id} | Auth: Read
Get all modifiers attached to a target element
Lists all modifiers attached to a specific target element, including modifier_id, type, subcategory, and priority. Useful for debugging cascade resolution or understanding which policies apply to an element before invoking it.
intention
GET /ops/intention | Auth: Read
Get element intention with full inheritance chain
Returns three levels: direct (this element’s intention), inherited (from category and root), and resolved (final merged intention). Useful for understanding an element’s purpose in context of its hierarchy.
list_attachments
GET /ops/targets | Auth: Read
List all elements this modifier is attached to
Returns all target elements where this modifier is currently applied. Shows target_id, target_type, priority, and cascade_policy.
readme_update
POST /ops/readme_update | Auth: Write
Update element README.md content
Creates or overwrites README.md in the element’s git repo. Commits to the draft branch. Content must be provided as a markdown string.
reset
POST /ops/reset | Auth: Admin
Reset rate limit counter for a key
Clears rate limit counters. Pass key to reset a specific user/IP, or omit to reset all. Requires admin auth. Counters are ephemeral (in-memory/Redis) so this is non-destructive. Use when a user was incorrectly rate-limited or during testing.
schema
GET /ops/schema | Auth: Read
Get element input/output schema (MCP tools/list compatible)
Returns type-level port schemas from the TypeRegistry — not instance-specific overrides. Includes direction (input/output), required flag, and JSON schema per port. Useful for understanding what data an element accepts and produces.
status
GET /ops/status | Auth: Read
Get current rate limit status for a key (LIVE counter state)
Returns LIVE remaining requests, limit, window duration, and reset time for a given key. Pass ?key=<user_id|ip> as query param. Unlike `evaluate` (which is a stateless preview), `status` reflects the actual current counter state stored in the rate-limit backend, the same counter that the cascade middleware increments on incoming requests. Limit rules use ‘requests’ (not ‘max_requests’) and ‘window_seconds’ fields. Useful for showing users their remaining quota before they hit 429, or for debugging rate-limit state in production. Counters are ephemeral (in-memory/NATS KV) and reset when the window expires. Use `reset` to manually clear.
update
PATCH /ops/update | Auth: Write
Update element
Partial update: send only the fields you want to change. `spec`, `name`, and `intention` are all independently optional. `spec` MUST be a JSON object when present; it is deep-merged into the existing spec by default. An empty `{"spec":{}}` preserves existing spec content but still records a new version (no-op for content, not for version state). To clear/replace the entire spec wholesale send `{"spec":{...},"deep":false}`. List-typed spec fields use replace semantics (the patch list replaces the existing list, no array merging). Coordinates Git + DB writes. Slug cannot be changed after creation.
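The merge semantics described above (deep merge for objects, replace for lists, wholesale replace with `deep: false`) can be sketched as follows. `apply_spec_patch` is a hypothetical helper for illustration, not the server's implementation.

```python
import copy


def apply_spec_patch(existing: dict, patch: dict, deep: bool = True) -> dict:
    """Return the spec that results from applying `patch` to `existing`."""
    if not deep:
        # deep=false: the patch replaces the whole spec.
        return copy.deepcopy(patch)
    merged = copy.deepcopy(existing)
    for field, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(field), dict):
            # Nested objects are merged key by key.
            merged[field] = apply_spec_patch(merged[field], value)
        else:
            # Scalars and lists use replace semantics (no array merging).
            merged[field] = copy.deepcopy(value)
    return merged
```

Under this sketch, patching only `key.header_name` keeps the other `key` fields, while patching `limits` swaps the entire rule list.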
Error Codes
| Code | Class | Retryable | Description |
|---|---|---|---|
| RATE_LIMIT_EXCEEDED | limit | yes | Rate limit exceeded |
| RATE_LIMIT_CONFIG_INVALID | validation | no | Invalid rate limit configuration |
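A client might use the Retryable column to drive its retry logic, as in this sketch. The helper names, the attempt cap, and the exponential backoff defaults are assumptions for illustration; only the two error codes and their retryable flags come from the table.

```python
# Retryable flags copied from the error code table.
RETRYABLE = {
    "RATE_LIMIT_EXCEEDED": True,
    "RATE_LIMIT_CONFIG_INVALID": False,
}


def should_retry(error_code: str, attempt: int, max_attempts: int = 3) -> bool:
    """Retry only retryable codes, and only up to max_attempts."""
    return RETRYABLE.get(error_code, False) and attempt < max_attempts


def backoff_seconds(attempt: int, retry_after=None, base: float = 1.0, cap: float = 60.0) -> float:
    """Prefer the server's Retry-After value; otherwise back off exponentially."""
    if retry_after is not None:
        return float(retry_after)
    return min(cap, base * 2 ** attempt)
```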
Lifecycle / runtime
Defined for this element
Execution model: sync
Observability
Defined for this element
Metrics
- evaluation_count
- rejection_count
Events
- rate-limit.evaluated
- rate-limit.rejected
Pricing / cost
Platform default
Operation costs
- create: free
- update: free
- delete: free
- get: free
- list: free
- invoke: 10000 micro-AU
- tool_use: free
Set it up
- Rate Limits (string)
- Add rate limit rules