Rate limiting
Available in Imposter 5 and later
Rate limiting lets you control the number of concurrent requests to your mock endpoints. This is useful for simulating real-world API behaviour, testing system resilience, and reproducing overload conditions during load testing.
Overview
The rate limiter tracks active concurrent requests per resource and returns a different response when configurable thresholds are exceeded. It supports:
- Concurrent request limiting: control how many requests can be processed simultaneously
- Multiple threshold tiers: define different responses for different concurrency levels
- Progressive throttling: add delays, return errors, or apply custom responses
- Store backend flexibility: works with in-memory, Redis, and DynamoDB stores
- Distributed operation: consistent behaviour across multiple server instances (with shared stores)
Configuration
Rate limiting is configured per resource using the concurrency property. Each concurrency limit defines a threshold and the response to return when that threshold is exceeded.
Basic rate limiting
plugin: rest
resources:
- path: /api/users
method: GET
concurrency:
- threshold: 5
response:
statusCode: 429
content: "Too many concurrent requests"
headers:
Retry-After: "10"
response:
statusCode: 200
content: "User data"
In this example:
- Normal requests (1-5 concurrent) return the standard 200 response
- When more than 5 requests are active, additional requests receive a 429 "Too Many Requests" response
Multiple thresholds
You can define multiple concurrency thresholds with different responses for progressive rate limiting:
plugin: rest
resources:
- path: /api/heavy-operation
method: POST
concurrency:
# First tier: add a delay to slow down requests
- threshold: 3
response:
statusCode: 200
content: "Request throttled"
delay:
exact: 2000 # 2 second delay
headers:
X-Throttled: "true"
# Second tier: return 503 Service Unavailable
- threshold: 7
response:
statusCode: 503
content: "Service temporarily overloaded"
headers:
Retry-After: "30"
# Third tier: return 429 Rate Limit Exceeded
- threshold: 10
response:
statusCode: 429
content: "Rate limit exceeded"
response:
statusCode: 202
content: "Operation accepted"
With this configuration:
- 1-3 concurrent requests: normal 202 response
- 4-7 concurrent requests: 200 response with a 2-second delay and throttling header
- 8-10 concurrent requests: 503 Service Unavailable
- 11+ concurrent requests: 429 Rate Limit Exceeded
SOAP rate limiting
Rate limiting works with SOAP endpoints using the operation name:
plugin: soap
resources:
- operation: getUserDetails
concurrency:
- threshold: 3
response:
statusCode: 503
content: |
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<soap:Fault>
<faultcode>Server</faultcode>
<faultstring>Service temporarily overloaded</faultstring>
</soap:Fault>
</soap:Body>
</soap:Envelope>
response:
statusCode: 200
content: |
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<getUserDetailsResponse>
<user>
<id>123</id>
<name>John Doe</name>
</user>
</getUserDetailsResponse>
</soap:Body>
</soap:Envelope>
Response configuration
Rate limit responses support the same features as regular responses.
Status codes and content
Custom headers
concurrency:
- threshold: 5
response:
statusCode: 503
headers:
Retry-After: "30"
X-RateLimit-Limit: "5"
X-RateLimit-Remaining: "0"
content: "Service temporarily unavailable"
Delays
Rate limit responses can inject delays in the same way as performance simulation.
concurrency:
- threshold: 3
response:
delay:
exact: 1000 # fixed 1-second delay
statusCode: 200
content: "Throttled response"
- threshold: 5
response:
delay: # random delay between 0.5-2 seconds
min: 500
max: 2000
statusCode: 200
content: "Heavily throttled"
Templated responses
Rate limit responses support response templates:
concurrency:
- threshold: 5
response:
statusCode: 429
template: true
content: |
{
"error": "Rate limit exceeded",
"timestamp": "${datetime.now.iso8601_datetime}",
"retry_after": 30
}
headers:
Content-Type: application/json
Store backends
Rate limit counters are held in a store, so the behaviour depends on the store backend you choose.
In-memory store (default)
No configuration is needed; this is the default.
Characteristics:
- Fast performance
- Per-instance rate limiting
- Suitable for single-instance deployments
- Counters are lost on restart
Redis store
Characteristics:
- Shared rate limiting across multiple instances
- Persistent across restarts
- High performance with atomic operations
- Suitable for distributed deployments
DynamoDB store
export IMPOSTER_STORE_DRIVER=store-dynamodb
export IMPOSTER_DYNAMODB_TABLE=imposter-store
imposter up
Characteristics:
- Shared rate limiting across multiple instances
- Fully managed and scalable
- Automatic TTL support
- Suitable for cloud deployments
Environment variables
Configure rate limiting behaviour with these environment variables:
| Variable | Purpose | Default | Example |
|---|---|---|---|
IMPOSTER_RATE_LIMITER_TTL |
TTL for rate limit entries (seconds) | 300 (5 minutes) |
IMPOSTER_RATE_LIMITER_TTL=600 |
IMPOSTER_STORE_DRIVER |
Store backend selection | in-memory | IMPOSTER_STORE_DRIVER=store-redis |
REDIS_ADDR |
Redis server address (if using Redis) | - | REDIS_ADDR=localhost:6379 |
IMPOSTER_DYNAMODB_TABLE |
DynamoDB table name (if using DynamoDB) | - | IMPOSTER_DYNAMODB_TABLE=imposter-store |
How it works
Threshold logic
Rate limiting uses "greater than" logic to determine when thresholds are exceeded:
- A
thresholdof3means rate limiting applies when there are more than 3 concurrent requests (i.e. 4 or more) - Multiple thresholds are evaluated in order, with the highest matching threshold taking precedence
Request processing flow
- Request arrives at the configured endpoint
- Counter incremented atomically for the resource
- Threshold evaluation checks whether any limits are exceeded
- If rate limited: the counter is rolled back and the rate limit response is returned
- If not rate limited: the request is processed normally
- After the response: the counter is decremented automatically
Resource identification
Each resource is identified by a unique key that includes:
- HTTP method (GET, POST, etc.) or
*for SOAP - Resource path (REST) or operation name (SOAP)
- Hash of matching criteria (if the resource has specific headers, query parameters, etc.)
This ensures that resources with the same path but different matching criteria get separate rate limit counters.
Error handling
The rate limiter follows a "fail-open" approach:
- Store failures allow requests to proceed normally (rate limiting is disabled)
- Network issues do not block request processing
- Counter errors are logged but do not impact request flow
This ensures that rate limiting enhances your testing without compromising availability.
Load testing
Rate limiting integrates well with load testing tools such as hey:
# Test concurrent requests
hey -n 100 -c 20 -m GET http://localhost:8080/api/users
# Test a different endpoint
hey -n 50 -c 10 -m POST http://localhost:8080/api/heavy-operation
Monitor the output for the status code distribution to see rate limiting in action:
Status code distribution:
[200] 75 responses # normal responses
[429] 20 responses # rate limited
[503] 5 responses # service overloaded
Best practices
1. Progressive throttling
Use multiple thresholds to handle load gracefully:
concurrency:
- threshold: 5 # warn with delays
response:
delay: { exact: 1000 }
headers: { X-Throttled: "true" }
- threshold: 10 # soft rejection
response:
statusCode: 503
headers: { Retry-After: "10" }
- threshold: 20 # hard rejection
response:
statusCode: 429
2. Appropriate HTTP status codes
- 429 Too Many Requests: for hard rate limits
- 503 Service Unavailable: for temporary overload
- 200 OK with delays: for throttling without errors
3. Helpful headers
Include headers to help clients understand the rate limiting:
headers:
Retry-After: "30"
X-RateLimit-Limit: "10"
X-RateLimit-Remaining: "0"
X-RateLimit-Reset: "${datetime.now.plus_seconds(30).epoch_second}"
4. Realistic thresholds
Set thresholds based on your actual API capacity:
- Database APIs: lower thresholds (2-5 concurrent)
- Cache APIs: higher thresholds (20-50 concurrent)
- File uploads: very low thresholds (1-2 concurrent)
5. Store selection
Choose the appropriate store backend for your testing scenario:
- In-memory: single-instance testing
- Redis: multi-instance testing with shared state
- DynamoDB: cloud-based distributed testing
Availability
Rate limiting is available in Imposter 5.x onwards. It is not available in 4.x.