Adaptive async rate limiting for Python

Closed-loop feedback control for API concurrency. When APIs push back, gentlify slows down. When pressure eases, it speeds up.

$ pip install gentlify

Everything you need, nothing you don’t

Built-in primitives that work together — no extra libraries required.

Adaptive Concurrency

Dynamic concurrency limits that decelerate on failures and cautiously recover after a cooling period of sustained success.
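This decelerate-fast, recover-slow behavior is the classic AIMD (additive-increase, multiplicative-decrease) pattern. A minimal sketch of the idea — the class and parameter names here are illustrative, not gentlify's internals:

```python
class AdaptiveLimit:
    """Illustrative AIMD controller: halve the concurrency limit on
    failure, add one slot back only after a streak of successes."""

    def __init__(self, initial: int = 8, floor: int = 1, ceiling: int = 64,
                 recovery_streak: int = 10) -> None:
        self.limit = initial
        self.floor = floor
        self.ceiling = ceiling
        self.recovery_streak = recovery_streak  # successes required before growing
        self._successes = 0

    def on_failure(self) -> None:
        # Multiplicative decrease: back off quickly when the API pushes back.
        self.limit = max(self.floor, self.limit // 2)
        self._successes = 0

    def on_success(self) -> None:
        # Additive increase: recover one slot only after sustained success.
        self._successes += 1
        if self._successes >= self.recovery_streak:
            self.limit = min(self.ceiling, self.limit + 1)
            self._successes = 0
```

The asymmetry is deliberate: failures are treated as a strong signal, while recovery is earned slowly, so a flapping upstream never oscillates the limit back up.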

Dispatch Interval + Jitter

Enforces minimum time gaps between requests with stochastic jitter to prevent thundering-herd bursts.

Token-Aware Budgeting

Tracks per-window resource consumption — LLM tokens, API credits, bytes — independently of request-count limits.

Circuit Breaker

Hard stop after sustained failures with automatic half-open probing for safe recovery.

Built-in Retry

Configurable retry with exponential backoff and jitter — retries happen inside the throttled slot so concurrency accounting stays correct.
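Exponential backoff with "full jitter" looks like the following sketch — a generic illustration of the technique, with hypothetical names, not gentlify's built-in retry:

```python
import asyncio
import random


async def retry_with_backoff(func, *, attempts: int = 4, base_delay: float = 0.5,
                             factor: float = 2.0, max_delay: float = 30.0):
    """Illustrative retry loop: exponential backoff where each wait is a
    random draw between zero and the current (capped) delay ceiling."""
    for attempt in range(attempts):
        try:
            return await func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the last error
            cap = min(max_delay, base_delay * factor ** attempt)
            # Full jitter: uniform in [0, cap] rather than sleeping cap exactly.
            await asyncio.sleep(random.uniform(0.0, cap))
```

Running the retry inside the throttled slot, as the blurb notes, means a request that retries three times still occupies exactly one concurrency slot the whole time.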

Zero Dependencies

Pure Python standard library. No runtime dependencies. Ships with py.typed and passes mypy --strict.

Progress & Observability

Real-time snapshots with ETA, structured event callbacks, and standard logging integration.

Graceful Shutdown

Drain in-flight requests on shutdown — no dropped work, no hard stops.
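The drain pattern in asyncio terms: flip a closing flag so no new work is accepted, then await every task still in flight. A minimal sketch of the pattern (the `Drainer` class is illustrative, not gentlify's API):

```python
import asyncio


class Drainer:
    """Illustrative drain-on-shutdown helper: stop accepting new work,
    then wait for everything in flight to finish."""

    def __init__(self) -> None:
        self._in_flight: set[asyncio.Task] = set()
        self._closing = False

    def submit(self, coro) -> asyncio.Task:
        if self._closing:
            raise RuntimeError("shutting down; no new work accepted")
        task = asyncio.ensure_future(coro)
        self._in_flight.add(task)
        # Tasks remove themselves from the in-flight set when done.
        task.add_done_callback(self._in_flight.discard)
        return task

    async def drain(self) -> None:
        self._closing = True
        if self._in_flight:
            await asyncio.gather(*self._in_flight)
```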

Quick Start

from gentlify import Throttle

throttle = Throttle(max_concurrency=5)

# Simple — just pass a callable
result = await throttle.execute(lambda slot: call_api(item))

# With custom logic — token recording, result inspection
async def my_task(slot):
    result = await call_llm(prompt)
    slot.record_tokens(result.usage.total_tokens)
    return result.text

text = await throttle.execute(my_task)