Adaptive async rate limiting for Python

Closed-loop feedback control for API concurrency. When APIs push back, gentlify slows down. When pressure eases, it speeds up.

$ pip install gentlify

Everything you need, nothing you don’t

Built-in primitives that work together — no extra libraries required.

Adaptive Concurrency

Dynamic concurrency limits that decelerate on failures and cautiously recover after a cooling period of sustained success.
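This decelerate-fast, recover-slow behavior is the classic AIMD (additive-increase, multiplicative-decrease) pattern. A minimal sketch of the idea — the class and parameter names here are illustrative, not gentlify's internals:

```python
class AdaptiveLimit:
    """Illustrative AIMD controller: halve the concurrency limit on
    failure, add one slot back only after a streak of successes."""

    def __init__(self, initial: int = 8, floor: int = 1, ceiling: int = 64,
                 recovery_streak: int = 10) -> None:
        self.limit = initial
        self.floor = floor
        self.ceiling = ceiling
        self.recovery_streak = recovery_streak  # successes required before growing
        self._successes = 0

    def on_failure(self) -> None:
        # Multiplicative decrease: back off quickly when the API pushes back.
        self.limit = max(self.floor, self.limit // 2)
        self._successes = 0

    def on_success(self) -> None:
        # Additive increase: recover one slot only after sustained success.
        self._successes += 1
        if self._successes >= self.recovery_streak:
            self.limit = min(self.ceiling, self.limit + 1)
            self._successes = 0
```

The asymmetry is deliberate: failures are treated as a strong signal, while recovery is earned slowly, so a flapping upstream never oscillates the limit back up.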

Dispatch Interval + Jitter

Enforces minimum time gaps between requests with stochastic jitter to prevent thundering-herd bursts.

Token-Aware Budgeting

Tracks per-window resource consumption — LLM tokens, API credits, bytes — independently of request-count limits.

Circuit Breaker

Hard stop after sustained failures with automatic half-open probing for safe recovery.

Built-in Retry

Configurable retry with exponential backoff and jitter — retries happen inside the throttled slot so concurrency accounting stays correct.
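Exponential backoff with "full jitter" looks like the following sketch — a generic illustration of the technique, with hypothetical names, not gentlify's built-in retry:

```python
import asyncio
import random


async def retry_with_backoff(func, *, attempts: int = 4, base_delay: float = 0.5,
                             factor: float = 2.0, max_delay: float = 30.0):
    """Illustrative retry loop: exponential backoff where each wait is a
    random draw between zero and the current (capped) delay ceiling."""
    for attempt in range(attempts):
        try:
            return await func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the last error
            cap = min(max_delay, base_delay * factor ** attempt)
            # Full jitter: uniform in [0, cap] rather than sleeping cap exactly.
            await asyncio.sleep(random.uniform(0.0, cap))
```

Running the retry inside the throttled slot, as the blurb notes, means a request that retries three times still occupies exactly one concurrency slot the whole time.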

Zero Dependencies

Pure Python standard library. No runtime dependencies. Ships with py.typed and passes mypy --strict.

Progress & Observability

Real-time snapshots with ETA, structured event callbacks, and standard logging integration.

Graceful Shutdown

Drain in-flight requests on shutdown — no dropped work, no hard stops.
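The drain pattern in asyncio terms: flip a closing flag so no new work is accepted, then await every task still in flight. A minimal sketch of the pattern (the `Drainer` class is illustrative, not gentlify's API):

```python
import asyncio


class Drainer:
    """Illustrative drain-on-shutdown helper: stop accepting new work,
    then wait for everything in flight to finish."""

    def __init__(self) -> None:
        self._in_flight: set[asyncio.Task] = set()
        self._closing = False

    def submit(self, coro) -> asyncio.Task:
        if self._closing:
            raise RuntimeError("shutting down; no new work accepted")
        task = asyncio.ensure_future(coro)
        self._in_flight.add(task)
        # Tasks remove themselves from the in-flight set when done.
        task.add_done_callback(self._in_flight.discard)
        return task

    async def drain(self) -> None:
        self._closing = True
        if self._in_flight:
            await asyncio.gather(*self._in_flight)
```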

Quick Start

from gentlify import Throttle

throttle = Throttle(max_concurrency=5)

# Simple — just pass a callable
result = await throttle.execute(lambda slot: call_api(item))

# With custom logic — token recording, result inspection
async def my_task(slot):
    result = await call_llm(prompt)
    slot.record_tokens(result.usage.total_tokens)
    return result.text

text = await throttle.execute(my_task)