Rate Limiting

Rate limiter implementation for pyfetcher.

Purpose:

Provide configurable per-domain and global rate limiting using pyrate_limiter. The DomainRateLimiter maintains separate rate limit buckets per domain while optionally enforcing a global request rate.

Design:
  • Rate limits are defined via RateLimitPolicy, which specifies a requests-per-second rate and an optional burst allowance.

  • The limiter uses an in-memory bucket store by default.

  • Both synchronous and asynchronous acquire methods are provided.

  • Domain extraction is performed automatically from URLs.
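
The automatic domain extraction mentioned above can be pictured with the standard library alone. This is only a sketch: `extract_domain` is a hypothetical helper, and whether pyfetcher normalises case or strips ports the same way is an assumption, not something this page states.

```python
from urllib.parse import urlsplit


def extract_domain(url: str) -> str:
    """Return the lowercased host of ``url``, or "" if it has none.

    Hypothetical helper for illustration; not part of pyfetcher.
    """
    # urlsplit().hostname drops any port and userinfo and lowercases the host.
    return urlsplit(url).hostname or ""


print(extract_domain("https://Example.com:8443/page1"))
print(extract_domain("https://api.example.com/data"))
```

Keying buckets on the bare host means URLs that differ only in path, query, or port would share one bucket under this sketch.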

Examples

>>> policy = RateLimitPolicy(requests_per_second=10.0)
>>> limiter = DomainRateLimiter(default_policy=policy)
>>> wait = limiter.acquire("https://example.com/page1")

class pyfetcher.ratelimit.limiter.RateLimitPolicy(requests_per_second=10.0, burst=1, per_domain=True)[source]

Rate limiting policy configuration.

Defines the rate at which requests may be made, with an optional burst allowance for short traffic spikes.

Parameters:
  • requests_per_second (float) – Maximum sustained request rate. Set to 0 to disable rate limiting.

  • burst (int) – Maximum number of requests that can be made in a burst before throttling kicks in. Defaults to 1 (no burst).

  • per_domain (bool) – Whether this limit applies per-domain (True) or globally (False).

Examples

>>> policy = RateLimitPolicy(requests_per_second=5.0, burst=10)
>>> policy.interval
0.2

property interval: float

Minimum interval between requests in seconds.

Returns:

The reciprocal of requests_per_second, or 0.0 if rate limiting is disabled.

Examples

>>> RateLimitPolicy(requests_per_second=2.0).interval
0.5
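
The relationship between the rate and the interval can be written out as a small stand-in class. `PolicySketch` is hypothetical and only mirrors the documented fields; treating negative rates as disabled is an added assumption, since the documentation only mentions 0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicySketch:
    """Hypothetical stand-in for RateLimitPolicy; not the library class."""

    requests_per_second: float = 10.0
    burst: int = 1
    per_domain: bool = True

    @property
    def interval(self) -> float:
        # A rate of 0 means "disabled", so no spacing applies.
        # Negative rates are treated the same way (assumption).
        if self.requests_per_second <= 0:
            return 0.0
        return 1.0 / self.requests_per_second


print(PolicySketch(requests_per_second=2.0).interval)
print(PolicySketch(requests_per_second=0).interval)
```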

class pyfetcher.ratelimit.limiter.DomainRateLimiter(*, default_policy=None, domain_policies=None, global_policy=None)[source]

Per-domain rate limiter with optional global rate limiting.

Maintains separate token buckets for each domain encountered, throttling requests to stay within the configured rate limits. An optional global limiter can enforce an overall request rate across all domains.

Parameters:
  • default_policy (RateLimitPolicy | None) – The default rate limit policy for domains without a specific override.

  • domain_policies (dict[str, RateLimitPolicy] | None) – Optional mapping of domain names to specific RateLimitPolicy instances.

  • global_policy (RateLimitPolicy | None) – Optional global rate limit applied across all domains in addition to per-domain limits.
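
How a per-domain override might take precedence over the default can be sketched as a plain dictionary lookup. `resolve_policy` is a hypothetical function, and the exact-match lookup (no subdomain or wildcard handling) is an assumption about behaviour this page does not describe.

```python
from typing import Dict, Optional, TypeVar

P = TypeVar("P")


def resolve_policy(
    domain: str,
    default_policy: Optional[P],
    domain_policies: Optional[Dict[str, P]] = None,
) -> Optional[P]:
    """Pick the policy for ``domain``: exact override first, then the default.

    Hypothetical helper; pyfetcher's own lookup rules may differ.
    """
    if domain_policies and domain in domain_policies:
        return domain_policies[domain]
    return default_policy


# Policies are represented here by plain requests-per-second numbers.
overrides = {"api.example.com": 1.0}
print(resolve_policy("api.example.com", 5.0, overrides))
print(resolve_policy("example.com", 5.0, overrides))
```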

Examples

>>> limiter = DomainRateLimiter(
...     default_policy=RateLimitPolicy(requests_per_second=5.0),
...     domain_policies={"api.example.com": RateLimitPolicy(requests_per_second=1.0)},
... )
>>> wait = limiter.acquire("https://api.example.com/data")

acquire(url)[source]

Acquire permission to make a request, blocking if rate-limited.

Checks both the per-domain rate limit and the optional global rate limit. Blocks the calling thread until a token is available.

Parameters:

url (str) – The target URL (domain is extracted automatically).

Returns:

Total time in seconds spent waiting for rate limit tokens.

Return type:

float

Examples

>>> limiter = DomainRateLimiter()
>>> wait = limiter.acquire("https://example.com/page")
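
To make the blocking behaviour and the returned wait time concrete, here is a minimal single-bucket token bucket. `TokenBucketSketch` is a hypothetical class written for this illustration; pyfetcher delegates to pyrate_limiter, whose internals are not reproduced here.

```python
import time


class TokenBucketSketch:
    """Hypothetical single-bucket limiter; not pyfetcher's implementation."""

    def __init__(self, rate: float, burst: int = 1) -> None:
        self.rate = rate              # tokens refilled per second
        self.capacity = float(burst)  # maximum number of stored tokens
        self.tokens = float(burst)    # start with a full bucket
        self.updated = time.monotonic()

    def acquire(self) -> float:
        """Block until one token is available; return the seconds slept."""
        if self.rate <= 0:
            return 0.0  # rate limiting disabled
        waited = 0.0
        while True:
            now = time.monotonic()
            # Refill in proportion to elapsed time, capped at capacity.
            elapsed = now - self.updated
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
            self.updated = now
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return waited
            # Sleep just long enough to cover the missing fraction of a token.
            delay = (1.0 - self.tokens) / self.rate
            time.sleep(delay)
            waited += delay


bucket = TokenBucketSketch(rate=50.0, burst=2)
waits = [bucket.acquire() for _ in range(3)]
print(waits)
```

With a burst of 2, the first two calls return immediately and the third sleeps for roughly one refill interval.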

async aacquire(url)[source]

Acquire permission to make a request asynchronously.

Checks both the per-domain rate limit and the optional global rate limit. Yields control while waiting for tokens.

Parameters:

url (str) – The target URL (domain is extracted automatically).

Returns:

Total time in seconds spent waiting for rate limit tokens.

Return type:

float

Examples

>>> import asyncio
>>> limiter = DomainRateLimiter()
>>> wait = asyncio.run(limiter.aacquire("https://example.com/page"))
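
The difference from the blocking variant is that waiting is done with `asyncio.sleep`, so other tasks keep running. The sketch below shows that idea with a hypothetical `AsyncSpacingSketch` class that enforces a fixed interval between grants; it is a simplified stand-in, not the library's implementation.

```python
import asyncio
import time
from typing import List


class AsyncSpacingSketch:
    """Hypothetical async limiter enforcing a minimum interval between grants."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self._next_slot = 0.0        # earliest monotonic time of the next grant
        self._lock = asyncio.Lock()  # serialises slot reservation across tasks

    async def aacquire(self) -> float:
        async with self._lock:
            now = time.monotonic()
            wait = max(0.0, self._next_slot - now)
            # Reserve the slot before sleeping so concurrent callers queue behind it.
            self._next_slot = max(now, self._next_slot) + self.interval
        if wait:
            await asyncio.sleep(wait)  # yields to the event loop instead of blocking
        return wait


async def main() -> List[float]:
    limiter = AsyncSpacingSketch(interval=0.05)
    # Three concurrent callers are granted slots one interval apart.
    return list(await asyncio.gather(*(limiter.aacquire() for _ in range(3))))


waits = asyncio.run(main())
print(waits)
```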

reset(domain=None)[source]

Reset rate limit state.

Parameters:

domain (str | None) – If provided, reset only the bucket for this domain. If None, reset all domain buckets.

Return type:

None

Examples

>>> limiter = DomainRateLimiter()
>>> limiter.reset()
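
The two reset modes can be sketched over a plain dictionary of per-domain state. `BucketRegistrySketch` and its `touch` method are hypothetical, and silently ignoring an unknown domain is an assumption rather than documented behaviour.

```python
from typing import Dict, Optional


class BucketRegistrySketch:
    """Hypothetical per-domain state holder; not pyfetcher's implementation."""

    def __init__(self) -> None:
        self.buckets: Dict[str, int] = {}

    def touch(self, domain: str) -> None:
        # Stand-in for "a request was made": create or update the domain's state.
        self.buckets[domain] = self.buckets.get(domain, 0) + 1

    def reset(self, domain: Optional[str] = None) -> None:
        if domain is None:
            self.buckets.clear()            # forget every domain
        else:
            self.buckets.pop(domain, None)  # forget one domain; unknown names are ignored


registry = BucketRegistrySketch()
registry.touch("example.com")
registry.touch("api.example.com")
registry.reset("example.com")
print(sorted(registry.buckets))
```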