haive.core.models.llm.rate_limiting_mixin¶
Rate limiting mixin for LLM configurations.
This module provides a mixin class that adds rate limiting capabilities to LLM configurations, allowing for controlled request rates to prevent API throttling and manage costs.
Classes¶
RateLimitingMixin | Mixin class that adds rate limiting configuration to LLM models.
Module Contents¶
- class haive.core.models.llm.rate_limiting_mixin.RateLimitingMixin¶
Mixin class that adds rate limiting configuration to LLM models.
This mixin provides configuration for rate limiting when calling LLM APIs, including request limits, token limits, and time windows. It integrates with LangChain's ChatRateLimiter for actual enforcement (see the sketch after the attribute list).
- requests_per_second¶
Maximum number of requests per second
- tokens_per_second¶
Maximum number of tokens per second (if supported)
- tokens_per_minute¶
Maximum number of tokens per minute (if supported)
- max_retries¶
Maximum number of retries for rate-limited requests
- retry_delay¶
Base delay between retries in seconds
- check_every_n_seconds¶
How often to check rate limits, in seconds
- burst_size¶
Maximum burst size for rate limiting
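The attribute names above mirror the parameters of LangChain's in-memory rate limiter, which suggests how the mixin builds its limiter under the hood. Below is a minimal sketch, assuming the mixin delegates to `langchain_core.rate_limiters.InMemoryRateLimiter` and that `burst_size` maps to its `max_bucket_size` parameter; the actual wiring (including the ChatRateLimiter integration mentioned above) may differ.

```python
# Sketch: constructing a LangChain limiter from the mixin's fields.
# Assumption: burst_size corresponds to max_bucket_size; not confirmed here.
from langchain_core.rate_limiters import InMemoryRateLimiter

limiter = InMemoryRateLimiter(
    requests_per_second=2.0,    # mixin: requests_per_second
    check_every_n_seconds=0.1,  # mixin: check_every_n_seconds
    max_bucket_size=5,          # mixin: burst_size (assumed mapping)
)
```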
- apply_rate_limiting(llm)¶
Apply rate limiting to an LLM instance.
- Parameters:
llm (Any) – The LLM instance to apply rate limiting to
- Returns:
The LLM instance wrapped with rate limiting, or the original instance if rate limiting is not configured
- Return type:
Any
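A usage sketch follows. The `ThrottledConfig` subclass and its keyword construction are illustrative (assuming the mixin accepts its fields as keyword arguments, e.g., as a Pydantic model); only `apply_rate_limiting()` comes from this page.

```python
# Hypothetical usage of RateLimitingMixin.apply_rate_limiting().
# ThrottledConfig and the ChatOpenAI model choice are illustrative only.
from langchain_openai import ChatOpenAI

from haive.core.models.llm.rate_limiting_mixin import RateLimitingMixin


class ThrottledConfig(RateLimitingMixin):
    """Illustrative config that carries the rate-limiting fields."""


config = ThrottledConfig(requests_per_second=2.0, burst_size=5)

llm = ChatOpenAI(model="gpt-4o-mini")
limited_llm = config.apply_rate_limiting(llm)  # returns llm unchanged if unconfigured
response = limited_llm.invoke("Hello!")
```

Because `apply_rate_limiting` returns the original instance when no limits are configured, callers can apply it unconditionally.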