Contracts
Validated URL value objects for pyfetcher.
- Purpose:
Provide a small immutable wrapper around pydantic.HttpUrl with useful derived helpers for host, path, and query decomposition.
- Design:
URL is intentionally pure and contains no I/O behavior.
Computed properties remain deterministic and serialization-friendly.
The model is frozen so it behaves like a value object.
Examples
>>> url = URL("https://example.com/a/b/?x=1&x=2&y=")
>>> url.host
'example.com'
>>> url.path_segments
['a', 'b']
>>> url.query_params["x"]
['1', '2']
- class pyfetcher.contracts.url.URL(root=PydanticUndefined)
Validated HTTP/HTTPS URL with derived helpers.
Wraps pydantic.HttpUrl to provide computed decomposition of scheme, host, port, path segments, and query parameters as a frozen value object suitable for embedding in request models.
- Parameters:
root (RootModelRootType) – The raw URL string or HttpUrl instance to validate.
- Raises:
pydantic.ValidationError – If the value is not a valid HTTP/HTTPS URL.
Examples
>>> url = URL("https://example.com:8443/a/b/?x=1&x=2")
>>> url.host
'example.com'
>>> url.port
8443
- model_config = {'frozen': True}
Pydantic model configuration (a pydantic.config.ConfigDict); frozen=True makes instances immutable.
- property scheme: str
Return the URL scheme (e.g. 'https').
- Returns:
The scheme component of the URL.
Examples
>>> URL("https://example.com").scheme
'https'
- property host: str | None
Return the hostname.
- Returns:
The host if present, otherwise None.
Examples
>>> URL("https://example.com").host
'example.com'
- property port: int | None
Return the explicit port number.
- Returns:
The explicit port if present, otherwise None.
Examples
>>> URL("https://example.com:9443").port
9443
- property path: str | None
Return the path component.
- Returns:
The path if present, otherwise None.
Examples
>>> URL("https://example.com/a/b").path
'/a/b'
- property path_segments: list[str]
Return non-empty path segments.
- Returns:
A list of non-empty path segments split on '/'.
Examples
>>> URL("https://example.com/a/b/").path_segments
['a', 'b']
- property query: str | None
Return the raw query string.
- Returns:
The raw query string if present, otherwise None.
Examples
>>> URL("https://example.com?a=1&b=2").query
'a=1&b=2'
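The derived helpers above can be approximated with the standard library, which makes the decomposition rules easy to inspect. This is a sketch under assumptions, not pyfetcher's implementation: the hypothetical `decompose` helper splits the path on '/' and drops empty segments, and it parses the query with urllib.parse.parse_qs with blank values kept, so `y=` maps to `['']`.

```python
from urllib.parse import parse_qs, urlsplit


def decompose(raw: str) -> dict:
    """Hypothetical helper mirroring URL's derived properties with urllib.parse."""
    parts = urlsplit(raw)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,  # None when the URL has no host
        "port": parts.port,      # None unless a port is written explicitly
        "path": parts.path or None,
        # Keep only non-empty segments, so trailing or doubled slashes are ignored.
        "path_segments": [seg for seg in parts.path.split("/") if seg],
        "query": parts.query or None,
        # Repeated keys collect into lists; keep_blank_values keeps "y=" as [""].
        "query_params": parse_qs(parts.query, keep_blank_values=True),
    }


info = decompose("https://example.com:8443/a/b/?x=1&x=2&y=")
print(info["host"], info["port"], info["path_segments"], info["query_params"])
```

pydantic.HttpUrl applies its own normalization (for example, it adds a trailing slash to a bare host), so edge cases can differ from this stdlib approximation.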
Request models for pyfetcher.
- Purpose:
Provide transport-agnostic request contracts that can be consumed by fetch services and backend implementations.
- Design:
Requests are immutable and serializable.
URL validation is delegated to URL.
Policies are embedded so one request object is self-describing.
Examples
>>> request = FetchRequest(url="https://example.com")
>>> request.method
'GET'
- class pyfetcher.contracts.request.FetchRequest(*, url, method='GET', params=<factory>, headers=<factory>, data=None, json_data=None, backend='httpx', timeout=<factory>, retry=<factory>, pool=<factory>, stream=<factory>, allow_redirects=True, verify_ssl=True, http2=True)
Transport-agnostic fetch request.
Encapsulates everything needed to make an HTTP request: the target URL, method, headers, body, and all policy objects that control timeout, retry, pooling, and streaming behavior. The request is frozen and backend-agnostic so it can be serialized, queued, or handed to any transport.
- Parameters:
url (URL) – Target URL; raw strings are validated through URL.
method (RequestMethod) – HTTP method (automatically uppercased).
params (dict[str, str | int | float | bool]) – Query parameters to append to the URL.
headers (dict[str, str]) – Per-request headers (merged with provider headers).
data (bytes | str | None) – Optional raw request body (bytes or string).
json_data (dict[str, Any] | list[Any] | None) – Optional JSON request body (dict or list).
backend (BackendKind) – Preferred HTTP backend.
timeout (TimeoutPolicy) – Timeout policy controlling per-phase timeouts.
retry (RetryPolicy) – Retry policy controlling backoff and retryable status codes.
pool (PoolPolicy) – Pool policy controlling connection limits and concurrency.
stream (StreamPolicy) – Stream policy controlling chunk size and byte limits.
allow_redirects (bool) – Whether HTTP redirects should be followed.
verify_ssl (bool) – Whether TLS certificate verification is enabled.
http2 (bool) – Whether HTTP/2 is preferred where the backend supports it.
Examples
>>> FetchRequest(url="https://example.com").backend
'httpx'
- model_config = {'extra': 'forbid', 'frozen': True}
Pydantic model configuration (a pydantic.config.ConfigDict); extra='forbid' rejects unknown fields and frozen=True makes instances immutable.
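A backend that receives a FetchRequest still has to turn method, params, and headers into wire-level values. The sketch below shows one plausible way to do that. The `prepare` function and its `provider_headers` argument are hypothetical, and the case-insensitive merge in which per-request headers win over provider headers is an assumption about what "merged with provider headers" means.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def prepare(url: str, method: str, params: dict, headers: dict, provider_headers: dict):
    """Hypothetical request preparation for a transport backend."""
    method = method.upper()  # FetchRequest documents automatic uppercasing
    parts = urlsplit(url)
    extra = urlencode(params)  # non-string values are converted with str()
    # Append the new parameters after any query string already on the URL.
    query = "&".join(q for q in (parts.query, extra) if q)
    target = urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))
    # Assumption: header names compare case-insensitively and per-request values win.
    merged = {k.lower(): v for k, v in provider_headers.items()}
    merged.update({k.lower(): v for k, v in headers.items()})
    return method, target, merged


print(prepare(
    "https://example.com/search?x=1",
    "get",
    {"q": "py fetch", "page": 2},
    {"Accept": "application/json"},
    {"accept": "*/*", "User-Agent": "demo"},
))
```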
- class pyfetcher.contracts.request.BatchFetchRequest(*, requests, concurrency=None)
Batch request wrapper for multiple fetch operations.
Groups multiple FetchRequest objects for concurrent execution with an optional concurrency override that caps the number of in-flight requests.
- Parameters:
requests (list[FetchRequest]) – Request objects to execute concurrently.
concurrency (int | None) – Optional concurrency override; when None, the PoolPolicy concurrency limit applies.
Examples
>>> req = FetchRequest(url="https://example.com")
>>> batch = BatchFetchRequest(requests=[req])
>>> len(batch.requests)
1
- model_config = {'extra': 'forbid', 'frozen': True}
Pydantic model configuration (a pydantic.config.ConfigDict); extra='forbid' rejects unknown fields and frozen=True makes instances immutable.
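The concurrency cap on a batch can be honored with an asyncio.Semaphore. In the sketch below, `run_batch`, `fetch_one`, and `pool_concurrency` are hypothetical names. Falling back to the pool concurrency when `concurrency` is None follows the parameter description above. asyncio.gather returns results in argument order, which keeps them aligned with the input requests.

```python
import asyncio


async def run_batch(urls, fetch_one, concurrency=None, pool_concurrency=8):
    """Hypothetical executor: run fetch_one(url) for every URL with a cap on in-flight calls."""
    limit = concurrency if concurrency is not None else pool_concurrency
    gate = asyncio.Semaphore(limit)

    async def guarded(url):
        async with gate:  # at most `limit` coroutines hold the semaphore at once
            return await fetch_one(url)

    # gather returns results in the order of its arguments, i.e. input order.
    return await asyncio.gather(*(guarded(u) for u in urls))


async def demo():
    in_flight = peak = 0

    async def fake_fetch(url):
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stands in for network I/O
        in_flight -= 1
        return url.upper()

    results = await run_batch([f"u{i}" for i in range(10)], fake_fetch, concurrency=3)
    return results, peak


results, peak = asyncio.run(demo())
print(results[:3], peak)
```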
Response models for pyfetcher.
- Purpose:
Provide normalized response types independent of the underlying transport.
- Design:
Response objects are transport-agnostic.
Streaming chunks are modeled separately from full responses.
Batch responses preserve ordering and capture success/failure per request.
Examples
>>> response = FetchResponse(
... request_url="https://example.com/",
... final_url="https://example.com/",
... status_code=200,
... headers={},
... backend="httpx",
... elapsed_ms=10.0,
... )
>>> response.ok
True
- class pyfetcher.contracts.response.FetchResponse(*, request_url, final_url, status_code, headers, content_type=None, text=None, body=None, backend, elapsed_ms)
Normalized fetch response.
A transport-agnostic response model that captures the HTTP status, headers, body content, and timing information for a completed request. The ok computed property provides a quick success check.
- Parameters:
request_url (str) – Original request URL as a string.
final_url (str) – Final URL after any redirects.
status_code (int) – HTTP status code.
headers – Response headers.
content_type (str | None) – Response Content-Type header value, if present.
text (str | None) – Decoded text body when fully loaded.
body (bytes | None) – Raw bytes body when available.
backend (BackendKind) – Name of the backend that executed the request.
elapsed_ms (float) – Total elapsed time in milliseconds.
Examples
>>> response = FetchResponse(
... request_url="https://example.com/",
... final_url="https://example.com/",
... status_code=204,
... headers={},
... backend="httpx",
... elapsed_ms=1.0,
... )
>>> response.ok
True
- model_config = {'extra': 'forbid', 'frozen': True}
Pydantic model configuration (a pydantic.config.ConfigDict); extra='forbid' rejects unknown fields and frozen=True makes instances immutable.
- property ok: bool
Return whether the response indicates success.
- Returns:
True for 2xx and 3xx status codes, otherwise False.
Examples
>>> FetchResponse(
... request_url="https://example.com/",
... final_url="https://example.com/",
... status_code=200,
... headers={},
... backend="httpx",
... elapsed_ms=1.0,
... ).ok
True
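The success rule for ok is a half-open range check on the status code. The standalone helper below mirrors the documented behavior; the name `is_ok` is hypothetical.

```python
def is_ok(status_code: int) -> bool:
    """True for 2xx and 3xx status codes, mirroring FetchResponse.ok as documented."""
    return 200 <= status_code < 400


for code in (199, 200, 204, 304, 399, 404, 500):
    print(code, is_ok(code))
```

Because 3xx counts as success, a redirect that was not followed (allow_redirects=False on the request) still yields ok == True. Callers that need the final resource should check status_code as well.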
- class pyfetcher.contracts.response.StreamChunk(*, request_url, final_url, backend, index, data)
Single streamed response chunk.
Represents one chunk of a streaming HTTP response, carrying the raw bytes along with positional metadata for ordered reassembly.
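- Parameters:
request_url (str) – Original request URL.
final_url (str) – Final URL after any redirects.
backend (BackendKind) – Name of the backend that produced the chunk.
index (int) – Position of the chunk within the stream, used for ordered reassembly.
data (bytes) – Raw bytes carried by the chunk.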
Examples
>>> StreamChunk(
... request_url="https://example.com/",
... final_url="https://example.com/",
... backend="aiohttp",
... index=0,
... data=b"abc",
... ).index
0
- model_config = {'extra': 'forbid', 'frozen': True}
Pydantic model configuration (a pydantic.config.ConfigDict); extra='forbid' rejects unknown fields and frozen=True makes instances immutable.
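Each StreamChunk carries an index, so a consumer can reassemble the body even when chunks arrive or are stored out of order. The sketch uses a hypothetical `Chunk` NamedTuple in place of the pydantic model. It assumes indexes are zero-based and contiguous, as the example's index=0 suggests.

```python
from typing import NamedTuple


class Chunk(NamedTuple):
    """Stand-in for StreamChunk with only the fields reassembly needs."""
    index: int
    data: bytes


def reassemble(chunks) -> bytes:
    ordered = sorted(chunks, key=lambda c: c.index)
    # Assumption: indexes are zero-based and contiguous; a gap means data was lost.
    for expected, chunk in enumerate(ordered):
        if chunk.index != expected:
            raise ValueError(f"missing chunk {expected}")
    return b"".join(c.data for c in ordered)


print(reassemble([Chunk(1, b"def"), Chunk(0, b"abc")]))  # b'abcdef'
```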
- class pyfetcher.contracts.response.BatchItemResponse(*, request_url, ok, response=None, error=None)
Result of a single batch item.
Captures either a successful FetchResponse or an error message for one request within a batch execution.
- Parameters:
request_url (str) – Original request URL.
ok (bool) – Whether the item succeeded.
response (FetchResponse | None) – The fetch response on success.
error (str | None) – Error message string on failure.
Examples
>>> item = BatchItemResponse(
... request_url="https://example.com/",
... ok=False,
... error="boom",
... )
>>> item.ok
False
- model_config = {'extra': 'forbid', 'frozen': True}
Pydantic model configuration (a pydantic.config.ConfigDict); extra='forbid' rejects unknown fields and frozen=True makes instances immutable.
- class pyfetcher.contracts.response.BatchFetchResponse(*, items)
Response container for batch fetch execution.
Wraps the results of a BatchFetchRequest, preserving input order so callers can correlate responses to their original requests by index.
- Parameters:
items (list[BatchItemResponse]) – Batch item responses in input order.
Examples
>>> BatchFetchResponse(items=[]).items
[]
- model_config = {'extra': 'forbid', 'frozen': True}
Pydantic model configuration (a pydantic.config.ConfigDict); extra='forbid' rejects unknown fields and frozen=True makes instances immutable.
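One way to build per-item results without letting a single failure abort the whole batch is asyncio.gather(..., return_exceptions=True), followed by mapping each outcome to an ok/error record in input order. In the sketch below, the hypothetical `Item` NamedTuple stands in for BatchItemResponse. Using str(exc) as the error message is an assumption.

```python
import asyncio
from typing import Any, NamedTuple, Optional


class Item(NamedTuple):
    """Stand-in for BatchItemResponse."""
    request_url: str
    ok: bool
    response: Optional[Any] = None
    error: Optional[str] = None


async def collect(urls, fetch_one):
    # return_exceptions=True turns raised exceptions into result values.
    outcomes = await asyncio.gather(*(fetch_one(u) for u in urls), return_exceptions=True)
    items = []
    # gather preserves input order, so zip pairs each outcome with its URL.
    for url, outcome in zip(urls, outcomes):
        if isinstance(outcome, BaseException):
            items.append(Item(url, ok=False, error=str(outcome)))
        else:
            items.append(Item(url, ok=True, response=outcome))
    return items


async def flaky(url):
    if "bad" in url:
        raise RuntimeError("boom")
    return f"body of {url}"


items = asyncio.run(collect(["https://a.test/", "https://bad.test/"], flaky))
print([(i.request_url, i.ok, i.error) for i in items])
```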
Policy models for pyfetcher.
- Purpose:
Provide serializable policy objects that control retries, timeouts, connection pooling, and streaming behavior.
- Design:
Policies are explicit and reusable across transports.
Policies are serializable so they can be persisted or queued later.
Backend-specific conversion happens outside these models.
Examples
>>> retry = RetryPolicy(attempts=4)
>>> retry.attempts
4
- class pyfetcher.contracts.policy.RetryPolicy(*, attempts=3, wait_base_seconds=0.5, wait_max_seconds=8.0, retry_status_codes=<factory>, retry_on_connection_errors=True, reraise=True)
Retry policy shared by fetch services.
Controls how failed requests are retried using exponential backoff. Status codes that trigger retries are configurable, as is whether connection-level errors should be retried.
- Parameters:
attempts (int) – Total number of attempts including the first call.
wait_base_seconds (float) – Base exponential backoff delay in seconds.
wait_max_seconds (float) – Maximum delay between attempts in seconds.
retry_status_codes (set[int]) – HTTP status codes that should trigger retries.
retry_on_connection_errors (bool) – Whether connection errors should retry.
reraise (bool) – Whether the final failure should be re-raised to the caller.
Examples
>>> RetryPolicy(attempts=3).attempts
3
- model_config = {'extra': 'forbid', 'frozen': True}
Pydantic model configuration (a pydantic.config.ConfigDict); extra='forbid' rejects unknown fields and frozen=True makes instances immutable.
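Because attempts counts the first call, a policy allows at most attempts - 1 waits. The sketch below assumes a doubling schedule capped at wait_max_seconds, which is a common form of exponential backoff. pyfetcher's exact formula, including any jitter, is not stated here. `backoff_schedule` and `should_retry` are hypothetical helpers, and the status sets in the usage are examples, not the library's default retry_status_codes.

```python
def backoff_schedule(attempts=3, wait_base_seconds=0.5, wait_max_seconds=8.0):
    """Delays before attempts 2..attempts: doubling from the base, capped at the max."""
    return [min(wait_max_seconds, wait_base_seconds * 2 ** n) for n in range(attempts - 1)]


def should_retry(status_code, attempt, attempts, retry_status_codes):
    """Retry only retryable statuses and only while attempts remain (attempt is 1-based)."""
    return status_code in retry_status_codes and attempt < attempts


print(backoff_schedule())            # [0.5, 1.0]
print(backoff_schedule(attempts=7))  # [0.5, 1.0, 2.0, 4.0, 8.0, 8.0]
print(should_retry(503, attempt=1, attempts=3, retry_status_codes={429, 503}))  # True
```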
- class pyfetcher.contracts.policy.TimeoutPolicy(*, total_seconds=30.0, connect_seconds=10.0, read_seconds=30.0, write_seconds=30.0, pool_seconds=10.0)
Timeout policy shared by fetch services.
Provides granular timeout control for different phases of an HTTP request. The total_seconds value acts as an overall budget that caps the entire operation regardless of the per-phase values.
- Parameters:
total_seconds (float) – Overall timeout budget in seconds.
connect_seconds (float) – Maximum time to establish a TCP connection.
read_seconds (float) – Maximum time to receive the response body.
write_seconds (float) – Maximum time to send the request body.
pool_seconds (float) – Maximum time to acquire a connection from the pool.
Examples
>>> TimeoutPolicy().total_seconds
30.0
- model_config = {'extra': 'forbid', 'frozen': True}
Pydantic model configuration (a pydantic.config.ConfigDict); extra='forbid' rejects unknown fields and frozen=True makes instances immutable.
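The per-phase values are the kind of settings a backend's own timeout object takes. For example, httpx.Timeout accepts connect, read, write, and pool keyword arguments. The overall budget can be enforced independently by wrapping the whole operation in asyncio.wait_for. In the sketch below, `fetch_with_budget` and `slow_operation` are hypothetical stand-ins for a real backend call.

```python
import asyncio


async def fetch_with_budget(operation, total_seconds: float):
    """Run an awaitable, cancelling it if it exceeds the overall budget."""
    try:
        return await asyncio.wait_for(operation, timeout=total_seconds)
    except asyncio.TimeoutError:
        # Translate to the builtin TimeoutError with a message naming the budget.
        raise TimeoutError(f"request exceeded total budget of {total_seconds}s") from None


async def slow_operation(delay: float) -> str:
    await asyncio.sleep(delay)  # stands in for the connect, write, and read phases
    return "done"


print(asyncio.run(fetch_with_budget(slow_operation(0.01), total_seconds=1.0)))  # done
```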
- class pyfetcher.contracts.policy.PoolPolicy(*, max_connections=100, max_keepalive_connections=20, keepalive_expiry_seconds=10.0, max_connections_per_host=10, concurrency=8)
Connection pooling and concurrency policy.
Controls connection pool sizing and keepalive behavior for transport backends, as well as the concurrency limit used for async batch operations.
- Parameters:
max_connections (int) – Maximum total connections across all hosts.
max_keepalive_connections (int) – Maximum keepalive connections where supported.
keepalive_expiry_seconds (float) – Time-to-live for idle keepalive connections.
max_connections_per_host (int) – Maximum connections to a single host.
concurrency (int) – Maximum in-flight tasks for async batching.
Examples
>>> PoolPolicy(concurrency=8).concurrency
8
- model_config = {'extra': 'forbid', 'frozen': True}
Pydantic model configuration (a pydantic.config.ConfigDict); extra='forbid' rejects unknown fields and frozen=True makes instances immutable.
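max_connections and max_connections_per_host describe two nested limits. Some backends support these natively; for example, aiohttp's TCPConnector takes limit and limit_per_host. A transport without native per-host limits could approximate both with a global semaphore plus one semaphore per host. The `HostLimiter` class below is a hypothetical sketch of that approach.

```python
import asyncio
from collections import defaultdict


class HostLimiter:
    """Hypothetical two-level limiter: a global cap and a per-host cap."""

    def __init__(self, max_connections: int = 100, max_connections_per_host: int = 10):
        self._global = asyncio.Semaphore(max_connections)
        self._per_host_limit = max_connections_per_host
        self._per_host = defaultdict(lambda: asyncio.Semaphore(self._per_host_limit))

    async def run(self, host: str, operation):
        # Take the host slot first so tasks queued on a saturated host hold no global slot.
        async with self._per_host[host]:
            async with self._global:
                return await operation()


async def demo():
    # Build the limiter inside the running loop so its semaphores bind to it.
    limiter = HostLimiter(max_connections=4, max_connections_per_host=2)
    active = defaultdict(int)
    peak = defaultdict(int)

    def make_op(host):
        async def op():
            active[host] += 1
            peak[host] = max(peak[host], active[host])
            await asyncio.sleep(0.01)  # stands in for a request on a pooled connection
            active[host] -= 1
            return host
        return op

    hosts = ["a.test"] * 5 + ["b.test"] * 5
    await asyncio.gather(*(limiter.run(h, make_op(h)) for h in hosts))
    return dict(peak)


print(asyncio.run(demo()))  # per-host peaks never exceed 2
```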
- class pyfetcher.contracts.policy.StreamPolicy(*, chunk_size=65536, decode_text=False, max_bytes=None)
Streaming behavior policy.
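Controls chunk sizing and optional byte limits for streaming operations. The decode_text flag signals downstream consumers that text decoding is desired.
- Parameters:
chunk_size (int) – Size of each streamed chunk in bytes.
decode_text (bool) – Whether downstream consumers should decode chunks as text.
max_bytes (int | None) – Optional upper limit on the total number of bytes streamed.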
Examples
>>> StreamPolicy().chunk_size
65536
- model_config = {'extra': 'forbid', 'frozen': True}
Pydantic model configuration (a pydantic.config.ConfigDict); extra='forbid' rejects unknown fields and frozen=True makes instances immutable.
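The sketch below shows how chunk_size, max_bytes, and decode_text could interact when a body is sliced into chunks. The generator `iter_chunks` is hypothetical. Two behaviors are assumptions: it raises ValueError once max_bytes is exceeded instead of truncating, and it decodes as UTF-8 instead of using the response's declared charset. It uses codecs.getincrementaldecoder so that a multi-byte character split across two chunks still decodes correctly.

```python
import codecs


def iter_chunks(body: bytes, chunk_size: int = 65536, max_bytes=None, decode_text=False):
    """Yield slices of at most chunk_size bytes, enforcing an optional total byte cap."""
    # Assumption: UTF-8; a real consumer would take the charset from Content-Type.
    decoder = codecs.getincrementaldecoder("utf-8")() if decode_text else None
    seen = 0
    for start in range(0, len(body), chunk_size):
        piece = body[start:start + chunk_size]
        seen += len(piece)
        if max_bytes is not None and seen > max_bytes:
            raise ValueError(f"stream exceeded max_bytes={max_bytes}")
        # The incremental decoder buffers partial multi-byte sequences between chunks.
        yield decoder.decode(piece) if decoder else piece
    if decoder:
        tail = decoder.decode(b"", final=True)  # raises if a sequence is left incomplete
        if tail:
            yield tail


print(list(iter_chunks(b"abcdefg", chunk_size=3)))  # [b'abc', b'def', b'g']
```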
Shared resource models for pyfetcher.
- Purpose:
Provide lightweight reusable models for fetched pages and downloadable media that scraper and downloader layers can build on.
- Design:
Resource models are intentionally generic.
They reference URLs through the shared URL.
Scraper-specific models should extend or wrap these models rather than replace them in common cases.
Examples
>>> page = WebPage(url="https://example.com", title="Home")
>>> page.title
'Home'
- class pyfetcher.contracts.resource.WebResource(*, url, mime_type=None)
Generic web resource.
Base model for any resource identified by a URL with an optional MIME type. Scraper and downloader models extend this to add domain-specific fields.
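- Parameters:
url (URL) – Resource URL, validated through URL.
mime_type (str | None) – Optional MIME type of the resource.
Examples
>>> WebResource(url="https://example.com/image.png").url.host
'example.com'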
- model_config = {'extra': 'forbid', 'frozen': True}
Pydantic model configuration (a pydantic.config.ConfigDict); extra='forbid' rejects unknown fields and frozen=True makes instances immutable.
- class pyfetcher.contracts.resource.WebPage(*, url, mime_type=None, title=None, description=None)
Generic fetched web page.
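Extends WebResource with optional title and description fields suitable for representing a fetched HTML page.
- Parameters:
url (URL) – Page URL, validated through URL.
mime_type (str | None) – Optional MIME type of the page.
title (str | None) – Optional page title.
description (str | None) – Optional page description.
Examples
>>> WebPage(url="https://example.com", title="Home").title
'Home'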
- model_config = {'extra': 'forbid', 'frozen': True}
Pydantic model configuration (a pydantic.config.ConfigDict); extra='forbid' rejects unknown fields and frozen=True makes instances immutable.
- class pyfetcher.contracts.resource.MediaResource(*, url, mime_type=None, filename=None, content_length=None)
Generic downloadable media resource.
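Extends WebResource with filename and content length fields suitable for representing a downloadable binary resource.
- Parameters:
url (URL) – Media URL, validated through URL.
mime_type (str | None) – Optional MIME type of the media.
filename (str | None) – Optional file name for the download.
content_length (int | None) – Optional size of the resource in bytes.
Examples
>>> MediaResource(url="https://example.com/file.mp4", filename="file.mp4").filename
'file.mp4'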
- model_config = {'extra': 'forbid', 'frozen': True}
Pydantic model configuration (a pydantic.config.ConfigDict); extra='forbid' rejects unknown fields and frozen=True makes instances immutable.
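Downloader code often wants a filename and MIME type before any body arrives. The hypothetical `describe_media` helper below derives MediaResource-style fields from the URL using only the standard library. Guessing the type from the file extension with mimetypes is an assumption; a server's Content-Type header, when available, is more authoritative. content_length is left as None because only the response can supply it.

```python
import mimetypes
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def describe_media(url: str) -> dict:
    """Hypothetical helper producing MediaResource-style fields from a URL."""
    path = unquote(urlsplit(url).path)  # decode %20 and similar escapes
    filename = PurePosixPath(path).name or None  # None for "/" or an empty path
    # guess_type returns (type, encoding); only the type is needed here.
    mime_type, _encoding = mimetypes.guess_type(filename) if filename else (None, None)
    return {"url": url, "filename": filename, "mime_type": mime_type, "content_length": None}


print(describe_media("https://example.com/media/file%20one.png"))
```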