Contracts

Validated URL value objects for pyfetcher.

Purpose:

Provide a small immutable wrapper around pydantic.HttpUrl with useful derived helpers for host, path, and query decomposition.

Design:

URL is intentionally pure and contains no I/O behavior.
Computed properties remain deterministic and serialization-friendly.
The model is frozen so it behaves like a value object.
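
As a rough illustration of what "frozen value object" means in practice, the sketch below uses a stdlib frozen dataclass rather than pyfetcher's pydantic model. `FrozenUrl` is a hypothetical stand-in, not part of the library; it only shows the general properties such objects share: equality by content, hashability, and rejection of attribute assignment.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class FrozenUrl:
    """Hypothetical stand-in for a frozen value object (not pyfetcher's URL)."""

    value: str


a = FrozenUrl("https://example.com/")
b = FrozenUrl("https://example.com/")

# Value objects compare by content and can serve as dict keys or set members.
same = a == b
unique = {a, b}

# Assigning to a field of a frozen instance raises instead of mutating it.
try:
    a.value = "https://other.example/"
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Here `same` is true, `unique` holds a single element, and `mutated` stays false.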

Examples

>>> url = URL("https://example.com/a/b/?x=1&x=2&y=")
>>> url.host
'example.com'
>>> url.path_segments
['a', 'b']
>>> url.query_params["x"]
['1', '2']

class pyfetcher.contracts.url.URL(root=PydanticUndefined)

Validated HTTP/HTTPS URL with derived helpers.

Wraps pydantic.HttpUrl to provide computed decomposition of scheme, host, port, path segments, and query parameters as a frozen value object suitable for embedding in request models.

Parameters: root (RootModelRootType) – The raw URL string or HttpUrl instance to validate.
Raises: pydantic.ValidationError – If the value is not a valid HTTP/HTTPS URL.

Examples

>>> url = URL("https://example.com:8443/a/b/?x=1&x=2")
>>> url.host
'example.com'
>>> url.port
8443

model_config = {'frozen': True}: Model configuration; a dictionary conforming to pydantic.config.ConfigDict.

property scheme: str

Return the URL scheme (e.g. 'https').

Returns: The scheme component of the URL.

Examples

>>> URL("https://example.com").scheme
'https'

property host: str | None

Return the hostname.

Returns: The host if present, otherwise None.

Examples

>>> URL("https://example.com").host
'example.com'

property port: int | None

Return the explicit port number.

Returns: The explicit port if present, otherwise None.

Examples

>>> URL("https://example.com:9443").port
9443

property path: str | None

Return the path component.

Returns: The path if present, otherwise None.

Examples

>>> URL("https://example.com/a/b").path
'/a/b'

property path_segments: list[str]

Return non-empty path segments.

Returns: A list of non-empty path segments split on /.

Examples

>>> URL("https://example.com/a/b/").path_segments
['a', 'b']
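
The splitting rule described above can be sketched with the standard library alone. This is an illustration of the documented behavior, not pyfetcher's actual implementation, and the free function `path_segments` is a hypothetical name.

```python
from urllib.parse import urlsplit


def path_segments(url: str) -> list:
    """Split the URL path on '/' and drop empty segments."""
    path = urlsplit(url).path
    # Leading, trailing, and doubled slashes all produce empty strings,
    # which the filter removes.
    return [segment for segment in path.split("/") if segment]
```

Filtering out empty strings is what makes a trailing slash, as in `/a/b/`, yield the same segments as `/a/b`.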

property query: str | None

Return the raw query string.

Returns: The raw query string if present, otherwise None.

Examples

>>> URL("https://example.com?a=1&b=2").query
'a=1&b=2'

property query_params: dict[str, list[str]]

Return parsed query parameters.

Returns: Parsed query parameters as a dict mapping keys to lists of values, preserving blank values.

Examples

>>> URL("https://example.com?a=1&a=2&b=").query_params
{'a': ['1', '2'], 'b': ['']}
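
The same output shape can be reproduced with `urllib.parse.parse_qs`, which may help when reasoning about repeated keys and blank values. Whether pyfetcher uses `parse_qs` internally is an assumption; the helper name `query_params` below is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def query_params(url: str) -> dict:
    """Parse the query string into a dict of key -> list of values."""
    query = urlsplit(url).query
    # keep_blank_values=True retains keys such as 'b=' with an empty value;
    # by default parse_qs would drop them.
    return parse_qs(query, keep_blank_values=True)
```

Without `keep_blank_values=True`, the key `b` in `?a=1&a=2&b=` would be missing from the result.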

unicode_string()

Return the normalized URL string.

Returns: The normalized URL as a Unicode string.
Return type: str

Examples

>>> URL("https://example.com").unicode_string()
'https://example.com/'

Request models for pyfetcher.

Purpose:

Provide transport-agnostic request contracts that can be consumed by fetch services and backend implementations.

Design:

Requests are immutable and serializable.
URL validation is delegated to URL.
Policies are embedded so one request object is self-describing.

Examples

>>> request = FetchRequest(url="https://example.com")
>>> request.method
'GET'

class pyfetcher.contracts.request.FetchRequest(*, url, method='GET', params=<factory>, headers=<factory>, data=None, json_data=None, backend='httpx', timeout=<factory>, retry=<factory>, pool=<factory>, stream=<factory>, allow_redirects=True, verify_ssl=True, http2=True)

Transport-agnostic fetch request.

Encapsulates everything needed to make an HTTP request: the target URL, method, headers, body, and all policy objects that control timeout, retry, pooling, and streaming behavior. The request is frozen and backend-agnostic so it can be serialized, queued, or handed to any transport.

Parameters:

url (URL) – Target URL (string or URL).
method (RequestMethod) – HTTP method (automatically uppercased).
params (dict[str, str | int | float | bool]) – Query parameters to append to the URL.
headers (dict[str, str]) – Per-request headers (merged with provider headers).
data (bytes | str | None) – Optional raw request body (bytes or string).
json_data (dict[str, Any] | list[Any] | None) – Optional JSON request body (dict or list).
backend (BackendKind) – Preferred HTTP backend.
timeout (TimeoutPolicy) – Timeout policy controlling per-phase timeouts.
retry (RetryPolicy) – Retry policy controlling backoff and retryable status codes.
pool (PoolPolicy) – Pool policy controlling connection limits and concurrency.
stream (StreamPolicy) – Stream policy controlling chunk size and byte limits.
allow_redirects (bool) – Whether HTTP redirects should be followed.
verify_ssl (bool) – Whether TLS certificate verification is enabled.
http2 (bool) – Whether HTTP/2 is preferred where the backend supports it.

Examples

>>> FetchRequest(url="https://example.com").backend
'httpx'
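
The params field is described as query parameters to append to the URL. How a backend performs that merge is not specified here, so the following is only a stdlib sketch of one reasonable approach. `append_params` is a hypothetical helper, and rendering values with `str()` (so `True` becomes `'True'`) is an assumption.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def append_params(url: str, params: dict) -> str:
    """Append params to any query string already present on the URL."""
    parts = urlsplit(url)
    # Keep existing pairs (including blank values) and add the new ones after them.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    pairs.extend((key, str(value)) for key, value in params.items())
    return urlunsplit(parts._replace(query=urlencode(pairs)))
```

For example, `append_params("https://example.com/search?q=a", {"page": 2})` gives `'https://example.com/search?q=a&page=2'`.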

model_config = {'extra': 'forbid', 'frozen': True}: Model configuration; a dictionary conforming to pydantic.config.ConfigDict.

class pyfetcher.contracts.request.BatchFetchRequest(*, requests, concurrency=None)

Batch request wrapper for multiple fetch operations.

Groups multiple FetchRequest objects for concurrent execution with an optional concurrency override that caps the number of in-flight requests.

Parameters:

requests (list[FetchRequest]) – Request objects to execute concurrently.
concurrency (int | None) – Optional concurrency override (defaults to pool policy).

Examples

>>> req = FetchRequest(url="https://example.com")
>>> batch = BatchFetchRequest(requests=[req])
>>> len(batch.requests)
1
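
A concurrency cap of this kind is commonly enforced with a semaphore. The sketch below shows the general technique with `asyncio`; it is not pyfetcher's batch executor, and `run_capped` and `demo` are hypothetical names used only for illustration.

```python
import asyncio


async def run_capped(jobs, concurrency):
    """Await zero-argument coroutine functions, at most `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(job):
        async with semaphore:
            return await job()

    # gather() returns results in the order the awaitables were passed in.
    return await asyncio.gather(*(guarded(job) for job in jobs))


async def demo(total=6, concurrency=2):
    """Run fake jobs and report (results, peak number in flight)."""
    in_flight = 0
    peak = 0

    def make_job(index):
        async def job():
            nonlocal in_flight, peak
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stands in for network I/O
            in_flight -= 1
            return index

        return job

    results = await run_capped([make_job(i) for i in range(total)], concurrency)
    return results, peak
```

Running `asyncio.run(demo())` returns the job indices in submission order together with a peak of 2 in-flight jobs, matching the cap.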

model_config = {'extra': 'forbid', 'frozen': True}: Model configuration; a dictionary conforming to pydantic.config.ConfigDict.

Response models for pyfetcher.

Purpose:

Provide normalized response types independent of the underlying transport.

Design:

Response objects are transport-agnostic.
Streaming chunks are modeled separately from full responses.
Batch responses preserve ordering and capture success/failure per request.

Examples

>>> response = FetchResponse(
...     request_url="https://example.com/",
...     final_url="https://example.com/",
...     status_code=200,
...     headers={},
...     backend="httpx",
...     elapsed_ms=10.0,
... )
>>> response.ok
True

class pyfetcher.contracts.response.FetchResponse(*, request_url, final_url, status_code, headers, content_type=None, text=None, body=None, backend, elapsed_ms)

Normalized fetch response.

A transport-agnostic response model that captures the HTTP status, headers, body content, and timing information for a completed request. The ok computed property provides a quick success check.

Parameters:

request_url (str) – Original request URL as a string.
final_url (str) – Final URL after any redirects.
status_code (int) – HTTP status code.
headers (dict[str, str]) – Response headers as a flat dict.
content_type (str | None) – Response Content-Type header value, if present.
text (str | None) – Decoded text body when fully loaded.
body (bytes | None) – Raw bytes body when available.
backend (BackendKind) – Name of the backend that executed the request.
elapsed_ms (float) – Total elapsed time in milliseconds.

Examples

>>> response = FetchResponse(
...     request_url="https://example.com/",
...     final_url="https://example.com/",
...     status_code=204,
...     headers={},
...     backend="httpx",
...     elapsed_ms=1.0,
... )
>>> response.ok
True

model_config = {'extra': 'forbid', 'frozen': True}: Model configuration; a dictionary conforming to pydantic.config.ConfigDict.

property ok: bool

Return whether the response indicates success.

Returns: True for 2xx and 3xx status codes.

Examples

>>> FetchResponse(
...     request_url="https://example.com/",
...     final_url="https://example.com/",
...     status_code=200,
...     headers={},
...     backend="httpx",
...     elapsed_ms=1.0,
... ).ok
True
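
Read literally, "2xx and 3xx" corresponds to a half-open range check on the status code. The one-liner below is an assumption about the equivalent logic, not the library's source; `is_ok` is a hypothetical name.

```python
def is_ok(status_code: int) -> bool:
    """Return True for 2xx and 3xx status codes, False otherwise."""
    return 200 <= status_code < 400
```

Under this reading a 304 Not Modified counts as ok, while 1xx informational responses and 4xx/5xx errors do not.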

class pyfetcher.contracts.response.StreamChunk(*, request_url, final_url, backend, index, data)

Single streamed response chunk.

Represents one chunk of a streaming HTTP response, carrying the raw bytes along with positional metadata for ordered reassembly.

Parameters:

request_url (str) – Original request URL.
final_url (str) – Final URL after redirects.
backend (BackendKind) – Backend that produced this chunk.
index (int) – Zero-based chunk index within the stream.
data (bytes) – Raw bytes payload for this chunk.

Examples

>>> StreamChunk(
...     request_url="https://example.com/",
...     final_url="https://example.com/",
...     backend="aiohttp",
...     index=0,
...     data=b"abc",
... ).index
0
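
The index field is what makes ordered reassembly possible when chunks are collected out of order. A minimal sketch follows, using a hypothetical `Chunk` dataclass that carries only the two fields needed here instead of the full StreamChunk model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical reduced chunk: just the position and the payload."""

    index: int
    data: bytes


def reassemble(chunks: list) -> bytes:
    """Join chunk payloads in ascending index order."""
    ordered = sorted(chunks, key=lambda chunk: chunk.index)
    return b"".join(chunk.data for chunk in ordered)
```

For instance, `reassemble([Chunk(1, b"def"), Chunk(0, b"abc")])` yields `b"abcdef"`.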

model_config = {'extra': 'forbid', 'frozen': True}: Model configuration; a dictionary conforming to pydantic.config.ConfigDict.

class pyfetcher.contracts.response.BatchItemResponse(*, request_url, ok, response=None, error=None)

Result of a single batch item.

Captures either a successful FetchResponse or an error message for one request within a batch execution.

Parameters:

request_url (str) – Original request URL.
ok (bool) – Whether the item succeeded.
response (FetchResponse | None) – The fetch response on success.
error (str | None) – Error message string on failure.

Examples

>>> item = BatchItemResponse(
...     request_url="https://example.com/",
...     ok=False,
...     error="boom",
... )
>>> item.ok
False

model_config = {'extra': 'forbid', 'frozen': True}: Model configuration; a dictionary conforming to pydantic.config.ConfigDict.

class pyfetcher.contracts.response.BatchFetchResponse(*, items)

Response container for batch fetch execution.

Wraps the results of a BatchFetchRequest, preserving input order so callers can correlate responses to their original requests by index.

Parameters: items (list[BatchItemResponse]) – Batch item responses in input order.

Examples

>>> BatchFetchResponse(items=[]).items
[]
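
One way to obtain per-item success or failure while preserving input order is to catch exceptions inside each task and let `asyncio.gather` keep the positions. The sketch below is an assumption about the general pattern, not pyfetcher's executor. Plain dicts mirror the BatchItemResponse fields, and `run_batch` and `fake_fetch` are hypothetical names.

```python
import asyncio


async def run_batch(urls, fetch):
    """Fetch every URL, recording failures per item instead of aborting."""

    async def run_one(url):
        try:
            response = await fetch(url)
        except Exception as exc:  # capture the error message for this item only
            return {"request_url": url, "ok": False, "response": None, "error": str(exc)}
        return {"request_url": url, "ok": True, "response": response, "error": None}

    # gather() keeps results aligned with the order of `urls`.
    return list(await asyncio.gather(*(run_one(url) for url in urls)))


async def fake_fetch(url):
    """Stand-in for a real fetch: fails for any URL containing 'bad'."""
    if "bad" in url:
        raise RuntimeError("boom")
    return f"body of {url}"
```

Because each failure is converted to a result inside its own task, one failing request does not prevent the others from completing, and item `i` always corresponds to request `i`.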

model_config = {'extra': 'forbid', 'frozen': True}: Model configuration; a dictionary conforming to pydantic.config.ConfigDict.

Policy models for pyfetcher.

Purpose:

Provide serializable policy objects that control retries, timeouts, connection pooling, and streaming behavior.

Design:

Policies are explicit and reusable across transports.
Policies are serializable so they can be persisted or queued later.
Backend-specific conversion happens outside these models.

Examples

>>> retry = RetryPolicy(attempts=4)
>>> retry.attempts
4

class pyfetcher.contracts.policy.RetryPolicy(*, attempts=3, wait_base_seconds=0.5, wait_max_seconds=8.0, retry_status_codes=<factory>, retry_on_connection_errors=True, reraise=True)

Retry policy shared by fetch services.

Controls how failed requests are retried using exponential backoff. Status codes that trigger retries are configurable, as is whether connection-level errors should be retried.

Parameters:

attempts (int) – Total number of attempts including the first call.
wait_base_seconds (float) – Base exponential backoff delay in seconds.
wait_max_seconds (float) – Maximum delay between attempts in seconds.
retry_status_codes (set[int]) – HTTP status codes that should trigger retries.
retry_on_connection_errors (bool) – Whether connection errors should retry.
reraise (bool) – Whether the final failure should be re-raised to the caller.

Examples

>>> RetryPolicy(attempts=3).attempts
3
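
To make the interaction between attempts, wait_base_seconds, and wait_max_seconds concrete, the sketch below computes a capped exponential backoff schedule. The doubling formula is an assumption: the documentation only says "exponential backoff", and the real service may add jitter or use a different multiplier. `backoff_delays` is a hypothetical helper.

```python
def backoff_delays(attempts=3, wait_base_seconds=0.5, wait_max_seconds=8.0):
    """Return the waits between attempts: base * 2**n, capped at the maximum.

    `attempts` counts the first call, so there are attempts - 1 waits.
    """
    return [
        min(wait_base_seconds * 2**retry, wait_max_seconds)
        for retry in range(attempts - 1)
    ]
```

With the default values this gives `[0.5, 1.0]`; with `attempts=7` the schedule is `[0.5, 1.0, 2.0, 4.0, 8.0, 8.0]`, where the last wait is held at the 8-second cap.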

model_config = {'extra': 'forbid', 'frozen': True}: Model configuration; a dictionary conforming to pydantic.config.ConfigDict.

class pyfetcher.contracts.policy.TimeoutPolicy(*, total_seconds=30.0, connect_seconds=10.0, read_seconds=30.0, write_seconds=30.0, pool_seconds=10.0)

Timeout policy shared by fetch services.

Provides granular timeout control for different phases of an HTTP request. The total_seconds value acts as an overall budget that caps the entire operation regardless of the per-phase values.

Parameters:

total_seconds (float) – Overall timeout budget in seconds.
connect_seconds (float) – Maximum time to establish a TCP connection.
read_seconds (float) – Maximum time to receive the response body.
write_seconds (float) – Maximum time to send the request body.
pool_seconds (float) – Maximum time to acquire a connection from the pool.

Examples

>>> TimeoutPolicy().total_seconds
30.0
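
One way to picture an overall budget that caps per-phase values is to clamp each phase timeout to whatever remains of the total. How pyfetcher's backends actually enforce total_seconds is not stated here, so treat this purely as an illustrative assumption; `phase_timeout` is a hypothetical helper.

```python
def phase_timeout(phase_seconds, total_seconds, elapsed_seconds):
    """Return the effective timeout for a phase given the remaining budget."""
    remaining = max(total_seconds - elapsed_seconds, 0.0)
    # A phase never gets more time than is left in the overall budget.
    return min(phase_seconds, remaining)
```

For example, with a 30-second budget and 25 seconds already spent, a 30-second read timeout is effectively reduced to 5 seconds.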

model_config = {'extra': 'forbid', 'frozen': True}: Model configuration; a dictionary conforming to pydantic.config.ConfigDict.

class pyfetcher.contracts.policy.PoolPolicy(*, max_connections=100, max_keepalive_connections=20, keepalive_expiry_seconds=10.0, max_connections_per_host=10, concurrency=8)

Connection pooling and concurrency policy.

Controls connection pool sizing and keepalive behavior for transport backends, as well as the concurrency limit used for async batch operations.

Parameters:

max_connections (int) – Maximum total connections across all hosts.
max_keepalive_connections (int) – Maximum keepalive connections where supported.
keepalive_expiry_seconds (float) – Time-to-live for idle keepalive connections.
max_connections_per_host (int) – Maximum connections to a single host.
concurrency (int) – Maximum in-flight tasks for async batching.

Examples

>>> PoolPolicy(concurrency=8).concurrency
8

model_config = {'extra': 'forbid', 'frozen': True}: Model configuration; a dictionary conforming to pydantic.config.ConfigDict.

class pyfetcher.contracts.policy.StreamPolicy(*, chunk_size=65536, decode_text=False, max_bytes=None)

Streaming behavior policy.

Controls chunk sizing and optional byte limits for streaming operations. The decode_text flag signals downstream consumers that text decoding is desired.

Parameters:

chunk_size (int) – Size in bytes of each emitted chunk.
decode_text (bool) – Whether downstream consumers expect text decoding.
max_bytes (int | None) – Optional cap on total consumed bytes (None for unlimited).

Examples

>>> StreamPolicy().chunk_size
65536
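
The effect of chunk_size and max_bytes can be sketched over an in-memory payload. This example truncates once the cap is reached; whether pyfetcher truncates or raises when max_bytes is exceeded is not documented here, so that choice is an assumption, as is the helper name `iter_chunks`.

```python
from typing import Iterator, Optional


def iter_chunks(
    payload: bytes,
    chunk_size: int = 65536,
    max_bytes: Optional[int] = None,
) -> Iterator[bytes]:
    """Yield payload slices of at most chunk_size bytes, up to max_bytes in total."""
    limit = len(payload) if max_bytes is None else min(len(payload), max_bytes)
    for start in range(0, limit, chunk_size):
        # The final slice is shortened so the total never exceeds the limit.
        yield payload[start:min(start + chunk_size, limit)]
```

With `chunk_size=3`, the payload `b"abcdefgh"` is emitted as `b"abc"`, `b"def"`, `b"gh"`; adding `max_bytes=5` stops after `b"abc"`, `b"de"`.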

model_config = {'extra': 'forbid', 'frozen': True}: Model configuration; a dictionary conforming to pydantic.config.ConfigDict.

Shared resource models for pyfetcher.

Purpose:

Provide lightweight reusable models for fetched pages and downloadable media that scraper and downloader layers can build on.

Design:

Resource models are intentionally generic.
They reference URLs through the shared URL.
Scraper-specific models should extend or wrap these models rather than replacing them for common cases.

Examples

>>> page = WebPage(url="https://example.com", title="Home")
>>> page.title
'Home'

class pyfetcher.contracts.resource.WebResource(*, url, mime_type=None)

Generic web resource.

Base model for any resource identified by a URL with an optional MIME type. Scraper and downloader models extend this to add domain-specific fields.

Parameters:

url (URL) – Resource URL (string or URL).
mime_type (str | None) – MIME type if known (e.g. 'text/html').

Examples

>>> WebResource(url="https://example.com/image.png").url.host
'example.com'

model_config = {'extra': 'forbid', 'frozen': True}: Model configuration; a dictionary conforming to pydantic.config.ConfigDict.

class pyfetcher.contracts.resource.WebPage(*, url, mime_type=None, title=None, description=None)

Generic fetched web page.

Extends WebResource with optional title and description fields suitable for representing a fetched HTML page.

Parameters:

url (URL) – Page URL.
mime_type (str | None) – MIME type if known.
title (str | None) – Best-effort page title extracted from HTML.
description (str | None) – Best-effort page description.

Examples

>>> WebPage(url="https://example.com", title="Home").title
'Home'

model_config = {'extra': 'forbid', 'frozen': True}: Model configuration; a dictionary conforming to pydantic.config.ConfigDict.

class pyfetcher.contracts.resource.MediaResource(*, url, mime_type=None, filename=None, content_length=None)

Generic downloadable media resource.

Extends WebResource with filename and content length fields suitable for representing a downloadable binary resource.

Parameters:

url (URL) – Resource URL.
mime_type (str | None) – MIME type if known.
filename (str | None) – Best-effort filename derived from URL or headers.
content_length (int | None) – Content length in bytes if known.

Examples

>>> MediaResource(url="https://example.com/file.mp4", filename="file.mp4").filename
'file.mp4'
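
The filename field is described as best-effort, derived from the URL or headers. The model itself only stores the value, so any derivation happens in the caller. A possible URL-based derivation, shown as a sketch with the hypothetical helper `filename_from_url`, takes the last path component and percent-decodes it.

```python
from posixpath import basename
from urllib.parse import unquote, urlsplit


def filename_from_url(url: str):
    """Return the decoded last path component, or None if the path has none."""
    # urlsplit separates the query string, so '?x=1' never leaks into the name.
    name = basename(unquote(urlsplit(url).path))
    return name or None
```

For `https://example.com/media/clip%201.mp4?x=1` this returns `'clip 1.mp4'`, and for a bare `https://example.com/` it returns `None`.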

model_config = {'extra': 'forbid', 'frozen': True}: Model configuration; a dictionary conforming to pydantic.config.ConfigDict.