Contracts

Validated URL value objects for pyfetcher.

Purpose:

Provide a small immutable wrapper around pydantic.HttpUrl with useful derived helpers for host, path, and query decomposition.

Design:
  • URL is intentionally pure and contains no I/O behavior.

  • Computed properties remain deterministic and serialization-friendly.

  • The model is frozen so it behaves like a value object.

Examples

>>> url = URL("https://example.com/a/b/?x=1&x=2&y=")
>>> url.host
'example.com'
>>> url.path_segments
['a', 'b']
>>> url.query_params["x"]
['1', '2']
class pyfetcher.contracts.url.URL(root=PydanticUndefined)

Validated HTTP/HTTPS URL with derived helpers.

Wraps pydantic.HttpUrl to provide computed decomposition of scheme, host, port, path segments, and query parameters as a frozen value object suitable for embedding in request models.

Parameters:

root (RootModelRootType) – The raw URL string or HttpUrl instance to validate.

Raises:

pydantic.ValidationError – If the value is not a valid HTTP/HTTPS URL.

Examples

>>> url = URL("https://example.com:8443/a/b/?x=1&x=2")
>>> url.host
'example.com'
>>> url.port
8443
model_config = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

property scheme: str

Return the URL scheme (e.g. 'https').

Returns:

The scheme component of the URL.

Examples

>>> URL("https://example.com").scheme
'https'
property host: str | None

Return the hostname.

Returns:

The host if present, otherwise None.

Examples

>>> URL("https://example.com").host
'example.com'
property port: int | None

Return the explicit port number.

Returns:

The explicit port if present, otherwise None.

Examples

>>> URL("https://example.com:9443").port
9443
property path: str | None

Return the path component.

Returns:

The path if present, otherwise None.

Examples

>>> URL("https://example.com/a/b").path
'/a/b'
property path_segments: list[str]

Return non-empty path segments.

Returns:

A list of non-empty path segments split on /.

Examples

>>> URL("https://example.com/a/b/").path_segments
['a', 'b']
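
The segment rule described above (split on / and drop empty pieces) can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation; the free function path_segments below is a hypothetical stand-in for the property.

```python
from urllib.parse import urlsplit


def path_segments(url: str) -> list[str]:
    """Split the URL path on '/' and keep only non-empty segments."""
    path = urlsplit(url).path
    return [segment for segment in path.split("/") if segment]


print(path_segments("https://example.com/a/b/"))  # ['a', 'b']
```

Dropping empty segments means a trailing slash, a doubled slash, and a missing path all behave the same way: none of them produces an empty string in the result.
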
property query: str | None

Return the raw query string.

Returns:

The raw query string if present, otherwise None.

Examples

>>> URL("https://example.com?a=1&b=2").query
'a=1&b=2'
property query_params: dict[str, list[str]]

Return parsed query parameters.

Returns:

Parsed query parameters as a dict mapping keys to lists of values, preserving blank values.

Examples

>>> URL("https://example.com?a=1&a=2&b=").query_params
{'a': ['1', '2'], 'b': ['']}
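
The same mapping shape (repeated keys collected into lists, blank values kept) can be reproduced with urllib.parse. This is a sketch under the assumption that the property behaves like parse_qs with keep_blank_values=True; the free function query_params is a hypothetical stand-in, and the real property may be implemented differently.

```python
from urllib.parse import parse_qs, urlsplit


def query_params(url: str) -> dict[str, list[str]]:
    """Parse the query string into key -> list of values, keeping blanks."""
    query = urlsplit(url).query
    # keep_blank_values=True keeps 'b=' as {'b': ['']} instead of dropping it.
    return parse_qs(query, keep_blank_values=True)


print(query_params("https://example.com?a=1&a=2&b="))
```

Without keep_blank_values=True, parse_qs silently drops keys whose value is empty, so the 'b' entry would be missing from the result.
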
unicode_string()

Return the normalized URL string.

Returns:

The normalized URL as a Unicode string.

Return type:

str

Examples

>>> URL("https://example.com").unicode_string()
'https://example.com/'

Request models for pyfetcher.

Purpose:

Provide transport-agnostic request contracts that can be consumed by fetch services and backend implementations.

Design:
  • Requests are immutable and serializable.

  • URL validation is delegated to the URL value object.

  • Policies are embedded so one request object is self-describing.

Examples

>>> request = FetchRequest(url="https://example.com")
>>> request.method
'GET'
class pyfetcher.contracts.request.FetchRequest(*, url, method='GET', params=<factory>, headers=<factory>, data=None, json_data=None, backend='httpx', timeout=<factory>, retry=<factory>, pool=<factory>, stream=<factory>, allow_redirects=True, verify_ssl=True, http2=True)

Transport-agnostic fetch request.

Encapsulates everything needed to make an HTTP request: the target URL, method, headers, body, and all policy objects that control timeout, retry, pooling, and streaming behavior. The request is frozen and backend-agnostic so it can be serialized, queued, or handed to any transport.

Parameters:
  • url (URL) – Target URL (string or URL).

  • method (RequestMethod) – HTTP method (automatically uppercased).

  • params (dict[str, str | int | float | bool]) – Query parameters to append to the URL.

  • headers (dict[str, str]) – Per-request headers (merged with provider headers).

  • data (bytes | str | None) – Optional raw request body (bytes or string).

  • json_data (dict[str, Any] | list[Any] | None) – Optional JSON request body (dict or list).

  • backend (BackendKind) – Preferred HTTP backend.

  • timeout (TimeoutPolicy) – Timeout policy controlling per-phase timeouts.

  • retry (RetryPolicy) – Retry policy controlling backoff and retryable status codes.

  • pool (PoolPolicy) – Pool policy controlling connection limits and concurrency.

  • stream (StreamPolicy) – Stream policy controlling chunk size and byte limits.

  • allow_redirects (bool) – Whether HTTP redirects should be followed.

  • verify_ssl (bool) – Whether TLS certificate verification is enabled.

  • http2 (bool) – Whether HTTP/2 is preferred where the backend supports it.

Examples

>>> FetchRequest(url="https://example.com").backend
'httpx'
model_config = {'extra': 'forbid', 'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
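
The params field is described as query parameters to append to the URL. How a backend performs that merge is not specified here, so the following is only a sketch of one plausible approach using the standard library. The helper names append_params and _encode are hypothetical, and rendering booleans as lowercase 'true'/'false' is an assumption rather than documented behavior.

```python
from __future__ import annotations

from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def _encode(value: str | int | float | bool) -> str:
    """Render a scalar as a query value (lowercase booleans are an assumption)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def append_params(url: str, params: dict[str, str | int | float | bool]) -> str:
    """Append params to any query string already present on the URL."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    pairs.extend((key, _encode(value)) for key, value in params.items())
    return urlunsplit(parts._replace(query=urlencode(pairs)))


print(append_params("https://example.com/search?q=a", {"page": 2, "safe": True}))
```

Working on (key, value) pairs rather than a dict keeps repeated keys from the original URL intact.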

class pyfetcher.contracts.request.BatchFetchRequest(*, requests, concurrency=None)

Batch request wrapper for multiple fetch operations.

Groups multiple FetchRequest objects for concurrent execution with an optional concurrency override that caps the number of in-flight requests.

Parameters:
  • requests (list[FetchRequest]) – Request objects to execute concurrently.

  • concurrency (int | None) – Optional concurrency override. When None, the concurrency from the pool policy is used.

Examples

>>> req = FetchRequest(url="https://example.com")
>>> batch = BatchFetchRequest(requests=[req])
>>> len(batch.requests)
1
model_config = {'extra': 'forbid', 'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
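
Capping the number of in-flight requests is commonly done with a semaphore. The sketch below shows that pattern with asyncio; it is not taken from the pyfetcher source. The names run_capped, demo, and make_job are hypothetical, and asyncio.sleep stands in for real network I/O.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def run_capped(jobs: list[Callable[[], Awaitable[T]]], concurrency: int) -> list[T]:
    """Run zero-argument coroutine functions with at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(job):
        async with semaphore:  # waits while `concurrency` jobs are already running
            return await job()

    # gather() returns results in the order the jobs were passed in.
    return await asyncio.gather(*(guarded(job) for job in jobs))


async def demo() -> tuple[int, list[int]]:
    in_flight = 0
    peak = 0

    def make_job(n: int):
        async def job() -> int:
            nonlocal in_flight, peak
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for network I/O
            in_flight -= 1
            return n

        return job

    results = await run_capped([make_job(n) for n in range(6)], concurrency=2)
    return peak, results


peak, results = asyncio.run(demo())
print(peak, results)
```

The peak counter never exceeds the concurrency value, while the result list still follows input order.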

Response models for pyfetcher.

Purpose:

Provide normalized response types independent of the underlying transport.

Design:
  • Response objects are transport-agnostic.

  • Streaming chunks are modeled separately from full responses.

  • Batch responses preserve ordering and capture success/failure per request.

Examples

>>> response = FetchResponse(
...     request_url="https://example.com/",
...     final_url="https://example.com/",
...     status_code=200,
...     headers={},
...     backend="httpx",
...     elapsed_ms=10.0,
... )
>>> response.ok
True
class pyfetcher.contracts.response.FetchResponse(*, request_url, final_url, status_code, headers, content_type=None, text=None, body=None, backend, elapsed_ms)

Normalized fetch response.

A transport-agnostic response model that captures the HTTP status, headers, body content, and timing information for a completed request. The ok computed property provides a quick success check.

Parameters:
  • request_url (str) – Original request URL as a string.

  • final_url (str) – Final URL after any redirects.

  • status_code (int) – HTTP status code.

  • headers (dict[str, str]) – Response headers as a flat dict.

  • content_type (str | None) – Response Content-Type header value, if present.

  • text (str | None) – Decoded text body when fully loaded.

  • body (bytes | None) – Raw bytes body when available.

  • backend (BackendKind) – Name of the backend that executed the request.

  • elapsed_ms (float) – Total elapsed time in milliseconds.

Examples

>>> response = FetchResponse(
...     request_url="https://example.com/",
...     final_url="https://example.com/",
...     status_code=204,
...     headers={},
...     backend="httpx",
...     elapsed_ms=1.0,
... )
>>> response.ok
True
model_config = {'extra': 'forbid', 'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

property ok: bool

Return whether the response indicates success.

Returns:

True for 2xx and 3xx status codes, otherwise False.

Examples

>>> FetchResponse(
...     request_url="https://example.com/",
...     final_url="https://example.com/",
...     status_code=200,
...     headers={},
...     backend="httpx",
...     elapsed_ms=1.0,
... ).ok
True
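
Read literally, the rule above is a range check on the status code. A minimal sketch, assuming the boundaries are exactly 200 (inclusive) and 400 (exclusive); the function name is_ok is hypothetical and stands in for the property.

```python
def is_ok(status_code: int) -> bool:
    """Return True for 2xx and 3xx status codes."""
    return 200 <= status_code < 400


for code in (200, 204, 301, 404, 500):
    print(code, is_ok(code))
```

Note that under this rule a 3xx response counts as a success even when redirects were not followed.
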
class pyfetcher.contracts.response.StreamChunk(*, request_url, final_url, backend, index, data)

Single streamed response chunk.

Represents one chunk of a streaming HTTP response, carrying the raw bytes along with positional metadata for ordered reassembly.

Parameters:
  • request_url (str) – Original request URL.

  • final_url (str) – Final URL after redirects.

  • backend (BackendKind) – Backend that produced this chunk.

  • index (int) – Zero-based chunk index within the stream.

  • data (bytes) – Raw bytes payload for this chunk.

Examples

>>> StreamChunk(
...     request_url="https://example.com/",
...     final_url="https://example.com/",
...     backend="aiohttp",
...     index=0,
...     data=b"abc",
... ).index
0
model_config = {'extra': 'forbid', 'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
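
The zero-based index is what makes ordered reassembly possible. The sketch below shows one way a consumer might rebuild the body from chunks that arrive out of order. Chunk is a hypothetical stand-in carrying only the two relevant fields, and the gap check is an added assumption, not documented pyfetcher behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Stand-in for StreamChunk with only the fields needed here."""

    index: int
    data: bytes


def reassemble(chunks: list[Chunk]) -> bytes:
    """Join chunk payloads in index order, rejecting gaps or duplicates."""
    ordered = sorted(chunks, key=lambda chunk: chunk.index)
    for expected, chunk in enumerate(ordered):
        if chunk.index != expected:
            raise ValueError(f"expected chunk {expected}, got {chunk.index}")
    return b"".join(chunk.data for chunk in ordered)


print(reassemble([Chunk(1, b"def"), Chunk(0, b"abc")]))  # b'abcdef'
```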

class pyfetcher.contracts.response.BatchItemResponse(*, request_url, ok, response=None, error=None)

Result of a single batch item.

Captures either a successful FetchResponse or an error message for one request within a batch execution.

Parameters:
  • request_url (str) – Original request URL.

  • ok (bool) – Whether the item succeeded.

  • response (FetchResponse | None) – The fetch response on success.

  • error (str | None) – Error message string on failure.

Examples

>>> item = BatchItemResponse(
...     request_url="https://example.com/",
...     ok=False,
...     error="boom",
... )
>>> item.ok
False
model_config = {'extra': 'forbid', 'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class pyfetcher.contracts.response.BatchFetchResponse(*, items)

Response container for batch fetch execution.

Wraps the results of a BatchFetchRequest, preserving input order so callers can correlate responses to their original requests by index.

Parameters:

items (list[BatchItemResponse]) – Batch item responses in input order.

Examples

>>> BatchFetchResponse(items=[]).items
[]
model_config = {'extra': 'forbid', 'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
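
Preserving input order while recording success or failure per item can be sketched with asyncio.gather and return_exceptions=True. This is an illustration of the contract, not the pyfetcher service code. ItemResult is a hypothetical stand-in for BatchItemResponse, and fetch_all and fake_fetch are invented for the example.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemResult:
    """Stand-in for BatchItemResponse: either a response or an error string."""

    request_url: str
    ok: bool
    response: str | None = None
    error: str | None = None


async def fetch_all(urls: list[str], fetch) -> list[ItemResult]:
    """Run fetch(url) for every URL and report each outcome in input order."""
    outcomes = await asyncio.gather(*(fetch(url) for url in urls), return_exceptions=True)
    items = []
    for url, outcome in zip(urls, outcomes):
        if isinstance(outcome, BaseException):
            items.append(ItemResult(url, ok=False, error=str(outcome)))
        else:
            items.append(ItemResult(url, ok=True, response=outcome))
    return items


async def fake_fetch(url: str) -> str:
    """Pretend transport: fails for any URL containing 'bad'."""
    if "bad" in url:
        raise RuntimeError("boom")
    return f"body of {url}"


items = asyncio.run(fetch_all(["https://example.com/a", "https://example.com/bad"], fake_fetch))
print([(item.ok, item.error) for item in items])
```

Because one failing item does not raise out of the batch, callers can match items to requests by index and inspect ok on each.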

Policy models for pyfetcher.

Purpose:

Provide serializable policy objects that control retries, timeouts, connection pooling, and streaming behavior.

Design:
  • Policies are explicit and reusable across transports.

  • Policies are serializable so they can be persisted or queued later.

  • Backend-specific conversion happens outside these models.

Examples

>>> retry = RetryPolicy(attempts=4)
>>> retry.attempts
4
class pyfetcher.contracts.policy.RetryPolicy(*, attempts=3, wait_base_seconds=0.5, wait_max_seconds=8.0, retry_status_codes=<factory>, retry_on_connection_errors=True, reraise=True)

Retry policy shared by fetch services.

Controls how failed requests are retried using exponential backoff. Status codes that trigger retries are configurable, as is whether connection-level errors should be retried.

Parameters:
  • attempts (int) – Total number of attempts including the first call.

  • wait_base_seconds (float) – Base exponential backoff delay in seconds.

  • wait_max_seconds (float) – Maximum delay between attempts in seconds.

  • retry_status_codes (set[int]) – HTTP status codes that should trigger retries.

  • retry_on_connection_errors (bool) – Whether connection errors should trigger a retry.

  • reraise (bool) – Whether the final failure should be re-raised to the caller.

Examples

>>> RetryPolicy(attempts=3).attempts
3
model_config = {'extra': 'forbid', 'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
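
The exact backoff formula is not stated here. A common form of capped exponential backoff doubles the base delay on every retry and clamps it to the maximum. The sketch below assumes that form and no jitter; the function backoff_delays is hypothetical.

```python
def backoff_delays(
    attempts: int = 3,
    wait_base_seconds: float = 0.5,
    wait_max_seconds: float = 8.0,
) -> list[float]:
    """Return the wait before each retry under capped exponential backoff.

    `attempts` counts the first call, so there are `attempts - 1` waits.
    The doubling rule is an assumption, not confirmed pyfetcher behavior.
    """
    return [
        min(wait_max_seconds, wait_base_seconds * 2**retry)
        for retry in range(attempts - 1)
    ]


print(backoff_delays(attempts=3))  # [0.5, 1.0]
print(backoff_delays(attempts=7))  # [0.5, 1.0, 2.0, 4.0, 8.0, 8.0]
```

With the default values the cap only takes effect from the sixth retry onward, so wait_max_seconds matters mainly for policies with many attempts.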

class pyfetcher.contracts.policy.TimeoutPolicy(*, total_seconds=30.0, connect_seconds=10.0, read_seconds=30.0, write_seconds=30.0, pool_seconds=10.0)

Timeout policy shared by fetch services.

Provides granular timeout control for different phases of an HTTP request. The total_seconds value acts as an overall budget that caps the entire operation regardless of the per-phase values.

Parameters:
  • total_seconds (float) – Overall timeout budget in seconds.

  • connect_seconds (float) – Maximum time in seconds to establish a TCP connection.

  • read_seconds (float) – Maximum time in seconds to receive the response body.

  • write_seconds (float) – Maximum time in seconds to send the request body.

  • pool_seconds (float) – Maximum time in seconds to acquire a connection from the pool.

Examples

>>> TimeoutPolicy().total_seconds
30.0
model_config = {'extra': 'forbid', 'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
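
One way to make an overall budget cap the per-phase values is to track a deadline and clamp each phase timeout to the time remaining. The sketch below illustrates that idea only; how pyfetcher backends enforce total_seconds is not described here. The Deadline class is hypothetical, and the injectable clock exists so the behavior can be shown without sleeping.

```python
import time
from typing import Callable


class Deadline:
    """Track an overall time budget and clamp phase timeouts to it."""

    def __init__(self, total_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._expires_at = clock() + total_seconds

    def remaining(self) -> float:
        """Seconds left in the overall budget, never negative."""
        return max(0.0, self._expires_at - self._clock())

    def cap(self, phase_seconds: float) -> float:
        """Return the phase timeout, reduced if the budget is nearly spent."""
        return min(phase_seconds, self.remaining())


now = [0.0]  # fake clock so the example is deterministic
deadline = Deadline(total_seconds=30.0, clock=lambda: now[0])
print(deadline.cap(10.0))  # 10.0, the budget is untouched
now[0] = 25.0
print(deadline.cap(10.0))  # 5.0, only five seconds remain
```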

class pyfetcher.contracts.policy.PoolPolicy(*, max_connections=100, max_keepalive_connections=20, keepalive_expiry_seconds=10.0, max_connections_per_host=10, concurrency=8)

Connection pooling and concurrency policy.

Controls connection pool sizing and keepalive behavior for transport backends, as well as the concurrency limit used for async batch operations.

Parameters:
  • max_connections (int) – Maximum total connections across all hosts.

  • max_keepalive_connections (int) – Maximum keepalive connections where supported.

  • keepalive_expiry_seconds (float) – Time-to-live in seconds for idle keepalive connections.

  • max_connections_per_host (int) – Maximum connections to a single host.

  • concurrency (int) – Maximum in-flight tasks for async batching.

Examples

>>> PoolPolicy(concurrency=8).concurrency
8
model_config = {'extra': 'forbid', 'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class pyfetcher.contracts.policy.StreamPolicy(*, chunk_size=65536, decode_text=False, max_bytes=None)

Streaming behavior policy.

Controls chunk sizing and optional byte limits for streaming operations. The decode_text flag signals downstream consumers that text decoding is desired.

Parameters:
  • chunk_size (int) – Size in bytes of each emitted chunk.

  • decode_text (bool) – Whether downstream consumers expect text decoding.

  • max_bytes (int | None) – Optional cap on total consumed bytes (None for unlimited).

Examples

>>> StreamPolicy().chunk_size
65536
model_config = {'extra': 'forbid', 'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
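
The interaction between chunk_size and max_bytes can be sketched as a generator over a file-like object. This is an illustration under stated assumptions: the function iter_chunks is hypothetical, and stopping quietly once max_bytes is reached (rather than raising) is a design choice made for the example, not documented pyfetcher behavior.

```python
from __future__ import annotations

import io
from typing import BinaryIO, Iterator


def iter_chunks(
    stream: BinaryIO,
    chunk_size: int = 65536,
    max_bytes: int | None = None,
) -> Iterator[bytes]:
    """Yield chunks of up to chunk_size bytes, stopping after max_bytes in total."""
    consumed = 0
    while True:
        to_read = chunk_size
        if max_bytes is not None:
            # Never read past the cap; the final chunk may be shorter.
            to_read = min(chunk_size, max_bytes - consumed)
            if to_read <= 0:
                return
        chunk = stream.read(to_read)
        if not chunk:
            return
        consumed += len(chunk)
        yield chunk


print(list(iter_chunks(io.BytesIO(b"abcdefghij"), chunk_size=4)))
print(list(iter_chunks(io.BytesIO(b"abcdefghij"), chunk_size=4, max_bytes=6)))
```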

Shared resource models for pyfetcher.

Purpose:

Provide lightweight reusable models for fetched pages and downloadable media that scraper and downloader layers can build on.

Design:
  • Resource models are intentionally generic.

  • They reference URLs through the shared URL value object.

  • Scraper-specific models should extend or wrap these models rather than replacing them for common cases.

Examples

>>> page = WebPage(url="https://example.com", title="Home")
>>> page.title
'Home'
class pyfetcher.contracts.resource.WebResource(*, url, mime_type=None)

Generic web resource.

Base model for any resource identified by a URL with an optional MIME type. Scraper and downloader models extend this to add domain-specific fields.

Parameters:
  • url (URL) – Resource URL (string or URL).

  • mime_type (str | None) – MIME type if known (e.g. 'text/html').

Examples

>>> WebResource(url="https://example.com/image.png").url.host
'example.com'
model_config = {'extra': 'forbid', 'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class pyfetcher.contracts.resource.WebPage(*, url, mime_type=None, title=None, description=None)

Generic fetched web page.

Extends WebResource with optional title and description fields suitable for representing a fetched HTML page.

Parameters:
  • url (URL) – Page URL.

  • mime_type (str | None) – MIME type if known.

  • title (str | None) – Best-effort page title extracted from HTML.

  • description (str | None) – Best-effort page description.

Examples

>>> WebPage(url="https://example.com", title="Home").title
'Home'
model_config = {'extra': 'forbid', 'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class pyfetcher.contracts.resource.MediaResource(*, url, mime_type=None, filename=None, content_length=None)

Generic downloadable media resource.

Extends WebResource with filename and content length fields suitable for representing a downloadable binary resource.

Parameters:
  • url (URL) – Resource URL.

  • mime_type (str | None) – MIME type if known.

  • filename (str | None) – Best-effort filename derived from URL or headers.

  • content_length (int | None) – Content length in bytes if known.

Examples

>>> MediaResource(url="https://example.com/file.mp4", filename="file.mp4").filename
'file.mp4'
model_config = {'extra': 'forbid', 'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
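
The filename field is described as a best-effort value derived from the URL or headers. The model itself does not perform that derivation, so the sketch below shows how a downloader layer might compute it before constructing a MediaResource. The function guess_filename is hypothetical, and the header lookup assumes the key is spelled exactly 'Content-Disposition'.

```python
from __future__ import annotations

from email.message import Message
from posixpath import basename
from urllib.parse import unquote, urlsplit


def guess_filename(url: str, headers: dict[str, str] | None = None) -> str | None:
    """Prefer the Content-Disposition filename, else the last URL path segment."""
    disposition = (headers or {}).get("Content-Disposition")
    if disposition:
        message = Message()
        message["Content-Disposition"] = disposition
        name = message.get_filename()
        if name:
            # basename() discards any directory part a server might send.
            return basename(name)
    name = basename(unquote(urlsplit(url).path))
    return name or None


print(guess_filename("https://example.com/file.mp4"))
print(guess_filename(
    "https://example.com/download?id=7",
    {"Content-Disposition": 'attachment; filename="report.pdf"'},
))
```

Returning None when nothing usable is found matches the optional type of the filename field.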