Downloaders

Base downloader protocol for pyfetcher.downloaders.

Purpose: Define the common interface for all downloader implementations.

class pyfetcher.downloaders.base.MediaInfo(url, title=None, description=None, duration_seconds=None, thumbnail_url=None, uploader=None, upload_date=None, file_size_bytes=None, mime_type=None, ext=None, extra=<factory>)

Media metadata extracted before download.

Parameters:

url (str)
title (str | None)
description (str | None)
duration_seconds (float | None)
thumbnail_url (str | None)
uploader (str | None)
upload_date (str | None)
file_size_bytes (int | None)
mime_type (str | None)
ext (str | None)
extra (dict[str, Any])
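
The `extra=<factory>` default in the signature suggests a dataclass-style field with a default factory, so each instance would get its own dict rather than sharing one. The sketch below uses a stand-in dataclass that mirrors the documented fields. It is not the pyfetcher class itself, and the URLs and the `resolution` key are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class MediaInfo:
    """Stand-in mirroring the documented fields; not the pyfetcher class."""

    url: str
    title: str | None = None
    description: str | None = None
    duration_seconds: float | None = None
    thumbnail_url: str | None = None
    uploader: str | None = None
    upload_date: str | None = None
    file_size_bytes: int | None = None
    mime_type: str | None = None
    ext: str | None = None
    # default_factory gives every instance a fresh dict.
    extra: dict[str, Any] = field(default_factory=dict)


first = MediaInfo(url="https://example.com/clip.mp4", title="Clip", ext="mp4")
second = MediaInfo(url="https://example.com/other.mp4")

# Mutating one instance's extra dict leaves the other untouched.
first.extra["resolution"] = "1080p"
print(second.extra)
```

Only `url` is required; every other field falls back to `None` or an empty dict.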

class pyfetcher.downloaders.base.DownloadResult(source_url, local_path=None, minio_key=None, minio_bucket=None, filename=None, file_size_bytes=None, mime_type=None, checksum_sha256=None, media_info=None, media_metadata=<factory>)

Result of a completed download.

Parameters:

source_url (str)
local_path (str | None)
minio_key (str | None)
minio_bucket (str | None)
filename (str | None)
file_size_bytes (int | None)
mime_type (str | None)
checksum_sha256 (str | None)
media_info (MediaInfo | None)
media_metadata (dict[str, Any])
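
One possible use of `checksum_sha256` is to verify a file after download. The following is a minimal sketch, assuming the field holds a hexadecimal SHA-256 digest of the file at `local_path` (the reference does not state the encoding). `DownloadResult` here is a reduced stand-in, and `sha256_of_file` and `verify_checksum` are hypothetical helpers, not part of pyfetcher.

```python
from __future__ import annotations

import hashlib
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class DownloadResult:
    """Stand-in with only the fields this sketch needs."""

    source_url: str
    local_path: str | None = None
    checksum_sha256: str | None = None


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(result: DownloadResult) -> bool:
    """Return True only if a local file exists and matches the recorded digest."""
    if result.local_path is None or result.checksum_sha256 is None:
        return False
    # Assumption: the digest is hex-encoded; compare case-insensitively.
    return sha256_of_file(result.local_path) == result.checksum_sha256.lower()


# Demonstration with a temporary file standing in for a real download.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.bin"
    path.write_bytes(b"hello")
    good = DownloadResult(
        source_url="https://example.com/sample.bin",
        local_path=str(path),
        checksum_sha256=hashlib.sha256(b"hello").hexdigest(),
    )
    bad = DownloadResult(
        source_url="https://example.com/sample.bin",
        local_path=str(path),
        checksum_sha256="0" * 64,
    )
    good_ok = verify_checksum(good)
    bad_ok = verify_checksum(bad)
```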

class pyfetcher.downloaders.base.DownloadProgress(status, downloaded_bytes=0, total_bytes=None, speed_bytes_per_sec=None, eta_seconds=None, filename=None, percent=None)

Progress update during a download.

Parameters:

status (str)
downloaded_bytes (int)
total_bytes (int | None)
speed_bytes_per_sec (float | None)
eta_seconds (float | None)
filename (str | None)
percent (float | None)
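
A progress callback receives objects of this shape. The sketch below shows one way to turn an update into a log line, using a stand-in dataclass with the documented fields. The `"downloading"` status value and the 0 to 100 scale for `percent` are assumptions, since the reference does not list the allowed values; `format_progress` and `on_progress` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DownloadProgress:
    """Stand-in mirroring the documented fields."""

    status: str
    downloaded_bytes: int = 0
    total_bytes: int | None = None
    speed_bytes_per_sec: float | None = None
    eta_seconds: float | None = None
    filename: str | None = None
    percent: float | None = None


def format_progress(progress: DownloadProgress) -> str:
    """Render one progress update as a single log line."""
    percent = progress.percent
    # Fall back to computing the percentage when only byte counts are set.
    if percent is None and progress.total_bytes:
        percent = 100.0 * progress.downloaded_bytes / progress.total_bytes
    shown = f"{percent:.1f}%" if percent is not None else "unknown"
    name = progress.filename or "<unnamed>"
    return f"[{progress.status}] {name}: {shown} ({progress.downloaded_bytes} bytes)"


lines: list[str] = []


def on_progress(progress: DownloadProgress) -> None:
    """Callback with the shape Callable[[DownloadProgress], None]."""
    lines.append(format_progress(progress))


on_progress(
    DownloadProgress(
        status="downloading",
        downloaded_bytes=512,
        total_bytes=2048,
        filename="clip.mp4",
    )
)
on_progress(DownloadProgress(status="downloading", downloaded_bytes=100))
```

Because `total_bytes` and `percent` are both optional, a callback should handle the case where neither is known, as the second call does.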

class pyfetcher.downloaders.base.DownloaderProtocol(*args, **kwargs)

Protocol for downloader implementations.
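
The reference does not list the protocol's members. Judging from the concrete downloaders documented in this section, it plausibly consists of an async `extract_info` and an async `download`. The sketch below encodes that guess as a `typing.Protocol` named `DownloaderLike` (a hypothetical name) and exercises it with an in-memory `FakeDownloader`. The data classes are reduced stand-ins.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import Callable, Protocol, runtime_checkable


@dataclass
class MediaInfo:
    url: str
    title: str | None = None


@dataclass
class DownloadResult:
    source_url: str
    local_path: str | None = None


@runtime_checkable
class DownloaderLike(Protocol):
    """Assumed interface, inferred from the concrete downloaders' signatures."""

    async def extract_info(self, url: str) -> list[MediaInfo]: ...

    async def download(
        self,
        url: str,
        *,
        output_dir: str | None = None,
        progress_callback: Callable[[object], None] | None = None,
    ) -> list[DownloadResult]: ...


class FakeDownloader:
    """Hypothetical in-memory implementation used only for illustration."""

    async def extract_info(self, url: str) -> list[MediaInfo]:
        return [MediaInfo(url=url, title="fake")]

    async def download(self, url, *, output_dir=None, progress_callback=None):
        # No file is written; the path is only constructed for the result.
        target = f"{output_dir or 'downloads'}/fake.bin"
        return [DownloadResult(source_url=url, local_path=target)]


async def fetch_all(downloader: DownloaderLike, url: str) -> list[DownloadResult]:
    """Inspect metadata first, then download only if something was found."""
    infos = await downloader.extract_info(url)
    if not infos:
        return []
    return await downloader.download(url, output_dir="/data")


results = asyncio.run(fetch_all(FakeDownloader(), "https://example.com/item"))
```

Code written against such a protocol can accept any of the downloaders interchangeably, which is the usual reason for defining the interface structurally rather than through a base class.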

yt-dlp deep integration for pyfetcher.downloaders.

Purpose: Wrap yt-dlp’s YoutubeDL Python API with progress hooks, metadata extraction, and structured output for pipeline integration.

class pyfetcher.downloaders.ytdlp.YtdlpDownloader(*, format_spec='bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best', extra_opts=None)

Deep yt-dlp integration via the YoutubeDL Python API.

Hooks into progress_hooks for real-time download tracking and converts info_dict to structured MediaInfo/DownloadResult models.

Parameters:

format_spec (str) – yt-dlp format selection string.
extra_opts (dict[str, Any] | None) – Additional yt-dlp options dict.
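
In yt-dlp format strings, `/` separates alternatives tried from left to right and `+` merges a video stream with an audio stream. The default `format_spec` therefore reads as a three-step fallback. The helper below, `describe_format_spec`, is a hypothetical reading aid that splits simple strings like the default; it is not a parser for yt-dlp's full format-selection grammar.

```python
DEFAULT_FORMAT_SPEC = "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best"


def describe_format_spec(spec: str) -> list[str]:
    """Split a simple format string into human-readable fallback steps."""
    steps = []
    # "/" separates alternatives in order of preference.
    for alternative in spec.split("/"):
        # "+" asks for separate streams to be merged into one file.
        parts = alternative.split("+")
        if len(parts) > 1:
            steps.append("merge " + " with ".join(parts))
        else:
            steps.append("single file " + parts[0])
    return steps


steps = describe_format_spec(DEFAULT_FORMAT_SPEC)
for number, step in enumerate(steps, start=1):
    print(f"{number}. {step}")
```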

async extract_info(url)

Extract metadata without downloading.

Parameters: url (str) – The URL to extract info from.
Returns: A list of MediaInfo objects (one per video/track).
Return type: list[MediaInfo]

async download(url, *, output_dir=None, progress_callback=None)

Download media via yt-dlp.

Parameters:

url (str) – The URL to download from.
output_dir (str | None) – Directory for downloaded files. Uses a temporary directory if not provided.
progress_callback (Callable[[DownloadProgress], None] | None) – Optional callback for progress updates.

Returns:

A list of DownloadResult objects.

Return type:

list[DownloadResult]
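
yt-dlp calls each registered progress hook with a dict containing keys such as `status`, `downloaded_bytes`, `total_bytes` or `total_bytes_estimate`, `speed`, `eta`, and `filename`. The sketch below shows how such a dict might be mapped onto `DownloadProgress`. It is an illustration, not pyfetcher's actual conversion code: `hook_to_progress` is a hypothetical name, the dataclass is a stand-in, and the sample event is made up.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class DownloadProgress:
    """Stand-in mirroring the documented fields."""

    status: str
    downloaded_bytes: int = 0
    total_bytes: int | None = None
    speed_bytes_per_sec: float | None = None
    eta_seconds: float | None = None
    filename: str | None = None
    percent: float | None = None


def hook_to_progress(event: dict[str, Any]) -> DownloadProgress:
    """Convert one yt-dlp progress-hook dict into a DownloadProgress."""
    downloaded = int(event.get("downloaded_bytes") or 0)
    # yt-dlp may report only an estimate when the exact size is unknown.
    total = event.get("total_bytes") or event.get("total_bytes_estimate")
    percent = 100.0 * downloaded / total if total else None
    return DownloadProgress(
        status=event["status"],
        downloaded_bytes=downloaded,
        total_bytes=int(total) if total else None,
        speed_bytes_per_sec=event.get("speed"),
        eta_seconds=event.get("eta"),
        filename=event.get("filename"),
        percent=percent,
    )


# Made-up event in the shape yt-dlp passes to progress hooks.
sample_event = {
    "status": "downloading",
    "downloaded_bytes": 1024,
    "total_bytes_estimate": 4096.0,
    "speed": 2048.0,
    "eta": 1.5,
    "filename": "clip.mp4",
}
progress = hook_to_progress(sample_event)
```

A wrapper could register a function like this under the `progress_hooks` option of `yt_dlp.YoutubeDL` and forward each converted update to the caller's `progress_callback`.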

gallery-dl deep integration for pyfetcher.downloaders.

Purpose: Wrap gallery-dl’s job/config API for programmatic downloading with metadata capture and file interception.

class pyfetcher.downloaders.gallerydl.GalleryDlDownloader(*, extra_config=None)

Deep gallery-dl integration via its Python API.

Uses gallery-dl’s configuration system and job runner to download images and galleries, capturing per-file metadata.

Parameters: extra_config (dict[str, Any] | None) – Additional gallery-dl configuration dict.

async extract_info(url)

Extract metadata for all downloadable items without downloading.

Parameters: url (str) – Gallery or image URL.
Returns: A list of MediaInfo objects.
Return type: list[MediaInfo]

async download(url, *, output_dir=None, progress_callback=None)

Download all items from a URL.

Parameters:

url (str) – Gallery or image URL.
output_dir (str | None) – Directory for downloaded files. Uses a temporary directory if not provided.
progress_callback (Callable[[DownloadProgress], None] | None) – Optional callback for progress updates.

Returns:

A list of DownloadResult objects.

Return type:

list[DownloadResult]
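
The `output_dir` fallback described above could be implemented along the following lines. This is a sketch under assumptions: the helper name `resolve_output_dir`, the `pyfetcher-` prefix, and the choice to create a missing explicit directory are all illustrative, and the reference does not say who is responsible for cleaning up the temporary directory.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def resolve_output_dir(output_dir: str | None, prefix: str = "pyfetcher-") -> Path:
    """Return a usable directory, creating a temporary one when none is given."""
    if output_dir is None:
        # mkdtemp creates the directory and leaves cleanup to the caller.
        return Path(tempfile.mkdtemp(prefix=prefix))
    path = Path(output_dir)
    # Assumption: a missing explicit directory is created rather than rejected.
    path.mkdir(parents=True, exist_ok=True)
    return path


temp_dir = resolve_output_dir(None)
explicit_dir = resolve_output_dir(str(temp_dir / "gallery"))
print(temp_dir, explicit_dir)
```

Passing an explicit `output_dir` is the safer choice when the files need to outlive the process, since temporary directories are typically not kept.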

Direct HTTP download with MinIO upload for pyfetcher.downloaders.

Purpose: Provide direct HTTP file downloads using pyfetcher’s existing fetch infrastructure, with optional streaming to MinIO.

class pyfetcher.downloaders.direct.DirectDownloader(*, fetch_service=None)

Direct HTTP downloader using pyfetcher’s FetchService.

Streams files to disk using the existing streaming infrastructure, then optionally uploads to MinIO.

Parameters: fetch_service (FetchService | None) – Optional FetchService instance.

async extract_info(url)

Extract info via HEAD request.

Parameters: url (str) – File URL.
Returns: A list with one MediaInfo.
Return type: list[MediaInfo]
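
A HEAD response carries no body, so the metadata has to come from its headers. The sketch below shows one plausible mapping from `Content-Length` and `Content-Type` to `MediaInfo` fields. It is not pyfetcher's implementation: `media_info_from_headers` is a hypothetical helper, the dataclass is a reduced stand-in, and storing `ext` without a leading dot is an assumption.

```python
from __future__ import annotations

import mimetypes
from dataclasses import dataclass
from typing import Mapping


@dataclass
class MediaInfo:
    """Stand-in with only the fields this sketch fills in."""

    url: str
    file_size_bytes: int | None = None
    mime_type: str | None = None
    ext: str | None = None


def media_info_from_headers(url: str, headers: Mapping[str, str]) -> MediaInfo:
    """Build a MediaInfo from HEAD response headers."""
    # Header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    length = lowered.get("content-length")
    size = int(length) if length and length.isdigit() else None

    content_type = lowered.get("content-type")
    # Drop parameters such as "; charset=utf-8".
    mime = content_type.split(";", 1)[0].strip().lower() if content_type else None

    ext = mimetypes.guess_extension(mime) if mime else None
    return MediaInfo(
        url=url,
        file_size_bytes=size,
        mime_type=mime,
        ext=ext.lstrip(".") if ext else None,
    )


info = media_info_from_headers(
    "https://example.com/report.pdf",
    {"Content-Length": "2048", "Content-Type": "application/pdf; charset=binary"},
)
empty = media_info_from_headers("https://example.com/unknown", {})
```

Servers may omit either header, so every derived field stays optional, as the `empty` case shows.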

async download(url, *, output_dir=None, progress_callback=None)

Download a file via HTTP streaming.

Parameters:

url (str) – File URL.
output_dir (str | None) – Output directory. Uses a temporary directory if not provided.
progress_callback (object | None) – Not used for direct downloads.

Returns:

A list with one DownloadResult.

Return type:

list[DownloadResult]
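
Streaming a download means writing each chunk to disk as it arrives instead of buffering the whole body. The sketch below illustrates the idea with a plain iterable of byte chunks standing in for the HTTP response. The helpers `filename_from_url` and `write_stream`, the `download.bin` fallback name, and the rule of naming the file after the last URL path segment are all assumptions, not pyfetcher behaviour.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Iterable
from urllib.parse import unquote, urlsplit


def filename_from_url(url: str, default: str = "download.bin") -> str:
    """Use the last path segment of the URL as the filename, if there is one."""
    name = Path(unquote(urlsplit(url).path)).name
    return name or default


def write_stream(chunks: Iterable[bytes], url: str, output_dir: str) -> tuple[Path, int]:
    """Write chunks to disk as they arrive and return the path and byte count."""
    target = Path(output_dir) / filename_from_url(url)
    written = 0
    with open(target, "wb") as handle:
        for chunk in chunks:
            handle.write(chunk)
            written += len(chunk)
    return target, written


# Two in-memory chunks stand in for a streamed HTTP body.
with tempfile.TemporaryDirectory() as tmp:
    path, size = write_stream(
        [b"abc", b"defg"],
        "https://example.com/files/data%20set.csv?x=1",
        tmp,
    )
    saved_name = path.name
    saved_bytes = path.read_bytes()
```

The returned path and byte count correspond to what a `DownloadResult` would report as `local_path` and `file_size_bytes`.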