Downloaders
Base downloader protocol for pyfetcher.downloaders.
- Purpose:
Define the common interface for all downloader implementations.
- class pyfetcher.downloaders.base.MediaInfo(url, title=None, description=None, duration_seconds=None, thumbnail_url=None, uploader=None, upload_date=None, file_size_bytes=None, mime_type=None, ext=None, extra=<factory>)
Extracted media metadata before download.
- class pyfetcher.downloaders.base.DownloadResult(source_url, local_path=None, minio_key=None, minio_bucket=None, filename=None, file_size_bytes=None, mime_type=None, checksum_sha256=None, media_info=None, media_metadata=<factory>)
Result of a completed download.
- class pyfetcher.downloaders.base.DownloadProgress(status, downloaded_bytes=0, total_bytes=None, speed_bytes_per_sec=None, eta_seconds=None, filename=None, percent=None)
Progress update during a download.
- class pyfetcher.downloaders.base.DownloaderProtocol(*args, **kwargs)
Protocol for downloader implementations.
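The members of DownloaderProtocol are not listed on this page. As a minimal sketch, assuming the protocol requires the same async download() signature that the concrete downloaders below expose, a conforming implementation might look like the following. The dataclasses are trimmed stand-ins for the pyfetcher models, and DownloaderSketch and EchoDownloader are hypothetical names used only for illustration.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Callable, Optional, Protocol, runtime_checkable


@dataclass
class DownloadProgress:
    """Stand-in with a subset of the documented DownloadProgress fields."""
    status: str
    downloaded_bytes: int = 0
    total_bytes: Optional[int] = None
    percent: Optional[float] = None


@dataclass
class DownloadResult:
    """Stand-in with a subset of the documented DownloadResult fields."""
    source_url: str
    local_path: Optional[str] = None
    filename: Optional[str] = None
    media_metadata: dict[str, Any] = field(default_factory=dict)


ProgressCallback = Callable[[DownloadProgress], None]


@runtime_checkable
class DownloaderSketch(Protocol):
    """Assumed shape of the protocol: one async download() method."""

    async def download(
        self,
        url: str,
        *,
        output_dir: Optional[str] = None,
        progress_callback: Optional[ProgressCallback] = None,
    ) -> list[DownloadResult]: ...


class EchoDownloader:
    """Hypothetical downloader that performs no I/O and reports one result."""

    async def download(self, url, *, output_dir=None, progress_callback=None):
        if progress_callback is not None:
            # Report a single terminal progress update.
            progress_callback(DownloadProgress(status="finished", percent=100.0))
        return [DownloadResult(source_url=url, filename="example.bin")]


def run_echo(url: str):
    """Run the hypothetical downloader and collect its progress updates."""
    updates: list[DownloadProgress] = []
    results = asyncio.run(
        EchoDownloader().download(url, progress_callback=updates.append)
    )
    return results, updates


if __name__ == "__main__":
    print(run_echo("https://example.com/file"))
```

Because the protocol is structural, EchoDownloader does not inherit from it; an isinstance check against a runtime_checkable protocol only verifies that a download attribute exists, not that its signature matches.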
yt-dlp deep integration for pyfetcher.downloaders.
- Purpose:
Wrap yt-dlp’s YoutubeDL Python API with progress hooks, metadata extraction, and structured output for pipeline integration.
- class pyfetcher.downloaders.ytdlp.YtdlpDownloader(*, format_spec='bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best', extra_opts=None)
Deep yt-dlp integration via the YoutubeDL Python API.
Hooks into progress_hooks for real-time download tracking and converts info_dict to structured MediaInfo/DownloadResult models.
- Parameters:
- async download(url, *, output_dir=None, progress_callback=None)
Download media via yt-dlp.
- Parameters:
url (str) – The URL to download from.
output_dir (str | None) – Directory for downloaded files. Uses temp dir if not provided.
progress_callback (Callable[[DownloadProgress], None] | None) – Optional callback for progress updates.
- Returns:
A list of DownloadResult objects.
- Return type:
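The class description mentions converting yt-dlp progress hook data into structured updates. The sketch below shows one way that mapping could be written; it is not taken from the pyfetcher source. The dictionary keys (status, downloaded_bytes, total_bytes, total_bytes_estimate, speed, eta, filename) are those yt-dlp passes to progress hooks, while progress_from_hook is a hypothetical helper and DownloadProgress is a local stand-in mirroring the documented fields.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class DownloadProgress:
    """Local stand-in mirroring the documented DownloadProgress fields."""
    status: str
    downloaded_bytes: int = 0
    total_bytes: Optional[int] = None
    speed_bytes_per_sec: Optional[float] = None
    eta_seconds: Optional[float] = None
    filename: Optional[str] = None
    percent: Optional[float] = None


def progress_from_hook(hook: Mapping[str, Any]) -> DownloadProgress:
    """Convert a yt-dlp progress hook dict into a DownloadProgress (hypothetical helper)."""
    downloaded = hook.get("downloaded_bytes") or 0
    # yt-dlp may only know an estimate of the total size, or nothing at all.
    total = hook.get("total_bytes") or hook.get("total_bytes_estimate")
    percent = (downloaded / total * 100.0) if total else None
    return DownloadProgress(
        status=hook.get("status", "unknown"),
        downloaded_bytes=downloaded,
        total_bytes=total,
        speed_bytes_per_sec=hook.get("speed"),
        eta_seconds=hook.get("eta"),
        filename=hook.get("filename"),
        percent=percent,
    )


if __name__ == "__main__":
    sample = {
        "status": "downloading",
        "downloaded_bytes": 512,
        "total_bytes": 2048,
        "speed": 1024.0,
        "eta": 1.5,
        "filename": "clip.mp4",
    }
    print(progress_from_hook(sample))
```

With yt-dlp itself, a function like this could be registered through the progress_hooks option of YoutubeDL, for example a hook that calls progress_callback(progress_from_hook(d)) for each update.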
gallery-dl deep integration for pyfetcher.downloaders.
- Purpose:
Wrap gallery-dl’s job/config API for programmatic downloading with metadata capture and file interception.
- class pyfetcher.downloaders.gallerydl.GalleryDlDownloader(*, extra_config=None)
Deep gallery-dl integration via its Python API.
Uses gallery-dl’s configuration system and job runner to download images and galleries, capturing per-file metadata.
- async download(url, *, output_dir=None, progress_callback=None)
Download all items from a URL.
- Parameters:
url (str) – Gallery or image URL.
output_dir (str | None) – Directory for downloaded files. Uses temp dir if not provided.
progress_callback (Callable[[DownloadProgress], None] | None) – Optional callback for progress updates.
- Returns:
A list of DownloadResult objects.
- Return type:
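To illustrate what capturing per-file metadata could mean in practice, the sketch below turns a gallery-dl style metadata dictionary into a result record. It is an assumption about the approach, not the pyfetcher implementation. The filename, extension and category keys follow gallery-dl's usual per-file metadata; result_from_kwdict is a hypothetical helper and DownloadResult is a trimmed stand-in for the documented model.

```python
import mimetypes
from dataclasses import dataclass, field
from typing import Any, Mapping, Optional


@dataclass
class DownloadResult:
    """Stand-in with a subset of the documented DownloadResult fields."""
    source_url: str
    local_path: Optional[str] = None
    filename: Optional[str] = None
    mime_type: Optional[str] = None
    media_metadata: dict[str, Any] = field(default_factory=dict)


# Value types kept in the stored metadata; anything else is dropped.
_SIMPLE_TYPES = (str, int, float, bool, type(None))


def result_from_kwdict(
    source_url: str, local_path: str, kwdict: Mapping[str, Any]
) -> DownloadResult:
    """Build a DownloadResult from per-file metadata (hypothetical helper)."""
    name = kwdict.get("filename")
    ext = kwdict.get("extension")
    filename = f"{name}.{ext}" if name and ext else name
    # Guess the MIME type from the file extension only; no file is read.
    mime_type = mimetypes.guess_type(filename)[0] if filename else None
    # Keep only simple scalar values so the metadata stays easy to serialize.
    metadata = {k: v for k, v in kwdict.items() if isinstance(v, _SIMPLE_TYPES)}
    return DownloadResult(
        source_url=source_url,
        local_path=local_path,
        filename=filename,
        mime_type=mime_type,
        media_metadata=metadata,
    )


if __name__ == "__main__":
    sample = {"category": "example", "filename": "img_001", "extension": "jpg", "num": 1}
    print(result_from_kwdict("https://example.com/g/1", "/tmp/img_001.jpg", sample))
```

Filtering to scalar values is a design choice made here for illustration; a real integration might instead convert dates and nested structures rather than drop them.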
Direct HTTP download with MinIO upload for pyfetcher.downloaders.
- Purpose:
Provide direct HTTP file downloads using pyfetcher’s existing fetch infrastructure, with optional streaming to MinIO.
- class pyfetcher.downloaders.direct.DirectDownloader(*, fetch_service=None)
Direct HTTP downloader using pyfetcher’s FetchService.
Streams files to disk using the existing streaming infrastructure, then optionally uploads to MinIO.
- Parameters:
fetch_service (FetchService | None) – Optional FetchService instance.
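DownloadResult carries file_size_bytes and checksum_sha256, and both can be computed while a file is streamed to disk. The following is a minimal sketch of that idea using only the standard library. It assumes the response body is available as an iterable of byte chunks; how FetchService actually exposes its stream is not shown on this page, and write_stream_with_checksum is a hypothetical helper.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterable, Tuple


def write_stream_with_checksum(
    chunks: Iterable[bytes], destination: Path
) -> Tuple[int, str]:
    """Write chunks to destination, returning (size in bytes, SHA-256 hex digest)."""
    digest = hashlib.sha256()
    size = 0
    with destination.open("wb") as fh:
        for chunk in chunks:
            if not chunk:
                # Skip empty keep-alive chunks.
                continue
            fh.write(chunk)
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "payload.bin"
        # Stand-in for a streamed HTTP body.
        body = [b"hello ", b"", b"world"]
        print(write_stream_with_checksum(body, target))
```

Hashing during the write avoids a second pass over the file. An optional MinIO upload could then run on the finished file, for instance with the MinIO Python client's fput_object(bucket_name, object_name, file_path); whether DirectDownloader does it this way is an assumption.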