Configuration

Application configuration for pyfetcher.

Purpose:

Provide a centralized, environment-aware configuration object using pydantic-settings. Values are read from environment variables prefixed with PYFETCHER_ and from a .env file.
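To illustrate the naming convention, the stdlib-only sketch below maps a field name to its environment variable (database_url becomes PYFETCHER_DATABASE_URL), looks the variable up case-insensitively, and coerces the string to the type of the field's default. It is a hypothetical stand-in, not pydantic-settings code: env_name and lookup are illustrative names, pydantic-settings performs this lookup and validation itself, and its boolean parsing accepts more spellings than the subset handled here.

```python
import os
from collections.abc import Mapping

PREFIX = "PYFETCHER_"

# Hypothetical subset of accepted boolean spellings (pydantic accepts more).
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def env_name(field: str, prefix: str = PREFIX) -> str:
    """Return the environment variable name for a config field."""
    return f"{prefix}{field}".upper()


def lookup(field: str, default: object, environ: Mapping[str, str] = os.environ) -> object:
    """Read one field from the environment, falling back to its default.

    Names are compared case-insensitively, mirroring case_sensitive=False.
    The default's type decides how the raw string is coerced.
    """
    wanted = env_name(field)
    raw = next((v for k, v in environ.items() if k.upper() == wanted), None)
    if raw is None:
        return default
    if isinstance(default, bool):  # check bool before int: bool subclasses int
        lowered = raw.strip().lower()
        if lowered in _TRUE:
            return True
        if lowered in _FALSE:
            return False
        raise ValueError(f"{wanted}={raw!r} is not a boolean")
    if isinstance(default, int):
        return int(raw)
    if isinstance(default, float):
        return float(raw)
    return raw
```

For example, lookup("crawl_concurrency", 10, {"PYFETCHER_CRAWL_CONCURRENCY": "4"}) yields the integer 4, while a missing variable falls back to the default.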

Design:
  • All infrastructure connection details (Postgres, MinIO) are configurable.

  • Pipeline concurrency and behavior are tunable per deployment.

  • Defaults are suitable for local development with Docker Compose.
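One common way to enforce per-stage limits such as crawl_concurrency, scrape_concurrency and download_concurrency is one asyncio.Semaphore per stage. The sketch below assumes that approach; it is not pyfetcher's pipeline code, and run_bounded and OverlapProbe are hypothetical names. The probe records how many calls overlap so the limit can be checked.

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterable


async def run_bounded(
    items: Iterable[str],
    worker: Callable[[str], Awaitable[str]],
    limit: int,
) -> list[str]:
    """Run worker(item) for every item, with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item: str) -> str:
        async with semaphore:  # blocks while `limit` calls are already running
            return await worker(item)

    # gather returns results in input order, regardless of completion order
    return await asyncio.gather(*(guarded(item) for item in items))


class OverlapProbe:
    """Hypothetical worker that records the peak number of overlapping calls."""

    def __init__(self) -> None:
        self.in_flight = 0
        self.peak = 0

    async def __call__(self, url: str) -> str:
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        await asyncio.sleep(0.01)  # stand-in for network I/O
        self.in_flight -= 1
        return url.upper()


if __name__ == "__main__":
    probe = OverlapProbe()
    urls = [f"https://example.com/{i}" for i in range(25)]
    results = asyncio.run(run_bounded(urls, probe, limit=10))  # e.g. crawl_concurrency=10
    print(len(results), probe.peak)
```

Creating a separate semaphore per stage keeps a slow download stage from starving crawl workers, since each stage only competes for its own slots.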

Examples

>>> from pyfetcher.config import PyfetcherConfig
>>> config = PyfetcherConfig()
>>> config.database_url
'postgresql+asyncpg://pyfetcher:pyfetcher@localhost:5432/pyfetcher'

class pyfetcher.config.PyfetcherConfig(_case_sensitive=None, _nested_model_default_partial_update=None, _env_prefix=None, _env_prefix_target=None, _env_file=PosixPath('.'), _env_file_encoding=None, _env_ignore_empty=None, _env_nested_delimiter=None, _env_nested_max_split=None, _env_parse_none_str=None, _env_parse_enums=None, _cli_prog_name=None, _cli_parse_args=None, _cli_settings_source=None, _cli_parse_none_str=None, _cli_hide_none_type=None, _cli_avoid_json=None, _cli_enforce_required=None, _cli_use_class_docs_for_groups=None, _cli_exit_on_error=None, _cli_prefix=None, _cli_flag_prefix_char=None, _cli_implicit_flags=None, _cli_ignore_unknown_args=None, _cli_kebab_case=None, _cli_shortcuts=None, _secrets_dir=None, _build_sources=None, *, database_url='postgresql+asyncpg://pyfetcher:pyfetcher@localhost:5432/pyfetcher', db_pool_size=10, db_max_overflow=20, minio_endpoint='localhost:9000', minio_access_key='minioadmin', minio_secret_key='minioadmin', minio_secure=False, minio_bucket='pyfetcher', crawl_concurrency=10, scrape_concurrency=20, download_concurrency=5, default_crawl_delay_seconds=1.0, max_retries=3)

Centralized configuration for pyfetcher infrastructure and pipeline.

Values are read from environment variables prefixed with PYFETCHER_ (names are matched case-insensitively) and from a .env file in the current working directory. Defaults are suitable for local development with the provided Docker Compose setup.
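With pydantic-settings' default source order, keyword arguments passed to the constructor win over environment variables, which win over the .env file, which wins over the field defaults. The stdlib-only sketch below models that layering with plain dictionaries. parse_dotenv, from_prefixed and resolve are hypothetical helpers; the parser handles only simple KEY=VALUE lines (one pair of surrounding quotes is stripped, with no export keyword or interpolation), and values stay as strings because type coercion is left to pydantic.

```python
from collections.abc import Mapping

PREFIX = "PYFETCHER_"


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]  # drop one pair of surrounding quotes
        values[key.strip()] = value
    return values


def from_prefixed(source: Mapping[str, str], fields: set[str]) -> dict[str, str]:
    """Keep only PYFETCHER_* keys that name a known field (case-insensitive)."""
    out: dict[str, str] = {}
    for key, value in source.items():
        upper = key.upper()
        if upper.startswith(PREFIX):
            name = upper[len(PREFIX):].lower()
            if name in fields:  # unknown names are dropped
                out[name] = value
    return out


def resolve(
    defaults: Mapping[str, object],
    dotenv: Mapping[str, str],
    environ: Mapping[str, str],
    init_kwargs: Mapping[str, object],
) -> dict[str, object]:
    """Merge sources from lowest to highest priority; later updates win."""
    fields = set(defaults)
    merged: dict[str, object] = dict(defaults)
    merged.update(from_prefixed(dotenv, fields))
    merged.update(from_prefixed(environ, fields))
    merged.update({k: v for k, v in init_kwargs.items() if k in fields})
    return merged
```

For instance, if .env sets PYFETCHER_CRAWL_CONCURRENCY=4 and the shell exports PYFETCHER_CRAWL_CONCURRENCY=8, the environment value 8 is the one that survives.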

Parameters:
  • database_url (str) – SQLAlchemy async connection URL for PostgreSQL (asyncpg driver).

  • db_pool_size (int) – Number of connections kept open in the SQLAlchemy connection pool.

  • db_max_overflow (int) – Maximum number of extra connections opened beyond db_pool_size under load.

  • minio_endpoint (str) – MinIO server endpoint (host:port).

  • minio_access_key (str) – MinIO access key.

  • minio_secret_key (str) – MinIO secret key.

  • minio_secure (bool) – Whether to use HTTPS for MinIO connections.

  • minio_bucket (str) – Default bucket name for storing assets.

  • crawl_concurrency (int) – Maximum concurrent crawl workers.

  • scrape_concurrency (int) – Maximum concurrent scrape workers.

  • download_concurrency (int) – Maximum concurrent download workers.

  • default_crawl_delay_seconds (float) – Default politeness delay between requests to the same host.

  • max_retries (int) – Default maximum retry attempts for failed jobs.

  • _case_sensitive (bool | None)

  • _nested_model_default_partial_update (bool | None)

  • _env_prefix (str | None)

  • _env_prefix_target (EnvPrefixTarget | None)

  • _env_file (DotenvType | None)

  • _env_file_encoding (str | None)

  • _env_ignore_empty (bool | None)

  • _env_nested_delimiter (str | None)

  • _env_nested_max_split (int | None)

  • _env_parse_none_str (str | None)

  • _env_parse_enums (bool | None)

  • _cli_prog_name (str | None)

  • _cli_parse_args (bool | list[str] | tuple[str, ...] | None)

  • _cli_settings_source (CliSettingsSource[Any] | None)

  • _cli_parse_none_str (str | None)

  • _cli_hide_none_type (bool | None)

  • _cli_avoid_json (bool | None)

  • _cli_enforce_required (bool | None)

  • _cli_use_class_docs_for_groups (bool | None)

  • _cli_exit_on_error (bool | None)

  • _cli_prefix (str | None)

  • _cli_flag_prefix_char (str | None)

  • _cli_implicit_flags (bool | Literal['dual', 'toggle'] | None)

  • _cli_ignore_unknown_args (bool | None)

  • _cli_kebab_case (bool | Literal['all', 'no_enums'] | None)

  • _cli_shortcuts (Mapping[str, str | list[str]] | None)

  • _secrets_dir (PathType | None)

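  • _build_sources (tuple[tuple[PydanticBaseSettingsSource, ...], dict[str, Any]] | None)

The underscore-prefixed keyword arguments are inherited from pydantic_settings.BaseSettings and are not configuration fields. Most of them override the like-named model_config setting (for example, _env_file overrides env_file) for a single instantiation.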
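The database and MinIO fields line up with the keyword arguments of SQLAlchemy's create_async_engine (pool_size, max_overflow) and of the MinIO Python client's Minio constructor (endpoint, access_key, secret_key, secure). The sketch below assumes pyfetcher passes them through that way, which this page does not confirm, and only builds the keyword dictionaries so it runs without either library installed. Settings, engine_kwargs, peak_db_connections, minio_kwargs and minio_base_url are hypothetical names. With a queue-based pool, one engine holds at most pool_size + max_overflow connections at once, which is 10 + 20 = 30 with the defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical stand-in holding a subset of PyfetcherConfig's fields."""

    database_url: str = "postgresql+asyncpg://pyfetcher:pyfetcher@localhost:5432/pyfetcher"
    db_pool_size: int = 10
    db_max_overflow: int = 20
    minio_endpoint: str = "localhost:9000"
    minio_access_key: str = "minioadmin"
    minio_secret_key: str = "minioadmin"
    minio_secure: bool = False


def engine_kwargs(s: Settings) -> dict[str, object]:
    """Keyword arguments for create_async_engine(s.database_url, **kwargs)."""
    return {"pool_size": s.db_pool_size, "max_overflow": s.db_max_overflow}


def peak_db_connections(s: Settings) -> int:
    """Upper bound on simultaneous connections from one queue-pooled engine."""
    return s.db_pool_size + s.db_max_overflow


def minio_kwargs(s: Settings) -> dict[str, object]:
    """Keyword arguments for Minio(**kwargs); the endpoint carries no scheme."""
    return {
        "endpoint": s.minio_endpoint,
        "access_key": s.minio_access_key,
        "secret_key": s.minio_secret_key,
        "secure": s.minio_secure,  # False selects plain HTTP
    }


def minio_base_url(s: Settings) -> str:
    """Base URL implied by minio_endpoint and minio_secure."""
    scheme = "https" if s.minio_secure else "http"
    return f"{scheme}://{s.minio_endpoint}"
```

With the real libraries installed, these would be used as create_async_engine(s.database_url, **engine_kwargs(s)) and Minio(**minio_kwargs(s)). When sizing PostgreSQL's max_connections, the per-engine bound above is multiplied by the number of processes that each create an engine.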
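default_crawl_delay_seconds and max_retries describe policy rather than mechanism. The sketch below is one plausible reading, not pyfetcher's code: a per-host pacer that spaces requests to the same host by the delay, and a retry helper that assumes max_retries counts retries after the first attempt (so at most max_retries + 1 calls). HostPacer and with_retries are hypothetical names; the clock is injectable so the pacing can be checked without sleeping.

```python
import time
from collections.abc import Callable
from typing import TypeVar
from urllib.parse import urlsplit

T = TypeVar("T")


class HostPacer:
    """Tracks when each host may next be requested (hypothetical helper)."""

    def __init__(self, delay_seconds: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.delay = delay_seconds
        self.clock = clock
        self.next_allowed: dict[str, float] = {}  # host[:port] -> earliest start

    def wait_time(self, url: str) -> float:
        """Return seconds to wait before requesting url and reserve that slot."""
        host = urlsplit(url).netloc.lower()
        now = self.clock()
        start = max(now, self.next_allowed.get(host, now))
        self.next_allowed[host] = start + self.delay
        return start - now


def with_retries(func: Callable[[], T], max_retries: int = 3) -> T:
    """Call func, retrying on any exception up to max_retries extra times.

    A real pipeline would likely retry only transient errors and add backoff.
    """
    failures = 0
    while True:
        try:
            return func()
        except Exception:
            failures += 1
            if failures > max_retries:
                raise  # out of retries: surface the last error
```

Because wait_time reserves the next slot as soon as it is called, several workers targeting the same host are staggered by the delay instead of all firing once the first wait ends.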

model_config = {'arbitrary_types_allowed': True, 'case_sensitive': False, 'cli_avoid_json': False, 'cli_enforce_required': False, 'cli_exit_on_error': True, 'cli_flag_prefix_char': '-', 'cli_hide_none_type': False, 'cli_ignore_unknown_args': False, 'cli_implicit_flags': False, 'cli_kebab_case': False, 'cli_parse_args': None, 'cli_parse_none_str': None, 'cli_prefix': '', 'cli_prog_name': None, 'cli_shortcuts': None, 'cli_use_class_docs_for_groups': False, 'enable_decoding': True, 'env_file': '.env', 'env_file_encoding': 'utf-8', 'env_ignore_empty': False, 'env_nested_delimiter': None, 'env_nested_max_split': None, 'env_parse_enums': None, 'env_parse_none_str': None, 'env_prefix': 'PYFETCHER_', 'env_prefix_target': 'variable', 'extra': 'ignore', 'json_file': None, 'json_file_encoding': None, 'nested_model_default_partial_update': False, 'protected_namespaces': ('model_validate', 'model_dump', 'settings_customise_sources'), 'secrets_dir': None, 'toml_file': None, 'validate_default': True, 'yaml_config_section': None, 'yaml_file': None, 'yaml_file_encoding': None}

Configuration for the model: a dictionary conforming to pydantic.config.ConfigDict, extended for settings classes by pydantic_settings.SettingsConfigDict (which adds keys such as env_prefix and env_file).