haive.core.engine.document.loaders.specific.web
Web Loaders for Document Engine.
This module implements specialized web loaders for different types of web content including GitHub, ArXiv, Wikipedia, and general web pages.
Classes
ArXivSource | ArXiv research paper source.
BasicWebSource | Basic web source for simple HTML pages.
GitHubSource | GitHub repository and content source.
PlaywrightWebSource | Advanced web source using Playwright for JavaScript-heavy sites.
WikipediaSource | Wikipedia article source.
Module Contents
- class haive.core.engine.document.loaders.specific.web.ArXivSource(query=None, paper_id=None, max_results=10, **kwargs)
Bases: haive.core.engine.document.loaders.sources.implementation.WebUrlSource
ArXiv research paper source.
Initialize the source.
- Parameters:
- class haive.core.engine.document.loaders.specific.web.BasicWebSource(web_paths, requests_kwargs=None, **kwargs)
Bases: haive.core.engine.document.loaders.sources.implementation.WebUrlSource
Basic web source for simple HTML pages.
Initialize the source.
- Parameters:
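The `requests_kwargs` argument is not described on this page. Assuming it is forwarded to the underlying HTTP request, as the option of the same name is in LangChain's `WebBaseLoader`, a caller might assemble it as sketched below. The defaults and the helper name `merge_requests_kwargs` are hypothetical and not part of haive.

```python
from typing import Any, Dict, Optional

# Hypothetical defaults; any keyword accepted by requests.get could appear here.
DEFAULT_REQUESTS_KWARGS: Dict[str, Any] = {
    "timeout": 10,
    "headers": {"User-Agent": "docs-example/0.1"},
}


def merge_requests_kwargs(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return the defaults updated with caller overrides, merging headers key by key."""
    overrides = overrides or {}
    merged = {**DEFAULT_REQUESTS_KWARGS, **overrides}
    # Build a fresh headers dict so the shared defaults are never mutated.
    merged["headers"] = {
        **DEFAULT_REQUESTS_KWARGS["headers"],
        **overrides.get("headers", {}),
    }
    return merged


requests_kwargs = merge_requests_kwargs(
    {"timeout": 30, "headers": {"Accept": "text/html"}}
)
```

The resulting dictionary would then be passed as `requests_kwargs=requests_kwargs` when constructing `BasicWebSource`, if the assumption above holds.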
- class haive.core.engine.document.loaders.specific.web.GitHubSource(repo_url, file_filter=None, include_issues=False, include_pull_requests=False, **kwargs)
Bases: haive.core.engine.document.loaders.sources.implementation.WebUrlSource
GitHub repository and content source.
Initialize the source.
- Parameters:
- create_loader()
Create a GitHub loader.
- Return type:
langchain_core.document_loaders.base.BaseLoader | None
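The type of `file_filter` is not documented on this page. If it accepts a path predicate, as the `file_filter` option of LangChain's `GithubFileLoader` does, a filter could look like the sketch below. The function name `docs_and_python_only` and the callable form are assumptions for illustration only.

```python
from pathlib import PurePosixPath


def docs_and_python_only(file_path: str) -> bool:
    """Hypothetical predicate: keep Markdown files anywhere and Python files under src/."""
    path = PurePosixPath(file_path)
    if path.suffix.lower() == ".md":
        return True
    # parts[:1] is the first path component, e.g. ("src",) for "src/app/main.py".
    return path.suffix == ".py" and path.parts[:1] == ("src",)


# Repository paths use forward slashes, hence PurePosixPath above.
paths = ["README.md", "src/app/main.py", "tests/test_main.py", "assets/logo.png"]
kept = [p for p in paths if docs_and_python_only(p)]
```

Under that assumption, the predicate would be passed as `file_filter=docs_and_python_only` alongside `repo_url`.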
- class haive.core.engine.document.loaders.specific.web.PlaywrightWebSource(urls, wait_until='networkidle', headless=True, **kwargs)
Bases: haive.core.engine.document.loaders.sources.implementation.WebUrlSource
Advanced web source using Playwright for JavaScript-heavy sites.
Initialize the source.
- Parameters:
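The default `wait_until='networkidle'` matches one of the navigation states that Playwright's `page.goto` accepts. Assuming `PlaywrightWebSource` forwards the value to Playwright unchanged, which this page does not confirm, a caller could check the value up front with a small helper such as the hypothetical `check_wait_until` below.

```python
# Values accepted by Playwright's page.goto(wait_until=...).
PLAYWRIGHT_WAIT_STATES = ("commit", "domcontentloaded", "load", "networkidle")


def check_wait_until(value: str) -> str:
    """Hypothetical helper: return the value if Playwright accepts it, else raise."""
    if value not in PLAYWRIGHT_WAIT_STATES:
        allowed = ", ".join(PLAYWRIGHT_WAIT_STATES)
        raise ValueError(f"wait_until must be one of: {allowed}; got {value!r}")
    return value


wait_until = check_wait_until("networkidle")
```

`networkidle` waits until network activity has been quiet for a short period, which suits pages that fetch content after the initial load; `domcontentloaded` or `load` return sooner on mostly static pages.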
- class haive.core.engine.document.loaders.specific.web.WikipediaSource(query=None, page_title=None, lang='en', load_max_docs=1, **kwargs)
Bases: haive.core.engine.document.loaders.sources.implementation.WebUrlSource
Wikipedia article source.
Initialize the source.
- Parameters: