haive.mcp.utils.extract_mcp_github_repos

Enhanced MCP Repository Extractor with README Processing.

This script:

1. Extracts repository URLs from awesome-mcp-servers
2. Downloads and processes README files
3. Converts them to LangChain Documents with metadata
4. Organizes the resources for agent access

Classes

`ExtractionStats`	Statistics for extraction process.
`MCPCategory`	MCP Server Categories.
`MCPLanguage`	Programming Languages.
`MCPPlatform`	Supported Platforms.
`MCPRepositoryExtractor`	Enhanced MCP Repository Extractor.
`MCPScope`	Server Scope.
`MCPServerDocument`	Complete MCP Server Document.
`MCPServerMetadata`	Metadata for an MCP Server.

Functions

`create_agent_loader`([output_dir])	Create a loader function for agents to access MCP documents.
`main`()	Main entry point of the script.

Module Contents

class haive.mcp.utils.extract_mcp_github_repos.ExtractionStats(/, **data)

Bases: pydantic.BaseModel

Statistics for the extraction process.

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

Parameters:: data (Any)

class haive.mcp.utils.extract_mcp_github_repos.MCPCategory

Bases: str, enum.Enum

MCP Server Categories.

class haive.mcp.utils.extract_mcp_github_repos.MCPLanguage

Bases: str, enum.Enum

Programming Languages.

class haive.mcp.utils.extract_mcp_github_repos.MCPPlatform

Bases: str, enum.Enum

Supported Platforms.
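
`MCPCategory`, `MCPLanguage`, `MCPPlatform` and `MCPScope` all derive from `str` and `enum.Enum`. Their members are not listed on this page, so the sketch below uses two hypothetical `MCPLanguage` members purely to show what the `str` mixin provides: lookup by value, equality with plain strings, and JSON serialization as the value.

```python
import json
from enum import Enum


class MCPLanguage(str, Enum):
    # Hypothetical members for illustration; the real enum may differ.
    PYTHON = "python"
    TYPESCRIPT = "typescript"


# Calling the enum with a value returns the canonical member.
assert MCPLanguage("python") is MCPLanguage.PYTHON

# The str mixin makes members compare equal to their plain-string values.
assert MCPLanguage.PYTHON == "python"

# json encodes str subclasses by their string content, so no custom encoder is needed.
print(json.dumps({"language": MCPLanguage.PYTHON}))  # {"language": "python"}
```

When the raw string is needed (for example as a metadata value), `.value` is the unambiguous choice.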

class haive.mcp.utils.extract_mcp_github_repos.MCPRepositoryExtractor(output_dir='agent_resources/mcp_servers')

Enhanced MCP Repository Extractor.

Initialize the extractor.

Parameters:: output_dir (str) – Directory in which the extracted MCP server documents are stored.

async extract_all()

Run the full extraction pipeline and return the resulting documents.

Return type:: list[MCPServerDocument]
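
This page does not show how `extract_all` schedules its work. Given that `extract_repositories_from_readme` returns a list of `MCPServerMetadata` and `process_repository` returns `MCPServerDocument | None`, one plausible shape is to run `process_repository` concurrently over that list and drop the `None` results. The sketch below shows that pattern with a bounded `asyncio.Semaphore`; `gather_bounded`, its `limit` parameter and the `fake_process` stand-in are hypothetical, not part of this module.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def gather_bounded(
    items: list[T],
    worker: Callable[[T], Awaitable[Optional[R]]],
    limit: int = 5,
) -> list[R]:
    """Run `worker` on every item with at most `limit` calls in flight; drop None results."""
    semaphore = asyncio.Semaphore(limit)

    async def run(item: T) -> Optional[R]:
        # Wait for a free slot before starting this item's work.
        async with semaphore:
            return await worker(item)

    # gather preserves input order, so results line up with `items`.
    results = await asyncio.gather(*(run(item) for item in items))
    return [result for result in results if result is not None]


async def _demo() -> None:
    async def fake_process(n: int) -> Optional[int]:
        # Stand-in for process_repository: odd inputs "fail" and return None.
        await asyncio.sleep(0)
        return None if n % 2 else n * 10

    print(await gather_bounded([1, 2, 3, 4], fake_process, limit=2))  # [20, 40]


if __name__ == "__main__":
    asyncio.run(_demo())
```

Bounding concurrency keeps the number of simultaneous HTTP requests small, and because `asyncio.gather` returns results in input order, the surviving documents keep the order in which repositories were listed.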

async extract_repositories_from_readme()

Extract repository information from the awesome-mcp-servers README.

Return type:: list[MCPServerMetadata]
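
A minimal sketch of pulling repository links out of an awesome-list README with a regular expression. The helper name `extract_github_repos` and the `owner/repo` output format are assumptions for illustration; the actual method returns `MCPServerMetadata` objects and may also capture each entry's name, description and section heading.

```python
import re

# "owner" and "repo" are runs of letters, digits, "_", "." or "-"; the repo
# segment therefore stops at ")", "/", "#", "?" or whitespace.
GITHUB_REPO_RE = re.compile(r"https?://github\.com/([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+)")


def extract_github_repos(markdown: str) -> list[str]:
    """Return unique 'owner/repo' slugs from Markdown text, in first-seen order."""
    seen: dict[str, None] = {}  # dicts keep insertion order, so this dedupes stably
    for owner, repo in GITHUB_REPO_RE.findall(markdown):
        seen.setdefault(f"{owner}/{repo.removesuffix('.git')}", None)
    return list(seen)


readme = """
- [Foo](https://github.com/acme/foo-mcp) - Query foo databases.
- [Bar](https://github.com/acme/bar.git) - Search bar.
- Docs: https://github.com/acme/foo-mcp#readme
"""
print(extract_github_repos(readme))  # ['acme/foo-mcp', 'acme/bar']
```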

async fetch_github_metadata(metadata)

Fetch additional metadata from the GitHub API.

Parameters:: metadata (MCPServerMetadata)
Return type:: None
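
GitHub's REST API exposes repository details at `https://api.github.com/repos/{owner}/{repo}`, with fields such as `stargazers_count`, `forks_count`, `language`, `topics` and `description`. The sketch below, using hypothetical helpers `github_api_url` and `pick_repo_fields`, maps a repository URL to that endpoint and keeps a few fields from a response body. Which fields the real method stores on `MCPServerMetadata` is not documented here.

```python
from urllib.parse import urlparse


def github_api_url(repo_url: str) -> str:
    """Map https://github.com/<owner>/<repo> to its REST API endpoint."""
    parts = urlparse(repo_url).path.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError(f"not a repository URL: {repo_url!r}")
    owner, repo = parts[0], parts[1].removesuffix(".git")
    return f"https://api.github.com/repos/{owner}/{repo}"


def pick_repo_fields(payload: dict) -> dict:
    """Keep a few fields from a 'get a repository' response body (hypothetical selection)."""
    return {
        "stars": payload.get("stargazers_count", 0),
        "forks": payload.get("forks_count", 0),
        "language": payload.get("language"),
        "topics": payload.get("topics", []),
        "description": payload.get("description"),
    }
```

The network call itself is left out so the sketch runs offline. Unauthenticated requests to the GitHub API are limited to 60 per hour, so a real run over many repositories usually sends a token in the `Authorization` header.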

async fetch_readme_content(metadata)

Fetch the README content from GitHub. Returns None if no README could be fetched.

Parameters:: metadata (MCPServerMetadata)
Return type:: str | None
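
One way to fetch a README is GitHub's `GET /repos/{owner}/{repo}/readme` endpoint, which returns JSON whose `content` field holds the file base64-encoded (with embedded line breaks) and whose `encoding` field is `"base64"`. The sketch below decodes such a payload. `decode_readme_payload` is hypothetical, and the real method may download the raw file instead. The sketch raises `ValueError` on an unexpected encoding, whereas the documented `str | None` return type suggests the real method reports failures as `None`.

```python
import base64


def decode_readme_payload(payload: dict) -> str:
    """Decode the JSON body returned by GET /repos/{owner}/{repo}/readme."""
    if payload.get("encoding") != "base64":
        raise ValueError(f"unexpected encoding: {payload.get('encoding')!r}")
    # b64decode (validate=False by default) discards the embedded newlines.
    return base64.b64decode(payload["content"]).decode("utf-8")
```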

generate_statistics_report(documents)

Generate a statistics report for the given documents.

Parameters:: documents (list[MCPServerDocument])
Return type:: None
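
A sketch of the kind of aggregation a statistics report might perform, counting documents per field with `collections.Counter`. Here the documents are plain dicts and the field names `category` and `language` are assumptions. The real method works on `MCPServerDocument` objects, and its output format is not documented.

```python
from collections import Counter


def summarize(documents: list[dict], fields: tuple[str, ...] = ("category", "language")) -> dict[str, Counter]:
    """Count documents per value of each field; missing or empty values count as 'unknown'."""
    return {
        field: Counter(doc.get(field) or "unknown" for doc in documents)
        for field in fields
    }


def print_report(stats: dict[str, Counter]) -> None:
    """Print each field's values from most to least common."""
    for field, counts in stats.items():
        print(f"{field}:")
        for value, n in counts.most_common():
            print(f"  {value}: {n}")


docs = [
    {"category": "databases", "language": "python"},
    {"category": "databases", "language": "typescript"},
    {"category": "search"},
]
print_report(summarize(docs))
```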

async process_repository(metadata)

Process a single repository into an `MCPServerDocument`, or return None if it cannot be processed.

Parameters:: metadata (MCPServerMetadata)
Return type:: MCPServerDocument | None

save_documents(documents)

Save documents in various formats.

Parameters:: documents (list[MCPServerDocument])
Return type:: None
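
The formats `save_documents` writes are not listed. JSON Lines (one JSON object per line) is a common choice for document collections because it can be appended to and streamed. The sketch below writes records that way; `dump_jsonl`, `save_jsonl` and the file name `mcp_servers.jsonl` are hypothetical.

```python
import json
from pathlib import Path


def dump_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines, keeping non-ASCII text readable."""
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


def save_jsonl(records: list[dict], output_dir: str) -> Path:
    """Write records to <output_dir>/mcp_servers.jsonl, creating the directory if needed."""
    directory = Path(output_dir)
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / "mcp_servers.jsonl"  # hypothetical file name
    target.write_text(dump_jsonl(records), encoding="utf-8")
    return target
```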

class haive.mcp.utils.extract_mcp_github_repos.MCPScope

Bases: str, enum.Enum

Server Scope.

class haive.mcp.utils.extract_mcp_github_repos.MCPServerDocument(/, **data)

Bases: pydantic.BaseModel

Complete MCP Server Document.

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

Parameters:: data (Any)

compute_content_hash()

Compute the SHA-256 hash of the README content.

Return type:: str
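
A minimal sketch of a SHA-256 content hash with `hashlib`, assuming the README text is encoded as UTF-8 before hashing (the encoding the real method uses is not stated). A stable hash like this lets a later run detect whether a README changed without comparing the full text.

```python
import hashlib


def content_hash(readme: str) -> str:
    """Return the hex SHA-256 digest of the README text encoded as UTF-8."""
    return hashlib.sha256(readme.encode("utf-8")).hexdigest()


print(content_hash("abc"))
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```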

to_langchain_document()

Convert to a LangChain `Document`.

Return type:: langchain_core.documents.Document

model_config: Configuration for the model, which should be a dictionary conforming to `pydantic.config.ConfigDict`.

class haive.mcp.utils.extract_mcp_github_repos.MCPServerMetadata(/, **data)

Bases: pydantic.BaseModel

Metadata for an MCP Server.

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

Parameters:: data (Any)

get_unique_id()

Generate a unique ID for this server.

Return type:: str
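
How `get_unique_id` builds its ID is not documented. A sketch of one deterministic scheme derives it from the repository URL. Because GitHub treats owner and repository names case-insensitively in URLs, lowercasing makes `Acme/Foo` and `acme/foo` map to the same ID. The helper `unique_id_from_url` and the `owner__repo` format are hypothetical.

```python
def unique_id_from_url(repo_url: str) -> str:
    """Derive a filesystem-safe ID such as 'acme__foo-mcp' from a GitHub URL."""
    # Keep only the part after "github.com/", e.g. "Acme/Foo-MCP/tree/main".
    path = repo_url.split("github.com/", 1)[-1].strip("/")
    parts = path.split("/")
    if len(parts) < 2:
        raise ValueError(f"expected <owner>/<repo> in {repo_url!r}")
    owner, repo = parts[0], parts[1].removesuffix(".git")
    return f"{owner}__{repo}".lower()
```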

to_langchain_metadata()

Convert to the LangChain Document metadata format.

Return type:: dict[str, Any]
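
LangChain `Document` metadata is a plain dict, but some vector stores (Chroma, for example) accept only scalar metadata values such as `str`, `int`, `float` and `bool`. The sketch below shows one way to flatten values before building a `Document`. `to_scalar_metadata` and its policy (join lists, drop `None`, stringify everything else) are assumptions, not necessarily what `to_langchain_metadata` does.

```python
from typing import Any


def to_scalar_metadata(metadata: dict[str, Any]) -> dict[str, Any]:
    """Flatten metadata values to str/int/float/bool; drop None values."""
    flat: dict[str, Any] = {}
    for key, value in metadata.items():
        if value is None:
            continue  # missing values are omitted rather than stored as None
        if isinstance(value, (str, int, float, bool)):
            flat[key] = value
        elif isinstance(value, (list, tuple)):
            flat[key] = ", ".join(str(item) for item in value)
        else:
            flat[key] = str(value)  # fallback for dates, enums, nested objects
    return flat
```

The result can then be passed as `Document(page_content=readme_text, metadata=flat)` using `langchain_core.documents.Document`.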

classmethod validate_repo_url(v)

Validate a GitHub repository URL.

Parameters:: v (str)
Return type:: str
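
A sketch of the normalization and checks a GitHub URL validator might apply: trim whitespace, a trailing slash and a `.git` suffix, upgrade `http://` to `https://`, then require exactly `https://github.com/<owner>/<repo>`. The function below and its rules are assumptions. If the real method is registered as a pydantic field validator, a `ValueError` raised inside it is reported to callers as `pydantic_core.ValidationError`.

```python
import re

# Exactly two path segments made of letters, digits, "_", "." or "-".
REPO_URL_RE = re.compile(r"https://github\.com/[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+")


def validate_repo_url(v: str) -> str:
    """Return a normalized GitHub repository URL or raise ValueError."""
    v = v.strip().rstrip("/").removesuffix(".git")
    if v.startswith("http://"):
        v = "https://" + v[len("http://"):]
    if not REPO_URL_RE.fullmatch(v):
        raise ValueError(f"not a GitHub repository URL: {v!r}")
    return v
```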

model_config: Configuration for the model, which should be a dictionary conforming to `pydantic.config.ConfigDict`.

haive.mcp.utils.extract_mcp_github_repos.create_agent_loader(output_dir='agent_resources/mcp_servers')

Create a loader function for agents to access MCP documents.

Parameters:: output_dir (str)
Return type:: callable
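
`create_agent_loader` returns a callable, which suggests a closure bound to `output_dir`. A sketch of that pattern follows. It assumes the documents were saved as JSON Lines in a file named `mcp_servers.jsonl` and carry a `category` field; the file name, the field, the function name `make_agent_loader` and the optional filter argument are all hypothetical.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def make_agent_loader(output_dir: str) -> Callable[..., list[dict]]:
    """Return a function that loads saved MCP records, optionally filtered by category."""
    source = Path(output_dir) / "mcp_servers.jsonl"  # hypothetical file name

    def load(category: Optional[str] = None) -> list[dict]:
        # Re-read the file on each call so newly saved records are picked up.
        if not source.exists():
            return []
        with source.open(encoding="utf-8") as fh:
            records = [json.loads(line) for line in fh if line.strip()]
        if category is None:
            return records
        return [record for record in records if record.get("category") == category]

    return load
```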

async haive.mcp.utils.extract_mcp_github_repos.main()

Main entry point of the script.