haive.mcp.utils.extract_mcp_github_repos

Enhanced MCP Repository Extractor with README Processing.

This script:

1. Extracts repository URLs from awesome-mcp-servers
2. Downloads and processes README files
3. Converts them to LangChain Documents with metadata
4. Organizes resources for agent access

Classes

ExtractionStats

Statistics for extraction process.

MCPCategory

MCP Server Categories.

MCPLanguage

Programming Languages.

MCPPlatform

Supported Platforms.

MCPRepositoryExtractor

Enhanced MCP Repository Extractor.

MCPScope

Server Scope.

MCPServerDocument

Complete MCP Server Document.

MCPServerMetadata

Metadata for an MCP Server.

Functions

create_agent_loader([output_dir])

Create a loader function for agents to access MCP documents.

main()

Main function.

Module Contents

class haive.mcp.utils.extract_mcp_github_repos.ExtractionStats(/, **data)[source]

Bases: pydantic.BaseModel

Statistics for extraction process.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

class haive.mcp.utils.extract_mcp_github_repos.MCPCategory[source]

Bases: str, enum.Enum

MCP Server Categories.

Initialize self. See help(type(self)) for accurate signature.
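The enum classes in this module all derive from both str and enum.Enum. The sketch below illustrates what that mix-in provides in general: members compare equal to plain strings, can be looked up by value, and serialize to JSON without a custom encoder. The class name Category and its members are hypothetical and are not taken from the module.

```python
import json
from enum import Enum


class Category(str, Enum):
    """Hypothetical stand-in for an enum such as MCPCategory."""

    DATABASE = "database"
    SEARCH = "search"


# Look a member up by its string value, as when parsing stored metadata.
parsed = Category("database")

# Members are real str instances, so they compare equal to plain strings.
is_str = isinstance(Category.DATABASE, str)
equal_to_plain = Category.DATABASE == "database"

# json.dumps writes the member's string value, no custom encoder needed.
as_json = json.dumps({"category": Category.SEARCH})
```

Looking up a value that is not defined, for example Category("unknown"), raises ValueError, which gives a cheap validation step when reading untrusted input.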

class haive.mcp.utils.extract_mcp_github_repos.MCPLanguage[source]

Bases: str, enum.Enum

Programming Languages.

Initialize self. See help(type(self)) for accurate signature.

class haive.mcp.utils.extract_mcp_github_repos.MCPPlatform[source]

Bases: str, enum.Enum

Supported Platforms.

Initialize self. See help(type(self)) for accurate signature.

class haive.mcp.utils.extract_mcp_github_repos.MCPRepositoryExtractor(output_dir='agent_resources/mcp_servers')[source]

Enhanced MCP Repository Extractor.

Initialize the extractor.

Parameters:

output_dir (str) – Directory for the extracted resources. Defaults to 'agent_resources/mcp_servers'.

async extract_all()[source]

Main extraction method.

Return type:

list[MCPServerDocument]
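The return types documented for this class (a list for extract_all, and MCPServerDocument | None for process_repository) are consistent with a driver that processes repositories concurrently and keeps only the successful results. The sketch below shows that general pattern with asyncio; it is not the module's implementation. The functions process_one and process_all, the concurrency limit, and the string payloads are all assumptions made for illustration.

```python
import asyncio
from typing import List, Optional


async def process_one(name: str, limit: asyncio.Semaphore) -> Optional[str]:
    """Hypothetical per-repository step; returns None on failure."""
    async with limit:  # cap the number of repositories handled at once
        await asyncio.sleep(0)  # stand-in for network I/O
        if name.startswith("bad"):
            return None
        return name.upper()


async def process_all(names: List[str], max_concurrency: int = 5) -> List[str]:
    """Run every per-repository step and drop the failed ones."""
    limit = asyncio.Semaphore(max_concurrency)
    results = await asyncio.gather(*(process_one(n, limit) for n in names))
    return [r for r in results if r is not None]


processed = asyncio.run(process_all(["alpha", "bad-repo", "beta"]))
```

asyncio.gather returns results in the order the awaitables were passed, so the filtered list keeps the input order.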

async extract_repositories_from_readme()[source]

Extract repository information from the awesome-mcp-servers README.

Return type:

list[MCPServerMetadata]
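To make the idea concrete, here is a minimal sketch of pulling GitHub repository links out of an awesome-list style Markdown document. The regular expression, the function extract_repo_links, and the sample text are assumptions for illustration; the parsing rules the module actually uses are not documented here.

```python
import re
from typing import Dict, List

# Assumed pattern: a Markdown link whose target is a GitHub repository root.
GITHUB_LINK = re.compile(
    r"\[(?P<name>[^\]]+)\]"
    r"\((?P<url>https://github\.com/(?P<owner>[\w.-]+)/(?P<repo>[\w.-]+?))/?\)"
)


def extract_repo_links(markdown: str) -> List[Dict[str, str]]:
    """Return one entry per unique GitHub repository link in the text."""
    seen = set()
    results = []
    for match in GITHUB_LINK.finditer(markdown):
        url = match.group("url")
        if url.lower() in seen:  # skip repositories listed more than once
            continue
        seen.add(url.lower())
        results.append(
            {
                "name": match.group("name"),
                "owner": match.group("owner"),
                "repo": match.group("repo"),
                "url": url,
            }
        )
    return results


sample = """
- [example/server-a](https://github.com/example/server-a) - First server.
- [example/server-b](https://github.com/example/server-b/) - Second server.
- [docs](https://example.com/docs) - Not a repository.
- [dup](https://github.com/example/server-a) - Duplicate entry.
"""
repos = extract_repo_links(sample)
```

Links that point below the repository root (for example a /tree/main path) do not match this pattern and are ignored.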

async fetch_github_metadata(metadata)[source]

Fetch additional metadata from GitHub API.

Parameters:

metadata (MCPServerMetadata)

Return type:

None

async fetch_readme_content(metadata)[source]

Fetch README content from GitHub.

Parameters:

metadata (MCPServerMetadata)

Return type:

str | None
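One common way to locate a README without calling the GitHub API is to try a few raw-content URLs in turn. The sketch below only builds the candidate URLs and performs no network access. The helper readme_candidates, the branch names, and the file names are assumptions; the module may resolve the README differently.

```python
from typing import List, Sequence


def readme_candidates(
    owner: str,
    repo: str,
    branches: Sequence[str] = ("main", "master"),
    filenames: Sequence[str] = ("README.md", "readme.md"),
) -> List[str]:
    """Build raw.githubusercontent.com URLs to try, in priority order."""
    return [
        f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{name}"
        for branch in branches
        for name in filenames
    ]


candidates = readme_candidates("example", "server-a")
```

A fetcher built on this would request each URL until one succeeds and return None if all fail, which matches the documented str | None return type.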

generate_statistics_report(documents)[source]

Generate statistics report.

Parameters:

documents (list[MCPServerDocument])

Return type:

None
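As an illustration of the kind of summary such a report might contain, the sketch below tallies a list of records with collections.Counter. The record fields (name, language, readme) and the function summarize are hypothetical; the fields the real report covers are not documented here.

```python
from collections import Counter
from typing import Dict, List


def summarize(records: List[Dict]) -> Dict:
    """Count records overall, by language, and by README availability."""
    return {
        "total": len(records),
        "with_readme": sum(1 for r in records if r.get("readme")),
        "by_language": dict(Counter(r["language"] for r in records)),
    }


records = [
    {"name": "server-a", "language": "python", "readme": "# A"},
    {"name": "server-b", "language": "typescript", "readme": None},
    {"name": "server-c", "language": "python", "readme": "# C"},
]
report = summarize(records)
```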

async process_repository(metadata)[source]

Process a single repository.

Parameters:

metadata (MCPServerMetadata)

Return type:

MCPServerDocument | None

save_documents(documents)[source]

Save documents in various formats.

Parameters:

documents (list[MCPServerDocument])

Return type:

None

class haive.mcp.utils.extract_mcp_github_repos.MCPScope[source]

Bases: str, enum.Enum

Server Scope.

Initialize self. See help(type(self)) for accurate signature.

class haive.mcp.utils.extract_mcp_github_repos.MCPServerDocument(/, **data)[source]

Bases: pydantic.BaseModel

Complete MCP Server Document.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

compute_content_hash()[source]

Compute SHA256 hash of README content.

Return type:

str
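A content hash of this kind is typically used to detect whether a README changed between extraction runs. The sketch below shows the standard-library way to compute a SHA256 hex digest of text. The function name content_hash and the UTF-8 encoding are assumptions, since the method's internals are not shown here.

```python
import hashlib


def content_hash(readme_text: str) -> str:
    """Return the SHA256 hex digest of the text, encoded as UTF-8."""
    return hashlib.sha256(readme_text.encode("utf-8")).hexdigest()


digest = content_hash("# Example README\n")

# Identical content gives the same digest; any edit changes it.
unchanged = content_hash("# Example README\n") == digest
changed = content_hash("# Example README v2\n") != digest
```

The digest is always 64 hexadecimal characters, whatever the README length.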

to_langchain_document()[source]

Convert to LangChain Document.

Return type:

langchain_core.documents.Document

model_config

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class haive.mcp.utils.extract_mcp_github_repos.MCPServerMetadata(/, **data)[source]

Bases: pydantic.BaseModel

Metadata for an MCP Server.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

get_unique_id()[source]

Generate unique ID for this server.

Return type:

str

to_langchain_metadata()[source]

Convert to LangChain Document metadata format.

Return type:

dict[str, Any]

classmethod validate_repo_url(v)[source]

Validate GitHub repository URL.

Parameters:

v (str)

Return type:

str
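The validator takes a string and returns a string, which suggests it checks the URL and possibly normalizes it. The sketch below shows one plausible set of rules using urllib.parse. The function normalize_repo_url and every rule in it (HTTPS only, exactly owner and repository in the path, a trailing .git removed) are assumptions and not the module's documented behavior.

```python
from urllib.parse import urlparse


def normalize_repo_url(url: str) -> str:
    """Validate a GitHub repository URL and return a canonical form."""
    parsed = urlparse(url.strip())
    if parsed.scheme != "https" or parsed.netloc.lower() != "github.com":
        raise ValueError(f"not a GitHub HTTPS URL: {url!r}")

    # Expect exactly two path segments: owner and repository name.
    parts = [segment for segment in parsed.path.split("/") if segment]
    if len(parts) != 2:
        raise ValueError(f"expected https://github.com/<owner>/<repo>: {url!r}")

    owner, repo = parts
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    if not repo:
        raise ValueError(f"empty repository name: {url!r}")
    return f"https://github.com/{owner}/{repo}"


clean = normalize_repo_url(" https://github.com/example/server-a.git ")

try:
    normalize_repo_url("https://gitlab.com/example/server-a")
    rejected = False
except ValueError:
    rejected = True
```

In a pydantic model, a function like this would usually be wired in as a field validator so that invalid URLs fail at construction time with a ValidationError.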

model_config

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

haive.mcp.utils.extract_mcp_github_repos.create_agent_loader(output_dir='agent_resources/mcp_servers')[source]

Create a loader function for agents to access MCP documents.

Parameters:

output_dir (str)

Return type:

callable
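A factory that returns a loader is commonly written as a closure over the output directory. The sketch below illustrates that pattern only. The function make_loader, the one-JSON-file-per-category layout, and the record shape are hypothetical; the files that save_documents actually writes are not described here.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def make_loader(output_dir: str) -> Callable[[str], List[Dict]]:
    """Return a function that loads records for a category from output_dir."""
    base = Path(output_dir)

    def load(category: str) -> List[Dict]:
        # Assumed layout: one JSON file per category inside output_dir.
        path = base / f"{category}.json"
        if not path.exists():
            return []
        return json.loads(path.read_text(encoding="utf-8"))

    return load


# Demonstrate against a temporary directory with one sample file.
with tempfile.TemporaryDirectory() as tmp:
    sample = [{"name": "example-db-server"}]
    (Path(tmp) / "databases.json").write_text(json.dumps(sample), encoding="utf-8")

    loader = make_loader(tmp)
    found = loader("databases")
    missing = loader("unknown")
```

Binding the directory once keeps the agent-facing call down to a single argument.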

async haive.mcp.utils.extract_mcp_github_repos.main()[source]

Main function.