MCP Server
==========

fetchkit ships as a Model Context Protocol (MCP) server, enabling AI agents
to fetch, scrape, extract, and download web content autonomously.

.. mermaid::

   flowchart LR
       subgraph Clients
           Claude["🤖 Claude Desktop"]
           CC["⌨️ Claude Code"]
           LC["🦜 LangChain Agent"]
           Custom["🔧 Any MCP Client"]
       end

       subgraph fetchkit MCP Server
           Tools["16 Tools"]
           Resources["4 Resources"]
           Prompts["4 Prompts"]
       end

       subgraph pyfetcher Core
           Fetch["FetchService"]
           Scrape["Scrape Utils"]
           Headers["Browser Profiles"]
           Extract["Extractors"]
           DL["Downloaders"]
       end

       Claude -- "stdio" --> Tools
       CC -- "stdio" --> Tools
       LC -- "HTTP" --> Tools
       Custom -- "HTTP/stdio" --> Tools

       Tools --> Fetch
       Tools --> Scrape
       Tools --> Headers
       Tools --> Extract
       Tools --> DL

       style Tools fill:#E74C3C,color:#fff
       style Resources fill:#3498DB,color:#fff
       style Prompts fill:#9B59B6,color:#fff
       style Fetch fill:#2ECC71,color:#fff

Installation
------------

.. code-block:: bash

   pip install 'fetchkit[mcp]'

Running
-------

**stdio** (Claude Desktop, Claude Code):

.. code-block:: bash

   pyfetcher-mcp

**HTTP** (LangChain, remote agents):

.. code-block:: bash

   pyfetcher-mcp --http 8000

**Makefile**:

.. code-block:: bash

   make mcp          # stdio
   make mcp-http     # HTTP on port 8000

Tools (16)
----------

.. list-table::
   :header-rows: 1
   :widths: 20 40 20

   * - Tool
     - Description
     - Returns
   * - ``fetch_url``
     - Fetch URL with browser headers
     - FetchResult
   * - ``fetch_multiple``
     - Batch fetch with concurrency
     - list[FetchResult]
   * - ``scrape_css``
     - CSS selector extraction
     - ScrapeResult
   * - ``scrape_links``
     - Link harvesting with classification
     - LinksResult
   * - ``scrape_text``
     - Readable text extraction
     - str
   * - ``scrape_metadata``
     - Page metadata + Open Graph
     - MetadataResult
   * - ``scrape_forms``
     - Form field extraction
     - FormsResult
   * - ``scrape_table``
     - HTML table parsing
     - TableResult
   * - ``check_robots``
     - robots.txt rule checking
     - RobotsResult
   * - ``parse_sitemap``
     - XML sitemap parsing
     - SitemapResult
   * - ``generate_headers``
     - Browser header preview
     - HeadersResult
   * - ``list_profiles``
     - All browser profiles
     - list[ProfileInfo]
   * - ``random_user_agent``
     - Random UA string
     - str
   * - ``extract_article``
     - Article text + markdown
     - ArticleResult
   * - ``convert_html``
     - HTML to markdown/plaintext
     - str
   * - ``download_file``
     - File download with checksum
     - DownloadResult

Resources
---------

- ``pyfetcher://profiles`` -- All browser profiles with details
- ``pyfetcher://profiles/{name}`` -- Headers for a specific profile
- ``pyfetcher://backends`` -- Backend capabilities table
- ``pyfetcher://version`` -- Version and feature info

Prompts
-------

- ``web_research(topic)`` -- Research a topic by fetching and analyzing pages
- ``site_audit(url)`` -- Audit a website's structure, metadata, and links
- ``scrape_guide(url, goal)`` -- Guide for scraping specific data
- ``compare_pages(url1, url2)`` -- Compare two web pages

Claude Desktop Configuration
-----------------------------

Add to ``claude_desktop_config.json``:

.. code-block:: json

   {
     "mcpServers": {
       "pyfetcher": {
         "command": "pyfetcher-mcp",
         "args": []
       }
     }
   }

LangChain Integration
---------------------

.. code-block:: python

   from langchain_mcp_adapters import MultiServerMCPClient
   from langgraph.prebuilt import create_react_agent

   client = MultiServerMCPClient({
       "pyfetcher": {"transport": "http", "url": "http://localhost:8000/mcp"}
   })
   tools = await client.get_tools()
   agent = create_react_agent(model, tools)