Compare commits

2 Commits

Author SHA1 Message Date
akadmin 511137edae Fix: Correct authentication and add ping method 2026-06-14 06:08:38 +00:00
akadmin 491745733f Initial commit: MCP Summary Server 2026-06-14 05:57:46 +00:00
7 changed files with 173 additions and 355 deletions
-32
@@ -1,32 +0,0 @@
# MCP Summary Server - Environment Variables
# Server Configuration
PORT=8080
# Authentication (optional)
# If set, requests must include: Authorization: Bearer <API_KEY>
API_KEY=
# LLM Configuration
OPENAPI_URL=http://localhost:8080/v1
OPENAPI_API_KEY=
MODEL_NAME=gpt-4o
# LLM Call Timeout in seconds (increase for large documents)
LLM_TIMEOUT=120
# Summarization Configuration
# Characters per chunk when splitting long text
CHUNK_SIZE=4000
# Characters of overlap between chunks to maintain context
OVERLAP=200
# Target length for intermediate chunk summaries (words)
TARGET_INTERMEDIATE_SUMMARY_LENGTH=150
# Maximum length for final synthesized summary (words)
MAX_DIRECT_SUMMARY_LENGTH=100
# Maximum text length (characters) before chunking is triggered
MAX_DIRECT_TEXT_LENGTH=8000
-37
@@ -1,37 +0,0 @@
# Dockerfile for MCP Summary Server
#
# Usage (from directory containing this Dockerfile and mcp_summary_server.py):
#
# docker build -t mcp-summary .
# docker run -p 8080:8080 --env-file .env mcp-summary
#
FROM python:3.12-slim
WORKDIR /app
# Install runtime dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt && rm requirements.txt
# Copy the server script
COPY mcp_summary_server.py /app/mcp_summary_server.py
# Expose HTTP port
EXPOSE 8080
# Environment variables
ENV PORT=8080
ENV OPENAPI_URL=http://localhost:8080/v1
ENV OPENAPI_API_KEY=
ENV MODEL_NAME=gpt-4o
ENV CHUNK_SIZE=4000
ENV OVERLAP=200
ENV TARGET_INTERMEDIATE_SUMMARY_LENGTH=150
ENV MAX_DIRECT_SUMMARY_LENGTH=100
ENV MAX_DIRECT_TEXT_LENGTH=8000
ENV LLM_TIMEOUT=120
ENV API_KEY=
# Start the MCP summary server
ENTRYPOINT ["python", "-u", "/app/mcp_summary_server.py"]
-137
@@ -1,137 +0,0 @@
# MCP Summary Server
An MCP (Model Context Protocol) server for document summarization that keeps full text out of the chat context window.
## Features
- Automatically determines whether to summarize directly or use chunked summarization
- All processing happens server-side
- Returns only the summary to the client
- Configurable chunking parameters
- Bearer token authentication (optional)
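The chunked path named in the first feature can be pictured as a sliding window over the text. The following is a simplified sketch, not the server's own `chunk_text`: it cuts at fixed character offsets and ignores sentence or paragraph boundaries. The function name `split_with_overlap` is hypothetical; the defaults mirror `CHUNK_SIZE` and `OVERLAP`.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 4000, overlap: int = 200) -> List[str]:
    """Split text into fixed-size character windows that share `overlap` characters.

    Hypothetical helper for illustration only; it does not search for
    sentence or paragraph boundaries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    # Short text needs no splitting.
    if len(text) <= chunk_size:
        return [text]

    chunks: List[str] = []
    step = chunk_size - overlap  # how far the window advances each time
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks
```

Each chunk after the first begins with the last `overlap` characters of the previous one, which is what lets a per-chunk summary pick up context that straddles a cut.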
## Setup
### Environment Variables
Copy `.env.example` to `.env` and configure:
```bash
cp .env.example .env
```
| Variable | Default | Description |
|----------|---------|-------------|
| PORT | 8080 | HTTP server port |
| API_KEY | (empty) | Bearer token for authentication |
| OPENAPI_URL | http://localhost:8080/v1 | LLM API endpoint |
| OPENAPI_API_KEY | (empty) | LLM API key |
| MODEL_NAME | gpt-4o | LLM model to use |
| LLM_TIMEOUT | 120 | LLM call timeout in seconds |
| CHUNK_SIZE | 4000 | Characters per chunk |
| OVERLAP | 200 | Characters of overlap between chunks |
| TARGET_INTERMEDIATE_SUMMARY_LENGTH | 150 | Target length of each intermediate chunk summary (words) |
| MAX_DIRECT_SUMMARY_LENGTH | 100 | Max final summary length (words) |
| MAX_DIRECT_TEXT_LENGTH | 8000 | Max text length (characters) before chunking is triggered |
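As a rough sketch of how these settings might be consumed, the snippet below reads a few of them with the defaults from the table and applies the `MAX_DIRECT_TEXT_LENGTH` threshold. The names `load_config` and `choose_method` are hypothetical and exist only for illustration; treating an empty value as "use the default" is also an assumption.

```python
import os
from typing import Dict, Mapping

# Defaults copied from the table above.
DEFAULTS: Dict[str, int] = {
    "CHUNK_SIZE": 4000,
    "OVERLAP": 200,
    "MAX_DIRECT_TEXT_LENGTH": 8000,
}


def load_config(env: Mapping[str, str] = os.environ) -> Dict[str, int]:
    """Read integer settings, falling back to the default when unset or empty."""
    config: Dict[str, int] = {}
    for name, default in DEFAULTS.items():
        raw = env.get(name, "").strip()
        config[name] = int(raw) if raw else default
    return config


def choose_method(text: str, config: Dict[str, int]) -> str:
    """Return "direct" for text at or under the threshold, else "chunked"."""
    if len(text.strip()) <= config["MAX_DIRECT_TEXT_LENGTH"]:
        return "direct"
    return "chunked"
```

With the defaults, a document of exactly 8000 characters is still summarized in one call; one more character switches it to the chunked path.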
## Running
### Docker
```bash
# Build
docker build -t mcp-summary .
# Run with environment file
docker run -p 8080:8080 --env-file .env mcp-summary
# Run with inline environment variables
docker run -p 8080:8080 \
-e OPENAPI_URL=http://localhost:8080/v1 \
-e OPENAPI_API_KEY=your-key \
-e MODEL_NAME=gpt-4o \
mcp-summary
```
### Python
```bash
pip install -r requirements.txt
python mcp_summary_server.py
```
## Connecting to OpenWebUI
### In OpenWebUI Admin Settings
1. Go to **Admin Settings → External Tools**
2. Click **+ (Add Server)**
3. Set **Type** to **MCP (Streamable HTTP)**
4. Enter your **Server URL**
5. Set **Authentication**:
- **None** if no API key is configured
- **Bearer** if API_KEY is set (provide the key)
6. Save
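The **None** versus **Bearer** choice in step 5 corresponds to whether `API_KEY` is set on the server. A minimal sketch of that rule follows; `is_authorized` is a hypothetical helper rather than the server's `require_auth`, and the constant-time comparison via `hmac.compare_digest` is an addition made here for illustration.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], api_key: str) -> bool:
    """Apply the optional Bearer rule: no API key configured means open access."""
    if not api_key:
        return True  # authentication is disabled
    if not authorization_header:
        return False  # a key is required but none was sent

    # Expect "Bearer <token>"; the scheme is matched case-insensitively.
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False

    # Compare as bytes in constant time to avoid leaking the key via timing.
    return hmac.compare_digest(token.strip().encode("utf-8"), api_key.encode("utf-8"))
```

Selecting **Bearer** in OpenWebUI without entering a key would send a header that fails this kind of check whenever `API_KEY` is set.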
### Docker Networking
If running both OpenWebUI and MCP Summary in Docker:
```bash
# Use host.docker.internal to reach host machine
docker run -p 8080:8080 \
-e OPENAPI_URL=http://host.docker.internal:3000/v1 \
-e OPENAPI_API_KEY=your-key \
mcp-summary
```
If both containers are on the same Docker network, use the container name directly:
```bash
docker run --network mynetwork -p 8080:8080 \
-e OPENAPI_URL=http://openwebui-container:8080/v1 \
-e OPENAPI_API_KEY=your-key \
mcp-summary
```
## MCP Tool
### summarize_document
Summarizes a document, automatically handling chunking for long text.
**Parameters:**
- `text` (string, required): The document text to summarize
- `max_length` (integer, optional): Maximum summary length in words (default: 100)
**Returns:**
```json
{
"summary": "The summarized text...",
"original_length": 12345,
"method": "direct",
"chunks": 1
}
```
`method` is `"direct"` or `"chunked"`; `chunks` is the number of chunks used.
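To exercise the tool outside OpenWebUI, a client can send a JSON-RPC `tools/call` request to the server. The sketch below builds such a request and posts it with the standard library. The helper names `build_tool_call` and `post_json_rpc` are hypothetical, the 130-second client timeout is an assumption chosen to exceed the default `LLM_TIMEOUT`, and the exact envelope around the returned summary is not shown in this README, so the response is returned undecoded beyond JSON parsing.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_tool_call(text: str, max_length: Optional[int] = None, request_id: int = 1) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request that invokes summarize_document."""
    arguments: Dict[str, Any] = {"text": text}
    if max_length is not None:
        arguments["max_length"] = max_length  # optional; server default applies otherwise
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "summarize_document", "arguments": arguments},
    }


def post_json_rpc(url: str, payload: Dict[str, Any], api_key: str = "", timeout: float = 130.0) -> Dict[str, Any]:
    """POST the payload as JSON and return the parsed JSON response."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Assuming the server is listening locally on the default port, a call might look like `post_json_rpc("http://localhost:8080/", build_tool_call("Some long text", max_length=50))`.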
## Troubleshooting
### "Failed to connect to MCP server"
1. **Check authentication**: Ensure you haven't selected `Bearer` without a key. Switch to `None` if no token is needed.
2. **Check network connectivity**: Ensure OpenWebUI can reach the MCP server URL
3. **Check LLM connectivity**: Ensure the MCP server can reach the LLM at OPENAPI_URL
4. **Check timeouts**: Increase LLM_TIMEOUT if summarization takes too long
### Infinite loading screen
This may occur if you configured the server as OpenAPI instead of MCP. Fix by:
1. Opening Admin Settings → External Tools
2. Disabling/deleting the problematic connection
3. Re-adding with **Type** set to **MCP (Streamable HTTP)**
### Slow initialization
If the server takes longer than 10 seconds to initialize:
- Increase `MCP_INITIALIZE_TIMEOUT` in OpenWebUI (default: 10 seconds)
Binary file not shown.
-34
@@ -1,34 +0,0 @@
#!/bin/bash
# Diagnostic script for MCP Summary Server
echo "================================"
echo "MCP Summary Server Diagnostics"
echo "================================"
# Check if server is running
echo -e "\n1. Checking if server process is running..."
# The bracketed first letter keeps grep from matching its own process entry
ps aux | grep '[m]cp_summary_server' || echo "Server process not found"
# Check if port is listening
echo -e "\n2. Checking if port is listening..."
netstat -tlnp 2>/dev/null | grep 8080 || echo "Port 8080 not listening"
# Test basic connectivity
echo -e "\n3. Testing basic connectivity..."
curl -s http://localhost:8080/ || echo "Cannot connect to localhost:8080"
# Test MCP initialize
echo -e "\n4. Testing MCP initialize..."
curl -s -X POST http://localhost:8080/ \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-11-25","capabilities":{},"clientInfo":{"name":"test","version":"1.0.0"}}}' | jq .
# Test tools list
echo -e "\n5. Testing tools list..."
curl -s -X POST http://localhost:8080/ \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}' | jq .
echo -e "\n================================"
echo "Diagnostics complete"
echo "================================"
+156 -94
@@ -24,11 +24,32 @@ Auth:
import json
import os
import sys
import logging
from http.server import HTTPServer, BaseHTTPRequestHandler
from typing import Any, Dict, Optional
from typing import Any, Dict, List, Optional
import requests
from requests.exceptions import RequestException
# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger("mcp-summary")
# MCP Server Configuration
API_KEY = os.environ.get("API_KEY", "").strip()
PORT = int(os.environ.get("PORT", "8080"))
# LLM Configuration
OPENAPI_URL = os.environ.get("OPENAPI_URL", "http://localhost:8080/v1")
OPENAPI_API_KEY = os.environ.get("OPENAPI_API_KEY", "")
MODEL_NAME = os.environ.get("MODEL_NAME", "gpt-4o")
# Summarization Configuration
CHUNK_SIZE = int(os.environ.get("CHUNK_SIZE", "4000"))
OVERLAP = int(os.environ.get("OVERLAP", "200"))
TARGET_INTERMEDIATE_SUMMARY_LENGTH = int(os.environ.get("TARGET_INTERMEDIATE_SUMMARY_LENGTH", "150"))
MAX_DIRECT_SUMMARY_LENGTH = int(os.environ.get("MAX_DIRECT_SUMMARY_LENGTH", "100"))
MAX_DIRECT_TEXT_LENGTH = int(os.environ.get("MAX_DIRECT_TEXT_LENGTH", "8000"))
LLM_TIMEOUT = int(os.environ.get("LLM_TIMEOUT", "120"))
# Tool definitions
TOOLS_LIST: Dict[str, Any] = {
@@ -64,8 +85,7 @@ def get_bearer_token(headers: Any) -> Optional[str]:
def require_auth(headers: Any) -> bool:
"""Check authentication if API key is configured."""
# If API_KEY is not set, allow unauthenticated access
"""Check authentication. Returns True if auth passes or is not required."""
if not API_KEY:
return True
@@ -75,55 +95,52 @@ def require_auth(headers: Any) -> bool:
return True
def call_llm(text: str, system_prompt: str, max_tokens: int = 2000) -> str:
"""Make an OpenAPI-compatible LLM call."""
openapi_url = os.environ.get("OPENAPI_URL", "http://localhost:8080/v1")
openapi_api_key = os.environ.get("OPENAPI_API_KEY", "")
model_name = os.environ.get("MODEL_NAME", "gpt-4o")
timeout = int(os.environ.get("LLM_TIMEOUT", "120"))
url = f"{openapi_url}/chat/completions"
def call_llm(messages: List[Dict], temperature: float = 0.3) -> str:
"""Make an OpenAPI-compatible LLM call with error handling."""
url = f"{OPENAPI_URL}/chat/completions"
headers = {
"Content-Type": "application/json",
"Authorization": f"Bearer {openapi_api_key}"
"Authorization": f"Bearer {OPENAPI_API_KEY}"
}
payload = {
"model": model_name,
"messages": [
{"role": "system", "content": system_prompt},
{"role": "user", "content": text}
],
"temperature": 0.3,
"max_tokens": max_tokens,
"model": MODEL_NAME,
"messages": messages,
"temperature": temperature,
"max_tokens": 2000,
"top_p": 0.9
}
response = requests.post(url, headers=headers, json=payload, timeout=timeout)
response.raise_for_status()
try:
logger.info(f"Calling LLM at {OPENAPI_URL} with model {MODEL_NAME}")
response = requests.post(url, headers=headers, json=payload, timeout=LLM_TIMEOUT)
response.raise_for_status()
data = response.json()
return data["choices"][0]["message"]["content"]
data = response.json()
return data["choices"][0]["message"]["content"]
except RequestException as e:
logger.error(f"LLM request failed: {e}")
raise RuntimeError(f"Failed to connect to LLM at {OPENAPI_URL}: {e}") from e
except Exception as e:
logger.error(f"LLM call failed: {e}")
raise RuntimeError(f"LLM call failed: {e}") from e
def chunk_text(text: str) -> list:
def chunk_text(text: str) -> List[str]:
"""Split text into chunks with overlap for summarization."""
chunk_size = int(os.environ.get("CHUNK_SIZE", "4000"))
overlap = int(os.environ.get("OVERLAP", "200"))
if len(text) <= chunk_size:
if len(text) <= CHUNK_SIZE:
return [text]
chunks = []
start = 0
while start < len(text):
end = min(start + chunk_size, len(text))
end = min(start + CHUNK_SIZE, len(text))
# Try to break at sentence/paragraph boundary
break_point = end
for marker in ["\n\n", "\n", ". ", "! ", "? "]:
pos = text.rfind(marker, start + chunk_size // 2, end)
pos = text.rfind(marker, start + CHUNK_SIZE // 2, end)
if pos > start:
break_point = pos
break
@@ -132,84 +149,46 @@ def chunk_text(text: str) -> list:
if chunk.strip():
chunks.append(chunk)
start = break_point - overlap if break_point < len(text) else len(text)
start = break_point - OVERLAP if break_point < len(text) else len(text)
if start >= len(text):
break
logger.info(f"Split text into {len(chunks)} chunks")
return chunks
def summarize_document(text: str, max_length: int = 100) -> dict:
"""
Main summarization function.
def summarize_chunk(chunk_text: str, chunk_num: int, total_chunks: int) -> str:
"""Summarize a single chunk of text."""
system_prompt = f"""You are a precise legal assistant creating concise, accurate summaries.
- If text is short, summarize directly
- If text is long, chunk and summarize each chunk, then synthesize
"""
original_length = len(text)
text = text.strip()
if not text:
raise ValueError("Empty text provided")
max_direct_length = int(os.environ.get("MAX_DIRECT_TEXT_LENGTH", "8000"))
intermediate_length = int(os.environ.get("TARGET_INTERMEDIATE_SUMMARY_LENGTH", "150"))
# Direct summarization for shorter texts
if len(text) <= max_direct_length:
system_prompt = f"""You are a precise legal assistant creating concise, accurate summaries.
Create a summary that:
- Is approximately {max_length} words
- Captures key points and important details
- Uses clear, professional language
- Preserves names, dates, and specific facts
Format as plain text without bullet points."""
user_prompt = f"""Summarize the following document:
{text}
Summary:"""
summary = call_llm(user_prompt, system_prompt)
return {
"summary": summary,
"original_length": original_length,
"method": "direct",
"chunks": 1
}
# Chunked summarization for longer texts
chunks = chunk_text(text)
chunk_summaries = []
for i, chunk in enumerate(chunks, 1):
system_prompt = f"""You are a precise legal assistant creating concise, accurate summaries.
You are processing chunk {i} of {len(chunks)} from a larger document.
You are processing chunk {chunk_num} of {total_chunks} from a larger document.
Create a focused summary that:
- Captures key points and important details
- Is approximately {intermediate_length} words
- Is approximately {TARGET_INTERMEDIATE_SUMMARY_LENGTH} words
- Can be combined with other chunk summaries
- Uses clear, professional language
- Preserves names, dates, and specific facts
Respond as plain text without bullet points."""
user_prompt = f"""Summarize this text (chunk {i} of {len(chunks)}):
user_prompt = f"""Summarize this text (chunk {chunk_num} of {total_chunks}):
{chunk}
{chunk_text}
Summary:"""
chunk_summary = call_llm(user_prompt, system_prompt)
chunk_summaries.append(chunk_summary)
messages = [
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt}
]
# Synthesize into final summary
logger.info(f"Summarizing chunk {chunk_num}/{total_chunks}")
return call_llm(messages)
def synthesize_summaries(chunk_summaries: List[str]) -> str:
"""Synthesize multiple chunk summaries into a single final summary."""
combined = "\n\n".join(chunk_summaries)
system_prompt = """You are a precise legal assistant creating executive-level summaries.
@@ -230,7 +209,71 @@ Format as a single paragraph of plain text."""
Final summary:"""
final_summary = call_llm(user_prompt, system_prompt)
messages = [
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt}
]
logger.info(f"Synthesizing {len(chunk_summaries)} chunk summaries")
return call_llm(messages)
def summarize_document(text: str, max_length: int = MAX_DIRECT_SUMMARY_LENGTH) -> Dict[str, Any]:
"""
Main summarization function.
- If text is short, summarize directly
- If text is long, chunk and summarize each chunk, then synthesize
"""
original_length = len(text)
text = text.strip()
if not text:
raise ValueError("Empty text provided")
logger.info(f"Summarizing text of {original_length} characters")
# Direct summarization for shorter texts
if len(text) <= MAX_DIRECT_TEXT_LENGTH:
system_prompt = f"""You are a precise legal assistant creating concise, accurate summaries.
Create a summary that:
- Is approximately {max_length} words
- Captures key points and important details
- Uses clear, professional language
- Preserves names, dates, and specific facts
Format as plain text without bullet points."""
user_prompt = f"""Summarize the following document:
{text}
Summary:"""
messages = [
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt}
]
summary = call_llm(messages)
return {
"summary": summary,
"original_length": original_length,
"method": "direct",
"chunks": 1
}
# Chunked summarization for longer texts
chunks = chunk_text(text)
chunk_summaries = []
for i, chunk in enumerate(chunks, 1):
chunk_summary = summarize_chunk(chunk, i, len(chunks))
chunk_summaries.append(chunk_summary)
final_summary = synthesize_summaries(chunk_summaries)
return {
"summary": final_summary,
@@ -244,8 +287,7 @@ class MCPSummaryHandler(BaseHTTPRequestHandler):
"""HTTP handler for MCP summary server."""
def log_message(self, format, *args):
# Quiet logs by default
pass
logger.info(format % args)
def _send_json(self, status: int, payload: Any):
"""Send JSON response."""
@@ -256,8 +298,8 @@ class MCPSummaryHandler(BaseHTTPRequestHandler):
self.end_headers()
self.wfile.write(body)
def _auth_or_401(self) -> bool:
"""Check authentication if API key is configured."""
def _auth_or_401(self):
"""Check authentication. Returns False if auth fails."""
try:
return require_auth(self.headers)
except PermissionError:
@@ -270,6 +312,8 @@ class MCPSummaryHandler(BaseHTTPRequestHandler):
self._send_json(200, {
"service": "mcp-summary",
"transport": "streamable-http",
"model": MODEL_NAME,
"status": "running",
"docs": "Use POST / with MCP JSON-RPC (initialize, tools/list, tools/call)."
})
return
@@ -278,6 +322,7 @@ class MCPSummaryHandler(BaseHTTPRequestHandler):
def do_POST(self):
"""Handle MCP JSON-RPC requests."""
# Streamable HTTP MCP endpoint
if self.path not in ("/", "/mcp"):
self.send_error(404, "Not Found")
return
@@ -301,6 +346,8 @@ class MCPSummaryHandler(BaseHTTPRequestHandler):
params = req.get("params") or {}
req_id = req.get("id")
logger.info(f"MCP request: method={method}, id={req_id}")
# MCP: initialize
if method == "initialize":
self._send_json(200, {
@@ -319,6 +366,15 @@ class MCPSummaryHandler(BaseHTTPRequestHandler):
})
return
# MCP: ping
if method == "ping":
self._send_json(200, {
"jsonrpc": "2.0",
"id": req_id,
"result": {}
})
return
# MCP: tools/list
if method == "tools/list":
self._send_json(200, {
@@ -344,6 +400,7 @@ class MCPSummaryHandler(BaseHTTPRequestHandler):
}
})
except Exception as e:
logger.error(f"Tool call failed: {e}", exc_info=True)
self._send_json(200, {
"jsonrpc": "2.0",
"id": req_id,
@@ -364,7 +421,7 @@ class MCPSummaryHandler(BaseHTTPRequestHandler):
if not text:
raise ValueError("Text parameter is required")
max_length = args.get("max_length", 100)
max_length = args.get("max_length", MAX_DIRECT_SUMMARY_LENGTH)
return summarize_document(text, max_length)
raise ValueError(f"Unknown tool: {name}")
@@ -376,6 +433,11 @@ def main():
server = HTTPServer(("0.0.0.0", port), MCPSummaryHandler)
mode = "auth enabled (Bearer)" if API_KEY else "no auth (API_KEY not set)"
print(f"MCP Summary Server listening on 0.0.0.0:{port} [{mode}]")
print(f" - Model: {MODEL_NAME}")
print(f" - LLM URL: {OPENAPI_URL}")
print(f" - Chunk size: {CHUNK_SIZE} characters")
print(f" - Max direct text: {MAX_DIRECT_TEXT_LENGTH} characters")
print(f" - LLM timeout: {LLM_TIMEOUT} seconds")
try:
server.serve_forever()
except KeyboardInterrupt:
-4
@@ -1,4 +0,0 @@
# requirements.txt for MCP Summary Server
# HTTP requests for LLM communication
requests>=2.31.0