# SecureLLM Integration Guide

This guide explains how to integrate and configure SecureLLM as a local inference provider in Mortgage-Lite.

## Overview

SecureLLM is a local inference server that provides an OpenAI-compatible chat completions API. It allows you to run AI models on your own infrastructure while maintaining full data privacy and control.

### Benefits

- **Privacy**: All data stays on your infrastructure
- **Control**: Full control over model selection and configuration
- **Cost**: No per-token cloud API costs
- **Compliance**: Meet data residency requirements
- **Performance**: Low latency for local deployments

## Configuration

SecureLLM can be configured in three ways:

### 1. Environment Variables (.env)

```bash
# SecureLLM Configuration (Kubernetes same namespace: dkubex-apps)
SECURELLM_BASE_URL=http://securellm/securellm/v1
SECURELLM_API_KEY=your_api_key_here

# Note: Models are discovered automatically from the gateway
# No need to specify SECURELLM_MODEL
```

### 2. Helm Values (values.yaml)

```yaml
env:
  - name: SECURELLM_BASE_URL
    value: "http://securellm/securellm/v1"
  - name: SECURELLM_API_KEY
    value: "your_api_key_here"
  # Models are auto-discovered from the gateway
```

### 3. UI Settings Page

Navigate to **Settings** in the Mortgage-Lite UI and configure:

- **SecureLLM Base URL**: The endpoint URL (e.g., `http://securellm/securellm/v1` for k8s same namespace)
- **SecureLLM API Key**: Your API key (optional, depending on your setup)
- **Default AI Mode**: Select "SecureLLM (Local Inference)" to use it as default

**Note**: Available models are discovered automatically from the SecureLLM gateway. You don't need to configure specific model names.

## API Endpoint

SecureLLM uses the OpenAI-compatible chat completions endpoint:

```
POST {SECURELLM_BASE_URL}/chat/completions
```

### Request Format

```json
{
  "model": "default",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Analyze this mortgage application..."
    }
  ],
  "stream": false
}
```

### Response Format

```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Based on the application..."
      }
    }
  ]
}
```

## Usage in Agents

SecureLLM can be used by any agent in the pipeline. Here's how to configure agents to use SecureLLM:

### Option 1: Set as Default AI Mode

In the UI Settings, set **Default AI Mode** to "SecureLLM (Local Inference)". All agents will use SecureLLM by default.

### Option 2: Programmatic Selection

In agent code, specify the provider:

```python
from app.services.ai import chat

# Use SecureLLM explicitly
result = await chat(
    system_prompt="You are Ana, the mortgage analyzer...",
    messages=[{"role": "user", "content": prompt}],
    provider="securellm"
)
```

### Option 3: Call Model Dispatcher

```python
from app.services.ai import call_model

result = await call_model(
    provider="securellm",
    model="default",
    prompt="Analyze this application...",
    timeout=300
)
```

## Agent-Specific Usage

### Ana (Analyzer Agent)

Ana processes raw PII data locally.
SecureLLM is ideal for this:

```python
# In ana.py
async def process(self, application: Application, db: AsyncSession) -> str:
    # Build context with raw data
    context = await self._build_context(application, db)

    # Use SecureLLM for privacy-safe local processing
    result = await chat(
        system_prompt=self._instructions(application),
        messages=[{"role": "user", "content": context}],
        provider="securellm"
    )
    return result
```

### Claire (Compliance Agent)

While Claire typically uses Claude for compliance, you can use SecureLLM for fully local processing:

```python
# In claire.py
async def process(self, application: Application, db: AsyncSession) -> str:
    # Get anonymized data
    anonymized = await self._get_anonymized_data(application, db)

    # Use SecureLLM instead of Claude
    result = await chat(
        system_prompt=self._instructions(application),
        messages=[{"role": "user", "content": anonymized}],
        provider="securellm"
    )
    return result
```

## Authentication

SecureLLM supports Bearer token authentication:

```bash
# Set API key in environment
SECURELLM_API_KEY=your_secret_key
```

The API key is automatically included in requests:

```http
POST /securellm/v1/chat/completions
Authorization: Bearer your_secret_key
Content-Type: application/json
```

If your SecureLLM deployment doesn't require authentication, leave `SECURELLM_API_KEY` empty.

## Network Configuration

### Local Deployment

If SecureLLM is on the same machine:

```bash
SECURELLM_BASE_URL=http://localhost:8000/v1
```

### Remote Deployment (External Access)

If SecureLLM is accessed via external IP (not recommended for production):

```bash
SECURELLM_BASE_URL=http://external-ip/securellm/v1
```

### Kubernetes Same-Namespace and Cross-Namespace

If SecureLLM is in the same namespace (`dkubex-apps`):

```bash
SECURELLM_BASE_URL=http://securellm/securellm/v1
```

For same-namespace services, use just the service name.
For cross-namespace services, use the fully qualified service name: `<service>.<namespace>.svc.cluster.local`

### Kubernetes Deployment

If SecureLLM is deployed in the same Kubernetes cluster:

```yaml
env:
  - name: SECURELLM_BASE_URL
    value: "http://securellm/securellm/v1"
```

## Dynamic Model Discovery

Mortgage-Lite automatically discovers available models from the SecureLLM gateway at runtime. This means:

- **No manual configuration needed**: Models are detected automatically
- **Real-time availability**: Model availability is checked before each LLM call
- **Intelligent fallback**: If a model isn't available in SecureLLM, the system falls back to Ollama
- **Flexible matching**: Various model naming conventions are supported

### How It Works

1. **Model Discovery**: When the Ana or Rex agents need an LLM, they first query SecureLLM's `/models` endpoint
2. **Availability Check**: The system checks if the required model (e.g., `qwen3.5:35b`) is available
3. **Flexible Matching**: Both exact and partial matches are accepted (e.g., `qwen3.5:35b` matches `shared--qwen3-5-35b`)
4. **Automatic Fallback**: If the model is not found or the request fails, the system automatically falls back to Ollama

### Supported Models

Any model available in your SecureLLM deployment will be automatically detected. Common models:

- Qwen family: `qwen3.5:35b`, `qwen2-vl`, `qwen-72b`
- Llama family: `llama-3-70b`, `llama3.1:70b`
- Mistral family: `mistral-7b`, `mistral-nemo`
- Gemma family: `gemma-3-1b`, `gemma-7b`

## Performance Tuning

### Timeout Configuration

Adjust the timeout for large models or complex prompts:

```python
result = await chat(
    system_prompt=prompt,
    messages=messages,
    provider="securellm",
    timeout=600  # 10 minutes
)
```

### Concurrent Requests

SecureLLM can handle multiple concurrent requests.
Configure the number of documents Mortgage-Lite processes in parallel in `config.py`:

```python
max_parallel_documents: int = 4  # Process 4 documents concurrently
```

## Monitoring

### Health Check

Test SecureLLM connectivity:

```bash
# From within the same namespace
curl -X POST http://securellm/securellm/v1/chat/completions \
  -H "Authorization: Bearer your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": false
  }'
```

### Token Usage Tracking

Mortgage-Lite tracks token usage for all providers including SecureLLM:

```sql
SELECT * FROM token_usage WHERE provider = 'securellm';
```

View in UI: **Metrics** → **Agent Performance**

## Troubleshooting

### Connection Refused

```
RuntimeError: SecureLLM returned 000: Connection refused
```

**Solutions**:

- Verify SecureLLM is running: `curl http://securellm/securellm/v1/health`
- Check firewall rules
- Verify the URL is correct (include `/v1` in the base URL)

### Authentication Failed

```
RuntimeError: SecureLLM returned 401: Unauthorized
```

**Solutions**:

- Verify the API key is correct
- Check whether an API key is required (some deployments don't require auth)
- Ensure the `Authorization: Bearer` header format is correct

### Model Not Found

```
RuntimeError: SecureLLM returned 404: Model not found
```

**Solutions**:

- Check available models in SecureLLM
- Use the `default` model if unsure
- Verify the model name matches the SecureLLM configuration

### Timeout Errors

```
TimeoutError: SecureLLM request timed out after 300s
```

**Solutions**:

- Increase the timeout: `timeout=600`
- Use a smaller model
- Reduce the prompt size
- Check SecureLLM server resources

## Migration from Ollama

If you're currently using Ollama and want to switch to SecureLLM:

### 1. Update Configuration

```bash
# Old (Ollama)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=qwen3.5:35b

# New (SecureLLM - Kubernetes same namespace)
SECURELLM_BASE_URL=http://securellm/securellm/v1
SECURELLM_API_KEY=your_key
# SECURELLM_MODEL is not needed: models are discovered automatically from the gateway
```

### 2. Update Default AI Mode

In UI Settings:

- Change **Default AI Mode** from "Local (Ollama)" to "SecureLLM (Local Inference)"

### 3. Test

Run a test application through the pipeline to verify SecureLLM is working correctly.

## Best Practices

### 1. Use for Privacy-Sensitive Data

SecureLLM is ideal for processing raw PII data in the Ana agent:

- All data stays on your infrastructure
- No data sent to cloud providers
- Full audit trail

### 2. Model Selection

Choose an appropriate model size:

- **Small models (7B-13B)**: Fast, lower resource usage, good for simple tasks
- **Medium models (30B-40B)**: Balanced performance and quality
- **Large models (70B+)**: Best quality, higher resource requirements

### 3. Caching

Enable response caching for repeated queries:

- Reduces inference time
- Lowers resource usage
- Improves user experience

### 4. Load Balancing

For high-volume deployments:

- Deploy multiple SecureLLM instances
- Use a load balancer
- Configure in Kubernetes with multiple replicas

## Security Considerations

### 1. API Key Management

- Store API keys in Kubernetes secrets
- Never commit API keys to version control
- Rotate keys regularly

### 2. Network Security

- Use TLS/SSL for production deployments
- Restrict network access to SecureLLM
- Use a VPN or private network

### 3. Data Privacy

- SecureLLM processes data locally
- No data leaves your infrastructure
- Local processing helps meet data-handling requirements under regulations such as GDPR and HIPAA; overall compliance still depends on how the rest of your deployment is configured
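
For the API key guidance in this section, and the note under Authentication that the key may be left empty, a small helper can make the behavior concrete. This is a minimal sketch under stated assumptions, not Mortgage-Lite's actual client code: `build_auth_headers` is a hypothetical name, and the key is assumed to arrive through the `SECURELLM_API_KEY` environment variable (for example, injected from a Kubernetes secret) rather than being hard-coded.

```python
import os
from typing import Dict, Mapping


def build_auth_headers(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build request headers for SecureLLM (hypothetical helper).

    The Authorization header is added only when SECURELLM_API_KEY is set
    and non-empty, so deployments without authentication still work.
    """
    headers = {"Content-Type": "application/json"}
    api_key = env.get("SECURELLM_API_KEY", "").strip()
    if api_key:
        # Bearer format, as shown in the Authentication section
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```

Reading the key from the environment at call time keeps it out of source control and lets a rotated key take effect on the next restart.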
## Example Deployment

### Docker Compose

```yaml
version: '3.8'
services:
  securellm:
    image: securellm/server:latest
    ports:
      - "8000:8000"
    environment:
      - MODEL_NAME=qwen-72b
      - API_KEY=your_secret_key
    volumes:
      - ./models:/models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  mortgage-lite:
    image: mortgage-lite:latest
    ports:
      - "5300:5300"
    environment:
      - SECURELLM_BASE_URL=http://securellm:8000/v1
      - SECURELLM_API_KEY=your_secret_key
      # Models are auto-discovered from the gateway
    depends_on:
      - securellm
```

### Kubernetes

```yaml
apiVersion: v1
kind: Service
metadata:
  name: securellm
spec:
  selector:
    app: securellm
  ports:
    - port: 8000
      targetPort: 8000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: securellm
spec:
  replicas: 2
  selector:
    matchLabels:
      app: securellm
  template:
    metadata:
      labels:
        app: securellm
    spec:
      containers:
        - name: securellm
          image: securellm/server:latest
          ports:
            - containerPort: 8000
          env:
            - name: MODEL_NAME
              value: "qwen-72b"
            - name: API_KEY
              valueFrom:
                secretKeyRef:
                  name: securellm-secret
                  key: api-key
          resources:
            limits:
              nvidia.com/gpu: 1
```

## Support

For SecureLLM-specific issues:

- Check the SecureLLM documentation
- Verify the API endpoint is accessible
- Test with curl before integrating

For Mortgage-Lite integration issues:

- Check logs: `tail -f logs/mortgage-lite.log`
- Verify configuration in UI Settings
- Test with different AI modes to isolate the issue

## Summary

SecureLLM integration provides:

- ✅ Full data privacy and control
- ✅ OpenAI-compatible API
- ✅ Configurable via .env, Helm, and UI
- ✅ Support for multiple models
- ✅ Production-ready deployment options

Configure SecureLLM in Settings and start processing mortgage applications with complete data privacy!
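
As a closing illustration of the flexible matching described under Dynamic Model Discovery, the sketch below shows one way a requested name such as `qwen3.5:35b` could be matched against a gateway name such as `shared--qwen3-5-35b`. It is an assumption about how such matching might work, not the actual Mortgage-Lite implementation; `normalize_model_name` and `find_matching_model` are hypothetical names.

```python
import re
from typing import Optional, Sequence


def normalize_model_name(name: str) -> str:
    """Lowercase the name and collapse each run of non-alphanumerics to one dash."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def find_matching_model(required: str, available: Sequence[str]) -> Optional[str]:
    """Return the first available model matching `required`, or None.

    Exact matches (after normalization) win over partial matches, where the
    normalized required name appears inside the normalized available name.
    """
    wanted = normalize_model_name(required)
    for candidate in available:
        if normalize_model_name(candidate) == wanted:
            return candidate
    for candidate in available:
        if wanted in normalize_model_name(candidate):
            return candidate
    return None  # Caller would fall back to Ollama in this case
```

In this sketch a `None` result corresponds to the fallback step: the caller would route the request to Ollama instead.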