SecureLLM Integration Guide

This guide explains how to integrate and configure SecureLLM as a local inference provider in Mortgage-Lite.

Overview

SecureLLM is a local inference server that provides an OpenAI-compatible chat completions API. It allows you to run AI models on your own infrastructure while maintaining full data privacy and control.

Benefits

  • Privacy: All data stays on your infrastructure

  • Control: Full control over model selection and configuration

  • Cost: No per-token cloud API costs

  • Compliance: Meet data residency requirements

  • Performance: Low latency for local deployments

Configuration

SecureLLM can be configured in three ways:

1. Environment Variables (.env)

# SecureLLM Configuration (Kubernetes same namespace: dkubex-apps)
SECURELLM_BASE_URL=http://securellm/securellm/v1
SECURELLM_API_KEY=your_api_key_here

# Note: Models are discovered automatically from the gateway
# No need to specify SECURELLM_MODEL
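
As an illustration, the two variables above could be read at startup along the following lines. This is a minimal sketch, not Mortgage-Lite's actual configuration code: the `SecureLLMSettings` class and the `load_settings` function are hypothetical names introduced here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SecureLLMSettings:
    """Hypothetical container for the two documented variables."""
    base_url: str
    api_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> SecureLLMSettings:
    """Read SecureLLM settings from an environment-like mapping."""
    # Strip a trailing slash so paths can be appended consistently later.
    base_url = env.get("SECURELLM_BASE_URL", "").strip().rstrip("/")
    if not base_url:
        raise ValueError("SECURELLM_BASE_URL is not set")
    # An empty key is treated as "no authentication".
    api_key = env.get("SECURELLM_API_KEY", "").strip() or None
    return SecureLLMSettings(base_url=base_url, api_key=api_key)
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.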

2. Helm Values (values.yaml)

env:
  - name: SECURELLM_BASE_URL
    value: "http://securellm/securellm/v1"
  - name: SECURELLM_API_KEY
    value: "your_api_key_here"
  # Models are auto-discovered from the gateway

3. UI Settings Page

Navigate to Settings in the Mortgage-Lite UI and configure:

  • SecureLLM Base URL: The endpoint URL (e.g., http://securellm/securellm/v1 for k8s same namespace)

  • SecureLLM API Key: Your API key (optional, depending on your setup)

  • Default AI Mode: Select “SecureLLM (Local Inference)” to use it as default

Note: Available models are discovered automatically from the SecureLLM gateway. You don’t need to configure specific model names.

API Endpoint

SecureLLM uses the OpenAI-compatible chat completions endpoint:

POST {SECURELLM_BASE_URL}/chat/completions

Request Format

{
  "model": "default",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Analyze this mortgage application..."
    }
  ],
  "stream": false
}

Response Format

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Based on the application..."
      }
    }
  ]
}
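
The request and response shapes above can be built and read with plain dictionaries. The following sketch assumes only the fields shown in this section; the helper names `build_chat_request` and `extract_reply` are hypothetical and are not part of Mortgage-Lite.

```python
from typing import Any, Dict


def build_chat_request(system_prompt: str, user_content: str,
                       model: str = "default") -> Dict[str, Any]:
    """Build a non-streaming request body in the format shown above."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content},
        ],
        "stream": False,
    }


def extract_reply(response: Dict[str, Any]) -> str:
    """Return the assistant text from the first choice of a response."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]
```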

Usage in Agents

SecureLLM can be used by any agent in the pipeline. Here’s how to configure agents to use SecureLLM:

Option 1: Set as Default AI Mode

In the UI Settings, set Default AI Mode to “SecureLLM (Local Inference)”. All agents will use SecureLLM by default.

Option 2: Programmatic Selection

In agent code, specify the provider:

from app.services.ai import chat

# Use SecureLLM explicitly
result = await chat(
    system_prompt="You are Ana, the mortgage analyzer...",
    messages=[{"role": "user", "content": prompt}],
    provider="securellm"
)

Option 3: Call Model Dispatcher

from app.services.ai import call_model

result = await call_model(
    provider="securellm",
    model="default",
    prompt="Analyze this application...",
    timeout=300
)

Agent-Specific Usage

Ana (Analyzer Agent)

Ana processes raw PII data locally. SecureLLM is ideal for this:

# In ana.py
from app.services.ai import chat

async def process(self, application: Application, db: AsyncSession) -> str:
    # Build context with raw data
    context = await self._build_context(application, db)
    
    # Use SecureLLM for privacy-safe local processing
    result = await chat(
        system_prompt=self._instructions(application),
        messages=[{"role": "user", "content": context}],
        provider="securellm"
    )
    
    return result

Claire (Compliance Agent)

While Claire typically uses Claude for compliance, you can use SecureLLM for fully local processing:

# In claire.py
from app.services.ai import chat

async def process(self, application: Application, db: AsyncSession) -> str:
    # Get anonymized data
    anonymized = await self._get_anonymized_data(application, db)
    
    # Use SecureLLM instead of Claude
    result = await chat(
        system_prompt=self._instructions(application),
        messages=[{"role": "user", "content": anonymized}],
        provider="securellm"
    )
    
    return result

Authentication

SecureLLM supports Bearer token authentication:

# Set API key in environment
SECURELLM_API_KEY=your_secret_key

The API key is automatically included in requests:

POST /securellm/v1/chat/completions
Authorization: Bearer your_secret_key
Content-Type: application/json

If your SecureLLM deployment doesn’t require authentication, leave SECURELLM_API_KEY empty.
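
The header logic described above (send a Bearer token when a key is set, omit it otherwise) can be sketched as follows. The function name `build_headers` is hypothetical; the actual Mortgage-Lite client code may differ.

```python
from typing import Dict, Optional


def build_headers(api_key: Optional[str]) -> Dict[str, str]:
    """Return request headers, adding Authorization only when a key is set."""
    headers = {"Content-Type": "application/json"}
    if api_key:  # None or "" means the deployment does not require auth
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```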

Network Configuration

Local Deployment

If SecureLLM is on the same machine:

SECURELLM_BASE_URL=http://localhost:8000/v1

Remote Deployment (External Access)

If SecureLLM is accessed via external IP (not recommended for production):

SECURELLM_BASE_URL=http://external-ip/securellm/v1

Kubernetes (Same Namespace and Cross-Namespace)

If SecureLLM is in the same namespace (dkubex-apps), use just the service name:

SECURELLM_BASE_URL=http://securellm/securellm/v1

If SecureLLM is in a different namespace, replace the bare service name with the fully qualified name: <service-name>.<namespace>.svc.cluster.local
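
The naming rule above can be expressed as a small helper. This is a sketch: `service_base_url` is a hypothetical function, and the default path `/securellm/v1` is taken from the examples in this guide.

```python
from typing import Optional


def service_base_url(service: str, namespace: Optional[str] = None,
                     path: str = "/securellm/v1") -> str:
    """Build a base URL for a Kubernetes service.

    With no namespace, the bare service name is used (same-namespace access).
    With a namespace, the cluster-internal fully qualified name is used.
    """
    if namespace is None:
        host = service
    else:
        host = f"{service}.{namespace}.svc.cluster.local"
    return f"http://{host}{path}"
```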

Kubernetes Deployment

If SecureLLM is deployed in the same Kubernetes cluster:

env:
  - name: SECURELLM_BASE_URL
    value: "http://securellm/securellm/v1"

Dynamic Model Discovery

Mortgage-Lite automatically discovers available models from the SecureLLM gateway at runtime. This means:

  • No manual configuration needed: Models are detected automatically

  • Real-time availability: Checks for model availability before each LLM call

  • Intelligent fallback: If a model isn’t available in SecureLLM, falls back to Ollama

  • Flexible matching: Supports various model naming conventions

How It Works

  1. Model Discovery: When Ana or Rex agents need an LLM, they first query SecureLLM’s /models endpoint

  2. Availability Check: The system checks if the required model (e.g., qwen3.5:35b) is available

  3. Flexible Matching: Supports exact and partial matches (e.g., qwen3.5:35b matches shared--qwen3-5-35b)

  4. Automatic Fallback: If the model is not found or the request fails, the system automatically falls back to Ollama
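
The matching and fallback steps above can be sketched as follows. The exact matching rules Mortgage-Lite uses are not documented here, so the normalization below (lowercase, runs of punctuation collapsed to a hyphen, then a substring test) is an assumption chosen to reproduce the qwen3.5:35b example. All function names are hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple


def normalize(name: str) -> str:
    """Lowercase and collapse runs of non-alphanumerics into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def find_model(requested: str, available: Iterable[str]) -> Optional[str]:
    """Return the gateway model name matching `requested`, or None."""
    candidates = list(available)
    if requested in candidates:  # exact match wins
        return requested
    wanted = normalize(requested)
    for candidate in candidates:  # partial match on normalized names
        if wanted in normalize(candidate):
            return candidate
    return None


def choose_provider(requested: str, available: Iterable[str]) -> Tuple[str, str]:
    """Pick SecureLLM when the model is found there, otherwise Ollama."""
    match = find_model(requested, available)
    if match is not None:
        return ("securellm", match)
    return ("ollama", requested)
```

Substring matching is permissive, so a real implementation would likely need stricter rules to keep one model name from matching a longer, unrelated one.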

Supported Models

Any model available in your SecureLLM deployment will be automatically detected. Common models:

  • Qwen family: qwen3.5:35b, qwen2-vl, qwen-72b

  • Llama family: llama-3-70b, llama3.1:70b

  • Mistral family: mistral-7b, mistral-nemo

  • Gemma family: gemma-3-1b, gemma-7b

Performance Tuning

Timeout Configuration

Adjust timeout for large models or complex prompts:

result = await chat(
    system_prompt=prompt,
    messages=messages,
    provider="securellm",
    timeout=600  # 10 minutes
)

Concurrent Requests

SecureLLM can handle multiple concurrent requests. To control how many documents Mortgage-Lite processes in parallel, set max_parallel_documents in config.py:

max_parallel_documents: int = 4  # Process 4 documents concurrently
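
One common way to enforce a limit like this is an asyncio semaphore. The sketch below is illustrative and assumes nothing about how Mortgage-Lite applies max_parallel_documents internally; `process_bounded` and its `worker` argument are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def process_bounded(items: Iterable[T],
                          worker: Callable[[T], Awaitable[R]],
                          max_parallel: int = 4) -> List[R]:
    """Run `worker` over `items` with at most `max_parallel` in flight."""
    semaphore = asyncio.Semaphore(max_parallel)

    async def run_one(item: T) -> R:
        # Each task waits here until one of the slots is free.
        async with semaphore:
            return await worker(item)

    # gather preserves the input order in its result list.
    return await asyncio.gather(*(run_one(item) for item in items))
```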

Monitoring

Health Check

Test SecureLLM connectivity:

# From within the same namespace
curl -X POST http://securellm/securellm/v1/chat/completions \
  -H "Authorization: Bearer your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": false
  }'
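
The same check can be made from Python using only the standard library. This is a sketch equivalent to the curl command above; the function names are hypothetical, and `check_health` makes a real network call when invoked.

```python
import json
import urllib.request
from typing import Optional


def build_health_request(base_url: str,
                         api_key: Optional[str] = None) -> urllib.request.Request:
    """Build the same POST request as the curl example."""
    body = json.dumps({
        "model": "default",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": False,
    }).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    url = f"{base_url.rstrip('/')}/chat/completions"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def check_health(base_url: str, api_key: Optional[str] = None,
                 timeout: float = 30.0) -> bool:
    """Return True if the endpoint answers with at least one choice."""
    request = build_health_request(base_url, api_key)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            payload = json.load(response)
    except (OSError, ValueError):  # connection/HTTP errors, invalid JSON
        return False
    return isinstance(payload, dict) and bool(payload.get("choices"))
```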

Token Usage Tracking

Mortgage-Lite tracks token usage for all providers including SecureLLM:

SELECT * FROM token_usage WHERE provider = 'securellm';

View in UI: Metrics > Agent Performance

Troubleshooting

Connection Refused

RuntimeError: SecureLLM returned 000: Connection refused

Solutions:

  • Verify SecureLLM is running: curl http://securellm/securellm/v1/health

  • Check firewall rules

  • Verify URL is correct (include /v1 in base URL)
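
A quick sanity check on the base URL can catch the most common mistakes before any request is sent. The following is a sketch with a hypothetical function name; it checks only the URL's shape, not whether the server is reachable.

```python
from typing import List
from urllib.parse import urlparse


def base_url_problems(base_url: str) -> List[str]:
    """Return a list of problems found in a SecureLLM base URL."""
    problems = []
    parsed = urlparse(base_url)
    if parsed.scheme not in ("http", "https"):
        problems.append("URL must start with http:// or https://")
    if not parsed.netloc:
        problems.append("URL has no host")
    if not parsed.path.rstrip("/").endswith("/v1"):
        problems.append("URL path should end with /v1")
    return problems
```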

Authentication Failed

RuntimeError: SecureLLM returned 401: Unauthorized

Solutions:

  • Verify API key is correct

  • Check if API key is required (some deployments don’t require auth)

  • Ensure Authorization: Bearer header format is correct

Model Not Found

RuntimeError: SecureLLM returned 404: Model not found

Solutions:

  • Check available models in SecureLLM

  • Use the "default" model name if unsure

  • Verify model name matches SecureLLM configuration
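
To see which names the gateway reports, the response of the /models endpoint can be reduced to a list of IDs. The sketch below assumes SecureLLM returns the OpenAI-style list format (a top-level "data" array whose entries carry an "id"); that assumption follows from the API being described as OpenAI-compatible and should be verified against your deployment. The function name is hypothetical.

```python
from typing import Any, Dict, List


def parse_model_ids(payload: Dict[str, Any]) -> List[str]:
    """Extract model IDs from an OpenAI-style /models response body."""
    entries = payload.get("data") or []
    return [
        entry["id"]
        for entry in entries
        if isinstance(entry, dict) and "id" in entry
    ]
```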

Timeout Errors

TimeoutError: SecureLLM request timed out after 300s

Solutions:

  • Increase timeout: timeout=600

  • Use a smaller model

  • Reduce prompt size

  • Check SecureLLM server resources
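
If timeouts are intermittent, one option is to retry once with a longer limit. The following sketch wraps any awaitable call and is not part of Mortgage-Lite; `call_with_timeouts` and `make_call` are hypothetical names. Each retry resends the full request, so this adds load to the SecureLLM server.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, TypeVar

T = TypeVar("T")


async def call_with_timeouts(make_call: Callable[[], Awaitable[T]],
                             timeouts: Sequence[float] = (300, 600)) -> T:
    """Try the call once per timeout value, in order, until one succeeds."""
    last_error = None
    for timeout in timeouts:
        try:
            # make_call() is invoked again on each attempt.
            return await asyncio.wait_for(make_call(), timeout=timeout)
        except asyncio.TimeoutError as exc:
            last_error = exc
    raise TimeoutError(
        f"request timed out after attempts with timeouts {list(timeouts)}"
    ) from last_error
```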

Migration from Ollama

If you’re currently using Ollama and want to switch to SecureLLM:

1. Update Configuration

# Old (Ollama)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=qwen3.5:35b

# New (SecureLLM - Kubernetes same namespace)
SECURELLM_BASE_URL=http://securellm/securellm/v1
SECURELLM_API_KEY=your_key
# SECURELLM_MODEL is not needed: models are discovered automatically from the gateway

2. Update Default AI Mode

In UI Settings:

  • Change Default AI Mode from “Local (Ollama)” to “SecureLLM (Local Inference)”

3. Test

Run a test application through the pipeline to verify SecureLLM is working correctly.

Best Practices

1. Use for Privacy-Sensitive Data

SecureLLM is ideal for processing raw PII data in the Ana agent:

  • All data stays on your infrastructure

  • No data sent to cloud providers

  • Full audit trail

2. Model Selection

Choose appropriate model size:

  • Small models (7B-13B): Fast, lower resource usage, good for simple tasks

  • Medium models (30B-40B): Balanced performance and quality

  • Large models (70B+): Best quality, higher resource requirements

3. Caching

Enable response caching for repeated queries:

  • Reduces inference time

  • Lowers resource usage

  • Improves user experience
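
This guide does not describe how caching is enabled in Mortgage-Lite. Purely as an illustration of the idea, the sketch below keeps responses in memory, keyed by a hash of the model name and messages. `ResponseCache` and `cache_key` are hypothetical names. Because cached responses may contain PII, any real cache needs the same access controls as the data it holds.

```python
import hashlib
import json
from typing import Callable, Dict, List


def cache_key(model: str, messages: List[Dict[str, str]]) -> str:
    """Derive a stable key from the model name and message list."""
    canonical = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ResponseCache:
    """In-memory cache: identical requests reuse the first response."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def get_or_compute(self, model: str, messages: List[Dict[str, str]],
                       compute: Callable[[], str]) -> str:
        key = cache_key(model, messages)
        if key not in self._store:
            # Only call the model when this exact request is new.
            self._store[key] = compute()
        return self._store[key]
```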

4. Load Balancing

For high-volume deployments:

  • Deploy multiple SecureLLM instances

  • Use a load balancer

  • Configure in Kubernetes with multiple replicas

Security Considerations

1. API Key Management

  • Store API keys in Kubernetes secrets

  • Never commit API keys to version control

  • Rotate keys regularly

2. Network Security

  • Use TLS/SSL for production deployments

  • Restrict network access to SecureLLM

  • Use a VPN or private network

3. Data Privacy

  • SecureLLM processes data locally

  • No data leaves your infrastructure

  • Helps meet data-handling requirements under GDPR, HIPAA, and similar regulations

Example Deployment

Docker Compose

version: '3.8'

services:
  securellm:
    image: securellm/server:latest
    ports:
      - "8000:8000"
    environment:
      - MODEL_NAME=qwen-72b
      - API_KEY=your_secret_key
    volumes:
      - ./models:/models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  mortgage-lite:
    image: mortgage-lite:latest
    ports:
      - "5300:5300"
    environment:
      - SECURELLM_BASE_URL=http://securellm:8000/v1
      - SECURELLM_API_KEY=your_secret_key
      # SECURELLM_MODEL is not needed: models are discovered automatically
    depends_on:
      - securellm

Kubernetes

apiVersion: v1
kind: Service
metadata:
  name: securellm
spec:
  selector:
    app: securellm
  ports:
    - port: 8000
      targetPort: 8000

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: securellm
spec:
  replicas: 2
  selector:
    matchLabels:
      app: securellm
  template:
    metadata:
      labels:
        app: securellm
    spec:
      containers:
        - name: securellm
          image: securellm/server:latest
          ports:
            - containerPort: 8000
          env:
            - name: MODEL_NAME
              value: "qwen-72b"
            - name: API_KEY
              valueFrom:
                secretKeyRef:
                  name: securellm-secret
                  key: api-key
          resources:
            limits:
              nvidia.com/gpu: 1

Support

For SecureLLM-specific issues:

  • Check SecureLLM documentation

  • Verify API endpoint is accessible

  • Test with curl before integrating

For Mortgage-Lite integration issues:

  • Check logs: tail -f logs/mortgage-lite.log

  • Verify configuration in UI Settings

  • Test with different AI modes to isolate the issue

Summary

SecureLLM integration provides:

  • ✅ Full data privacy and control

  • ✅ OpenAI-compatible API

  • ✅ Configurable via .env, Helm, and UI

  • ✅ Support for multiple models

  • ✅ Production-ready deployment options

Configure SecureLLM in Settings and start processing mortgage applications with complete data privacy!