Architecture Decisions

Key architectural decisions and their rationale.

Overview

Merlya is built on a modular architecture designed for extensibility, security, and ease of use. Version 0.8.3 introduces the specialist agent architecture where a single MerlyaAgent delegates to focused specialist agents for different operation types.

graph TB
    subgraph "User Interface"
        CLI[CLI / REPL]
    end

    subgraph "Parsing Layer"
        SmartExtractor[SmartExtractor<br/>Host Detection]
    end

    subgraph "Agent"
        MerlyaAgent[MerlyaAgent<br/>Orchestration + Delegation]
    end

    subgraph "Specialist Agents"
        DiagSpec[DiagnosticSpecialist<br/>Read-Only]
        ExecSpec[ExecutionSpecialist<br/>Mutations + HITL]
        SecSpec[SecuritySpecialist<br/>Security Audits]
        QuerySpec[QuerySpecialist<br/>Inventory Queries]
    end

    subgraph "Pipelines"
        Ansible[AnsiblePipeline]
        Terraform[TerraformPipeline]
        K8s[KubernetesPipeline]
        Bash[BashPipeline]
    end

    subgraph "Infrastructure"
        SSH[SSH Pool]
        Capabilities[Capability Detector]
    end

    subgraph "LLM Providers"
        Brain[Brain Model<br/>Reasoning]
        Fast[Fast Model<br/>Routing]
    end

    CLI --> SmartExtractor
    SmartExtractor --> MerlyaAgent
    MerlyaAgent -->|delegate_diagnostic| DiagSpec
    MerlyaAgent -->|delegate_execution| ExecSpec
    MerlyaAgent -->|delegate_security| SecSpec
    MerlyaAgent -->|delegate_query| QuerySpec
    DiagSpec --> SSH
    ExecSpec --> Capabilities
    Capabilities --> Ansible
    Capabilities --> Terraform
    Capabilities --> K8s
    Capabilities --> Bash
    MerlyaAgent --> Brain
    DiagSpec --> Brain
    ExecSpec --> Brain

ADR-001: PydanticAI for Agent Framework

Status: Accepted

Context: We needed a framework for building LLM-powered agents with tool calling capabilities.

Decision: Use PydanticAI as the agent framework.

Rationale:

  • Type-safe with Pydantic models
  • Native async support
  • Clean tool definition API
  • Multi-provider support
  • Active development and community

Consequences:

  • Dependency on PydanticAI
  • Benefits from upstream improvements
  • May need to adapt to API changes


ADR-002: AsyncSSH for SSH Connections

Status: Accepted

Context: SSH connectivity is core to Merlya's functionality.

Decision: Use asyncssh for all SSH operations.

Rationale:

  • Pure Python implementation, no system ssh binary required
  • Async-native for concurrent connections
  • Full SSH2 protocol support
  • Jump host support built-in
  • Active maintenance

Consequences:

  • Async-only API
  • Connection pooling needed for performance
  • Memory overhead for many connections


ADR-003: Connection Pooling

Status: Accepted

Context: Frequent SSH connections are slow and resource-intensive.

Decision: Implement connection pooling with automatic cleanup.

Rationale:

  • Avoid connection overhead for repeated commands
  • Limit concurrent connections
  • Automatic cleanup of idle connections
  • Graceful shutdown handling

Implementation:

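class SSHPool:
    """Pool of reusable SSH connections (interface outline)."""

    max_connections: int = 10
    idle_timeout: int = 300  # seconds (5 minutes)

    async def get_connection(self, host: str) -> "SSHConnection": ...
    async def release_connection(self, conn: "SSHConnection") -> None: ...
    async def cleanup_idle(self) -> None: ...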
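
The pooling behaviour described above can be sketched with plain asyncio. This is a minimal illustration, not Merlya's actual SSHPool: the SimplePool name, the injected connect callable, and the injected clock are assumptions made so the example runs without a real SSH server.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, Tuple


class SimplePool:
    """Toy pool: reuse one idle connection per host and evict stale ones."""

    def __init__(
        self,
        connect: Callable[[str], Awaitable[object]],  # hypothetical connection factory
        max_connections: int = 10,
        idle_timeout: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._connect = connect
        self._idle_timeout = idle_timeout
        self._clock = clock
        # Caps how many connections may be checked out at the same time.
        self._slots = asyncio.Semaphore(max_connections)
        # host -> (connection, time it was released)
        self._idle: Dict[str, Tuple[object, float]] = {}

    async def get_connection(self, host: str) -> object:
        await self._slots.acquire()
        entry = self._idle.pop(host, None)
        if entry is not None:
            return entry[0]  # reuse the idle connection, no new handshake
        try:
            return await self._connect(host)
        except BaseException:
            self._slots.release()  # do not leak the slot if connecting fails
            raise

    async def release_connection(self, host: str, conn: object) -> None:
        self._idle[host] = (conn, self._clock())
        self._slots.release()

    async def cleanup_idle(self) -> int:
        """Forget connections idle longer than idle_timeout; return how many."""
        now = self._clock()
        expired = [h for h, (_, t) in self._idle.items() if now - t > self._idle_timeout]
        for host in expired:
            del self._idle[host]  # a real pool would also close the connection
        return len(expired)
```

In this sketch the semaphore limits connections that are in use, not the number kept idle; a production pool would need both limits and would close evicted connections explicitly.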


ADR-004: Keyring for Credential Storage

Status: Accepted

Context: API keys and credentials need secure storage.

Decision: Use the system keyring (via keyring library).

Rationale:

  • OS-level security (Keychain, Secret Service, Credential Manager)
  • No plaintext secrets in config files
  • Standard cross-platform solution
  • Fallback to in-memory with warning

Consequences:

  • Requires keyring backend on Linux
  • May need user interaction for first-time setup
  • Headless servers need alternative setup
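
A rough sketch of the keyring-with-fallback idea follows. It uses the keyring library's get_password and set_password calls; the SecretStore class, the "merlya" service name, and the use_keyring switch are illustrative assumptions, not Merlya's real code.

```python
import warnings
from typing import Dict, Optional

SERVICE = "merlya"  # hypothetical keyring service name


class SecretStore:
    """Keep secrets in the OS keyring, falling back to process memory."""

    def __init__(self, use_keyring: bool = True) -> None:
        self._memory: Dict[str, str] = {}
        self._keyring = None
        if use_keyring:
            try:
                import keyring  # third-party package: pip install keyring

                self._keyring = keyring
            except ImportError:
                warnings.warn("keyring not installed; secrets are kept in memory only")

    def set(self, name: str, value: str) -> None:
        if self._keyring is not None:
            try:
                self._keyring.set_password(SERVICE, name, value)
                return
            except Exception as exc:  # e.g. no backend on a headless server
                warnings.warn(f"keyring unavailable ({exc}); using in-memory storage")
        self._memory[name] = value

    def get(self, name: str) -> Optional[str]:
        if self._keyring is not None:
            try:
                value = self._keyring.get_password(SERVICE, name)
                if value is not None:
                    return value
            except Exception:
                pass  # fall through to the in-memory copy
        return self._memory.get(name)
```

Secrets held in the in-memory fallback disappear when the process exits, which is why the fallback is paired with a warning.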


ADR-005: Local Intent Classification

Status: Superseded by ADR-013 (v0.8.0)

Context: Not all user inputs require LLM processing.

Decision: Use pattern-based local classifier for intent routing with LLM fallback.

Note: This ADR was superseded by ADR-013 (DIAGNOSTIC/CHANGE Center Architecture), which routed requests through the CenterClassifier (ADR-015). Both ADR-013 and ADR-015 were in turn superseded by ADR-018 in v0.8.3.


ADR-006: YAML Configuration with SQLite Storage

Status: Accepted

Context: Configuration needs to be human-readable and editable. Host inventory requires structured storage with querying capabilities.

Decision: Use YAML for configuration files and SQLite for host inventory.

Rationale:

  • YAML: Human-friendly syntax, widely used in DevOps
  • YAML: Native support for complex structures
  • SQLite: Fast querying for host lookups
  • SQLite: Supports tagging, filtering, and search
  • SQLite: Single file, no server needed

File Locations:

  • ~/.merlya/config.yaml - Main configuration
  • ~/.merlya/merlya.db - Host inventory (SQLite)
  • ~/.merlya/logs/ - Log files
  • ~/.merlya/history - Command history

Configuration Example:

general:
  language: en
  log_level: info

model:
  provider: openrouter
  model: amazon/nova-2-lite-v1:free
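
To illustrate why SQLite suits tagged host lookups, here is a small sketch using the standard sqlite3 module. The hosts and host_tags tables and the helper names are hypothetical; the real schema of merlya.db is not described here.

```python
import sqlite3
from typing import Iterable, List


def open_inventory(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) an inventory database with a hypothetical schema."""
    db = sqlite3.connect(path)
    db.executescript(
        """
        CREATE TABLE IF NOT EXISTS hosts (
            id INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL,
            hostname TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS host_tags (
            host_id INTEGER NOT NULL REFERENCES hosts(id),
            tag TEXT NOT NULL,
            UNIQUE (host_id, tag)
        );
        """
    )
    return db


def add_host(db: sqlite3.Connection, name: str, hostname: str, tags: Iterable[str] = ()) -> None:
    cur = db.execute("INSERT INTO hosts (name, hostname) VALUES (?, ?)", (name, hostname))
    db.executemany(
        "INSERT INTO host_tags (host_id, tag) VALUES (?, ?)",
        [(cur.lastrowid, tag) for tag in tags],
    )
    db.commit()


def hosts_with_tag(db: sqlite3.Connection, tag: str) -> List[str]:
    """Return host names carrying the given tag, sorted alphabetically."""
    rows = db.execute(
        "SELECT h.name FROM hosts h JOIN host_tags t ON t.host_id = h.id "
        "WHERE t.tag = ? ORDER BY h.name",
        (tag,),
    )
    return [row[0] for row in rows]
```

Passing ":memory:" keeps the example self-contained; pointing path at a file gives the single-file, serverless storage the rationale refers to.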


ADR-007: Plugin Architecture

Status: Proposed

Context: Users need to extend Merlya with custom tools.

Decision: Entry-point based plugin system.

Rationale:

  • Standard Python packaging approach
  • Easy distribution via PyPI
  • Namespace isolation
  • Lazy loading

Proposed Interface:

# pyproject.toml
[project.entry-points."merlya.tools"]
my_tool = "my_package:MyTool"

# Implementation
class MyTool(MerlyaTool):
    name = "my_tool"
    description = "Does something useful"

    async def execute(self, params: dict) -> ToolResult:
        ...


ADR-008: Loguru for Logging

Status: Accepted

Context: Debugging and monitoring require good logging with minimal configuration.

Decision: Use loguru for logging.

Rationale:

  • Zero-configuration out of the box
  • Automatic rotation and retention
  • Human-readable colored output for development
  • Exception catching with full traceback
  • Easy to use API (logger.info(), logger.error())
  • Async-safe

Implementation:

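from loguru import logger

logger.info("Operation completed successfully")
logger.warning("Connection retry required")

try:
    raise ConnectionError("timed out")  # stand-in for a failing SSH call
except ConnectionError as e:
    logger.error("SSH connection failed: {error}", error=e)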

Log Levels:

  • DEBUG: Detailed execution flow
  • INFO: Key operations and results
  • WARNING: Recoverable issues
  • ERROR: Failures requiring attention


ADR-009: Multi-Provider LLM Support

Status: Accepted

Context: Users have different LLM provider preferences and cost constraints.

Decision: Abstract LLM provider interface with multiple implementations. OpenRouter as default for free tier access.

Supported Providers:

  • OpenRouter (default) - 100+ models, free tier available
  • OpenAI (GPT-4o, GPT-4o-mini)
  • Anthropic (Claude 3.5 Sonnet, Haiku)
  • Ollama (local models, no API key needed)

Interface:

from typing import Protocol

class LLMProvider(Protocol):
    async def generate(self, prompt: str) -> str: ...
    async def generate_with_tools(self, prompt: str, tools: list) -> "ToolCall": ...


ADR-010: Security Model

Status: Accepted

Context: Merlya executes commands on remote systems.

Decision: Implement defense-in-depth security model.

Layers:

  1. Credential Protection - Keyring storage
  2. Command Review - User confirmation for destructive commands
  3. Input Validation - Pydantic models for all inputs
  4. Audit Logging - All commands logged
  5. Principle of Least Privilege - Minimal permissions

Dangerous Command Detection:

DANGEROUS_PATTERNS = [
    r"rm\s+-rf",
    r"mkfs",
    r"dd\s+if=",
    r">\s*/dev/",
    r"chmod\s+777",
]
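
A short sketch of how such a pattern list might be applied before a command is run. The find_dangerous_pattern helper is a hypothetical name; only the patterns themselves come from the list above.

```python
import re
from typing import Optional

DANGEROUS_PATTERNS = [
    r"rm\s+-rf",
    r"mkfs",
    r"dd\s+if=",
    r">\s*/dev/",
    r"chmod\s+777",
]

# Compile once so that repeated checks stay cheap.
_COMPILED = [re.compile(pattern) for pattern in DANGEROUS_PATTERNS]


def find_dangerous_pattern(command: str) -> Optional[str]:
    """Return the first pattern that matches the command, or None."""
    for pattern in _COMPILED:
        if pattern.search(command):
            return pattern.pattern
    return None
```

Regex screening of this kind is coarse. With these exact patterns, `rm -fr /data` is not flagged, while a harmless `echo ok > /dev/null` is, which is one reason the list is only a single layer of the model and feeds a user confirmation step.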


ADR-011: Non-Interactive Mode Credential Handling

Status: Accepted (v0.7.8)

Context: When running in non-interactive mode (merlya run --yes), the agent cannot prompt users for credentials. Previously, this led to infinite retry loops when sudo/su commands required passwords.

Decision: Fail-fast with clear error messages when credentials are needed but cannot be obtained.

Rationale:

  • Immediate failure prevents wasted API calls and timeouts
  • Clear error messages guide users to proper solutions
  • Three resolution paths documented: keyring, NOPASSWD, interactive mode
  • permanent_failure flag tells agent to stop retrying

Implementation:

# In request_credentials() and ssh_execute()
if ctx.auto_confirm and missing_credentials:
    return CommandResult(
        success=False,
        message="Cannot obtain credentials in non-interactive mode",
        data={
            "non_interactive": True,
            "permanent_failure": True,  # Signal: do not retry
        }
    )

Solutions for users:

  1. Store credentials in keyring: merlya secret set sudo:host:password
  2. Configure NOPASSWD sudo on target hosts
  3. Run in interactive mode (without --yes)

Consequences:

  • Clear failure instead of timeout loops
  • Better CI/CD integration (fast failure)
  • Requires pre-configuration for automated elevated commands
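
The effect of the permanent_failure flag on a retry loop can be sketched as follows. The CommandResult dataclass mirrors the fields used in the snippet above, and run_with_retries is a hypothetical caller, not the agent's real loop.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class CommandResult:
    success: bool
    message: str = ""
    data: Dict[str, Any] = field(default_factory=dict)


def run_with_retries(
    action: Callable[[], CommandResult], max_attempts: int = 3
) -> Tuple[CommandResult, int]:
    """Retry transient failures, but stop immediately on a permanent one."""
    result = CommandResult(success=False, message="not attempted")
    attempts = 0
    for attempts in range(1, max_attempts + 1):
        result = action()
        if result.success or result.data.get("permanent_failure"):
            break  # nothing to gain from calling the action again
    return result, attempts
```

A missing-credentials result is returned after a single attempt, while an ordinary failure still uses all of its attempts.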

ADR-012: ElevationMethod Enum for Host Configuration

Status: Accepted (v0.7.8)

Context: Elevation method was stored as strings with inconsistent validation, causing NULL values and validation errors.

Decision: Use ElevationMethod enum with explicit values and proper NULL handling.

Enum Values:

from enum import Enum


class ElevationMethod(str, Enum):
    NONE = "none"              # No elevation
    SUDO = "sudo"              # sudo (NOPASSWD)
    SUDO_PASSWORD = "sudo_password"  # sudo with password
    DOAS = "doas"              # doas (NOPASSWD)
    DOAS_PASSWORD = "doas_password"  # doas with password
    SU = "su"                  # su with password

Handling:

  • NULL in database → ElevationMethod.NONE
  • Invalid strings → ElevationMethod.NONE
  • /hosts edit uses enum mapping
  • Import (TOML/CSV) maps strings to enum values

Consequences:

  • Type-safe elevation configuration
  • No more validation errors on NULL
  • Consistent behavior across all code paths
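
The NULL and invalid-string handling listed above could look roughly like this. parse_elevation is a hypothetical helper, and the whitespace and case normalization is an assumption added for the example.

```python
from enum import Enum
from typing import Optional


class ElevationMethod(str, Enum):
    NONE = "none"
    SUDO = "sudo"
    SUDO_PASSWORD = "sudo_password"
    DOAS = "doas"
    DOAS_PASSWORD = "doas_password"
    SU = "su"


def parse_elevation(raw: Optional[str]) -> ElevationMethod:
    """Map a stored or imported value to the enum, defaulting to NONE."""
    if raw is None:  # NULL in the database
        return ElevationMethod.NONE
    try:
        # Assumption: stored values may differ in case or carry stray spaces.
        return ElevationMethod(raw.strip().lower())
    except ValueError:  # unknown string
        return ElevationMethod.NONE
```

Funnelling every code path (database reads, /hosts edit, imports) through one such function is what keeps the behavior consistent.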

ADR-013: DIAGNOSTIC/CHANGE Center Architecture

Status: Superseded by ADR-018 (v0.8.3)

Context: Merlya needed a clear separation between read-only investigation and state-changing operations to improve safety and provide appropriate guardrails for each type of operation.

Decision: Implement two operational centers with different security models:

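| Center     | Purpose                 | Risk Level | HITL Required |
|------------|-------------------------|------------|---------------|
| DIAGNOSTIC | Read-only investigation | LOW        | No            |
| CHANGE     | Controlled mutations    | HIGH       | Yes           |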

Note: The Centers architecture was replaced in v0.8.3 by the specialist agent model (ADR-018). The DiagnosticCenter and ChangeCenter classes no longer exist; their responsibilities are now handled by DiagnosticSpecialist and ExecutionSpecialist respectively.


ADR-014: Pipeline System for CHANGE Operations

Status: Accepted (v0.8.0)

Context: All state-changing operations need consistent validation, preview, approval, and rollback capabilities.

Decision: Implement mandatory pipeline stages for all CHANGE operations:

Plan → Diff/Dry-run → Summary → HITL → Apply → Post-check → Rollback (only if the post-check fails)

Pipeline Types:

| Pipeline           | Dry-run Command                 | Use Case                          |
|--------------------|---------------------------------|-----------------------------------|
| AnsiblePipeline    | ansible-playbook --check --diff | Configuration, packages, services |
| TerraformPipeline  | terraform plan                  | Cloud infrastructure              |
| KubernetesPipeline | kubectl diff                    | Container orchestration           |
| BashPipeline       | Preview only                    | Fallback for simple commands      |

Implementation:

from abc import ABC, abstractmethod


class AbstractPipeline(ABC):
    @abstractmethod
    async def plan(self) -> PlanResult: ...

    @abstractmethod
    async def diff(self) -> DiffResult: ...

    @abstractmethod
    async def apply(self) -> ApplyResult: ...

    @abstractmethod
    async def rollback(self) -> RollbackResult: ...

    async def execute(self) -> PipelineResult:
        """Execute full pipeline with HITL."""
        plan = await self.plan()
        diff = await self.diff()
        summary = self._generate_summary(plan, diff)

        if not await self._request_hitl(summary):
            return PipelineResult(aborted=True)

        result = await self.apply()
        post_check = await self._post_check()

        if not post_check.success:
            await self.rollback()

        return result

Consequences:

  • Consistent change management across all tools
  • Mandatory preview before apply
  • Automatic rollback on failure
  • Full audit trail
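
To make the stage ordering concrete, here is a toy pipeline that only records which stages ran. RecordingPipeline, its approve callback, and the healthy flag are illustrative assumptions; it does not subclass the real AbstractPipeline.

```python
import asyncio
from typing import Awaitable, Callable, List


class RecordingPipeline:
    """Toy pipeline that records the stages it runs, in order."""

    def __init__(
        self,
        approve: Callable[[str], Awaitable[bool]],  # stands in for the HITL prompt
        healthy: bool = True,                       # simulated post-check outcome
    ) -> None:
        self._approve = approve
        self._healthy = healthy
        self.stages: List[str] = []

    async def execute(self) -> bool:
        self.stages += ["plan", "diff", "summary", "hitl"]
        if not await self._approve("summary of planned changes"):
            return False  # aborted before anything was applied
        self.stages += ["apply", "post_check"]
        if not self._healthy:
            self.stages.append("rollback")  # undo only when the check fails
            return False
        return True
```

Running it with an approving callback and healthy=False yields the full sequence ending in "rollback"; with a rejecting callback the list stops at "hitl" and nothing is applied.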

ADR-015: CenterClassifier for Intent Routing

Status: Superseded by ADR-018 (v0.8.3)

Context: User requests need to be routed to the appropriate center (DIAGNOSTIC or CHANGE) based on intent.

Decision: Use a hybrid pattern-matching + LLM classifier:

  1. Pattern Matching (Fast Path): Regex patterns for clear intents
  2. LLM Fallback: Fast model for ambiguous cases
  3. Clarification: User prompt when confidence < 0.7

Note: The CenterClassifier was removed in v0.8.3. Routing is now handled by the MerlyaAgent system prompt, which instructs the agent directly on which specialist to delegate to based on request semantics. See ADR-018.


ADR-016: Brain/Fast Model Configuration

Status: Accepted (v0.8.0)

Context: Different tasks require different model capabilities and cost trade-offs.

Decision: Two model roles with separate configuration:

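| Role  | Purpose                                  | Example Models            |
|-------|------------------------------------------|---------------------------|
| brain | Complex reasoning, planning, analysis    | Claude Sonnet, GPT-4o     |
| fast  | Routing, fingerprinting, quick decisions | Claude Haiku, GPT-4o-mini |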

Configuration:

model:
  provider: anthropic
  brain: claude-sonnet-4-5-20250514
  fast: claude-haiku-4-5-20250514

CLI Commands:

/model brain claude-sonnet-4    # Set brain model
/model fast claude-haiku        # Set fast model
/model show                     # Show current config

Usage in Code:

# Fast model for quick decisions
model = ctx.config.get_model("fast")

# Brain model for reasoning and agent loops
model = ctx.config.get_model("brain")

Consequences:

  • Cost optimization (fast model for simple tasks)
  • Better performance (quick routing)
  • Flexibility (different models per role)
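
A minimal sketch of what a role lookup like get_model could do. The ModelConfig dataclass and the "provider:model" return format are assumptions for illustration; the actual return type of ctx.config.get_model is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    provider: str
    brain: str
    fast: str

    def get_model(self, role: str) -> str:
        """Return a 'provider:model' identifier for the requested role."""
        if role not in ("brain", "fast"):
            raise ValueError(f"unknown model role: {role!r}")
        return f"{self.provider}:{getattr(self, role)}"


config = ModelConfig(provider="anthropic", brain="claude-sonnet-4", fast="claude-haiku")
routing_model = config.get_model("fast")     # cheap model for quick decisions
reasoning_model = config.get_model("brain")  # stronger model for agent loops
```

Rejecting unknown roles early keeps a typo such as "quick" from silently falling back to a default model.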

ADR-017: Capability Detection

Status: Accepted (v0.8.0)

Context: The execution path needs to know which tools are available on target hosts to select the appropriate pipeline.

Decision: Implement capability detection module (merlya/capabilities/):

class CapabilityDetector:
    async def detect_host(self, host: Host) -> HostCapabilities:
        return HostCapabilities(
            ssh=await self._detect_ssh(host),
            tools=[
                await self._detect_ansible(),
                await self._detect_terraform(),
                await self._detect_kubectl(),
                await self._detect_git(),
            ],
            web_access=await self._detect_web_access(),
        )

Detected Capabilities:

| Tool      | Detection       | Config Validation |
|-----------|-----------------|-------------------|
| Ansible   | which ansible   | Inventory exists  |
| Terraform | which terraform | .tf files present |
| kubectl   | which kubectl   | kubeconfig valid  |
| git       | which git       | .git directory    |

Caching:

  • TTL-based cache (24h default)
  • Invalidated on host changes
  • Per-host capability storage

Consequences:

  • Automatic pipeline selection
  • No manual tool configuration
  • Graceful fallback to BashPipeline
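
The caching rules for detected capabilities (24h TTL, per-host entries, invalidation on host changes) can be sketched as below. CapabilityCache and its injected clock are hypothetical; the real cache in merlya/capabilities/ may be structured differently.

```python
import time
from typing import Callable, Dict, Generic, Optional, Tuple, TypeVar

T = TypeVar("T")


class CapabilityCache(Generic[T]):
    """Per-host cache whose entries expire after ttl_seconds."""

    def __init__(
        self,
        ttl_seconds: float = 24 * 3600,  # 24h default, as stated above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, T]] = {}  # host -> (stored_at, value)

    def get(self, host: str) -> Optional[T]:
        entry = self._entries.get(host)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[host]  # expired: force a fresh detection
            return None
        return value

    def put(self, host: str, value: T) -> None:
        self._entries[host] = (self._clock(), value)

    def invalidate(self, host: str) -> None:
        """Call when a host's configuration changes."""
        self._entries.pop(host, None)
```

A caller would try get(host) first and run detection followed by put(host, result) only on a miss.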

ADR-018: Specialist-Based Agent Architecture (v0.8.3)

Status: Accepted (v0.8.3)

Context: The v0.8.0 Centers architecture (DiagnosticCenter, ChangeCenter) was never called from the REPL. The main MerlyaAgent had all tools registered flat, and the Orchestrator agent was dead code. The result was three conflicting loop-prevention mechanisms (thresholds 25/50/500) and a bloated tool surface.

Decision: Replace Centers and Orchestrator with a single MerlyaAgent that delegates to specialist agents:

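| Specialist           | Purpose                   | HITL |
|----------------------|---------------------------|------|
| DiagnosticSpecialist | Read-only investigation   | No   |
| ExecutionSpecialist  | Mutations (write/restart) | Yes  |
| SecuritySpecialist   | Security audits           | No   |
| QuerySpecialist      | Inventory queries         | No   |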

Architecture:

MerlyaAgent (system prompt: when to delegate)
├── delegate_diagnostic(target, task) → DiagnosticSpecialist (blocked_commands enforced)
├── delegate_execution(target, task)  → ExecutionSpecialist (HITL mandatory)
├── delegate_security(target, task)   → SecuritySpecialist
├── delegate_query(question)          → QuerySpecialist
└── list_hosts / get_host / ask_user  (direct tools)

Rationale:

  • Clear separation of concerns without dead code paths
  • Guards enforced at the specialist level (HITL, blocked commands)
  • Single routing mechanism (agent system prompt)
  • Rationalized limits: DEFAULT_TOOL_RETRIES=3, DEFAULT_TOOL_CALLS_LIMIT=50

Consequences:

  • Centers code removed (~500 lines)
  • Orchestrator dead code removed
  • Simpler debugging: single agent entry point
  • Specialists remain independently testable
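
The idea of enforcing guards at the specialist level can be sketched in plain Python, leaving out the LLM entirely. The class bodies, the GuardError exception, and the blocked-command patterns below are hypothetical; they only illustrate that the read-only specialist refuses mutations and the execution specialist refuses to act without approval.

```python
import re
from typing import Callable

# Hypothetical patterns a read-only specialist might refuse to run.
BLOCKED_IN_DIAGNOSTIC = [r"\brm\b", r"\bsystemctl\s+(restart|stop|start)\b", r"\breboot\b"]


class GuardError(Exception):
    """Raised when a specialist's guard rejects a command."""


class DiagnosticSpecialist:
    def __init__(self, run: Callable[[str], str]) -> None:
        self._run = run
        self._blocked = [re.compile(p) for p in BLOCKED_IN_DIAGNOSTIC]

    def execute(self, command: str) -> str:
        if any(p.search(command) for p in self._blocked):
            raise GuardError(f"blocked in read-only mode: {command}")
        return self._run(command)


class ExecutionSpecialist:
    def __init__(self, run: Callable[[str], str], confirm: Callable[[str], bool]) -> None:
        self._run = run
        self._confirm = confirm  # stands in for the HITL prompt

    def execute(self, command: str) -> str:
        if not self._confirm(command):
            raise GuardError("change rejected by operator")
        return self._run(command)
```

Because each guard lives inside its specialist, a routing mistake by the top-level agent cannot bypass it, and each class can be tested with a fake run callable.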


ADR-019: In-Memory Observability (v0.8.3)

Status: Accepted (v0.8.3)

Context: No visibility into operational metrics during a session.

Decision: Add merlya/core/metrics.py (in-memory metrics) and merlya/core/resilience.py (circuit breaker + retry), with a /metrics slash command.

Metrics tracked:

  • merlya_commands_total - executions by type/status
  • merlya_ssh_duration_seconds - SSH latency histogram
  • merlya_llm_calls_total - LLM API calls by provider/model
  • merlya_pipeline_executions - pipeline runs by type/status
  • merlya_retry_attempts_total - retry counts for observability

Resilience patterns:

  • @circuit_breaker(failure_threshold=5, recovery_timeout=60s) - opens after 5 consecutive failures, auto-recovers after 60s
  • @retry(max_attempts=3, exponential_base=2.0) - exponential backoff retries

Design choices:

  • No external backend (Prometheus/Grafana deferred to V2.0)
  • Thread-safe with threading.Lock (works sync and async)
  • Histogram uses sliding window (max 10k observations) to prevent memory leak

Consequences:

  • Operational visibility via /metrics command
  • Circuit breakers prevent cascading failures on SSH/LLM
  • Zero external dependencies
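
A compact sketch of the circuit-breaker pattern with the thresholds quoted above (5 consecutive failures, 60 second recovery). This is not the code in merlya/core/resilience.py: the CircuitBreaker class, CircuitOpenError, and the injected clock are assumptions, and it is written as a wrapper object, not a decorator.

```python
import threading
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_timeout: float = 60.0,  # seconds
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._threshold = failure_threshold
        self._timeout = recovery_timeout
        self._clock = clock
        self._lock = threading.Lock()
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, func: Callable[[], T]) -> T:
        with self._lock:
            if self._opened_at is not None:
                if self._clock() - self._opened_at < self._timeout:
                    raise CircuitOpenError("circuit open; call rejected")
                # Recovery window elapsed: allow one trial call (half-open).
                self._opened_at = None
                self._failures = self._threshold - 1  # one more failure re-opens
        try:
            result = func()
        except Exception:
            with self._lock:
                self._failures += 1
                if self._failures >= self._threshold:
                    self._opened_at = self._clock()
            raise
        with self._lock:
            self._failures = 0  # any success resets the consecutive count
        return result
```

While the circuit is open the wrapped function is not invoked at all, which is what stops a failing SSH host or LLM provider from being hammered with further calls.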


Future Considerations

Under Evaluation

  • Fingerprint Module - Semantic signature extraction for command approval
  • Knowledge Base - Three-tier knowledge system (general/validated/observed)
  • ElevationManager Refactor - Simplified explicit elevation (no auto-detection)

Implemented (v0.8.3)

  • Kubernetes integration → KubernetesPipeline
  • Terraform integration → TerraformPipeline
  • Ansible integration → AnsiblePipeline
  • Centers architecture → Specialist agent model (ADR-018)
  • No metrics visibility → In-memory observability (ADR-019)

Rejected

  • GUI Application - Focus on CLI/API for automation
  • Custom LLM training - Too resource-intensive for scope
  • Multi-tenant SaaS - Security complexity, out of scope
  • ONNX embeddings - Removed in v0.8.0 for simplicity (see ADR-005)