Architecture Decisions
Key architectural decisions and their rationale.
Overview
Merlya is built on a modular architecture designed for extensibility, security, and ease of use. Version 0.8.3 introduces the specialist agent architecture where a single MerlyaAgent delegates to focused specialist agents for different operation types.
graph TB
    subgraph "User Interface"
        CLI[CLI / REPL]
    end
    subgraph "Parsing Layer"
        SmartExtractor[SmartExtractor<br/>Host Detection]
    end
    subgraph "Agent"
        MerlyaAgent[MerlyaAgent<br/>Orchestration + Delegation]
    end
    subgraph "Specialist Agents"
        DiagSpec[DiagnosticSpecialist<br/>Read-Only]
        ExecSpec[ExecutionSpecialist<br/>Mutations + HITL]
        SecSpec[SecuritySpecialist<br/>Security Audits]
        QuerySpec[QuerySpecialist<br/>Inventory Queries]
    end
    subgraph "Pipelines"
        Ansible[AnsiblePipeline]
        Terraform[TerraformPipeline]
        K8s[KubernetesPipeline]
        Bash[BashPipeline]
    end
    subgraph "Infrastructure"
        SSH[SSH Pool]
        Capabilities[Capability Detector]
    end
    subgraph "LLM Providers"
        Brain[Brain Model<br/>Reasoning]
        Fast[Fast Model<br/>Routing]
    end
    CLI --> SmartExtractor
    SmartExtractor --> MerlyaAgent
    MerlyaAgent -->|delegate_diagnostic| DiagSpec
    MerlyaAgent -->|delegate_execution| ExecSpec
    MerlyaAgent -->|delegate_security| SecSpec
    MerlyaAgent -->|delegate_query| QuerySpec
    DiagSpec --> SSH
    ExecSpec --> Capabilities
    Capabilities --> Ansible
    Capabilities --> Terraform
    Capabilities --> K8s
    Capabilities --> Bash
    MerlyaAgent --> Brain
    DiagSpec --> Brain
    ExecSpec --> Brain
ADR-001: PydanticAI for Agent Framework
Status: Accepted
Context: We needed a framework for building LLM-powered agents with tool calling capabilities.
Decision: Use PydanticAI as the agent framework.
Rationale:
- Type-safe with Pydantic models
- Native async support
- Clean tool definition API
- Multi-provider support
- Active development and community
Consequences:
- Dependency on PydanticAI
- Benefits from upstream improvements
- May need to adapt to API changes
ADR-002: AsyncSSH for SSH Connections
Status: Accepted
Context: SSH connectivity is core to Merlya's functionality.
Decision: Use asyncssh for all SSH operations.
Rationale:
- Pure Python, no external dependencies
- Async-native for concurrent connections
- Full SSH2 protocol support
- Jump host support built-in
- Active maintenance
Consequences:
- Async-only API
- Connection pooling needed for performance
- Memory overhead for many connections
ADR-003: Connection Pooling
Status: Accepted
Context: Frequent SSH connections are slow and resource-intensive.
Decision: Implement connection pooling with automatic cleanup.
Rationale:
- Avoid connection overhead for repeated commands
- Limit concurrent connections
- Automatic cleanup of idle connections
- Graceful shutdown handling
Implementation:
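# Interface outline; method bodies omitted
class SSHPool:
    max_connections: int = 10
    idle_timeout: int = 300  # seconds (5 minutes)

    async def get_connection(self, host: str) -> "SSHConnection": ...
    async def release_connection(self, conn: "SSHConnection") -> None: ...
    async def cleanup_idle(self) -> None: ...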
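The pooling behaviour described in the rationale can be sketched roughly as follows. This is a minimal illustration, not Merlya's implementation: the `connect` callable stands in for a real asyncssh connection factory, and `PooledConnection`, the one-idle-connection-per-host policy, and the return value of `cleanup_idle()` are assumptions made for the example.

```python
import asyncio
import time
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable


@dataclass
class PooledConnection:
    host: str
    raw: Any  # underlying connection object (hypothetical; e.g. an SSH connection)
    last_used: float = field(default_factory=time.monotonic)


class SSHPool:
    def __init__(
        self,
        connect: Callable[[str], Awaitable[Any]],
        max_connections: int = 10,
        idle_timeout: float = 300.0,
    ) -> None:
        self._connect = connect
        self._idle_timeout = idle_timeout
        self._slots = asyncio.Semaphore(max_connections)  # caps open connections
        self._idle: dict[str, PooledConnection] = {}

    async def get_connection(self, host: str) -> PooledConnection:
        conn = self._idle.pop(host, None)
        if conn is not None:
            return conn  # reuse: skips the connection handshake
        await self._slots.acquire()
        try:
            raw = await self._connect(host)
        except BaseException:
            self._slots.release()  # do not leak the slot when connecting fails
            raise
        return PooledConnection(host, raw)

    async def release_connection(self, conn: PooledConnection) -> None:
        conn.last_used = time.monotonic()
        previous = self._idle.get(conn.host)
        self._idle[conn.host] = conn
        if previous is not None and previous is not conn:
            self._close(previous)  # keep at most one idle connection per host

    async def cleanup_idle(self) -> int:
        """Close idle connections older than idle_timeout; return how many."""
        now = time.monotonic()
        stale = [h for h, c in self._idle.items()
                 if now - c.last_used >= self._idle_timeout]
        for host in stale:
            self._close(self._idle.pop(host))
        return len(stale)

    def _close(self, conn: PooledConnection) -> None:
        close = getattr(conn.raw, "close", None)
        if callable(close):
            close()
        self._slots.release()
```

In this sketch idle connections still occupy a slot, so `cleanup_idle()` would need to run periodically (for example from a background task) for new hosts to obtain a connection once the limit is reached.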
ADR-004: Keyring for Credential Storage
Status: Accepted
Context: API keys and credentials need secure storage.
Decision: Use the system keyring (via keyring library).
Rationale:
- OS-level security (Keychain, Secret Service, Credential Manager)
- No plaintext secrets in config files
- Standard cross-platform solution
- Fallback to in-memory with warning
Consequences:
- Requires keyring backend on Linux
- May need user interaction for first-time setup
- Headless servers need alternative setup
ADR-005: Local Intent Classification
Status: Superseded by ADR-013 (v0.8.0)
Context: Not all user inputs require LLM processing.
Decision: Use pattern-based local classifier for intent routing with LLM fallback.
Note: This ADR has been superseded by ADR-013 (DIAGNOSTIC/CHANGE Center Architecture), which introduced a more sophisticated routing system built on CenterClassifier (ADR-015). Both were themselves superseded by ADR-018 in v0.8.3.
ADR-006: YAML Configuration with SQLite Storage
Status: Accepted
Context: Configuration needs to be human-readable and editable. Host inventory requires structured storage with querying capabilities.
Decision: Use YAML for configuration files and SQLite for host inventory.
Rationale:
- YAML: Human-friendly syntax, widely used in DevOps
- YAML: Native support for complex structures
- SQLite: Fast querying for host lookups
- SQLite: Supports tagging, filtering, and search
- SQLite: Single file, no server needed
File Locations:
- ~/.merlya/config.yaml - Main configuration
- ~/.merlya/merlya.db - Host inventory (SQLite)
- ~/.merlya/logs/ - Log files
- ~/.merlya/history - Command history
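To illustrate why SQLite suits tagged host lookups, here is a minimal sketch using only the standard library. The schema (tables `hosts` and `host_tags`) and the helper names are hypothetical and do not describe the actual layout of merlya.db.

```python
import sqlite3

# Hypothetical schema: one row per host, one row per (host, tag) pair.
SCHEMA = """
CREATE TABLE IF NOT EXISTS hosts (
    name     TEXT PRIMARY KEY,
    hostname TEXT NOT NULL,
    port     INTEGER NOT NULL DEFAULT 22
);
CREATE TABLE IF NOT EXISTS host_tags (
    host TEXT NOT NULL REFERENCES hosts(name) ON DELETE CASCADE,
    tag  TEXT NOT NULL,
    PRIMARY KEY (host, tag)
);
"""


def open_inventory(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


def add_host(conn, name, hostname, tags=(), port=22):
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO hosts (name, hostname, port) VALUES (?, ?, ?)",
            (name, hostname, port),
        )
        conn.executemany(
            "INSERT INTO host_tags (host, tag) VALUES (?, ?)",
            [(name, tag) for tag in tags],
        )


def hosts_with_tag(conn, tag):
    rows = conn.execute(
        "SELECT h.name FROM hosts h "
        "JOIN host_tags t ON t.host = h.name "
        "WHERE t.tag = ? ORDER BY h.name",
        (tag,),
    )
    return [row[0] for row in rows]
```

Passing a file path instead of `":memory:"` would persist the inventory in a single file, which is the property the rationale relies on.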
Configuration Example:
ADR-007: Plugin Architecture
Status: Proposed
Context: Users need to extend Merlya with custom tools.
Decision: Entry-point based plugin system.
Rationale:
- Standard Python packaging approach
- Easy distribution via PyPI
- Namespace isolation
- Lazy loading
Proposed Interface:
# pyproject.toml
[project.entry-points."merlya.tools"]
my_tool = "my_package:MyTool"

# Implementation
class MyTool(MerlyaTool):
    name = "my_tool"
    description = "Does something useful"

    async def execute(self, params: dict) -> ToolResult:
        ...
ADR-008: Loguru for Logging
Status: Accepted
Context: Debugging and monitoring require good logging with minimal configuration.
Decision: Use loguru for logging.
Rationale:
- Zero-configuration out of the box
- Automatic rotation and retention
- Human-readable colored output for development
- Exception catching with full traceback
- Easy to use API (logger.info(), logger.error())
- Async-safe
Implementation:
from loguru import logger
logger.info("Operation completed successfully")
logger.warning("Connection retry required")
logger.error("SSH connection failed: {error}", error=e)
Log Levels:
- DEBUG: Detailed execution flow
- INFO: Key operations and results
- WARNING: Recoverable issues
- ERROR: Failures requiring attention
ADR-009: Multi-Provider LLM Support
Status: Accepted
Context: Users have different LLM provider preferences and cost constraints.
Decision: Abstract LLM provider interface with multiple implementations. OpenRouter as default for free tier access.
Supported Providers:
- OpenRouter (default) - 100+ models, free tier available
- OpenAI (GPT-4o, GPT-4o-mini)
- Anthropic (Claude 3.5 Sonnet, Haiku)
- Ollama (local models, no API key needed)
Interface:
from typing import Protocol

class LLMProvider(Protocol):
    async def generate(self, prompt: str) -> str: ...
    async def generate_with_tools(self, prompt: str, tools: list) -> ToolCall: ...
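A small sketch of how a structural interface like this lets providers be swapped without inheritance. `ToolCall` is given a hypothetical shape here (the source does not define it), and `EchoProvider` is an offline stand-in invented for the example, not one of the supported providers.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Protocol, runtime_checkable


@dataclass
class ToolCall:
    # Hypothetical shape: the name of the chosen tool and its arguments.
    name: str
    arguments: dict[str, Any] = field(default_factory=dict)


@runtime_checkable
class LLMProvider(Protocol):
    async def generate(self, prompt: str) -> str: ...
    async def generate_with_tools(self, prompt: str, tools: list) -> ToolCall: ...


class EchoProvider:
    """Offline stand-in; it satisfies the protocol without subclassing it."""

    async def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"

    async def generate_with_tools(self, prompt: str, tools: list) -> ToolCall:
        # Always picks the first tool; a real provider would let the model choose.
        return ToolCall(name=tools[0], arguments={"prompt": prompt})


async def ask(provider: LLMProvider, prompt: str) -> str:
    # Callers depend only on the protocol, never on a concrete provider class.
    return await provider.generate(prompt)
```

Note that `isinstance(obj, LLMProvider)` with `runtime_checkable` only checks that the methods exist, not their signatures, so static type checking remains the main safeguard.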
ADR-010: Security Model
Status: Accepted
Context: Merlya executes commands on remote systems.
Decision: Implement defense-in-depth security model.
Layers:
1. Credential Protection - Keyring storage
2. Command Review - User confirmation for destructive commands
3. Input Validation - Pydantic models for all inputs
4. Audit Logging - All commands logged
5. Principle of Least Privilege - Minimal permissions
Dangerous Command Detection:
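One plausible shape for such detection is a regex denylist checked before a command is sent for review. The sketch below is an assumption made for illustration: the pattern list, labels, and function names are hypothetical and are not Merlya's actual rules.

```python
import re
from typing import Optional

# Illustrative patterns only; a real denylist would be broader and configurable.
DANGEROUS_PATTERNS = [
    ("recursive delete", re.compile(r"\brm\s+(?:-\w*[rR]\w*|--recursive)\b")),
    ("filesystem format", re.compile(r"\bmkfs(\.\w+)?\b")),
    ("raw write to device", re.compile(r"\bdd\s+.*\bof=/dev/")),
    ("power state change", re.compile(r"\b(shutdown|reboot|halt|poweroff)\b")),
]


def find_danger(command: str) -> Optional[str]:
    """Return the label of the first matching pattern, or None if none match."""
    for label, pattern in DANGEROUS_PATTERNS:
        if pattern.search(command):
            return label
    return None


def is_dangerous(command: str) -> bool:
    return find_danger(command) is not None
```

A match would then trigger the user-confirmation step from the Command Review layer rather than blocking the command outright.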
ADR-011: Non-Interactive Mode Credential Handling
Status: Accepted (v0.7.8)
Context: When running in non-interactive mode (merlya run --yes), the agent cannot prompt users for credentials. Previously, this led to infinite retry loops when sudo/su commands required passwords.
Decision: Fail-fast with clear error messages when credentials are needed but cannot be obtained.
Rationale:
- Immediate failure prevents wasted API calls and timeouts
- Clear error messages guide users to proper solutions
- Three resolution paths documented: keyring, NOPASSWD, interactive mode
- permanent_failure flag tells the agent to stop retrying
Implementation:
# In request_credentials() and ssh_execute()
if ctx.auto_confirm and missing_credentials:
    return CommandResult(
        success=False,
        message="Cannot obtain credentials in non-interactive mode",
        data={
            "non_interactive": True,
            "permanent_failure": True,  # Signal: do not retry
        },
    )
Solutions for users:
- Store credentials in keyring: merlya secret set sudo:host:password
- Configure NOPASSWD sudo on target hosts
- Run in interactive mode (without --yes)
Consequences:
- Clear failure instead of timeout loops
- Better CI/CD integration (fast failure)
- Requires pre-configuration for automated elevated commands
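On the consuming side, a retry loop could honour the flag along these lines. This is a sketch under assumptions: `CommandResult` is reduced to the three fields visible in the snippet above, and `run_with_retries` is a hypothetical helper, not a function from the Merlya codebase.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class CommandResult:
    success: bool
    message: str = ""
    data: dict[str, Any] = field(default_factory=dict)


def run_with_retries(
    action: Callable[[], CommandResult], max_attempts: int = 3
) -> tuple[CommandResult, int]:
    """Call action until it succeeds, fails permanently, or attempts run out.

    Returns the last result together with the number of attempts made.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        result = action()
        # A permanent failure cannot be fixed by retrying, so stop immediately.
        if result.success or result.data.get("permanent_failure"):
            return result, attempt
    return result, max_attempts
```

With this shape, a missing credential in non-interactive mode costs one attempt instead of exhausting the retry budget.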
ADR-012: ElevationMethod Enum for Host Configuration
Status: Accepted (v0.7.8)
Context: Elevation method was stored as strings with inconsistent validation, causing NULL values and validation errors.
Decision: Use ElevationMethod enum with explicit values and proper NULL handling.
Enum Values:
class ElevationMethod(str, Enum):
    NONE = "none"                    # No elevation
    SUDO = "sudo"                    # sudo (NOPASSWD)
    SUDO_PASSWORD = "sudo_password"  # sudo with password
    DOAS = "doas"                    # doas (NOPASSWD)
    DOAS_PASSWORD = "doas_password"  # doas with password
    SU = "su"                        # su with password
Handling:
- NULL in database → ElevationMethod.NONE
- Invalid strings → ElevationMethod.NONE
- /hosts edit uses enum mapping
- Import (TOML/CSV) maps strings to enum values
Consequences:
- Type-safe elevation configuration
- No more validation errors on NULL
- Consistent behavior across all code paths
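The NULL and invalid-string handling can be expressed as a single conversion function. The sketch below repeats the enum for self-containment; `parse_elevation` is a hypothetical name, and the whitespace and case normalisation is an added assumption rather than documented behaviour.

```python
from enum import Enum
from typing import Optional


class ElevationMethod(str, Enum):
    NONE = "none"
    SUDO = "sudo"
    SUDO_PASSWORD = "sudo_password"
    DOAS = "doas"
    DOAS_PASSWORD = "doas_password"
    SU = "su"


def parse_elevation(value: Optional[str]) -> ElevationMethod:
    """Convert a stored or imported string into an ElevationMethod."""
    if value is None:  # NULL in the database
        return ElevationMethod.NONE
    try:
        # Assumption: tolerate surrounding whitespace and mixed case.
        return ElevationMethod(value.strip().lower())
    except ValueError:  # unknown string
        return ElevationMethod.NONE
```

Routing database reads, /hosts edit, and imports through one function like this is what would give the consistent behaviour listed in the consequences.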
ADR-013: DIAGNOSTIC/CHANGE Center Architecture
Status: Superseded by ADR-018 (v0.8.3)
Context: Merlya needed a clear separation between read-only investigation and state-changing operations to improve safety and provide appropriate guardrails for each type of operation.
Decision: Implement two operational centers with different security models:
| Center | Purpose | Risk Level | HITL Required |
|---|---|---|---|
| DIAGNOSTIC | Read-only investigation | LOW | No |
| CHANGE | Controlled mutations | HIGH | Yes |
Note: The Centers architecture was replaced in v0.8.3 by the specialist agent model (ADR-018). The DiagnosticCenter and ChangeCenter classes no longer exist; their responsibilities are now handled by DiagnosticSpecialist and ExecutionSpecialist respectively.
ADR-014: Pipeline System for CHANGE Operations
Status: Accepted (v0.8.0)
Context: All state-changing operations need consistent validation, preview, approval, and rollback capabilities.
Decision: Implement mandatory pipeline stages for all CHANGE operations: plan, diff, summary, HITL approval, apply, post-check, and rollback if the post-check fails.
Pipeline Types:
| Pipeline | Dry-run Command | Use Case |
|---|---|---|
| AnsiblePipeline | ansible-playbook --check --diff | Configuration, packages, services |
| TerraformPipeline | terraform plan | Cloud infrastructure |
| KubernetesPipeline | kubectl diff | Container orchestration |
| BashPipeline | Preview only | Fallback for simple commands |
Implementation:
from abc import ABC, abstractmethod

class AbstractPipeline(ABC):
    @abstractmethod
    async def plan(self) -> PlanResult: ...
    @abstractmethod
    async def diff(self) -> DiffResult: ...
    @abstractmethod
    async def apply(self) -> ApplyResult: ...
    @abstractmethod
    async def rollback(self) -> RollbackResult: ...

    async def execute(self) -> PipelineResult:
        """Execute full pipeline with HITL."""
        plan = await self.plan()
        diff = await self.diff()
        summary = self._generate_summary(plan, diff)
        # Stop before any mutation if the user rejects the summary
        if not await self._request_hitl(summary):
            return PipelineResult(aborted=True)
        result = await self.apply()
        # Roll back automatically when the post-apply check fails
        post_check = await self._post_check()
        if not post_check.success:
            await self.rollback()
        return result
Consequences:
- Consistent change management across all tools
- Mandatory preview before apply
- Automatic rollback on failure
- Full audit trail
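The control flow can be exercised with a toy in-memory pipeline. The sketch below is a simplified variant written for illustration: it merges plan and diff into one step, takes the HITL prompt as an injected `approve` callback, and reports rollback through a hypothetical `rolled_back` field. None of these choices are taken from the Merlya codebase.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class PipelineResult:
    applied: bool = False
    aborted: bool = False
    rolled_back: bool = False


class AbstractPipeline(ABC):
    def __init__(self, approve: Callable[[str], Awaitable[bool]]) -> None:
        self._approve = approve  # human-in-the-loop callback

    @abstractmethod
    async def plan(self) -> str: ...
    @abstractmethod
    async def apply(self) -> None: ...
    @abstractmethod
    async def post_check(self) -> bool: ...
    @abstractmethod
    async def rollback(self) -> None: ...

    async def execute(self) -> PipelineResult:
        summary = await self.plan()
        if not await self._approve(summary):
            return PipelineResult(aborted=True)  # nothing was changed
        await self.apply()
        if not await self.post_check():
            await self.rollback()
            return PipelineResult(applied=True, rolled_back=True)
        return PipelineResult(applied=True)


class FakePipeline(AbstractPipeline):
    """Records which stages ran; 'healthy' decides the post-check outcome."""

    def __init__(self, approve, healthy: bool) -> None:
        super().__init__(approve)
        self.healthy = healthy
        self.log: list[str] = []

    async def plan(self) -> str:
        self.log.append("plan")
        return "restart nginx on web-01"

    async def apply(self) -> None:
        self.log.append("apply")

    async def post_check(self) -> bool:
        self.log.append("post_check")
        return self.healthy

    async def rollback(self) -> None:
        self.log.append("rollback")
```

Injecting the approval callback keeps the pipeline testable without a terminal, and returning `rolled_back` lets the caller distinguish a clean apply from one that was reverted.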
ADR-015: CenterClassifier for Intent Routing
Status: Superseded by ADR-018 (v0.8.3)
Context: User requests need to be routed to the appropriate center (DIAGNOSTIC or CHANGE) based on intent.
Decision: Use a hybrid pattern-matching + LLM classifier:
- Pattern Matching (Fast Path): Regex patterns for clear intents
- LLM Fallback: Fast model for ambiguous cases
- Clarification: User prompt when confidence < 0.7
Note: The CenterClassifier was removed in v0.8.3. Routing is now handled by the MerlyaAgent system prompt, which instructs the agent directly on which specialist to delegate to based on request semantics. See ADR-018.
ADR-016: Brain/Fast Model Configuration
Status: Accepted (v0.8.0)
Context: Different tasks require different model capabilities and cost trade-offs.
Decision: Two model roles with separate configuration:
| Role | Purpose | Example Models |
|---|---|---|
| brain | Complex reasoning, planning, analysis | Claude Sonnet, GPT-4o |
| fast | Routing, fingerprinting, quick decisions | Claude Haiku, GPT-4o-mini |
Configuration:
CLI Commands:
/model brain claude-sonnet-4 # Set brain model
/model fast claude-haiku # Set fast model
/model show # Show current config
Usage in Code:
# Fast model for quick decisions
model = ctx.config.get_model("fast")
# Brain model for reasoning and agent loops
model = ctx.config.get_model("brain")
Consequences:
- Cost optimization (fast model for simple tasks)
- Better performance (quick routing)
- Flexibility (different models per role)
ADR-017: Capability Detection
Status: Accepted (v0.8.0)
Context: The execution path needs to know which tools are available on target hosts to select the appropriate pipeline.
Decision: Implement capability detection module (merlya/capabilities/):
class CapabilityDetector:
    async def detect_host(self, host: Host) -> HostCapabilities:
        return HostCapabilities(
            ssh=await self._detect_ssh(host),
            tools=[
                await self._detect_ansible(),
                await self._detect_terraform(),
                await self._detect_kubectl(),
                await self._detect_git(),
            ],
            web_access=await self._detect_web_access(),
        )
Detected Capabilities:
| Tool | Detection | Config Validation |
|---|---|---|
| Ansible | which ansible | Inventory exists |
| Terraform | which terraform | .tf files present |
| kubectl | which kubectl | kubeconfig valid |
| git | which git | .git directory |
Caching:
- TTL-based cache (24h default)
- Invalidated on host changes
- Per-host capability storage
Consequences:
- Automatic pipeline selection
- No manual tool configuration
- Graceful fallback to BashPipeline
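The caching and fallback behaviour might be sketched as follows. `CapabilityCache`, `REQUIRED_TOOL`, and `select_pipeline` are hypothetical names, and representing pipelines as strings and capabilities as a set of tool names is a simplification for the example.

```python
import time
from typing import Callable, Iterable, Optional


class CapabilityCache:
    """Per-host cache of detected tool names with a time-to-live."""

    def __init__(
        self,
        ttl_seconds: float = 24 * 3600,  # 24h default, as stated above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, frozenset]] = {}

    def put(self, host: str, tools: Iterable[str]) -> None:
        self._entries[host] = (self._clock(), frozenset(tools))

    def get(self, host: str) -> Optional[frozenset]:
        entry = self._entries.get(host)
        if entry is None:
            return None
        stored_at, tools = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[host]  # expired: force a fresh detection
            return None
        return tools

    def invalidate(self, host: str) -> None:
        """Drop a host's entry, e.g. after the host definition changes."""
        self._entries.pop(host, None)


# Which tool each pipeline needs (names taken from the tables above).
REQUIRED_TOOL = {
    "AnsiblePipeline": "ansible",
    "TerraformPipeline": "terraform",
    "KubernetesPipeline": "kubectl",
}


def select_pipeline(preferred: str, tools: Iterable[str]) -> str:
    """Return the preferred pipeline if its tool is present, else BashPipeline."""
    required = REQUIRED_TOOL.get(preferred)
    if required is not None and required in set(tools):
        return preferred
    return "BashPipeline"
```

The injectable `clock` is there so expiry can be tested without waiting; production code would use the default.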
ADR-018: Specialist-Based Agent Architecture (v0.8.3)
Status: Accepted (v0.8.3)
Context: The v0.8.0 Centers architecture (DiagnosticCenter, ChangeCenter) was never called from the REPL. The main MerlyaAgent had all tools registered flat, and the Orchestrator agent was dead code. This created 3 conflicting loop-prevention mechanisms (thresholds 25/50/500) and a bloated tool surface.
Decision: Replace Centers and Orchestrator with a single MerlyaAgent that delegates to specialist agents:
| Specialist | Purpose | HITL |
|---|---|---|
| DiagnosticSpecialist | Read-only investigation | No |
| ExecutionSpecialist | Mutations (write/restart) | Yes |
| SecuritySpecialist | Security audits | No |
| QuerySpecialist | Inventory queries | No |
Architecture:
MerlyaAgent (system prompt: when to delegate)
├── delegate_diagnostic(target, task) → DiagnosticSpecialist (blocked_commands enforced)
├── delegate_execution(target, task) → ExecutionSpecialist (HITL mandatory)
├── delegate_security(target, task) → SecuritySpecialist
├── delegate_query(question) → QuerySpecialist
└── list_hosts / get_host / ask_user (direct tools)
Rationale:
- Clear separation of concerns without dead code paths
- Guards enforced at the specialist level (HITL, blocked commands)
- Single routing mechanism (agent system prompt)
- Rationalized limits: DEFAULT_TOOL_RETRIES=3, DEFAULT_TOOL_CALLS_LIMIT=50
Consequences:
- Centers code removed (~500 lines)
- Orchestrator dead code removed
- Simpler debugging: single agent entry point
- Specialists remain independently testable
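The idea of enforcing guards inside each specialist, with one shared call budget on the agent, can be sketched in plain Python. This is a deliberately simplified model: real specialists are LLM agents that receive a natural-language task, whereas here they receive a shell command directly, and the blocked-command list and return strings are invented for the example.

```python
from typing import Callable

DEFAULT_TOOL_CALLS_LIMIT = 50  # value quoted in the rationale above
BLOCKED_IN_DIAGNOSTIC = ("rm ", "reboot", "systemctl restart")  # illustrative only


class DiagnosticSpecialist:
    def run(self, target: str, command: str) -> str:
        # Guard lives in the specialist: read-only mode rejects mutations.
        if any(blocked in command for blocked in BLOCKED_IN_DIAGNOSTIC):
            raise PermissionError(f"blocked in read-only mode: {command}")
        return f"[{target}] inspected: {command}"


class ExecutionSpecialist:
    def __init__(self, confirm: Callable[[str], bool]) -> None:
        self._confirm = confirm  # human-in-the-loop callback

    def run(self, target: str, command: str) -> str:
        # Guard lives in the specialist: nothing runs without approval.
        if not self._confirm(f"Run '{command}' on {target}?"):
            return "aborted by user"
        return f"[{target}] applied: {command}"


class MerlyaAgent:
    def __init__(
        self,
        confirm: Callable[[str], bool],
        tool_calls_limit: int = DEFAULT_TOOL_CALLS_LIMIT,
    ) -> None:
        self._diagnostic = DiagnosticSpecialist()
        self._execution = ExecutionSpecialist(confirm)
        self._calls_left = tool_calls_limit  # single loop-prevention mechanism

    def _spend_call(self) -> None:
        if self._calls_left <= 0:
            raise RuntimeError("tool call limit reached")
        self._calls_left -= 1

    def delegate_diagnostic(self, target: str, command: str) -> str:
        self._spend_call()
        return self._diagnostic.run(target, command)

    def delegate_execution(self, target: str, command: str) -> str:
        self._spend_call()
        return self._execution.run(target, command)
```

Because each specialist owns its guard, it can be tested in isolation, which mirrors the "independently testable" consequence.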
ADR-019: In-Memory Observability (v0.8.3)
Status: Accepted (v0.8.3)
Context: No visibility into operational metrics during a session.
Decision: Add merlya/core/metrics.py (in-memory metrics) and merlya/core/resilience.py (circuit breaker + retry), with a /metrics slash command.
Metrics tracked:
- merlya_commands_total: executions by type/status
- merlya_ssh_duration_seconds: SSH latency histogram
- merlya_llm_calls_total: LLM API calls by provider/model
- merlya_pipeline_executions: pipeline runs by type/status
- merlya_retry_attempts_total: retry counts for observability
Resilience patterns:
- @circuit_breaker(failure_threshold=5, recovery_timeout=60s): opens after 5 consecutive failures, auto-recovers after 60s
- @retry(max_attempts=3, exponential_base=2.0): exponential backoff retries
Design choices:
- No external backend (Prometheus/Grafana deferred to V2.0)
- Thread-safe with threading.Lock (works sync and async)
- Histogram uses sliding window (max 10k observations) to prevent memory leak
Consequences:
- Operational visibility via /metrics command
- Circuit breakers prevent cascading failures on SSH/LLM
- Zero external dependencies
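A circuit breaker with these two parameters can be sketched as a decorator. The code below is an illustration for synchronous callables only, not the contents of merlya/core/resilience.py; `CircuitOpenError`, the injectable `clock`, and the half-open rule (one trial call after the timeout) are assumptions.

```python
import threading
import time
from functools import wraps
from typing import Callable


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the wrapped function while the circuit is open."""


def circuit_breaker(
    failure_threshold: int = 5,
    recovery_timeout: float = 60.0,
    clock: Callable[[], float] = time.monotonic,
):
    def decorator(func):
        lock = threading.Lock()  # protects the two state variables below
        failures = 0
        opened_at = None  # time the circuit opened, or None while closed

        @wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal failures, opened_at
            with lock:
                if opened_at is not None:
                    if clock() - opened_at < recovery_timeout:
                        raise CircuitOpenError(f"{func.__name__}: circuit open")
                    # Half-open: allow a trial call; one more failure reopens.
                    opened_at = None
                    failures = failure_threshold - 1
            try:
                result = func(*args, **kwargs)
            except Exception:
                with lock:
                    failures += 1
                    if failures >= failure_threshold:
                        opened_at = clock()
                raise
            with lock:
                failures = 0  # any success resets the consecutive-failure count
            return result

        return wrapper

    return decorator
```

The lock is held only around state updates, never around the wrapped call itself, so a slow SSH or LLM request does not block other threads from checking the circuit.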
Future Considerations
Under Evaluation
- Fingerprint Module - Semantic signature extraction for command approval
- Knowledge Base - Three-tier knowledge system (general/validated/observed)
- ElevationManager Refactor - Simplified explicit elevation (no auto-detection)
Implemented (v0.8.3)
- Kubernetes integration → KubernetesPipeline
- Terraform integration → TerraformPipeline
- Ansible integration → AnsiblePipeline
- Centers architecture → Specialist agent model (ADR-018)
- No metrics visibility → In-memory observability (ADR-019)
Rejected
- GUI Application - Focus on CLI/API for automation
- Custom LLM training - Too resource-intensive for scope
- Multi-tenant SaaS - Security complexity, out of scope
- ONNX embeddings - Removed in v0.8.0 for simplicity (see ADR-005)