Architecture At A Glance
- Borealis.ps1 is the starting point for every aspect of Borealis. It bootstraps dependencies, configures bundled Python virtual environments, and deploys the agents and server from a single script.
- Bundled assets live under Data/Agent, Data/Server, and Dependencies. Launching an agent or server copies the necessary data from these Data/ directories into sibling Agent/ and Server/ directories at runtime so the development tree stays clean and the runtime stays portable.
- The server stack spans NodeJS + Vite for live development and Python Flask (Data/Server/server.py) for the production frontend (when not using the Vite dev server) and for API endpoints to the Borealis Server backend. The script_engines.py helper exposes a PowerShell runner for potential server-side orchestration, but no current Flask route invokes it; agent-side script execution lives under the roles in Data/Agent.
- Agents run inside the packaged Python venv (Data/Agent mirrored to Agent/). agent.py handles the primary connection and hot-loads roles from Data/Agent/Roles at agent startup.
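The Data/ to runtime-directory mirroring described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the logic in Borealis.ps1: the function name stage_runtime is hypothetical, and the real bootstrap may copy a subset of files or do additional setup.

```python
import shutil
from pathlib import Path


def stage_runtime(project_root: str, component: str) -> Path:
    """Copy Data/<component> into the sibling <component>/ runtime directory.

    Hypothetical helper: only the Data/Agent -> Agent/ and
    Data/Server -> Server/ layout is taken from the text above.
    """
    if component not in ("Agent", "Server"):
        raise ValueError(f"unknown component: {component!r}")
    root = Path(project_root)
    source = root / "Data" / component
    target = root / component
    # dirs_exist_ok (Python 3.8+) lets repeated launches refresh the copy
    # without deleting files the runtime created on earlier runs.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

Because the copy only ever flows from Data/ into the runtime directory, the development tree is never modified by a launch.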
Logging Policy (Centralized, Rotated)
- Log Locations
  - Agent: <ProjectRoot>/Logs/Agent/<service>.log
  - Server: <ProjectRoot>/Logs/Server/<service>.log
- General-Purpose Logs
  - Agent: agent.log
  - Server: server.log
- Dedicated Logs
  - Subsystems with significant surface area must use their own <service>.log
    - Examples: ansible.log, webrtc.log, scheduler.log
- Installation / Bootstrap Logs
  - Agent install: Logs/Agent/install.log
  - Server install: Logs/Server/install.log
- Rotation Policy
  - All log writers must rotate daily.
  - On day rollover, rename: <service>.log → <service>.log.YYYY-MM-DD
  - Append only to the current day's log.
  - Do not auto-delete rotated logs.
- Restrictions
  - Logs must only be written under the project root.
  - Never write logs to:
    - ProgramData
    - AppData
    - User profiles
    - System temp directories
  - No alternative log fan-out (e.g., per-component folders) unless explicitly coordinated. Prefer single log files per service.
- Convergence
  - This policy applies to all new contributions.
  - When modifying existing code, migrate ad-hoc logging into this pattern.
- Troubleshooting Issues via Logs with the Operator
  - When troubleshooting issues with the operator, add extensive logging to the feature you are troubleshooting.
  - If the operator reports that the issue is resolved, ask whether they want the extra logging removed or kept.
  - When troubleshooting, logs will have the -- structure to every line of the logs.
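A minimal sketch of a writer that follows the location and rotation rules above is shown here. It is an illustration under stated assumptions, not the project's logging helper: the names service_log_path and write_log are hypothetical, the bracketed timestamp prefix is a placeholder line format, and concurrent writers from multiple processes are not handled.

```python
import datetime as dt
from pathlib import Path
from typing import Optional


def service_log_path(project_root: str, scope: str, service: str) -> Path:
    """Return <ProjectRoot>/Logs/<scope>/<service>.log (scope: Agent or Server)."""
    if scope not in ("Agent", "Server"):
        raise ValueError(f"unknown log scope: {scope!r}")
    return Path(project_root) / "Logs" / scope / f"{service}.log"


def write_log(project_root: str, scope: str, service: str, message: str,
              now: Optional[dt.datetime] = None) -> Path:
    """Append one line to today's log, rotating yesterday's file first."""
    now = now or dt.datetime.now()
    path = service_log_path(project_root, scope, service)
    path.parent.mkdir(parents=True, exist_ok=True)

    if path.exists():
        last_day = dt.date.fromtimestamp(path.stat().st_mtime)
        if last_day < now.date():
            # Day rollover: <service>.log -> <service>.log.YYYY-MM-DD
            rotated = path.with_name(f"{path.name}.{last_day.isoformat()}")
            if rotated.exists():
                # Never overwrite a rotated log; merge into it instead.
                with rotated.open("a", encoding="utf-8") as dst:
                    dst.write(path.read_text(encoding="utf-8"))
                path.unlink()
            else:
                path.rename(rotated)

    # Placeholder line format (assumption): "[timestamp] message".
    with path.open("a", encoding="utf-8") as fh:
        fh.write(f"[{now:%Y-%m-%d %H:%M:%S}] {message}\n")
    return path
```

Rotated files are only ever renamed or appended to, never deleted, which matches the "do not auto-delete" rule. The standard library's logging.handlers.TimedRotatingFileHandler with when="midnight" and backupCount=0 produces the same date-suffixed names and may be a simpler fit where the logging module is already in use.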
Dependencies & Packaging
Dependencies/ holds the installers/download payloads Borealis bootstraps on first launch: Python, 7-Zip, AutoHotkey, and NodeJS. Versions are hard-pinned in Borealis.ps1; upgrading any runtime requires updating those version constants before repackaging. Nothing self-updates, so Codex should coordinate dependency bumps carefully and test both server and agent bootstrap paths.
Security Breakdowns
The process agents go through when authenticating securely with a Borealis server can be a little complex, so I have included a few sequence diagrams below, along with a summary of the current security posture of Borealis, so you can see what is going on behind the scenes.
Security Overview
Overall
- Borealis enforces mutual trust: each agent presents a unique Ed25519 identity to the server, the server issues EdDSA-signed (Ed25519) access tokens bound to that fingerprint, and both sides pin the generated Borealis root CA.
- End-to-end TLS everywhere: the server ships an ECDSA P-384 root + leaf chain and only serves TLS 1.3; agents require TLS 1.2+ and pin the delivered bundle (storing the server certificate for future verification) for both REST and WebSocket traffic, closing off man-in-the-middle avenues.
- Device enrollment is gated by enrollment/installer codes (with configurable expiration and usage limits) and an operator approval queue; replay-resistant nonces plus rate limits (40 req/min/IP, 12 req/min/fingerprint) prevent brute force or code reuse.
- All device APIs now require Authorization: Bearer headers and a service-context (e.g. SYSTEM or CURRENTUSER) marker; missing, expired, mismatched, or revoked credentials are rejected before any business logic runs. Operator-driven revoking / device quarantining logic is not yet implemented.
- Replay and credential theft defenses layer in DPoP proof validation (thumbprint binding) on the server side and short-lived access tokens (15 min) with 30-day refresh tokens hashed via SHA-256.
- Centralized logging under Logs/Server and Logs/Agent captures enrollment approvals, rate-limit hits, signature failures, and auth anomalies for post-incident review.
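The enrollment rate limits quoted above (40 requests per minute per IP, 12 per minute per fingerprint) could be enforced with a sliding-window counter. The sketch below is one possible approach, not the server's actual limiter: the class and function names are hypothetical, and only the two limits are taken from the text.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Allow at most `limit` hits per key within the last `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


# Limits from the overview above; the instances themselves are illustrative.
ip_limiter = SlidingWindowLimiter(40)
fingerprint_limiter = SlidingWindowLimiter(12)


def enrollment_allowed(ip: str, fingerprint: str,
                       now: Optional[float] = None) -> bool:
    """Both limits must pass. The IP check runs first, so an IP-limited
    request is not counted against the fingerprint budget."""
    return ip_limiter.allow(ip, now) and fingerprint_limiter.allow(fingerprint, now)
```

State here lives in process memory, so it resets on restart and is not shared between workers; a production limiter would need to account for that.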
Server Security
- Auto-manages PKI: a persistent Borealis root CA (ECDSA SECP384R1) signs leaf certificates that include localhost SANs; the server also applies tightened filesystem permissions and produces a combined bundle for agent identity / cert pinning.
- Script delivery is code-signed with an Ed25519 key stored under Certificates/Server/Code-Signing; agents refuse any payload whose signature or hash does not match the pinned public key.
- Device authentication checks GUID normalization, SSL fingerprint matches, token version counters, and quarantine flags before admitting requests; missing rows with valid tokens auto-recover into placeholder records to avoid accidental lockouts.
- Refresh tokens are never stored in cleartext: only SHA-256 hashes plus DPoP bindings land in SQLite, and reuse after revocation/expiry returns explicit error codes.
- Enrollment workflow queues approvals, detects hostname/fingerprint conflicts, offers merge/overwrite options, and records auditor identities so trust decisions are traceable.
- Background jobs prune expired enrollment codes and refresh tokens, keeping the attack surface small without silently deleting active credentials.
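To illustrate the hashed refresh-token storage described above, here is a minimal sketch using only the standard library. The table layout, function names, and error-code strings are assumptions made for the example, and the DPoP binding is omitted; only the SHA-256 hashing, SQLite storage, and 30-day lifetime come from the text.

```python
import hashlib
import secrets
import sqlite3
import time
from typing import Optional

THIRTY_DAYS = 30 * 24 * 3600


def hash_token(token: str) -> str:
    """SHA-256 hex digest; only this value is ever written to the database."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table layout for the example.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS refresh_tokens ("
        " token_hash TEXT PRIMARY KEY,"
        " device_guid TEXT NOT NULL,"
        " expires_at REAL NOT NULL,"
        " revoked INTEGER NOT NULL DEFAULT 0)"
    )


def issue_refresh_token(conn: sqlite3.Connection, device_guid: str,
                        now: Optional[float] = None) -> str:
    """Return the cleartext token to the caller; store only its hash."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)
    conn.execute(
        "INSERT INTO refresh_tokens (token_hash, device_guid, expires_at)"
        " VALUES (?, ?, ?)",
        (hash_token(token), device_guid, now + THIRTY_DAYS),
    )
    return token


def check_refresh_token(conn: sqlite3.Connection, token: str,
                        now: Optional[float] = None) -> str:
    """Return 'ok' or an explicit (illustrative) error code."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT expires_at, revoked FROM refresh_tokens WHERE token_hash = ?",
        (hash_token(token),),
    ).fetchone()
    if row is None:
        return "unknown_token"
    expires_at, revoked = row
    if revoked:
        return "token_revoked"
    if now >= expires_at:
        return "token_expired"
    return "ok"
```

Because the lookup key is the hash, a leaked database exposes no usable refresh tokens, and the server never needs the cleartext value after issuing it.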
Agent Security
- Generates device-wide Ed25519 key pairs on first launch, storing them under Certificates/Agent/Identity/ with DPAPI protection on Windows (chmod 600 elsewhere) and persisting the server-issued GUID alongside.
- Stores refresh/access tokens encrypted (DPAPI) with companion metadata that pins them to the expected server certificate fingerprint; mismatches or refresh failures trigger a clean re-enrollment.
- Imports the server’s TLS bundle into a dedicated ssl.SSLContext, reuses it for the REST session, and injects it into the Socket.IO engine so WebSockets enjoy the same pinning and hostname checks.
- Treats every script payload as hostile until verified: only Ed25519 signatures from the server are accepted, missing/invalid signatures are logged and dropped, and the trusted signing key is updated only after successful verification between the agent and the server.
- Operates outbound-only; there are no listener ports, and every API/WebSocket call flows through AgentHttpClient.ensure_authenticated, forcing token refresh logic before retrying.
- Logs bootstrap, enrollment, token refresh, and signature events to daily-rotated files under Logs/Agent, giving operators visibility without leaking secrets outside the project root.
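The dedicated ssl.SSLContext mentioned above could be built roughly as follows. This is a sketch under assumptions rather than the agent's actual code: the function name build_pinned_context and its bundle_path parameter are hypothetical, and how the context is handed to the REST session and the Socket.IO engine is left out.

```python
import ssl
from typing import Optional


def build_pinned_context(bundle_path: Optional[str] = None) -> ssl.SSLContext:
    """Create a client context that trusts only the delivered server bundle.

    PROTOCOL_TLS_CLIENT enables hostname checking and CERT_REQUIRED by
    default and does not load the system trust store, so the bundle is the
    only trust anchor. With bundle_path=None the trust store stays empty
    and every handshake will fail verification.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Agents require TLS 1.2 or newer, per the overview above.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if bundle_path is not None:
        # bundle_path is the PEM bundle delivered by the server (assumed path).
        ctx.load_verify_locations(cafile=bundle_path)
    return ctx
```

Building the context once and reusing it for every connection keeps REST and WebSocket traffic under the same trust decisions.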
Execution Contexts
The agent runs in the interactive user session. SYSTEM-level script execution is provided by the ScriptExec SYSTEM role using ephemeral scheduled tasks; no separate supervisor or watchdog is required.
Roles & Extensibility
- Roles live under Data/Agent/Roles/ and are auto-discovered at startup; no changes are needed in agent.py when adding new roles.
- Naming convention: role_<Purpose>.py per role.
- Role interface (per module):
  - ROLE_NAME: canonical role name used by config (e.g., screenshot, script_exec_system).
  - ROLE_CONTEXTS: list of contexts this role runs in (interactive, system).
  - class Role(ctx): optional hooks the agent loader will call:
    - register_events(): bind any Socket.IO listeners.
    - on_config(roles: List[dict]): start/stop per-role tasks based on server config.
    - stop_all(): cancel tasks and cleanup.
- Standard roles currently shipped:
  - role_DeviceInventory.py: collects and periodically posts device inventory/summary.
  - role_Screenshot.py: region overlay + periodic capture with WebSocket updates.
  - role_ScriptExec_CURRENTUSER.py: runs PowerShell in the logged-in session and provides the tray icon (restart/quit).
  - role_ScriptExec_SYSTEM.py: runs PowerShell as SYSTEM via ephemeral Scheduled Tasks.
  - role_Macro.py: macro and key/text send helpers.
- Considerations:
  - SYSTEM role requires administrative rights to create/run scheduled tasks as SYSTEM. If elevation is unavailable or policies restrict task creation, SYSTEM jobs will fail gracefully and report errors to the server.
  - Roles are "hot-loaded" on startup only (no dynamic import while running).
  - Roles must avoid blocking the main event loop and be resilient to restarts.
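The role interface and startup discovery described above can be sketched as follows. The example role and the discover_roles loader are hypothetical illustrations: only ROLE_NAME, ROLE_CONTEXTS, the Role(ctx) class, its three hooks, and the role_<Purpose>.py naming come from the text, and the real loader in agent.py may work differently.

```python
import importlib.util
from pathlib import Path
from typing import Dict

# Source of a minimal, made-up role module (would live at role_Example.py).
EXAMPLE_ROLE_SOURCE = '''
ROLE_NAME = "example"
ROLE_CONTEXTS = ["interactive"]


class Role:
    def __init__(self, ctx):
        self.ctx = ctx
        self.running = False

    def register_events(self):
        pass  # bind Socket.IO listeners here

    def on_config(self, roles):
        # Start only if the server config mentions this role.
        self.running = any(r.get("role") == ROLE_NAME for r in roles)

    def stop_all(self):
        self.running = False
'''


def discover_roles(roles_dir: str, context: str) -> Dict[str, type]:
    """Import every role_*.py once and keep roles valid for `context`."""
    found: Dict[str, type] = {}
    for path in sorted(Path(roles_dir).glob("role_*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        name = getattr(module, "ROLE_NAME", None)
        contexts = getattr(module, "ROLE_CONTEXTS", [])
        if name and context in contexts and hasattr(module, "Role"):
            found[name] = module.Role
    return found
```

Because discovery happens once at startup, adding a role is a matter of dropping a new role_<Purpose>.py file into the roles directory and restarting the agent. The "role" key used inside on_config is an assumption about the config shape.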
Platform Parity
Windows is the reference environment today. Borealis.ps1 owns the full deployment story, while Borealis.sh lags significantly and lacks the same packaging logic. Linux support needs feature parity (virtual environments, supervisor equivalents, and role loading) before macOS work resumes.
Ansible Support (Unfinished — Do Not Use)
Important: The Ansible integration is not production‑ready. Do not rely on it for jobs, quick jobs, or troubleshooting. The current implementation is a work‑in‑progress and will change.
- Status
  - Agent and server contain early scaffolding for running playbooks and posting recap-style output, but behavior is not reliable across Windows hosts.
  - Expect playbooks to stall, fail silently, or never deliver recaps/cancel events. Cancellation controls and live output are not guaranteed to function.
  - Packaging of Ansible dependencies and Windows collections is incomplete. Connection modes (local/PSRP/WinRM) are not fully exposed or managed.
- Known blockers (Windows)
  - ansible.windows.* modules require remoting (PSRP/WinRM) and typically cannot run with connection: local on the controller.
  - The SYSTEM service context is a poor fit for loopback remoting without explicit credentials/policy; this leads to no-ops and "forever running" jobs.
  - Collection availability (e.g., ansible.windows) and interpreter/paths vary and are not yet normalized across agent installs.
- Near-term guidance
  - Assume all Ansible and playbook-related features are disabled for operational purposes.
  - Do not file bug reports for Ansible behavior; it is intentionally unfinished and unsupported at this time.
- Future direction (not started)
  - Database-fed credential management (per device/site/global), stored securely and surfaced to playbook runs.
  - First-class selection of connection types (local | PSRP | WinRM) from the UI and scheduler, with per-run credential binding.
  - Reliable live output and cancel semantics; hardened recap ingestion and history.
  - Verified packaging of required Ansible components and Windows collections inside the agent venv.