
## Architecture At A Glance
- `Borealis.ps1` is the starting point for every aspect of Borealis. It bootstraps dependencies, configures bundled Python virtual environments, and deploys the agents and server from a single script.
- Bundled assets live under `Data/Agent`, `Data/Server`, and `Dependencies`. Launching an agent or server copies the necessary data from these `Data/` directories into sibling `Agent/` and `Server/` directories at runtime so the development tree stays clean and the runtime stays portable.
- The server stack spans NodeJS + Vite for live development and Python Flask (`Data/Server/server.py`), which serves the production frontend (when the Vite dev server is not in use) and the API endpoints of the Borealis Server backend.
- The `script_engines.py` helper exposes a PowerShell runner for potential server-side orchestration, but no current Flask route invokes it; agent-side script execution lives under the roles in `Data/Agent`.
- Agents run inside the packaged Python venv (`Data/Agent` mirrored to `Agent/`). `agent.py` handles the primary connection and hot-loads roles from `Data/Agent/Roles` at agent startup.
## Logging Policy (Centralized, Rotated)
- **Log Locations**
  - Agent: `<ProjectRoot>/Logs/Agent/<service>.log`
  - Server: `<ProjectRoot>/Logs/Server/<service>.log`
- **General-Purpose Logs**
  - Agent: `agent.log`
  - Server: `server.log`
- **Dedicated Logs**
  - Subsystems with significant surface area must use their own `<service>.log`.
  - Examples: `ansible.log`, `webrtc.log`, `scheduler.log`
- **Installation / Bootstrap Logs**
  - Agent install: `Logs/Agent/install.log`
  - Server install: `Logs/Server/install.log`
- **Rotation Policy**
  - All log writers must rotate daily.
  - On day rollover, rename:
    - `<service>.log` → `<service>.log.YYYY-MM-DD`
  - Append only to the current day's log.
  - **Do not** auto-delete rotated logs.
- **Restrictions**
  - Logs must **only** be written under the project root.
  - Never write logs to:
    - `ProgramData`
    - `AppData`
    - User profiles
    - System temp directories
  - No alternative log fan-out (e.g., per-component folders) unless explicitly coordinated. Prefer a single log file per service.
- **Convergence**
  - This policy applies to all new contributions.
  - When modifying existing code, migrate ad-hoc logging into this pattern.
- **Troubleshooting Issues via Logs with the Operator**
  - When troubleshooting an issue with the operator, add extensive logging to the feature you are troubleshooting.
  - If the operator reports that the issue is resolved, ask whether they want the extra logging removed or kept.
  - While troubleshooting, every log line follows the `<timestamp>-<service-name>-<log-data>` structure.
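
A log writer that follows this policy could look roughly like the sketch below. It is only an illustration built on Python's standard `logging` module, not the helper Borealis actually ships; the function name `get_service_logger` and the `borealis.<side>.<service>` logger naming are assumptions made for this example.

```python
import logging
from logging.handlers import TimedRotatingFileHandler
from pathlib import Path


def get_service_logger(project_root, side, service):
    """Return a logger writing to <project_root>/Logs/<side>/<service>.log.

    Hypothetical helper: the name and signature are not from the Borealis codebase.
    """
    if side not in ("Agent", "Server"):
        raise ValueError("side must be 'Agent' or 'Server'")

    # Logs stay under the project root, never ProgramData/AppData/temp.
    log_dir = Path(project_root) / "Logs" / side
    log_dir.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(f"borealis.{side}.{service}")
    if logger.handlers:
        # Already configured in this process; avoid duplicate lines.
        return logger

    handler = TimedRotatingFileHandler(
        str(log_dir / f"{service}.log"),
        when="midnight",  # roll over once per day
        backupCount=0,    # 0 means rotated files are never deleted
        encoding="utf-8",
    )
    # With when="midnight" the rotated file is named <service>.log.YYYY-MM-DD.
    # Line layout: <timestamp>-<service-name>-<log-data>
    handler.setFormatter(logging.Formatter(f"%(asctime)s-{service}-%(message)s"))

    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A subsystem with a large surface area would call something like `get_service_logger(project_root, "Server", "scheduler")` to get its own dedicated `scheduler.log`.
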
## Dependencies & Packaging
`Dependencies/` holds the installers/download payloads Borealis bootstraps on first launch: Python, 7-Zip, AutoHotkey, and NodeJS. Versions are hard-pinned in `Borealis.ps1`; upgrading any runtime requires updating those version constants before repackaging. Nothing self-updates, so Codex should coordinate dependency bumps carefully and test both server and agent bootstrap paths.
## Security Breakdowns
The process agents go through to authenticate securely with a Borealis server is somewhat complex, so I have included a few sequence diagrams below, along with a summary of Borealis's current security posture, so you can see how the core systems work behind the scenes.
### Security Overview
#### Overall
- Borealis enforces mutual trust: each agent presents a unique Ed25519 identity to the server, the server issues EdDSA-signed (Ed25519) access tokens bound to that fingerprint, and both sides pin the generated Borealis root CA.
- End-to-end TLS everywhere: the server ships an ECDSA P-384 root + leaf chain and only serves TLS 1.3; agents require TLS 1.2+ and "pin" (store the server certificate for future verification) the delivered bundle for both REST and WebSocket traffic, eliminating man-in-the-middle avenues.
- Device enrollment is gated by enrollment/installer codes (with configurable expiration and usage limits) and an operator approval queue; replay-resistant nonces plus rate limits (40 req/min/IP, 12 req/min/fingerprint) prevent brute force or code reuse.
- All device APIs now require `Authorization: Bearer` headers and a service-context marker (e.g. `SYSTEM` or `CURRENTUSER`); missing, expired, mismatched, or revoked credentials are rejected before any business logic runs. Operator-driven revocation and device quarantine logic is not yet implemented.
- Replay and credential theft defenses layer in DPoP proof validation (thumbprint binding) on the server side and short-lived access tokens (15 min) with 30-day refresh tokens hashed via SHA-256.
- Centralized logging under Logs/Server and Logs/Agent captures enrollment approvals, rate-limit hits, signature failures, and auth anomalies for post-incident review.
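
The enrollment rate limits above (40 req/min/IP and 12 req/min/fingerprint) could be enforced with a sliding-window counter along these lines. This is a minimal in-memory sketch under assumed names (`SlidingWindowLimiter`, `enrollment_allowed`); how the Borealis server actually tracks request rates is not described in this document.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `limit` hits per key within the last `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of accepted hits

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


# Limits quoted in the overview; the limiter instances themselves are hypothetical.
per_ip = SlidingWindowLimiter(limit=40)
per_fingerprint = SlidingWindowLimiter(limit=12)


def enrollment_allowed(ip: str, fingerprint: str, now: Optional[float] = None) -> bool:
    # Short-circuits: a request rejected by the IP limit is not counted
    # against the fingerprint limit.
    return per_ip.allow(ip, now) and per_fingerprint.allow(fingerprint, now)
```

Passing `now` explicitly is only there to make the behaviour easy to test; production callers would omit it.
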
#### Server Security
- Auto-manages PKI: a persistent Borealis root CA (ECDSA SECP384R1) signs leaf certificates that include localhost SANs; the server also applies tightened filesystem permissions and produces a combined bundle for agent identity / cert pinning.
- Script delivery is code-signed with an Ed25519 key stored under Certificates/Server/Code-Signing; agents refuse any payload whose signature or hash does not match the pinned public key.
- Device authentication checks GUID normalization, SSL fingerprint matches, token version counters, and quarantine flags before admitting requests; missing rows with valid tokens auto-recover into placeholder records to avoid accidental lockouts.
- Refresh tokens are never stored in cleartext; only SHA-256 hashes plus DPoP bindings land in SQLite, and reuse after revocation/expiry returns explicit error codes.
- Enrollment workflow queues approvals, detects hostname/fingerprint conflicts, offers merge/overwrite options, and records auditor identities so trust decisions are traceable.
- Background jobs prune expired enrollment codes and refresh tokens, keeping the attack surface small without silently deleting active credentials.
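
The hash-only storage of refresh tokens described above can be pictured with the following sketch. The record layout, the function names, and the status strings (`"ok"`, `"invalid"`, `"revoked"`, `"expired"`) are assumptions for illustration; the real SQLite schema and error codes are not shown in this document, and DPoP binding is left out for brevity.

```python
import hashlib
import hmac
import secrets
from datetime import datetime, timedelta, timezone

REFRESH_TTL = timedelta(days=30)  # 30-day refresh tokens, per the overview


def issue_refresh_token(now=None):
    """Return (cleartext token for the agent, record safe to persist)."""
    now = now or datetime.now(timezone.utc)
    token = secrets.token_urlsafe(32)
    record = {
        # Only the SHA-256 digest is stored, never the token itself.
        "token_sha256": hashlib.sha256(token.encode("utf-8")).hexdigest(),
        "expires_at": now + REFRESH_TTL,
        "revoked": False,
    }
    return token, record


def check_refresh_token(presented, record, now=None):
    """Return a hypothetical status string for a presented token."""
    now = now or datetime.now(timezone.utc)
    digest = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    # Constant-time comparison of the two hex digests.
    if not hmac.compare_digest(digest, record["token_sha256"]):
        return "invalid"
    if record["revoked"]:
        return "revoked"
    if now >= record["expires_at"]:
        return "expired"
    return "ok"
```

Because only the digest is persisted, a leaked database row cannot be replayed as a token; the server can still verify a presented token by hashing it again.
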
#### Agent
- Generates device-wide Ed25519 key pairs on first launch, storing them under `Certificates/Agent/Identity/` with DPAPI protection on Windows (`chmod 600` elsewhere) and persisting the server-issued GUID alongside.
- Stores refresh/access tokens encrypted (DPAPI) with companion metadata that pins them to the expected server certificate fingerprint; mismatches or refresh failures trigger a clean re-enrollment.
- Imports the server's TLS bundle into a dedicated `ssl.SSLContext`, reuses it for the REST session, and injects it into the Socket.IO engine so WebSockets enjoy the same pinning and hostname checks.
- Treats every script payload as hostile until verified: only Ed25519 signatures from the server are accepted, missing/invalid signatures are logged and dropped, and the trusted signing key is updated only after successful verification between the agent and the server.
- Operates outbound-only; there are no listener ports, and every API/WebSocket call flows through AgentHttpClient.ensure_authenticated, forcing token refresh logic before retrying.
- Logs bootstrap, enrollment, token refresh, and signature events to daily-rotated files under Logs/Agent, giving operators visibility without leaking secrets outside the project root.
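
As a rough illustration of the pinning behaviour described above, the sketch below builds a dedicated `ssl.SSLContext` from a delivered bundle and checks stored token metadata against the server certificate fingerprint. The function names and the `server_fingerprint` metadata key are hypothetical; only the standard-library calls are real.

```python
import hashlib
import hmac
import ssl


def build_pinned_context(bundle_path: str) -> ssl.SSLContext:
    """Trust only the CA bundle delivered by the server (path is caller-supplied)."""
    # PROTOCOL_TLS_CLIENT enables certificate and hostname verification by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # agents require TLS 1.2+
    ctx.load_verify_locations(cafile=bundle_path)
    return ctx


def cert_fingerprint(der_bytes: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(der_bytes).hexdigest()


def tokens_still_trusted(token_metadata: dict, server_cert_der: bytes) -> bool:
    """True when stored tokens are pinned to the certificate currently presented."""
    expected = token_metadata.get("server_fingerprint", "").lower().encode("utf-8")
    actual = cert_fingerprint(server_cert_der).encode("utf-8")
    return hmac.compare_digest(expected, actual)
```

In this sketch, a `False` result from `tokens_still_trusted` corresponds to the mismatch case in which the agent discards its tokens and re-enrolls.
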
### Execution Contexts
The agent runs in the interactive user session. SYSTEM-level script execution is provided by the ScriptExec SYSTEM role using ephemeral scheduled tasks; no separate supervisor or watchdog is required.
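
One way to picture an "ephemeral scheduled task" is a create / run / delete sequence against the Windows `schtasks` CLI, as sketched below. This is an assumption about the general shape only: the ScriptExec SYSTEM role may drive Task Scheduler differently, and the helper name, task-name prefix, and PowerShell arguments here are illustrative. The function only builds the command lines; it does not execute them.

```python
import uuid
from typing import List, Optional, Tuple


def system_task_commands(
    script_path: str, task_name: Optional[str] = None
) -> Tuple[List[str], List[str], List[str]]:
    """Build (create, run, delete) schtasks invocations for a one-off SYSTEM task."""
    # Hypothetical naming scheme; a unique name keeps concurrent jobs apart.
    name = task_name or f"Borealis-ScriptExec-{uuid.uuid4().hex}"
    action = f'powershell.exe -NoProfile -ExecutionPolicy Bypass -File "{script_path}"'

    create = [
        "schtasks", "/Create", "/TN", name, "/TR", action,
        "/SC", "ONCE", "/ST", "00:00",  # schedule is a placeholder; the task is started on demand
        "/RU", "SYSTEM", "/RL", "HIGHEST",
        "/F",  # overwrite a leftover task with the same name
    ]
    run = ["schtasks", "/Run", "/TN", name]
    delete = ["schtasks", "/Delete", "/TN", name, "/F"]  # cleanup makes the task ephemeral
    return create, run, delete
```

A caller with administrative rights could pass each list to `subprocess.run(..., check=True)` in order and report any failure back to the server.
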
## Roles & Extensibility
- Roles live under `Data/Agent/Roles/` and are autodiscovered at startup; no changes are needed in `agent.py` when adding new roles.
- Naming convention: `role_<Purpose>.py` per role.
- Role interface (per module):
  - `ROLE_NAME`: canonical role name used by config (e.g., `screenshot`, `script_exec_system`).
  - `ROLE_CONTEXTS`: list of contexts this role runs in (`interactive`, `system`).
  - `class Role(ctx)`: optional hooks the agent loader will call:
    - `register_events()`: bind any Socket.IO listeners.
    - `on_config(roles: List[dict])`: start/stop per-role tasks based on server config.
    - `stop_all()`: cancel tasks and clean up.
- Standard roles currently shipped:
  - `role_DeviceInventory.py`: collects and periodically posts device inventory/summary.
  - `role_Screenshot.py`: region overlay + periodic capture with WebSocket updates.
  - `role_ScriptExec_CURRENTUSER.py`: runs PowerShell in the logged-in session and provides the tray icon (restart/quit).
  - `role_ScriptExec_SYSTEM.py`: runs PowerShell as SYSTEM via ephemeral Scheduled Tasks.
  - `role_Macro.py`: macro and key/text send helpers.
- Considerations:
  - The SYSTEM role requires administrative rights to create/run scheduled tasks as SYSTEM. If elevation is unavailable or policies restrict task creation, SYSTEM jobs will fail gracefully and report errors to the server.
  - Roles are "hot-loaded" on startup only (no dynamic import while running).
  - Roles must avoid blocking the main event loop and be resilient to restarts.
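
A new role module following the interface above might start from a skeleton like this one. It is a sketch, not a shipped role: the file name `role_Example.py`, the role name `example`, and the `role` / `id` keys read from each config dict are assumptions, since the shape of `ctx` and of the server config entries is not specified here.

```python
# role_Example.py (hypothetical file under Data/Agent/Roles/)
from typing import Dict, List

ROLE_NAME = "example"             # canonical name the server config would reference
ROLE_CONTEXTS = ["interactive"]   # contexts in which the loader should start this role


class Role:
    def __init__(self, ctx):
        # `ctx` is whatever the agent loader passes in; its attributes are
        # not assumed here, so it is only stored.
        self.ctx = ctx
        self.tasks: Dict[str, dict] = {}  # task id -> config for running work

    def register_events(self) -> None:
        # A real role would bind its Socket.IO listeners here.
        pass

    def on_config(self, roles: List[dict]) -> None:
        # Keep only entries addressed to this role (keys are assumed names).
        wanted = {r["id"]: r for r in roles if r.get("role") == ROLE_NAME and "id" in r}
        # Stop work that is no longer configured.
        for task_id in list(self.tasks):
            if task_id not in wanted:
                del self.tasks[task_id]
        # Start (or update) work that is configured.
        for task_id, cfg in wanted.items():
            self.tasks[task_id] = cfg

    def stop_all(self) -> None:
        # Cancel everything; must be safe to call repeatedly across restarts.
        self.tasks.clear()
```

In a real role the values in `self.tasks` would be non-blocking background tasks rather than plain config dicts, so that the main event loop is never held up.
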
## Platform Parity
Windows is the reference environment today. `Borealis.ps1` owns the full deployment story, while `Borealis.sh` lags significantly and lacks the same packaging logic. Linux support needs feature parity (virtual environments, supervisor equivalents, and role loading) before macOS work resumes.
## Ansible Support (Unfinished — Do Not Use)
Important: The Ansible integration is not production-ready. Do not rely on it for jobs, quick jobs, or troubleshooting. The current implementation is a work in progress and will change.
- Status
  - Agent and server contain early scaffolding for running playbooks and posting recap-style output, but behavior is not reliable across Windows hosts.
  - Expect playbooks to stall, fail silently, or never deliver recaps/cancel events. Cancellation controls and live output are not guaranteed to function.
  - Packaging of Ansible dependencies and Windows collections is incomplete. Connection modes (local/PSRP/WinRM) are not fully exposed or managed.
- Known blockers (Windows)
  - `ansible.windows.*` modules require remoting (PSRP/WinRM) and typically cannot run with `connection: local` on the controller.
  - The SYSTEM service context is a poor fit for loopback remoting without explicit credentials/policy; this leads to no-ops and "forever running" jobs.
  - Collection availability (e.g., `ansible.windows`) and interpreter/paths vary and are not yet normalized across agent installs.
- Near-term guidance
  - Assume all Ansible and playbook-related features are disabled for operational purposes.
  - Do not file bug reports for Ansible behavior; it is intentionally unfinished and unsupported at this time.
- Future direction (not started)
  - Database-fed credential management (per device/site/global), stored securely and surfaced to playbook runs.
  - First-class selection of connection types (local | PSRP | WinRM) from the UI and scheduler, with per-run credential binding.
  - Reliable live output and cancel semantics; hardened recap ingestion and history.
  - Verified packaging of required Ansible components and Windows collections inside the agent venv.