
## Architecture At A Glance
- `Borealis.ps1` is the starting point for every aspect of Borealis. It bootstraps dependencies, configures the bundled Python virtual environments, and deploys the agents and server from a single script.
- Bundled assets live under `Data/Agent`, `Data/Server`, and `Dependencies`. Launching an agent or server copies the necessary data from these `Data/` directories into sibling `Agent/` and `Server/` directories at runtime so the development tree stays clean and the runtime stays portable.
- The server stack spans NodeJS + Vite for live development and Python Flask (`Data/Server/server.py`), which serves the production frontend (when the Vite dev server is not in use) and the API endpoints of the Borealis server backend. The `script_engines.py` helper exposes a PowerShell runner for potential server-side orchestration, but no current Flask route invokes it; agent-side script execution lives in the roles under `Data/Agent/Roles`.
- Agents run inside the packaged Python venv (`Data/Agent` mirrored to `Agent/`). `agent.py` handles the primary connection and hot-loads roles from `Data/Agent/Roles` at agent startup.
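
The staging step described above (bundled `Data/<component>` copied into a sibling runtime directory) can be sketched in Python as follows. `Borealis.ps1` does the real work in PowerShell; `stage_runtime` is a hypothetical name, and merging over an existing runtime directory is an assumption, since the source does not say whether the script cleans the target first.

```python
import shutil
from pathlib import Path


def stage_runtime(project_root: Path, component: str) -> Path:
    """Hypothetical sketch: mirror Data/<component> into <project_root>/<component>.

    Only illustrates the directory layout; Borealis.ps1 performs the real staging.
    """
    if component not in ("Agent", "Server"):
        raise ValueError(f"unknown component: {component!r}")
    source = Path(project_root) / "Data" / component
    target = Path(project_root) / component
    if not source.is_dir():
        raise FileNotFoundError(f"bundled assets missing: {source}")
    # Copy (merging over any existing runtime tree). Data/ is only read, never
    # written, which keeps the development tree clean.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

Because the runtime copy is disposable, it can be deleted and re-staged at any time without touching the tracked sources.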

## Logging Policy (Centralized, Rotated)
- **Log Locations**
  - Agent: `<ProjectRoot>/Logs/Agent/<service>.log`
  - Server: `<ProjectRoot>/Logs/Server/<service>.log`
- **General-Purpose Logs**
  - Agent: `agent.log`
  - Server: `server.log`
- **Dedicated Logs**
  - Subsystems with significant surface area must use their own `<service>.log`
    - Examples: `ansible.log`, `webrtc.log`, `scheduler.log`
- **Installation / Bootstrap Logs**
  - Agent install: `Logs/Agent/install.log`
  - Server install: `Logs/Server/install.log`
- **Rotation Policy**
  - All log writers must rotate daily.
  - On day rollover, rename:
    - `<service>.log` → `<service>.log.YYYY-MM-DD`
  - Append only to the current day’s log.
  - **Do not** auto-delete rotated logs.
- **Restrictions**
  - Logs must **only** be written under the project root.
  - Never write logs to:
    - `ProgramData`
    - `AppData`
    - User profiles
    - System temp directories
  - No alternative log fan-out (e.g., per-component folders) unless explicitly coordinated.
    Prefer single log files per service.
- **Convergence**
  - This policy applies to all new contributions.
  - When modifying existing code, migrate ad-hoc logging into this pattern.
- **Troubleshooting Issues via Logs with the Operator**
  - When troubleshooting an issue with the operator, add extensive logging to the feature under investigation.
  - If the operator reports that the issue is resolved, ask whether they want the extra logging removed or kept.
  - While troubleshooting, every log line must follow the `<timestamp>-<service-name>-<log-data>` structure.
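
The rotation rules above line up with Python's standard `logging.handlers.TimedRotatingFileHandler`: with `when="midnight"` it renames `<service>.log` to `<service>.log.YYYY-MM-DD` at rollover, and `backupCount=0` means rotated files are never deleted. The helper below is a minimal sketch, not existing Borealis code; the name `get_service_logger`, its `side` argument, and the logger naming scheme are assumptions.

```python
import logging
import logging.handlers
from pathlib import Path


def get_service_logger(project_root: Path, side: str, service: str) -> logging.Logger:
    """Hypothetical helper: log to <project_root>/Logs/<side>/<service>.log, rotated daily."""
    if side not in ("Agent", "Server"):
        raise ValueError(f"unknown side: {side!r}")
    log_dir = Path(project_root) / "Logs" / side  # always under the project root
    log_dir.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(f"borealis.{side.lower()}.{service}")
    if logger.handlers:  # already configured in this process
        return logger

    # when="midnight": one file per day, renamed to <service>.log.YYYY-MM-DD on rollover.
    # backupCount=0: the handler never deletes rotated files.
    handler = logging.handlers.TimedRotatingFileHandler(
        str(log_dir / f"{service}.log"), when="midnight", backupCount=0, encoding="utf-8"
    )
    # Every line: <timestamp>-<service-name>-<log-data>
    handler.setFormatter(logging.Formatter(f"%(asctime)s-{service}-%(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # no fan-out into other handlers
    return logger
```

Note that the handler checks for rollover when a record is written, so the rename happens on the first log line after midnight rather than exactly at midnight.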

## Dependencies & Packaging
`Dependencies/` holds the installers/download payloads Borealis bootstraps on first launch: Python, 7-Zip, AutoHotkey, and NodeJS. Versions are hard-pinned in `Borealis.ps1`; upgrading any runtime requires updating those version constants before repackaging. Nothing self-updates, so Codex should coordinate dependency bumps carefully and test both server and agent bootstrap paths.

## Security Breakdowns
The process agents follow to authenticate securely with a Borealis server is somewhat complex, so this section summarizes the current security posture of Borealis and walks through the core systems so you can understand what is going on behind the scenes.

### Security Overview
#### Overall
- Borealis enforces mutual trust: each agent presents a unique Ed25519 identity to the server, the server issues EdDSA-signed (Ed25519) access tokens bound to that identity's fingerprint, and both sides pin the generated Borealis root CA.
- End-to-end TLS everywhere: the server ships an ECDSA P-384 root + leaf chain and serves only TLS 1.3; agents require TLS 1.2+ and "pin" (store for future verification) the delivered certificate bundle for both REST and WebSocket traffic, closing off man-in-the-middle avenues.
- Device enrollment is gated by enrollment/installer codes (with configurable expiration and usage limits) and an operator approval queue; replay-resistant nonces plus rate limits (40 req/min per IP, 12 req/min per fingerprint) prevent brute force and code reuse.
- All device APIs now require an `Authorization: Bearer` header and a service-context marker (e.g. `SYSTEM` or `CURRENTUSER`); missing, expired, mismatched, or revoked credentials are rejected before any business logic runs. Operator-driven revocation and device quarantine logic is not yet implemented.
- Defenses against replay and credential theft combine server-side DPoP proof validation (thumbprint binding), short-lived access tokens (15 minutes), and 30-day refresh tokens stored as SHA-256 hashes.
- Centralized logging under `Logs/Server` and `Logs/Agent` captures enrollment approvals, rate-limit hits, signature failures, and auth anomalies for post-incident review.
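
The source gives the enrollment rate limits (40 requests/min per IP, 12 requests/min per fingerprint) but not the algorithm behind them. The sketch below assumes a simple per-key sliding window; the class and function names are hypothetical, and the injectable `clock` exists only so the behavior can be tested deterministically.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` events per `window` seconds for each key (IP or fingerprint)."""

    def __init__(self, limit: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        events = self._events[key]
        # Drop timestamps that have slid out of the window.
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.limit:
            return False  # over budget: reject (and log the rate-limit hit)
        events.append(now)
        return True


# Limits quoted above: 40 req/min per IP, 12 req/min per agent fingerprint.
per_ip = SlidingWindowLimiter(limit=40)
per_fingerprint = SlidingWindowLimiter(limit=12)


def enrollment_allowed(ip: str, fingerprint: str) -> bool:
    # Short-circuits: if the IP budget is exhausted, the fingerprint budget is not touched.
    return per_ip.allow(ip) and per_fingerprint.allow(fingerprint)
```

One consequence of this ordering is that a request rejected by the fingerprint limit still consumes an IP slot, which errs on the side of throttling noisy sources.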
#### Server Security
- Auto-manages PKI: a persistent Borealis root CA (ECDSA SECP384R1) signs leaf certificates that include localhost SANs; the server also tightens filesystem permissions on the generated files and produces a combined bundle for agent identity and certificate pinning.
- Script delivery is code-signed with an Ed25519 key stored under `Certificates/Server/Code-Signing`; agents refuse any payload whose hash does not match or whose signature does not verify against the pinned public key.
- Device authentication checks GUID normalization, SSL fingerprint matches, token version counters, and quarantine flags before admitting requests; devices that present valid tokens but have no database row are auto-recovered into placeholder records to avoid accidental lockouts.
- Refresh tokens are never stored in cleartext: only SHA-256 hashes plus DPoP bindings land in SQLite, and reuse after revocation or expiry returns explicit error codes.
- The enrollment workflow queues approvals, detects hostname/fingerprint conflicts, offers merge/overwrite options, and records auditor identities so trust decisions are traceable.
- Background jobs prune expired enrollment codes and refresh tokens, keeping the attack surface small without silently deleting active credentials.
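
The refresh-token handling above (hash-only storage, DPoP binding, explicit errors on expiry or revocation, pruning of expired rows only) can be sketched with Python's built-in `sqlite3`. The table schema, function names, and error-code strings are hypothetical; `dpop_jkt` stands in for whatever DPoP key thumbprint the server actually records.

```python
import hashlib
import secrets
import sqlite3
import time
from typing import Optional, Tuple

REFRESH_TOKEN_TTL = 30 * 24 * 60 * 60  # 30 days, as stated above


def _hash(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: only the SHA-256 hash and the DPoP binding are persisted.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS refresh_tokens ("
        " token_hash TEXT PRIMARY KEY, guid TEXT NOT NULL, dpop_jkt TEXT NOT NULL,"
        " expires_at REAL NOT NULL, revoked INTEGER NOT NULL DEFAULT 0)"
    )


def issue_refresh_token(conn: sqlite3.Connection, guid: str, dpop_jkt: str,
                        now: Optional[float] = None) -> str:
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)  # handed to the agent once, never stored
    conn.execute(
        "INSERT INTO refresh_tokens (token_hash, guid, dpop_jkt, expires_at) VALUES (?, ?, ?, ?)",
        (_hash(token), guid, dpop_jkt, now + REFRESH_TOKEN_TTL),
    )
    return token


def check_refresh_token(conn: sqlite3.Connection, token: str, dpop_jkt: str,
                        now: Optional[float] = None) -> Tuple[bool, str]:
    """Return (True, guid) on success or (False, error_code)."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT guid, dpop_jkt, expires_at, revoked FROM refresh_tokens WHERE token_hash = ?",
        (_hash(token),),
    ).fetchone()
    if row is None:
        return False, "invalid_refresh_token"
    guid, bound_jkt, expires_at, revoked = row
    if revoked:
        return False, "refresh_token_revoked"
    if now >= expires_at:
        return False, "refresh_token_expired"
    if not secrets.compare_digest(bound_jkt, dpop_jkt):
        return False, "dpop_binding_mismatch"
    return True, guid


def prune_expired(conn: sqlite3.Connection, now: Optional[float] = None) -> int:
    # Background-job cleanup: deletes only rows that have already expired.
    now = time.time() if now is None else now
    return conn.execute("DELETE FROM refresh_tokens WHERE expires_at <= ?", (now,)).rowcount
```

Because lookups go through the hash, a leaked database copy does not reveal usable refresh tokens.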
#### Agent
- Generates device-wide Ed25519 key pairs on first launch, storing them under `Certificates/Agent/Identity/` with DPAPI protection on Windows (`chmod 600` elsewhere) and persisting the server-issued GUID alongside them.
- Stores refresh/access tokens encrypted with DPAPI, with companion metadata that pins them to the expected server certificate fingerprint; mismatches or refresh failures trigger a clean re-enrollment.
- Imports the server’s TLS bundle into a dedicated `ssl.SSLContext`, reuses it for the REST session, and injects it into the Socket.IO engine so WebSockets get the same pinning and hostname checks.
- Treats every script payload as hostile until verified: only Ed25519 signatures from the server are accepted, payloads with missing or invalid signatures are logged and dropped, and the trusted signing key is updated only after a successful verification exchange with the server.
- Operates outbound-only: there are no listening ports, and every API/WebSocket call flows through `AgentHttpClient.ensure_authenticated`, which runs the token refresh logic before retrying.
- Logs bootstrap, enrollment, token refresh, and signature events to daily-rotated files under `Logs/Agent`, giving operators visibility without leaking secrets outside the project root.
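
A minimal sketch of the agent-side pinning described above, using only the standard `ssl` and `hashlib` modules. The helper names are hypothetical, and the stored fingerprint is assumed to be a SHA-256 digest of the server certificate's DER bytes; the source does not specify the fingerprint format.

```python
import hashlib
import ssl
from typing import Optional


def build_agent_ssl_context(bundle_pem: Optional[str] = None) -> ssl.SSLContext:
    """Hypothetical helper: TLS 1.2+ client context that trusts only the Borealis bundle."""
    # PROTOCOL_TLS_CLIENT enables check_hostname and CERT_REQUIRED by default, and a
    # bare SSLContext starts with an empty trust store (no OS certificates loaded).
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if bundle_pem is not None:
        ctx.load_verify_locations(cadata=bundle_pem)  # PEM text of the delivered bundle
    return ctx


def cert_fingerprint(der_cert: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(der_cert).hexdigest()


def tokens_still_pinned(stored_fingerprint: str, presented_der: bytes) -> bool:
    """True if cached tokens were issued for the certificate the server presents now.

    A False result is the cue to discard the tokens and re-enroll.
    """
    normalized = stored_fingerprint.replace(":", "").strip().lower()
    return normalized == cert_fingerprint(presented_der)
```

On a connected `ssl.SSLSocket`, `getpeercert(binary_form=True)` returns the DER bytes to pass to `tokens_still_pinned`. Building one context and sharing it between the REST session and the WebSocket client is what keeps both channels under the same trust rules.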

### Execution Contexts
The agent runs in the interactive user session. SYSTEM-level script execution is provided by the ScriptExec SYSTEM role using ephemeral scheduled tasks; no separate supervisor or watchdog is required.
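
The source does not say how the ScriptExec SYSTEM role talks to Task Scheduler. Assuming it uses the built-in `schtasks.exe` CLI, the ephemeral-task lifecycle (create, run, delete) could be expressed as the argument lists below. The task-name prefix and the PowerShell invocation are hypothetical; only the `schtasks` switches themselves are standard.

```python
import uuid
from typing import List, Optional


def ephemeral_system_task_commands(script_path: str,
                                   task_name: Optional[str] = None) -> List[List[str]]:
    """Hypothetical helper: argv lists to create, run, and delete a one-off SYSTEM task."""
    name = task_name or f"Borealis-ScriptExec-{uuid.uuid4().hex[:8]}"
    # Assumed PowerShell invocation; the quotes keep paths with spaces intact.
    action = f'powershell.exe -NoProfile -ExecutionPolicy Bypass -File "{script_path}"'
    return [
        # /SC ONCE requires /ST; the task is started explicitly with /Run instead.
        ["schtasks", "/Create", "/TN", name, "/TR", action, "/SC", "ONCE",
         "/ST", "00:00", "/RU", "SYSTEM", "/RL", "HIGHEST", "/F"],
        ["schtasks", "/Run", "/TN", name],
        # Issue this only after the script has finished, so no task is left behind.
        ["schtasks", "/Delete", "/TN", name, "/F"],
    ]
```

On Windows, an elevated caller could execute each list with `subprocess.run(cmd, check=True)`. Without administrative rights the `/Create` step fails with a non-zero exit code, which is the point where a role would report the error back to the server.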

## Roles & Extensibility
- Roles live under `Data/Agent/Roles/` and are auto‑discovered at startup; no changes are needed in `agent.py` when adding new roles.
- Naming convention: `role_<Purpose>.py` per role.
- Role interface (per module):
  - `ROLE_NAME`: canonical role name used by config (e.g., `screenshot`, `script_exec_system`).
  - `ROLE_CONTEXTS`: list of contexts this role runs in (`interactive`, `system`).
  - `class Role(ctx)`: optional hooks the agent loader will call:
    - `register_events()`: bind any Socket.IO listeners.
    - `on_config(roles: List[dict])`: start/stop per‑role tasks based on server config.
    - `stop_all()`: cancel tasks and cleanup.
- Standard roles currently shipped:
  - `role_DeviceInventory.py` — collects and periodically posts device inventory/summary.
  - `role_Screenshot.py` — region overlay + periodic capture with WebSocket updates.
  - `role_ScriptExec_CURRENTUSER.py` — runs PowerShell in the logged‑in session and provides the tray icon (restart/quit).
  - `role_ScriptExec_SYSTEM.py` — runs PowerShell as SYSTEM via ephemeral Scheduled Tasks.
  - `role_Macro.py` — macro and key/text send helpers.
- Considerations:
  - The SYSTEM role requires administrative rights to create and run scheduled tasks as SYSTEM. If elevation is unavailable or policies restrict task creation, SYSTEM jobs will fail gracefully and report errors to the server.
  - Roles are “hot‑loaded” on startup only (no dynamic import while running).
  - Roles must avoid blocking the main event loop and be resilient to restarts.
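
The discovery contract above (`role_*.py` modules exposing `ROLE_NAME`, `ROLE_CONTEXTS`, and a `Role(ctx)` class with optional hooks) can be sketched as a small loader. This is not the actual code in `agent.py`; `discover_roles`, `apply_config`, and `stop_roles` are hypothetical names, and skipping a role that fails to import is an assumed policy.

```python
import importlib.util
import logging
from pathlib import Path
from typing import Any, Dict, List

log = logging.getLogger("borealis.agent.roles")


def discover_roles(roles_dir: Path, context: str, ctx: Any) -> Dict[str, Any]:
    """Hypothetical loader: import each role_*.py once and start roles for `context`."""
    loaded: Dict[str, Any] = {}
    for path in sorted(Path(roles_dir).glob("role_*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, str(path))
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        try:
            spec.loader.exec_module(module)
        except Exception as exc:  # one broken role should not stop the agent
            log.warning("skipping %s: %r", path.name, exc)
            continue
        name = getattr(module, "ROLE_NAME", None)
        contexts = getattr(module, "ROLE_CONTEXTS", [])
        role_cls = getattr(module, "Role", None)
        if not name or context not in contexts or role_cls is None:
            continue
        role = role_cls(ctx)
        if hasattr(role, "register_events"):  # hooks are optional
            role.register_events()
        loaded[name] = role
    return loaded


def apply_config(roles: Dict[str, Any], config: List[dict]) -> None:
    for role in roles.values():
        if hasattr(role, "on_config"):
            role.on_config(config)


def stop_roles(roles: Dict[str, Any]) -> None:
    for role in roles.values():
        if hasattr(role, "stop_all"):
            role.stop_all()
```

Since discovery runs once at startup, adding a role means dropping a new `role_<Purpose>.py` file into the directory and restarting the agent.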

## Platform Parity
Windows is the reference environment today. `Borealis.ps1` owns the full deployment story, while `Borealis.sh` lags significantly and lacks the same packaging logic. Linux support needs feature parity (virtual environments, supervisor equivalents, and role loading) before macOS work resumes.

## Ansible Support (Unfinished — Do Not Use)
Important: The Ansible integration is not production‑ready. Do not rely on it for jobs, quick jobs, or troubleshooting. The current implementation is a work‑in‑progress and will change.

- Status
  - Agent and server contain early scaffolding for running playbooks and posting recap‑style output, but behavior is not reliable across Windows hosts.
  - Expect playbooks to stall, fail silently, or never deliver recaps/cancel events. Cancellation controls and live output are not guaranteed to function.
  - Packaging of Ansible dependencies and Windows collections is incomplete. Connection modes (local/PSRP/WinRM) are not fully exposed or managed.

- Known blockers (Windows)
  - ansible.windows.* modules require remoting (PSRP/WinRM) and typically cannot run with `connection: local` on the controller.
  - The SYSTEM service context is a poor fit for loopback remoting without explicit credentials/policy; this leads to no‑ops and “forever running” jobs.
  - Collection availability (e.g., `ansible.windows`) and interpreter/paths vary and are not yet normalized across agent installs.

- Near‑term guidance
  - Assume all Ansible and playbook‑related features are disabled for operational purposes.
  - Do not file bug reports for Ansible behavior; it is intentionally unfinished and unsupported at this time.

- Future direction (not started)
  - Database‑fed credential management (per device/site/global), stored securely and surfaced to playbook runs.
  - First‑class selection of connection types (local | PSRP | WinRM) from the UI and scheduler, with per‑run credential binding.
  - Reliable live output and cancel semantics; hardened recap ingestion and history.
  - Verified packaging of required Ansible components and Windows collections inside the agent venv.