# Documentation Restructure
**Purpose**:
This document outlines the general workflow of using Visual Studio Code to author and update custom containers and push them to a container registry hosted in Gitea. It references the `git-repo-updater` project throughout.
!!! note "Assumptions"
    This document assumes you are authoring the containers in Microsoft Windows, and does not include the fine-tuning necessary to work in Linux or macOS environments. You are on your own if you want to author containers in Linux.
## Install Visual Studio Code
The management of the Gitea repositories, Dockerfile building, and pushing container images to the Gitea container registry will all involve using just Visual Studio Code. You can download Visual Studio Code from this [direct download link](https://code.visualstudio.com/docs/?dv=win64user).
## Configure Required Docker Extensions
You will need to locate and install the `Dev Containers`, `Docker`, and `WSL` extensions in Visual Studio Code before moving forward. You may be prompted to install Docker Desktop onto your computer as part of the extension installation; proceed to do so, and once the Docker "Engine" is running, continue to the next step.
!!! warning
    You need to have the Docker Desktop "Engine" running whenever working with containers, as it is necessary to build the images. VSCode will complain if it is not running.
## Add Gitea Container Registry
At this point, we need to add a registry to Visual Studio Code so it can proceed with pulling down the repository data.
- Click the Docker icon on the left-hand toolbar
- Under "**Registries**", click "**Connect Registry...**"
- In the dropdown menu that appears, click "**Generic Registry V2**"
- Enter `https://git.bunny-lab.io/container-registry`
- Registry Username: `nicole.rappe`
- Registry Password or Personal Access Token: `Personal Access API Token You Generated in Gitea`
- You will now see a sub-listing named "**Generic Registry V2**"
    - If you click the dropdown, you will see "**https://git.bunny-lab.io/container-registry**"
    - Under this section, you will see any containers in the registry that you have access to; in this case, `container-registry/git-repo-updater`
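The registry connection can also be sanity-checked from a terminal with the Docker CLI; a minimal sketch, assuming Docker Desktop is running and using the same username and personal access token as above:

``` sh
# Log into the Gitea container registry (paste the personal access token when prompted)
docker login git.bunny-lab.io -u nicole.rappe
```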
## Add Source Control Repository
Now it is time to pull down the repository where the container's core elements are stored on Gitea.
- Click the "**Source Control**" button on the left-hand menu then click the "**Clone Repository**" button
- Enter `https://git.bunny-lab.io/container-registry/git-repo-updater.git`
- Click the dropdown menu option "**Clone from URL**" then choose a location to locally store the repository on your computer
- When prompted with "**Would you like to open the cloned repository**", click the "**Open**" button
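If you prefer the command line, the same clone can be performed with plain Git; a minimal sketch, assuming Git is installed and on your PATH:

``` sh
# Clone the repository and enter its folder
git clone https://git.bunny-lab.io/container-registry/git-repo-updater.git
cd git-repo-updater
```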
## Making Changes
You will be presented with four files in this repository: `.env`, `docker-compose.yml`, `Dockerfile`, and `repo_watcher.sh`.
- `.env` is the environment variables passed to the container to tell it which ntfy server to talk to, which credentials to use with Gitea, and which repositories to download and push into production servers
- `docker-compose.yml` is an example docker-compose file that can be used in Portainer to deploy the server along with the contents of the `.env` file
- `Dockerfile` is the base of the container, telling docker what operating system to use and how to start the script in the container
- `repo_watcher.sh` is the script called by the `Dockerfile` which loops checking for updates in Gitea repositories that were configured in the `.env` file
### Push to Repository
When you make any changes, you will first need to commit them to the repository:
- Save all of the edited files
- Click the "**Source Control**" button in the toolbar
- Write a message about what you changed in the commit description field
- Click the "**Commit**" button
- Click the "**Sync Changes**" button that appears
- You may be presented with various dialogs; just click the equivalent of "**Yes/OK**" for each of them
### Build the Dockerfile
At this point, we need to build the Dockerfile, which packages all of the changes into a container image.
- Navigate back to the file explorer inside of Visual Studio Code
- Right-click the `Dockerfile`, then click "**Build Image...**"
- In the "Tag Image As..." window, type in `git.bunny-lab.io/container-registry/git-repo-updater:latest`
- When you navigate back to the Docker menu, you will see a new image appear under the "**Images**" section
- You should see something similar to "**Latest - X Seconds Ago**", indicating this is the image you just built
- Delete the older image(s) by right-clicking on them and selecting "**Remove...**"
- Push the image to the container registry in Gitea by right-clicking the latest image, and selecting "**Push...**"
- In the dropdown menu that appears, enter `git.bunny-lab.io/container-registry/git-repo-updater:latest`
- You can confirm if it was successful by navigating to the [Gitea Container Webpage](https://git.bunny-lab.io/container-registry/-/packages/container/git-repo-updater/latest) and seeing if it says "**Published Now**" or "**Published 1 Minute Ago**"
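If VSCode misbehaves, the same build-and-push flow can be reproduced from a terminal; a minimal sketch, assuming Docker is running, you are in the repository folder, and you have already logged into the registry:

``` sh
# Build the image from the Dockerfile in the current folder and tag it for the Gitea registry
docker build -t git.bunny-lab.io/container-registry/git-repo-updater:latest .
# Push the tagged image to the registry
docker push git.bunny-lab.io/container-registry/git-repo-updater:latest
```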
!!! warning "CRLF End of Line Sequences"
    When you are editing files in the container's repository, you need to ensure that Visual Studio Code is editing the file in "**LF**" mode and not "**CRLF**". You can find this toggle at the bottom-right of the VSCode window; simply clicking the letters "**CRLF**" will toggle the file to "**LF**". If you do not make this change, the container will misinterpret the Dockerfile and/or scripts inside the container and have runtime errors.
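One optional way to make the LF requirement stick for every clone of the repository is a `.gitattributes` file; this is a suggested addition, not something the repository necessarily contains:

```jsx title=".gitattributes"
# Force LF line endings for shell scripts and the Dockerfile on checkout
*.sh text eol=lf
Dockerfile text eol=lf
```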
## Deploy the Container
You can now use the `.env` file along with the `docker-compose.yml` file inside of Portainer to deploy a stack using the container you just built / updated.

---
**Purpose**: Docker container running Alpine Linux that automates and improves upon much of the script mentioned in the [Git Repo Updater](../../../../reference/bash/git-repo-updater.md) document. It offers the additional benefits of checking for updates every 5 seconds instead of every 60 seconds. It also accepts environment variables to provide credentials and notification settings, and can have an infinite number of monitored repositories.
### Deployment
You can find the current up-to-date Gitea repository that includes the `docker-compose.yml` and `.env` files you need to deploy everything [here](https://git.bunny-lab.io/container-registry/-/packages/container/git-repo-updater/latest).
```jsx title="docker-compose.yml"
version: '3.3'
services:
  git-repo-updater:
    privileged: true
    container_name: git-repo-updater
    env_file:
      - stack.env
    image: git.bunny-lab.io/container-registry/git-repo-updater:latest
    volumes:
      - /srv/containers:/srv/containers
      - /srv/containers/git-repo-updater/Repo_Cache:/root/Repo_Cache
    restart: always
```
```jsx title=".env"
# Gitea Credentials
GIT_USERNAME=nicole.rappe
GIT_PASSWORD=USE-AN-APP-PASSWORD
# NTFY Push Notification Server URL
NTFY_URL=https://ntfy.cyberstrawberry.net/git-repo-updater
# Repository/Destination Pairs (Add as Many as Needed)
REPO_01="https://${GIT_USERNAME}:${GIT_PASSWORD}@git.bunny-lab.io/bunny-lab/docs.git,/srv/containers/material-mkdocs/docs/docs"
REPO_02="https://${GIT_USERNAME}:${GIT_PASSWORD}@git.bunny-lab.io/GitOps/servers.bunny-lab.io.git,/srv/containers/homepage-docker"
```
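If you are not using Portainer, the same two files can be deployed with the Docker Compose CLI; a minimal sketch, noting that the `.env` contents would need to be saved as `stack.env`, since that is the filename referenced by `docker-compose.yml`:

``` sh
# From the directory containing docker-compose.yml and stack.env
docker compose up -d
# Follow the container logs to confirm the watcher loop started
docker logs -f git-repo-updater
```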
### Build / Development
If you want to learn how the container was assembled, the related build files are located [here](https://git.cyberstrawberry.net/container-registry/git-repo-updater).
```jsx title="Dockerfile"
# Use Alpine as the base image of the container
FROM alpine:latest
# Install necessary packages
RUN apk --no-cache add git curl rsync
# Add script
COPY repo_watcher.sh /repo_watcher.sh
RUN chmod +x /repo_watcher.sh
# Create Directory to store Repositories
RUN mkdir -p /root/Repo_Cache
# Start script (Alpine uses /bin/sh instead of /bin/bash)
CMD ["/bin/sh", "-c", "/repo_watcher.sh"]
```
```jsx title="repo_watcher.sh"
#!/bin/sh

# Function to process each repo-destination pair
process_repo() {
    FULL_REPO_URL=$1
    DESTINATION=$2

    # Extract the URL without credentials for logging and notifications
    CLEAN_REPO_URL=$(echo "$FULL_REPO_URL" | sed 's/https:\/\/[^@]*@/https:\/\//')

    # Directory to hold the repository locally
    REPO_DIR="/root/Repo_Cache/$(basename $CLEAN_REPO_URL .git)"

    # Clone the repo if it doesn't exist, or navigate to it if it does
    if [ ! -d "$REPO_DIR" ]; then
        curl -d "Cloning: $CLEAN_REPO_URL" $NTFY_URL
        git clone "$FULL_REPO_URL" "$REPO_DIR" > /dev/null 2>&1
    fi
    cd "$REPO_DIR" || exit

    # Fetch the latest changes
    git fetch origin main > /dev/null 2>&1

    # Check if the local repository is behind the remote
    LOCAL=$(git rev-parse @)
    REMOTE=$(git rev-parse @{u})
    if [ "$LOCAL" != "$REMOTE" ]; then
        curl -d "Updating: $CLEAN_REPO_URL" $NTFY_URL
        git pull origin main > /dev/null 2>&1
        rsync -av --delete --exclude '.git/' ./ "$DESTINATION" > /dev/null 2>&1
    fi
}

# Main loop
while true; do
    # Iterate over each environment variable matching 'REPO_[0-9]+'
    env | grep '^REPO_[0-9]\+=' | while IFS='=' read -r name value; do
        # Split the value by comma and read into separate variables
        OLD_IFS="$IFS"   # Save the original IFS
        IFS=','          # Set IFS to comma for splitting
        set -- $value    # Set positional parameters ($1, $2, ...)
        REPO_URL="$1"    # Assign first parameter to REPO_URL
        DESTINATION="$2" # Assign second parameter to DESTINATION
        IFS="$OLD_IFS"   # Restore original IFS
        process_repo "$REPO_URL" "$DESTINATION"
    done
    # Wait for 5 seconds before the next iteration
    sleep 5
done
```
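The `REPO_NN` parsing in the main loop is plain POSIX `IFS` splitting; the standalone snippet below demonstrates the same technique on a sample value (the URL and destination are placeholders, not real repositories):

``` sh
#!/bin/sh
# A sample pair in the same "<repo-url>,<destination>" format used by the .env file
value="https://example.com/repo.git,/srv/containers/example"

OLD_IFS="$IFS"   # Save the original field separator
IFS=','          # Split fields on commas
set -- $value    # Positional parameters: $1 = URL, $2 = destination
REPO_URL="$1"
DESTINATION="$2"
IFS="$OLD_IFS"   # Restore the original separator

echo "$REPO_URL"      # prints https://example.com/repo.git
echo "$DESTINATION"   # prints /srv/containers/example
```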

---
### Update The Package Manager
We need to update the server before installing Docker.
=== "Ubuntu Server"

    ``` sh
    sudo apt update
    sudo apt upgrade -y
    ```

=== "Rocky Linux"

    ``` sh
    sudo dnf check-update
    ```
### Deploy Docker
Install Docker, then deploy Portainer.

Convenience Script:
``` sh
curl -fsSL https://get.docker.com | sudo sh
dockerd-rootless-setuptool.sh install
```

Alternative Methods:
=== "Ubuntu Server"

    ``` sh
    sudo apt install docker.io -y
    docker run -d -p 8000:8000 -p 9443:9443 --name portainer --restart=always -v /var/run/docker.sock:/var/run/docker.sock -v /srv/containers/portainer:/data portainer/portainer-ee:latest # (1)
    ```

    1. Be sure to set the `-v /srv/containers/portainer:/data` value to a safe place that gets backed up regularly.

=== "Rocky Linux"

    ``` sh
    sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
    sudo dnf install -y docker-ce docker-ce-cli containerd.io
    sudo systemctl enable docker --now # (1)
    docker run -d -p 8000:8000 -p 9443:9443 --name portainer --restart=always -v /var/run/docker.sock:/var/run/docker.sock -v /srv/containers/portainer:/data portainer/portainer-ee:latest # (2)
    ```

    1. This is needed to ensure that Docker starts automatically every time the server is turned on.
    2. Be sure to set the `-v /srv/containers/portainer:/data` value to a safe place that gets backed up regularly.
### Configure Docker Network
I highly recommend setting up a [Dedicated Docker MACVLAN Network](../../../networking/docker-networking/docker-networking.md). You can use it to keep your containers on their own subnet.
### Access Portainer WebUI
You will be able to access the Portainer WebUI at the following address: `https://<IP Address>:9443`
!!! warning
    You need to be quick, as there is a timeout period after which you won't be able to onboard / provision Portainer and will be forced to restart its container. If this happens, you can find the container using `sudo docker container ls`, followed by `sudo docker restart <ID of Portainer Container>`.

---
# Deploy Generic Kubernetes
The instructions outlined below assume you are deploying the environment using Ansible Playbooks either via Ansible's CLI or AWX.
### Deploy K8S User
```jsx title="01-deploy-k8s-user.yml"
- hosts: 'controller-nodes, worker-nodes'
  become: yes
  tasks:
    - name: create the k8sadmin user account
      user: name=k8sadmin append=yes state=present createhome=yes shell=/bin/bash

    - name: allow 'k8sadmin' to use sudo without needing a password
      lineinfile:
        dest: /etc/sudoers
        line: 'k8sadmin ALL=(ALL) NOPASSWD: ALL'
        validate: 'visudo -cf %s'

    - name: set up authorized keys for the k8sadmin user
      authorized_key: user=k8sadmin key="{{item}}"
      with_file:
        - ~/.ssh/id_rsa.pub
```
### Install K8S
```jsx title="02-install-k8s.yml"
---
- hosts: "controller-nodes, worker-nodes"
  remote_user: nicole
  become: yes
  become_method: sudo
  become_user: root
  gather_facts: yes
  connection: ssh
  tasks:
    - name: Create containerd config file
      file:
        path: "/etc/modules-load.d/containerd.conf"
        state: "touch"

    - name: Add conf for containerd
      blockinfile:
        path: "/etc/modules-load.d/containerd.conf"
        block: |
          overlay
          br_netfilter

    - name: modprobe
      shell: |
        sudo modprobe overlay
        sudo modprobe br_netfilter

    - name: Set system configurations for Kubernetes networking
      file:
        path: "/etc/sysctl.d/99-kubernetes-cri.conf"
        state: "touch"

    - name: Add conf for containerd
      blockinfile:
        path: "/etc/sysctl.d/99-kubernetes-cri.conf"
        block: |
          net.bridge.bridge-nf-call-iptables = 1
          net.ipv4.ip_forward = 1
          net.bridge.bridge-nf-call-ip6tables = 1

    - name: Apply new settings
      command: sudo sysctl --system

    - name: install containerd
      shell: |
        sudo apt-get update && sudo apt-get install -y containerd
        sudo mkdir -p /etc/containerd
        sudo containerd config default | sudo tee /etc/containerd/config.toml
        sudo systemctl restart containerd

    - name: disable swap
      shell: |
        sudo swapoff -a
        sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

    - name: install and configure dependencies
      shell: |
        sudo apt-get update && sudo apt-get install -y apt-transport-https curl
        curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

    - name: Create kubernetes repo file
      file:
        path: "/etc/apt/sources.list.d/kubernetes.list"
        state: "touch"

    - name: Add K8s Source
      blockinfile:
        path: "/etc/apt/sources.list.d/kubernetes.list"
        block: |
          deb https://apt.kubernetes.io/ kubernetes-xenial main

    - name: Install Kubernetes
      shell: |
        sudo apt-get update
        sudo apt-get install -y kubelet=1.20.1-00 kubeadm=1.20.1-00 kubectl=1.20.1-00
        sudo apt-mark hold kubelet kubeadm kubectl
```
### Configure ControlPlanes
```jsx title="03-configure-controllers.yml"
- hosts: controller-nodes
  become: yes
  tasks:
    - name: Initialize the K8S Cluster
      shell: kubeadm init --pod-network-cidr=10.244.0.0/16
      args:
        chdir: $HOME
        creates: cluster_initialized.txt

    - name: Create .kube directory
      become: yes
      become_user: k8sadmin
      file:
        path: /home/k8sadmin/.kube
        state: directory
        mode: 0755

    - name: Copy admin.conf to user's kube config
      copy:
        src: /etc/kubernetes/admin.conf
        dest: /home/k8sadmin/.kube/config
        remote_src: yes
        owner: k8sadmin

    - name: Install the Pod Network
      become: yes
      become_user: k8sadmin
      shell: kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
      args:
        chdir: $HOME

    - name: Get the token for joining the worker nodes
      become: yes
      become_user: k8sadmin
      shell: kubeadm token create --print-join-command
      register: kubernetes_join_command

    - name: Output Join Command to the Screen
      debug:
        msg: "{{ kubernetes_join_command.stdout }}"

    - name: Copy join command to local file.
      become: yes
      local_action: copy content="{{ kubernetes_join_command.stdout_lines[0] }}" dest="/tmp/kubernetes_join_command" mode=0777
```
### Join Worker Node(s)
```jsx title="04-join-worker-nodes.yml"
- hosts: worker-nodes
  become: yes
  gather_facts: yes
  tasks:
    - name: Copy join command from Ansible host to the worker nodes.
      become: yes
      copy:
        src: /tmp/kubernetes_join_command
        dest: /tmp/kubernetes_join_command
        mode: 0777

    - name: Join the Worker nodes to the cluster.
      become: yes
      command: sh /tmp/kubernetes_join_command
      register: joined_or_not
```
### Host Inventory File Template
```jsx title="hosts"
[controller-nodes]
k8s-ctrlr-01 ansible_host=192.168.3.6 ansible_user=nicole

[worker-nodes]
k8s-node-01 ansible_host=192.168.3.4 ansible_user=nicole
k8s-node-02 ansible_host=192.168.3.5 ansible_user=nicole

[all:vars]
ansible_become_user=root
ansible_become_method=sudo
```
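With the inventory saved as `hosts`, the playbooks above would typically be run in order from the Ansible control node; a minimal sketch, assuming the playbook filenames match the titles used in this document:

``` sh
ansible-playbook -i hosts 01-deploy-k8s-user.yml
ansible-playbook -i hosts 02-install-k8s.yml
ansible-playbook -i hosts 03-configure-controllers.yml
ansible-playbook -i hosts 04-join-worker-nodes.yml
```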

---
# Deploy RKE2 Cluster
Deploying a Rancher RKE2 Cluster is fairly straightforward. Just run the commands in order and pay attention to which steps apply to all machines in the cluster, the controlplanes, and the workers.
!!! note "Prerequisites"
    This document assumes you are running **Ubuntu Server 24.04.3 LTS**. It also assumes that every node in the cluster has a unique hostname.
## All Cluster Nodes
Assume all commands from this point forward are run as root (e.g. via `sudo su`).
### Run Updates
You will need to run these commands on every server that participates in the cluster, then reboot each server **PRIOR** to moving on to the next section.
``` sh
apt update && apt upgrade -y
apt install nfs-common iptables nano htop -y
echo "Adding 15 Second Delay to Ensure Previous Commands finish running"
sleep 15
apt autoremove -y
reboot
```
!!! tip
    If this is a virtual machine, now would be the best time to take a checkpoint / snapshot of the VM before moving forward, in case you need to perform rollbacks of the server(s) if you accidentally misconfigure something.
## Initial ControlPlane Node
When you are starting a brand new cluster, you need to create what is referred to as the "Initial ControlPlane". This node is responsible for bootstrapping the entire cluster together in the beginning, and will eventually assist in handling container workloads and orchestrating operations in the cluster.
!!! warning
    You only want to follow the instructions for the **initial** controlplane once. Running them on a second machine will cause that machine to try to set up a separate cluster, wreaking havoc. Instead, follow the instructions in the next section to add redundant controlplanes.
### Download & Run the Server Deployment Script
``` sh
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE=server sh -
```
### Enable & Configure Services
``` sh
# Start and Enable the Kubernetes Service
systemctl enable --now rke2-server.service
# Symlink the Kubectl Management Command
ln -s $(find /var/lib/rancher/rke2/data/ -name kubectl) /usr/local/bin/kubectl
# Temporarily Export the Kubeconfig to manage the cluster from CLI during initial deployment.
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
# Add a Delay to Allow Cluster to Finish Initializing / Get Ready
echo "Adding 60 Second Delay to Ensure Cluster is Ready - Run (kubectl get node) if the server is still not ready to know when to proceed."
sleep 60
# Check that the Cluster Node is Running and Ready
kubectl get node
```
!!! example
    When the cluster is ready, you should see something like this when you run `kubectl get node`.
    This may be a good point to step away for 5 minutes, get a cup of coffee, and come back so it has a little extra time to be fully ready before moving on.
    ```
    root@awx:/home/nicole# kubectl get node
    NAME   STATUS   ROLES                       AGE     VERSION
    awx    Ready    control-plane,etcd,master   3m21s   v1.26.12+rke2r1
    ```
### Install Helm, Cert-Manager, Rancher, and Longhorn
``` sh
# Install Helm
curl -L https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-4 | bash
# Install Necessary Helm Repositories
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
helm repo add jetstack https://charts.jetstack.io
helm repo add longhorn https://charts.longhorn.io
helm repo update
# Install the Cert-Manager CRDs
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.19.2/cert-manager.crds.yaml
# Install Cert-Manager via Helm (from the Jetstack repository)
helm upgrade -i cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace
# Install Rancher via Helm
helm upgrade -i rancher rancher-latest/rancher --create-namespace --namespace cattle-system --set hostname=rke2-cluster.bunny-lab.io --set bootstrapPassword=bootStrapAllTheThings --set replicas=1
# Install Longhorn via Helm
helm upgrade -i longhorn longhorn/longhorn --namespace longhorn-system --create-namespace
```
!!! example "Be Patient - Come back in 20 Minutes"
    Rancher is going to take a while to fully set itself up, and things will appear broken in the meantime. Depending on how many resources you gave the cluster, it may take longer or shorter. A good ballpark is giving it at least 20 minutes to deploy itself before attempting to log into the webUI at https://awx.bunny-lab.io.
    If you want to keep an eye on the deployment progress, you need to run the following command: `KUBECONFIG=/etc/rancher/rke2/rke2.yaml kubectl get pods --all-namespaces`
    The output should look like how it does below:
    ```
    NAMESPACE                         NAME                                                    READY   STATUS      RESTARTS        AGE
    cattle-fleet-system               fleet-controller-59cdb866d7-94r2q                       1/1     Running     0               4m31s
    cattle-fleet-system               gitjob-f497866f8-t726l                                  1/1     Running     0               4m31s
    cattle-provisioning-capi-system   capi-controller-manager-6f87d6bd74-xx22v                1/1     Running     0               55s
    cattle-system                     helm-operation-28dcp                                    0/2     Completed   0               109s
    cattle-system                     helm-operation-f9qww                                    0/2     Completed   0               4m39s
    cattle-system                     helm-operation-ft8gq                                    0/2     Completed   0               26s
    cattle-system                     helm-operation-m27tq                                    0/2     Completed   0               61s
    cattle-system                     helm-operation-qrgj8                                    0/2     Completed   0               5m11s
    cattle-system                     rancher-64db9f48c-qm6v4                                 1/1     Running     3 (8m8s ago)    13m
    cattle-system                     rancher-webhook-65f5455d9c-tzbv4                        1/1     Running     0               98s
    cert-manager                      cert-manager-55cf8685cb-86l4n                           1/1     Running     0               14m
    cert-manager                      cert-manager-cainjector-fbd548cb8-9fgv4                 1/1     Running     0               14m
    cert-manager                      cert-manager-webhook-655b4d58fb-s2cjh                   1/1     Running     0               14m
    kube-system                       cloud-controller-manager-awx                            1/1     Running     5 (3m37s ago)   19m
    kube-system                       etcd-awx                                                1/1     Running     0               19m
    kube-system                       helm-install-rke2-canal-q9vm6                           0/1     Completed   0               19m
    kube-system                       helm-install-rke2-coredns-q8w57                         0/1     Completed   0               19m
    kube-system                       helm-install-rke2-ingress-nginx-54vgk                   0/1     Completed   0               19m
    kube-system                       helm-install-rke2-metrics-server-87zhw                  0/1     Completed   0               19m
    kube-system                       helm-install-rke2-snapshot-controller-crd-q6bh6         0/1     Completed   0               19m
    kube-system                       helm-install-rke2-snapshot-controller-tjk5f             0/1     Completed   0               19m
    kube-system                       helm-install-rke2-snapshot-validation-webhook-r9pcn     0/1     Completed   0               19m
    kube-system                       kube-apiserver-awx                                      1/1     Running     0               19m
    kube-system                       kube-controller-manager-awx                             1/1     Running     5 (3m37s ago)   19m
    kube-system                       kube-proxy-awx                                          1/1     Running     0               19m
    kube-system                       kube-scheduler-awx                                      1/1     Running     5 (3m35s ago)   19m
    kube-system                       rke2-canal-gm45f                                        2/2     Running     0               19m
    kube-system                       rke2-coredns-rke2-coredns-565dfc7d75-qp64p              1/1     Running     0               19m
    kube-system                       rke2-coredns-rke2-coredns-autoscaler-6c48c95bf9-fclz5   1/1     Running     0               19m
    kube-system                       rke2-ingress-nginx-controller-lhjwq                     1/1     Running     0               17m
    kube-system                       rke2-metrics-server-c9c78bd66-fnvx8                     1/1     Running     0               18m
    kube-system                       rke2-snapshot-controller-6f7bbb497d-dw6v4               1/1     Running     4 (6m17s ago)   18m
    kube-system                       rke2-snapshot-validation-webhook-65b5675d5c-tdfcf       1/1     Running     0               18m
    longhorn-system                   csi-attacher-785fd6545b-6jfss                           1/1     Running     1 (6m17s ago)   9m39s
    longhorn-system                   csi-attacher-785fd6545b-k7jdh                           1/1     Running     0               9m39s
    longhorn-system                   csi-attacher-785fd6545b-rr6k4                           1/1     Running     0               9m39s
    longhorn-system                   csi-provisioner-8658f9bd9c-58dc8                        1/1     Running     0               9m38s
    longhorn-system                   csi-provisioner-8658f9bd9c-g8cv2                        1/1     Running     0               9m38s
    longhorn-system                   csi-provisioner-8658f9bd9c-mbwh2                        1/1     Running     0               9m38s
    longhorn-system                   csi-resizer-68c4c75bf5-d5vdd                            1/1     Running     0               9m36s
    longhorn-system                   csi-resizer-68c4c75bf5-r96lf                            1/1     Running     0               9m36s
    longhorn-system                   csi-resizer-68c4c75bf5-tnggs                            1/1     Running     0               9m36s
    longhorn-system                   csi-snapshotter-7c466dd68f-5szxn                        1/1     Running     0               9m30s
    longhorn-system                   csi-snapshotter-7c466dd68f-w96lw                        1/1     Running     0               9m30s
    longhorn-system                   csi-snapshotter-7c466dd68f-xt42z                        1/1     Running     0               9m30s
    longhorn-system                   engine-image-ei-68f17757-jn986                          1/1     Running     0               10m
    longhorn-system                   instance-manager-fab02be089480f35c7b2288110eb9441       1/1     Running     0               10m
    longhorn-system                   longhorn-csi-plugin-5j77p                               3/3     Running     0               9m30s
    longhorn-system                   longhorn-driver-deployer-75fff9c757-dps2j               1/1     Running     0               13m
    longhorn-system                   longhorn-manager-2vfr4                                  1/1     Running     4 (10m ago)     13m
    longhorn-system                   longhorn-ui-7dc586665c-hzt6k                            1/1     Running     0               13m
    longhorn-system                   longhorn-ui-7dc586665c-lssfj                            1/1     Running     0               13m
    ```
!!! note
    Be sure to write down the "*bootstrapPassword*" variable for when you log into Rancher later. In this example, the password is `bootStrapAllTheThings`.
    Also be sure to adjust the "*hostname*" variable to reflect the FQDN of the cluster. You can leave it default like this and change it upon first login if you want. This is important for the last step where you adjust DNS. The example given is `rke2-cluster.bunny-lab.io`.
### Log into webUI
At this point, you can log into the webUI at https://rke2-cluster.bunny-lab.io using the default `bootStrapAllTheThings` password, or whatever password you configured. You can change the password after logging in by navigating to **Home > Users & Authentication > "..." > Edit Config > "New Password" > Save**. From here, you can deploy more nodes, or deploy single-node workloads such as an Ansible AWX Operator.
### Rebooting the ControlNode
If you ever find yourself needing to reboot the ControlNode and need to run kubectl CLI commands, you will need to run the command below to import the cluster credentials after every reboot. Reboots should take much less time to get the cluster ready again compared to the original deployment.
``` sh
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
```
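To avoid re-running the export after every reboot, the variable can be persisted in root's shell profile; an optional convenience, not part of the original procedure:

``` sh
# Append the kubeconfig export to root's profile so new shells pick it up automatically
echo 'export KUBECONFIG=/etc/rancher/rke2/rke2.yaml' >> /root/.bashrc
```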
## Create Additional ControlPlane Node(s)
This is the part where you can add more controlplane nodes to give the RKE2 Cluster additional redundancy. This is important for high-availability environments.
### Download the Server Deployment Script
``` sh
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE=server sh -
```
### Configure and Connect to Existing/Initial ControlPlane Node
``` sh
# Symlink the Kubectl Management Command
ln -s $(find /var/lib/rancher/rke2/data/ -name kubectl) /usr/local/bin/kubectl
# Manually Create a Rancher-Kubernetes-Specific Config File
mkdir -p /etc/rancher/rke2/
# Inject IP of Initial ControlPlane Node into Config File
echo "server: https://192.168.3.69:9345" > /etc/rancher/rke2/config.yaml
# Inject the Initial ControlPlane Node trust token into the config file
# You can get the token by running the following command on the first node in the cluster: `cat /var/lib/rancher/rke2/server/node-token`
echo "token: K10aa0632863da4ae4e2ccede0ca6a179f510a0eee0d6d6eb53dca96050048f055e::server:3b130ceebfbb7ed851cd990fe55e6f3a" >> /etc/rancher/rke2/config.yaml
# Start and Enable the Kubernetes Service
systemctl enable --now rke2-server.service
```
!!! note
    Be sure to change the IP address of the initial controlplane node provided in the example above to match your environment.
## Add Worker Node(s)
Worker nodes are the bread-and-butter of a Kubernetes cluster. They handle running container workloads and act as storage for the cluster (this can be configured to varying degrees based on your needs).
### Download & Run the Worker Deployment Script
``` sh
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE=agent sh -
```
### Configure and Connect to RKE2 Cluster
``` sh
# Manually Create a Rancher-Kubernetes-Specific Config File
mkdir -p /etc/rancher/rke2/
# Inject IP of Initial ControlPlane Node into Config File
echo "server: https://192.168.3.21:9345" > /etc/rancher/rke2/config.yaml
# Inject the Initial ControlPlane Node trust token into the config file
# You can get the token by running the following command on the first node in the cluster: `cat /var/lib/rancher/rke2/server/node-token`
echo "token: K10aa0632863da4ae4e2ccede0ca6a179f510a0eee0d6d6eb53dca96050048f055e::server:3b130ceebfbb7ed851cd990fe55e6f3a" >> /etc/rancher/rke2/config.yaml
# Start and Enable the Kubernetes Service
systemctl enable --now rke2-agent.service
```
## DNS Server Record
You will need to set up some kind of DNS server record to point the FQDN of the cluster (e.g. `rke2-cluster.bunny-lab.io`) to the IP address of the Initial ControlPlane. This can be achieved in a number of ways, such as editing the Windows `HOSTS` file, Linux's `/etc/hosts` file, a Windows DNS Server "A" Record, or an NGINX/Traefik Reverse Proxy.
Once you have added the DNS record, you should be able to access the login page for the Rancher RKE2 Kubernetes cluster. Use the `bootstrapPassword` mentioned previously to log in, then change it immediately from the user management area of Rancher.
| TYPE OF ACCESS | FQDN | IP ADDRESS |
| -------------- | ------------------------------------- | ------------ |
| HOST FILE | rke2-cluster.bunny-lab.io | 192.168.3.69 |
| REVERSE PROXY | http://rke2-cluster.bunny-lab.io:80 | 192.168.5.29 |
| DNS RECORD | A Record: rke2-cluster.bunny-lab.io | 192.168.3.69 |
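As a concrete example of the HOST FILE option, the entry is a single line mapping the FQDN to the initial controlplane's IP (in `C:\Windows\System32\drivers\etc\hosts` on Windows or `/etc/hosts` on Linux):

```
192.168.3.69    rke2-cluster.bunny-lab.io
```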

---
```yaml
AWX:
  enabled: true
  name: awx
  postgres:
    dbName: Unset
    enabled: false
    host: Unset
    password: Unset
    port: 5678
    sslmode: prefer
    type: unmanaged
    username: admin
  spec:
    admin_user: admin
    admin_email: cyberstrawberry101@gmail.com
    auto_upgrade: true
    hostname: awx.cyberstrawberry.net
    ingress_path: /
    ingress_path_type: Prefix
    ingress_type: ingress
    ipv6_disabled: true
    projects_persistence: true
    projects_storage_class: longhorn
    projects_storage_size: 32Gi
    task_privileged: true
global:
  cattle:
    systemProjectId: p-78f96
```

---
awx-operator
https://ansible.github.io/awx-operator/
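These two values are the chart name and repository URL; a minimal sketch of registering the repository with Helm, assuming Helm is already installed:

``` sh
helm repo add awx-operator https://ansible.github.io/awx-operator/
helm repo update
```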

---
```jsx title="krb5.conf"
[libdefaults]
    default_realm = MOONGATE.LOCAL
    dns_lookup_realm = true
    dns_lookup_kdc = true
    ticket_lifetime = 24h
    renew_lifetime = 7d
    forwardable = true
    default_ccache_name = KEYRING:persistent:%{uid}

[realms]
    MOONGATE.LOCAL = {
        kdc = NEXUS-DC-01.MOONGATE.LOCAL
        admin_server = NEXUS-DC-01.MOONGATE.LOCAL
    }

[domain_realm]
    .moongate.local = MOONGATE.LOCAL
    moongate.local = MOONGATE.LOCAL
```
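Once the file is saved as `/etc/krb5.conf`, the realm configuration can be exercised with the standard MIT Kerberos client tools; a minimal sketch (the username is a placeholder):

``` sh
# Request a ticket-granting ticket from the KDC (prompts for the domain password)
kinit someuser@MOONGATE.LOCAL
# List cached tickets to confirm the realm and KDC settings work
klist
```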

---
```yaml
affinity: {}
checkDeprecation: true
clusterDomain: cluster.local
containerSecurityContext: {}
dnsConfig: {}
extraContainerVolumeMounts: []
extraInitVolumeMounts: []
extraVolumeMounts: []
extraVolumes: []
gitea:
  additionalConfigFromEnvs:
    - name: ENV_TO_INI__SERVER__ROOT_URL
      value: https://git.cyberstrawberry.net
  additionalConfigSources: []
  admin:
    email: cyberstrawberry101@gmail.com
    existingSecret: null
    password: SUPER-SECRET-ADMIN-PASSWORD-THAT-NOONE-WILL-GUESS
    username: nicole.rappe
  config:
    APP_NAME: "CyberStrawberry"
  ldap: []
  livenessProbe:
    enabled: true
    failureThreshold: 10
    initialDelaySeconds: 200
    periodSeconds: 10
    successThreshold: 1
    tcpSocket:
      port: http
    timeoutSeconds: 1
  metrics:
    enabled: false
    serviceMonitor:
      enabled: false
  oauth: []
  podAnnotations: {}
  readinessProbe:
    enabled: true
    failureThreshold: 3
    initialDelaySeconds: 5
    periodSeconds: 10
    successThreshold: 1
    tcpSocket:
      port: http
    timeoutSeconds: 1
  ssh:
    logLevel: INFO
  startupProbe:
    enabled: false
    failureThreshold: 10
    initialDelaySeconds: 60
    periodSeconds: 10
    successThreshold: 1
    tcpSocket:
      port: http
    timeoutSeconds: 1
global:
  hostAliases: []
  imagePullSecrets: []
  imageRegistry: ''
  storageClass: longhorn
image:
  pullPolicy: Always
  registry: ''
  repository: gitea/gitea
  rootless: false
  tag: ''
imagePullSecrets: []
ingress:
  annotations: {}
  className: null
  enabled: false
  hosts:
    - host: git.cyberstrawberry.net
      paths:
        - path: /
          pathType: Prefix
  tls: []
initPreScript: ''
memcached:
  enabled: true
  service:
    ports:
      memcached: 11211
nodeSelector: {}
persistence:
  accessModes:
    - ReadWriteOnce
  annotations: {}
  enabled: true
  existingClaim: null
  labels: {}
  size: 32Gi
  storageClass: null
  subPath: null
podSecurityContext:
  fsGroup: 1000
postgresql:
  enabled: true
  global:
    postgresql:
      auth:
        database: gitea
        password: gitea
        username: gitea
      service:
        ports:
          postgresql: 5432
  primary:
    persistence:
      size: 32Gi
replicaCount: 1
resources: {}
schedulerName: ''
securityContext: {}
service:
  http:
    annotations: {}
    clusterIP: None
    externalIPs: null
    externalTrafficPolicy: null
    ipFamilies: null
    ipFamilyPolicy: null
    loadBalancerIP: null
    loadBalancerSourceRanges: []
    nodePort: null
    port: 3000
    type: ClusterIP
  ssh:
    annotations: {}
    clusterIP: None
    externalIPs: null
    externalTrafficPolicy: null
    hostPort: null
    ipFamilies: null
    ipFamilyPolicy: null
    loadBalancerIP: null
    loadBalancerSourceRanges: []
    nodePort: null
    port: 22
    type: ClusterIP
signing:
  enabled: false
  existingSecret: ''
  gpgHome: /data/git/.gnupg
  privateKey: ''
statefulset:
  annotations: {}
  env: []
  labels: {}
  terminationGracePeriodSeconds: 60
test:
  enabled: true
  image:
    name: busybox
    tag: latest
tolerations: []
```


@@ -0,0 +1,194 @@
affinity: {}
cronjob:
enabled: false
lifecycle: {}
resources: {}
securityContext: {}
deploymentAnnotations: {}
deploymentLabels: {}
externalDatabase:
database: nextcloud
enabled: true
existingSecret:
enabled: false
host: cluster-nextcloud-postgresql
password: SecurePasswordGoesHere
type: postgresql
user: nextcloud
fullnameOverride: ''
hpa:
cputhreshold: 60
enabled: false
maxPods: 10
minPods: 1
image:
pullPolicy: IfNotPresent
repository: nextcloud
ingress:
annotations: {}
enabled: false
labels: {}
path: /
pathType: Prefix
internalDatabase:
enabled: false
name: nextcloud
lifecycle: {}
livenessProbe:
enabled: true
failureThreshold: 3
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
mariadb:
architecture: standalone
auth:
database: nextcloud
password: changeme
username: nextcloud
enabled: false
primary:
persistence:
accessMode: ReadWriteOnce
enabled: false
size: 8Gi
metrics:
enabled: false
https: false
image:
pullPolicy: IfNotPresent
repository: xperimental/nextcloud-exporter
tag: 0.6.0
replicaCount: 1
service:
annotations:
prometheus.io/port: '9205'
prometheus.io/scrape: 'true'
labels: {}
type: ClusterIP
serviceMonitor:
enabled: false
interval: 30s
jobLabel: ''
labels: {}
namespace: ''
scrapeTimeout: ''
timeout: 5s
tlsSkipVerify: false
token: ''
nameOverride: ''
nextcloud:
configs: {}
datadir: /var/www/html/data
defaultConfigs:
.htaccess: true
apache-pretty-urls.config.php: true
apcu.config.php: true
apps.config.php: true
autoconfig.php: true
redis.config.php: true
smtp.config.php: true
existingSecret:
enabled: false
extraEnv: null
extraInitContainers: []
extraSidecarContainers: []
extraVolumeMounts: null
extraVolumes: null
host: storage.cyberstrawberry.net
mail:
domain: domain.com
enabled: false
fromAddress: user
smtp:
authtype: LOGIN
host: domain.com
name: user
password: pass
port: 465
secure: ssl
password: SUPER-SECRET-PASSWORD-FOR-ADMIN
persistence:
subPath: null
phpConfigs: {}
podSecurityContext: {}
securityContext: {}
strategy:
type: Recreate
update: 0
username: Nicole
nginx:
config:
default: true
enabled: false
image:
pullPolicy: IfNotPresent
repository: nginx
tag: alpine
resources: {}
securityContext: {}
nodeSelector: {}
persistence:
accessMode: ReadWriteOnce
annotations: {}
enabled: true
nextcloudData:
accessMode: ReadWriteOnce
annotations: {}
enabled: true
size: 800Gi
subPath: null
size: 16Gi
phpClientHttpsFix:
enabled: true
protocol: https
podAnnotations: {}
postgresql:
enabled: true
global:
postgresql:
auth:
database: nextcloud
password: SUPER-SECRET-PASSWORD-FOR-DB
username: nextcloud
primary:
persistence:
enabled: true
rbac:
enabled: false
serviceaccount:
annotations: {}
create: true
name: nextcloud-serviceaccount
readinessProbe:
enabled: true
failureThreshold: 3
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
redis:
auth:
enabled: true
password: changeme
enabled: false
replicaCount: 1
resources: {}
securityContext: {}
service:
loadBalancerIP: nil
nodePort: nil
port: 8080
type: ClusterIP
startupProbe:
enabled: false
failureThreshold: 30
initialDelaySeconds: 30
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
tolerations: []
global:
cattle:
systemProjectId: p-78f96


@@ -0,0 +1,268 @@
# Migrating `docker-compose.yml` to Rancher RKE2 Cluster
You may be comfortable operating with Portainer or `docker-compose`, but there comes a point where you might want to migrate those existing workloads to a Kubernetes cluster as easily as possible. Luckily, there is a way to do this using a tool called "**Kompose**". Follow the instructions seen below to convert and deploy your existing `docker-compose.yml` into a Kubernetes cluster such as Rancher RKE2.
!!! info "RKE2 Cluster Deployment"
This document assumes that you have an existing Rancher RKE2 cluster deployed. If not, you can deploy one following the [Deploy RKE2 Cluster](./deployment/rancher-rke2.md) documentation.
We also assume that the cluster name within Rancher RKE2 is named `local`, which is the default cluster name when setting up a Kubernetes Cluster in the way seen in the above documentation.
## Installing Kompose
The first step involves downloading Kompose from https://kompose.io/installation. Once you have it downloaded and installed onto your environment of choice, save a copy of your `docker-compose.yml` file somewhere on-disk, then open up a terminal and run the following command:
```sh
kompose --file docker-compose.yml convert --stdout > ntfy-k8s.yaml
```
This will attempt to convert the `docker-compose.yml` file into a Kubernetes manifest YAML file. A before-and-after example can be seen below:
=== "(Original) docker-compose.yml"
``` yaml
version: "2.1"
services:
ntfy:
image: binwiederhier/ntfy
container_name: ntfy
command:
- serve
environment:
- NTFY_ATTACHMENT_CACHE_DIR=/var/lib/ntfy/attachments
- NTFY_BASE_URL=https://ntfy.bunny-lab.io
- TZ=America/Denver # optional: Change to your desired timezone
#user: UID:GID # optional: Set custom user/group or uid/gid
volumes:
- /srv/containers/ntfy/cache:/var/cache/ntfy
- /srv/containers/ntfy/etc:/etc/ntfy
ports:
- 80:80
restart: always
networks:
docker_network:
ipv4_address: 192.168.5.45
networks:
default:
external:
name: docker_network
docker_network:
external: true
```
=== "(Converted) ntfy-k8s.yaml"
``` yaml
---
apiVersion: v1
kind: Service
metadata:
annotations:
kompose.cmd: C:\ProgramData\chocolatey\lib\kubernetes-kompose\tools\kompose.exe --file ntfy-k8s.yaml convert --stdout
kompose.version: 1.37.0 (fb0539e64)
labels:
io.kompose.service: ntfy
name: ntfy
spec:
ports:
- name: "80"
port: 80
targetPort: 80
selector:
io.kompose.service: ntfy
---
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
kompose.cmd: C:\ProgramData\chocolatey\lib\kubernetes-kompose\tools\kompose.exe --file ntfy-k8s.yaml convert --stdout
kompose.version: 1.37.0 (fb0539e64)
labels:
io.kompose.service: ntfy
name: ntfy
spec:
replicas: 1
selector:
matchLabels:
io.kompose.service: ntfy
strategy:
type: Recreate
template:
metadata:
annotations:
kompose.cmd: C:\ProgramData\chocolatey\lib\kubernetes-kompose\tools\kompose.exe --file ntfy-k8s.yaml convert --stdout
kompose.version: 1.37.0 (fb0539e64)
labels:
io.kompose.service: ntfy
spec:
containers:
- args:
- serve
env:
- name: NTFY_ATTACHMENT_CACHE_DIR
value: /var/lib/ntfy/attachments
- name: NTFY_BASE_URL
value: https://ntfy.bunny-lab.io
- name: TZ
value: America/Denver
image: binwiederhier/ntfy
name: ntfy
ports:
- containerPort: 80
protocol: TCP
volumeMounts:
- mountPath: /var/cache/ntfy
name: ntfy-claim0
- mountPath: /etc/ntfy
name: ntfy-claim1
restartPolicy: Always
volumes:
- name: ntfy-claim0
persistentVolumeClaim:
claimName: ntfy-claim0
- name: ntfy-claim1
persistentVolumeClaim:
claimName: ntfy-claim1
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
labels:
io.kompose.service: ntfy-claim0
name: ntfy-claim0
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100Mi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
labels:
io.kompose.service: ntfy-claim1
name: ntfy-claim1
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100Mi
```
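One thing to double-check after conversion: Kompose turns every bind-mounted volume into a `PersistentVolumeClaim` with a default size of `100Mi` (visible in the converted manifest above), which is almost certainly too small for real data. A quick sanity check before importing the manifest, as a sketch; the heredoc stands in for a small excerpt of the converted file, and the filename is an example:

``` sh
# Write a small excerpt of the converted manifest to a scratch file.
# In practice, run the grep against your real ntfy-k8s.yaml instead.
cat > /tmp/ntfy-k8s-excerpt.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ntfy-claim0
spec:
  resources:
    requests:
      storage: 100Mi
EOF
# List every PVC size so undersized defaults can be resized before import.
grep -n 'storage:' /tmp/ntfy-k8s-excerpt.yaml
```

Edit any `storage:` values that are too small before moving on; resizing a PVC after workloads depend on it is more disruptive.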
## Deploy Workload into Rancher RKE2 Cluster
At this point, you need to import the yaml file you created into the Kubernetes cluster. This will occur in four sequential stages:
- Setting up a "**Project**" to logically organize your containers
- Setting up a "**Namespace**" for your container to isolate it from other containers in your Kubernetes cluster
- Importing the YAML file into the aforementioned namespace
- Configuring Ingress to allow external access to the container / service stack.
### Create a Project
The purpose of the project is to logically organize your services together. This can be something like `Home Automation`, `Log Analysis Systems`, `Network Tools`, etc. You can do this by logging into your Rancher RKE2 cluster (e.g. https://rke2-cluster.bunny-lab.io). The Project name is unique to Rancher, used purely for organizational purposes, and does not affect the namespaces / containers in any way.
- Navigate to: **Clusters > `local` > Cluster > Projects/Namespaces > "Create Project"**
- **Name**: <Friendly Name> (e.g. `Home Automation`)
- **Description**: <Useful Description for the Group of Services> (e.g. `Various services that automate things within Bunny Lab`)
- Click the "**Create**" button
### Create a Namespace within the Project
At this point, we need to create a namespace. This basically isolates the networking, credentials, secrets, and storage between the services/stacks. This ensures that if someone exploits one of your services, they will not be able to laterally move into another service within the same Kubernetes cluster.
- Navigate to: **Clusters > `local` > Cluster > Projects/Namespaces > <ProjectName> > "Create Namespace"**
- The namespace should be named based on its operational context, such as `prod-ntfy` or `dev-ntfy`.
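For reference, the namespace created through the UI is equivalent to the following manifest (a sketch; the name is this document's example). Note that Rancher also ties the namespace to its Project via an annotation that it manages itself, which is why creating the namespace through the UI is the simpler path:

``` yaml
apiVersion: v1
kind: Namespace
metadata:
  name: prod-ntfy   # example operational-context name from this document
```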
### Import Converted YAML Manifest into Namespace
At this point, we can now proceed to import the YAML file we generated in the beginning of this document.
- Navigate to: **Clusters > `local` > Cluster > Projects/Namespaces**
- At the top-right of the screen will be an upload / up-arrow button with tooltip text stating "Import YAML" > Click on this button
- Click the "**Read from File**" button
- Navigate to your `ntfy-k8s.yaml` file. (Name will differ from your actual converted file) > then click the "**Open**" button.
- On the top-right of the dialog box will be a "**Default Namespace**" dropdown menu, select the `prod-ntfy` namespace we created earlier.
- Click the blue "**Import**" button at the bottom of the dialog box.
!!! warning "Be Patient"
This part of the process can take a while depending on the container stack and complexity of the service. It has to download container images and deploy them into newly spun-up pods within Kubernetes. Just be patient: click on the `prod-ntfy` namespace and look at the "**Workloads**" tab to see if the "ntfy" service exists and is Active; once it is, you can move on to the next step.
### Configuring Ingress
This final step within Kubernetes itself involves reconfiguring the service to listen via a "NodePort" instead of "ClusterIP". Don't worry, you do not have to meddle with the ports that the container uses; this change happens entirely within Kubernetes and does not modify the original `docker-compose.yml` ports of the container(s) you imported.
- Navigate to: **Clusters > `local` > Service Discovery > Services > ntfy**
- On the top-right, click on the blue "**Show Configuration**" button
- On the bottom-right, click the blue "**Edit Config**" button
- On the bottom-right, click the "**Edit as YAML**" button
- Within the yaml editor, you will see a section named `spec:`, within that section is a subsection named `type:`. You will see a value of `type: ClusterIP` > You want to change that to `type: NodePort`
- On the bottom-right, click the blue "**Save**" button and wait for the process to finish.
- On the new page that appears, click on the `ntfy` service again
- Click on the "**Ports**" tab
- You will see a column of the table labeled "Node Port" with a number in the 30,000s such as `30996`. This will be important for later.
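After saving, the relevant portion of the Service manifest looks roughly like this (a sketch; the `nodePort` value is this document's example and is normally auto-assigned by Kubernetes from the 30000-32767 range):

``` yaml
spec:
  type: NodePort            # changed from ClusterIP
  ports:
    - name: "80"
      port: 80
      targetPort: 80
      nodePort: 30996       # auto-assigned; note this number for the reverse proxy
  selector:
    io.kompose.service: ntfy
```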
!!! success "Verifying Access Before Configuring Reverse Proxy"
At this point, you will want to verify that you can access the service via the cluster node IP addresses such as the examples seen below, all of the cluster nodes should route the traffic to the container's service and will be used for load-balancing later in the reverse proxy configuration file.
- http://192.168.3.69:30996
- http://192.168.3.70:30996
- http://192.168.3.71:30996
- http://192.168.3.72:30996
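The per-node verification above can be scripted. The loop below only builds and prints the `curl` commands (the node IPs and NodePort are this document's examples) so they can be reviewed first; piping the printed output to `sh` would actually run them:

``` sh
# Build the per-node check commands first so they can be reviewed before running.
NODEPORT=30996   # example NodePort from this document
CHECKS=""
for ip in 192.168.3.69 192.168.3.70 192.168.3.71 192.168.3.72; do
  CHECKS="${CHECKS}curl -fsS -m 5 -o /dev/null -w '%{http_code}\n' http://${ip}:${NODEPORT}
"
done
# Review the generated commands; pipe this output to `sh` to execute them.
printf '%s' "$CHECKS"
```

Every node should return `200`; any node that times out indicates a routing or NodePort problem to fix before configuring the reverse proxy.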
## Configuring Reverse Proxy
If you were able to successfully verify access to the service when talking to it directly via one of the cluster node IP addresses with its given NodePort number, then you can proceed to creating a reverse proxy configuration file for the service. This will be very similar to the original `docker-compose.yml` version of the reverse proxy configuration file, but with additional IP addresses to load-balance across the Kubernetes cluster nodes.
!!! info "Section Considerations"
This section of the document does not (*currently*) cover the process of setting up health checks to ensure that the load-balanced server destinations in the reverse proxy are online before redirecting traffic to them. This is on my to-do list of things to implement to further harden the deployment process.
This section also does not cover the process of setting up a reverse proxy. If you want to follow along with this document, you can deploy a Traefik reverse proxy via the [Traefik](../../../services/edge/traefik.md) deployment documentation.
With the above considerations in mind, we just need to make some small changes to the existing Traefik configuration file to ensure that it load-balances across every node of the cluster so that high availability functions as expected.
=== "(Original) ntfy.bunny-lab.io.yml"
``` yaml
http:
routers:
ntfy:
entryPoints:
- websecure
tls:
certResolver: letsencrypt
service: ntfy
rule: Host(`ntfy.bunny-lab.io`)
services:
ntfy:
loadBalancer:
passHostHeader: true
servers:
- url: http://192.168.5.45:80
```
=== "(Updated) ntfy.bunny-lab.io.yml"
``` yaml
http:
routers:
ntfy:
entryPoints:
- websecure
tls:
certResolver: letsencrypt
service: ntfy
rule: Host(`ntfy.bunny-lab.io`)
services:
ntfy:
loadBalancer:
passHostHeader: true
servers:
- url: http://192.168.3.69:30996
- url: http://192.168.3.70:30996
- url: http://192.168.3.71:30996
- url: http://192.168.3.72:30996
```
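Regarding the health-check consideration noted earlier: Traefik's file provider supports an optional `healthCheck` block per load-balanced service, which stops routing to a node whose check fails and resumes once it passes again. A hedged sketch only; the path, interval, and timeout values are assumptions to adapt, not tested settings:

``` yaml
http:
  services:
    ntfy:
      loadBalancer:
        passHostHeader: true
        healthCheck:
          path: /v1/health   # assumption: verify the health endpoint for your ntfy version
          interval: "10s"
          timeout: "3s"
        servers:
          - url: http://192.168.3.69:30996
          - url: http://192.168.3.70:30996
          - url: http://192.168.3.71:30996
          - url: http://192.168.3.72:30996
```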
!!! success "Verify Access via Reverse Proxy"
If everything worked, you should be able to access the service at https://ntfy.bunny-lab.io, and if one of the cluster nodes goes offline, Rancher will automatically migrate the load to another cluster node which will take over the web request.

30
platforms/index.md Normal file

@@ -0,0 +1,30 @@
# Platforms
## Purpose
Virtualization and containerization platforms, cluster builds, and base OS images.
## Includes
- Hypervisors and virtualization stacks
- Kubernetes and Docker foundations
- Base image and cluster provisioning patterns
## New Document Template
````markdown
# <Document Title>
## Purpose
<what this platform doc exists to describe>
!!! info "Assumptions"
- <OS / platform version>
- <privilege assumptions>
## Architectural Overview
<ASCII diagram or concise flow>
## Procedure
```sh
# Commands (grouped and annotated)
```
## Validation
- <command + expected result>
````


@@ -0,0 +1,119 @@
**Purpose**: Deploying a Windows Server Node into the Hyper-V Failover Cluster is an essential part of rebuilding and expanding the backbone of my homelab. The documentation below goes over the process of setting up a bare-metal host from scratch and integrating it into the Hyper-V Failover Cluster.
!!! note "Prerequisites & Assumptions"
This document assumes you have installed and are running a bare-metal Hewlett-Packard Enterprise server with iLO (Integrated Lights Out) with the latest build of **Windows Server 2022 Datacenter (Desktop Experience)**.
This document also assumes that you are adding an additional server node to an existing Hyper-V Failover Cluster. This document does not outline the exact process of setting up a Hyper-V Failover Cluster from-scratch, setting up a domain, DNS server, etc. Those are assumed to already exist in the environment. Your domain controller(s) need to be online and accessible from the Failover Cluster node you are building for things to work correctly.
Download the newest build ISO of Windows Server 2022 at the [Microsoft Evaluation Center](https://go.microsoft.com/fwlink/p/?linkid=2195686&clcid=0x409&culture=en-us&country=us)
### Enable Remote Desktop
Enable remote desktop however you can, but just be sure to disable NLA, see the notes below for details.
!!! warning "Disable NLA (Network Level Authentication)"
Ensure that "Allow Connections only from computers running Remote Desktop with Network Level Authentication" is un-checked. This is important because, in a Hyper-V Failover Cluster, if the domain controller(s) are not running you may be effectively locked out from using Remote Desktop to access the failover cluster's nodes, forcing you to use iLO or a physical console into the server to log in and bootstrap the cluster's Guest VMs online.
This step can be disregarded if the domain controller(s) exist outside of the Hyper-V Failover Cluster.
``` powershell
# Enable Remote Desktop (NLA-Disabled)
Set-ItemProperty -Path "HKLM:\System\CurrentControlSet\Control\Terminal Server" -Name "fDenyTSConnections" -Value 0
Set-ItemProperty -Path "HKLM:\System\CurrentControlSet\Control\Terminal Server\WinStations\RDP-Tcp" -Name "UserAuthentication" -Value 0
Enable-NetFirewallRule -DisplayGroup "Remote Desktop"
```
### Provision Server Roles, Activate, and Domain Join
``` powershell
# Rename the server
Rename-Computer BUNNY-NODE-02
# Install Hyper-V, Failover, and MPIO Server Roles
Install-WindowsFeature -Name Hyper-V, Failover-Clustering, Multipath-IO -IncludeManagementTools
# Change edition of Windows (Then Reboot)
irm https://get.activated.win | iex
# Force activate server (KMS38)
irm https://get.activated.win | iex
# Configure DNS Servers
Get-NetAdapter | Where-Object { $_.Status -eq 'Up' } | ForEach-Object { Set-DnsClientServerAddress -InterfaceIndex $_.InterfaceIndex -ServerAddresses ("192.168.3.25","192.168.3.26") }
# Domain-join the server
Add-Computer BUNNY-LAB.io
# Restart the Server
Restart-Computer
```
## Failover Cluster Configuration
### Configure Cluster SET Networking
!!! note "Disable Embedded Ports"
We want to only use the 10GbE Cluster_SET network for both virtual machines and the virtualization host itself. This ensures that **all** traffic goes through the 10GbE team. Disable all other non-10GbE network adapters.
You will need to start off by configuring a Switch Embedded Teaming (SET) team. This is the backbone that the server will use for all Guest VM traffic as well as remote-desktop access to the server node itself. You will need to rename the network adapters to make management easier.
- Navigate to "Network Connections" then "Change Adapter Options"
* Rename the network adapters with simpler names. e.g. (`Ethernet 1` becomes `Port_1`)
* For the sake of demonstration, assume there are 2 10GbE NICs (`Port_1` and `Port_2`)
``` powershell
# Create Switch Embedded Teaming (SET) team
New-VMSwitch -Name Cluster_SET -NetAdapterName Port_1, Port_2 -EnableEmbeddedTeaming $true
# Disable IPv4 and IPv6 on all other network adapters
Get-NetAdapter | Where-Object { $_.Name -ne "vEthernet (Cluster_SET)" } | ForEach-Object { Set-NetAdapterBinding -Name $_.Name -ComponentID "ms_tcpip" -Enabled $false; Set-NetAdapterBinding -Name $_.Name -ComponentID "ms_tcpip6" -Enabled $false }
# Set IP Address of Cluster_SET for host-access and clustering
New-NetIPAddress -InterfaceAlias "vEthernet (Cluster_SET)" -IPAddress 192.168.3.5 -PrefixLength 24 -DefaultGateway 192.168.3.1
Set-DnsClientServerAddress -InterfaceAlias "vEthernet (Cluster_SET)" -ServerAddresses ("192.168.3.25","192.168.3.26")
```
### Configure iSCSI Initiator to Connect to TrueNAS Core Server
At this point, now that we have verified that the 10GbE NICs can ping their respective iSCSI target server IP addresses, we can add them to the iSCSI Initiator in Server Manager which will allow us to mount the cluster storage for the Hyper-V Failover Cluster.
- Open **Server Manager > MPIO**
* Navigate to the "Discover Multi-Paths" tab
* Check the "Add support for iSCSI devices" checkbox
* Click the "Add" button
- Open **TrueNAS Core Server**
* Navigate to the [TrueNAS Core server](http://192.168.3.3) and add the "Initiator Name" seen on the "Configuration" tab of the iSCSI Initiator on the Virtualization Host to the `Sharing > iSCSI > Initiator Groups` > "iSCSI-Connected Servers"
- Open **iSCSI Initiator**
* Click on the "Discovery" tab
* Click the "Discover Portal" button
* Enter the IP address "192.168.3.3". Leave the port as "3260".
* Example Initiator Name: `iqn.1991-05.com.microsoft:bunny-node-02.bunny-lab.io`
* Click the "Targets" tab to go back to the main page
* Click the "Refresh" button to display available iSCSI Targets
* Click on the first iSCSI Target `iqn.2005-10.org.moon-storage-01.ctl:iscsi-cluster-storage` then click the "Connect" button
* Check the "Enable Multi-Path" checkbox
* Click the "Advanced" button
* Click the "OK" button
* Navigate to "Disk Management" to bring the iSCSI drives "Online" (Don't do anything after this in Disk Management)
## Initialize and Join to Existing Failover-Cluster
### Validate Server is Ready to Join Cluster
Now it is time to set up the Failover Cluster itself so we can join the server to the existing cluster.
- Open **Server Manager**
* Click on the "Tools" dropdown menu
* Click on "Failover Cluster Manager"
* Click the "Validate Configuration" button in the middle of the window that appears
* Click "Next"
* Enter Server Name: `BUNNY-NODE-02.bunny-lab.io`
* Click the "Add" button, then "Next"
* Ensure "Run All Tests (Recommended)" is selected, then click "Next", then click "Next" to start.
### Join Server to Failover Cluster
* On the left-hand side, right-click on the "Failover Cluster Manager" in the tree
* Click on "Connect to Cluster"
* Enter `USAGI-CLUSTER.bunny-lab.io`
* Click "OK"
* Expand "USAGI-CLUSTER.bunny-lab.io" on the left-hand tree
* Right-click on "Nodes"
* Click "Add Node..."
* Click "Next"
* Enter Server Name: `BUNNY-NODE-02.bunny-lab.io`
* Click the "Add" button, then "Next"
* Ensure that "Run Configuration Validation Tests" radio box is checked, then click "Next"
* Validate that the node was successfully added to the Hyper-V Failover Cluster
## Cleanup & Final Touches
Ensure that you run all available Windows Updates before delegating guest VM roles to the new server in the failover cluster. This ensures you are up-to-date before you become reliant on the server for production operations.


@@ -0,0 +1,80 @@
**Purpose**: If you run an environment with multiple Hyper-V Failover Clusters that replicate to one another via a `Hyper-V Replica Broker` role installed on a host within the failover cluster, sometimes a GuestVM will fail to replicate itself to the replica cluster, and in those cases it may not be able to recover on its own. This guide outlines the process of rebuilding replication for such GuestVMs on a one-by-one basis.
!!! note "Assumptions"
This guide assumes you have two Hyper-V Failover Clusters, for the sake of the guide, we will refer to the Production cluster as `CLUSTER-01` and the Replication cluster as `CLUSTER-02`. This guide also assumes that Replication was set up beforehand, and does not include instructions on how to deploy a Replica Broker (at this time).
## Production Cluster - CLUSTER-01
### Locate the GuestVM
You need to start by locating the GuestVM in the Production cluster, CLUSTER-01. You will know you found the VM if the "Replication Health" is either `Unhealthy`, `Warning`, or `Critical`.
### Remove Replication from GuestVM
- Within a node of the Hyper-V: Failover Cluster Manager
- Right-Click the GuestVM
- Navigate to "**Replication > Remove Replication**"
- Confirm the removal by clicking the "**Yes**" button. You will know if it removed replication when the "Replication State" of the GuestVM is `Not enabled`
## Replication Cluster - CLUSTER-02
### Note the storage GUID of the GuestVM in the replication cluster
- Within a node of the replication cluster's Hyper-V: Failover Cluster Manager
- Right-Click the same GuestVM and click "Manage..." `This will open Hyper-V Manager`
- Right-Click the GuestVM and click "Settings..."
- Navigate to "**ISCSI Controller**"
- Click on one of the Virtual Disks attached to the replica VM, and note the full folder path for later. e.g. `C:\ClusterStorage\Volume1\HYPER-V REPLICA\VIRTUAL HARD DISKS\020C9A30-EB02-41F3-8D8B-3561C4521182`
!!! warning "Noting the GUID of the GuestVM"
You need to note the folder location so you have the GUID. Without the GUID, cleaning up the old storage associated with the GuestVM replica files will be much more difficult / time-consuming. Note it down somewhere safe, and reference it later in this guide.
### Delete the GuestVM from the Replication Cluster
Now that you have noted the GUID of the storage folder of the GuestVM, we can safely move onto removing the GuestVM from the replication cluster.
- Within a node of the replication cluster's Hyper-V: Failover Cluster Manager
- Right-Click the GuestVM
- Navigate to "**Replication > Remove Replication**"
- Confirm the removal by clicking the "**Yes**" button. You will know if it removed replication when the "Replication State" of the GuestVM is `Not enabled`
- Right-Click the GuestVM (again) `You will see that "Enable Replication" is an option now, indicating it was successfully removed.`
!!! note "Replica Checkpoint Merges"
When you removed replication, there may have been replication checkpoints that automatically try to merge together with a `Merge in Progress` status. Just let it finish before moving forward.
- Within the same node of the replication cluster's Hyper-V: Failover Cluster Manager `Switch back from Hyper-V Manager`
- Right-Click the GuestVM and click "**Remove**"
- Confirm the action by clicking the "**Yes**" button
### Delete the GuestVM manually from Hyper-V Manager on all replication cluster hosts
At this point in time, we need to remove the GuestVM from all of the servers in the cluster. Removing it from the Hyper-V: Failover Cluster did not remove it from the cluster's nodes. We can automate part of this work by opening Hyper-V Manager on the same failover node we have been working on thus far, then connecting the rest of the replication nodes to the same manager so we have one place to manage all of the nodes, avoiding hopping between servers.
- Open Hyper-V Manager
- Right-Click "Hyper-V Manager" on the left-hand navigation menu
- Click "Connect to Server..."
- Type the names of every node in the replication cluster to connect to each of them, repeating the two steps above for every node
- Remove GuestVM from the node it appears on
- On one of the replication cluster nodes, we will see the GuestVM listed, we are going to Right-Click the GuestVM and select "**Delete**"
### Delete the GuestVM's replicated VHDX storage from replication ClusterStorage
Now we need to clean up the storage left behind by the replication cluster.
- Within a node of the replication cluster
- Navigate to `C:\ClusterStorage\Volume1\HYPER-V REPLICA\VIRTUAL HARD DISKS`
- Delete the entire GUID folder noted in the previous steps. `e.g. 020C9A30-EB02-41F3-8D8B-3561C4521182`
## Production Cluster - CLUSTER-01
### Re-Enable Replication on GuestVM in Cluster-01 (Production Cluster)
At this point, we have disabled replication for the GuestVM and cleaned up traces of it in the replication cluster. Now we need to re-enable replication on the GuestVM back in the production cluster.
- Within a node of the production Hyper-V: Failover Cluster Manager
- Right-Click the GuestVM
- Navigate to "**Replication > Enable Replication...**"
- Click "Next"
- For the "**Replica Server**", enter the name of the role of the Hyper-V Replica Broker role in the (replication cluster's) Failover Cluster. `e.g. CLUSTER-02-REPL`, then click "Next"
- Click the "Select Certificate" button, since the Broker was configured with Certificate-based authentication instead of Kerberos (in this example environment). It will prompt you to accept the certificate by clicking "OK". (e.g. `HV Replica Root CA`), then click "Next"
- Make sure every drive you want replicated is checked, then click "Next"
- Replication Frequency: `5 Minutes`, then click "Next"
- Additional Recovery Points: `Maintain only the latest recovery point`, then click "Next"
- Initial Replication Method: `Send initial copy over the network`
- Schedule Initial Replication: `Start replication immediately`
- Click "Next"
- Click "Finish"
!!! success "Replication Enabled"
If everything was successful, you will see a dialog box named "Enable replication for `<GuestVM>`" with a message similar to the following: "Replica virtual machine `<GuestVM>` was successfully created on the specified Replica server `<Node-in-Replication-Cluster>`."
At this point, you can click "Close" to finish the process. Under the GuestVM details, you will see "Replication State": `Initial Replication in Progress`.


@@ -0,0 +1,35 @@
**Purpose**:
You may find that when you try to live-migrate GuestVMs in a Hyper-V environment that is not clustered as a Hyper-V Failover Cluster, you will run into permission issues. One way to work around this is to use CredSSP as the authentication mechanism, which is not ideal but useful in a pinch, or you can use Kerberos-based authentication.
This document will cover both scenarios.
=== "Kerberos Authentication (*Preferred*)"
- Log into a domain controller that both Hyper-V hosts are capable of communicating with
- Open "**Server Manager > Tools > Active Directory Users & Computers**"
- Locate the computer objects representing both of the Hyper-V servers and repeat the steps below for each Hyper-V computer object:
- Right-Click > "**Properties**"
- Click on the "**Delegation**" Tab
- Check the radio button for the option "**Trust this computer for delegation to specified services only.**"
- Ensure that "**Use Kerberos Only**" is checked
- Click on the "**Add**" button
- Click the "**Users or Computers...**" button
- Within the object search field, type in the name of the Hyper-V server you want to delegate access to (this will be the opposite host, e.g. VIRT-NODE-02; repeat these steps later to delegate access for VIRT-NODE-01)
- You will see a list of services that you can allow delegation to, add the following services:
- `cisvc`
- `mcsvc`
- `cifs`
- `Virtual Machine Migration Service`
- `Microsoft Virtualization Console`
- Click the "**Apply**" button, then click the "**OK**" button to finalize these changes.
- Repeat the above steps for the opposite Hyper-V host. This way both hosts are delegated to each other
- e.g. `VIRT-NODE-01 <---(delegation)---> VIRT-NODE-02`
=== "CredSSP Authentication"
- Log into both Hyper-V Hosts as the same administrative user, preferably a domain administrator
- From the Hyper-V host currently running the GuestVM that needs to be migrated, open Hyper-V Manager and right-click > "**Move**" the guestVM.
- Select the destination by providing the fully-qualified domain name of the destination server (or in some cases the shorthand hostname of the destination server)
- It should begin the migration process.
**Note**: Do not perform a "Pull" from source to the destination. You want to always "Push" the VM to its destination. It will generally fail if you try to "Pull" the VM to its destination due to the way that CredSSP works in this context.

!!! warning "Document Under Construction"
This document is very unfinished and should **NOT** be followed by anyone for deployment at this time.
**Purpose**: Deploying OpenStack via Ansible.
## Required Hardware/Infrastructure Breakdown
Every node in the OpenStack environment (including the deployment node) will be running Rocky Linux 9.5, as OpenStack Ansible only supports CentOS/RHEL/Rocky for its deployment.
| **Hostname** | **IP** | **Storage** | **Memory** | **CPU** | **Network** | **Purpose** |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| OPENSTACK-BOOTSTRAPPER | 192.168.3.46 (eth0) | 32GB (OS) | 4GB | 4-Cores | eth0 | OpenStack Ansible Playbook Deployment Node |
| OPENSTACK-NODE-01 | 192.168.3.43 (eth0) | 250GB (OS), 500GB (Ceph Storage) | 32GB | 16-Cores | eth0, eth1 | OpenStack Cluster/Target Node |
| OPENSTACK-NODE-02 | 192.168.3.44 (eth0) | 250GB (OS), 500GB (Ceph Storage) | 32GB | 16-Cores | eth0, eth1 | OpenStack Cluster/Target Node |
| OPENSTACK-NODE-03 | 192.168.3.45 (eth0) | 250GB (OS), 500GB (Ceph Storage) | 32GB | 16-Cores | eth0, eth1 | OpenStack Cluster/Target Node |
## Configure Hard-Coded DNS for Cluster Nodes
We want to ensure everything works even if the nodes have no internet access. Hard-coding the FQDNs protects us against several avoidable failure scenarios, such as a DNS outage mid-deployment.
Run the following script to add the DNS entries.
```sh
# Make yourself root
sudo su
```
!!! note "Run `sudo su` Separately"
    When I ran `sudo su` and the echo commands below as one block of commands, the changes were not correctly written to the `/etc/hosts` file. Run `sudo su` by itself first, then copy and paste the codeblock below containing the echo lines for each DNS entry.
```sh
# Add the OpenStack node entries to /etc/hosts
echo "192.168.3.43 OPENSTACK-NODE-01.bunny-lab.io OPENSTACK-NODE-01" >> /etc/hosts
echo "192.168.3.44 OPENSTACK-NODE-02.bunny-lab.io OPENSTACK-NODE-02" >> /etc/hosts
echo "192.168.3.45 OPENSTACK-NODE-03.bunny-lab.io OPENSTACK-NODE-03" >> /etc/hosts
```
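If you'd rather sidestep the `sudo su` quirk above entirely, a hedged alternative (node list taken from the table above; the preview-file approach is my own) is to generate the entries into a scratch file first, then append the reviewed file to `/etc/hosts` in a single elevated command:

```shell
# Build the three host entries into a preview file for inspection.
PREVIEW="$(mktemp)"
for entry in \
  "192.168.3.43 OPENSTACK-NODE-01" \
  "192.168.3.44 OPENSTACK-NODE-02" \
  "192.168.3.45 OPENSTACK-NODE-03"
do
  ip="${entry%% *}"; name="${entry#* }"
  printf '%s %s.bunny-lab.io %s\n' "$ip" "$name" "$name" >> "$PREVIEW"
done
cat "$PREVIEW"

# After reviewing the output, append it in one elevated step:
# sudo sh -c "cat '$PREVIEW' >> /etc/hosts"
```

Because only the final append needs root, the heredoc/echo lines never silently lose their redirect privileges.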
### Validate DNS Entries Added
```sh
cat /etc/hosts
```
!!! example "/etc/hosts Example Contents"
When you run `cat /etc/hosts`, you should see output similar to the following:
```ini title="/etc/hosts"
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.3.43 OPENSTACK-NODE-01.bunny-lab.io OPENSTACK-NODE-01
192.168.3.44 OPENSTACK-NODE-02.bunny-lab.io OPENSTACK-NODE-02
192.168.3.45 OPENSTACK-NODE-03.bunny-lab.io OPENSTACK-NODE-03
```
## OpenStack Deployment Node
The "Deployment" node / bootstrapper is responsible for running Ansible playbooks against the cluster nodes that will eventually be running OpenStack. [Original Deployment Node Documentation](https://docs.openstack.org/project-deploy-guide/openstack-ansible/latest/deploymenthost.html)
### Install Necessary Software
```sh
sudo su
dnf upgrade
dnf install -y git chrony openssh-server python3-devel sudo
dnf group install -y "Development Tools"
```
### Configure SSH keys
Ansible uses SSH with public key authentication to connect from the deployment host to the target hosts. Run the following commands to configure this.
!!! warning "Do not run as root"
You want to make sure you run these commands as a normal user. (e.g. `nicole`).
``` sh
# Generate SSH Keys (Private / Public)
ssh-keygen
# Install Public Key on OpenStack Cluster/Target Nodes
ssh-copy-id -i /home/nicole/.ssh/id_rsa.pub nicole@openstack-node-01.bunny-lab.io
ssh-copy-id -i /home/nicole/.ssh/id_rsa.pub nicole@openstack-node-02.bunny-lab.io
ssh-copy-id -i /home/nicole/.ssh/id_rsa.pub nicole@openstack-node-03.bunny-lab.io
# Validate that SSH Authentication Works Successfully on Each Node
ssh nicole@openstack-node-01.bunny-lab.io
ssh nicole@openstack-node-02.bunny-lab.io
ssh nicole@openstack-node-03.bunny-lab.io
```
### Install the source and dependencies
Install the source and dependencies for the deployment host.
```sh
sudo su
git clone -b master https://opendev.org/openstack/openstack-ansible /opt/openstack-ansible
cd /opt/openstack-ansible
bash scripts/bootstrap-ansible.sh
```
### Disable Firewalld
The `firewalld` service is enabled on most CentOS systems by default and its default ruleset prevents OpenStack components from communicating properly. Stop the firewalld service and mask it to prevent it from starting.
```sh
systemctl stop firewalld
systemctl mask firewalld
```
## OpenStack Target Node (1/3)
Now we need to get the cluster/target nodes configured so that OpenStack can be deployed into them via the bootstrapper node later. [Original Target Node Documentation](https://docs.openstack.org/project-deploy-guide/openstack-ansible/latest/targethosts.html)
### Disable SELinux
SELinux enabled is not currently supported in OpenStack-Ansible for CentOS/RHEL due to a lack of maintainers for the feature.
```sh
sudo sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/sysconfig/selinux
```
### Disable Firewalld
The `firewalld` service is enabled on most CentOS systems by default and its default ruleset prevents OpenStack components from communicating properly. Stop the firewalld service and mask it to prevent it from starting.
```sh
systemctl stop firewalld
systemctl mask firewalld
```
### Install Necessary Software
```sh
dnf upgrade
dnf install -y iputils lsof openssh-server sudo tcpdump python3
```
### Reduce Kernel Logging
Reduce the kernel log level by changing the printk value in your sysctls.
```sh
# Note: `sudo echo ... >> /etc/sysctl.conf` would fail, because the redirect
# runs as the unprivileged user; pipe through `sudo tee -a` instead
echo "kernel.printk = 4 1 7 4" | sudo tee -a /etc/sysctl.conf
```
### Configure Local Cinder/Ceph Storage (Optional if using iSCSI)
At this point, we need to configure `/dev/sdb` as the local storage for Cinder.
```sh
pvcreate --metadatasize 2048 /dev/sdb
vgcreate cinder-volumes /dev/sdb
```
!!! failure "`Cannot use /dev/sdb: device is partitioned`"
    You may (in rare cases) see the following error when trying to run `pvcreate --metadatasize 2048 /dev/sdb`. If that happens, just use `lsblk` to identify the correct disk. In my example, we want the 500GB disk located at `/dev/sda`, seen in the example below:
```
[root@openstack-node-02 nicole]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 500G 0 disk
sdb 8:16 0 250G 0 disk
├─sdb1 8:17 0 600M 0 part /boot/efi
├─sdb2 8:18 0 1G 0 part /boot
├─sdb3 8:19 0 15.7G 0 part [SWAP]
└─sdb4 8:20 0 232.7G 0 part /
sr0 11:0 1 1024M 0 rom
```
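The disk-selection problem described in the note above can be scripted. The sketch below is a helper of my own (not part of the official OpenStack docs): it parses `NAME TYPE` pairs in `lsblk -rno NAME,TYPE` format and prints whole disks that own no partitions, which are the usual `pvcreate` candidates.

```shell
# Print disks that have no partitions; any disk that owns a partition is
# removed from the candidate set. (Handles sdXN naming; NVMe "pN" suffixes
# would need extra handling.)
find_unpartitioned() {
  awk '$2 == "disk" { disks[$1] = 1 }
       $2 == "part" { p = $1; sub(/[0-9]+$/, "", p); delete disks[p] }
       END { for (d in disks) print d }'
}

# Example using the lsblk output from the failure note above (prints "sda"):
printf 'sda disk\nsdb disk\nsdb1 part\nsdb2 part\nsdb3 part\nsdb4 part\nsr0 rom\n' \
  | find_unpartitioned
# In practice: lsblk -rno NAME,TYPE | find_unpartitioned
```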
!!! question "End of Current Documentation"
This is the end of where I have currently iterated in my lab and followed-along with the official documentation while generalizing it for my specific lab scenarios. The following link is where I am currently at/stuck and need to revisit at my earliest convenience.
https://docs.openstack.org/project-deploy-guide/openstack-ansible/latest/targethosts.html#configuring-the-network

# OpenStack
OpenStack is an open-source cloud platform that effectively acts as an HA, cluster-friendly virtual machine hypervisor. This particular variant is deployed via Canonical's MicroStack (Sunbeam) tooling using snap. It deploys OpenStack onto a single node, which can later be expanded to additional nodes. You can also use something like OpenShift to deploy a Kubernetes cluster onto OpenStack automatically via its various APIs.
**Reference Documentation**:
- https://discourse.ubuntu.com/t/single-node-guided/35765
- https://microstack.run/docs/single-node-guided
!!! note
This document assumes your bare-metal host server is running Ubuntu 22.04 LTS, has at least 16GB of Memory (**32GB for Multi-Node Deployments**), two network interfaces (one for management, one for remote VM access), 200GB of Disk Space for the root filesystem, another 200GB disk for Ceph distributed storage, and 4 processor cores. See [Single-Node Mode System Requirements](https://ubuntu.com/openstack/install)
!!! note "Assumed Networking on the First Cluster Node"
- **eth0** = 192.168.3.5
- **eth1** = 192.168.5.200
### Update APT then install upgrades
```
sudo apt update && sudo apt upgrade -y && sudo apt install htop ncdu iptables nano -y
```
!!! tip
At this time, it would be a good idea to take a checkpoint/snapshot of the server (if it is a virtual machine). This gives you a starting point to come back to as you troubleshoot inevitable deployment issues.
### Update SNAP then install OpenStack SNAP
```
sudo snap refresh
sudo snap install openstack --channel 2023.1
```
### Install & Configure Dependencies
Sunbeam can generate a script to ensure that the machine has all of the required dependencies installed and is configured correctly for use in MicroStack.
```
sunbeam prepare-node-script | bash -x && newgrp snap_daemon
sudo reboot
```
### Bootstrapping
Deploy the OpenStack cloud using the cluster bootstrap command.
```
sunbeam cluster bootstrap
```
!!! warning
If you get an "Unable to connect to websocket" error, run `sudo snap restart lxd`.
[Known Bug Report](https://bugs.launchpad.net/snap-openstack/+bug/2033400)
!!! note
Management networks shared by hosts = `192.168.3.0/24`
MetalLB address allocation range (supports multiple ranges, comma separated) (10.20.21.10-10.20.21.20): `192.168.3.50-192.168.3.60`
### Cloud Initialization:
- nicole@moon-stack-01:~$ `sunbeam configure --openrc demo-openrc`
- Local or remote access to VMs [local/remote] (local): `remote`
- CIDR of network to use for external networking (10.20.20.0/24): `192.168.5.0/24`
- IP address of default gateway for external network (192.168.5.1):
- Populate OpenStack cloud with demo user, default images, flavors etc [y/n] (y):
- Username to use for access to OpenStack (demo): `nicole`
- Password to use for access to OpenStack (Vb********): `<PASSWORD>`
- Network range to use for project network (192.168.122.0/24):
- List of nameservers guests should use for DNS resolution (192.168.3.11 192.168.3.10):
- Enable ping and SSH access to instances? [y/n] (y):
- Start of IP allocation range for external network (192.168.5.2): `192.168.5.201`
- End of IP allocation range for external network (192.168.5.254): `192.168.5.251`
- Network type for access to external network [flat/vlan] (flat):
- Free network interface that will be configured for external traffic: `eth1`
- WARNING: Interface eth1 is configured. Any configuration will be lost, are you sure you want to continue? [y/n]: y
### Pull Down / Generate the Dashboard URL
```
sunbeam openrc > admin-openrc
sunbeam dashboard-url
```
### Launch a Test VM:
Verify the cloud by launching a VM called test based on the ubuntu image (Ubuntu 22.04 LTS).
```
sunbeam launch ubuntu --name test
```
!!! note "Sample Output"
- Launching an OpenStack instance ...
- Access instance with `ssh -i /home/ubuntu/.config/openstack/sunbeam ubuntu@10.20.20.200`

## Purpose
You may need to deploy many copies of a virtual machine rapidly, and don't want to go through the hassle of setting up everything ad-hoc as the needs arise for each VM workload. Creating a cloud-init template allows you to more rapidly deploy production-ready copies of a template VM (that you create below) into a ProxmoxVE environment.
### Download Image and Import into ProxmoxVE
You will first need to pull down the OS image from Ubuntu's website via CLI, as there is currently no way to do this via the WebUI. Using SSH or the Shell within the WebUI of one of the ProxmoxVE servers, run the following commands to download and import the image into ProxmoxVE.
```sh
# Make a place to keep cloud images
mkdir -p /var/lib/vz/template/images/ubuntu && cd /var/lib/vz/template/images/ubuntu
# Download Ubuntu 24.04 LTS cloud image (amd64, server)
wget -q --show-progress https://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img
# Create a Placeholder VM to Attach Cloud Image
qm create 9000 --name ubuntu-2404-cloud --memory 8192 --cores 8 --net0 virtio,bridge=vmbr0
# Set UEFI (OVMF) + SCSI controller (Cloud images expect UEFI firmware and SCSI disk.)
qm set 9000 --bios ovmf --scsihw virtio-scsi-pci
qm set 9000 --efidisk0 nfs-cluster-storage:0,pre-enrolled-keys=1
# Import the disk into ProxmoxVE
qm importdisk 9000 noble-server-cloudimg-amd64.img nfs-cluster-storage --format qcow2
# Query ProxmoxVE to find out where the volume was created
pvesm list nfs-cluster-storage | grep 9000
# Attach the disk to the placeholder VM
qm set 9000 --scsi0 nfs-cluster-storage:9000/vm-9000-disk-0.qcow2
# Configure Disk to Boot
qm set 9000 --boot c --bootdisk scsi0
```
### Add Cloud-Init Drive & Configure Template Defaults
Now that the Ubuntu cloud image is attached as the VM's primary disk, you need to attach a Cloud-Init drive. This special drive is where Proxmox writes your user data (username, SSH keys, network settings, etc.) at clone time.
```sh
# Add a Cloud-Init drive to the VM
qm set 9000 --ide2 nfs-cluster-storage:cloudinit
# Enable QEMU Guest Agent
qm set 9000 --agent enabled=1
# Set a default Cloud-Init user (replace 'nicole' with your preferred username)
qm set 9000 --ciuser nicole
# Set a default password (this will be resettable per-clone)
qm set 9000 --cipassword 'SuperSecretPassword'
# Set DNS Servers and Search Domain
qm set 9000 --nameserver "1.1.1.1 1.0.0.1"
qm set 9000 --searchdomain bunny-lab.io
# Enable automatic package upgrades within the VM on first boot
qm set 9000 --ciupgrade 1
# Download your infrastructure public SSH key onto the Proxmox node
wget -O /root/infrastructure_id_rsa.pub \
https://git.bunny-lab.io/Infrastructure/LinuxServer_SSH_PublicKey/raw/branch/main/id_rsa.pub
# Tell Proxmox to inject this key via Cloud-Init
qm set 9000 --sshkey /root/infrastructure_id_rsa.pub
# Configure networking to use DHCP by default (this will be overridden at cloning)
qm set 9000 --ipconfig0 ip=dhcp
```
### Setup Packages in VM & Convert to Template
At this point, there are a few things we need to do before we can turn the VM into a template and make clones of it. Boot up the VM we made (ID 9000) and run the following commands to prepare it for becoming a template:
```sh
# Install Updates
sudo apt update && sudo apt upgrade
sudo apt install -y qemu-guest-agent cloud-init
sudo systemctl enable qemu-guest-agent --now
# Magic Stuff Goes Here =============================
# Convert the placeholder VM into a reusable template (ignore chattr errors on NFS storage backends)
qm template 9000
```
### Clone the Template into a New VM
You can now create new VMs instantly from the template we created above.
=== "Via WebUI"
- Log into the ProxmoxVE node where the template was created
- Right-Click the Template > "**Clone**"
- Give the new VM a name
- Set the "Mode" of the clone to "**Full Clone**"
- Navigate to the new GuestVM in ProxmoxVE and click on the "**Cloud-Init**" tab
- Change the "**User**" and "**Password**" fields if you want to change them
- Double-click on the "**IP Config (net0)**" option
- **IPv4/CIDR**: `192.168.3.67/24`
- **Gateway (IPv4)**: `192.168.3.1`
- Click the "**OK**" button
- Start the VM and wait for it to automatically provision itself
=== "Via CLI"
``` sh
# Create a new VM (example: VM 9100) cloned from the template
qm clone 9000 9100 --name ubuntu-2404-test --full
# Optionally, override Cloud-Init settings for this clone:
qm set 9100 --ciuser nicole --cipassword 'AnotherStrongPass'
qm set 9100 --ipconfig0 ip=192.168.3.67/24,gw=192.168.3.1
# Boot the new cloned VM
qm start 9100
```
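If you need several clones at once, the ID/IP bookkeeping from the CLI tab can be scripted. This is a sketch of my own (the IDs, names, and the `192.168.3.x` range are assumptions carried over from the examples above); the `qm` calls are left commented so the arithmetic can be sanity-checked first:

```shell
# Derive sequential VM IDs and static IPs for three clones of template 9000.
base_id=9100
base_ip=67
for i in 0 1 2; do
  vmid=$((base_id + i))
  ip="192.168.3.$((base_ip + i))"
  echo "would clone 9000 -> ${vmid} (ubuntu-2404-${vmid}) at ${ip}/24"
  # qm clone 9000 "$vmid" --name "ubuntu-2404-${vmid}" --full
  # qm set "$vmid" --ipconfig0 "ip=${ip}/24,gw=192.168.3.1"
done
```

Uncomment the `qm` lines once the printed plan looks right for your environment.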
### Configure VM Hostname
At this point, the hostname of the VM will be randomized, and you will probably want to set it statically. You can do that with the following commands after the server has finished starting:
```sh
# Set the new hostname (replace the example name with your own)
sudo hostnamectl set-hostname ubuntu-2404-test
# Keep cloud-init from overwriting the hostname on the next boot
sudo sed -i 's/^preserve_hostname: false/preserve_hostname: true/' /etc/cloud/cloud.cfg
```

**Purpose**: The purpose of this document is to outline common tasks that you may need to run in your cluster to perform various tasks.
## Delete Node from Cluster
Sometimes you may need to delete a node from the cluster if you have re-built it or had issues and needed to destroy it. In these instances, you would run the following command (assuming you have a 3-node quorum in your cluster).
```
pvecm delnode proxmox-node-01
```

## Purpose
This document describes the **end-to-end procedure** for creating a **thick-provisioned iSCSI-backed shared storage target** on **TrueNAS CORE**, and consuming it from a **Proxmox VE cluster** using **shared LVM**.
This approach is intended to:
- Provide SAN-style block semantics
- Enable Proxmox-native snapshot functionality (LVM volume chains)
- Avoid third-party plugins or middleware
- Be fully reproducible via CLI
## Assumptions
- TrueNAS **CORE** (not SCALE)
- ZFS pool already exists and is healthy
- SSH service is enabled on TrueNAS
- Proxmox VE nodes have network connectivity to TrueNAS
- iSCSI traffic is on a reliable, low-latency network (10GbE recommended)
- All VM workloads are drained from at least one Proxmox node for maintenance
!!! note "Proxmox VE Version Context"
This guide assumes **Proxmox VE 9.1.4 (or later)**. Snapshot-as-volume-chain support on shared LVM (e.g., iSCSI) is available and improved, including enhanced handling of vTPM state in offline snapshots.
!!! warning "Important"
`volblocksize` **cannot be changed after zvol creation**. Choose carefully.
## Target Architecture
```
ZFS Pool
└─ Zvol (Thick / Reserved)
└─ iSCSI Extent
└─ Proxmox LVM PV
└─ Shared VG
└─ VM Disks
```
## Create a Dedicated Zvol for Proxmox
### Variables
Adjust as needed before execution.
```sh
POOL_NAME="CLUSTER-STORAGE"
ZVOL_NAME="iscsi-storage"
ZVOL_SIZE="14T"
VOLBLOCKSIZE="16K"
```
### Create the Zvol (Thick-Provisioned)
```sh
zfs create -V ${ZVOL_SIZE} \
-o volblocksize=${VOLBLOCKSIZE} \
-o compression=lz4 \
-o refreservation=${ZVOL_SIZE} \
${POOL_NAME}/${ZVOL_NAME}
```
!!! note
The `refreservation` enforces **true thick provisioning** and prevents overcommit.
## Configure iSCSI Target (TrueNAS CORE)
This section uses a **hybrid approach**:
- **CLI** is used for ZFS and LUN (extent backing) creation
- **TrueNAS GUI** is used for iSCSI portal, target, and association
- **CLI** is used again for validation
### Enable iSCSI Service
```sh
service ctld start
sysrc ctld_enable=YES
```
### Create the iSCSI LUN Backing (CLI)
This step creates the **actual block-backed LUN** that will be exported via iSCSI.
```sh
# Sanity check: confirm the backing zvol exists
ls -l /dev/zvol/${POOL_NAME}/${ZVOL_NAME}
# Create CTL LUN backed by the zvol
ctladm create -b block \
-o file=/dev/zvol/${POOL_NAME}/${ZVOL_NAME} \
-S ISCSI-STORAGE \
-d ISCSI-STORAGE
```
### Verify the LUN is real and correctly sized
```sh
ctladm devlist -v
```
!!! tip
`Size (Blocks)` must be **non-zero** and match the zvol size. If it is `0`, stop and correct before proceeding.
### Configure iSCSI Portal, Target, and Extent Association (CLI Only)
!!! warning "Do NOT Use the TrueNAS iSCSI GUI"
**Once you choose a CLI-managed iSCSI configuration, the TrueNAS Web UI must never be used for iSCSI.**
Opening or modifying **Sharing → Block Shares (iSCSI)** in the GUI will **overwrite CTL runtime state**, invalidate manual `ctladm` configuration, and result in targets that appear correct but expose **no LUNs** to initiators.
**This configuration is CLI-owned and CLI-managed.**
- Do **not** add, edit, or view iSCSI objects in the GUI
- Do **not** use the iSCSI wizard
- Do **not** mix GUI extents with CLI-created LUNs
#### Create iSCSI Portal (Listen on All Interfaces)
```sh
# Backup any existing ctl.conf
cp -av /etc/ctl.conf /etc/ctl.conf.$(date +%Y%m%d-%H%M%S).bak 2>/dev/null || true
# Write a clean /etc/ctl.conf
cat > /etc/ctl.conf <<'EOF'
# --- Bunny Lab: Proxmox iSCSI (CLI-only) ---
auth-group "no-auth" {
auth-type none
initiator-name "iqn.1993-08.org.debian:01:5b963dd51f93" # cluster-node-01 ("cat /etc/iscsi/initiatorname.iscsi")
initiator-name "iqn.1993-08.org.debian:01:1b4df0fa3540" # cluster-node-02 ("cat /etc/iscsi/initiatorname.iscsi")
initiator-name "iqn.1993-08.org.debian:01:5669aa2d89a2" # cluster-node-03 ("cat /etc/iscsi/initiatorname.iscsi")
}
# Listen on all interfaces on the default iSCSI port
portal-group "pg0" {
listen 0.0.0.0:3260
discovery-auth-group "no-auth"
}
# Create a target IQN
target "iqn.2026-01.io.bunny-lab:storage" {
portal-group "pg0"
auth-group "no-auth"
# Export LUN 0 backed by the zvol device
lun 0 {
path /dev/zvol/CLUSTER-STORAGE/iscsi-storage
serial "ISCSI-STORAGE"
device-id "ISCSI-STORAGE"
}
}
EOF
# Restart ctld to apply the configuration file
service ctld restart
# Verify the iSCSI listener is actually up
sockstat -4l | grep ':3260'
# Verify CTL now shows an iSCSI frontend
ctladm portlist -v | egrep -i '(^Port|iscsi|listen=)'
```
!!! success
At this point, the iSCSI target is live and correctly exposing a block device to initiators. You may now proceed to **Connect from ProxmoxVE Nodes** section.
## Connect from ProxmoxVE Nodes
Perform the following **on each Proxmox node**.
```sh
# Install iSCSI Utilities
apt update
apt install -y open-iscsi lvm2
# Discover Target
iscsiadm -m discovery -t sendtargets -p <TRUENAS_IP>
# Log In
iscsiadm -m node --login
# Show Session Details (confirms the login and lists attached devices)
iscsiadm -m session -P 3
# Verify Device
# If everything works successfully, you should see something like "sdi 8:128 0 8T 0 disk".
lsblk
```
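Device letters like `sdX` can shift between boots and differ between nodes. A hedged helper (the function and its naming are my own, not a Proxmox tool) resolves the device through its stable `/dev/disk/by-path` link, keyed on the target IQN defined in `/etc/ctl.conf` earlier:

```shell
# Print the real block device behind any by-path link whose name contains
# the given iSCSI target IQN.
resolve_iscsi_dev() {
  dir="$1"; iqn="$2"
  for link in "$dir"/*"$iqn"*; do
    [ -e "$link" ] && readlink -f "$link"
  done
}

# On a Proxmox node, use it like this:
# resolve_iscsi_dev /dev/disk/by-path "iqn.2026-01.io.bunny-lab:storage"
```

Using the resolved path (instead of guessing `sdX`) makes the later `pvcreate`/`pvresize` steps safer to script.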
## Create Shared LVM (Execute on One Node Only)
!!! warning "Important"
**Only run LVM creation on ONE node**. All other nodes will only scan.
```sh
# Initialize Physical Volume
pvcreate /dev/sdX
# Create Volume Group
vgcreate vg_proxmox_iscsi /dev/sdX
```
## Register Storage in Proxmox
### Rescan LVM (Other Nodes)
```sh
pvscan
vgscan
```
### Add Storage (GUI)
**Datacenter → Storage → Add → LVM**
- ID: `iscsi-cluster-lvm`
- Volume Group: `vg_proxmox_iscsi`
- Content: `Disk image, Container`
- Shared: ✔️
- Allow Snapshots as Volume-Chain: ✔️
## Validation
- Snapshot create / revert / delete
- Live migration between nodes
- PBS backup and restore test
!!! success
If all validation tests pass, the storage is production-ready.
## Expanding iSCSI Storage (No Downtime)
If you need to expand the storage space of the newly-created iSCSI LUN, you can run the ZFS commands seen below on the TrueNAS Core server. The first command increases the size, and the second command pre-allocates the space (thick-provisioned).
!!! warning "ProxmoxVE Cluster-specific Notes"
- `pvresize` must be executed on **exactly one** ProxmoxVE node.
- All other nodes should only perform `pvscan` / `vgscan` after the resize.
- Running `pvresize` on multiple nodes can corrupt shared LVM metadata.
```sh
# Expand Zvol (TrueNAS)
zfs set volsize=16T CLUSTER-STORAGE/iscsi-storage
zfs set refreservation=16T CLUSTER-STORAGE/iscsi-storage
service ctld restart
# Rescan the block device on all ProxmoxVE nodes
echo 1 > /sys/class/block/sdX/device/rescan
# Verify on all nodes that the new size is displayed
lsblk /dev/sdX
# Run this on only ONE of the ProxmoxVE nodes.
pvresize /dev/sdX
# Rescan on the other nodes that you did not run the pvresize command on. They will now see the expanded free space.
pvscan
vgscan
```

## Purpose
Sometimes in some very specific situations, you will find that an LVM / VG just won't come online in ProxmoxVE. If this happens, you can run the following commands (and replace the placeholder location) to manually bring the storage online.
```sh
lvchange -an local-vm-storage/local-vm-storage
lvchange -an local-vm-storage/local-vm-storage_tmeta
lvchange -an local-vm-storage/local-vm-storage_tdata
vgchange -ay local-vm-storage
```
!!! info "Be Patient"
It can take some time for everything to come online.
!!! success
If you see something like this: `6 logical volume(s) in volume group "local-vm-storage" now active`, then you successfully brought the volume online.

## Purpose
There are a few steps you have to take when upgrading ProxmoxVE from 8.4.1+ to 9.0+. The process is fairly straightforward, so just follow the instructions seen below.
!!! info "GuestVM Assumptions"
It is assumed that if you are running a ProxmoxVE cluster, you will migrate all GuestVMs to another cluster node. If this is a standalone ProxmoxVE server, you will shut down all GuestVMs safely before proceeding.
!!! warning "Perform `pve8to9` Readiness Check"
It's critical that you run the `pve8to9` command to ensure that your ProxmoxVE server meets all of the requirements and doesn't have any failures or potentially server-breaking warnings. If the `pve8to9` command is unknown, then run `apt update && apt dist-upgrade` in the shell then try again. Warnings should be addressed ad-hoc, but *CPU Microcode warnings can be safely ignored*.
**Example pve8to9 Summary Output**:
```sh
= SUMMARY =
TOTAL: 48
PASSED: 39
SKIPPED: 8
WARNINGS: 1
FAILURES: 0
```
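If you script the readiness check, the summary above is easy to gate on. The helper below is a sketch of my own (not a `pve8to9` feature): it exits non-zero whenever the `FAILURES` count is anything but 0.

```shell
# Exit 0 only when the pve8to9 summary reports zero failures.
check_pve8to9() {
  awk -F': *' '/FAILURES:/ { fail = $2 } END { exit (fail + 0 > 0) }'
}

# Example against the summary shown above (prints "safe to proceed"):
printf 'TOTAL: 48\nPASSED: 39\nSKIPPED: 8\nWARNINGS: 1\nFAILURES: 0\n' \
  | check_pve8to9 && echo "safe to proceed"
# In practice: pve8to9 | tee /root/pve8to9.log | check_pve8to9 || exit 1
```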
### Update Repositories from `bookworm` to `trixie`
```sh
sed -i 's/bookworm/trixie/g' /etc/apt/sources.list
sed -i 's/bookworm/trixie/g' /etc/apt/sources.list.d/pve-install-repo.list
apt update
```
### Upgrade to ProxmoxVE 9.0
!!! warning "Run Upgrade Commands in iLO/iDRAC/IPMI"
    At this point, it's very likely that an SSH session would be unexpectedly terminated during the upgrade, so you absolutely want to use a local or remote console to the server to run the commands below; this ensures you maintain access to the console and lets you see whether any issues arise during POST after the reboot.
```sh
apt dist-upgrade -y
reboot
```
!!! note "Disable `pve-enterprise` Repository"
At this point, the ProxmoxVE server should be running on v9.0+, you will want to disable the `pve-enterprise` repository as it will goof up future updates if you don't disable it.

## Initial Installation / Configuration
Proxmox Virtual Environment is an open source server virtualization management solution based on QEMU/KVM and LXC. You can manage virtual machines, containers, highly available clusters, storage and networks with an integrated, easy-to-use web interface or via CLI.
!!! note
This document assumes you have a storage server that hosts both ISO files via CIFS/SMB share, and has the ability to set up an iSCSI LUN (VM & Container storage). This document assumes that you are using a TrueNAS Core server to host both of these services.
### Create the first Node
You will need to download the [Proxmox VE 8.1 ISO Installer](https://www.proxmox.com/en/downloads) from the Official Proxmox Website. Once it is downloaded, you can use [Balena Etcher](https://etcher.balena.io/#download-etcher) or [Rufus](https://rufus.ie/en/) to deploy Proxmox onto a server.
!!! warning
If you are virtualizing Proxmox under a Hyper-V environment, you will need to follow the [Official Documentation](https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-guide/enable-nested-virtualization) to ensure that nested virtualization is enabled. An example is listed below:
```
Set-VMProcessor -VMName <VMName> -ExposeVirtualizationExtensions $true # (1)
Get-VMNetworkAdapter -VMName <VMName> | Set-VMNetworkAdapter -MacAddressSpoofing On # (2)
```
1. This tells Hyper-V to allow the GuestVM to behave as a hypervisor, nested under Hyper-V, allowing the virtualization functionality of the Hypervisor's CPU to be passed-through to the GuestVM.
2. This tells Hyper-V to allow your GuestVM to have multiple nested virtual machines with their own independent MAC addresses. This is useful when using nested Virtual Machines, but is also a requirement when you set up a [Docker Network](../../../networking/docker-networking/docker-networking.md) leveraging MACVLAN technology.
### Networking
You will need to set a static IP address, in this case, it will be an address within the 20GbE network. You will be prompted to enter these during the ProxmoxVE installation. Be sure to set the hostname to something that matches the following FQDN: `proxmox-node-01.MOONGATE.local`.
| Hostname | IP Address | Subnet Mask | Gateway | DNS Server | iSCSI Portal IP |
| --------------- | --------------- | ------------------- | ------- | ---------- | ----------------- |
| proxmox-node-01 | 192.168.101.200 | 255.255.255.0 (/24) | None | 1.1.1.1 | 192.168.101.100 |
| proxmox-node-01 | 192.168.103.200 | 255.255.255.0 (/24) | None | 1.1.1.1 | 192.168.103.100 |
| proxmox-node-02 | 192.168.102.200 | 255.255.255.0 (/24) | None | 1.1.1.1 | 192.168.102.100 |
| proxmox-node-02 | 192.168.104.200 | 255.255.255.0 (/24) | None | 1.1.1.1 | 192.168.104.100 |
### iSCSI Initator Configuration
You will need to add the iSCSI initiator from the proxmox node to the allowed initiator list in TrueNAS Core under "**Sharing > Block Shares (iSCSI) > Initiators Groups**"
In this instance, we will reference Group ID: `2`. We need to add the iniator to the "**Allowed Initiators (IQN)**" section. This also includes the following networks that are allowed to connect to the iSCSI portal:
- `192.168.101.0/24`
- `192.168.102.0/24`
- `192.168.103.0/24`
- `192.168.104.0/24`
To get the iSCSI Initiator IQN of the current Proxmox node, you need to navigate to the Proxmox server's webUI, typically located at `https://<IP>:8006` then log in with username `root` and whatever you set the password to during initial setup when the ISO image was mounted earlier.
- On the left-hand side, click on the name of the server node (e.g. `proxmox-node-01` or `proxmox-node-02`)
- Click on "**Shell**" to open a CLI to the server
- Run the following command to get the iSCSI Initiator (IQN) name to give to TrueNAS Core for the previously-mentioned steps:
``` sh
cat /etc/iscsi/initiatorname.iscsi | grep "InitiatorName=" | sed 's/InitiatorName=//'
```
!!! example
Output of this command will look something like `iqn.1993-08.org.debian:01:b16b0ff1778`.
## Disable Enterprise Subscription functionality
You will likely not be paying for / using the enterprise subscription, so we are going to disable that functionality and enable unstable builds. The unstable builds are surprisingly stable, and should not cause you any issues.
Add Unstable Update Repository:
```jsx title="/etc/apt/sources.list"
# Add to the end of the file
# Non-Production / Unstable Updates
deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription
```
!!! warning
Please note the reference to `bookworm` in both the sections above and below this notice, this may be different depending on the version of ProxmoxVE you are deploying. Please reference the version indicated by the rest of the entries in the sources.list file to know which one to use in the added line section.
Comment-Out Enterprise Repository:
```jsx title="/etc/apt/sources.list.d/pve-enterprise.list"
# deb https://enterprise.proxmox.com/debian/pve bookworm pve-enterprise
```
Pull / Install Available Updates:
``` sh
apt-get update
apt dist-upgrade
reboot
```
## NIC Teaming
You will need to set up NIC teaming to configure a LACP LAGG. This will add redundancy and a way for devices outside of the 20GbE backplane to interact with the server.
- Ensure that all of the network interfaces appear as something similar to the following:
```jsx title="/etc/network/interfaces"
iface eno1 inet manual
iface eno2 inet manual
# etc
```
- Adjust the network interfaces to add a bond:
```jsx title="/etc/network/interfaces"
auto eno1
iface eno1 inet manual
auto eno2
iface eno2 inet manual
auto bond0
iface bond0 inet manual
bond-slaves eno1 eno2
bond-miimon 100
bond-mode 802.3ad
bond-xmit-hash-policy layer2+3
auto vmbr0
iface vmbr0 inet static
address 192.168.0.11/24
gateway 192.168.0.1
bridge-ports bond0
bridge-stp off
bridge-fd 0
# bridge-vlan-aware yes # I do not use VLANs
# bridge-vids 2-4094 # I do not use VLANs (This could be set to any VLANs you want it a member of)
```
!!! warning
Be sure to include both interfaces for the (Dual-Port) 10GbE connections in the network configuration. Final example document will be updated at a later point in time once the production server is operational.
- Reboot the server again to make the networking changes take effect fully. Use iLO / iDRAC / IPMI if you have that functionality on your server in case your configuration goes errant and needs manual intervention / troubleshooting to re-gain SSH control of the proxmox server.
## Generalizing VMs for Cloning / Templating:
These are the commands I run after cloning a Linux machine so that it no longer carries identity information from the machine it was cloned from.
!!! note
    If you use cloud-init-aware OS images as described under Cloud-Init Support at https://pve.proxmox.com/pve-docs/chapter-qm.html, these steps won't be necessary!
```jsx title="Change Hostname"
sudo nano /etc/hostname
```
```jsx title="Change Hosts File"
sudo nano /etc/hosts
```
```jsx title="Reset the Machine ID"
rm -f /etc/machine-id /var/lib/dbus/machine-id
dbus-uuidgen --ensure=/etc/machine-id
dbus-uuidgen --ensure
```
```jsx title="Regenerate SSH Keys"
rm -f /etc/ssh/ssh_host_*
dpkg-reconfigure openssh-server
```
```jsx title="Reboot the Server to Apply Changes"
reboot
```
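The reset steps above can be collected into a single sketch. The root-prefix argument is my own addition, purely so the function can be tried against a scratch directory; on a freshly cloned VM you would run it as root with no argument and then reboot.

``` sh
# Sketch: strip clone-specific identity from a system rooted at "$1"
# (defaults to /). Hostname and hosts-file edits remain manual.
generalize_clone() {
    root="${1:-/}"
    # Reset the machine ID; on the real system, follow up with:
    #   dbus-uuidgen --ensure=/etc/machine-id && dbus-uuidgen --ensure
    rm -f "${root}etc/machine-id" "${root}var/lib/dbus/machine-id"
    # Remove old SSH host keys so unique ones are regenerated
    # (e.g. via `dpkg-reconfigure openssh-server`).
    rm -f "${root}"etc/ssh/ssh_host_*
    echo "Remember to edit ${root}etc/hostname and ${root}etc/hosts, then reboot."
}
```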
## Configure Alerting
Setting up alerts in Proxmox is critical to making sure you are notified if something goes wrong with your servers.
https://technotim.live/posts/proxmox-alerts/
**Purpose**: There is a way to incorporate ProxmoxVE and TrueNAS more deeply using SSH, simplifying the deployment of virtual disks/volumes passed into GuestVMs in ProxmoxVE. Using ZFS over iSCSI will give you the following non-exhaustive list of benefits:
- Automatically make Zvols in a ZFS Storage Pool
- Automatically bind device-based iSCSI Extents/LUNs to the Zvols
- Allow TrueNAS to handle VM snapshots directly
- Simplify the filesystem overhead of using TrueNAS and iSCSI with ProxmoxVE
- Ability to take snapshots of GuestVMs
- Ability to perform live-migrations of GuestVMs between ProxmoxVE cluster nodes
!!! note "Environment Assumptions"
    This document assumes you are running at least 2 ProxmoxVE nodes. For the sake of the example, it will assume they are named `proxmox-node-01` and `proxmox-node-02`. We will also assume you are using TrueNAS Core; TrueNAS SCALE should work in the same way, though there may be minor operational / setup differences between the two deployments of TrueNAS.
Secondly, this guide assumes the ProxmoxVE cluster nodes and TrueNAS server exist on the same network `192.168.101.0/24`.
## ZFS over iSCSI Operational Flow
``` mermaid
sequenceDiagram
participant ProxmoxVE as ProxmoxVE Cluster
participant TrueNAS as TrueNAS Core (inc. iSCSI & ZFS Storage)
ProxmoxVE->>TrueNAS: Cluster VM node connects via SSH to create ZVol for VM
TrueNAS->>TrueNAS: Create ZVol in ZFS storage pool
TrueNAS->>TrueNAS: Bind ZVol to iSCSI LUN
ProxmoxVE->>TrueNAS: Connect to iSCSI & attach ZVol as VM storage
ProxmoxVE->>TrueNAS: (On-Demand) Connect via SSH to create VM snapshot of ZVol
TrueNAS->>TrueNAS: Create Snapshot of ZVol/VM
```
## All ZFS Storage Nodes / TrueNAS Servers
### Configure SSH Key Exchange
You first need to make some changes to the SSHD configuration of the ZFS server(s) storing data for your cluster. This is fairly straightforward and only needs two lines adjusted. This is based on the [Proxmox ZFS over ISCSI](https://pve.proxmox.com/wiki/Legacy:_ZFS_over_iSCSI) documentation. Be sure to restart the SSH service or reboot the storage server after making the changes below, before proceeding to the next steps.
=== "OpenSSH-based OS"
```jsx title="/etc/ssh/sshd_config"
UseDNS no
GSSAPIAuthentication no
```
=== "Solaris-based OS"
```jsx title="/etc/ssh/sshd_config"
LookupClientHostnames no
VerifyReverseMapping no
GSSAPIAuthentication no
```
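Rather than hand-editing `sshd_config` on every storage server, the settings can be applied idempotently. The helper below is a sketch (the function name is mine, and the file path is parameterized only for testing); remember to restart the SSH service afterwards.

``` sh
# Sketch: set (or replace) a key/value pair in an sshd_config-style file.
set_sshd_option() {
    file="$1"; key="$2"; value="$3"
    if grep -q "^${key}[[:space:]]" "$file"; then
        # Replace the existing setting in place.
        sed -i "s/^${key}[[:space:]].*/${key} ${value}/" "$file"
    else
        # Append the setting if it was not present.
        echo "${key} ${value}" >> "$file"
    fi
}

# Example (OpenSSH-based OS):
#   set_sshd_option /etc/ssh/sshd_config UseDNS no
#   set_sshd_option /etc/ssh/sshd_config GSSAPIAuthentication no
#   systemctl restart sshd
```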
## All ProxmoxVE Cluster Nodes
### Configure SSH Key Exchange
The first step is creating SSH trust between the ProxmoxVE cluster nodes and the TrueNAS storage appliance. You will leverage the ProxmoxVE `shell` on every node of the cluster to run the following commands.
**Note**: I name the SSH key after the server's address, `192.168.101.100`, for simplicity so I know which server the identity belongs to. You could also name it something else, like `storage.bunny-lab.io_id_rsa`.
``` sh
mkdir /etc/pve/priv/zfs
ssh-keygen -f /etc/pve/priv/zfs/192.168.101.100_id_rsa # (1)
ssh-copy-id -i /etc/pve/priv/zfs/192.168.101.100_id_rsa.pub root@192.168.101.100 # (2)
ssh -i /etc/pve/priv/zfs/192.168.101.100_id_rsa root@192.168.101.100 # (3)
```
1. Do not set a passphrase. It will break the automatic functionality.
2. Send the SSH key to the TrueNAS server.
3. Connect to the TrueNAS server at least once to finish establishing the connection.
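These commands have to be repeated on every node in the cluster (and for every storage server). The sketch below just prints the commands by default — `DRY_RUN` is my own safety switch, not part of ProxmoxVE — so nothing runs until you set `DRY_RUN=0`.

``` sh
# Sketch: print (or run) the SSH trust-setup commands for one storage host.
setup_zfs_ssh_trust() {
    storage_ip="$1"
    key="/etc/pve/priv/zfs/${storage_ip}_id_rsa"
    run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi; }
    run mkdir -p /etc/pve/priv/zfs
    run ssh-keygen -f "$key"                       # do not set a passphrase
    run ssh-copy-id -i "${key}.pub" "root@${storage_ip}"
    run ssh -i "$key" "root@${storage_ip}" true    # connect once to establish trust
}
```

Usage: run `DRY_RUN=1 setup_zfs_ssh_trust 192.168.101.100` first to review the commands, then rerun with `DRY_RUN=0` on each ProxmoxVE node.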
### Install & Configure Storage Provider
Now you need to install the TrueNAS storage provider plugin on the ProxmoxVE nodes. Run the commands below within a ProxmoxVE shell; when finished, log out of the ProxmoxVE WebUI, clear the browser cache for ProxmoxVE, then log back in. This will add a new storage provider called `FreeNAS-API` under the `ZFS over iSCSI` storage type.
``` sh
keyring_location=/usr/share/keyrings/ksatechnologies-truenas-proxmox-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/ksatechnologies/truenas-proxmox/gpg.284C106104A8CE6D.key' | gpg --dearmor >> ${keyring_location}
#################################################################
cat << EOF > /etc/apt/sources.list.d/ksatechnologies-repo.list
# Source: KSATechnologies
# Site: https://cloudsmith.io
# Repository: KSATechnologies / truenas-proxmox
# Description: TrueNAS plugin for Proxmox VE - Production
deb [signed-by=${keyring_location}] https://dl.cloudsmith.io/public/ksatechnologies/truenas-proxmox/deb/debian any-version main
EOF
#################################################################
apt update
apt install freenas-proxmox
apt full-upgrade
systemctl restart pvedaemon
systemctl restart pveproxy
systemctl restart pvestatd
```
## Primary ProxmoxVE Cluster Node
From this point, we are ready to add the shared storage provider to the cluster via the primary node. Using the primary node is not strictly required; it just simplifies the documentation.
Navigate to **"Datacenter (BUNNY-CLUSTER) > Storage > Add > ZFS over iSCSI"**
| **Field** | **Value** | **Additional Notes** |
| :--- | :--- | :--- |
| ID | `bunny-zfs-over-iscsi` | Friendly Name |
| Portal | `192.168.101.100` | IP Address of iSCSI Portal |
| Pool | `PROXMOX-ZFS-STORAGE` | This is the ZFS Storage Pool you will use to store GuestVM Disks |
| ZFS Block Size | `4k` | |
| Target | `iqn.2005-10.org.moon-storage-01.ctl:proxmox-zfs-storage` | The iSCSI Target |
| Target Group | `<Leave Blank>` | |
| Enable | `<Checked>` | |
| iSCSI Provider | `FreeNAS-API` | |
| Thin-Provision | `<Checked>` | |
| Write Cache | `<Checked>` | |
| API use SSL | `<Unchecked>` | Disabled unless you have SSL Enabled on TrueNAS |
| API Username | `root` | This is the account that is allowed to make ZFS zvols / datasets |
| API IPv4 Host | `192.168.101.100` | iSCSI Portal Address |
| API Password | `<Root Password of TrueNAS Box>` | |
| Nodes | `proxmox-node-01,proxmox-node-02` | All ProxmoxVE Cluster Nodes |
!!! success "Storage is Provisioned"
At this point, the storage should propagate throughout the ProxmoxVE cluster, and appear as a location to deploy virtual machines and/or containers. You can now use this storage for snapshots and live-migrations between ProxmoxVE cluster nodes as well.
**Purpose**: Rancher Harvester is an awesome tool that acts like a self-hosted cloud VDI provider, similar to AWS, Linode, and other online cloud compute platforms. In most scenarios, you will deploy "Rancher" in addition to Harvester to orchestrate the deployment, management, and rolling upgrades of a Kubernetes Cluster. You can also just run standalone Virtual Machines, similar to Hyper-V, RHEV, oVirt, Bhyve, XenServer, XCP-NG, and VMware ESXi.
!!! note "Prerequisites"
    This document assumes your bare-metal host has at least 32GB of Memory, 200GB of Disk Space, and 8 processor cores. See [Recommended System Requirements](https://docs.harvesterhci.io/v1.1/install/requirements)
## First Harvester Node
### Download Installer ISO
You will need to navigate to the Rancher Harvester GitHub to download the [latest ISO release of Harvester](https://releases.rancher.com/harvester/v1.1.2/harvester-v1.1.2-amd64.iso), currently **v1.1.2**, then image it onto a USB flash drive using a tool like [Rufus](https://github.com/pbatard/rufus/releases/download/v4.2/rufus-4.2p.exe). Proceed to boot the bare-metal server from the USB drive to begin the Harvester installation process.
### Begin Setup Process
You will be waiting a few minutes while the server boots from the USB drive, but you will eventually land on a page where it asks you to set up various values to use for networking and the cluster itself.
The values seen below are examples and represent how my homelab is configured.
- **Management Interface(s)**: `eno1,eno2,eno3,eno4`
- **Network Bond Mode**: `Active-Backup`
- **IP Address**: `192.168.3.254/24` *<---- **Note:** Be sure to add CIDR Notation*.
- **Gateway**: `192.168.3.1`
- **DNS Server(s)**: `1.1.1.1,1.0.0.1,8.8.8.8,8.8.4.4`
- **Cluster VIP (Virtual IP)**: `192.168.3.251` *<---- **Note**: See "VIRTUAL IP CONFIGURATION" note below.*
- **Cluster Node Token**: `19-USED-when-JOINING-more-NODES-to-EXISTING-cluster-55`
- **NTP Server(s)**: `0.suse.pool.ntp.org`
!!! warning "Virtual IP Configuration"
    The VIP assigned to the first node in the cluster will act as a proxy to the built-in load-balancing system. It is important that you do not create a second node with the same VIP (this could cause instability in the existing cluster), or use an existing VIP as the Node IP address of a new Harvester cluster node.
!!! tip
    Based on your preference, it would be good to assign the device a static DHCP reservation, or use numbers counting down from **.254** (e.g. `192.168.3.254`, `192.168.3.253`, `192.168.3.252`, etc...)
### Wait for Installation to Complete
The installation process will take quite some time, but when it is finished, the Harvester node will reboot and take you to a splash screen with the Harvester logo, with indicators showing what the VIP and management interface IPs are configured as, and whether the associated systems are operational and ready. **Be patient until both statuses say `READY`**. If after 15 minutes the status has still not changed to `READY` for both fields, see the note below.
!!! warning "Issues with `rancher-harvester-repo` Image"
    During my initial deployment efforts with Harvester v1.1.2, I noticed that the Harvester node never came online. That was because something bugged-out during installation and the `rancher-harvester-repo` image was not properly installed prior to node initialization. This effectively soft-locks the node unless you reinstall it from scratch, as the Docker Hub registry that Harvester looks to in order to finish the deployment no longer exists, and the process depends on the local image bundled with the installer ISO.
    If this happens, you unfortunately need to start over and reinstall Harvester and hope that it works the second time around. No other workarounds are currently known on version 1.1.2.
## Additional Harvester Nodes
If you work in a production environment, you will want more than one Harvester node to allow live-migrations, high-availability, and better load-balancing in the Harvester Cluster. The section below will outline the steps necessary to create additional Harvester nodes, join them to the existing Harvester cluster, and validate that they are functioning without issues.
### Installation Process
Not Documented Yet
### Joining Node to Existing Cluster
Not Documented Yet
## Installing Rancher
If you plan on using Harvester for more than just running Virtual Machines (e.g. Containers), you will want to deploy Rancher inside of the Harvester cluster in order to orchestrate the deployment, management, and rolling upgrades of various forms of Kubernetes clusters (RKE2 suggested). The steps below will go over the process of deploying a high-availability Rancher environment to "adopt" Harvester as a VDI/compute platform for deploying the Kubernetes cluster.
### Provision ControlPlane Node(s) VMs on Harvester
Not Documented Yet
### Adopt Harvester as Cluster Target
Not Documented Yet
### Deploy Production Kubernetes Cluster to Harvester
Not Documented Yet