docs/Ansible AWX Operator.md at d4c2ee5ce2628137d8aa61f9f22c77c2c7bb29e6

Files

Nicole Rappe d4c2ee5ce2 Update Docker & Kubernetes/Servers/AWX/AWX Operator/Ansible AWX Operator.md

2024-08-01 03:04:00 -06:00

15 KiB

Raw Blame History

Purpose: Deploying a Rancher RKE2 Cluster-based Ansible AWX Operator server. This can scale to a larger more enterprise environment if needed.

!!! note Prerequisites This document assumes you are running Ubuntu Server 22.04 or later with at least 16GB of memory, 8 CPU cores, and 64GB of storage.

Deploy Rancher RKE2 Cluster

You will need to deploy a Rancher RKE2 Cluster on an Ubuntu Server-based virtual machine. After this phase, you can focus on the Ansible AWX-specific deployment. A single ControlPlane node is all you need to set up AWX, additional infrastructure can be added after-the-fact.

!!! tip If this is a virtual machine, after deploying the RKE2 cluster and validating it functions, now would be the best time to take a checkpoint / snapshot of the VM before moving forward, in case you need to perform rollbacks of the server(s) if you accidentally misconfigure something during deployment.

Server Configuration

The AWX deployment will consist of 3 yaml files that configure the containers for AWX as well as the NGINX ingress networking-side of things. You will need all of them in the same folder for the deployment to be successful. For the purpose of this example, we will put all of them into a folder located at /awx.

# Make the deployment folder
mkdir -p /awx
cd /awx

We need to increase filesystem access limits: Temporarily Set the Limits Now:

sudo sysctl fs.inotify.max_user_watches=524288
sudo sysctl fs.inotify.max_user_instances=512

Permanently Set the Limits for Later:

# <End of File>
fs.inotify.max_user_watches = 524288
fs.inotify.max_user_instances = 512

Apply the Settings:

sudo sysctl -p

Create AWX Deployment Donfiguration Files

You will need to create these files all in the same directory using the content of the examples below. Be sure to replace values such as the spec.host=awx.bunny-lab.io in the awx-ingress.yml file to a hostname you can point a DNS server / record to.

=== "awx.yml"

```jsx title="/awx/awx.yml"
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
spec:
  service_type: ClusterIP
```

=== "ingress.yml"

```jsx title="/awx/ingress.yml"
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress
spec:
  rules:
  - host: awx.bunny-lab.io
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: awx-service
            port:
              number: 80
```

=== "kustomization.yml"

```jsx title="/awx/kustomization.yml"
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - github.com/ansible/awx-operator/config/default?ref=2.10.0
  - awx.yml
  - ingress.yml
images:
  - name: quay.io/ansible/awx-operator
    newTag: 2.10.0
namespace: awx
```

Ensure the Kubernetes Cluster is Ready

Check that the status of the cluster is ready by running the following commands, it should appear similar to the Rancher RKE2 Example:

export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
kubectl get pods --all-namespaces

Deploy AWX using Kustomize

Now it is time to tell Kubernetes to read the configuration files using Kustomize (built-in to newer versions of Kubernetes) to deploy AWX into the cluster. !!! warning "Be Patient" The AWX deployment process can take a while. Use the commands in the Troubleshooting section if you want to track the progress after running the commands below.

If you get an error that looks like the below, re-run the `kubectl apply -k .` command a second time after waiting about 10 seconds.  The second time the error should be gone.
``` sh
error: resource mapping not found for name: "awx" namespace: "awx" from ".": no matches for kind "AWX" in version "awx.ansible.com/v1beta1"
ensure CRDs are installed first
```

To check on the progress of the deployment, you can run the following command: `kubectl get pods -n awx`
You will know that AWX is ready to be accessed in the next step if the output looks like below:
```
NAME                                               READY   STATUS    RESTARTS        AGE
awx-operator-controller-manager-7b9ccf9d4d-cnwhc   2/2     Running   2 (3m41s ago)   9m41s
awx-postgres-13-0                                  1/1     Running   0               6m12s
awx-task-7b5f8cf98c-rhrpd                          4/4     Running   0               4m46s
awx-web-6dbd7df9f7-kn8k2                           3/3     Running   0               93s
```

cd /awx
kubectl apply -k .

!!! warning "Be Patient - Wait 20 Minutes" The process may take a while to spin up AWX, postgresql, redis, and other workloads necessary for AWX to function. Depending on the speed of the server, it may take between 5 and 20 minutes for AWX to be ready to connect to. You can watch the progress via the CLI commands listed above, or directly on Rancher's WebUI at https://rancher.bunny-lab.io.

Access the AWX WebUI behind Ingress Controller

After you have deployed AWX into the cluster, it will not be immediately accessible to the host's network (such as your personal computer) unless you set up a DNS record pointing to it. In the example above, you would have an A or CNAME DNS record pointing to the internal IP address of the Rancher RKE2 Cluster host.

The RKE2 Cluster will translate awx.bunny-lab.io to the AWX web-service container(s) automatically. SSL certificates are not covered in this documentation, but suffice to say, the can be configured on another reverse proxy such as Traefik or via Cert-Manager / JetStack. The process of setting this up goes outside the scope of this document.

!!! success "Accessing the AWX WebUI" If you have gotten this far, you should now be able to access AWX via the WebUI and log in.

- AWX WebUI: https://awx.bunny-lab.io
![Ansible AWX WebUI](awx.png)
You may see a prompt about "AWX is currently upgrading.  This page will refresh when complete".  Be patient, let it finish.  When it's done, it will take you to a login page.
AWX will generate its own secure password the first time you set up AWX.  Username is `admin`. You can run the following command to retrieve the password:
```
kubectl get secret awx-admin-password -n awx -o jsonpath="{.data.password}" | base64 --decode ; echo
```

Change Admin Password

You will want to change the admin password straight-away. Use the following navigation structure to find where to change the password:

graph LR
    A[AWX Dashboard] --> B[Access]
    B --> C[Users]
    C --> D[admin]
    D --> E[Edit]

Upgrading from 2.10.0 to 2.19.1

There is a known issue with upgrading / install AWX Operator beyond version 2.10.0, because of how the PostgreSQL database upgrades from 13.0 to 15.0, and has changed permissions. The following workflow will help get past that and adjust the permissions in such a way that allows the upgrade to proceed successfully. If this is a clean installation, you can also perform this step if the fresh install of 2.19.1 is not working yet. (It wont work out of the box because of this bug).

Create a Temporary Pod to Adjust Permissions

We need to create a pod that will mount the PostgreSQL PVC, make changes to permissions, then destroy the v15.0 pod to have the AWX Operator automatically regenerate it.

apiVersion: v1
kind: Pod
metadata:
  name: temp-pod
  namespace: awx
spec:
  containers:
  - name: temp-container
    image: busybox
    command: ['sh', '-c', 'sleep 3600']
    volumeMounts:
    - mountPath: /var/lib/pgsql/data
      name: postgres-data
  volumes:
  - name: postgres-data
    persistentVolumeClaim:
      claimName: postgres-15-awx-postgres-15-0
  restartPolicy: Never

# Deploy Temporary Pod
kubectl apply -f /awx/temp-pod.yaml

# Open a Shell in the Temporary Pod
kubectl exec -it temp-pod -n awx -- sh

# Adjust Permissions of the PostgreSQL 15.0 Database Folder
chown -R 26:root /var/lib/pgsql/data
exit

# Delete the Temporary Pod
kubectl delete pod temp-pod -n awx

# Delete the Crashlooped PostgreSQL 15.0 Pod to Regenerate It
kubectl delete pod awx-postgres-15-0 -n awx

# Track the Migration
kubectl get pods -n awx
kubectl logs -n awx awx-postgres-15-0

Troubleshooting

You may wish to want to track the deployment process to verify that it is actually doing something. There are a few Kubernetes commands that can assist with this listed below.

AWX-Manager Deployment Logs

You may want to track the internal logs of the awx-manager container which is responsible for the majority of the automated deployment of AWX. You can do so by running the command below.

kubectl logs -n awx awx-operator-controller-manager-6c58d59d97-qj2n2 -c awx-manager

!!! note The -6c58d59d97-qj2n2 noted at the end of the Kubernetes "Pod" mentioned in the command above is randomized. You will need to change it based on the name shown when running the kubectl get pods -n awx command.

Kerberos Implementation

You may find that you need to be able to remotely control domain-joined Windows devices using Kerberos. You need to go through some extra steps to set this up after you have successfully deployed AWX Operator into Kubernetes.

Configure Windows Devices

You will need to prepare the Windows devices to allow them to be remotely controlled by Ansible playbooks. Run the following powershell script on all of the devices that will be managed by the Ansible AWX environment. WinRM Prerequisite Setup Script

Create Kerberos Keytab File

Add the following file to the /awx folder on the AWX Operator server.

[libdefaults]
    default_realm = BUNNY-LAB.IO
    dns_lookup_realm = false
    dns_lookup_kdc = false

[realms]
    BUNNY-LAB.IO = {
        kdc = 192.168.3.25
        kdc = 192.168.3.26
        admin_server = 192.168.3.25
    }

[domain_realm]
    192.168.3.25 = BUNNY-LAB.IO
    192.168.3.26 = BUNNY-LAB.IO
    .bunny-lab.io = BUNNY-LAB.IO
    bunny-lab.io = BUNNY-LAB.IO

Convert Keytab File into ConfigMap

Run the following command to apply the Kerberos Keytab file as a configmap into the Kubernetes cluster that we will later use AWX to make a custom Execution Environment with.

kubectl -n awx create configmap awx-kerberos-config --from-file=/awx/krb5.conf

Create Custom DNS Host Records for Domain Controllers

You will need to be sure that AWX is able to resolve the FQDNs of the domain controllers for Kerberos to be happy. We will do this by adding another config file in the /awx directory and applying it to the deployment.

apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-dns
  namespace: awx
data:
  custom-hosts: |
    192.168.3.25 LAB-DC-01.bunny-lab.io LAB-DC-01
    192.168.3.26 LAB-DC-02.bunny-lab.io LAB-DC-02

Then we apply them with the following command:

kubectl apply -f custom_dns_records.yml

Create an AWX Container Group

At this point, we need to make a custom pod for the AWX Execution Environments that will use this Kerberos file. Reference information was found here.

Create Container Group with custom pod spec that mounts krb5.conf to allow Kerberos authentication to be used in this new Execution Environment (EE).
Open AWX UI and click on "Instance Groups" under the "Administration" section, then press "Add > Add container group".
Enter a descriptive name as you like (e.g. Kerberos EE) and click the toggle "Customize Pod Specification".
Put the following YAML string in "Custom pod spec" then press the "Save" button

apiVersion: v1
kind: Pod
metadata:
  namespace: awx
spec:
  serviceAccountName: default
  automountServiceAccountToken: false
  initContainers:
    - name: init-hosts
      image: busybox
      command: ['sh', '-c', 'cat /etc/custom-dns/custom-hosts >> /etc/hosts']
      volumeMounts:
        - name: custom-dns
          mountPath: /etc/custom-dns
  containers:
    - image: 'quay.io/ansible/awx-ee:latest'
      name: worker
      args:
        - ansible-runner
        - worker
        - '--private-data-dir=/runner'
      resources:
        requests:
          cpu: 250m
          memory: 100Mi
      volumeMounts:
        - name: awx-kerberos-volume
          mountPath: /etc/krb5.conf
          subPath: krb5.conf
  volumes:
    - name: awx-kerberos-volume
      configMap:
        name: awx-kerberos-config
    - name: custom-dns
      configMap:
        name: custom-dns

!!! info "Explanation" Init Container: An init container named init-hosts is added. It runs before the main container starts and appends the custom DNS entries from the ConfigMap to the /etc/hosts file of the Kerberos Instance Group Pod.

Job Template & Inventory Examples

At this point, you need to adjust your exist Job Template(s) that need to communicate via Kerberos to domain-joined Windows devices to use the "Instance Group" of "Kerberos" while keeping the same Execution Environment you have been using up until this point. This will change the Execution Environment to include the Kerberos Keytab file in the EE at playbook runtime.

Also add the following variable to the job template:

---
kerberos_user: "nicole.rappe@BUNNY-LAB.IO"

You will want to ensure your inventory file is configured to use Kerberos Authentication as well, so the following example is a starting point:

virt-node-01 ansible_host=192.168.3.22

[virtualizationHosts]
virt-node-01

[virtualizationHosts:vars]
ansible_connection=winrm
ansible_port=5986
ansible_winrm_transport=kerberos
ansible_winrm_scheme=https
ansible_winrm_server_cert_validation=ignore
ansible_winrm_kerberos_realm=BUNNY-LAB.IO
#kerberos_user=nicole.rappe@BUNNY-LAB.IO #Optional, if you define this in the Job Template, it is not necessary.

Lastly, we want to ensure we have Keytab generation happening when the playbook is executed, so add these tasks to the beginning of your playbook(s) that interact with Kerberos devices:

    - name: Acquire Kerberos Ticket using Keytab
      ansible.builtin.shell: |
        kinit -kt /etc/krb5.keytab {{ kerberos_user }}        
      environment:
        KRB5_CONFIG: /etc/krb5.conf
      register: kinit_result

    - name: Ensure Kerberos Ticket was Acquired Successfully
      fail:
        msg: "Failed to acquire Kerberos ticket"
      when: kinit_result.rc != 0

15 KiB Raw Blame History