Bump version to 0.9.65

Restore original no-connection behavior
Previously, not specifying a connection when multiple were available would raise an error. This restores that behavior.
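
The restored behavior can be sketched as follows. This is a minimal illustration, not the PVC CLI's actual code: `resolve_connection`, `ConnectionSelectionError`, and the shape of the `connections` mapping are hypothetical names chosen for the example, and the single-connection fallback is an assumption that the commit message does not confirm.

```python
from typing import Mapping, Optional


class ConnectionSelectionError(Exception):
    """Raised when a connection cannot be chosen unambiguously (hypothetical)."""


def resolve_connection(requested: Optional[str], connections: Mapping[str, dict]) -> str:
    """Return the name of the connection to use (hypothetical helper)."""
    if requested is not None:
        if requested not in connections:
            raise ConnectionSelectionError(f"Unknown connection: {requested}")
        return requested
    if len(connections) > 1:
        # Behavior described in the commit message: refuse to guess.
        raise ConnectionSelectionError(
            "Multiple connections are available; specify one explicitly"
        )
    if len(connections) == 1:
        # Assumption: a single configured connection is unambiguous.
        return next(iter(connections))
    raise ConnectionSelectionError("No connections are configured")
```

Raising instead of silently picking a default keeps a command from running against the wrong cluster.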
2023-08-23 01:56:57 -04:00 · 2023-08-23 01:38:50 -04:00 · 2023-08-22 09:26:35 -04:00 · 2023-08-18 12:34:27 -04:00 · 2023-08-18 12:34:27 -04:00 · 2023-08-18 11:58:13 -04:00
77 changed files with 19749 additions and 592 deletions
--- a/.flake8
+++ b/.flake8
@@ -3,10 +3,12 @@
 #   * W503 (line break before binary operator): Black moves these to new lines
 #   * E501 (line too long): Long lines are a fact of life in comment blocks; Black handles active instances of this
 #   * E203 (whitespace before ':'): Black recommends this as disabled
-ignore = W503, E501
+#   * F403 (import * used; unable to detect undefined names): We use a wildcard for helpers
+#   * F405 (possibly undefined name): We use a wildcard for helpers
+ignore = W503, E501, F403, F405
 extend-ignore = E203
 # We exclude the Debian, migrations, and provisioner examples
-exclude = debian,api-daemon/migrations/versions,api-daemon/provisioner/examples
+exclude = debian,api-daemon/migrations/versions,api-daemon/provisioner/examples,node-daemon/monitoring
 # Set the max line length to 88 for Black
 max-line-length = 88

--- a/.version
+++ b/.version
@@ -1 +1 @@
-0.9.60
+0.9.65
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,50 @@
 ## PVC Changelog

+###### [v0.9.65](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.65)
+
+  * [CLI] Fixes a bug in the node list filtering command
+  * [CLI] Fixes a bug/default when no connection is specified
+
+###### [v0.9.64](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.64)
+
+  **Breaking Change [CLI]**: The CLI client root commands have been reorganized. The following commands have changed:
+
+   * `pvc cluster` -> `pvc connection` (all subcommands)
+   * `pvc task` -> `pvc cluster` (all subcommands)
+   * `pvc maintenance` -> `pvc cluster maintenance`
+   * `pvc status` -> `pvc cluster status`
+
+Ensure you have updated to the latest version of the PVC Ansible repository before deploying this version or using PVC Ansible oneshot playbooks for management.
+
+  **Breaking Change [CLI]**: The `--restart` option for VM configuration changes now has an explicit `--no-restart` to disable restarting, or a prompt if neither is specified; `--unsafe` no longer bypasses this prompt which was a bug. Applies to most `vm <cmd> set` commands like `vm vcpu set`, `vm memory set`, etc. All instances also feature restart confirmation afterwards, which, if `--restart` is provided, will prompt for confirmation unless `--yes` or `--unsafe` is specified.
+
+  **Breaking Change [CLI]**: The `--long` option previously on some `info` commands no longer exists; use `-f long`/`--format long` instead.
+
+  * [CLI] Significantly refactors the CLI client code for consistency and cleanliness
+  * [CLI] Implements `-f`/`--format` options for all `list` and `info` commands in a consistent way
+  * [CLI] Changes the behaviour of VM modification options with "--restart" to provide a "--no-restart"; defaults to a prompt if neither is specified and ignores the "--unsafe" global entirely
+  * [API] Fixes several bugs in the 3-debootstrap.py provisioner example script
+  * [Node] Fixes some bugs around VM shutdown on node flush
+  * [Documentation] Adds mentions of Ganeti and Harvester
+
+###### [v0.9.63](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.63)
+
+  * Mentions Ganeti in the docs
+  * Increases API timeout back to 2s
+  * Adds .update-* configs to dpkg plugin
+  * Adds full/nearfull OSD warnings
+  * Improves size value handling for volumes
+
+###### [v0.9.62](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.62)
+
+  * [all] Adds an enhanced health checking, monitoring, and reporting system for nodes and clusters
+  * [cli] Adds a cluster detail command
+
+###### [v0.9.61](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.61)
+
+  * [provisioner] Fixes a bug in network comparison
+  * [api] Fixes a bug being unable to rename disabled VMs
+
 ###### [v0.9.60](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.60)

  * [Provisioner] Cleans up several remaining bugs in the example scripts; they should all be valid now
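
The `--restart`/`--no-restart` rules in the v0.9.64 changelog entry above can be read as a three-state decision. The sketch below is one interpretation under stated assumptions, not the CLI's implementation: `decide_restart`, `skip_confirm`, and `ask` are hypothetical names, and treating `--yes` like `--unsafe` in the "neither flag" case is an assumption (the changelog only says `--unsafe` does not bypass that prompt).

```python
from typing import Callable, Optional


def decide_restart(
    restart: Optional[bool],
    skip_confirm: bool,
    ask: Callable[[str], bool],
) -> bool:
    """Decide whether to restart the VM after a configuration change.

    restart: True for --restart, False for --no-restart, None if neither given.
    skip_confirm: True if --yes or --unsafe was given.
    ask: callable that shows a prompt and returns the user's answer.
    """
    if restart is None:
        # Neither flag given: always prompt; --unsafe does not bypass this.
        return ask("Restart VM to apply the change?")
    if restart is False:
        # --no-restart: never restart, never prompt.
        return False
    # --restart given: confirm unless --yes/--unsafe was supplied.
    if skip_confirm:
        return True
    return ask("Confirm VM restart?")
```

Passing the prompt in as a callable keeps the decision logic testable without a terminal.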
--- a/README.md
+++ b/README.md
@@ -9,7 +9,7 @@

 ## What is PVC?

-PVC is a Linux KVM-based hyperconverged infrastructure (HCI) virtualization cluster solution that is fully Free Software, scalable, redundant, self-healing, self-managing, and designed for administrator simplicity. It is an alternative to other HCI solutions such as Harvester, Nutanix, and VMWare, as well as to other common virtualization stacks such as ProxMox and OpenStack.
+PVC is a Linux KVM-based hyperconverged infrastructure (HCI) virtualization cluster solution that is fully Free Software, scalable, redundant, self-healing, self-managing, and designed for administrator simplicity. It is an alternative to other HCI solutions such as Ganeti, Harvester, Nutanix, and VMWare, as well as to other common virtualization stacks such as ProxMox and OpenStack.

 PVC is a complete HCI solution, built from well-known and well-trusted Free Software tools, to assist an administrator in creating and managing a cluster of servers to run virtual machines, as well as self-managing several important aspects including storage failover, node failure and recovery, virtual machine failure and recovery, and network plumbing. It is designed to act consistently, reliably, and unobtrusively, letting the administrator concentrate on more important things.

--- a/api-daemon/provisioner/examples/script/3-debootstrap.py
+++ b/api-daemon/provisioner/examples/script/3-debootstrap.py
@@ -441,7 +441,7 @@ class VMBuilderScript(VMBuilder):

        # The directory we mounted things on earlier during prepare(); this could very well
        # be exposed as a module-level variable if you so choose
-        temporary_directory = "/tmp/target"
+        temp_dir = "/tmp/target"

        # Use these convenient aliases for later (avoiding lots of "self.vm_data" everywhere)
        vm_name = self.vm_name
@@ -469,6 +469,8 @@ class VMBuilderScript(VMBuilder):
                "grub-pc",
                "cloud-init",
                "python3-cffi-backend",
+                "acpid",
+                "acpi-support-base",
                "wget",
            ]

@@ -482,17 +484,17 @@ class VMBuilderScript(VMBuilder):

        # Perform a debootstrap installation
        print(
-            f"Installing system with debootstrap: debootstrap --include={','.join(deb_packages)} {deb_release} {temporary_directory} {deb_mirror}"
+            f"Installing system with debootstrap: debootstrap --include={','.join(deb_packages)} {deb_release} {temp_dir} {deb_mirror}"
        )
        os.system(
-            f"debootstrap --include={','.join(deb_packages)} {deb_release} {temporary_directory} {deb_mirror}"
+            f"debootstrap --include={','.join(deb_packages)} {deb_release} {temp_dir} {deb_mirror}"
        )

        # Bind mount the devfs so we can grub-install later
-        os.system("mount --bind /dev {}/dev".format(temporary_directory))
+        os.system("mount --bind /dev {}/dev".format(temp_dir))

        # Create an fstab entry for each volume
-        fstab_file = "{}/etc/fstab".format(temporary_directory)
+        fstab_file = "{}/etc/fstab".format(temp_dir)
        # The volume ID starts at zero and increments by one for each volume in the fixed-order
        # volume list. This lets us work around the insanity of Libvirt IDs not matching guest IDs,
        # while still letting us have some semblance of control here without enforcing things
@@ -537,13 +539,13 @@ class VMBuilderScript(VMBuilder):
            volume_id += 1

        # Write the hostname; you could also take an FQDN argument for this as an example
-        hostname_file = "{}/etc/hostname".format(temporary_directory)
+        hostname_file = "{}/etc/hostname".format(temp_dir)
        with open(hostname_file, "w") as fh:
            fh.write("{}".format(vm_name))

        # Fix the cloud-init.target since it's broken by default in Debian 11
        cloudinit_target_file = "{}/etc/systemd/system/cloud-init.target".format(
-            temporary_directory
+            temp_dir
        )
        with open(cloudinit_target_file, "w") as fh:
            # We lose our indent on these raw blocks to preserve the apperance of the files
@@ -557,7 +559,7 @@ After=multi-user.target
            fh.write(data)

        # Write the cloud-init configuration
-        ci_cfg_file = "{}/etc/cloud/cloud.cfg".format(temporary_directory)
+        ci_cfg_file = "{}/etc/cloud/cloud.cfg".format(temp_dir)
        with open(ci_cfg_file, "w") as fh:
            fh.write(
                """
@@ -618,15 +620,15 @@ After=multi-user.target
                     - arches: [default]
                       failsafe:
                         primary: {deb_mirror}
-                """
-            ).format(deb_mirror=deb_mirror)
+                """.format(
+                    deb_mirror=deb_mirror
+                )
+            )

        # Due to device ordering within the Libvirt XML configuration, the first Ethernet interface
        # will always be on PCI bus ID 2, hence the name "ens2".
        # Write a DHCP stanza for ens2
-        ens2_network_file = "{}/etc/network/interfaces.d/ens2".format(
-            temporary_directory
-        )
+        ens2_network_file = "{}/etc/network/interfaces.d/ens2".format(temp_dir)
        with open(ens2_network_file, "w") as fh:
            data = """auto ens2
 iface ens2 inet dhcp
@@ -634,7 +636,7 @@ iface ens2 inet dhcp
            fh.write(data)

        # Write the DHCP config for ens2
-        dhclient_file = "{}/etc/dhcp/dhclient.conf".format(temporary_directory)
+        dhclient_file = "{}/etc/dhcp/dhclient.conf".format(temp_dir)
        with open(dhclient_file, "w") as fh:
            # We can use fstrings too, since PVC will always have Python 3.6+, though
            # using format() might be preferable for clarity in some situations
@@ -654,7 +656,7 @@ interface "ens2" {{
            fh.write(data)

        # Write the GRUB configuration
-        grubcfg_file = "{}/etc/default/grub".format(temporary_directory)
+        grubcfg_file = "{}/etc/default/grub".format(temp_dir)
        with open(grubcfg_file, "w") as fh:
            data = """# Written by the PVC provisioner
 GRUB_DEFAULT=0
@@ -671,7 +673,7 @@ GRUB_DISABLE_LINUX_UUID=false
            fh.write(data)

        # Do some tasks inside the chroot using the provided context manager
-        with chroot(temporary_directory):
+        with chroot(temp_dir):
            # Install and update GRUB
            os.system(
                "grub-install --force /dev/rbd/{}/{}_{}".format(
@@ -704,16 +706,17 @@ GRUB_DISABLE_LINUX_UUID=false
        """

        # Run any imports first
+        import os
        from pvcapid.vmbuilder import open_zk
        from pvcapid.Daemon import config
        import daemon_lib.common as pvc_common
        import daemon_lib.ceph as pvc_ceph

-        # Set the tempdir we used in the prepare() and install() steps
+        # Set the temp_dir we used in the prepare() and install() steps
        temp_dir = "/tmp/target"

        # Unmount the bound devfs
-        os.system("umount {}/dev".format(temporary_directory))
+        os.system("umount {}/dev".format(temp_dir))

        # Use this construct for reversing the list, as the normal reverse() messes with the list
        for volume in list(reversed(self.vm_data["volumes"])):
--- a/api-daemon/pvcapid/Daemon.py
+++ b/api-daemon/pvcapid/Daemon.py
@@ -27,7 +27,7 @@ from ssl import SSLContext, TLSVersion
 from distutils.util import strtobool as dustrtobool

 # Daemon version
-version = "0.9.60"
+version = "0.9.65"

 # API version
 API_VERSION = 1.0
--- a/api-daemon/pvcapid/flaskapi.py
+++ b/api-daemon/pvcapid/flaskapi.py
@@ -448,18 +448,48 @@ class API_Status(Resource):
              type: object
              id: ClusterStatus
              properties:
-                health:
+                cluster_health:
+                  type: object
+                  properties:
+                    health:
+                      type: integer
+                      description: The overall health (%) of the cluster
+                      example: 100
+                    messages:
+                      type: array
+                      description: A list of health event strings
+                      items:
+                        type: string
+                        example: "hv1: plugin 'nics': bond0 DEGRADED with 1 active slaves, bond0 OK at 10000 Mbps"
+                node_health:
+                  type: object
+                  properties:
+                    hvX:
+                      type: object
+                      description: A node entry for per-node health details, one per node in the cluster
+                      properties:
+                        health:
+                          type: integer
+                          description: The health (%) of the node
+                          example: 100
+                        messages:
+                          type: array
+                          description: A list of health event strings
+                          items:
+                            type: string
+                            example: "'nics': bond0 DEGRADED with 1 active slaves, bond0 OK at 10000 Mbps"
+                maintenance:
                  type: string
-                  description: The overall cluster health
-                  example: Optimal
-                storage_health:
-                  type: string
-                  description: The overall storage cluster health
-                  example: Optimal
+                  description: Whether the cluster is in maintenance mode or not (string boolean)
+                  example: true
                primary_node:
                  type: string
                  description: The current primary coordinator node
                  example: pvchv1
+                pvc_version:
+                  type: string
+                  description: The PVC version of the current primary coordinator node
+                  example: 0.9.61
                upstream_ip:
                  type: string
                  description: The cluster upstream IP address in CIDR format
@@ -605,6 +635,38 @@ class API_Node_Root(Resource):
                arch:
                  type: string
                  description: The architecture of the CPU
+                health:
+                  type: integer
+                  description: The overall health (%) of the node
+                  example: 100
+                health_plugins:
+                  type: array
+                  description: A list of health plugin names currently loaded on the node
+                  items:
+                    type: string
+                    example: "nics"
+                health_details:
+                  type: array
+                  description: A list of health plugin results
+                  items:
+                    type: object
+                    properties:
+                      name:
+                        type: string
+                        description: The name of the health plugin
+                        example: nics
+                      last_run:
+                        type: integer
+                        description: The UNIX timestamp (s) of the last plugin run
+                        example: 1676786078
+                      health_delta:
+                        type: integer
+                        description: The health delta (negatively applied to the health percentage) of the plugin's current state
+                        example: 10
+                      message:
+                        type: string
+                        description: The output message of the plugin
+                        example: "bond0 DEGRADED with 1 active slaves, bond0 OK at 10000 Mbps"
                load:
                  type: number
                  format: float
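
The `health_delta` field above is described as being negatively applied to the health percentage. A minimal sketch of how a consumer might derive a node's health from `health_details` follows. It assumes a starting value of 100 and a floor of 0, neither of which is stated in the schema, and `compute_node_health` is a hypothetical helper, not part of the PVC API.

```python
from typing import Dict, List


def compute_node_health(health_details: List[Dict]) -> int:
    """Derive a health percentage from plugin results (hypothetical helper).

    Each entry follows the schema fields shown above: name, last_run,
    health_delta, message.
    """
    total_delta = sum(int(plugin.get("health_delta", 0)) for plugin in health_details)
    # Assumption: health starts at 100% and never drops below 0%.
    return max(0, 100 - total_delta)


example_details = [
    {
        "name": "nics",
        "last_run": 1676786078,
        "health_delta": 10,
        "message": "bond0 DEGRADED with 1 active slaves, bond0 OK at 10000 Mbps",
    },
]
node_health = compute_node_health(example_details)
```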
--- a/api-daemon/pvcapid/provisioner.py
+++ b/api-daemon/pvcapid/provisioner.py
@@ -580,7 +580,7 @@ def delete_template_network_element(name, vni):
    networks, code = list_template_network_vnis(name)
    found_vni = False
    for network in networks:
-        if network["vni"] == int(vni):
+        if network["vni"] == vni:
            found_vni = True
    if not found_vni:
        retmsg = {
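
The diff does not say why the `int()` cast was dropped. One plausible reading, which is an assumption rather than something the source confirms, is that template VNIs are stored as strings, in which case comparing against an integer can never match. The values below are made up for illustration.

```python
stored_vni = "100"  # assumption: the stored template VNI is a string
requested_vni = "100"  # value as passed in by the caller

# Old comparison: a str never equals an int in Python, so this is always False.
old_match = stored_vni == int(requested_vni)

# New comparison: both sides are strings, so equal values match.
new_match = stored_vni == requested_vni
```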
--- a/client-cli-old/pvc.py
+++ b/client-cli-old/pvc.py
@@ -0,0 +1,33 @@
+#!/usr/bin/env python3
+
+# pvc.py - PVC client command-line interface (stub testing interface)
+# Part of the Parallel Virtual Cluster (PVC) system
+#
+#    Copyright (C) 2018-2022 Joshua M. Boniface <joshua@boniface.me>
+#
+#    This program is free software: you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation, version 3.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License
+#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+###############################################################################
+
+import pvc.pvc
+
+
+#
+# Main entry point
+#
+def main():
+    return pvc.pvc.cli(obj={})
+
+
+if __name__ == "__main__":
+    main()
--- a/client-cli/pvc/cli_lib/__init__.py
+++ b/client-cli/pvc/cli_lib/__init__.py
--- a/client-cli-old/pvc/lib/__init__.py
+++ b/client-cli-old/pvc/lib/__init__.py
--- a/client-cli-old/pvc/lib/ansiprint.py
+++ b/client-cli-old/pvc/lib/ansiprint.py
--- a/client-cli-old/pvc/lib/ceph.py
+++ b/client-cli-old/pvc/lib/ceph.py
@@ -27,8 +27,8 @@ from requests_toolbelt.multipart.encoder import (
    MultipartEncoderMonitor,
 )

-import pvc.cli_lib.ansiprint as ansiprint
-from pvc.cli_lib.common import UploadProgressBar, call_api
+import pvc.lib.ansiprint as ansiprint
+from pvc.lib.common import UploadProgressBar, call_api

 #
 # Supplemental functions
--- a/client-cli-old/pvc/lib/cluster.py
+++ b/client-cli-old/pvc/lib/cluster.py
@@ -21,8 +21,8 @@

 import json

-import pvc.cli_lib.ansiprint as ansiprint
-from pvc.cli_lib.common import call_api
+import pvc.lib.ansiprint as ansiprint
+from pvc.lib.common import call_api


 def initialize(config, overwrite=False):
@@ -125,81 +125,63 @@ def format_info(cluster_information, oformat):
        return json.dumps(cluster_information, indent=4)

    # Plain formatting, i.e. human-readable
-    if cluster_information["health"] == "Optimal":
-        health_colour = ansiprint.green()
-    elif cluster_information["health"] == "Maintenance":
+    if (
+        cluster_information.get("maintenance") == "true"
+        or cluster_information.get("cluster_health", {}).get("health", "N/A") == "N/A"
+    ):
        health_colour = ansiprint.blue()
-    else:
+    elif cluster_information.get("cluster_health", {}).get("health", 100) > 90:
+        health_colour = ansiprint.green()
+    elif cluster_information.get("cluster_health", {}).get("health", 100) > 50:
        health_colour = ansiprint.yellow()
-
-    if cluster_information["storage_health"] == "Optimal":
-        storage_health_colour = ansiprint.green()
-    elif cluster_information["storage_health"] == "Maintenance":
-        storage_health_colour = ansiprint.blue()
    else:
-        storage_health_colour = ansiprint.yellow()
+        health_colour = ansiprint.red()

    ainformation = []

-    if oformat == "short":
-        ainformation.append(
-            "{}PVC cluster status:{}".format(ansiprint.bold(), ansiprint.end())
-        )
-        ainformation.append(
-            "{}Cluster health:{}      {}{}{}".format(
-                ansiprint.purple(),
-                ansiprint.end(),
-                health_colour,
-                cluster_information["health"],
-                ansiprint.end(),
-            )
-        )
-        if cluster_information["health_msg"]:
-            for line in cluster_information["health_msg"]:
-                ainformation.append("                     > {}".format(line))
-        ainformation.append(
-            "{}Storage health:{}      {}{}{}".format(
-                ansiprint.purple(),
-                ansiprint.end(),
-                storage_health_colour,
-                cluster_information["storage_health"],
-                ansiprint.end(),
-            )
-        )
-        if cluster_information["storage_health_msg"]:
-            for line in cluster_information["storage_health_msg"]:
-                ainformation.append("                     > {}".format(line))
-
-        return "\n".join(ainformation)
-
    ainformation.append(
        "{}PVC cluster status:{}".format(ansiprint.bold(), ansiprint.end())
    )
    ainformation.append("")
+
+    health_text = (
+        f"{cluster_information.get('cluster_health', {}).get('health', 'N/A')}"
+    )
+    if health_text != "N/A":
+        health_text += "%"
+    if cluster_information.get("maintenance") == "true":
+        health_text += " (maintenance on)"
+
    ainformation.append(
-        "{}Cluster health:{}      {}{}{}".format(
+        "{}Cluster health:{}  {}{}{}".format(
            ansiprint.purple(),
            ansiprint.end(),
            health_colour,
-            cluster_information["health"],
+            health_text,
            ansiprint.end(),
        )
    )
-    if cluster_information["health_msg"]:
-        for line in cluster_information["health_msg"]:
-            ainformation.append("                     > {}".format(line))
-    ainformation.append(
-        "{}Storage health:{}      {}{}{}".format(
-            ansiprint.purple(),
-            ansiprint.end(),
-            storage_health_colour,
-            cluster_information["storage_health"],
-            ansiprint.end(),
+    if cluster_information.get("cluster_health", {}).get("messages"):
+        health_messages = "\n                 > ".join(
+            sorted(cluster_information["cluster_health"]["messages"])
        )
-    )
-    if cluster_information["storage_health_msg"]:
-        for line in cluster_information["storage_health_msg"]:
-            ainformation.append("                     > {}".format(line))
+        ainformation.append(
+            "{}Health messages:{} > {}".format(
+                ansiprint.purple(),
+                ansiprint.end(),
+                health_messages,
+            )
+        )
+    else:
+        ainformation.append(
+            "{}Health messages:{} N/A".format(
+                ansiprint.purple(),
+                ansiprint.end(),
+            )
+        )
+
+    if oformat == "short":
+        return "\n".join(ainformation)

    ainformation.append("")
    ainformation.append(
@@ -207,6 +189,13 @@ def format_info(cluster_information, oformat):
            ansiprint.purple(), ansiprint.end(), cluster_information["primary_node"]
        )
    )
+    ainformation.append(
+        "{}PVC version:{}         {}".format(
+            ansiprint.purple(),
+            ansiprint.end(),
+            cluster_information.get("pvc_version", "N/A"),
+        )
+    )
    ainformation.append(
        "{}Cluster upstream IP:{} {}".format(
            ansiprint.purple(), ansiprint.end(), cluster_information["upstream_ip"]
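
The colour selection introduced in `format_info` above reduces to a small threshold function. The sketch below mirrors the thresholds from the diff but returns plain colour names instead of calling `ansiprint`, so it can run standalone; `cluster_health_colour` is a hypothetical name.

```python
def cluster_health_colour(cluster_information: dict) -> str:
    """Map cluster status to a colour name using the thresholds in the diff."""
    health = cluster_information.get("cluster_health", {}).get("health", "N/A")
    # Maintenance mode or missing health data is shown in blue.
    if cluster_information.get("maintenance") == "true" or health == "N/A":
        return "blue"
    if health > 90:
        return "green"
    if health > 50:
        return "yellow"
    return "red"


status = {"maintenance": "false", "cluster_health": {"health": 75, "messages": []}}
colour = cluster_health_colour(status)
```

The node-level `getOutputColours` change uses the same boundaries (50 and 90), expressed with `<=` comparisons.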
--- a/client-cli-old/pvc/lib/common.py
+++ b/client-cli-old/pvc/lib/common.py
@@ -124,8 +124,8 @@ def call_api(
    data=None,
    files=None,
 ):
-    # Set the connect timeout to 3 seconds but extremely long (48 hour) data timeout
-    timeout = (3.05, 172800)
+    # Set the connect timeout to 2 seconds but extremely long (48 hour) data timeout
+    timeout = (2.05, 172800)

    # Craft the URI
    uri = "{}://{}{}{}".format(
--- a/client-cli-old/pvc/lib/network.py
+++ b/client-cli-old/pvc/lib/network.py
@@ -20,8 +20,8 @@
 ###############################################################################

 import re
-import pvc.cli_lib.ansiprint as ansiprint
-from pvc.cli_lib.common import call_api
+import pvc.lib.ansiprint as ansiprint
+from pvc.lib.common import call_api


 def isValidMAC(macaddr):
@@ -961,7 +961,9 @@ def format_list_dhcp(dhcp_lease_list):
        )
    )

-    for dhcp_lease_information in sorted(dhcp_lease_list, key=lambda l: l["hostname"]):
+    for dhcp_lease_information in sorted(
+        dhcp_lease_list, key=lambda lease: lease["hostname"]
+    ):
        dhcp_lease_list_output.append(
            "{bold}\
 {lease_hostname: <{lease_hostname_length}} \
@@ -1059,7 +1061,7 @@ def format_list_acl(acl_list):
    )

    for acl_information in sorted(
-        acl_list, key=lambda l: l["direction"] + str(l["order"])
+        acl_list, key=lambda acl: acl["direction"] + str(acl["order"])
    ):
        acl_list_output.append(
            "{bold}\
--- a/client-cli-old/pvc/lib/node.py
+++ b/client-cli-old/pvc/lib/node.py
@@ -21,8 +21,8 @@

 import time

-import pvc.cli_lib.ansiprint as ansiprint
-from pvc.cli_lib.common import call_api
+import pvc.lib.ansiprint as ansiprint
+from pvc.lib.common import call_api


 #
@@ -215,6 +215,19 @@ def node_list(
 # Output display functions
 #
 def getOutputColours(node_information):
+    node_health = node_information.get("health", "N/A")
+    if isinstance(node_health, int):
+        if node_health <= 50:
+            health_colour = ansiprint.red()
+        elif node_health <= 90:
+            health_colour = ansiprint.yellow()
+        elif node_health <= 100:
+            health_colour = ansiprint.green()
+        else:
+            health_colour = ansiprint.blue()
+    else:
+        health_colour = ansiprint.blue()
+
    if node_information["daemon_state"] == "run":
        daemon_state_colour = ansiprint.green()
    elif node_information["daemon_state"] == "stop":
@@ -251,6 +264,7 @@ def getOutputColours(node_information):
        mem_provisioned_colour = ""

    return (
+        health_colour,
        daemon_state_colour,
        coordinator_state_colour,
        domain_state_colour,
@@ -261,6 +275,7 @@ def getOutputColours(node_information):

 def format_info(node_information, long_output):
    (
+        health_colour,
        daemon_state_colour,
        coordinator_state_colour,
        domain_state_colour,
@@ -273,14 +288,56 @@ def format_info(node_information, long_output):
    # Basic information
    ainformation.append(
        "{}Name:{}                  {}".format(
-            ansiprint.purple(), ansiprint.end(), node_information["name"]
+            ansiprint.purple(),
+            ansiprint.end(),
+            node_information["name"],
        )
    )
    ainformation.append(
        "{}PVC Version:{}           {}".format(
-            ansiprint.purple(), ansiprint.end(), node_information["pvc_version"]
+            ansiprint.purple(),
+            ansiprint.end(),
+            node_information["pvc_version"],
        )
    )
+
+    node_health = node_information.get("health", "N/A")
+    if isinstance(node_health, int):
+        node_health_text = f"{node_health}%"
+    else:
+        node_health_text = node_health
+    ainformation.append(
+        "{}Health:{}                {}{}{}".format(
+            ansiprint.purple(),
+            ansiprint.end(),
+            health_colour,
+            node_health_text,
+            ansiprint.end(),
+        )
+    )
+
+    node_health_details = node_information.get("health_details", [])
+    if long_output:
+        node_health_messages = "\n                       ".join(
+            [f"{plugin['name']}: {plugin['message']}" for plugin in node_health_details]
+        )
+    else:
+        node_health_messages = "\n                       ".join(
+            [
+                f"{plugin['name']}: {plugin['message']}"
+                for plugin in node_health_details
+                if int(plugin.get("health_delta", 0)) > 0
+            ]
+        )
+
+    if len(node_health_messages) > 0:
+        ainformation.append(
+            "{}Health Plugin Details:{} {}".format(
+                ansiprint.purple(), ansiprint.end(), node_health_messages
+            )
+        )
+    ainformation.append("")
+
    ainformation.append(
        "{}Daemon State:{}          {}{}{}".format(
            ansiprint.purple(),
@@ -308,11 +365,6 @@ def format_info(node_information, long_output):
            ansiprint.end(),
        )
    )
-    ainformation.append(
-        "{}Active VM Count:{}       {}".format(
-            ansiprint.purple(), ansiprint.end(), node_information["domains_count"]
-        )
-    )
    if long_output:
        ainformation.append("")
        ainformation.append(
@@ -331,6 +383,11 @@ def format_info(node_information, long_output):
            )
        )
    ainformation.append("")
+    ainformation.append(
+        "{}Active VM Count:{}       {}".format(
+            ansiprint.purple(), ansiprint.end(), node_information["domains_count"]
+        )
+    )
    ainformation.append(
        "{}Host CPUs:{}             {}".format(
            ansiprint.purple(), ansiprint.end(), node_information["vcpu"]["total"]
@@ -397,6 +454,7 @@ def format_list(node_list, raw):
    # Determine optimal column widths
    node_name_length = 5
    pvc_version_length = 8
+    health_length = 7
    daemon_state_length = 7
    coordinator_state_length = 12
    domain_state_length = 7
@@ -417,6 +475,15 @@ def format_list(node_list, raw):
        _pvc_version_length = len(node_information.get("pvc_version", "N/A")) + 1
        if _pvc_version_length > pvc_version_length:
            pvc_version_length = _pvc_version_length
+        # node_health column
+        node_health = node_information.get("health", "N/A")
+        if isinstance(node_health, int):
+            node_health_text = f"{node_health}%"
+        else:
+            node_health_text = node_health
+        _health_length = len(node_health_text) + 1
+        if _health_length > health_length:
+            health_length = _health_length
        # daemon_state column
        _daemon_state_length = len(node_information["daemon_state"]) + 1
        if _daemon_state_length > daemon_state_length:
@@ -466,7 +533,10 @@ def format_list(node_list, raw):
    # Format the string (header)
    node_list_output.append(
        "{bold}{node_header: <{node_header_length}} {state_header: <{state_header_length}} {resource_header: <{resource_header_length}} {memory_header: <{memory_header_length}}{end_bold}".format(
-            node_header_length=node_name_length + pvc_version_length + 1,
+            node_header_length=node_name_length
+            + pvc_version_length
+            + health_length
+            + 2,
            state_header_length=daemon_state_length
            + coordinator_state_length
            + domain_state_length
@@ -484,7 +554,14 @@ def format_list(node_list, raw):
            bold=ansiprint.bold(),
            end_bold=ansiprint.end(),
            node_header="Nodes "
-            + "".join(["-" for _ in range(6, node_name_length + pvc_version_length)]),
+            + "".join(
+                [
+                    "-"
+                    for _ in range(
+                        6, node_name_length + pvc_version_length + health_length + 1
+                    )
+                ]
+            ),
            state_header="States "
            + "".join(
                [
@@ -526,12 +603,13 @@ def format_list(node_list, raw):
    )

    node_list_output.append(
-        "{bold}{node_name: <{node_name_length}} {node_pvc_version: <{pvc_version_length}} \
+        "{bold}{node_name: <{node_name_length}} {node_pvc_version: <{pvc_version_length}} {node_health: <{health_length}} \
 {daemon_state_colour}{node_daemon_state: <{daemon_state_length}}{end_colour} {coordinator_state_colour}{node_coordinator_state: <{coordinator_state_length}}{end_colour} {domain_state_colour}{node_domain_state: <{domain_state_length}}{end_colour} \
 {node_domains_count: <{domains_count_length}} {node_cpu_count: <{cpu_count_length}} {node_load: <{load_length}} \
 {node_mem_total: <{mem_total_length}} {node_mem_used: <{mem_used_length}} {node_mem_free: <{mem_free_length}} {node_mem_allocated: <{mem_alloc_length}} {node_mem_provisioned: <{mem_prov_length}}{end_bold}".format(
            node_name_length=node_name_length,
            pvc_version_length=pvc_version_length,
+            health_length=health_length,
            daemon_state_length=daemon_state_length,
            coordinator_state_length=coordinator_state_length,
            domain_state_length=domain_state_length,
@ -551,6 +629,7 @@ def format_list(node_list, raw):
            end_colour="",
            node_name="Name",
            node_pvc_version="Version",
+            node_health="Health",
            node_daemon_state="Daemon",
            node_coordinator_state="Coordinator",
            node_domain_state="Domain",
@ -568,19 +647,28 @@ def format_list(node_list, raw):
    # Format the string (elements)
    for node_information in sorted(node_list, key=lambda n: n["name"]):
        (
+            health_colour,
            daemon_state_colour,
            coordinator_state_colour,
            domain_state_colour,
            mem_allocated_colour,
            mem_provisioned_colour,
        ) = getOutputColours(node_information)
+
+        node_health = node_information.get("health", "N/A")
+        if isinstance(node_health, int):
+            node_health_text = f"{node_health}%"
+        else:
+            node_health_text = node_health
+
        node_list_output.append(
-            "{bold}{node_name: <{node_name_length}} {node_pvc_version: <{pvc_version_length}} \
+            "{bold}{node_name: <{node_name_length}} {node_pvc_version: <{pvc_version_length}} {health_colour}{node_health: <{health_length}}{end_colour} \
 {daemon_state_colour}{node_daemon_state: <{daemon_state_length}}{end_colour} {coordinator_state_colour}{node_coordinator_state: <{coordinator_state_length}}{end_colour} {domain_state_colour}{node_domain_state: <{domain_state_length}}{end_colour} \
 {node_domains_count: <{domains_count_length}} {node_cpu_count: <{cpu_count_length}} {node_load: <{load_length}} \
 {node_mem_total: <{mem_total_length}} {node_mem_used: <{mem_used_length}} {node_mem_free: <{mem_free_length}} {mem_allocated_colour}{node_mem_allocated: <{mem_alloc_length}}{end_colour} {mem_provisioned_colour}{node_mem_provisioned: <{mem_prov_length}}{end_colour}{end_bold}".format(
                node_name_length=node_name_length,
                pvc_version_length=pvc_version_length,
+                health_length=health_length,
                daemon_state_length=daemon_state_length,
                coordinator_state_length=coordinator_state_length,
                domain_state_length=domain_state_length,
@ -594,6 +682,7 @@ def format_list(node_list, raw):
                mem_prov_length=mem_prov_length,
                bold="",
                end_bold="",
+                health_colour=health_colour,
                daemon_state_colour=daemon_state_colour,
                coordinator_state_colour=coordinator_state_colour,
                domain_state_colour=domain_state_colour,
@ -602,6 +691,7 @@ def format_list(node_list, raw):
                end_colour=ansiprint.end(),
                node_name=node_information["name"],
                node_pvc_version=node_information.get("pvc_version", "N/A"),
+                node_health=node_health_text,
                node_daemon_state=node_information["daemon_state"],
                node_coordinator_state=node_information["coordinator_state"],
                node_domain_state=node_information["domain_state"],
--- a/client-cli-old/pvc/lib/provisioner.py
+++ b/client-cli-old/pvc/lib/provisioner.py
@ -24,8 +24,8 @@ from requests_toolbelt.multipart.encoder import (
    MultipartEncoderMonitor,
 )

-import pvc.cli_lib.ansiprint as ansiprint
-from pvc.cli_lib.common import UploadProgressBar, call_api
+import pvc.lib.ansiprint as ansiprint
+from pvc.lib.common import UploadProgressBar, call_api
 from ast import literal_eval


--- a/client-cli-old/pvc/lib/vm.py
+++ b/client-cli-old/pvc/lib/vm.py
@ -22,8 +22,8 @@
 import time
 import re

-import pvc.cli_lib.ansiprint as ansiprint
-from pvc.cli_lib.common import call_api, format_bytes, format_metric
+import pvc.lib.ansiprint as ansiprint
+from pvc.lib.common import call_api, format_bytes, format_metric


 #
@ -677,7 +677,7 @@ def vm_networks_add(
    from lxml.objectify import fromstring
    from lxml.etree import tostring
    from random import randint
-    import pvc.cli_lib.network as pvc_network
+    import pvc.lib.network as pvc_network

    network_exists, _ = pvc_network.net_info(config, network)
    if not network_exists:
@ -1046,7 +1046,7 @@ def vm_volumes_add(config, vm, volume, disk_id, bus, disk_type, live, restart):
    from lxml.objectify import fromstring
    from lxml.etree import tostring
    from copy import deepcopy
-    import pvc.cli_lib.ceph as pvc_ceph
+    import pvc.lib.ceph as pvc_ceph

    if disk_type == "rbd":
        # Verify that the provided volume is valid
--- a/client-cli-old/pvc/lib/zkhandler.py
+++ b/client-cli-old/pvc/lib/zkhandler.py
--- a/client-cli-old/pvc/pvc.py
+++ b/client-cli-old/pvc/pvc.py
@ -37,13 +37,13 @@ from distutils.util import strtobool

 from functools import wraps

-import pvc.cli_lib.ansiprint as ansiprint
-import pvc.cli_lib.cluster as pvc_cluster
-import pvc.cli_lib.node as pvc_node
-import pvc.cli_lib.vm as pvc_vm
-import pvc.cli_lib.network as pvc_network
-import pvc.cli_lib.ceph as pvc_ceph
-import pvc.cli_lib.provisioner as pvc_provisioner
+import pvc.lib.ansiprint as ansiprint
+import pvc.lib.cluster as pvc_cluster
+import pvc.lib.node as pvc_node
+import pvc.lib.vm as pvc_vm
+import pvc.lib.network as pvc_network
+import pvc.lib.ceph as pvc_ceph
+import pvc.lib.provisioner as pvc_provisioner


 myhostname = socket.gethostname().split(".")[0]
@ -134,7 +134,7 @@ def get_config(store_data, cluster=None):
    config = dict()
    config["debug"] = False
    config["cluster"] = cluster
-    config["desctription"] = description
+    config["description"] = description
    config["api_host"] = "{}:{}".format(host, port)
    config["api_scheme"] = scheme
    config["api_key"] = api_key
@ -382,8 +382,6 @@ def cluster_list(raw):

    if not raw:
        # Display the data nicely
-        echo("Available clusters:")
-        echo("")
        echo(
            "{bold}{name: <{name_length}} {description: <{description_length}} {address: <{address_length}} {port: <{port_length}} {scheme: <{scheme_length}} {api_key: <{api_key_length}}{end_bold}".format(
                bold=ansiprint.bold(),
@ -443,6 +441,230 @@ def cluster_list(raw):
            echo(cluster)


+###############################################################################
+# pvc cluster detail
+###############################################################################
+@click.command(name="detail", short_help="Show details of all available clusters.")
+def cluster_detail():
+    """
+    Show quick details of all PVC clusters configured in this CLI instance.
+    """
+
+    # Get the existing data
+    clusters = get_store(store_path)
+
+    cluster_details_list = list()
+
+    echo("Gathering information from clusters... ", nl=False)
+
+    for cluster in clusters:
+        _store_data = get_store(store_path)
+        cluster_config = get_config(_store_data, cluster=cluster)
+        retcode, retdata = pvc_cluster.get_info(cluster_config)
+        if retcode == 0:
+            retdata = None
+        cluster_details = {"config": cluster_config, "data": retdata}
+        cluster_details_list.append(cluster_details)
+
+    echo("done.")
+    echo("")
+
+    # Find the length of each column
+    name_length = 5
+    description_length = 12
+    health_length = 7
+    primary_node_length = 8
+    pvc_version_length = 8
+    nodes_length = 6
+    vms_length = 4
+    networks_length = 9
+    osds_length = 5
+    pools_length = 6
+    volumes_length = 8
+    snapshots_length = 10
+
+    for cluster_details in cluster_details_list:
+        _name_length = len(cluster_details["config"]["cluster"]) + 1
+        if _name_length > name_length:
+            name_length = _name_length
+
+        _description_length = len(cluster_details["config"]["description"]) + 1
+        if _description_length > description_length:
+            description_length = _description_length
+
+        if cluster_details["data"] is None:
+            continue
+
+        _health_length = (
+            len(
+                str(
+                    cluster_details["data"]
+                    .get("cluster_health", {})
+                    .get("health", "N/A")
+                )
+                + "%"
+            )
+            + 1
+        )
+        if _health_length > health_length:
+            health_length = _health_length
+
+        _primary_node_length = len(cluster_details["data"]["primary_node"]) + 1
+        if _primary_node_length > primary_node_length:
+            primary_node_length = _primary_node_length
+
+        _pvc_version_length = (
+            len(cluster_details["data"].get("pvc_version", "< 0.9.62")) + 1
+        )
+        if _pvc_version_length > pvc_version_length:
+            pvc_version_length = _pvc_version_length
+
+        _nodes_length = len(str(cluster_details["data"]["nodes"]["total"])) + 1
+        if _nodes_length > nodes_length:
+            nodes_length = _nodes_length
+
+        _vms_length = len(str(cluster_details["data"]["vms"]["total"])) + 1
+        if _vms_length > vms_length:
+            vms_length = _vms_length
+
+        _networks_length = len(str(cluster_details["data"]["networks"])) + 1
+        if _networks_length > networks_length:
+            networks_length = _networks_length
+
+        _osds_length = len(str(cluster_details["data"]["osds"]["total"])) + 1
+        if _osds_length > osds_length:
+            osds_length = _osds_length
+
+        _pools_length = len(str(cluster_details["data"]["pools"])) + 1
+        if _pools_length > pools_length:
+            pools_length = _pools_length
+
+        _volumes_length = len(str(cluster_details["data"]["volumes"])) + 1
+        if _volumes_length > volumes_length:
+            volumes_length = _volumes_length
+
+        _snapshots_length = len(str(cluster_details["data"]["snapshots"])) + 1
+        if _snapshots_length > snapshots_length:
+            snapshots_length = _snapshots_length
+
+    # Display the data nicely
+    echo(
+        "{bold}{name: <{name_length}} {description: <{description_length}} {health: <{health_length}} {primary_node: <{primary_node_length}} {pvc_version: <{pvc_version_length}} {nodes: <{nodes_length}} {vms: <{vms_length}} {networks: <{networks_length}} {osds: <{osds_length}} {pools: <{pools_length}} {volumes: <{volumes_length}} {snapshots: <{snapshots_length}}{end_bold}".format(
+            bold=ansiprint.bold(),
+            end_bold=ansiprint.end(),
+            name="Name",
+            name_length=name_length,
+            description="Description",
+            description_length=description_length,
+            health="Health",
+            health_length=health_length,
+            primary_node="Primary",
+            primary_node_length=primary_node_length,
+            pvc_version="Version",
+            pvc_version_length=pvc_version_length,
+            nodes="Nodes",
+            nodes_length=nodes_length,
+            vms="VMs",
+            vms_length=vms_length,
+            networks="Networks",
+            networks_length=networks_length,
+            osds="OSDs",
+            osds_length=osds_length,
+            pools="Pools",
+            pools_length=pools_length,
+            volumes="Volumes",
+            volumes_length=volumes_length,
+            snapshots="Snapshots",
+            snapshots_length=snapshots_length,
+        )
+    )
+
+    for cluster_details in cluster_details_list:
+        if cluster_details["data"] is None:
+            health_colour = ansiprint.blue()
+            name = cluster_details["config"]["cluster"]
+            description = cluster_details["config"]["description"]
+            health = "N/A"
+            primary_node = "N/A"
+            pvc_version = "N/A"
+            nodes = "N/A"
+            vms = "N/A"
+            networks = "N/A"
+            osds = "N/A"
+            pools = "N/A"
+            volumes = "N/A"
+            snapshots = "N/A"
+        else:
+            if (
+                cluster_details["data"].get("maintenance") == "true"
+                or cluster_details["data"]
+                .get("cluster_health", {})
+                .get("health", "N/A")
+                == "N/A"
+            ):
+                health_colour = ansiprint.blue()
+            elif (
+                cluster_details["data"].get("cluster_health", {}).get("health", 100)
+                > 90
+            ):
+                health_colour = ansiprint.green()
+            elif (
+                cluster_details["data"].get("cluster_health", {}).get("health", 100)
+                > 50
+            ):
+                health_colour = ansiprint.yellow()
+            else:
+                health_colour = ansiprint.red()
+
+            name = cluster_details["config"]["cluster"]
+            description = cluster_details["config"]["description"]
+            health = str(
+                cluster_details["data"].get("cluster_health", {}).get("health", "N/A")
+            )
+            if health != "N/A":
+                health += "%"
+            primary_node = cluster_details["data"]["primary_node"]
+            pvc_version = cluster_details["data"].get("pvc_version", "< 0.9.62")
+            nodes = str(cluster_details["data"]["nodes"]["total"])
+            vms = str(cluster_details["data"]["vms"]["total"])
+            networks = str(cluster_details["data"]["networks"])
+            osds = str(cluster_details["data"]["osds"]["total"])
+            pools = str(cluster_details["data"]["pools"])
+            volumes = str(cluster_details["data"]["volumes"])
+            snapshots = str(cluster_details["data"]["snapshots"])
+
+        echo(
+            "{name: <{name_length}} {description: <{description_length}} {health_colour}{health: <{health_length}}{end_colour} {primary_node: <{primary_node_length}} {pvc_version: <{pvc_version_length}} {nodes: <{nodes_length}} {vms: <{vms_length}} {networks: <{networks_length}} {osds: <{osds_length}} {pools: <{pools_length}} {volumes: <{volumes_length}} {snapshots: <{snapshots_length}}".format(
+                health_colour=health_colour,
+                end_colour=ansiprint.end(),
+                name=name,
+                name_length=name_length,
+                description=description,
+                description_length=description_length,
+                health=health,
+                health_length=health_length,
+                primary_node=primary_node,
+                primary_node_length=primary_node_length,
+                pvc_version=pvc_version,
+                pvc_version_length=pvc_version_length,
+                nodes=nodes,
+                nodes_length=nodes_length,
+                vms=vms,
+                vms_length=vms_length,
+                networks=networks,
+                networks_length=networks_length,
+                osds=osds,
+                osds_length=osds_length,
+                pools=pools,
+                pools_length=pools_length,
+                volumes=volumes,
+                volumes_length=volumes_length,
+                snapshots=snapshots,
+                snapshots_length=snapshots_length,
+            )
+        )
+
+
 # Validate that the cluster is set for a given command
 def cluster_req(function):
    @wraps(function)
@ -452,6 +674,24 @@ def cluster_req(function):
                'No cluster specified and no local pvcapid.yaml configuration found. Use "pvc cluster" to add a cluster API to connect to.'
            )
            exit(1)
+
+        if not config["quiet"]:
+            if config["api_scheme"] == "https" and not config["verify_ssl"]:
+                ssl_unverified_msg = " (unverified)"
+            else:
+                ssl_unverified_msg = ""
+            echo(
+                'Using cluster "{}" - Host: "{}"  Scheme: "{}{}"  Prefix: "{}"'.format(
+                    config["cluster"],
+                    config["api_host"],
+                    config["api_scheme"],
+                    ssl_unverified_msg,
+                    config["api_prefix"],
+                ),
+                err=True,
+            )
+            echo("", err=True)
+
        return function(*args, **kwargs)

    return validate_cluster
@ -697,15 +937,29 @@ def node_log(node, lines, follow):
    default=False,
    help="Display more detailed information.",
 )
+@click.option(
+    "-f",
+    "--format",
+    "oformat",
+    default="plain",
+    show_default=True,
+    type=click.Choice(["plain", "json", "json-pretty"]),
+    help="Output format of node status information.",
+)
@cluster_req
-def node_info(node, long_output):
+def node_info(node, long_output, oformat):
    """
    Show information about node NODE. If unspecified, defaults to this host.
    """

    retcode, retdata = pvc_node.node_info(config, node)
    if retcode:
-        retdata = pvc_node.format_info(retdata, long_output)
+        if oformat == "json":
+            retdata = json.dumps(retdata)
+        elif oformat == "json-pretty":
+            retdata = json.dumps(retdata, indent=4)
+        else:
+            retdata = pvc_node.format_info(retdata, long_output)
    cleanup(retcode, retdata)


@ -5882,23 +6136,7 @@ def cli(_cluster, _debug, _quiet, _unsafe, _colour):
        config["debug"] = _debug
        config["unsafe"] = _unsafe
        config["colour"] = _colour
-
-        if not _quiet:
-            if config["api_scheme"] == "https" and not config["verify_ssl"]:
-                ssl_unverified_msg = " (unverified)"
-            else:
-                ssl_unverified_msg = ""
-            echo(
-                'Using cluster "{}" - Host: "{}"  Scheme: "{}{}"  Prefix: "{}"'.format(
-                    config["cluster"],
-                    config["api_host"],
-                    config["api_scheme"],
-                    ssl_unverified_msg,
-                    config["api_prefix"],
-                ),
-                err=True,
-            )
-            echo("", err=True)
+        config["quiet"] = _quiet

    audit()

@ -5909,6 +6147,7 @@ def cli(_cluster, _debug, _quiet, _unsafe, _colour):
 cli_cluster.add_command(cluster_add)
 cli_cluster.add_command(cluster_remove)
 cli_cluster.add_command(cluster_list)
+cli_cluster.add_command(cluster_detail)

 cli_node.add_command(node_secondary)
 cli_node.add_command(node_primary)
--- a/client-cli-old/scripts/README
+++ b/client-cli-old/scripts/README
--- a/client-cli-old/scripts/export_vm
+++ b/client-cli-old/scripts/export_vm
--- a/client-cli-old/scripts/force_single_node
+++ b/client-cli-old/scripts/force_single_node
--- a/client-cli-old/scripts/import_vm
+++ b/client-cli-old/scripts/import_vm
--- a/client-cli-old/scripts/migrate_vm
+++ b/client-cli-old/scripts/migrate_vm
--- a/client-cli-old/setup.py
+++ b/client-cli-old/setup.py
@ -0,0 +1,20 @@
+from setuptools import setup
+
+setup(
+    name="pvc",
+    version="0.9.63",
+    packages=["pvc", "pvc.lib"],
+    install_requires=[
+        "Click",
+        "PyYAML",
+        "lxml",
+        "colorama",
+        "requests",
+        "requests-toolbelt",
+    ],
+    entry_points={
+        "console_scripts": [
+            "pvc = pvc.pvc:cli",
+        ],
+    },
+)
--- a/client-cli/pvc.py
+++ b/client-cli/pvc.py
@ -0,0 +1,33 @@
+#!/usr/bin/env python3
+
+# pvc.py - PVC client command-line interface (stub testing interface)
+# Part of the Parallel Virtual Cluster (PVC) system
+#
+#    Copyright (C) 2018-2022 Joshua M. Boniface <joshua@boniface.me>
+#
+#    This program is free software: you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation, version 3.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License
+#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+###############################################################################
+
+from pvc.cli.cli import cli
+
+
+#
+# Main entry point
+#
+def main():
+    return cli(obj={})
+
+
+if __name__ == "__main__":
+    main()
--- a/client-cli/pvc/cli/cli.py
+++ b/client-cli/pvc/cli/cli.py
--- a/client-cli/pvc/cli/formatters.py
+++ b/client-cli/pvc/cli/formatters.py
@ -0,0 +1,734 @@
+#!/usr/bin/env python3
+
+# formatters.py - PVC Click CLI output formatters library
+# Part of the Parallel Virtual Cluster (PVC) system
+#
+#    Copyright (C) 2018-2023 Joshua M. Boniface <joshua@boniface.me>
+#
+#    This program is free software: you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation, version 3.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License
+#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+###############################################################################
+
+from pvc.lib.node import format_info as node_format_info
+from pvc.lib.node import format_list as node_format_list
+from pvc.lib.vm import format_vm_tags as vm_format_tags
+from pvc.lib.vm import format_vm_vcpus as vm_format_vcpus
+from pvc.lib.vm import format_vm_memory as vm_format_memory
+from pvc.lib.vm import format_vm_networks as vm_format_networks
+from pvc.lib.vm import format_vm_volumes as vm_format_volumes
+from pvc.lib.vm import format_info as vm_format_info
+from pvc.lib.vm import format_list as vm_format_list
+from pvc.lib.network import format_info as network_format_info
+from pvc.lib.network import format_list as network_format_list
+from pvc.lib.network import format_list_dhcp as network_format_dhcp_list
+from pvc.lib.network import format_list_acl as network_format_acl_list
+from pvc.lib.network import format_list_sriov_pf as network_format_sriov_pf_list
+from pvc.lib.network import format_info_sriov_vf as network_format_sriov_vf_info
+from pvc.lib.network import format_list_sriov_vf as network_format_sriov_vf_list
+from pvc.lib.storage import format_raw_output as storage_format_raw
+from pvc.lib.storage import format_info_benchmark as storage_format_benchmark_info
+from pvc.lib.storage import format_list_benchmark as storage_format_benchmark_list
+from pvc.lib.storage import format_list_osd as storage_format_osd_list
+from pvc.lib.storage import format_list_pool as storage_format_pool_list
+from pvc.lib.storage import format_list_volume as storage_format_volume_list
+from pvc.lib.storage import format_list_snapshot as storage_format_snapshot_list
+from pvc.lib.provisioner import format_list_template as provisioner_format_template_list
+from pvc.lib.provisioner import format_list_userdata as provisioner_format_userdata_list
+from pvc.lib.provisioner import format_list_script as provisioner_format_script_list
+from pvc.lib.provisioner import format_list_ova as provisioner_format_ova_list
+from pvc.lib.provisioner import format_list_profile as provisioner_format_profile_list
+from pvc.lib.provisioner import format_list_task as provisioner_format_task_status
+
+
+# Define colour values for use in formatters
+ansii = {
+    "red": "\033[91m",
+    "blue": "\033[94m",
+    "cyan": "\033[96m",
+    "green": "\033[92m",
+    "yellow": "\033[93m",
+    "purple": "\033[95m",
+    "bold": "\033[1m",
+    "end": "\033[0m",
+}
+
+
+def cli_cluster_status_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the full output of cli_cluster_status
+    """
+
+    # Normalize data to local variables
+    health = data.get("cluster_health", {}).get("health", -1)
+    messages = data.get("cluster_health", {}).get("messages", None)
+    maintenance = data.get("maintenance", "N/A")
+    primary_node = data.get("primary_node", "N/A")
+    pvc_version = data.get("pvc_version", "N/A")
+    upstream_ip = data.get("upstream_ip", "N/A")
+    total_nodes = data.get("nodes", {}).get("total", 0)
+    total_vms = data.get("vms", {}).get("total", 0)
+    total_networks = data.get("networks", 0)
+    total_osds = data.get("osds", {}).get("total", 0)
+    total_pools = data.get("pools", 0)
+    total_volumes = data.get("volumes", 0)
+    total_snapshots = data.get("snapshots", 0)
+
+    if maintenance == "true" or health == -1:
+        health_colour = ansii["blue"]
+    elif health > 90:
+        health_colour = ansii["green"]
+    elif health > 50:
+        health_colour = ansii["yellow"]
+    else:
+        health_colour = ansii["red"]
+
+    output = list()
+
+    output.append(f"{ansii['bold']}PVC cluster status:{ansii['end']}")
+    output.append("")
+
+    if health != -1:
+        health = f"{health}%"
+    else:
+        health = "N/A"
+
+    if maintenance == "true":
+        health = f"{health} (maintenance on)"
+
+    output.append(
+        f"{ansii['purple']}Cluster health:{ansii['end']}   {health_colour}{health}{ansii['end']}"
+    )
+
+    if messages is not None and len(messages) > 0:
+        messages = "\n                  ".join(sorted(messages))
+        output.append(f"{ansii['purple']}Health messages:{ansii['end']}  {messages}")
+
+    output.append("")
+
+    output.append(f"{ansii['purple']}Primary node:{ansii['end']}     {primary_node}")
+    output.append(f"{ansii['purple']}PVC version:{ansii['end']}      {pvc_version}")
+    output.append(f"{ansii['purple']}Upstream IP:{ansii['end']}      {upstream_ip}")
+    output.append("")
+
+    node_states = ["run,ready"]
+    node_states.extend(
+        [
+            state
+            for state in data.get("nodes", {}).keys()
+            if state not in ["total", "run,ready"]
+        ]
+    )
+
+    nodes_strings = list()
+    for state in node_states:
+        if state in ["run,ready"]:
+            state_colour = ansii["green"]
+        elif state in ["run,flush", "run,unflush", "run,flushed"]:
+            state_colour = ansii["blue"]
+        elif "dead" in state or "stop" in state:
+            state_colour = ansii["red"]
+        else:
+            state_colour = ansii["yellow"]
+
+        nodes_strings.append(
+            f"{data.get('nodes', {}).get(state)}/{total_nodes} {state_colour}{state}{ansii['end']}"
+        )
+
+    nodes_string = ", ".join(nodes_strings)
+
+    output.append(f"{ansii['purple']}Nodes:{ansii['end']}            {nodes_string}")
+
+    vm_states = ["start", "disable"]
+    vm_states.extend(
+        [
+            state
+            for state in data.get("vms", {}).keys()
+            if state not in ["total", "start", "disable"]
+        ]
+    )
+
+    vms_strings = list()
+    for state in vm_states:
+        if data.get("vms", {}).get(state) is None:
+            continue
+        if state in ["start"]:
+            state_colour = ansii["green"]
+        elif state in ["migrate", "disable"]:
+            state_colour = ansii["blue"]
+        elif state in ["stop", "fail"]:
+            state_colour = ansii["red"]
+        else:
+            state_colour = ansii["yellow"]
+
+        vms_strings.append(
+            f"{data.get('vms', {}).get(state)}/{total_vms} {state_colour}{state}{ansii['end']}"
+        )
+
+    vms_string = ", ".join(vms_strings)
+
+    output.append(f"{ansii['purple']}VMs:{ansii['end']}              {vms_string}")
+
+    osd_states = ["up,in"]
+    osd_states.extend(
+        [
+            state
+            for state in data.get("osds", {}).keys()
+            if state not in ["total", "up,in"]
+        ]
+    )
+
+    osds_strings = list()
+    for state in osd_states:
+        if state in ["up,in"]:
+            state_colour = ansii["green"]
+        elif state in ["down,out"]:
+            state_colour = ansii["red"]
+        else:
+            state_colour = ansii["yellow"]
+
+        osds_strings.append(
+            f"{data.get('osds', {}).get(state)}/{total_osds} {state_colour}{state}{ansii['end']}"
+        )
+
+    osds_string = ", ".join(osds_strings)
+
+    output.append(f"{ansii['purple']}OSDs:{ansii['end']}             {osds_string}")
+
+    output.append(f"{ansii['purple']}Pools:{ansii['end']}            {total_pools}")
+
+    output.append(f"{ansii['purple']}Volumes:{ansii['end']}          {total_volumes}")
+
+    output.append(f"{ansii['purple']}Snapshots:{ansii['end']}        {total_snapshots}")
+
+    output.append(f"{ansii['purple']}Networks:{ansii['end']}         {total_networks}")
+
+    output.append("")
+
+    return "\n".join(output)
+
+
+def cli_cluster_status_format_short(CLI_CONFIG, data):
+    """
+    Pretty format the health-only output of cli_cluster_status
+    """
+
+    # Normalize data to local variables
+    health = data.get("cluster_health", {}).get("health", -1)
+    messages = data.get("cluster_health", {}).get("messages", None)
+    maintenance = data.get("maintenance", "N/A")
+
+    if maintenance == "true" or health == -1:
+        health_colour = ansii["blue"]
+    elif health > 90:
+        health_colour = ansii["green"]
+    elif health > 50:
+        health_colour = ansii["yellow"]
+    else:
+        health_colour = ansii["red"]
+
+    output = list()
+
+    output.append(f"{ansii['bold']}PVC cluster status:{ansii['end']}")
+    output.append("")
+
+    if health != -1:
+        health = f"{health}%"
+    else:
+        health = "N/A"
+
+    if maintenance == "true":
+        health = f"{health} (maintenance on)"
+
+    output.append(
+        f"{ansii['purple']}Cluster health:{ansii['end']}   {health_colour}{health}{ansii['end']}"
+    )
+
+    if messages is not None and len(messages) > 0:
+        messages = "\n                  ".join(sorted(messages))
+        output.append(f"{ansii['purple']}Health messages:{ansii['end']}  {messages}")
+
+    output.append("")
+
+    return "\n".join(output)
+
+
+def cli_connection_list_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_connection_list
+    """
+
+    # Set the fields data
+    fields = {
+        "name": {"header": "Name", "length": len("Name") + 1},
+        "description": {"header": "Description", "length": len("Description") + 1},
+        "address": {"header": "Address", "length": len("Address") + 1},
+        "port": {"header": "Port", "length": len("Port") + 1},
+        "scheme": {"header": "Scheme", "length": len("Scheme") + 1},
+        "api_key": {"header": "API Key", "length": len("API Key") + 1},
+    }
+
+    # Parse each connection and adjust field lengths
+    for connection in data:
+        for field, length in [(f, fields[f]["length"]) for f in fields]:
+            _length = len(str(connection[field]))
+            if _length > length:
+                length = len(str(connection[field])) + 1
+
+            fields[field]["length"] = length
+
+    # Create the output object and define the line format
+    output = list()
+    line = "{bold}{name: <{lname}} {desc: <{ldesc}} {addr: <{laddr}} {port: <{lport}} {schm: <{lschm}} {akey: <{lakey}}{end}"
+
+    # Add the header line
+    output.append(
+        line.format(
+            bold=ansii["bold"],
+            end=ansii["end"],
+            name=fields["name"]["header"],
+            lname=fields["name"]["length"],
+            desc=fields["description"]["header"],
+            ldesc=fields["description"]["length"],
+            addr=fields["address"]["header"],
+            laddr=fields["address"]["length"],
+            port=fields["port"]["header"],
+            lport=fields["port"]["length"],
+            schm=fields["scheme"]["header"],
+            lschm=fields["scheme"]["length"],
+            akey=fields["api_key"]["header"],
+            lakey=fields["api_key"]["length"],
+        )
+    )
+
+    # Add a line per connection
+    for connection in data:
+        output.append(
+            line.format(
+                bold="",
+                end="",
+                name=connection["name"],
+                lname=fields["name"]["length"],
+                desc=connection["description"],
+                ldesc=fields["description"]["length"],
+                addr=connection["address"],
+                laddr=fields["address"]["length"],
+                port=connection["port"],
+                lport=fields["port"]["length"],
+                schm=connection["scheme"],
+                lschm=fields["scheme"]["length"],
+                akey=connection["api_key"],
+                lakey=fields["api_key"]["length"],
+            )
+        )
+
+    return "\n".join(output)
+
+
+def cli_connection_detail_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_connection_detail
+    """
+
+    # Set the fields data
+    fields = {
+        "name": {"header": "Name", "length": len("Name") + 1},
+        "description": {"header": "Description", "length": len("Description") + 1},
+        "health": {"header": "Health", "length": len("Health") + 1},
+        "primary_node": {"header": "Primary", "length": len("Primary") + 1},
+        "pvc_version": {"header": "Version", "length": len("Version") + 1},
+        "nodes": {"header": "Nodes", "length": len("Nodes") + 1},
+        "vms": {"header": "VMs", "length": len("VMs") + 1},
+        "networks": {"header": "Networks", "length": len("Networks") + 1},
+        "osds": {"header": "OSDs", "length": len("OSDs") + 1},
+        "pools": {"header": "Pools", "length": len("Pools") + 1},
+        "volumes": {"header": "Volumes", "length": len("Volumes") + 1},
+        "snapshots": {"header": "Snapshots", "length": len("Snapshots") + 1},
+    }
+
+    # Parse each connection and adjust field lengths
+    for connection in data:
+        for field, length in [(f, fields[f]["length"]) for f in fields]:
+            _length = len(str(connection[field]))
+            if _length > length:
+                length = len(str(connection[field])) + 1
+
+            fields[field]["length"] = length
+
+    # Create the output object and define the line format
+    output = list()
+    line = "{bold}{name: <{lname}} {desc: <{ldesc}} {chlth}{hlth: <{lhlth}}{endc} {prin: <{lprin}} {vers: <{lvers}} {nods: <{lnods}} {vms: <{lvms}} {nets: <{lnets}} {osds: <{losds}} {pols: <{lpols}} {vols: <{lvols}} {snts: <{lsnts}}{end}"
+
+    # Add the header line
+    output.append(
+        line.format(
+            bold=ansii["bold"],
+            end=ansii["end"],
+            chlth="",
+            endc="",
+            name=fields["name"]["header"],
+            lname=fields["name"]["length"],
+            desc=fields["description"]["header"],
+            ldesc=fields["description"]["length"],
+            hlth=fields["health"]["header"],
+            lhlth=fields["health"]["length"],
+            prin=fields["primary_node"]["header"],
+            lprin=fields["primary_node"]["length"],
+            vers=fields["pvc_version"]["header"],
+            lvers=fields["pvc_version"]["length"],
+            nods=fields["nodes"]["header"],
+            lnods=fields["nodes"]["length"],
+            vms=fields["vms"]["header"],
+            lvms=fields["vms"]["length"],
+            nets=fields["networks"]["header"],
+            lnets=fields["networks"]["length"],
+            osds=fields["osds"]["header"],
+            losds=fields["osds"]["length"],
+            pols=fields["pools"]["header"],
+            lpols=fields["pools"]["length"],
+            vols=fields["volumes"]["header"],
+            lvols=fields["volumes"]["length"],
+            snts=fields["snapshots"]["header"],
+            lsnts=fields["snapshots"]["length"],
+        )
+    )
+
+    # Add a line per connection
+    for connection in data:
+        if connection["health"] == "N/A":
+            health_value = "N/A"
+            health_colour = ansii["purple"]
+        else:
+            health_value = f"{connection['health']}%"
+            if connection["maintenance"] == "true":
+                health_colour = ansii["blue"]
+            elif connection["health"] > 90:
+                health_colour = ansii["green"]
+            elif connection["health"] > 50:
+                health_colour = ansii["yellow"]
+            else:
+                health_colour = ansii["red"]
+
+        output.append(
+            line.format(
+                bold="",
+                end="",
+                chlth=health_colour,
+                endc=ansii["end"],
+                name=connection["name"],
+                lname=fields["name"]["length"],
+                desc=connection["description"],
+                ldesc=fields["description"]["length"],
+                hlth=health_value,
+                lhlth=fields["health"]["length"],
+                prin=connection["primary_node"],
+                lprin=fields["primary_node"]["length"],
+                vers=connection["pvc_version"],
+                lvers=fields["pvc_version"]["length"],
+                nods=connection["nodes"],
+                lnods=fields["nodes"]["length"],
+                vms=connection["vms"],
+                lvms=fields["vms"]["length"],
+                nets=connection["networks"],
+                lnets=fields["networks"]["length"],
+                osds=connection["osds"],
+                losds=fields["osds"]["length"],
+                pols=connection["pools"],
+                lpols=fields["pools"]["length"],
+                vols=connection["volumes"],
+                lvols=fields["volumes"]["length"],
+                snts=connection["snapshots"],
+                lsnts=fields["snapshots"]["length"],
+            )
+        )
+
+    return "\n".join(output)
+
+
+def cli_node_info_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the basic output of cli_node_info
+    """
+
+    return node_format_info(CLI_CONFIG, data, long_output=False)
+
+
+def cli_node_info_format_long(CLI_CONFIG, data):
+    """
+    Pretty format the full output of cli_node_info
+    """
+
+    return node_format_info(CLI_CONFIG, data, long_output=True)
+
+
+def cli_node_list_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_node_list
+    """
+
+    return node_format_list(CLI_CONFIG, data)
+
+
+def cli_vm_tag_get_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_vm_tag_get
+    """
+
+    return vm_format_tags(CLI_CONFIG, data)
+
+
+def cli_vm_vcpu_get_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_vm_vcpu_get
+    """
+
+    return vm_format_vcpus(CLI_CONFIG, data)
+
+
+def cli_vm_memory_get_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_vm_memory_get
+    """
+
+    return vm_format_memory(CLI_CONFIG, data)
+
+
+def cli_vm_network_get_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_vm_network_get
+    """
+
+    return vm_format_networks(CLI_CONFIG, data)
+
+
+def cli_vm_volume_get_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_vm_volume_get
+    """
+
+    return vm_format_volumes(CLI_CONFIG, data)
+
+
+def cli_vm_info_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the basic output of cli_vm_info
+    """
+
+    return vm_format_info(CLI_CONFIG, data, long_output=False)
+
+
+def cli_vm_info_format_long(CLI_CONFIG, data):
+    """
+    Pretty format the full output of cli_vm_info
+    """
+
+    return vm_format_info(CLI_CONFIG, data, long_output=True)
+
+
+def cli_vm_list_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_vm_list
+    """
+
+    return vm_format_list(CLI_CONFIG, data)
+
+
+def cli_network_info_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the basic output of cli_network_info
+    """
+
+    return network_format_info(CLI_CONFIG, data, long_output=False)
+
+
+def cli_network_info_format_long(CLI_CONFIG, data):
+    """
+    Pretty format the full output of cli_network_info
+    """
+
+    return network_format_info(CLI_CONFIG, data, long_output=True)
+
+
+def cli_network_list_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_network_list
+    """
+
+    return network_format_list(CLI_CONFIG, data)
+
+
+def cli_network_dhcp_list_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_network_dhcp_list
+    """
+
+    return network_format_dhcp_list(CLI_CONFIG, data)
+
+
+def cli_network_acl_list_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_network_acl_list
+    """
+
+    return network_format_acl_list(CLI_CONFIG, data)
+
+
+def cli_network_sriov_pf_list_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_network_sriov_pf_list
+    """
+
+    return network_format_sriov_pf_list(CLI_CONFIG, data)
+
+
+def cli_network_sriov_vf_info_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_network_sriov_vf_info
+    """
+
+    return network_format_sriov_vf_info(CLI_CONFIG, data)
+
+
+def cli_network_sriov_vf_list_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_network_sriov_vf_list
+    """
+
+    return network_format_sriov_vf_list(CLI_CONFIG, data)
+
+
+def cli_storage_status_format_raw(CLI_CONFIG, data):
+    """
+    Direct format the output of cli_storage_status
+    """
+
+    return storage_format_raw(CLI_CONFIG, data)
+
+
+def cli_storage_util_format_raw(CLI_CONFIG, data):
+    """
+    Direct format the output of cli_storage_util
+    """
+
+    return storage_format_raw(CLI_CONFIG, data)
+
+
+def cli_storage_benchmark_info_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_storage_benchmark_info
+    """
+
+    return storage_format_benchmark_info(CLI_CONFIG, data)
+
+
+def cli_storage_benchmark_list_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_storage_benchmark_list
+    """
+
+    return storage_format_benchmark_list(CLI_CONFIG, data)
+
+
+def cli_storage_osd_list_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_storage_osd_list
+    """
+
+    return storage_format_osd_list(CLI_CONFIG, data)
+
+
+def cli_storage_pool_list_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_storage_pool_list
+    """
+
+    return storage_format_pool_list(CLI_CONFIG, data)
+
+
+def cli_storage_volume_list_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_storage_volume_list
+    """
+
+    return storage_format_volume_list(CLI_CONFIG, data)
+
+
+def cli_storage_snapshot_list_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_storage_snapshot_list
+    """
+
+    return storage_format_snapshot_list(CLI_CONFIG, data)
+
+
+def cli_provisioner_template_system_list_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_provisioner_template_system_list
+    """
+
+    return provisioner_format_template_list(CLI_CONFIG, data, template_type="system")
+
+
+def cli_provisioner_template_network_list_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_provisioner_template_network_list
+    """
+
+    return provisioner_format_template_list(CLI_CONFIG, data, template_type="network")
+
+
+def cli_provisioner_template_storage_list_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_provisioner_template_storage_list
+    """
+
+    return provisioner_format_template_list(CLI_CONFIG, data, template_type="storage")
+
+
+def cli_provisioner_userdata_list_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_provisioner_userdata_list
+    """
+
+    return provisioner_format_userdata_list(CLI_CONFIG, data)
+
+
+def cli_provisioner_script_list_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_provisioner_script_list
+    """
+
+    return provisioner_format_script_list(CLI_CONFIG, data)
+
+
+def cli_provisioner_ova_list_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_provisioner_ova_list
+    """
+
+    return provisioner_format_ova_list(CLI_CONFIG, data)
+
+
+def cli_provisioner_profile_list_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_provisioner_profile_list
+    """
+
+    return provisioner_format_profile_list(CLI_CONFIG, data)
+
+
+def cli_provisioner_status_format_pretty(CLI_CONFIG, data):
+    """
+    Pretty format the output of cli_provisioner_status
+    """
+
+    return provisioner_format_task_status(CLI_CONFIG, data)
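Review note on the table formatters above: both `cli_connection_list_format_pretty` and `cli_connection_detail_format_pretty` size each column to the header length plus one, then widen it when a value is longer. A minimal standalone sketch of that calculation follows. The names `column_widths` and `render_table` and the sample rows are hypothetical illustrations, not part of this change.

```python
def column_widths(headers, rows):
    """Return one width per field: header length + 1, widened to fit the longest value."""
    widths = {field: len(header) + 1 for field, header in headers.items()}
    for row in rows:
        for field in headers:
            value_length = len(str(row[field]))
            if value_length > widths[field]:
                widths[field] = value_length + 1
    return widths


def render_table(headers, rows):
    """Render a left-aligned, space-separated table using the computed widths."""
    widths = column_widths(headers, rows)
    lines = [" ".join(f"{headers[f]: <{widths[f]}}" for f in headers)]
    for row in rows:
        lines.append(" ".join(f"{str(row[f]): <{widths[f]}}" for f in headers))
    return "\n".join(lines)


# Hypothetical sample data for illustration only
headers = {"name": "Name", "port": "Port"}
rows = [
    {"name": "local", "port": 7370},
    {"name": "production-east", "port": 7370},
]
print(render_table(headers, rows))
```

Computing the widths in one helper would let the list and detail formatters share the logic instead of repeating the loop.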
--- a/client-cli/pvc/cli/helpers.py
+++ b/client-cli/pvc/cli/helpers.py
@ -0,0 +1,241 @@
+#!/usr/bin/env python3
+
+# helpers.py - PVC Click CLI helper function library
+# Part of the Parallel Virtual Cluster (PVC) system
+#
+#    Copyright (C) 2018-2023 Joshua M. Boniface <joshua@boniface.me>
+#
+#    This program is free software: you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation, version 3.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License
+#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+###############################################################################
+
+from click import echo as click_echo
+from click import progressbar
+from distutils.util import strtobool
+from json import load as jload
+from json import dump as jdump
+from os import chmod, environ, getpid, path
+from socket import gethostname
+from sys import argv
+from syslog import syslog, openlog, closelog, LOG_AUTH
+from time import sleep
+from yaml import load as yload
+from yaml import BaseLoader
+
+import pvc.lib.provisioner
+
+
+DEFAULT_STORE_DATA = {"cfgfile": "/etc/pvc/pvcapid.yaml"}
+DEFAULT_STORE_FILENAME = "pvc.json"
+DEFAULT_API_PREFIX = "/api/v1"
+DEFAULT_NODE_HOSTNAME = gethostname().split(".")[0]
+
+
+def echo(config, message, newline=True, stderr=False):
+    """
+    Output a message with click.echo respecting our configuration
+    """
+
+    if config.get("colour", False):
+        colour = True
+    else:
+        colour = None
+
+    if config.get("silent", False):
+        pass
+    elif config.get("quiet", False) and stderr:
+        pass
+    else:
+        click_echo(message=message, color=colour, nl=newline, err=stderr)
+
+
+def audit():
+    """
+    Log an audit message to the local syslog AUTH facility
+    """
+
+    args = argv
+    args[0] = "pvc"
+    pid = getpid()
+
+    openlog(facility=LOG_AUTH, ident=f"{args[0]}[{pid}]")
+    syslog(
+        f"""client audit: command "{' '.join(args)}" by user {environ.get('USER', None)}"""
+    )
+    closelog()
+
+
+def read_config_from_yaml(cfgfile):
+    """
+    Read the PVC API configuration from the local API configuration file
+    """
+
+    try:
+        with open(cfgfile) as fh:
+            api_config = yload(fh, Loader=BaseLoader)["pvc"]["api"]
+
+        host = api_config["listen_address"]
+        port = api_config["listen_port"]
+        scheme = "https" if strtobool(api_config["ssl"]["enabled"]) else "http"
+        api_key = (
+            api_config["authentication"]["tokens"][0]["token"]
+            if strtobool(api_config["authentication"]["enabled"])
+            else None
+        )
+    except KeyError:
+        host = None
+        port = None
+        scheme = None
+        api_key = None
+
+    return cfgfile, host, port, scheme, api_key
+
+
+def get_config(store_data, connection=None):
+    """
+    Load CLI configuration from store data
+    """
+
+    if store_data is None:
+        return {"badcfg": True}
+
+    connection_details = store_data.get(connection, None)
+
+    if not connection_details:
+        connection = "local"
+        connection_details = DEFAULT_STORE_DATA
+
+    if connection_details.get("cfgfile", None) is not None:
+        if path.isfile(connection_details.get("cfgfile", None)):
+            description, host, port, scheme, api_key = read_config_from_yaml(
+                connection_details.get("cfgfile", None)
+            )
+            if None in [description, host, port, scheme]:
+                return {"badcfg": True}
+        else:
+            return {"badcfg": True}
+        # Rewrite a wildcard listener to use localhost instead
+        if host == "0.0.0.0":
+            host = "127.0.0.1"
+    else:
+        # This is a static configuration, get the details directly
+        description = connection_details["description"]
+        host = connection_details["host"]
+        port = connection_details["port"]
+        scheme = connection_details["scheme"]
+        api_key = connection_details["api_key"]
+
+    config = dict()
+    config["debug"] = False
+    config["connection"] = connection
+    config["description"] = description
+    config["api_host"] = f"{host}:{port}"
+    config["api_scheme"] = scheme
+    config["api_key"] = api_key
+    config["api_prefix"] = DEFAULT_API_PREFIX
+    if connection == "local":
+        config["verify_ssl"] = False
+    else:
+        config["verify_ssl"] = bool(
+            strtobool(environ.get("PVC_CLIENT_VERIFY_SSL", "True"))
+        )
+
+    return config
+
+
+def get_store(store_path):
+    """
+    Load store information from the store path
+    """
+
+    store_file = f"{store_path}/{DEFAULT_STORE_FILENAME}"
+
+    with open(store_file) as fh:
+        try:
+            store_data = jload(fh)
+            return store_data
+        except Exception:
+            return dict()
+
+
+def update_store(store_path, store_data):
+    """
+    Update store information to the store path, creating it (with sensible permissions) if needed
+    """
+
+    store_file = f"{store_path}/{DEFAULT_STORE_FILENAME}"
+
+    if not path.exists(store_file):
+        with open(store_file, "w") as fh:
+            fh.write("")
+        chmod(store_file, int(environ.get("PVC_CLIENT_DB_PERMS", "600"), 8))
+
+    with open(store_file, "w") as fh:
+        jdump(store_data, fh, sort_keys=True, indent=4)
+
+
+def wait_for_provisioner(CLI_CONFIG, task_id):
+    """
+    Wait for a provisioner task to complete
+    """
+
+    echo(CLI_CONFIG, f"Task ID: {task_id}")
+    echo(CLI_CONFIG, "")
+
+    # Wait for the task to start
+    echo(CLI_CONFIG, "Waiting for task to start...", newline=False)
+    while True:
+        sleep(1)
+        task_status = pvc.lib.provisioner.task_status(
+            CLI_CONFIG, task_id, is_watching=True
+        )
+        if task_status.get("state") != "PENDING":
+            break
+        echo(CLI_CONFIG, ".", newline=False)
+    echo(CLI_CONFIG, " done.")
+    echo(CLI_CONFIG, "")
+
+    # Start following the task state, updating progress as we go
+    total_task = task_status.get("total")
+    with progressbar(length=total_task, show_eta=False) as bar:
+        last_task = 0
+        maxlen = 0
+        while True:
+            sleep(1)
+            if task_status.get("state") != "RUNNING":
+                break
+            if task_status.get("current") > last_task:
+                current_task = int(task_status.get("current"))
+                bar.update(current_task - last_task)
+                last_task = current_task
+                # The extensive spaces at the end cause this to overwrite longer previous messages
+                curlen = len(str(task_status.get("status")))
+                if curlen > maxlen:
+                    maxlen = curlen
+                lendiff = maxlen - curlen
+                overwrite_whitespace = " " * lendiff
+                echo(
+                    CLI_CONFIG,
+                    "  " + task_status.get("status") + overwrite_whitespace,
+                    newline=False,
+                )
+            task_status = pvc.lib.provisioner.task_status(
+                CLI_CONFIG, task_id, is_watching=True
+            )
+        if task_status.get("state") == "SUCCESS":
+            bar.update(total_task - last_task)
+
+    echo(CLI_CONFIG, "")
+    retdata = task_status.get("state") + ": " + task_status.get("status")
+
+    return retdata
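Review note on `helpers.py` above: it imports `strtobool` from `distutils.util`, and `distutils` was removed from the standard library in Python 3.12. One possible replacement is a small local helper like the sketch below. This is an assumption about how the import could be replaced, not something this change does; unlike the `distutils` version it returns a `bool` rather than `1`/`0` and strips surrounding whitespace.

```python
def strtobool(value):
    """Convert a truthy or falsy string to a bool; raise ValueError for anything else."""
    normalized = str(value).strip().lower()
    if normalized in ("y", "yes", "t", "true", "on", "1"):
        return True
    if normalized in ("n", "no", "f", "false", "off", "0"):
        return False
    raise ValueError(f"invalid truth value {value!r}")


# The PVC_CLIENT_VERIFY_SSL default used in get_config() is the string "True"
print(strtobool("True"))
print(strtobool("off"))
```

Because the accepted strings match those of the `distutils` function, call sites such as `strtobool(api_config["ssl"]["enabled"])` would not need to change.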
--- a/client-cli/pvc/cli/parsers.py
+++ b/client-cli/pvc/cli/parsers.py
@ -0,0 +1,124 @@
+#!/usr/bin/env python3
+
+# parsers.py - PVC Click CLI data parser function library
+# Part of the Parallel Virtual Cluster (PVC) system
+#
+#    Copyright (C) 2018-2023 Joshua M. Boniface <joshua@boniface.me>
+#
+#    This program is free software: you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation, version 3.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License
+#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+###############################################################################
+
+from os import path
+from re import sub
+
+from pvc.cli.helpers import read_config_from_yaml, get_config
+
+import pvc.lib.cluster
+
+
+def cli_connection_list_parser(connections_config, show_keys_flag):
+    """
+    Parse connections_config into formattable data for cli_connection_list
+    """
+
+    connections_data = list()
+
+    for connection, details in connections_config.items():
+        if details.get("cfgfile", None) is not None:
+            if path.isfile(details.get("cfgfile")):
+                description, address, port, scheme, api_key = read_config_from_yaml(
+                    details.get("cfgfile")
+                )
+            else:
+                continue
+            if not show_keys_flag and api_key is not None:
+                api_key = sub(r"[a-z0-9]", "x", api_key)
+            connections_data.append(
+                {
+                    "name": connection,
+                    "description": description,
+                    "address": address,
+                    "port": port,
+                    "scheme": scheme,
+                    "api_key": api_key,
+                }
+            )
+        else:
+            # Mask a local copy so the caller's connection store is not modified
+            api_key = details["api_key"]
+            if not show_keys_flag and api_key is not None:
+                api_key = sub(r"[a-z0-9]", "x", api_key)
+            connections_data.append(
+                {
+                    "name": connection,
+                    "description": details["description"],
+                    "address": details["host"],
+                    "port": details["port"],
+                    "scheme": details["scheme"],
+                    "api_key": api_key,
+                }
+            )
+
+    return connections_data
+
+
+def cli_connection_detail_parser(connections_config):
+    """
+    Parse connections_config into formattable data for cli_connection_detail
+    """
+    connections_data = list()
+    for connection, details in connections_config.items():
+        cluster_config = get_config(connections_config, connection=connection)
+        if cluster_config.get("badcfg", False):
+            continue
+        # Connect to each API and gather cluster status
+        retcode, retdata = pvc.lib.cluster.get_info(cluster_config)
+        if retcode == 0:
+            # The status request failed: report N/A for all fields
+            connections_data.append(
+                {
+                    "name": cluster_config["connection"],
+                    "description": cluster_config["description"],
+                    "health": "N/A",
+                    "maintenance": "N/A",
+                    "primary_node": "N/A",
+                    "pvc_version": "N/A",
+                    "nodes": "N/A",
+                    "vms": "N/A",
+                    "networks": "N/A",
+                    "osds": "N/A",
+                    "pools": "N/A",
+                    "volumes": "N/A",
+                    "snapshots": "N/A",
+                }
+            )
+        else:
+            # Normalize data into nice formattable version
+            connections_data.append(
+                {
+                    "name": cluster_config["connection"],
+                    "description": cluster_config["description"],
+                    "health": retdata.get("cluster_health", {}).get("health", "N/A"),
+                    "maintenance": retdata.get("maintenance", "N/A"),
+                    "primary_node": retdata.get("primary_node", "N/A"),
+                    "pvc_version": retdata.get("pvc_version", "N/A"),
+                    "nodes": retdata.get("nodes", {}).get("total", "N/A"),
+                    "vms": retdata.get("vms", {}).get("total", "N/A"),
+                    "networks": retdata.get("networks", "N/A"),
+                    "osds": retdata.get("osds", {}).get("total", "N/A"),
+                    "pools": retdata.get("pools", "N/A"),
+                    "volumes": retdata.get("volumes", "N/A"),
+                    "snapshots": retdata.get("snapshots", "N/A"),
+                }
+            )
+
+    return connections_data
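Review note on the key masking in `cli_connection_list_parser` above: the pattern `[a-z0-9]` replaces only lowercase letters and digits, so separators and any uppercase characters pass through unchanged. The sketch below shows that behaviour in isolation. `mask_api_key` is a hypothetical wrapper name and the sample keys are made up.

```python
from re import sub


def mask_api_key(api_key):
    """Replace lowercase letters and digits with "x", as the list parser does."""
    return sub(r"[a-z0-9]", "x", api_key)


# Made-up sample keys for illustration only
print(mask_api_key("1a2b3c4d-5e6f"))  # xxxxxxxx-xxxx
print(mask_api_key("ABC-123"))  # ABC-xxx
```

If keys can contain uppercase characters, widening the class to `[A-Za-z0-9]` would mask them as well; whether that is needed depends on how the tokens are generated.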
--- a/client-cli/pvc/cli/waiters.py
+++ b/client-cli/pvc/cli/waiters.py
@ -0,0 +1,64 @@
+#!/usr/bin/env python3
+
+# waiters.py - PVC Click CLI output waiters library
+# Part of the Parallel Virtual Cluster (PVC) system
+#
+#    Copyright (C) 2018-2023 Joshua M. Boniface <joshua@boniface.me>
+#
+#    This program is free software: you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation, version 3.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License
+#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+###############################################################################
+
+from time import sleep, time
+
+from pvc.cli.helpers import echo
+
+import pvc.lib.node
+
+
+def cli_node_waiter(config, node, state_field, state_value):
+    """
+    Wait for state transitions for cli_node tasks
+
+    {node} is the name of the node
+    {state_field} is the node_info field to check for {state_value}
+    {state_value} is the TRANSITIONAL value that, when no longer set, will terminate waiting
+    """
+
+    # Sleep for this long between API polls
+    sleep_time = 1
+
+    # Print a dot after this many {sleep_time}s
+    dot_time = 5
+
+    t_start = time()
+
+    echo(config, "Waiting...", newline=False)
+    sleep(sleep_time)
+
+    count = 0
+    while True:
+        count += 1
+        try:
+            _retcode, _retdata = pvc.lib.node.node_info(config, node)
+            if _retdata[state_field] != state_value:
+                break
+            else:
+                raise ValueError
+        except Exception:
+            sleep(sleep_time)
+            if count % dot_time == 0:
+                echo(config, ".", newline=False)
+
+    t_end = time()
+    echo(config, f" done. [{int(t_end - t_start)}s]")
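Review note on `cli_node_waiter` above: it polls until the watched field no longer holds the transitional value, treats a failed poll as "still waiting", and prints a dot every `dot_time` polls. The following is a minimal sketch of the same loop with the API call abstracted away. `wait_while`, `poll`, and `on_dot` are hypothetical names, and the fake state sequence stands in for `pvc.lib.node.node_info`.

```python
from time import sleep, time


def wait_while(poll, transitional_value, sleep_time=1, dot_time=5, on_dot=lambda: None):
    """Poll until poll() stops returning transitional_value; return elapsed whole seconds."""
    t_start = time()
    count = 0
    while True:
        count += 1
        try:
            if poll() != transitional_value:
                break
        except Exception:
            # A failed poll is treated the same as "still transitioning"
            pass
        sleep(sleep_time)
        if count % dot_time == 0:
            on_dot()
    return int(time() - t_start)


# Fake state sequence standing in for repeated node_info calls
states = iter(["flushing", "flushing", "flushed"])
dots = []
elapsed = wait_while(
    lambda: next(states),
    "flushing",
    sleep_time=0,
    dot_time=2,
    on_dot=lambda: dots.append("."),
)
print(dots, elapsed)
```

Checking the value directly, rather than raising `ValueError` to reach the `except` branch, keeps the "still waiting" path and the "poll failed" path easy to tell apart.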
--- a/client-cli/pvc/lib/__init__.py
+++ b/client-cli/pvc/lib/__init__.py
--- a/client-cli/pvc/lib/ansiprint.py
+++ b/client-cli/pvc/lib/ansiprint.py
@ -0,0 +1,97 @@
+#!/usr/bin/env python3
+
+# ansiprint.py - Printing function for formatted messages
+# Part of the Parallel Virtual Cluster (PVC) system
+#
+#    Copyright (C) 2018-2022 Joshua M. Boniface <joshua@boniface.me>
+#
+#    This program is free software: you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation, version 3.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License
+#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+###############################################################################
+
+import datetime
+
+
+# ANSI colours for output
+def red():
+    return "\033[91m"
+
+
+def blue():
+    return "\033[94m"
+
+
+def cyan():
+    return "\033[96m"
+
+
+def green():
+    return "\033[92m"
+
+
+def yellow():
+    return "\033[93m"
+
+
+def purple():
+    return "\033[95m"
+
+
+def bold():
+    return "\033[1m"
+
+
+def end():
+    return "\033[0m"
+
+
+# Print function
+def echo(message, prefix, state):
+    # Get the date
+    date = "{} - ".format(datetime.datetime.now().strftime("%Y/%m/%d %H:%M:%S.%f"))
+    endc = end()
+
+    # Continuation
+    if state == "c":
+        date = ""
+        colour = ""
+        prompt = "    "
+    # OK
+    elif state == "o":
+        colour = green()
+        prompt = ">>> "
+    # Error
+    elif state == "e":
+        colour = red()
+        prompt = ">>> "
+    # Warning
+    elif state == "w":
+        colour = yellow()
+        prompt = ">>> "
+    # Tick
+    elif state == "t":
+        colour = purple()
+        prompt = ">>> "
+    # Information
+    elif state == "i":
+        colour = blue()
+        prompt = ">>> "
+    else:
+        colour = bold()
+        prompt = ">>> "
+
+    # Append space to prefix
+    if prefix != "":
+        prefix = prefix + " "
+
+    print(colour + prompt + endc + date + prefix + message)
--- a/client-cli/pvc/lib/cluster.py
+++ b/client-cli/pvc/lib/cluster.py
@ -0,0 +1,116 @@
+#!/usr/bin/env python3
+
+# cluster.py - PVC CLI client function library, cluster management
+# Part of the Parallel Virtual Cluster (PVC) system
+#
+#    Copyright (C) 2018-2022 Joshua M. Boniface <joshua@boniface.me>
+#
+#    This program is free software: you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation, version 3.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License
+#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+###############################################################################
+
+import json
+
+from pvc.lib.common import call_api
+
+
+def initialize(config, overwrite=False):
+    """
+    Initialize the PVC cluster
+
+    API endpoint: POST /api/v1/initialize
+    API arguments: overwrite, yes-i-really-mean-it
+    API schema: {json_data_object}
+    """
+    params = {"yes-i-really-mean-it": "yes", "overwrite": overwrite}
+    response = call_api(config, "post", "/initialize", params=params)
+
+    if response.status_code == 200:
+        retstatus = True
+    else:
+        retstatus = False
+
+    return retstatus, response.json().get("message", "")
+
+
+def backup(config):
+    """
+    Get a JSON backup of the cluster
+
+    API endpoint: GET /api/v1/backup
+    API arguments:
+    API schema: {json_data_object}
+    """
+    response = call_api(config, "get", "/backup")
+
+    if response.status_code == 200:
+        return True, response.json()
+    else:
+        return False, response.json().get("message", "")
+
+
+def restore(config, cluster_data):
+    """
+    Restore a JSON backup to the cluster
+
+    API endpoint: POST /api/v1/restore
+    API arguments: yes-i-really-mean-it
+    API schema: {json_data_object}
+    """
+    cluster_data_json = json.dumps(cluster_data)
+
+    params = {"yes-i-really-mean-it": "yes"}
+    data = {"cluster_data": cluster_data_json}
+    response = call_api(config, "post", "/restore", params=params, data=data)
+
+    if response.status_code == 200:
+        retstatus = True
+    else:
+        retstatus = False
+
+    return retstatus, response.json().get("message", "")
+
+
+def maintenance_mode(config, state):
+    """
+    Enable or disable PVC cluster maintenance mode
+
+    API endpoint: POST /api/v1/status
+    API arguments: state={state}
+    API schema: {json_data_object}
+    """
+    params = {"state": state}
+    response = call_api(config, "post", "/status", params=params)
+
+    if response.status_code == 200:
+        retstatus = True
+    else:
+        retstatus = False
+
+    return retstatus, response.json().get("message", "")
+
+
+def get_info(config):
+    """
+    Get status of the PVC cluster
+
+    API endpoint: GET /api/v1/status
+    API arguments:
+    API schema: {json_data_object}
+    """
+    response = call_api(config, "get", "/status")
+
+    if response.status_code == 200:
+        return True, response.json()
+    else:
+        return False, response.json().get("message", "")
--- a/client-cli/pvc/lib/common.py
+++ b/client-cli/pvc/lib/common.py
@ -0,0 +1,201 @@
+#!/usr/bin/env python3
+
+# common.py - PVC CLI client function library, Common functions
+# Part of the Parallel Virtual Cluster (PVC) system
+#
+#    Copyright (C) 2018-2022 Joshua M. Boniface <joshua@boniface.me>
+#
+#    This program is free software: you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation, version 3.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License
+#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+###############################################################################
+
+import os
+import math
+import time
+import requests
+import click
+from urllib3 import disable_warnings
+
+
+def format_bytes(size_bytes):
+    byte_unit_matrix = {
+        "B": 1,
+        "K": 1024,
+        "M": 1024 * 1024,
+        "G": 1024 * 1024 * 1024,
+        "T": 1024 * 1024 * 1024 * 1024,
+        "P": 1024 * 1024 * 1024 * 1024 * 1024,
+    }
+    human_bytes = "0B"
+    for unit in sorted(byte_unit_matrix, key=byte_unit_matrix.get):
+        formatted_bytes = int(math.ceil(size_bytes / byte_unit_matrix[unit]))
+        if formatted_bytes < 10000:
+            human_bytes = "{}{}".format(formatted_bytes, unit)
+            break
+    return human_bytes
+
+
+def format_metric(integer):
+    integer_unit_matrix = {
+        "": 1,
+        "K": 1000,
+        "M": 1000 * 1000,
+        "B": 1000 * 1000 * 1000,
+        "T": 1000 * 1000 * 1000 * 1000,
+        "Q": 1000 * 1000 * 1000 * 1000 * 1000,
+    }
+    human_integer = "0"
+    for unit in sorted(integer_unit_matrix, key=integer_unit_matrix.get):
+        formatted_integer = int(math.ceil(integer / integer_unit_matrix[unit]))
+        if formatted_integer < 10000:
+            human_integer = "{}{}".format(formatted_integer, unit)
+            break
+    return human_integer
+
+
+class UploadProgressBar(object):
+    def __init__(self, filename, end_message="", end_nl=True):
+        file_size = os.path.getsize(filename)
+        file_size_human = format_bytes(file_size)
+        click.echo("Uploading file (total size {})...".format(file_size_human))
+
+        self.length = file_size
+        self.time_last = int(round(time.time() * 1000)) - 1000
+        self.bytes_last = 0
+        self.bytes_diff = 0
+        self.is_end = False
+
+        self.end_message = end_message
+        self.end_nl = end_nl
+        if not self.end_nl:
+            self.end_suffix = " "
+        else:
+            self.end_suffix = ""
+
+        self.bar = click.progressbar(length=self.length, show_eta=True)
+
+    def update(self, monitor):
+        bytes_cur = monitor.bytes_read
+        self.bytes_diff += bytes_cur - self.bytes_last
+        if self.bytes_last == bytes_cur:
+            self.is_end = True
+        self.bytes_last = bytes_cur
+
+        time_cur = int(round(time.time() * 1000))
+        if (time_cur - 1000) > self.time_last:
+            self.time_last = time_cur
+            self.bar.update(self.bytes_diff)
+            self.bytes_diff = 0
+
+        if self.is_end:
+            self.bar.update(self.bytes_diff)
+            self.bytes_diff = 0
+            click.echo()
+            click.echo()
+            if self.end_message:
+                click.echo(self.end_message + self.end_suffix, nl=self.end_nl)
+
+
+class ErrorResponse(requests.Response):
+    def __init__(self, json_data, status_code):
+        self.json_data = json_data
+        self.status_code = status_code
+
+    def json(self):
+        return self.json_data
+
+
+def call_api(
+    config,
+    operation,
+    request_uri,
+    headers={},
+    params=None,
+    data=None,
+    files=None,
+):
+    # Set the connect timeout to ~2 seconds, but use an extremely long (48 hour) read timeout
+    timeout = (2.05, 172800)
+
+    # Craft the URI
+    uri = "{}://{}{}{}".format(
+        config["api_scheme"], config["api_host"], config["api_prefix"], request_uri
+    )
+
+    # Craft the authentication header if required (copy; never mutate the shared default dict)
+    if config["api_key"]:
+        headers = {**headers, "X-Api-Key": config["api_key"]}
+
+    # Determine the request type and hit the API
+    disable_warnings()
+    try:
+        if operation == "get":
+            response = requests.get(
+                uri,
+                timeout=timeout,
+                headers=headers,
+                params=params,
+                data=data,
+                verify=config["verify_ssl"],
+            )
+        elif operation == "post":
+            response = requests.post(
+                uri,
+                timeout=timeout,
+                headers=headers,
+                params=params,
+                data=data,
+                files=files,
+                verify=config["verify_ssl"],
+            )
+        elif operation == "put":
+            response = requests.put(
+                uri,
+                timeout=timeout,
+                headers=headers,
+                params=params,
+                data=data,
+                files=files,
+                verify=config["verify_ssl"],
+            )
+        elif operation == "patch":
+            response = requests.patch(
+                uri,
+                timeout=timeout,
+                headers=headers,
+                params=params,
+                data=data,
+                verify=config["verify_ssl"],
+            )
+        elif operation == "delete":
+            response = requests.delete(
+                uri,
+                timeout=timeout,
+                headers=headers,
+                params=params,
+                data=data,
+                verify=config["verify_ssl"],
+            )
+    except Exception as e:
+        message = "Failed to connect to the API: {}".format(e)
+        response = ErrorResponse({"message": message}, 500)
+
+    # Display debug output
+    if config["debug"]:
+        click.echo("API endpoint: {}".format(uri), err=True)
+        click.echo("Response code: {}".format(response.status_code), err=True)
+        click.echo("Response headers: {}".format(getattr(response, "headers", None)), err=True)
+        click.echo(err=True)
+
+    # Return the response object
+    return response
--- a/client-cli/pvc/lib/network.py
+++ b/client-cli/pvc/lib/network.py
--- a/client-cli/pvc/lib/node.py
+++ b/client-cli/pvc/lib/node.py
@ -0,0 +1,706 @@
+#!/usr/bin/env python3
+
+# node.py - PVC CLI client function library, node management
+# Part of the Parallel Virtual Cluster (PVC) system
+#
+#    Copyright (C) 2018-2022 Joshua M. Boniface <joshua@boniface.me>
+#
+#    This program is free software: you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation, version 3.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License
+#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+###############################################################################
+
+import time
+
+import pvc.lib.ansiprint as ansiprint
+from pvc.lib.common import call_api
+
+
+#
+# Primary functions
+#
+def node_coordinator_state(config, node, action):
+    """
+    Set node coordinator state (primary/secondary)
+
+    API endpoint: POST /api/v1/node/{node}/coordinator-state
+    API arguments: state={action}
+    API schema: {"message": "{data}"}
+    """
+    params = {"state": action}
+    response = call_api(
+        config,
+        "post",
+        "/node/{node}/coordinator-state".format(node=node),
+        params=params,
+    )
+
+    if response.status_code == 200:
+        retstatus = True
+    else:
+        retstatus = False
+
+    return retstatus, response.json().get("message", "")
+
+
+def node_domain_state(config, node, action):
+    """
+    Set node domain state (flush/ready)
+
+    API endpoint: POST /api/v1/node/{node}/domain-state
+    API arguments: state={action}
+    API schema: {"message": "{data}"}
+    """
+    params = {"state": action}
+    response = call_api(
+        config, "post", "/node/{node}/domain-state".format(node=node), params=params
+    )
+
+    if response.status_code == 200:
+        retstatus = True
+    else:
+        retstatus = False
+
+    return retstatus, response.json().get("message", "")
+
+
+def view_node_log(config, node, lines=100):
+    """
+    Return node log lines from the API (and display them in a pager in the main CLI)
+
+    API endpoint: GET /api/v1/node/{node}/log
+    API arguments: lines={lines}
+    API schema: {"name":"{node}","data":"{node_log}"}
+    """
+    params = {"lines": lines}
+    response = call_api(
+        config, "get", "/node/{node}/log".format(node=node), params=params
+    )
+
+    if response.status_code != 200:
+        return False, response.json().get("message", "")
+
+    node_log = response.json()["data"]
+
+    # Shrink the log buffer to the last `lines` lines
+    shrunk_log = node_log.split("\n")[-lines:]
+    loglines = "\n".join(shrunk_log)
+
+    return True, loglines
+
+
+def follow_node_log(config, node, lines=10):
+    """
+    Return and follow node log lines from the API
+
+    API endpoint: GET /api/v1/node/{node}/log
+    API arguments: lines={lines}
+    API schema: {"name":"{node}","data":"{node_log}"}
+    """
+    # We always grab 200 lines to match the follow loop below, but only _show_ the last `lines` of them
+    params = {"lines": 200}
+    response = call_api(
+        config, "get", "/node/{node}/log".format(node=node), params=params
+    )
+
+    if response.status_code != 200:
+        return False, response.json().get("message", "")
+
+    # Shrink the log buffer to the last `lines` lines
+    node_log = response.json()["data"]
+    shrunk_log = node_log.split("\n")[-int(lines) :]
+    loglines = "\n".join(shrunk_log)
+
+    # Print the initial data and begin following
+    print(loglines, end="")
+    print("\n", end="")
+
+    while True:
+        # Grab the next line set (200 is a reasonable number of lines per half-second; any more are skipped)
+        try:
+            params = {"lines": 200}
+            response = call_api(
+                config, "get", "/node/{node}/log".format(node=node), params=params
+            )
+            new_node_log = response.json()["data"]
+        except Exception:
+            break
+        # Split the new and old log strings into constituent lines
+        old_node_loglines = node_log.split("\n")
+        new_node_loglines = new_node_log.split("\n")
+
+        # Set the node log to the new log value for the next iteration
+        node_log = new_node_log
+
+        # Get the difference between the two sets of lines
+        old_node_loglines_set = set(old_node_loglines)
+        diff_node_loglines = [
+            x for x in new_node_loglines if x not in old_node_loglines_set
+        ]
+
+        # If there's a difference, print it out
+        if len(diff_node_loglines) > 0:
+            print("\n".join(diff_node_loglines), end="")
+            print("\n", end="")
+
+        # Wait half a second
+        time.sleep(0.5)
+
+    return True, ""
+
+
+def node_info(config, node):
+    """
+    Get information about node
+
+    API endpoint: GET /api/v1/node/{node}
+    API arguments:
+    API schema: {json_data_object}
+    """
+    response = call_api(config, "get", "/node/{node}".format(node=node))
+
+    if response.status_code == 200:
+        if isinstance(response.json(), list) and len(response.json()) != 1:
+            # No exact match, return not found
+            return False, "Node not found."
+        else:
+            # Return a single instance if the response is a list
+            if isinstance(response.json(), list):
+                return True, response.json()[0]
+            # This shouldn't happen, but is here just in case
+            else:
+                return True, response.json()
+    else:
+        return False, response.json().get("message", "")
+
+
+def node_list(
+    config, limit, target_daemon_state, target_coordinator_state, target_domain_state
+):
+    """
+    Get list information about nodes (limited by {limit})
+
+    API endpoint: GET /api/v1/node
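+    API arguments: limit={limit}, daemon_state={target_daemon_state}, coordinator_state={target_coordinator_state}, domain_state={target_domain_state}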
+    API schema: [{json_data_object},{json_data_object},etc.]
+    """
+    params = dict()
+    if limit:
+        params["limit"] = limit
+    if target_daemon_state:
+        params["daemon_state"] = target_daemon_state
+    if target_coordinator_state:
+        params["coordinator_state"] = target_coordinator_state
+    if target_domain_state:
+        params["domain_state"] = target_domain_state
+
+    response = call_api(config, "get", "/node", params=params)
+
+    if response.status_code == 200:
+        return True, response.json()
+    else:
+        return False, response.json().get("message", "")
+
+
+#
+# Output display functions
+#
+def getOutputColours(node_information):
+    node_health = node_information.get("health", "N/A")
+    if isinstance(node_health, int):
+        if node_health <= 50:
+            health_colour = ansiprint.red()
+        elif node_health <= 90:
+            health_colour = ansiprint.yellow()
+        elif node_health <= 100:
+            health_colour = ansiprint.green()
+        else:
+            health_colour = ansiprint.blue()
+    else:
+        health_colour = ansiprint.blue()
+
+    if node_information["daemon_state"] == "run":
+        daemon_state_colour = ansiprint.green()
+    elif node_information["daemon_state"] == "stop":
+        daemon_state_colour = ansiprint.red()
+    elif node_information["daemon_state"] == "shutdown":
+        daemon_state_colour = ansiprint.yellow()
+    elif node_information["daemon_state"] == "init":
+        daemon_state_colour = ansiprint.yellow()
+    elif node_information["daemon_state"] == "dead":
+        daemon_state_colour = ansiprint.red() + ansiprint.bold()
+    else:
+        daemon_state_colour = ansiprint.blue()
+
+    if node_information["coordinator_state"] == "primary":
+        coordinator_state_colour = ansiprint.green()
+    elif node_information["coordinator_state"] == "secondary":
+        coordinator_state_colour = ansiprint.blue()
+    else:
+        coordinator_state_colour = ansiprint.cyan()
+
+    if node_information["domain_state"] == "ready":
+        domain_state_colour = ansiprint.green()
+    else:
+        domain_state_colour = ansiprint.blue()
+
+    if node_information["memory"]["allocated"] > node_information["memory"]["total"]:
+        mem_allocated_colour = ansiprint.yellow()
+    else:
+        mem_allocated_colour = ""
+
+    if node_information["memory"]["provisioned"] > node_information["memory"]["total"]:
+        mem_provisioned_colour = ansiprint.yellow()
+    else:
+        mem_provisioned_colour = ""
+
+    return (
+        health_colour,
+        daemon_state_colour,
+        coordinator_state_colour,
+        domain_state_colour,
+        mem_allocated_colour,
+        mem_provisioned_colour,
+    )
+
+
+def format_info(config, node_information, long_output):
+    (
+        health_colour,
+        daemon_state_colour,
+        coordinator_state_colour,
+        domain_state_colour,
+        mem_allocated_colour,
+        mem_provisioned_colour,
+    ) = getOutputColours(node_information)
+
+    # Format a nice output; do this line-by-line then concat the elements at the end
+    ainformation = []
+    # Basic information
+    ainformation.append(
+        "{}Name:{}                  {}".format(
+            ansiprint.purple(),
+            ansiprint.end(),
+            node_information["name"],
+        )
+    )
+    ainformation.append(
+        "{}PVC Version:{}           {}".format(
+            ansiprint.purple(),
+            ansiprint.end(),
+            node_information["pvc_version"],
+        )
+    )
+
+    node_health = node_information.get("health", "N/A")
+    if isinstance(node_health, int):
+        node_health_text = f"{node_health}%"
+    else:
+        node_health_text = node_health
+    ainformation.append(
+        "{}Health:{}                {}{}{}".format(
+            ansiprint.purple(),
+            ansiprint.end(),
+            health_colour,
+            node_health_text,
+            ansiprint.end(),
+        )
+    )
+
+    node_health_details = node_information.get("health_details", [])
+    if long_output:
+        node_health_messages = "\n                       ".join(
+            [f"{plugin['name']}: {plugin['message']}" for plugin in node_health_details]
+        )
+    else:
+        node_health_messages = "\n                       ".join(
+            [
+                f"{plugin['name']}: {plugin['message']}"
+                for plugin in node_health_details
+                if int(plugin.get("health_delta", 0)) > 0
+            ]
+        )
+
+    if len(node_health_messages) > 0:
+        ainformation.append(
+            "{}Health Plugin Details:{} {}".format(
+                ansiprint.purple(), ansiprint.end(), node_health_messages
+            )
+        )
+    ainformation.append("")
+
+    ainformation.append(
+        "{}Daemon State:{}          {}{}{}".format(
+            ansiprint.purple(),
+            ansiprint.end(),
+            daemon_state_colour,
+            node_information["daemon_state"],
+            ansiprint.end(),
+        )
+    )
+    ainformation.append(
+        "{}Coordinator State:{}     {}{}{}".format(
+            ansiprint.purple(),
+            ansiprint.end(),
+            coordinator_state_colour,
+            node_information["coordinator_state"],
+            ansiprint.end(),
+        )
+    )
+    ainformation.append(
+        "{}Domain State:{}          {}{}{}".format(
+            ansiprint.purple(),
+            ansiprint.end(),
+            domain_state_colour,
+            node_information["domain_state"],
+            ansiprint.end(),
+        )
+    )
+    if long_output:
+        ainformation.append("")
+        ainformation.append(
+            "{}Architecture:{}          {}".format(
+                ansiprint.purple(), ansiprint.end(), node_information["arch"]
+            )
+        )
+        ainformation.append(
+            "{}Operating System:{}      {}".format(
+                ansiprint.purple(), ansiprint.end(), node_information["os"]
+            )
+        )
+        ainformation.append(
+            "{}Kernel Version:{}        {}".format(
+                ansiprint.purple(), ansiprint.end(), node_information["kernel"]
+            )
+        )
+    ainformation.append("")
+    ainformation.append(
+        "{}Active VM Count:{}       {}".format(
+            ansiprint.purple(), ansiprint.end(), node_information["domains_count"]
+        )
+    )
+    ainformation.append(
+        "{}Host CPUs:{}             {}".format(
+            ansiprint.purple(), ansiprint.end(), node_information["vcpu"]["total"]
+        )
+    )
+    ainformation.append(
+        "{}vCPUs:{}                 {}".format(
+            ansiprint.purple(), ansiprint.end(), node_information["vcpu"]["allocated"]
+        )
+    )
+    ainformation.append(
+        "{}Load:{}                  {}".format(
+            ansiprint.purple(), ansiprint.end(), node_information["load"]
+        )
+    )
+    ainformation.append(
+        "{}Total RAM (MiB):{}       {}".format(
+            ansiprint.purple(), ansiprint.end(), node_information["memory"]["total"]
+        )
+    )
+    ainformation.append(
+        "{}Used RAM (MiB):{}        {}".format(
+            ansiprint.purple(), ansiprint.end(), node_information["memory"]["used"]
+        )
+    )
+    ainformation.append(
+        "{}Free RAM (MiB):{}        {}".format(
+            ansiprint.purple(), ansiprint.end(), node_information["memory"]["free"]
+        )
+    )
+    ainformation.append(
+        "{}Allocated RAM (MiB):{}   {}{}{}".format(
+            ansiprint.purple(),
+            ansiprint.end(),
+            mem_allocated_colour,
+            node_information["memory"]["allocated"],
+            ansiprint.end(),
+        )
+    )
+    ainformation.append(
+        "{}Provisioned RAM (MiB):{} {}{}{}".format(
+            ansiprint.purple(),
+            ansiprint.end(),
+            mem_provisioned_colour,
+            node_information["memory"]["provisioned"],
+            ansiprint.end(),
+        )
+    )
+
+    # Join it all together
+    ainformation.append("")
+    return "\n".join(ainformation)
+
+
+def format_list(config, node_list):
+    if node_list == "Node not found.":
+        return node_list
+
+    node_list_output = []
+
+    # Determine optimal column widths
+    node_name_length = 5
+    pvc_version_length = 8
+    health_length = 7
+    daemon_state_length = 7
+    coordinator_state_length = 12
+    domain_state_length = 7
+    domains_count_length = 4
+    cpu_count_length = 6
+    load_length = 5
+    mem_total_length = 6
+    mem_used_length = 5
+    mem_free_length = 5
+    mem_alloc_length = 6
+    mem_prov_length = 5
+    for node_information in node_list:
+        # node_name column
+        _node_name_length = len(node_information["name"]) + 1
+        if _node_name_length > node_name_length:
+            node_name_length = _node_name_length
+        # node_pvc_version column
+        _pvc_version_length = len(node_information.get("pvc_version", "N/A")) + 1
+        if _pvc_version_length > pvc_version_length:
+            pvc_version_length = _pvc_version_length
+        # node_health column
+        node_health = node_information.get("health", "N/A")
+        if isinstance(node_health, int):
+            node_health_text = f"{node_health}%"
+        else:
+            node_health_text = node_health
+        _health_length = len(node_health_text) + 1
+        if _health_length > health_length:
+            health_length = _health_length
+        # daemon_state column
+        _daemon_state_length = len(node_information["daemon_state"]) + 1
+        if _daemon_state_length > daemon_state_length:
+            daemon_state_length = _daemon_state_length
+        # coordinator_state column
+        _coordinator_state_length = len(node_information["coordinator_state"]) + 1
+        if _coordinator_state_length > coordinator_state_length:
+            coordinator_state_length = _coordinator_state_length
+        # domain_state column
+        _domain_state_length = len(node_information["domain_state"]) + 1
+        if _domain_state_length > domain_state_length:
+            domain_state_length = _domain_state_length
+        # domains_count column
+        _domains_count_length = len(str(node_information["domains_count"])) + 1
+        if _domains_count_length > domains_count_length:
+            domains_count_length = _domains_count_length
+        # cpu_count column
+        _cpu_count_length = len(str(node_information["cpu_count"])) + 1
+        if _cpu_count_length > cpu_count_length:
+            cpu_count_length = _cpu_count_length
+        # load column
+        _load_length = len(str(node_information["load"])) + 1
+        if _load_length > load_length:
+            load_length = _load_length
+        # mem_total column
+        _mem_total_length = len(str(node_information["memory"]["total"])) + 1
+        if _mem_total_length > mem_total_length:
+            mem_total_length = _mem_total_length
+        # mem_used column
+        _mem_used_length = len(str(node_information["memory"]["used"])) + 1
+        if _mem_used_length > mem_used_length:
+            mem_used_length = _mem_used_length
+        # mem_free column
+        _mem_free_length = len(str(node_information["memory"]["free"])) + 1
+        if _mem_free_length > mem_free_length:
+            mem_free_length = _mem_free_length
+        # mem_alloc column
+        _mem_alloc_length = len(str(node_information["memory"]["allocated"])) + 1
+        if _mem_alloc_length > mem_alloc_length:
+            mem_alloc_length = _mem_alloc_length
+
+        # mem_prov column
+        _mem_prov_length = len(str(node_information["memory"]["provisioned"])) + 1
+        if _mem_prov_length > mem_prov_length:
+            mem_prov_length = _mem_prov_length
+
+    # Format the string (header)
+    node_list_output.append(
+        "{bold}{node_header: <{node_header_length}} {state_header: <{state_header_length}} {resource_header: <{resource_header_length}} {memory_header: <{memory_header_length}}{end_bold}".format(
+            node_header_length=node_name_length
+            + pvc_version_length
+            + health_length
+            + 2,
+            state_header_length=daemon_state_length
+            + coordinator_state_length
+            + domain_state_length
+            + 2,
+            resource_header_length=domains_count_length
+            + cpu_count_length
+            + load_length
+            + 2,
+            memory_header_length=mem_total_length
+            + mem_used_length
+            + mem_free_length
+            + mem_alloc_length
+            + mem_prov_length
+            + 4,
+            bold=ansiprint.bold(),
+            end_bold=ansiprint.end(),
+            node_header="Nodes "
+            + "".join(
+                [
+                    "-"
+                    for _ in range(
+                        6, node_name_length + pvc_version_length + health_length + 1
+                    )
+                ]
+            ),
+            state_header="States "
+            + "".join(
+                [
+                    "-"
+                    for _ in range(
+                        7,
+                        daemon_state_length
+                        + coordinator_state_length
+                        + domain_state_length
+                        + 1,
+                    )
+                ]
+            ),
+            resource_header="Resources "
+            + "".join(
+                [
+                    "-"
+                    for _ in range(
+                        10, domains_count_length + cpu_count_length + load_length + 1
+                    )
+                ]
+            ),
+            memory_header="Memory (M) "
+            + "".join(
+                [
+                    "-"
+                    for _ in range(
+                        11,
+                        mem_total_length
+                        + mem_used_length
+                        + mem_free_length
+                        + mem_alloc_length
+                        + mem_prov_length
+                        + 3,
+                    )
+                ]
+            ),
+        )
+    )
+
+    node_list_output.append(
+        "{bold}{node_name: <{node_name_length}} {node_pvc_version: <{pvc_version_length}} {node_health: <{health_length}} \
+{daemon_state_colour}{node_daemon_state: <{daemon_state_length}}{end_colour} {coordinator_state_colour}{node_coordinator_state: <{coordinator_state_length}}{end_colour} {domain_state_colour}{node_domain_state: <{domain_state_length}}{end_colour} \
+{node_domains_count: <{domains_count_length}} {node_cpu_count: <{cpu_count_length}} {node_load: <{load_length}} \
+{node_mem_total: <{mem_total_length}} {node_mem_used: <{mem_used_length}} {node_mem_free: <{mem_free_length}} {node_mem_allocated: <{mem_alloc_length}} {node_mem_provisioned: <{mem_prov_length}}{end_bold}".format(
+            node_name_length=node_name_length,
+            pvc_version_length=pvc_version_length,
+            health_length=health_length,
+            daemon_state_length=daemon_state_length,
+            coordinator_state_length=coordinator_state_length,
+            domain_state_length=domain_state_length,
+            domains_count_length=domains_count_length,
+            cpu_count_length=cpu_count_length,
+            load_length=load_length,
+            mem_total_length=mem_total_length,
+            mem_used_length=mem_used_length,
+            mem_free_length=mem_free_length,
+            mem_alloc_length=mem_alloc_length,
+            mem_prov_length=mem_prov_length,
+            bold=ansiprint.bold(),
+            end_bold=ansiprint.end(),
+            daemon_state_colour="",
+            coordinator_state_colour="",
+            domain_state_colour="",
+            end_colour="",
+            node_name="Name",
+            node_pvc_version="Version",
+            node_health="Health",
+            node_daemon_state="Daemon",
+            node_coordinator_state="Coordinator",
+            node_domain_state="Domain",
+            node_domains_count="VMs",
+            node_cpu_count="vCPUs",
+            node_load="Load",
+            node_mem_total="Total",
+            node_mem_used="Used",
+            node_mem_free="Free",
+            node_mem_allocated="Alloc",
+            node_mem_provisioned="Prov",
+        )
+    )
+
+    # Format the string (elements)
+    for node_information in sorted(node_list, key=lambda n: n["name"]):
+        (
+            health_colour,
+            daemon_state_colour,
+            coordinator_state_colour,
+            domain_state_colour,
+            mem_allocated_colour,
+            mem_provisioned_colour,
+        ) = getOutputColours(node_information)
+
+        node_health = node_information.get("health", "N/A")
+        if isinstance(node_health, int):
+            node_health_text = f"{node_health}%"
+        else:
+            node_health_text = node_health
+
+        node_list_output.append(
+            "{bold}{node_name: <{node_name_length}} {node_pvc_version: <{pvc_version_length}} {health_colour}{node_health: <{health_length}}{end_colour} \
+{daemon_state_colour}{node_daemon_state: <{daemon_state_length}}{end_colour} {coordinator_state_colour}{node_coordinator_state: <{coordinator_state_length}}{end_colour} {domain_state_colour}{node_domain_state: <{domain_state_length}}{end_colour} \
+{node_domains_count: <{domains_count_length}} {node_cpu_count: <{cpu_count_length}} {node_load: <{load_length}} \
+{node_mem_total: <{mem_total_length}} {node_mem_used: <{mem_used_length}} {node_mem_free: <{mem_free_length}} {mem_allocated_colour}{node_mem_allocated: <{mem_alloc_length}}{end_colour} {mem_provisioned_colour}{node_mem_provisioned: <{mem_prov_length}}{end_colour}{end_bold}".format(
+                node_name_length=node_name_length,
+                pvc_version_length=pvc_version_length,
+                health_length=health_length,
+                daemon_state_length=daemon_state_length,
+                coordinator_state_length=coordinator_state_length,
+                domain_state_length=domain_state_length,
+                domains_count_length=domains_count_length,
+                cpu_count_length=cpu_count_length,
+                load_length=load_length,
+                mem_total_length=mem_total_length,
+                mem_used_length=mem_used_length,
+                mem_free_length=mem_free_length,
+                mem_alloc_length=mem_alloc_length,
+                mem_prov_length=mem_prov_length,
+                bold="",
+                end_bold="",
+                health_colour=health_colour,
+                daemon_state_colour=daemon_state_colour,
+                coordinator_state_colour=coordinator_state_colour,
+                domain_state_colour=domain_state_colour,
+                mem_allocated_colour=mem_allocated_colour,
+                mem_provisioned_colour=mem_provisioned_colour,
+                end_colour=ansiprint.end(),
+                node_name=node_information["name"],
+                node_pvc_version=node_information.get("pvc_version", "N/A"),
+                node_health=node_health_text,
+                node_daemon_state=node_information["daemon_state"],
+                node_coordinator_state=node_information["coordinator_state"],
+                node_domain_state=node_information["domain_state"],
+                node_domains_count=node_information["domains_count"],
+                node_cpu_count=node_information["vcpu"]["allocated"],
+                node_load=node_information["load"],
+                node_mem_total=node_information["memory"]["total"],
+                node_mem_used=node_information["memory"]["used"],
+                node_mem_free=node_information["memory"]["free"],
+                node_mem_allocated=node_information["memory"]["allocated"],
+                node_mem_provisioned=node_information["memory"]["provisioned"],
+            )
+        )
+
+    return "\n".join(node_list_output)
--- a/client-cli/pvc/lib/provisioner.py
+++ b/client-cli/pvc/lib/provisioner.py
--- a/client-cli/pvc/lib/storage.py
+++ b/client-cli/pvc/lib/storage.py
--- a/client-cli/pvc/lib/vm.py
+++ b/client-cli/pvc/lib/vm.py
--- a/client-cli/pvc/lib/zkhandler.py
+++ b/client-cli/pvc/lib/zkhandler.py
@ -0,0 +1,102 @@
+#!/usr/bin/env python3
+
+# zkhandler.py - Secure versioned ZooKeeper updates
+# Part of the Parallel Virtual Cluster (PVC) system
+#
+#    Copyright (C) 2018-2022 Joshua M. Boniface <joshua@boniface.me>
+#
+#    This program is free software: you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation, version 3.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License
+#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+###############################################################################
+
+import uuid
+
+
+# Exists function
+def exists(zk_conn, key):
+    stat = zk_conn.exists(key)
+    if stat:
+        return True
+    else:
+        return False
+
+
+# Child list function
+def listchildren(zk_conn, key):
+    children = zk_conn.get_children(key)
+    return children
+
+
+# Delete key function
+def deletekey(zk_conn, key, recursive=True):
+    zk_conn.delete(key, recursive=recursive)
+
+
+# Data read function
+def readdata(zk_conn, key):
+    data_raw = zk_conn.get(key)
+    data = data_raw[0].decode("utf8")
+    return data
+
+
+# Data write function
+def writedata(zk_conn, kv):
+    # Start up a transaction
+    zk_transaction = zk_conn.transaction()
+
+    # Proceed one KV pair at a time
+    for key in sorted(kv):
+        data = kv[key]
+
+        # Check if this key already exists or not
+        if not zk_conn.exists(key):
+            # We're creating a new key
+            zk_transaction.create(key, str(data).encode("utf8"))
+        else:
+            # We're updating a key with version validation
+            orig_data = zk_conn.get(key)
+            version = orig_data[1].version
+
+            # Set what we expect the new version to be
+            new_version = version + 1
+
+            # Update the data
+            zk_transaction.set_data(key, str(data).encode("utf8"))
+
+            # Set up the check
+            try:
+                zk_transaction.check(key, new_version)
+            except TypeError:
+                print('Zookeeper key "{}" does not match expected version'.format(key))
+                return False
+
+    # Commit the transaction
+    try:
+        zk_transaction.commit()
+        return True
+    except Exception:
+        return False
+
+
+# Write lock function
+def writelock(zk_conn, key):
+    lock_id = str(uuid.uuid1())
+    lock = zk_conn.WriteLock("{}".format(key), lock_id)
+    return lock
+
+
+# Read lock function
+def readlock(zk_conn, key):
+    lock_id = str(uuid.uuid1())
+    lock = zk_conn.ReadLock("{}".format(key), lock_id)
+    return lock
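
Review note: `writedata` above relies on version-checked writes, so an update only commits if every key is still at the version the writer expects. The sketch below illustrates that idea with a hypothetical in-memory `VersionedStore`; the class and its `write_all` method are assumptions made for illustration and are not part of kazoo or PVC.

```python
class VersionedStore:
    """Hypothetical in-memory stand-in for a versioned key/value store."""

    def __init__(self):
        # Maps key -> (value, version); a new key starts at version 0
        self._data = {}

    def get(self, key):
        """Return the (value, version) tuple stored for a key."""
        return self._data[key]

    def write_all(self, updates):
        """Apply {key: (value, expected_version)} as all-or-nothing.

        An expected_version of None means "this key must not exist yet".
        """
        # Validate every key first so a failed check leaves the store untouched
        for key, (_, expected_version) in updates.items():
            current = self._data.get(key)
            current_version = None if current is None else current[1]
            if current_version != expected_version:
                return False

        # All checks passed: create new keys and bump versions on updates
        for key, (value, expected_version) in updates.items():
            new_version = 0 if expected_version is None else expected_version + 1
            self._data[key] = (value, new_version)
        return True
```

A writer that read `/a` at version 0 would pass `{"/a": ("new", 0)}`; if another writer got there first, the call returns `False` and nothing is changed.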
--- a/client-cli/setup.py
+++ b/client-cli/setup.py
@ -2,8 +2,8 @@ from setuptools import setup

 setup(
    name="pvc",
-    version="0.9.60",
-    packages=["pvc", "pvc.cli_lib"],
+    version="0.9.65",
+    packages=["pvc.cli", "pvc.lib"],
    install_requires=[
        "Click",
        "PyYAML",
@ -14,7 +14,7 @@ setup(
    ],
    entry_points={
        "console_scripts": [
-            "pvc = pvc.pvc:cli",
+            "pvc = pvc.cli.cli:cli",
        ],
    },
 )
--- a/daemon-common/ceph.py
+++ b/daemon-common/ceph.py
@ -73,6 +73,11 @@ byte_unit_matrix = {
    "G": 1024 * 1024 * 1024,
    "T": 1024 * 1024 * 1024 * 1024,
    "P": 1024 * 1024 * 1024 * 1024 * 1024,
+    "E": 1024 * 1024 * 1024 * 1024 * 1024 * 1024,
+    "Z": 1024 * 1024 * 1024 * 1024 * 1024 * 1024 * 1024,
+    "Y": 1024 * 1024 * 1024 * 1024 * 1024 * 1024 * 1024 * 1024,
+    "R": 1024 * 1024 * 1024 * 1024 * 1024 * 1024 * 1024 * 1024 * 1024,
+    "Q": 1024 * 1024 * 1024 * 1024 * 1024 * 1024 * 1024 * 1024 * 1024 * 1024,
 }

 # Matrix of human-to-metric values
@ -83,6 +88,11 @@ ops_unit_matrix = {
    "G": 1000 * 1000 * 1000,
    "T": 1000 * 1000 * 1000 * 1000,
    "P": 1000 * 1000 * 1000 * 1000 * 1000,
+    "E": 1000 * 1000 * 1000 * 1000 * 1000 * 1000,
+    "Z": 1000 * 1000 * 1000 * 1000 * 1000 * 1000 * 1000,
+    "Y": 1000 * 1000 * 1000 * 1000 * 1000 * 1000 * 1000 * 1000,
+    "R": 1000 * 1000 * 1000 * 1000 * 1000 * 1000 * 1000 * 1000 * 1000,
+    "Q": 1000 * 1000 * 1000 * 1000 * 1000 * 1000 * 1000 * 1000 * 1000 * 1000,
 }


@ -103,14 +113,18 @@ def format_bytes_tohuman(databytes):


 def format_bytes_fromhuman(datahuman):
-    # Trim off human-readable character
-    dataunit = str(datahuman)[-1]
-    datasize = int(str(datahuman)[:-1])
-    if not re.match(r"[A-Z]", dataunit):
+    if not re.search(r"[A-Za-z]+", str(datahuman)):
        dataunit = "B"
        datasize = int(datahuman)
-    databytes = datasize * byte_unit_matrix[dataunit]
-    return databytes
+    else:
+        dataunit = str(re.match(r"[0-9]+([A-Za-z])[iBb]*", datahuman).group(1))
+        datasize = int(re.match(r"([0-9]+)[A-Za-z]+", datahuman).group(1))
+
+    if byte_unit_matrix.get(dataunit):
+        databytes = datasize * byte_unit_matrix[dataunit]
+        return databytes
+    else:
+        return None


 # Format ops sizes to/from human-readable units
@ -158,6 +172,19 @@ def get_status(zkhandler):
    return True, status_data


+def get_health(zkhandler):
+    primary_node = zkhandler.read("base.config.primary_node")
+    ceph_health = zkhandler.read("base.storage.health").rstrip()
+
+    # Create a data structure for the information
+    status_data = {
+        "type": "health",
+        "primary_node": primary_node,
+        "ceph_data": ceph_health,
+    }
+    return True, status_data
+
+
 def get_util(zkhandler):
    primary_node = zkhandler.read("base.config.primary_node")
    ceph_df = zkhandler.read("base.storage.util").rstrip()
@ -718,22 +745,26 @@ def getVolumeInformation(zkhandler, pool, volume):


 def add_volume(zkhandler, pool, name, size):
-    # Add 'B' if the volume is in bytes
-    if re.match(r"^[0-9]+$", size):
-        size = "{}B".format(size)
-
    # 1. Verify the size of the volume
    pool_information = getPoolInformation(zkhandler, pool)
    size_bytes = format_bytes_fromhuman(size)
+    if size_bytes is None:
+        return (
+            False,
+            f"ERROR: Requested volume size '{size}' does not have a valid SI unit",
+        )
+
    if size_bytes >= int(pool_information["stats"]["free_bytes"]):
        return (
            False,
-            "ERROR: Requested volume size is greater than the available free space in the pool",
+            f"ERROR: Requested volume size '{format_bytes_tohuman(size_bytes)}' is greater than the available free space in the pool ('{format_bytes_tohuman(pool_information['stats']['free_bytes'])}')",
        )

    # 2. Create the volume
    retcode, stdout, stderr = common.run_os_command(
-        "rbd create --size {} {}/{}".format(size, pool, name)
+        "rbd create --size {} {}/{}".format(
+            format_bytes_tohuman(size_bytes), pool, name
+        )
    )
    if retcode:
        return False, 'ERROR: Failed to create RBD volume "{}": {}'.format(name, stderr)
@ -753,7 +784,9 @@ def add_volume(zkhandler, pool, name, size):
        ]
    )

-    return True, 'Created RBD volume "{}/{}" ({}).'.format(pool, name, size)
+    return True, 'Created RBD volume "{}" of size "{}" in pool "{}".'.format(
+        name, format_bytes_tohuman(size_bytes), pool
+    )


 def clone_volume(zkhandler, pool, name_src, name_new):
@ -800,28 +833,32 @@ def resize_volume(zkhandler, pool, name, size):
            name, pool
        )

-    # Add 'B' if the volume is in bytes
-    if re.match(r"^[0-9]+$", size):
-        size = "{}B".format(size)
-
    # 1. Verify the size of the volume
    pool_information = getPoolInformation(zkhandler, pool)
    size_bytes = format_bytes_fromhuman(size)
+    if size_bytes is None:
+        return (
+            False,
+            f"ERROR: Requested volume size '{size}' does not have a valid SI unit",
+        )
+
    if size_bytes >= int(pool_information["stats"]["free_bytes"]):
        return (
            False,
-            "ERROR: Requested volume size is greater than the available free space in the pool",
+            f"ERROR: Requested volume size '{format_bytes_tohuman(size_bytes)}' is greater than the available free space in the pool ('{format_bytes_tohuman(pool_information['stats']['free_bytes'])}')",
        )

    # 2. Resize the volume
    retcode, stdout, stderr = common.run_os_command(
-        "rbd resize --size {} {}/{}".format(size, pool, name)
+        "rbd resize --size {} {}/{}".format(
+            format_bytes_tohuman(size_bytes), pool, name
+        )
    )
    if retcode:
        return (
            False,
            'ERROR: Failed to resize RBD volume "{}" to size "{}" in pool "{}": {}'.format(
-                name, size, pool, stderr
+                name, format_bytes_tohuman(size_bytes), pool, stderr
            ),
        )

@ -847,7 +884,7 @@ def resize_volume(zkhandler, pool, name, size):
            if target_vm_conn:
                target_vm_conn.blockResize(
                    volume_id,
-                    format_bytes_fromhuman(size),
+                    size_bytes,
                    libvirt.VIR_DOMAIN_BLOCK_RESIZE_BYTES,
                )
            target_lv_conn.close()
@ -870,7 +907,7 @@ def resize_volume(zkhandler, pool, name, size):
    )

    return True, 'Resized RBD volume "{}" to size "{}" in pool "{}".'.format(
-        name, size, pool
+        name, format_bytes_tohuman(size_bytes), pool
    )


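
Review note: the reworked `format_bytes_fromhuman` accepts sizes such as `10G`, `10GB` or `10GiB` and returns `None` for unknown units. The sketch below shows the same parsing idea in a single regular expression. It is a standalone approximation, not the code in this diff: `UNITS` and `parse_size` are hypothetical names, the `B` and `K` entries are assumed to exist in `byte_unit_matrix`, and unlike the diff it upper-cases the unit and rejects malformed input such as `1.5G` instead of raising.

```python
import re

# Binary multipliers keyed by unit prefix (B=1, K=1024, ... Q=1024**10)
UNITS = {prefix: 1024**power for power, prefix in enumerate("BKMGTPEZYRQ")}


def parse_size(datahuman):
    """Convert a human-readable size to bytes, or return None if invalid."""
    # Digits, an optional single-letter unit, then an optional "i"/"B"/"b" suffix
    match = re.fullmatch(r"([0-9]+)([A-Za-z])?[iBb]*", str(datahuman).strip())
    if match is None:
        return None

    size = int(match.group(1))
    # A bare number is treated as bytes
    unit = (match.group(2) or "B").upper()

    multiplier = UNITS.get(unit)
    return None if multiplier is None else size * multiplier
```

Returning `None` rather than raising keeps the caller's error path the same as in `add_volume` and `resize_volume`, which turn a `None` result into a user-facing error message.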
--- a/daemon-common/cluster.py
+++ b/daemon-common/cluster.py
@ -19,7 +19,7 @@
 #
 ###############################################################################

-import re
+from json import loads

 import daemon_lib.common as common
 import daemon_lib.vm as pvc_vm
@ -44,17 +44,179 @@ def set_maintenance(zkhandler, maint_state):
        return True, "Successfully set cluster in normal mode"


+def getClusterHealth(zkhandler, node_list, vm_list, ceph_osd_list):
+    health_delta_map = {
+        "node_stopped": 50,
+        "node_flushed": 10,
+        "vm_stopped": 10,
+        "osd_out": 50,
+        "osd_down": 10,
+        "osd_full": 50,
+        "osd_nearfull": 10,
+        "memory_overprovisioned": 50,
+        "ceph_err": 50,
+        "ceph_warn": 10,
+    }
+
+    # Generate total cluster health numbers
+    cluster_health_value = 100
+    cluster_health_messages = list()
+
+    for index, node in enumerate(node_list):
+        # Apply node health values to total health number
+        try:
+            node_health_int = int(node["health"])
+        except Exception:
+            node_health_int = 100
+        cluster_health_value -= 100 - node_health_int
+
+        for entry in node["health_details"]:
+            if entry["health_delta"] > 0:
+                cluster_health_messages.append(
+                    f"{node['name']}: plugin '{entry['name']}': {entry['message']}"
+                )
+
+        # Handle unhealthy node states
+        if node["daemon_state"] not in ["run"]:
+            cluster_health_value -= health_delta_map["node_stopped"]
+            cluster_health_messages.append(
+                f"cluster: Node {node['name']} in {node['daemon_state'].upper()} daemon state"
+            )
+        elif node["domain_state"] not in ["ready"]:
+            cluster_health_value -= health_delta_map["node_flushed"]
+            cluster_health_messages.append(
+                f"cluster: Node {node['name']} in {node['domain_state'].upper()} domain state"
+            )
+
+    for index, vm in enumerate(vm_list):
+        # Handle unhealthy VM states
+        if vm["state"] in ["stop", "fail"]:
+            cluster_health_value -= health_delta_map["vm_stopped"]
+            cluster_health_messages.append(
+                f"cluster: VM {vm['name']} in {vm['state'].upper()} state"
+            )
+
+    for index, ceph_osd in enumerate(ceph_osd_list):
+        in_texts = {1: "in", 0: "out"}
+        up_texts = {1: "up", 0: "down"}
+
+        # Handle unhealthy OSD states
+        if in_texts[ceph_osd["stats"]["in"]] not in ["in"]:
+            cluster_health_value -= health_delta_map["osd_out"]
+            cluster_health_messages.append(
+                f"cluster: Ceph OSD {ceph_osd['id']} in {in_texts[ceph_osd['stats']['in']].upper()} state"
+            )
+        elif up_texts[ceph_osd["stats"]["up"]] not in ["up"]:
+            cluster_health_value -= health_delta_map["osd_down"]
+            cluster_health_messages.append(
+                f"cluster: Ceph OSD {ceph_osd['id']} in {up_texts[ceph_osd['stats']['up']].upper()} state"
+            )
+
+        # Handle full or nearfull OSDs (>85%)
+        if ceph_osd["stats"]["utilization"] >= 90:
+            cluster_health_value -= health_delta_map["osd_full"]
+            cluster_health_messages.append(
+                f"cluster: Ceph OSD {ceph_osd['id']} is FULL ({ceph_osd['stats']['utilization']:.1f}% >= 90%)"
+            )
+        elif ceph_osd["stats"]["utilization"] >= 85:
+            cluster_health_value -= health_delta_map["osd_nearfull"]
+            cluster_health_messages.append(
+                f"cluster: Ceph OSD {ceph_osd['id']} is NEARFULL ({ceph_osd['stats']['utilization']:.1f}% >= 85%)"
+            )
+
+    # Check for (n-1) overprovisioning
+    #   Assume n nodes. If the total VM memory allocation (counting only running VMs) is greater than
+    #   the total memory of the (n-1) smallest nodes, trigger this warning.
+    n_minus_1_total = 0
+    alloc_total = 0
+    node_largest_index = None
+    node_largest_count = 0
+    for index, node in enumerate(node_list):
+        node_mem_total = node["memory"]["total"]
+        node_mem_alloc = node["memory"]["allocated"]
+        alloc_total += node_mem_alloc
+        # Determine if this node is the largest seen so far
+        if node_mem_total > node_largest_count:
+            node_largest_index = index
+            node_largest_count = node_mem_total
+    n_minus_1_node_list = list()
+    for index, node in enumerate(node_list):
+        if index == node_largest_index:
+            continue
+        n_minus_1_node_list.append(node)
+    for index, node in enumerate(n_minus_1_node_list):
+        n_minus_1_total += node["memory"]["total"]
+    if alloc_total > n_minus_1_total:
+        cluster_health_value -= health_delta_map["memory_overprovisioned"]
+        cluster_health_messages.append(
+            f"cluster: Total memory is OVERPROVISIONED ({alloc_total} > {n_minus_1_total} @ N-1)"
+        )
+
+    # Check Ceph cluster health
+    ceph_health = loads(zkhandler.read("base.storage.health"))
+    ceph_health_status = ceph_health["status"]
+    ceph_health_entries = ceph_health["checks"].keys()
+
+    ceph_health_status_map = {
+        "HEALTH_ERR": "ERROR",
+        "HEALTH_WARN": "WARNING",
+    }
+    for entry in ceph_health_entries:
+        cluster_health_messages.append(
+            f"cluster: Ceph {ceph_health_status_map[ceph_health['checks'][entry]['severity']]} {entry}: {ceph_health['checks'][entry]['summary']['message']}"
+        )
+
+    if ceph_health_status == "HEALTH_ERR":
+        cluster_health_value -= health_delta_map["ceph_err"]
+    elif ceph_health_status == "HEALTH_WARN":
+        cluster_health_value -= health_delta_map["ceph_warn"]
+
+    if cluster_health_value < 0:
+        cluster_health_value = 0
+
+    cluster_health = {
+        "health": cluster_health_value,
+        "messages": cluster_health_messages,
+    }
+
+    return cluster_health
+
+
+def getNodeHealth(zkhandler, node_list):
+    node_health = dict()
+    for index, node in enumerate(node_list):
+        node_health_messages = list()
+        node_health_value = node["health"]
+        for entry in node["health_details"]:
+            if entry["health_delta"] > 0:
+                node_health_messages.append(f"'{entry['name']}': {entry['message']}")
+
+        node_health_entry = {
+            "health": node_health_value,
+            "messages": node_health_messages,
+        }
+
+        node_health[node["name"]] = node_health_entry
+
+    return node_health
+
+
 def getClusterInformation(zkhandler):
    # Get cluster maintenance state
-    maint_state = zkhandler.read("base.config.maintenance")
-
-    # List of messages to display to the clients
-    cluster_health_msg = []
-    storage_health_msg = []
+    maintenance_state = zkhandler.read("base.config.maintenance")

    # Get node information object list
    retcode, node_list = pvc_node.get_list(zkhandler, None)

+    # Get primary node
+    primary_node = common.getPrimaryNode(zkhandler)
+
+    # Get PVC version of primary node
+    pvc_version = "0.0.0"
+    for node in node_list:
+        if node["name"] == primary_node:
+            pvc_version = node["pvc_version"]
+
    # Get vm information object list
    retcode, vm_list = pvc_vm.get_list(zkhandler, None, None, None, None)

@ -78,135 +240,6 @@ def getClusterInformation(zkhandler):
    ceph_volume_count = len(ceph_volume_list)
    ceph_snapshot_count = len(ceph_snapshot_list)

-    # Determinations for general cluster health
-    cluster_healthy_status = True
-    # Check for (n-1) overprovisioning
-    #   Assume X nodes. If the total VM memory allocation (counting only running VMss) is greater than
-    #   the total memory of the (n-1) smallest nodes, trigger this warning.
-    n_minus_1_total = 0
-    alloc_total = 0
-
-    node_largest_index = None
-    node_largest_count = 0
-    for index, node in enumerate(node_list):
-        node_mem_total = node["memory"]["total"]
-        node_mem_alloc = node["memory"]["allocated"]
-        alloc_total += node_mem_alloc
-
-        # Determine if this node is the largest seen so far
-        if node_mem_total > node_largest_count:
-            node_largest_index = index
-            node_largest_count = node_mem_total
-    n_minus_1_node_list = list()
-    for index, node in enumerate(node_list):
-        if index == node_largest_index:
-            continue
-        n_minus_1_node_list.append(node)
-    for index, node in enumerate(n_minus_1_node_list):
-        n_minus_1_total += node["memory"]["total"]
-    if alloc_total > n_minus_1_total:
-        cluster_healthy_status = False
-        cluster_health_msg.append(
-            "Total VM memory ({}) is overprovisioned (max {}) for (n-1) failure scenarios".format(
-                alloc_total, n_minus_1_total
-            )
-        )
-
-    # Determinations for node health
-    node_healthy_status = list(range(0, node_count))
-    node_report_status = list(range(0, node_count))
-    for index, node in enumerate(node_list):
-        daemon_state = node["daemon_state"]
-        domain_state = node["domain_state"]
-        if daemon_state != "run" and domain_state != "ready":
-            node_healthy_status[index] = False
-            cluster_health_msg.append(
-                "Node '{}' in {},{} state".format(
-                    node["name"], daemon_state, domain_state
-                )
-            )
-        else:
-            node_healthy_status[index] = True
-        node_report_status[index] = daemon_state + "," + domain_state
-
-    # Determinations for VM health
-    vm_healthy_status = list(range(0, vm_count))
-    vm_report_status = list(range(0, vm_count))
-    for index, vm in enumerate(vm_list):
-        vm_state = vm["state"]
-        if vm_state not in ["start", "disable", "migrate", "unmigrate", "provision"]:
-            vm_healthy_status[index] = False
-            cluster_health_msg.append(
-                "VM '{}' in {} state".format(vm["name"], vm_state)
-            )
-        else:
-            vm_healthy_status[index] = True
-        vm_report_status[index] = vm_state
-
-    # Determinations for OSD health
-    ceph_osd_healthy_status = list(range(0, ceph_osd_count))
-    ceph_osd_report_status = list(range(0, ceph_osd_count))
-    for index, ceph_osd in enumerate(ceph_osd_list):
-        try:
-            ceph_osd_up = ceph_osd["stats"]["up"]
-        except KeyError:
-            ceph_osd_up = 0
-
-        try:
-            ceph_osd_in = ceph_osd["stats"]["in"]
-        except KeyError:
-            ceph_osd_in = 0
-
-        up_texts = {1: "up", 0: "down"}
-        in_texts = {1: "in", 0: "out"}
-
-        if not ceph_osd_up or not ceph_osd_in:
-            ceph_osd_healthy_status[index] = False
-            cluster_health_msg.append(
-                "OSD {} in {},{} state".format(
-                    ceph_osd["id"], up_texts[ceph_osd_up], in_texts[ceph_osd_in]
-                )
-            )
-        else:
-            ceph_osd_healthy_status[index] = True
-        ceph_osd_report_status[index] = (
-            up_texts[ceph_osd_up] + "," + in_texts[ceph_osd_in]
-        )
-
-    # Find out the overall cluster health; if any element of a healthy_status is false, it's unhealthy
-    if maint_state == "true":
-        cluster_health = "Maintenance"
-    elif (
-        cluster_healthy_status is False
-        or False in node_healthy_status
-        or False in vm_healthy_status
-        or False in ceph_osd_healthy_status
-    ):
-        cluster_health = "Degraded"
-    else:
-        cluster_health = "Optimal"
-
-    # Find out our storage health from Ceph
-    ceph_status = zkhandler.read("base.storage").split("\n")
-    ceph_health = ceph_status[2].split()[-1]
-
-    # Parse the status output to get the health indicators
-    line_record = False
-    for index, line in enumerate(ceph_status):
-        if re.search("services:", line):
-            line_record = False
-        if line_record and len(line.strip()) > 0:
-            storage_health_msg.append(line.strip())
-        if re.search("health:", line):
-            line_record = True
-
-    if maint_state == "true":
-        storage_health = "Maintenance"
-    elif ceph_health != "HEALTH_OK":
-        storage_health = "Degraded"
-    else:
-        storage_health = "Optimal"
-
    # State lists
    node_state_combinations = [
        "run,ready",
@ -237,13 +270,19 @@ def getClusterInformation(zkhandler):
        "unmigrate",
        "provision",
    ]
-    ceph_osd_state_combinations = ["up,in", "up,out", "down,in", "down,out"]
+    ceph_osd_state_combinations = [
+        "up,in",
+        "up,out",
+        "down,in",
+        "down,out",
+    ]

    # Format the Node states
    formatted_node_states = {"total": node_count}
    for state in node_state_combinations:
        state_count = 0
-        for node_state in node_report_status:
+        for node in node_list:
+            node_state = f"{node['daemon_state']},{node['domain_state']}"
            if node_state == state:
                state_count += 1
        if state_count > 0:
@ -253,17 +292,20 @@ def getClusterInformation(zkhandler):
    formatted_vm_states = {"total": vm_count}
    for state in vm_state_combinations:
        state_count = 0
-        for vm_state in vm_report_status:
-            if vm_state == state:
+        for vm in vm_list:
+            if vm["state"] == state:
                state_count += 1
        if state_count > 0:
            formatted_vm_states[state] = state_count

    # Format the OSD states
+    up_texts = {1: "up", 0: "down"}
+    in_texts = {1: "in", 0: "out"}
    formatted_osd_states = {"total": ceph_osd_count}
    for state in ceph_osd_state_combinations:
        state_count = 0
-        for ceph_osd_state in ceph_osd_report_status:
+        for ceph_osd in ceph_osd_list:
+            ceph_osd_state = f"{up_texts[ceph_osd['stats']['up']]},{in_texts[ceph_osd['stats']['in']]}"
            if ceph_osd_state == state:
                state_count += 1
        if state_count > 0:
@ -271,11 +313,13 @@ def getClusterInformation(zkhandler):

    # Format the status data
    cluster_information = {
-        "health": cluster_health,
-        "health_msg": cluster_health_msg,
-        "storage_health": storage_health,
-        "storage_health_msg": storage_health_msg,
-        "primary_node": common.getPrimaryNode(zkhandler),
+        "cluster_health": getClusterHealth(
+            zkhandler, node_list, vm_list, ceph_osd_list
+        ),
+        "node_health": getNodeHealth(zkhandler, node_list),
+        "maintenance": maintenance_state,
+        "primary_node": primary_node,
+        "pvc_version": pvc_version,
        "upstream_ip": zkhandler.read("base.config.upstream_ip"),
        "nodes": formatted_node_states,
        "vms": formatted_vm_states,
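
Review note: the (n-1) overprovisioning check in `getClusterHealth` asks whether the cluster could still hold all allocated VM memory if its largest node failed. The sketch below restates that check compactly; `is_overprovisioned` is a hypothetical helper, and it assumes each node dict carries `memory.total` and `memory.allocated` as in the node information objects used above.

```python
def is_overprovisioned(node_list):
    """Return (flag, alloc_total, n_minus_1_total) for the (n-1) memory check."""
    totals = [node["memory"]["total"] for node in node_list]
    alloc_total = sum(node["memory"]["allocated"] for node in node_list)

    # Capacity that remains if the single largest node is lost
    n_minus_1_total = sum(totals) - max(totals, default=0)

    return alloc_total > n_minus_1_total, alloc_total, n_minus_1_total


# Example: three nodes, the largest of which has 128 units of memory
nodes = [
    {"memory": {"total": 64, "allocated": 50}},
    {"memory": {"total": 64, "allocated": 50}},
    {"memory": {"total": 128, "allocated": 50}},
]
flag, alloc_total, n_minus_1_total = is_overprovisioned(nodes)
```

Subtracting the maximum from the sum gives the same total as building a separate list without the largest node, including when several nodes tie for largest, since only one of them is excluded.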
--- a/daemon-common/migrations/versions/9.json
+++ b/daemon-common/migrations/versions/9.json
@ -0,0 +1 @@
+{"version": "9", "root": "", "base": {"root": "", "schema": "/schema", "schema.version": "/schema/version", "config": "/config", "config.maintenance": "/config/maintenance", "config.primary_node": "/config/primary_node", "config.primary_node.sync_lock": "/config/primary_node/sync_lock", "config.upstream_ip": "/config/upstream_ip", "config.migration_target_selector": "/config/migration_target_selector", "cmd": "/cmd", "cmd.node": "/cmd/nodes", "cmd.domain": "/cmd/domains", "cmd.ceph": "/cmd/ceph", "logs": "/logs", "node": "/nodes", "domain": "/domains", "network": "/networks", "storage": "/ceph", "storage.health": "/ceph/health", "storage.util": "/ceph/util", "osd": "/ceph/osds", "pool": "/ceph/pools", "volume": "/ceph/volumes", "snapshot": "/ceph/snapshots"}, "logs": {"node": "", "messages": "/messages"}, "node": {"name": "", "keepalive": "/keepalive", "mode": "/daemonmode", "data.active_schema": "/activeschema", "data.latest_schema": "/latestschema", "data.static": "/staticdata", "data.pvc_version": "/pvcversion", "running_domains": "/runningdomains", "count.provisioned_domains": "/domainscount", "count.networks": "/networkscount", "state.daemon": "/daemonstate", "state.router": "/routerstate", "state.domain": "/domainstate", "cpu.load": "/cpuload", "vcpu.allocated": "/vcpualloc", "memory.total": "/memtotal", "memory.used": "/memused", "memory.free": "/memfree", "memory.allocated": "/memalloc", "memory.provisioned": "/memprov", "ipmi.hostname": "/ipmihostname", "ipmi.username": "/ipmiusername", "ipmi.password": "/ipmipassword", "sriov": "/sriov", "sriov.pf": "/sriov/pf", "sriov.vf": "/sriov/vf", "monitoring.plugins": "/monitoring_plugins", "monitoring.data": "/monitoring_data", "monitoring.health": "/monitoring_health"}, "monitoring_plugin": {"name": "", "last_run": "/last_run", "health_delta": "/health_delta", "message": "/message", "data": "/data", "runtime": "/runtime"}, "sriov_pf": {"phy": "", "mtu": "/mtu", "vfcount": "/vfcount"}, "sriov_vf": {"phy": "", 
"pf": "/pf", "mtu": "/mtu", "mac": "/mac", "phy_mac": "/phy_mac", "config": "/config", "config.vlan_id": "/config/vlan_id", "config.vlan_qos": "/config/vlan_qos", "config.tx_rate_min": "/config/tx_rate_min", "config.tx_rate_max": "/config/tx_rate_max", "config.spoof_check": "/config/spoof_check", "config.link_state": "/config/link_state", "config.trust": "/config/trust", "config.query_rss": "/config/query_rss", "pci": "/pci", "pci.domain": "/pci/domain", "pci.bus": "/pci/bus", "pci.slot": "/pci/slot", "pci.function": "/pci/function", "used": "/used", "used_by": "/used_by"}, "domain": {"name": "", "xml": "/xml", "state": "/state", "profile": "/profile", "stats": "/stats", "node": "/node", "last_node": "/lastnode", "failed_reason": "/failedreason", "storage.volumes": "/rbdlist", "console.log": "/consolelog", "console.vnc": "/vnc", "meta.autostart": "/node_autostart", "meta.migrate_method": "/migration_method", "meta.node_selector": "/node_selector", "meta.node_limit": "/node_limit", "meta.tags": "/tags", "migrate.sync_lock": "/migrate_sync_lock"}, "tag": {"name": "", "type": "/type", "protected": "/protected"}, "network": {"vni": "", "type": "/nettype", "mtu": "/mtu", "rule": "/firewall_rules", "rule.in": "/firewall_rules/in", "rule.out": "/firewall_rules/out", "nameservers": "/name_servers", "domain": "/domain", "reservation": "/dhcp4_reservations", "lease": "/dhcp4_leases", "ip4.gateway": "/ip4_gateway", "ip4.network": "/ip4_network", "ip4.dhcp": "/dhcp4_flag", "ip4.dhcp_start": "/dhcp4_start", "ip4.dhcp_end": "/dhcp4_end", "ip6.gateway": "/ip6_gateway", "ip6.network": "/ip6_network", "ip6.dhcp": "/dhcp6_flag"}, "reservation": {"mac": "", "ip": "/ipaddr", "hostname": "/hostname"}, "lease": {"mac": "", "ip": "/ipaddr", "hostname": "/hostname", "expiry": "/expiry", "client_id": "/clientid"}, "rule": {"description": "", "rule": "/rule", "order": "/order"}, "osd": {"id": "", "node": "/node", "device": "/device", "db_device": "/db_device", "fsid": "/fsid", "ofsid": 
"/fsid/osd", "cfsid": "/fsid/cluster", "lvm": "/lvm", "vg": "/lvm/vg", "lv": "/lvm/lv", "stats": "/stats"}, "pool": {"name": "", "pgs": "/pgs", "tier": "/tier", "stats": "/stats"}, "volume": {"name": "", "stats": "/stats"}, "snapshot": {"name": "", "stats": "/stats"}}
--- a/daemon-common/node.py
+++ b/daemon-common/node.py
@ -21,6 +21,7 @@

 import time
 import re
+import json

 import daemon_lib.common as common

@ -49,6 +50,44 @@ def getNodeInformation(zkhandler, node_name):
        zkhandler.read(("node.count.provisioned_domains", node_name))
    )
    node_running_domains = zkhandler.read(("node.running_domains", node_name)).split()
+    try:
+        node_health = int(zkhandler.read(("node.monitoring.health", node_name)))
+    except Exception:
+        node_health = "N/A"
+    try:
+        node_health_plugins = zkhandler.read(
+            ("node.monitoring.plugins", node_name)
+        ).split()
+    except Exception:
+        node_health_plugins = list()
+
+    node_health_details = list()
+    for plugin in node_health_plugins:
+        plugin_last_run = zkhandler.read(
+            ("node.monitoring.data", node_name, "monitoring_plugin.last_run", plugin)
+        )
+        plugin_health_delta = zkhandler.read(
+            (
+                "node.monitoring.data",
+                node_name,
+                "monitoring_plugin.health_delta",
+                plugin,
+            )
+        )
+        plugin_message = zkhandler.read(
+            ("node.monitoring.data", node_name, "monitoring_plugin.message", plugin)
+        )
+        plugin_data = zkhandler.read(
+            ("node.monitoring.data", node_name, "monitoring_plugin.data", plugin)
+        )
+        plugin_output = {
+            "name": plugin,
+            "last_run": int(plugin_last_run),
+            "health_delta": int(plugin_health_delta),
+            "message": plugin_message,
+            "data": json.loads(plugin_data),
+        }
+        node_health_details.append(plugin_output)

    # Construct a data structure to represent the data
    node_information = {
@ -61,10 +100,16 @@ def getNodeInformation(zkhandler, node_name):
        "kernel": node_kernel,
        "os": node_os,
        "arch": node_arch,
+        "health": node_health,
+        "health_plugins": node_health_plugins,
+        "health_details": node_health_details,
        "load": node_load,
        "domains_count": node_domains_count,
        "running_domains": node_running_domains,
-        "vcpu": {"total": node_cpu_count, "allocated": node_vcpu_allocated},
+        "vcpu": {
+            "total": node_cpu_count,
+            "allocated": node_vcpu_allocated,
+        },
        "memory": {
            "total": node_mem_total,
            "allocated": node_mem_allocated,
@ -82,16 +127,14 @@ def getNodeInformation(zkhandler, node_name):
 def secondary_node(zkhandler, node):
    # Verify node is valid
    if not common.verifyNode(zkhandler, node):
-        return False, 'ERROR: No node named "{}" is present in the cluster.'.format(
-            node
-        )
+        return False, "ERROR: No node named {} is present in the cluster.".format(node)

    # Ensure node is a coordinator
    daemon_mode = zkhandler.read(("node.mode", node))
    if daemon_mode == "hypervisor":
        return (
            False,
-            'ERROR: Cannot change coordinator mode on non-coordinator node "{}"'.format(
+            "ERROR: Cannot change coordinator state on non-coordinator node {}".format(
                node
            ),
        )
@ -99,14 +142,14 @@ def secondary_node(zkhandler, node):
    # Ensure node is in run daemonstate
    daemon_state = zkhandler.read(("node.state.daemon", node))
    if daemon_state != "run":
-        return False, 'ERROR: Node "{}" is not active'.format(node)
+        return False, "ERROR: Node {} is not active".format(node)

    # Get current state
    current_state = zkhandler.read(("node.state.router", node))
    if current_state == "secondary":
-        return True, 'Node "{}" is already in secondary coordinator mode.'.format(node)
+        return True, "Node {} is already in secondary coordinator state.".format(node)

-    retmsg = "Setting node {} in secondary coordinator mode.".format(node)
+    retmsg = "Setting node {} in secondary coordinator state.".format(node)
    zkhandler.write([("base.config.primary_node", "none")])

    return True, retmsg
@ -115,16 +158,14 @@ def secondary_node(zkhandler, node):
 def primary_node(zkhandler, node):
    # Verify node is valid
    if not common.verifyNode(zkhandler, node):
-        return False, 'ERROR: No node named "{}" is present in the cluster.'.format(
-            node
-        )
+        return False, "ERROR: No node named {} is present in the cluster.".format(node)

    # Ensure node is a coordinator
    daemon_mode = zkhandler.read(("node.mode", node))
    if daemon_mode == "hypervisor":
        return (
            False,
-            'ERROR: Cannot change coordinator mode on non-coordinator node "{}"'.format(
+            "ERROR: Cannot change coordinator state on non-coordinator node {}".format(
                node
            ),
        )
@ -132,14 +173,14 @@ def primary_node(zkhandler, node):
    # Ensure node is in run daemonstate
    daemon_state = zkhandler.read(("node.state.daemon", node))
    if daemon_state != "run":
-        return False, 'ERROR: Node "{}" is not active'.format(node)
+        return False, "ERROR: Node {} is not active".format(node)

    # Get current state
    current_state = zkhandler.read(("node.state.router", node))
    if current_state == "primary":
-        return True, 'Node "{}" is already in primary coordinator mode.'.format(node)
+        return True, "Node {} is already in primary coordinator state.".format(node)

-    retmsg = "Setting node {} in primary coordinator mode.".format(node)
+    retmsg = "Setting node {} in primary coordinator state.".format(node)
    zkhandler.write([("base.config.primary_node", node)])

    return True, retmsg
@ -148,14 +189,12 @@ def primary_node(zkhandler, node):
 def flush_node(zkhandler, node, wait=False):
    # Verify node is valid
    if not common.verifyNode(zkhandler, node):
-        return False, 'ERROR: No node named "{}" is present in the cluster.'.format(
-            node
-        )
+        return False, "ERROR: No node named {} is present in the cluster.".format(node)

    if zkhandler.read(("node.state.domain", node)) == "flushed":
-        return True, "Hypervisor {} is already flushed.".format(node)
+        return True, "Node {} is already flushed.".format(node)

-    retmsg = "Flushing hypervisor {} of running VMs.".format(node)
+    retmsg = "Removing node {} from active service.".format(node)

    # Add the new domain to Zookeeper
    zkhandler.write([(("node.state.domain", node), "flush")])
@ -163,7 +202,7 @@ def flush_node(zkhandler, node, wait=False):
    if wait:
        while zkhandler.read(("node.state.domain", node)) == "flush":
            time.sleep(1)
-        retmsg = "Flushed hypervisor {} of running VMs.".format(node)
+        retmsg = "Removed node {} from active service.".format(node)

    return True, retmsg

@ -171,14 +210,12 @@ def flush_node(zkhandler, node, wait=False):
 def ready_node(zkhandler, node, wait=False):
    # Verify node is valid
    if not common.verifyNode(zkhandler, node):
-        return False, 'ERROR: No node named "{}" is present in the cluster.'.format(
-            node
-        )
+        return False, "ERROR: No node named {} is present in the cluster.".format(node)

    if zkhandler.read(("node.state.domain", node)) == "ready":
-        return True, "Hypervisor {} is already ready.".format(node)
+        return True, "Node {} is already ready.".format(node)

-    retmsg = "Restoring hypervisor {} to active service.".format(node)
+    retmsg = "Restoring node {} to active service.".format(node)

    # Add the new domain to Zookeeper
    zkhandler.write([(("node.state.domain", node), "unflush")])
@ -186,7 +223,7 @@ def ready_node(zkhandler, node, wait=False):
    if wait:
        while zkhandler.read(("node.state.domain", node)) == "unflush":
            time.sleep(1)
-        retmsg = "Restored hypervisor {} to active service.".format(node)
+        retmsg = "Restored node {} to active service.".format(node)

    return True, retmsg

@ -194,9 +231,7 @@ def ready_node(zkhandler, node, wait=False):
 def get_node_log(zkhandler, node, lines=2000):
    # Verify node is valid
    if not common.verifyNode(zkhandler, node):
-        return False, 'ERROR: No node named "{}" is present in the cluster.'.format(
-            node
-        )
+        return False, "ERROR: No node named {} is present in the cluster.".format(node)

    # Get the data from ZK
    node_log = zkhandler.read(("logs.messages", node))
@ -214,14 +249,12 @@ def get_node_log(zkhandler, node, lines=2000):
 def get_info(zkhandler, node):
    # Verify node is valid
    if not common.verifyNode(zkhandler, node):
-        return False, 'ERROR: No node named "{}" is present in the cluster.'.format(
-            node
-        )
+        return False, "ERROR: No node named {} is present in the cluster.".format(node)

    # Get information about node in a pretty format
    node_information = getNodeInformation(zkhandler, node)
    if not node_information:
-        return False, 'ERROR: Could not get information about node "{}".'.format(node)
+        return False, "ERROR: Could not get information about node {}.".format(node)

    return True, node_information

--- a/daemon-common/vm.py
+++ b/daemon-common/vm.py
@ -644,7 +644,7 @@ def rename_vm(zkhandler, domain, new_domain):

    # Verify that the VM is in a stopped state; renaming is not supported otherwise
    state = zkhandler.read(("domain.state", dom_uuid))
-    if state != "stop":
+    if state not in ["stop", "disable"]:
        return (
            False,
            'ERROR: VM "{}" is not in stopped state; VMs cannot be renamed while running.'.format(
--- a/daemon-common/zkhandler.py
+++ b/daemon-common/zkhandler.py
@ -540,7 +540,7 @@ class ZKHandler(object):
 #
 class ZKSchema(object):
    # Current version
-    _version = 8
+    _version = 9

    # Root for doing nested keys
    _schema_root = ""
@ -569,6 +569,7 @@ class ZKSchema(object):
            "domain": f"{_schema_root}/domains",
            "network": f"{_schema_root}/networks",
            "storage": f"{_schema_root}/ceph",
+            "storage.health": f"{_schema_root}/ceph/health",
            "storage.util": f"{_schema_root}/ceph/util",
            "osd": f"{_schema_root}/ceph/osds",
            "pool": f"{_schema_root}/ceph/pools",
@ -608,6 +609,18 @@ class ZKSchema(object):
            "sriov": "/sriov",
            "sriov.pf": "/sriov/pf",
            "sriov.vf": "/sriov/vf",
+            "monitoring.plugins": "/monitoring_plugins",
+            "monitoring.data": "/monitoring_data",
+            "monitoring.health": "/monitoring_health",
+        },
+        # The schema of an individual monitoring plugin data entry (/nodes/{node_name}/monitoring_data/{plugin})
+        "monitoring_plugin": {
+            "name": "",  # The root key
+            "last_run": "/last_run",
+            "health_delta": "/health_delta",
+            "message": "/message",
+            "data": "/data",
+            "runtime": "/runtime",
        },
        # The schema of an individual SR-IOV PF entry (/nodes/{node_name}/sriov/pf/{pf})
        "sriov_pf": {"phy": "", "mtu": "/mtu", "vfcount": "/vfcount"},  # The root key
@ -874,9 +887,10 @@ class ZKSchema(object):
                                if not zkhandler.zk_conn.exists(nkipath):
                                    result = False

-                    # One might expect child keys under node (specifically, sriov.pf and sriov.vf) to be
-                    # managed here as well, but those are created automatically every time pvcnoded starts
-                    # and thus never need to be validated or applied.
+                    # One might expect child keys under node (specifically, sriov.pf, sriov.vf,
+                    # monitoring.data) to be managed here as well, but those are created
+                    # automatically every time pvcnoded starts and thus never need to be validated
+                    # or applied.

        # These two have several children layers that must be parsed through
        for elem in ["volume"]:
--- a/debian/changelog
+++ b/debian/changelog
@ -1,3 +1,58 @@
+pvc (0.9.65-0) unstable; urgency=high
+
+  * [CLI] Fixes a bug in the node list filtering command
+  * [CLI] Fixes a bug/default when no connection is specified
+
+ -- Joshua M. Boniface <joshua@boniface.me>  Wed, 23 Aug 2023 01:56:57 -0400
+
+pvc (0.9.64-0) unstable; urgency=high
+
+  **Breaking Change [CLI]**: The CLI client root commands have been reorganized. The following commands have changed:
+
+   * `pvc cluster` -> `pvc connection` (all subcommands)
+   * `pvc task` -> `pvc cluster` (all subcommands)
+   * `pvc maintenance` -> `pvc cluster maintenance`
+   * `pvc status` -> `pvc cluster status`
+
+Ensure you have updated to the latest version of the PVC Ansible repository before deploying this version or using PVC Ansible oneshot playbooks for management.
+
+  **Breaking Change [CLI]**: The `--restart` option for VM configuration changes now has an explicit `--no-restart` to disable restarting, or a prompt if neither is specified; `--unsafe` no longer bypasses this prompt which was a bug. Applies to most `vm <cmd> set` commands like `vm vcpu set`, `vm memory set`, etc. All instances also feature restart confirmation afterwards, which, if `--restart` is provided, will prompt for confirmation unless `--yes` or `--unsafe` is specified.
+
+  **Breaking Change [CLI]**: The `--long` option previously on some `info` commands no longer exists; use `-f long`/`--format long` instead.
+
+  * [CLI] Significantly refactors the CLI client code for consistency and cleanliness
+  * [CLI] Implements `-f`/`--format` options for all `list` and `info` commands in a consistent way
+  * [CLI] Changes the behaviour of VM modification options with "--restart" to provide a "--no-restart"; defaults to a prompt if neither is specified and ignores the "--unsafe" global entirely
+  * [API] Fixes several bugs in the 3-debootstrap.py provisioner example script
+  * [Node] Fixes some bugs around VM shutdown on node flush
+  * [Documentation] Adds mentions of Ganeti and Harvester
+
+ -- Joshua M. Boniface <joshua@boniface.me>  Fri, 18 Aug 2023 12:20:43 -0400
+
+pvc (0.9.63-0) unstable; urgency=high
+
+  * Mentions Ganeti in the docs
+  * Increases API timeout back to 2s
+  * Adds .update-* configs to dpkg plugin
+  * Adds full/nearfull OSD warnings
+  * Improves size value handling for volumes
+
+ -- Joshua M. Boniface <joshua@boniface.me>  Fri, 28 Apr 2023 14:47:04 -0400
+
+pvc (0.9.62-0) unstable; urgency=high
+
+  * [all] Adds an enhanced health checking, monitoring, and reporting system for nodes and clusters
+  * [cli] Adds a cluster detail command
+
+ -- Joshua M. Boniface <joshua@boniface.me>  Wed, 22 Feb 2023 18:13:45 -0500
+
+pvc (0.9.61-0) unstable; urgency=high
+
+  * [provisioner] Fixes a bug in network comparison
+  * [api] Fixes a bug being unable to rename disabled VMs
+
+ -- Joshua M. Boniface <joshua@boniface.me>  Wed, 08 Feb 2023 10:08:05 -0500
+
 pvc (0.9.60-0) unstable; urgency=high

  * [Provisioner] Cleans up several remaining bugs in the example scripts; they should all be valid now
--- a/debian/pvc-daemon-node.install
+++ b/debian/pvc-daemon-node.install
@ -5,3 +5,4 @@ node-daemon/pvcnoded.service lib/systemd/system
 node-daemon/pvc.target lib/systemd/system
 node-daemon/pvcautoready.service lib/systemd/system
 node-daemon/monitoring usr/share/pvc
+node-daemon/plugins usr/share/pvc
--- a/docs/about.md
+++ b/docs/about.md
@ -25,13 +25,13 @@ As part of these trends, Infrastructure-as-a-Service (IaaS) has become a critica

 However, the current state of the free and open source virtualization ecosystem is lacking.

-At the lower end, projects like ProxMox provide an easy way to administer small virtualization clusters, but these projects tend to lack advanced redundancy facilities that are built-in by default. While there are some new contenders in this space, such as Harvester, the options are limited and their feature-sets and tool stacks can be cumbersome or unproven.
+At the lower- to middle-end, projects like ProxMox provide an easy way to administer small virtualization clusters, but these projects tend to lack advanced redundancy facilities that are built-in by default. Ganeti, a former Google tool, was long-dead when PVC was initially conceived, but has recently been given new life by the FLOSS community, and was the inspiration for much of PVC's functionality. Harvester is also a newer player in the space, created by Rancher Labs after PVC was established, but its use of custom solutions for everything, especially the storage backend, gives us some pause.

-At the higher end, very large projects like OpenStack and CloudStack provide very advanced functionality, but these project are sprawling and complicated for Administrators to use, and are very focused on large enterprise deployments, not suitable for smaller clusters and teams.
+At the high end, very large projects like OpenStack and CloudStack provide very advanced functionality, but these projects are sprawling and complicated for Administrators to use, and are very focused on large enterprise deployments, not suitable for smaller clusters and teams.

 Finally, proprietary solutions dominate this space. VMWare and Nutanix are the two largest names, with these products providing functionality for both small and large clusters, but proprietary software limits both flexibility and freedom, and the costs associated with these solutions is immense.

-PVC aims to bridge the gaps between these three categories. Like the larger FLOSS and proprietary projects, PVC can scale up to very large cluster sizes, while remaining usable even for small clusters as well. Like the smaller FLOSS and proprietary projects, PVC aims to be very simple to use, with a fully programmable API, allowing administrators to get on with more important things. Like the other FLOSS solutions, PVC is free, both as in beer and as in speech, allowing the administrator to inspect, modify, and tailor it to their needs. And finally, PVC is built from the ground-up to support host-level redundancy at every layer, rather than this being an expensive, optional, or tacked on feature.
+PVC aims to bridge the gaps between these categories. Like the larger FLOSS and proprietary projects, PVC can scale up to very large cluster sizes, while remaining usable even for small clusters as well. Like the smaller FLOSS and proprietary projects, PVC aims to be very simple to use, with a fully programmable API, allowing administrators to get on with more important things. Like the other FLOSS solutions, PVC is free, both as in beer and as in speech, allowing the administrator to inspect, modify, and tailor it to their needs. And finally, PVC is built from the ground-up to support host-level redundancy at every layer, rather than this being an expensive, optional, or tacked on feature, using standard, well-tested and well-supported components.

 In short, it is a Free Software, scalable, redundant, self-healing, and self-managing private cloud solution designed with administrator simplicity in mind. 

@ -71,6 +71,8 @@ Nodes are networked together via a set of statically-configured, simple layer-2

 Further information about the general cluster architecture, including important considerations for node specifications/sizing and network configuration, [can be found at the cluster architecture page](/cluster-architecture). It is imperative that potential PVC administrators read this document thoroughly to understand the specific requirements of PVC and avoid potential missteps in obtaining and deploying their cluster.

+More information about the node daemon can be found at the [Node Daemon manual page](/manuals/daemon) and details about the health system and health plugins for nodes can be found at the [health plugin manual page](/manuals/health-plugins).
+
 ## Clients

 ### API Client
--- a/docs/index.md
+++ b/docs/index.md
@ -8,7 +8,7 @@

 ## What is PVC?

-PVC is a Linux KVM-based hyperconverged infrastructure (HCI) virtualization cluster solution that is fully Free Software, scalable, redundant, self-healing, self-managing, and designed for administrator simplicity. It is an alternative to other HCI solutions such as Harvester, Nutanix, and VMWare, as well as to other common virtualization stacks such as ProxMox and OpenStack.
+PVC is a Linux KVM-based hyperconverged infrastructure (HCI) virtualization cluster solution that is fully Free Software, scalable, redundant, self-healing, self-managing, and designed for administrator simplicity. It is an alternative to other HCI solutions such as Ganeti, Harvester, Nutanix, and VMWare, as well as to other common virtualization stacks such as ProxMox and OpenStack.

 PVC is a complete HCI solution, built from well-known and well-trusted Free Software tools, to assist an administrator in creating and managing a cluster of servers to run virtual machines, as well as self-managing several important aspects including storage failover, node failure and recovery, virtual machine failure and recovery, and network plumbing. It is designed to act consistently, reliably, and unobtrusively, letting the administrator concentrate on more important things.

--- a/docs/manuals/daemon.md
+++ b/docs/manuals/daemon.md
@ -52,9 +52,11 @@ The daemon startup sequence is documented below. The main daemon entry-point is

 0. The node activates its keepalived timer and begins sending keepalive updates to the cluster. The daemon state transitions from `init` to `run` and the system has started fully.

-# PVC Node Daemon manual
+## Node health plugins

-The PVC node daemon ins build with Python 3 and is run directly on nodes. For details of the startup sequence and general layout, see the [architecture document](/architecture/daemon).
+The PVC node daemon includes a node health plugin system. These plugins are run during keepalives to check various aspects of node health and adjust the overall node and cluster health accordingly. For example, a plugin might check that all configured network interfaces are online and operating at their correct speed, or that all operating system packages are up-to-date.
+
+For the full details of the health and node health plugin system, see the [node health plugin manual](/manuals/health-plugins).

 ## Configuration

@ -132,6 +134,7 @@ pvc:
      target_selector: mem
    configuration:
      directories:
+        plugin_directory: "/usr/share/pvc/plugins"
        dynamic_directory: "/run/pvc"
        log_directory: "/var/log/pvc"
        console_log_directory: "/var/log/libvirt"
@ -142,7 +145,7 @@ pvc:
        log_dates: True
        log_keepalives: True
        log_keepalive_cluster_details: True
-        log_keepalive_storage_details: True
+        log_keepalive_plugin_details: True
        console_log_lines: 1000
      networking:
        bridge_device: ens4
@ -367,6 +370,12 @@ For most clusters, `mem` should be sufficient, but others may be used based on t
  * `memprov` looks at the provisioned memory, not the allocated memory; thus, stopped or disabled VMs are counted towards a node's memory for this selector, even though their memory is not actively in use.
  * `load` looks at the system load of the node in general, ignoring load in any particular VMs; if any VM's CPU usage changes, this value would be affected. This might be preferable on clusters with some very CPU intensive VMs.

+#### `system` → `configuration` → `directories` → `plugin_directory`
+
+* *optional*
+
+The directory to load node health plugins from. Defaults to `/usr/share/pvc/plugins` if unset as per default packaging; should only be overridden by advanced users.
+
 #### `system` → `configuration` → `directories` → `dynamic_directory`

 * *required*
@ -421,11 +430,11 @@ Whether to log keepalive messages or not.

 Whether to log node status information during keepalives or not.

-#### `system` → `configuration` → `logging` → `log_keepalive_storage_details`
+#### `system` → `configuration` → `logging` → `log_keepalive_plugin_details`

 * *required*

-Whether to log storage cluster status information during keepalives or not.
+Whether to log node health plugin status information during keepalives or not.

 #### `system` → `configuration` → `logging` → `console_log_lines`

--- a/docs/manuals/health-plugins.md
+++ b/docs/manuals/health-plugins.md
@ -0,0 +1,210 @@
+# Node health plugins
+
+The PVC node daemon includes a node health plugin system. These plugins are run during keepalives to check various aspects of node health and adjust the overall node and cluster health accordingly. For example, a plugin might check that all configured network interfaces are online and operating at their correct speed, or that all operating system packages are up-to-date.
+
+## Configuration
+
+### Plugin Directory
+
+The PVC node configuration includes a configuration option at `system` → `configuration` → `directories` → `plugin_directory` to configure the location of health plugin files on the system. By default if unset, this directory is `/usr/share/pvc/plugins`. An administrator can override this directory if they wish, though custom plugins can be installed to this directory without problems, and thus it is not recommended that it be changed.
+
+### Plugin Logging
+
+Plugin output is logged by default during keepalive messages. This is controlled by the node configuration option at `system` → `configuration` → `logging` → `log_keepalive_plugin_details`. Regardless of this setting, the overall node health is logged at the end of the plugin run.
+
+### Disabling Node Plugins
+
+Node plugins cannot be disabled; at best, a suite of zero plugins can be specified by pointing the above plugin directory to an empty folder. This will effectively render the node at a permanent 100% health. Note however that overall cluster health will still be affected by cluster-wide events (e.g. nodes or VMs being stopped, OSDs going out, etc.).
+
+## Health Plugin Architecture
+
+### Node and Cluster Health
+
+A core concept leveraged by the PVC system is that of node and cluster health. Starting with PVC version 0.9.61, these two health statistics are represented as percentages, with 100% representing optimal health, 51-90% representing a "warning" degraded state, and 0-50% representing a "critical" degraded state.
+
+While a cluster is in maintenance mode (set via `pvc cluster maintenance on` and unset via `pvc cluster maintenance off`), the health values continue to aggregate, but the value is ignored for the purposes of "health" output, i.e. its output colour will not change, and the reference monitoring plugins (for CheckMK and Munin) will not trigger alerting. This allows the administrator to specify that abnormal conditions are OK for some amount of time without triggering upstream alerting. Additionally, while a node is not in `run` Daemon state, its health will be reported as `N/A`, which is treated as 100% but displayed as such to make clear that the node has not initialized and run its health check plugins (yet).
+
+The node health is affected primarily by health plugins as discussed in this manual. Any plugin that adjusts node health lowers the node's health by its `health_delta` value, as well as the cluster health by its `health_delta` value. For example, a plugin might have a `health_delta` in a current state of `10`, which reduces its own node's health value to 90%, and the overall cluster health value to 90%.
+
+In addition, cluster health is affected by several fixed states within the PVC system. These are:
+
+* A node in `flushed` Domain state lowers the cluster health by 10; a node in `stop` Daemon state lowers the cluster health by 50.
+
+* A VM in `stop` state lowers the cluster health by 10 (hint: use `disable` state to avoid this).
+
+* An OSD in `down` state lowers the cluster health by 10; an OSD in `out` state lowers the cluster health by 50.
+
+* Memory overprovisioning (total provisioned and running guest memory allocation exceeds the total N-1 cluster memory availability) lowers the cluster health by 50.
+
+* Each Ceph health check message lowers the cluster health by 10 for a `HEALTH_WARN` severity or by 50 for a `HEALTH_ERR` severity. For example, the `OSDMAP_FLAGS` check (reporting, e.g. `noout` state) reports as a `HEALTH_WARN` severity and will thus decrease the cluster health by 10; if an additional `PG_DEGRADED` check fires (also reporting as `HEALTH_WARN` severity), this will decrease the cluster health by a further 10, or 20 total for both. This cumulative effect ensures that multiple simultaneous Ceph issues escalate in severity. For a full list of possible Ceph health check messages, [please see the Ceph documentation](https://docs.ceph.com/en/nautilus/rados/operations/health-checks/).
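The cumulative arithmetic described above can be sketched as follows. This is a minimal illustration rather than PVC's actual implementation: the `FIXED_DELTAS` table, its key names, and the `cluster_health` function are hypothetical, and clamping the result at 0 is an assumption based on health being expressed as a percentage.

```python
# Hypothetical table of the fixed cluster health deltas listed above.
# The key names are illustrative only and do not come from the PVC code.
FIXED_DELTAS = {
    "node_flushed": 10,
    "node_stopped": 50,
    "vm_stopped": 10,
    "osd_down": 10,
    "osd_out": 50,
    "memory_overprovisioned": 50,
    "ceph_health_warn": 10,
    "ceph_health_err": 50,
}


def cluster_health(events, plugin_deltas=()):
    """Return a cluster health percentage.

    events: iterable of FIXED_DELTAS keys, one entry per occurrence.
    plugin_deltas: iterable of health_delta values reported by node plugins.
    """
    total = sum(FIXED_DELTAS[event] for event in events) + sum(plugin_deltas)
    # Assumption: health never drops below 0%
    return max(0, 100 - total)
```

For instance, two simultaneous `HEALTH_WARN` checks such as `OSDMAP_FLAGS` and `PG_DEGRADED` would be passed as two `"ceph_health_warn"` entries, giving a combined reduction of 20.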
+
+### Built-in Health Plugins
+
+PVC ships with several node health plugins installed and loaded by default, to ensure several common aspects of node operation are validated and checked. The following plugins are included:
+
+#### `disk`
+
+This plugin checks all SATA/SAS and NVMe block devices for SMART health, if available, and reports any errors.
+
+For SATA/SAS disks reporting standard ATA SMART attributes, a health delta of 10 is raised for each SMART error on each disk, based on the `when_failed` value being set to true. Note that due to this design, several disks with multiple errors can quickly escalate to a critical condition, alerting the administrator to possible major faults.
+
+For NVMe disks, only 3 specific NVMe health information messages are checked: `critical_warning`, `media_errors`, and `percentage_used` at > 90. Each check can only be reported once per disk and each raises a health delta of 10.
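As a rough sketch of the NVMe rule above, the per-disk delta could be computed like this. The function name and the dictionary layout are hypothetical, and treating any nonzero `critical_warning` or `media_errors` value as a fault is an assumption; only the three field names and the `> 90` threshold come from the text.

```python
def nvme_health_delta(health_info):
    """Return the health delta for a single NVMe disk.

    health_info: dict of NVMe health fields for one disk (hypothetical layout).
    Each of the three checks contributes at most once per disk.
    """
    delta = 0
    # Assumption: any nonzero value indicates a problem
    if health_info.get("critical_warning", 0) != 0:
        delta += 10
    if health_info.get("media_errors", 0) != 0:
        delta += 10
    # Threshold taken from the text: percentage_used at > 90
    if health_info.get("percentage_used", 0) > 90:
        delta += 10
    return delta
```

Under these assumptions a single NVMe disk can contribute at most 30 to the node's health delta.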
+
+#### `dpkg`
+
+This plugin checks for Debian package updates, invalid package states (i.e. not `ii` state), and obsolete configuration files that require cleanup. It will raise a health delta of 1 for each type of inconsistency, for a maximum of 3. It will thus never, on its own, trigger a node or cluster to be in a warning or critical state, but will show the errors for administrator analysis, as an example of a more "configuration anomaly"-type plugin.
+
+#### `edac`
+
+This plugin checks the EDAC utility for messages about errors, primarily in the ECC memory subsystem. It will raise a health delta of 50 if any `Uncorrected` EDAC errors are detected, possibly indicating failing memory.
+
+#### `ipmi`
+
+This plugin checks whether the daemon can reach its own IPMI address and connect. If it cannot, it raises a health delta of 10.
+
+#### `lbvt`
+
+This plugin checks whether the daemon can connect to the local Libvirt daemon instance. If it cannot, it raises a health delta of 50.
+
+#### `load`
+
+This plugin checks the current 1-minute system load (as reported during keepalives) against the number of total CPU threads available on the node. If the load average is greater, i.e. the node is overloaded, it raises a health delta of 50.
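The comparison this plugin makes can be illustrated with a short sketch. The function names here are hypothetical, and reading the load directly via `os.getloadavg()` is a stand-in for the keepalive-reported value that the real plugin uses.

```python
import os


def load_health_delta(load_1min, cpu_threads):
    """Return 50 if the 1-minute load exceeds the CPU thread count, else 0."""
    return 50 if load_1min > cpu_threads else 0


def current_load_health_delta():
    """Evaluate the check against the local system (Unix-like systems only)."""
    load_1min = os.getloadavg()[0]
    # os.cpu_count() may return None; fall back to 1 in that case
    cpu_threads = os.cpu_count() or 1
    return load_health_delta(load_1min, cpu_threads)
```

Note that a load exactly equal to the thread count does not raise a delta in this sketch, matching the "greater" wording above.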
+
+#### `nics`
+
+This plugin checks that all NICs underlying PVC networks and bridges are operating correctly, specifically that bond interfaces have at least 2 active slaves and that all physical NICs are operating at their maximum possible speed. It takes into account several possible options to determine this.
+
+* For each device defined (`bridge_dev`, `upstream_dev`, `cluster_dev`, and `storage_dev`), it determines the type of device. If it is a VLAN, it obtains the underlying device; otherwise, it uses the specified device. It then adds this device to a list of core NICs. Ideally, this list will contain either bonding interfaces or actual Ethernet NICs.
+
+* For each core NIC, it checks its type. If it is a `bond` device, it checks the bonding state to ensure that at least 2 slave interfaces are up and operating. If there are not, it raises a health delta of 10.
+
+* For each core NIC, it checks its maximum possible speed as reported by `ethtool` as well as the current active speed. If the NIC is operating at less than its maximum possible speed, it raises a health delta of 10.
+
+Note that this check may pose problems in some deployment scenarios (e.g. running 25GbE NICs at 10GbE by design). Currently the plugin logic cannot handle this and manual modifications may be required. This is left to the administrator if applicable.
+
+#### `psql`
+
+This plugin checks whether the daemon can connect to the local PostgreSQL/Patroni daemon instance. If it cannot, it raises a health delta of 50.
+
+#### `zkpr`
+
+This plugin checks whether the daemon can connect to the local Zookeeper daemon instance. If it cannot, it raises a health delta of 50.
+
+### Custom Health Plugins
+
+In addition to the included health plugins, the plugin architecture allows administrators to write their own plugins to check specific node details that the default plugins do not cover. While the author has endeavoured to cover as many important aspects as possible with the default plugins, some other condition may always become important, and the system is designed to accommodate this. That said, we would welcome pull requests adding new plugins to future versions of PVC should they be widely applicable.
+
+As a warning, health plugins are run in a `root` context by PVC. They must therefore be carefully vetted to avoid damaging the system. DO NOT run untrusted health plugins.
+
+To create a health plugin, first reference the existing health plugins and create a base template.
+
+Each health plugin consists of three main parts:
+
+* An import, which must at least include the `MonitoringPlugin` class from the `pvcnoded.objects.MonitoringInstance` module. You can also load additional imports here, or import them within the functions (which is recommended for namespace simplicity).
+
+```
+# This import is always required here, as MonitoringPlugin is used by the MonitoringPluginScript class
+from pvcnoded.objects.MonitoringInstance import MonitoringPlugin
+```
+
+
+* A `PLUGIN_NAME` variable which defines the name of the plugin. This must match the filename. Generally, a plugin name will be 4 characters, but this is purely a convention and not a requirement.
+
+```
+# A monitoring plugin script must always expose its nice name, which must be identical to the file name
+PLUGIN_NAME = "nics"
+```
+
+* A `MonitoringPluginScript` class definition which extends the `MonitoringPlugin` class.
+
+```
+# The MonitoringPluginScript class must be named as such, and extend MonitoringPlugin.
+class MonitoringPluginScript(MonitoringPlugin):
+    ...
+```
+
+The `MonitoringPluginScript` class should define the 3 primary functions detailed below. While it is possible to do nothing except `pass` in these functions, or even to exclude them (the parent class includes empty defaults), all 3 should be included for consistency.
+
+#### `def setup(self):`
+
+This function is run once during the node daemon startup, when the plugin is loaded. It can be used to get one-time setup information, populate plugin instance variables, etc.
+
+The function must take no arguments except `self` and anything returned is ignored.
+
+A plugin can also be disabled live in the setup function by raising any `Exception`. Such exceptions are caught, and the plugin is then not loaded.
+
+#### `def cleanup(self):`
+
+This function mirrors the setup function, and is run once during the node daemon shutdown process. It can be used to clean up any lingering items (e.g. temporary files) created by the setup or run functions, if required; generally plugins do not need to do any cleanup.
+
+#### `def run(self):`
+
+This function is run each time the plugin is called during a keepalive. It performs the main work of the plugin before returning the end result in a specific format.
+
+Note that this function runs once for each keepalive, which by default is every 5 seconds. It is thus important to keep the runtime as short as possible and avoid doing complex calculations, file I/O, etc. during the plugin run. Do as much as possible in the setup function to keep the run function as quick as possible. A good safe maximum time for any plugin (e.g. if implementing an internal timeout) is 2 seconds.
+
+What happens during the run function is of course completely up to the plugin, but it must return a standardized set of details upon completing the run.
+
+An instance of the `PluginResult` object is helpfully created by the caller and passed in via `self.plugin_result`. This can be used to set the results as follows:
+
+* The `self.plugin_result.set_health_delta()` function can be used to set the current health delta of the result. This should be `0` unless the plugin detects a fault, at which point it can be any integer value below 100, and affects the node and cluster health as detailed above.
+
+* The `self.plugin_result.set_message()` function can be used to set the message text of the result, explaining in a short but human-readable way what the plugin result is. This will be shown in several places, including the node logs (if enabled), the node info output, and for results that have a health delta above 0, in the cluster status output.
+
+Finally, the `PluginResult` instance stored as `self.plugin_result` must be returned by the run function to the caller upon completion so that it can be added to the node state.
+
+### Logging
+
+The `MonitoringPlugin` class provides a helper logging method (usable as `self.log()`) to assist a plugin author in logging messages to the node daemon console log. This function takes one primary argument, a string message, and an optional `state` keyword argument for alternate states.
+
+The default state is `d` for debug, e.g. `state="d"`. The possible states for log messages are:
+
+* `"d"`: Debug, only printed when the administrator has debug logging enabled. Useful for detailed analysis of the plugin run state.
+* `"i"`: Informational, printed at all times but with no intrinsic severity. Use these very sparingly if at all.
+* `"t"`: Tick, matches the output of the keepalive itself. Use these very sparingly if at all.
+* `"w"`: Warning, prints a warning message. Use these for non-fatal error conditions within the plugin.
+* `"e"`: Error, prints an error message. Use these for fatal error conditions within the plugin.
+
+None of the example plugins make use of the logging interface, but it is available for custom plugins should it be required.
+
+The final output message of each plugin is automatically logged to the node daemon console log with `"t"` state at the completion of all plugins, if the `log_keepalive_plugin_details` configuration option is true. Otherwise, no final output is displayed. This setting does not affect messages printed from within a plugin.
+
+### Example Health Plugin
+
+This is a terse version of the `load` plugin, an extremely simple plugin that shows all the above requirements clearly. Comments are omitted here for simplicity, but these can be seen in the actual plugin file (at `/usr/share/pvc/plugins/load` on any node).
+
+```
+#!/usr/bin/env python3
+
+# load.py: PVC monitoring plugin example
+
+from pvcnoded.objects.MonitoringInstance import MonitoringPlugin
+
+PLUGIN_NAME = "load"
+
+class MonitoringPluginScript(MonitoringPlugin):
+    def setup(self):
+        pass
+
+    def cleanup(self):
+        pass
+
+    def run(self):
+        from os import getloadavg
+        from psutil import cpu_count
+
+        load_average = getloadavg()[0]
+        cpu_cores = cpu_count()
+
+        if load_average > float(cpu_cores):
+            health_delta = 50
+        else:
+            health_delta = 0
+
+        message = f"Current load is {load_average} out of {cpu_cores} CPU cores"
+
+        self.plugin_result.set_health_delta(health_delta)
+        self.plugin_result.set_message(message)
+
+        return self.plugin_result
+```
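As a further illustration of the plugin interface described above, the following is a minimal sketch of a hypothetical custom plugin named `rootfs`, which is not part of PVC. It assumes the real `MonitoringPlugin` base class is importable on a node; so that the sketch can also be run standalone, it falls back to small stand-in classes that only mimic the documented `self.plugin_result` and `self.log()` behaviour. The checked path, the 90% threshold, and the health delta of 10 are arbitrary choices for illustration.

```python
#!/usr/bin/env python3

# rootfs: hypothetical custom plugin sketch (not shipped with PVC)

try:
    from pvcnoded.objects.MonitoringInstance import MonitoringPlugin
except ImportError:
    # Stand-ins for running outside a PVC node. These are assumptions that
    # only mimic the interface described in the documentation.
    class PluginResult:
        def __init__(self):
            self.health_delta = 0
            self.message = ""

        def set_health_delta(self, new_delta):
            self.health_delta = new_delta

        def set_message(self, new_message):
            self.message = new_message

    class MonitoringPlugin:
        def __init__(self):
            self.plugin_result = PluginResult()

        def log(self, message, state="d"):
            print(f"[{state}] {message}")


PLUGIN_NAME = "rootfs"


class MonitoringPluginScript(MonitoringPlugin):
    def setup(self):
        from os.path import isdir

        # One-time work belongs here; raising disables the plugin at load time
        self.path = "/"
        if not isdir(self.path):
            raise Exception(f"{self.path} is not a directory")

    def cleanup(self):
        pass

    def run(self):
        from shutil import disk_usage

        # Keep the per-keepalive work small: a single filesystem stat call
        usage = disk_usage(self.path)
        used_percent = round(usage.used / usage.total * 100) if usage.total else 0

        if used_percent > 90:
            health_delta = 10
            self.log(f"{self.path} is nearly full", state="w")
        else:
            health_delta = 0

        self.plugin_result.set_health_delta(health_delta)
        self.plugin_result.set_message(f"{self.path} is {used_percent}% used")

        return self.plugin_result
```

Following the naming rule above, such a file would be named `rootfs` to match `PLUGIN_NAME`. The warning-level log call is included only to show the `state` keyword; a real plugin would likely rely on the result message alone.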
--- a/docs/manuals/swagger.json
+++ b/docs/manuals/swagger.json
@ -15,15 +15,57 @@
        },
        "ClusterStatus": {
            "properties": {
-                "health": {
-                    "description": "The overall cluster health",
-                    "example": "Optimal",
+                "cluster_health": {
+                    "properties": {
+                        "health": {
+                            "description": "The overall health (%) of the cluster",
+                            "example": 100,
+                            "type": "integer"
+                        },
+                        "messages": {
+                            "description": "A list of health event strings",
+                            "items": {
+                                "example": "hv1: plugin 'nics': bond0 DEGRADED with 1 active slaves, bond0 OK at 10000 Mbps",
+                                "type": "string"
+                            },
+                            "type": "array"
+                        }
+                    },
+                    "type": "object"
+                },
+                "maintenance": {
+                    "description": "Whether the cluster is in maintenance mode or not (string boolean)",
+                    "example": true,
                    "type": "string"
                },
                "networks": {
                    "description": "The total number of networks in the cluster",
                    "type": "integer"
                },
+                "node_health": {
+                    "properties": {
+                        "hvX": {
+                            "description": "A node entry for per-node health details, one per node in the cluster",
+                            "properties": {
+                                "health": {
+                                    "description": "The health (%) of the node",
+                                    "example": 100,
+                                    "type": "integer"
+                                },
+                                "messages": {
+                                    "description": "A list of health event strings",
+                                    "items": {
+                                        "example": "'nics': bond0 DEGRADED with 1 active slaves, bond0 OK at 10000 Mbps",
+                                        "type": "string"
+                                    },
+                                    "type": "array"
+                                }
+                            },
+                            "type": "object"
+                        }
+                    },
+                    "type": "object"
+                },
                "nodes": {
                    "properties": {
                        "state-combination": {
@ -61,15 +103,15 @@
                    "example": "pvchv1",
                    "type": "string"
                },
+                "pvc_version": {
+                    "description": "The PVC version of the current primary coordinator node",
+                    "example": "0.9.61",
+                    "type": "string"
+                },
                "snapshots": {
                    "description": "The total number of snapshots in the storage cluster",
                    "type": "integer"
                },
-                "storage_health": {
-                    "description": "The overall storage cluster health",
-                    "example": "Optimal",
-                    "type": "string"
-                },
                "upstream_ip": {
                    "description": "The cluster upstream IP address in CIDR format",
                    "example": "10.0.0.254/24",
@ -456,6 +498,48 @@
                    "description": "The number of running domains (VMs)",
                    "type": "integer"
                },
+                "health": {
+                    "description": "The overall health (%) of the node",
+                    "example": 100,
+                    "type": "integer"
+                },
+                "health_details": {
+                    "description": "A list of health plugin results",
+                    "items": {
+                        "properties": {
+                            "health_delta": {
+                                "description": "The health delta (negatively applied to the health percentage) of the plugin's current state",
+                                "example": 10,
+                                "type": "integer"
+                            },
+                            "last_run": {
+                                "description": "The UNIX timestamp (s) of the last plugin run",
+                                "example": 1676786078,
+                                "type": "integer"
+                            },
+                            "message": {
+                                "description": "The output message of the plugin",
+                                "example": "bond0 DEGRADED with 1 active slaves, bond0 OK at 10000 Mbps",
+                                "type": "string"
+                            },
+                            "name": {
+                                "description": "The name of the health plugin",
+                                "example": "nics",
+                                "type": "string"
+                            }
+                        },
+                        "type": "object"
+                    },
+                    "type": "array"
+                },
+                "health_plugins": {
+                    "description": "A list of health plugin names currently loaded on the node",
+                    "items": {
+                        "example": "nics",
+                        "type": "string"
+                    },
+                    "type": "array"
+                },
                "kernel": {
                    "desription": "The running kernel version from uname",
                    "type": "string"
@ -6177,7 +6261,7 @@
                        "description": "The selector used to determine candidate nodes during migration; see 'target_selector' in the node daemon configuration reference",
                        "enum": [
                            "mem",
-                            "memfree",
+                            "memprov",
                            "vcpus",
                            "load",
                            "vms",
@ -6336,7 +6420,7 @@
                        "description": "The selector used to determine candidate nodes during migration; see 'target_selector' in the node daemon configuration reference",
                        "enum": [
                            "mem",
-                            "memfree",
+                            "memprov",
                            "vcpus",
                            "load",
                            "vms",
@ -6597,7 +6681,7 @@
                        "description": "The selector used to determine candidate nodes during migration; see 'target_selector' in the node daemon configuration reference",
                        "enum": [
                            "mem",
-                            "memfree",
+                            "memprov",
                            "vcpus",
                            "load",
                            "vms",
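To make the reshaped `ClusterStatus` schema above more concrete, here is a minimal sketch of how a client might read the new `cluster_health`, `node_health`, and `maintenance` fields. The sample payload, including the node names `hv1` and `hv2` and all values, is made up for illustration, and the `summarize` helper is hypothetical rather than part of any PVC client.

```python
from json import loads

# Hypothetical sample shaped like the ClusterStatus schema; values are made up
sample = """
{
  "cluster_health": {
    "health": 90,
    "messages": ["hv1: plugin 'nics': bond0 DEGRADED with 1 active slaves"]
  },
  "maintenance": "false",
  "node_health": {
    "hv1": {"health": 90, "messages": ["'nics': bond0 DEGRADED with 1 active slaves"]},
    "hv2": {"health": 100, "messages": []}
  }
}
"""


def summarize(status):
    """Return human-readable lines for a ClusterStatus dictionary."""
    # "maintenance" is a string boolean, so compare against "true" explicitly
    maintenance = "on" if status["maintenance"] == "true" else "off"
    cluster_health = status["cluster_health"]["health"]

    lines = [f"cluster: {cluster_health}% (maintenance {maintenance})"]
    for name, node in sorted(status["node_health"].items()):
        health = node["health"]
        lines.append(f"{name}: {health}%")
        lines.extend(f"  {message}" for message in node["messages"])
    return lines


summary = summarize(loads(sample))
print("\n".join(summary))
```

Note that the old top-level `health` and `storage_health` strings no longer exist in this schema, so code that compared them against `"Optimal"` would need to switch to the integer percentages.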
--- a/node-daemon/monitoring/README.md
+++ b/node-daemon/monitoring/README.md
@ -2,23 +2,34 @@

 This directory contains several monitoring resources that can be used with various monitoring systems to track and alert on a PVC cluster system.

-### Munin
+## Munin

-The included munin plugin can be activated by linking to it from `/etc/munin/plugins/pvc`. By default, this plugin triggers a CRITICAL state when either the PVC or Storage cluster becomes Degraded, and is otherwise OK. The overall health is graphed numerically (Optimal is 0, Maintenance is 1, Degraded is 2) so that the cluster health can be tracked over time.
+The included Munin plugins can be activated by linking to them from `/etc/munin/plugins/`. Two plugins are provided:

-When using this plugin, it might be useful to adjust the thresholds with a plugin configuration. For instance, one could adjust the Degraded value from CRITICAL to WARNING by adjusting the critical threshold to a value higher than 1.99 (e.g. 3, 10, etc.) so that only the WARNING threshold will be hit. Alternatively one could instead make Maintenance mode trigger a WARNING by lowering the threshold to 0.99.
+* `pvc`: Checks the PVC cluster and node health, as well as their status (OK/Warning/Critical, based on maintenance status), providing 4 graphs.

-Example plugin configuration:
+* `ceph_utilization`: Checks the Ceph cluster statistics, providing multiple graphs. Note that this plugin is independent of PVC itself, and makes local calls to various Ceph commands itself.

-```
-[pvc]
-# Make cluster warn on maintenance
-env.pvc_cluster_warning 0.99
-# Disable critical threshold (>2)
-env.pvc_cluster_critical 3
-# Make storage warn on maintenance, crit on degraded (latter is default)
-env.pvc_storage_warning 0.99
-env.pvc_storage_critical 1.99
-```
+The `pvc` plugin provides no configuration; the status is hardcoded such that <=90% health is warning, <=50% health is critical, and maintenance state forces OK. The alerting is provided by two separate graphs from the health graph so that actual health state is logged regardless of alerting.

-### Check_MK
+The `ceph_utilization` plugin provides no configuration; only the cluster utilization graph alerts such that >80% used is warning and >90% used is critical. Ceph itself begins warning above 80% as well.
+
+## CheckMK
+
+The included CheckMK plugin is divided into two parts: the agent plugin, and the monitoring server plugin. This monitoring server plugin requires CheckMK version 2.0 or higher. The two parts can be installed as follows:
+
+* `pvc`: Place this file in the `/usr/lib/check_mk_agent/plugins/` directory on each node.
+
+* `pvc.py`: Place this file in the `~/local/lib/python3/cmk/base/plugins/agent_based/` directory on the CheckMK monitoring host for each monitoring site.
+
+The plugin provides no configuration: the status is hardcoded such that <=90% health is warning, <=50% health is critical, and maintenance state forces OK.
+
+With both the agent and server plugins installed, you can then run `cmk -II <node>` (or use WATO) to inventory each node, which should produce two new checks:
+
+* `PVC Cluster`: Provides the cluster-wide health. Note that this will be identical for all nodes in the cluster (i.e. if the cluster health drops, all nodes in the cluster will alert this check).
+
+* `PVC Node <shortname>`: Provides the per-node health.
+
+The "Summary" text, shown in the check lists, will be simplistic, only showing the current health percentage.
+
+The "Details" text, found in the specific check details, will show the full list of problem(s) the check finds, as shown by `pvc status` itself.
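The hardcoded thresholds described in this README (<=90% is warning, <=50% is critical, maintenance forces OK) can be sketched as a small function. The name `alert_state` and the string return values are hypothetical and chosen only for illustration; the shipped plugins implement the same comparison inline.

```python
def alert_state(health, maintenance):
    """Map a health percentage and a maintenance flag to an alert level."""
    # Maintenance mode suppresses alerting regardless of the health value
    if maintenance:
        return "OK"
    if health <= 50:
        return "CRIT"
    if health <= 90:
        return "WARN"
    return "OK"


for health, maintenance in [(100, False), (90, False), (50, False), (10, True)]:
    print(health, maintenance, alert_state(health, maintenance))
```

Both boundaries are inclusive, so a health of exactly 90% already warns and exactly 50% is already critical.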
--- a/node-daemon/monitoring/checkmk/pvc
+++ b/node-daemon/monitoring/checkmk/pvc
@ -0,0 +1,6 @@
+#!/bin/bash
+
+# PVC cluster status check for Check_MK (agent-side)
+
+echo "<<<pvc>>>"
+pvc --quiet status --format json
--- a/node-daemon/monitoring/checkmk/pvc.py
+++ b/node-daemon/monitoring/checkmk/pvc.py
@ -0,0 +1,95 @@
+#!/usr/bin/env python3
+#
+# Check_MK PVC plugin
+#
+# Copyright 2017-2021, Joshua Boniface <joshua@boniface.me>
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+from .agent_based_api.v1 import *
+from cmk.base.check_api import host_name
+from time import time
+from json import loads
+
+
+def discover_pvc(section):
+    my_node = host_name().split(".")[0]
+    yield Service(item=f"PVC Node {my_node}")
+    yield Service(item="PVC Cluster")
+
+
+def check_pvc(item, params, section):
+    state = State.OK
+    summary = "Stuff"
+    details = None
+    data = loads(" ".join(section[0]))
+    my_node = host_name().split(".")[0]
+
+    maintenance_map = {
+        "true": "on",
+        "false": "off",
+    }
+    maintenance = maintenance_map[data["maintenance"]]
+
+    # Node check
+    if item == f"PVC Node {my_node}":
+        my_node = host_name().split(".")[0]
+        node_health = data["node_health"][my_node]["health"]
+        node_messages = data["node_health"][my_node]["messages"]
+
+        summary = f"Node health is {node_health}% (maintenance {maintenance})"
+
+        if len(node_messages) > 0:
+            details = ", ".join(node_messages)
+
+        if node_health <= 50 and maintenance == "off":
+            state = State.CRIT
+        elif node_health <= 90 and maintenance == "off":
+            state = State.WARN
+        else:
+            state = State.OK
+
+        yield Metric(name="node-health", value=node_health)
+
+    # Cluster check
+    elif item == "PVC Cluster":
+        cluster_health = data["cluster_health"]["health"]
+        cluster_messages = data["cluster_health"]["messages"]
+
+        summary = f"Cluster health is {cluster_health}% (maintenance {maintenance})"
+
+        if len(cluster_messages) > 0:
+            details = ", ".join(cluster_messages)
+
+        if cluster_health <= 50 and maintenance == "off":
+            state = State.CRIT
+        elif cluster_health <= 90 and maintenance == "off":
+            state = State.WARN
+        else:
+            state = State.OK
+
+        yield Metric(name="cluster-health", value=cluster_health)
+
+    yield Result(state=state, summary=summary, details=details)
+    return
+
+
+register.check_plugin(
+    name="pvc",
+    service_name="%s",
+    check_ruleset_name="pvc",
+    discovery_function=discover_pvc,
+    check_function=check_pvc,
+    check_default_parameters={},
+)
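The `loads(" ".join(section[0]))` call in `check_pvc` above can look odd at first. A minimal sketch of the reasoning follows, assuming that CheckMK hands the check a section in which each agent output line has already been split on whitespace into a list of tokens; that splitting is simulated here with `str.split()` and the payload is made up.

```python
from json import dumps, loads

# Made-up payload standing in for the single line of agent JSON output
payload = {
    "maintenance": "false",
    "cluster_health": {"health": 100, "messages": []},
}
agent_line = dumps(payload)

# Assumed monitoring-side view: one list of whitespace-separated tokens per line
section = [agent_line.split()]

# Rejoining the tokens of the first line restores parseable JSON
data = loads(" ".join(section[0]))
print(data["cluster_health"]["health"])

# Caveat: runs of whitespace inside string values collapse to a single space
spaced = dumps({"message": "bond0  OK"})
roundtrip = loads(" ".join(spaced.split()))
print(repr(roundtrip["message"]))
```

Under that assumption the round trip is lossless for the numeric health values, while message strings may lose repeated spaces, which is harmless for display purposes.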
--- a/node-daemon/monitoring/munin/pvc
+++ b/node-daemon/monitoring/munin/pvc
@ -7,23 +7,6 @@

 pvc - Plugin to monitor a PVC cluster.

-=head1 CONFIGURATION
-
-Note that due to how Munin thresholds work, these values must always be slightly less than 1 or 2 respectively,
-or the alerts will never be triggered.
-
-Defaults (no config required):
-
-[pvc]
-env.warning 1.99
-env.critical 1.99
-
-Make degraded cluster WARN only (max value is 2, so 3 effectively disables):
-
-[pvc]
-env.pvc_cluster_warning 1.99
-env.pvc_cluster_critical 3
-
 =head1 AUTHOR

 Joshua Boniface <joshua@boniface.me>
@ -45,7 +28,9 @@ GPLv3

 . "$MUNIN_LIBDIR/plugins/plugin.sh"

-warning=1.99
+is_multigraph
+
+warning=0.99
 critical=1.99

 export PVC_CLIENT_DIR="/run/shm/munin-pvc"
@ -53,16 +38,7 @@ PVC_CMD="/usr/bin/pvc --quiet --cluster local status --format json-pretty"
 JQ_CMD="/usr/bin/jq"

 output_usage() {
-    echo "This plugin outputs numerical values based on the health of the PVC cluster."
-    echo
-    echo "There are separate outputs for both the PVC cluster itself as well as the Ceph storage cluster."
-    echo "In normal operation, i.e. when both clusters are in 'Optimal' state, the plugin returns 0 for"
-    echo "each cluster. When the cluster is placed into 'Maintenance' mode,the plugin returns 1 for each"
-    echo "cluster, and goes into WARN state (limit 0.99); this can be adjusted by overriding the WARNING"
-    echo "threshold of the plugin to something other than 0.99 - note that due to Munin's alerting design,"
-    echo "the warning value must always be very slightly below the whole number. When either cluster"
-    echo "element becomes 'Degraded', the plugin returns 2 for the relevant cluster, which is treated as a"
-    echo "critical. Like the WARNING threshold, this can be overridden, and with the same caveat about limit."
+    echo "This plugin outputs information about a PVC cluster and node"
    exit 0
 }

@ -84,72 +60,102 @@ output_autoconf() {
 }

 output_config() {
-    echo 'graph_title PVC Clusters'
+    echo 'multigraph pvc_cluster_health'
+    echo 'graph_title PVC Cluster Health'
    echo 'graph_args --base 1000'
-    echo 'graph_vlabel Count'
+    echo 'graph_vlabel Health%'
    echo 'graph_category pvc'
-    echo 'graph_period second'
-    echo 'graph_info This graph shows the nodes in the PVC cluster.'
+    echo 'graph_info Health of the PVC cluster'

-    echo 'pvc_cluster.label Cluster Degradation'
-    echo 'pvc_cluster.type GAUGE'
-    echo 'pvc_cluster.max 2'
-    echo 'pvc_cluster.info Whether the PVC cluster is in a degraded state.'
-    print_warning pvc_cluster
-    print_critical pvc_cluster
+    echo 'pvc_cluster_health.label Cluster Health'
+    echo 'pvc_cluster_health.type GAUGE'
+    echo 'pvc_cluster_health.max 100'
+    echo 'pvc_cluster_health.min 0'
+    echo 'pvc_cluster_health.info Health of the PVC cluster in %'

-    echo 'pvc_storage.label Storage Degradation'
-    echo 'pvc_storage.type GAUGE'
-    echo 'pvc_storage.max 2'
-    echo 'pvc_storage.info Whether the storage cluster is in a degraded state.'
-    print_warning pvc_storage
-    print_critical pvc_storage
+    echo 'multigraph pvc_cluster_alert'
+    echo 'graph_title PVC Cluster Alerting'
+    echo 'graph_args --base 1000'
+    echo 'graph_vlabel State'
+    echo 'graph_category pvc'
+    echo 'graph_info Alerting state of the PVC cluster health'
+
+    echo 'pvc_cluster_alert.label Cluster Health State'
+    echo 'pvc_cluster_alert.type GAUGE'
+    echo 'pvc_cluster_alert.max 2'
+    echo 'pvc_cluster_alert.min 0'
+    echo 'pvc_cluster_alert.info Alerting state of the PVC cluster health'
+    print_warning pvc_cluster_alert
+    print_critical pvc_cluster_alert
+
+    echo 'multigraph pvc_node_health'
+    echo 'graph_title PVC Node Health'
+    echo 'graph_args --base 1000'
+    echo 'graph_vlabel Health%'
+    echo 'graph_category pvc'
+    echo 'graph_info Health of the PVC node'
+
+    echo 'pvc_node_health.label Node Health'
+    echo 'pvc_node_health.type GAUGE'
+    echo 'pvc_node_health.max 100'
+    echo 'pvc_node_health.min 0'
+    echo 'pvc_node_health.info Health of the PVC node in %'
+
+    echo 'multigraph pvc_node_alert'
+    echo 'graph_title PVC Node Alerting'
+    echo 'graph_args --base 1000'
+    echo 'graph_vlabel State'
+    echo 'graph_category pvc'
+    echo 'graph_info Alerting state of the PVC node health'
+
+    echo 'pvc_node_alert.label Node Health State'
+    echo 'pvc_node_alert.type GAUGE'
+    echo 'pvc_node_alert.max 2'
+    echo 'pvc_node_alert.min 0'
+    echo 'pvc_node_alert.info Alerting state of the PVC node health'
+    print_warning pvc_node_alert
+    print_critical pvc_node_alert

    exit 0
 }

 output_values() {
    PVC_OUTPUT="$( $PVC_CMD )"
+    HOST="$( hostname --short )"

-    cluster_health="$( $JQ_CMD '.health' <<<"${PVC_OUTPUT}" | tr -d '"' )"
-    cluster_failed_reason="$( $JQ_CMD -r '.health_msg | @csv' <<<"${PVC_OUTPUT}" | tr -d '"' | sed 's/,/, /g' )"
-    case $cluster_health in
-        "Optimal")
-            cluster_value="0"
-            ;;
-        "Maintenance")
-            cluster_value="1"
-            ;;
-        "Degraded")
-            cluster_value="2"
-    esac
+    is_maintenance="$( $JQ_CMD ".maintenance" <<<"${PVC_OUTPUT}" | tr -d '"' )"

-    storage_health="$( $JQ_CMD '.storage_health' <<<"${PVC_OUTPUT}" | tr -d '"' )"
-    storage_failed_reason="$( $JQ_CMD -r '.storage_health_msg | @csv' <<<"${PVC_OUTPUT}" | tr -d '"' | sed 's/,/, /g' )"
-    case $storage_health in
-        "Optimal")
-            storage_value="0"
-            ;;
-        "Maintenance")
-            storage_value="1"
-            ;;
-        "Degraded")
-            storage_value="2"
-    esac
+    cluster_health="$( $JQ_CMD ".cluster_health.health" <<<"${PVC_OUTPUT}" | tr -d '"' )"
+    cluster_health_messages="$( $JQ_CMD -r ".cluster_health.messages | @csv" <<<"${PVC_OUTPUT}" | tr -d '"' | sed 's/,/, /g' )"
+    echo 'multigraph pvc_cluster_health'
+    echo "pvc_cluster_health.value ${cluster_health}"
+    echo "pvc_cluster_health.extinfo ${cluster_health_messages}"

+    if [[ ${cluster_health} -le 50 && ${is_maintenance} == "false" ]]; then
+        cluster_health_alert=2
+    elif [[ ${cluster_health} -le 90 && ${is_maintenance} == "false" ]]; then
+        cluster_health_alert=1
+    else
+        cluster_health_alert=0
+    fi
+    echo 'multigraph pvc_cluster_alert'
+    echo "pvc_cluster_alert.value ${cluster_health_alert}"

-    echo "pvc_cluster.value $cluster_value"
-    if [[ $cluster_value -eq 1 ]]; then
-        echo "pvc_cluster.extinfo Cluster in maintenance mode"
-    elif [[ $cluster_value -eq 2 ]]; then
-        echo "pvc_cluster.extinfo ${cluster_failed_reason}"
-    fi 
-    echo "pvc_storage.value $storage_value"
-    if [[ $storage_value -eq 1 ]]; then
-        echo "pvc_storage.extinfo Cluster in maintenance mode"
-    elif [[ $storage_value -eq 2 ]]; then
-        echo "pvc_storage.extinfo ${storage_failed_reason}"
-    fi 
+    node_health="$( $JQ_CMD ".node_health.${HOST}.health" <<<"${PVC_OUTPUT}" | tr -d '"' )"
+    node_health_messages="$( $JQ_CMD -r ".node_health.${HOST}.messages | @csv" <<<"${PVC_OUTPUT}" | tr -d '"' | sed 's/,/, /g' )"
+    echo 'multigraph pvc_node_health'
+    echo "pvc_node_health.value ${node_health}"
+    echo "pvc_node_health.extinfo ${node_health_messages}"
+
+    if [[ ${node_health} -le 50 && ${is_maintenance} != "true" ]]; then
+        node_health_alert=2
+    elif [[ ${node_health} -le 90 && ${is_maintenance} != "true" ]]; then
+        node_health_alert=1
+    else
+        node_health_alert=0
+    fi
+    echo 'multigraph pvc_node_alert'
+    echo "pvc_node_alert.value ${node_health_alert}"
 }

 case $# in
--- a/node-daemon/plugins/disk
+++ b/node-daemon/plugins/disk
@ -0,0 +1,167 @@
+#!/usr/bin/env python3
+
+# disk.py - PVC Monitoring example plugin for disk (system + OSD)
+# Part of the Parallel Virtual Cluster (PVC) system
+#
+#    Copyright (C) 2018-2022 Joshua M. Boniface <joshua@boniface.me>
+#
+#    This program is free software: you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation, version 3.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License
+#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+###############################################################################
+
+# This script provides an example of a PVC monitoring plugin script. It will create
+# a simple plugin to check the system and OSD disks for errors and faults and return
+# a health delta corresponding to severity.
+
+# This script can thus be used as an example or reference implementation of a
+# PVC monitoring plugin script and expanded upon as required.
+
+# A monitoring plugin script must implement the class "MonitoringPluginScript" which
+# extends "MonitoringPlugin", providing the 3 functions indicated. Detailed explanation
+# of the role of each function is provided in context of the example; see the other
+# examples for more potential uses.
+
+# WARNING:
+#
+# This script will run in the context of the node daemon keepalives as root.
+# DO NOT install untrusted, unvetted plugins under any circumstances.
+
+
+# This import is always required here, as MonitoringPlugin is used by the
+# MonitoringPluginScript class
+from pvcnoded.objects.MonitoringInstance import MonitoringPlugin
+
+
+# A monitoring plugin script must always expose its nice name, which must be identical to
+# the file name
+PLUGIN_NAME = "disk"
+
+
+# The MonitoringPluginScript class must be named as such, and extend MonitoringPlugin.
+class MonitoringPluginScript(MonitoringPlugin):
+    def setup(self):
+        """
+        setup(): Perform special setup steps during node daemon startup
+
+        This step is optional and should be used sparingly.
+
+        If you wish for the plugin to not load in certain conditions, do any checks here
+        and return a non-None failure message to indicate the error.
+        """
+
+        from daemon_lib.common import run_os_command
+        from json import loads
+
+        _, _all_disks, _ = run_os_command("lsblk --json --paths --include 8,259")
+        try:
+            all_disks = loads(_all_disks)
+        except Exception as e:
+            return f"Error loading lsblk JSON: {e}"
+
+        disk_details = list()
+
+        def get_smartinfo(disk, extra_opt=""):
+            _, _smart_info, _ = run_os_command(f"smartctl --info --json {extra_opt} {disk}")
+            try:
+                smart_info = loads(_smart_info)
+            except Exception:
+                return None
+
+            return smart_info
+
+        for disk in [disk["name"] for disk in all_disks['blockdevices']]:
+            extra_opt = ""
+            smart_info = get_smartinfo(disk)
+            if smart_info is None or smart_info["smartctl"]["exit_status"] > 1:
+                continue
+            elif smart_info["smartctl"]["exit_status"] == 1:
+                if "requires option" in smart_info["smartctl"]["messages"][0]["string"]:
+                    extra_opt = smart_info["smartctl"]["messages"][0]["string"].split("'")[1].replace('N','0')
+                    smart_info = get_smartinfo(disk, extra_opt)
+                    if smart_info is None or smart_info["smartctl"]["exit_status"] > 0:
+                        continue
+                else:
+                    continue
+
+            disk_type = smart_info["device"]["type"]
+
+            disk_details.append((disk, extra_opt, disk_type))
+
+        self.disk_details = disk_details
+
+
+    def run(self):
+        """
+        run(): Perform the check actions and return a PluginResult object
+        """
+
+        # Re-run setup each time to ensure the disk details are current
+        self.setup()
+
+        # Run any imports first
+        from daemon_lib.common import run_os_command
+        from json import loads
+
+        health_delta = 0
+        messages = list()
+
+        for _disk in self.disk_details:
+            disk = _disk[0]
+            extra_opt = _disk[1]
+            disk_type = _disk[2]
+
+            _, _smart_info, _ = run_os_command(f"smartctl --all --json {extra_opt} {disk}")
+            try:
+                smart_info = loads(_smart_info)
+            except Exception:
+                health_delta += 10
+                messages.append(f"{disk} failed to load SMART data")
+                continue
+
+            if disk_type == 'nvme':
+                for attribute in smart_info.get('nvme_smart_health_information_log', {}).items():
+                    if attribute[0] == "critical_warning" and attribute[1] > 0:
+                        health_delta += 10
+                        messages.append(f"{disk} critical warning value {attribute[1]}")
+                    if attribute[0] == "media_errors" and attribute[1] > 0:
+                        health_delta += 10
+                        messages.append(f"{disk} media errors value {attribute[1]}")
+                    if attribute[0] == "percentage_used" and attribute[1] > 90:
+                        health_delta += 10
+                        messages.append(f"{disk} percentage used value {attribute[1]}%")
+            else:
+                for attribute in smart_info.get('ata_smart_attributes', {}).get('table', []):
+                    if attribute["when_failed"]:
+                        health_delta += 10
+                        messages.append(f"{disk} attribute {attribute['name']} value {attribute['raw']['value']}")
+
+        if len(messages) < 1:
+            messages.append(f"All {len(self.disk_details)} checked disks report OK: {', '.join([disk[0] for disk in self.disk_details])}")
+
+        # Set the health delta in our local PluginResult object
+        self.plugin_result.set_health_delta(health_delta)
+
+        # Set the message in our local PluginResult object
+        self.plugin_result.set_message(', '.join(messages))
+
+        # Return our local PluginResult object
+        return self.plugin_result
+
+    def cleanup(self):
+        """
+        cleanup(): Perform special cleanup steps during node daemon termination
+
+        This step is optional and should be used sparingly.
+        """
+
+        pass
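
The NVMe branch of `run()` above can be read as a small table of thresholds. The following is a minimal standalone sketch of that logic, not the plugin's actual code: the helper name `evaluate_nvme_health` is hypothetical, and it assumes the `smartctl --all --json` output has already been parsed into a dict with the same `nvme_smart_health_information_log` key the plugin reads.

```python
def evaluate_nvme_health(disk, smart_info):
    """Return (health_delta, messages) for one NVMe disk.

    `smart_info` is assumed to be the parsed JSON from smartctl; the helper
    name and the tuple layout below are illustrative, not part of PVC.
    """
    health_delta = 0
    messages = []
    log = smart_info.get("nvme_smart_health_information_log", {})

    # (key, threshold, label, suffix): a value above the threshold costs 10 points
    checks = [
        ("critical_warning", 0, "critical warning value", ""),
        ("media_errors", 0, "media errors value", ""),
        ("percentage_used", 90, "percentage used value", "%"),
    ]
    for key, threshold, label, suffix in checks:
        value = log.get(key, 0)
        if value > threshold:
            health_delta += 10
            messages.append(f"{disk} {label} {value}{suffix}")

    return health_delta, messages
```

Keeping the evaluation free of `run_os_command` calls makes it possible to exercise the thresholds with hand-written dicts rather than real hardware.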
--- a/node-daemon/plugins/dpkg
+++ b/node-daemon/plugins/dpkg
@ -0,0 +1,160 @@
+#!/usr/bin/env python3
+
+# dpkg.py - PVC Monitoring example plugin for dpkg status
+# Part of the Parallel Virtual Cluster (PVC) system
+#
+#    Copyright (C) 2018-2022 Joshua M. Boniface <joshua@boniface.me>
+#
+#    This program is free software: you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation, version 3.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License
+#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+###############################################################################
+
+# This script provides an example of a PVC monitoring plugin script. It will create
+# a simple plugin to check the system dpkg status is as expected, with no invalid
+# packages or obsolete configuration files, and will return a 1 health delta for each
+# flaw in invalid packages, upgradable packages, and obsolete config files.
+
+# This script can thus be used as an example or reference implementation of a
+# PVC monitoring plugin script and expanded upon as required.
+
+# A monitoring plugin script must implement the class "MonitoringPluginScript" which
+# extends "MonitoringPlugin", providing the 3 functions indicated. Detailed explanation
+# of the role of each function is provided in context of the example; see the other
+# examples for more potential uses.
+
+# WARNING:
+#
+# This script will run in the context of the node daemon keepalives as root.
+# DO NOT install untrusted, unvetted plugins under any circumstances.
+
+
+# This import is always required here, as MonitoringPlugin is used by the
+# MonitoringPluginScript class
+from pvcnoded.objects.MonitoringInstance import MonitoringPlugin
+
+
+# A monitoring plugin script must always expose its nice name, which must be identical to
+# the file name
+PLUGIN_NAME = "dpkg"
+
+
+# The MonitoringPluginScript class must be named as such, and extend MonitoringPlugin.
+class MonitoringPluginScript(MonitoringPlugin):
+    def setup(self):
+        """
+        setup(): Perform special setup steps during node daemon startup
+
+        This step is optional and should be used sparingly.
+
+        If you wish for the plugin to not load in certain conditions, do any checks here
+        and return a non-None failure message to indicate the error.
+        """
+
+        pass
+
+    def run(self):
+        """
+        run(): Perform the check actions and return a PluginResult object
+        """
+
+        # Run any imports first
+        from re import match
+        import daemon_lib.common as pvc_common
+
+        # Get Debian version
+        with open('/etc/debian_version', 'r') as fh:
+            debian_version = fh.read().strip()
+
+        # Get a list of dpkg packages for analysis
+        retcode, stdout, stderr = pvc_common.run_os_command("/usr/bin/dpkg --list")
+
+        # Get a list of installed packages and states
+        packages = list()
+        for dpkg_line in stdout.split('\n'):
+            # Status column: desired state, current state (may be uppercase), error flag
+            if match(r'^[a-z][a-zA-Z][ R] ', dpkg_line):
+                line_split = dpkg_line.split()
+                package_state = line_split[0]
+                package_name = line_split[1]
+                packages.append((package_name, package_state))
+
+        count_ok = 0
+        count_inconsistent = 0
+        list_inconsistent = list()
+
+        for package in packages:
+            if package[1] == "ii":
+                count_ok += 1
+            else:
+                count_inconsistent += 1
+                list_inconsistent.append(package[0])
+
+        # Get upgradable packages
+        retcode, stdout, stderr = pvc_common.run_os_command("/usr/bin/apt list --upgradable")
+
+        list_upgradable = list()
+        for apt_line in stdout.split('\n'):
+            # Upgradable lines start with "<package>/<suite>", unlike dpkg's state column
+            if match(r'^[a-z0-9][a-z0-9+.-]*/', apt_line):
+                line_split = apt_line.split('/')
+                package_name = line_split[0]
+                list_upgradable.append(package_name)
+
+        count_upgradable = len(list_upgradable)
+
+        # Get obsolete config files (dpkg-*, ucf-*, or update-* under /etc)
+        retcode, stdout, stderr = pvc_common.run_os_command(r"/usr/bin/find /etc -type f -a \( -name '*.dpkg-*' -o -name '*.ucf-*' -o -name '*.update-*' \)")
+
+        obsolete_conffiles = list()
+        for conffile_line in stdout.split('\n'):
+            if conffile_line:
+                obsolete_conffiles.append(conffile_line)
+
+        count_obsolete_conffiles = len(obsolete_conffiles)
+
+        # Set health_delta based on the results
+        health_delta = 0
+        if count_inconsistent > 0:
+            health_delta += 1
+        if count_upgradable > 0:
+            health_delta += 1
+        if count_obsolete_conffiles > 0:
+            health_delta += 1
+
+        # Set the health delta in our local PluginResult object
+        self.plugin_result.set_health_delta(health_delta)
+
+        # Craft the message
+        message = f"Debian {debian_version}; Obsolete conffiles: {count_obsolete_conffiles}; Packages inconsistent: {count_inconsistent}, upgradable: {count_upgradable}"
+
+        # Set the message in our local PluginResult object
+        self.plugin_result.set_message(message)
+
+        # Set the detailed data in our local PluginResult object
+        detailed_data = {
+            "debian_version": debian_version,
+            "obsolete_conffiles": obsolete_conffiles,
+            "inconsistent_packages": list_inconsistent,
+            "upgradable_packages": list_upgradable,
+        }
+        self.plugin_result.set_data(detailed_data)
+
+        # Return our local PluginResult object
+        return self.plugin_result
+
+    def cleanup(self):
+        """
+        cleanup(): Perform special cleanup steps during node daemon termination
+
+        This step is optional and should be used sparingly.
+        """
+
+        pass
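
The upgradable-package step and the health delta calculation in the dpkg plugin can be sketched as two pure functions. This is an illustration under stated assumptions, not the plugin's code: the function names are hypothetical, and the sample format `name/suite version arch [upgradable from: ...]` is assumed from typical `apt list --upgradable` output.

```python
from re import match


def parse_apt_upgradable(stdout):
    """Extract package names from `apt list --upgradable` style output.

    Assumes each package line starts with "<name>/<suite>"; header lines
    such as "Listing..." are skipped because they contain no such prefix.
    """
    upgradable = []
    for line in stdout.split("\n"):
        if match(r"^[a-z0-9][a-z0-9+.-]*/", line):
            upgradable.append(line.split("/")[0])
    return upgradable


def dpkg_health_delta(count_inconsistent, count_upgradable, count_obsolete_conffiles):
    """One point per category that has any findings, as in the plugin's run()."""
    counts = (count_inconsistent, count_upgradable, count_obsolete_conffiles)
    return sum(1 for count in counts if count > 0)
```

Note that the delta is capped at 3 regardless of how many packages or files are affected, since each category contributes at most one point.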
--- a/node-daemon/plugins/edac
+++ b/node-daemon/plugins/edac
@ -0,0 +1,106 @@
+#!/usr/bin/env python3
+
+# edac.py - PVC Monitoring example plugin for EDAC
+# Part of the Parallel Virtual Cluster (PVC) system
+#
+#    Copyright (C) 2018-2022 Joshua M. Boniface <joshua@boniface.me>
+#
+#    This program is free software: you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation, version 3.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License
+#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+###############################################################################
+
+# This script provides an example of a PVC monitoring plugin script. It will create
+# a simple plugin to check the system's EDAC registers and report any failures.
+
+# This script can thus be used as an example or reference implementation of a
+# PVC monitoring plugin script and expanded upon as required.
+
+# A monitoring plugin script must implement the class "MonitoringPluginScript" which
+# extends "MonitoringPlugin", providing the 3 functions indicated. Detailed explanation
+# of the role of each function is provided in context of the example; see the other
+# examples for more potential uses.
+
+# WARNING:
+#
+# This script will run in the context of the node daemon keepalives as root.
+# DO NOT install untrusted, unvetted plugins under any circumstances.
+
+
+# This import is always required here, as MonitoringPlugin is used by the
+# MonitoringPluginScript class
+from pvcnoded.objects.MonitoringInstance import MonitoringPlugin
+
+
+# A monitoring plugin script must always expose its nice name, which must be identical to
+# the file name
+PLUGIN_NAME = "edac"
+
+
+# The MonitoringPluginScript class must be named as such, and extend MonitoringPlugin.
+class MonitoringPluginScript(MonitoringPlugin):
+    def setup(self):
+        """
+        setup(): Perform special setup steps during node daemon startup
+
+        This step is optional and should be used sparingly.
+
+        If you wish for the plugin to not load in certain conditions, do any checks here
+        and return a non-None failure message to indicate the error.
+        """
+
+        pass
+
+    def run(self):
+        """
+        run(): Perform the check actions and return a PluginResult object
+        """
+
+        # Run any imports first
+        import daemon_lib.common as common
+        from re import match
+
+        # Get edac-util output
+        retcode, stdout, stderr = common.run_os_command('/usr/bin/edac-util')
+
+        # If there are no errors, we're OK
+        if match(r'^edac-util: No errors to report.', stdout):
+            health_delta = 0
+            message = "EDAC reports no errors"
+        else:
+            health_delta = 0
+            message = "EDAC reports errors: "
+            errors = list()
+            for line in stdout.split('\n'):
+                if match(r'^mc[0-9]: csrow', line):
+                    if 'Uncorrected' in line:
+                        health_delta = 50
+                    errors.append(' '.join(line.split()[2:]))
+            message += ', '.join(errors)
+
+        # Set the health delta in our local PluginResult object
+        self.plugin_result.set_health_delta(health_delta)
+
+        # Set the message in our local PluginResult object
+        self.plugin_result.set_message(message)
+
+        # Return our local PluginResult object
+        return self.plugin_result
+
+    def cleanup(self):
+        """
+        cleanup(): Perform special cleanup steps during node daemon termination
+
+        This step is optional and should be used sparingly.
+        """
+
+        pass
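
The EDAC check boils down to classifying the lines of `edac-util` output. Below is a minimal sketch of that classification as a standalone function. The name `evaluate_edac` is hypothetical, and the sample lines used to exercise it are illustrative rather than captured from real hardware.

```python
from re import match


def evaluate_edac(stdout):
    """Return (health_delta, message) from edac-util style output.

    Mirrors the plugin's rules: any "Uncorrected" csrow line sets the
    delta to 50, while corrected errors are reported with a delta of 0.
    """
    if match(r"^edac-util: No errors to report\.", stdout):
        return 0, "EDAC reports no errors"

    health_delta = 0
    errors = []
    for line in stdout.split("\n"):
        if match(r"^mc[0-9]: csrow", line):
            if "Uncorrected" in line:
                health_delta = 50
            # Drop the "mcN:" and "csrowN:" fields, keep the remainder
            errors.append(" ".join(line.split()[2:]))

    return health_delta, "EDAC reports errors: " + ", ".join(errors)
```

Because the delta is assigned rather than accumulated, several uncorrected errors still yield 50, not a multiple of it.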
--- a/node-daemon/plugins/ipmi
+++ b/node-daemon/plugins/ipmi
@ -0,0 +1,107 @@
+#!/usr/bin/env python3
+
+# ipmi.py - PVC Monitoring example plugin for IPMI
+# Part of the Parallel Virtual Cluster (PVC) system
+#
+#    Copyright (C) 2018-2022 Joshua M. Boniface <joshua@boniface.me>
+#
+#    This program is free software: you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation, version 3.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License
+#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+###############################################################################
+
+# This script provides an example of a PVC monitoring plugin script. It will create
+# a simple plugin to check whether the system IPMI is reachable.
+
+# This script can thus be used as an example or reference implementation of a
+# PVC monitoring plugin script and expanded upon as required.
+
+# A monitoring plugin script must implement the class "MonitoringPluginScript" which
+# extends "MonitoringPlugin", providing the 3 functions indicated. Detailed explanation
+# of the role of each function is provided in context of the example; see the other
+# examples for more potential uses.
+
+# WARNING:
+#
+# This script will run in the context of the node daemon keepalives as root.
+# DO NOT install untrusted, unvetted plugins under any circumstances.
+
+
+# This import is always required here, as MonitoringPlugin is used by the
+# MonitoringPluginScript class
+from pvcnoded.objects.MonitoringInstance import MonitoringPlugin
+
+
+# A monitoring plugin script must always expose its nice name, which must be identical to
+# the file name
+PLUGIN_NAME = "ipmi"
+
+
+# The MonitoringPluginScript class must be named as such, and extend MonitoringPlugin.
+class MonitoringPluginScript(MonitoringPlugin):
+    def setup(self):
+        """
+        setup(): Perform special setup steps during node daemon startup
+
+        This step is optional and should be used sparingly.
+
+        If you wish for the plugin to not load in certain conditions, do any checks here
+        and return a non-None failure message to indicate the error.
+        """
+
+        pass
+
+    def run(self):
+        """
+        run(): Perform the check actions and return a PluginResult object
+        """
+
+        # Run any imports first
+        from daemon_lib.common import run_os_command
+
+        # Check the node's IPMI interface
+        ipmi_hostname = self.config["ipmi_hostname"]
+        ipmi_username = self.config["ipmi_username"]
+        ipmi_password = self.config["ipmi_password"]
+        retcode, _, _ = run_os_command(
+            f"/usr/bin/ipmitool -I lanplus -H {ipmi_hostname} -U {ipmi_username} -P {ipmi_password} chassis power status",
+            timeout=2
+        )
+
+        if retcode > 0:
+            # Set the health delta to 10 (subtract 10 from the total of 100)
+            health_delta = 10
+            # Craft a message that can be used by the clients
+            message = f"IPMI via {ipmi_username}@{ipmi_hostname} is NOT responding"
+        else:
+            # Set the health delta to 0 (no change)
+            health_delta = 0
+            # Craft a message that can be used by the clients
+            message = f"IPMI via {ipmi_username}@{ipmi_hostname} is responding"
+
+        # Set the health delta in our local PluginResult object
+        self.plugin_result.set_health_delta(health_delta)
+
+        # Set the message in our local PluginResult object
+        self.plugin_result.set_message(message)
+
+        # Return our local PluginResult object
+        return self.plugin_result
+
+    def cleanup(self):
+        """
+        cleanup(): Perform special cleanup steps during node daemon termination
+
+        This step is optional and should be used sparingly.
+        """
+
+        pass
--- a/node-daemon/plugins/lbvt
+++ b/node-daemon/plugins/lbvt
@ -0,0 +1,105 @@
+#!/usr/bin/env python3
+
+# lbvt.py - PVC Monitoring example plugin for Libvirtd
+# Part of the Parallel Virtual Cluster (PVC) system
+#
+#    Copyright (C) 2018-2022 Joshua M. Boniface <joshua@boniface.me>
+#
+#    This program is free software: you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation, version 3.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License
+#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+###############################################################################
+
+# This script provides an example of a PVC monitoring plugin script. It will create
+# a simple plugin to check the Libvirt daemon instance on the node for operation.
+
+# This script can thus be used as an example or reference implementation of a
+# PVC monitoring plugin script and expanded upon as required.
+
+# A monitoring plugin script must implement the class "MonitoringPluginScript" which
+# extends "MonitoringPlugin", providing the 3 functions indicated. Detailed explanation
+# of the role of each function is provided in context of the example; see the other
+# examples for more potential uses.
+
+# WARNING:
+#
+# This script will run in the context of the node daemon keepalives as root.
+# DO NOT install untrusted, unvetted plugins under any circumstances.
+
+
+# This import is always required here, as MonitoringPlugin is used by the
+# MonitoringPluginScript class
+from pvcnoded.objects.MonitoringInstance import MonitoringPlugin
+
+
+# A monitoring plugin script must always expose its nice name, which must be identical to
+# the file name
+PLUGIN_NAME = "lbvt"
+
+
+# The MonitoringPluginScript class must be named as such, and extend MonitoringPlugin.
+class MonitoringPluginScript(MonitoringPlugin):
+    def setup(self):
+        """
+        setup(): Perform special setup steps during node daemon startup
+
+        This step is optional and should be used sparingly.
+
+        If you wish for the plugin to not load in certain conditions, do any checks here
+        and return a non-None failure message to indicate the error.
+        """
+
+        pass
+
+    def run(self):
+        """
+        run(): Perform the check actions and return a PluginResult object
+        """
+
+        # Run any imports first
+        from libvirt import openReadOnly as lvopen
+
+        lv_conn = None
+
+        # Set the health delta to 0 (no change)
+        health_delta = 0
+        # Craft a message that can be used by the clients
+        message = "Successfully connected to Libvirtd on localhost"
+
+        # Check the Libvirt connection
+        try:
+            lv_conn = lvopen(f"qemu+tcp://{self.this_node.name}/system")
+            lv_conn.getHostname()
+        except Exception as e:
+            health_delta = 50
+            message = f"Failed to connect to Libvirtd: {e}"
+        finally:
+            if lv_conn is not None:
+                lv_conn.close()
+
+        # Set the health delta in our local PluginResult object
+        self.plugin_result.set_health_delta(health_delta)
+
+        # Set the message in our local PluginResult object
+        self.plugin_result.set_message(message)
+
+        # Return our local PluginResult object
+        return self.plugin_result
+
+    def cleanup(self):
+        """
+        cleanup(): Perform special cleanup steps during node daemon termination
+
+        This step is optional and should be used sparingly.
+        """
+
+        pass
--- a/node-daemon/plugins/load
+++ b/node-daemon/plugins/load
@ -0,0 +1,107 @@
+#!/usr/bin/env python3
+
+# load.py - PVC Monitoring example plugin for load
+# Part of the Parallel Virtual Cluster (PVC) system
+#
+#    Copyright (C) 2018-2022 Joshua M. Boniface <joshua@boniface.me>
+#
+#    This program is free software: you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation, version 3.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License
+#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+###############################################################################
+
+# This script provides an example of a PVC monitoring plugin script. It will create
+# a simple plugin to check the system load against the total number of CPU cores.
+
+# This script can thus be used as an example or reference implementation of a
+# PVC monitoring plugin script and expanded upon as required.
+
+# A monitoring plugin script must implement the class "MonitoringPluginScript" which
+# extends "MonitoringPlugin", providing the 3 functions indicated. Detailed explanation
+# of the role of each function is provided in context of the example; see the other
+# examples for more potential uses.
+
+# WARNING:
+#
+# This script will run in the context of the node daemon keepalives as root.
+# DO NOT install untrusted, unvetted plugins under any circumstances.
+
+
+# This import is always required here, as MonitoringPlugin is used by the
+# MonitoringPluginScript class
+from pvcnoded.objects.MonitoringInstance import MonitoringPlugin
+
+
+# A monitoring plugin script must always expose its nice name, which must be identical to
+# the file name
+PLUGIN_NAME = "load"
+
+
+# The MonitoringPluginScript class must be named as such, and extend MonitoringPlugin.
+class MonitoringPluginScript(MonitoringPlugin):
+    def setup(self):
+        """
+        setup(): Perform special setup steps during node daemon startup
+
+        This step is optional and should be used sparingly.
+
+        If you wish for the plugin to not load in certain conditions, do any checks here
+        and return a non-None failure message to indicate the error.
+        """
+
+        pass
+
+    def run(self):
+        """
+        run(): Perform the check actions and return a PluginResult object
+        """
+
+        # Run any imports first
+        from os import getloadavg
+        from psutil import cpu_count
+
+        # Get the current 1-minute system load average
+        load_average = getloadavg()[0]
+
+        # Get the number of CPU cores
+        cpu_cores = cpu_count()
+
+        # Check whether the load average exceeds the CPU core count
+        if load_average > float(cpu_cores):
+            # Set the health delta to 50 (subtract 50 from the total of 100)
+            health_delta = 50
+            # Craft a message that can be used by the clients
+            message = f"Current load is {load_average} out of {cpu_cores} CPU cores"
+
+        else:
+            # Set the health delta to 0 (no change)
+            health_delta = 0
+            # Craft a message that can be used by the clients
+            message = f"Current load is {load_average} out of {cpu_cores} CPU cores"
+
+        # Set the health delta in our local PluginResult object
+        self.plugin_result.set_health_delta(health_delta)
+
+        # Set the message in our local PluginResult object
+        self.plugin_result.set_message(message)
+
+        # Return our local PluginResult object
+        return self.plugin_result
+
+    def cleanup(self):
+        """
+        cleanup(): Perform special cleanup steps during node daemon termination
+
+        This step is optional and should be used sparingly.
+        """
+
+        pass
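
The load check reduces to a single strict comparison, which the sketch below isolates so the boundary case is easy to see. The function names are hypothetical, and `os.cpu_count()` is used here in place of `psutil.cpu_count()` only to keep the example standard-library-only.

```python
import os


def evaluate_load(load_average, cpu_cores):
    """Return (health_delta, message) for a 1-minute load average.

    A load strictly greater than the core count costs 50 points; a load
    exactly equal to the core count is still treated as healthy.
    """
    health_delta = 50 if load_average > float(cpu_cores) else 0
    message = f"Current load is {load_average} out of {cpu_cores} CPU cores"
    return health_delta, message


def current_load_health():
    """Evaluate the live system; available on Unix-like platforms only."""
    return evaluate_load(os.getloadavg()[0], os.cpu_count())
```

Calling `current_load_health()` on a node returns the same tuple shape the fixed inputs do, so the comparison logic can be checked without generating real load.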
--- a/node-daemon/plugins/nics
+++ b/node-daemon/plugins/nics
@ -0,0 +1,196 @@
+#!/usr/bin/env python3
+
+# nics.py - PVC Monitoring example plugin for NIC interfaces
+# Part of the Parallel Virtual Cluster (PVC) system
+#
+#    Copyright (C) 2018-2022 Joshua M. Boniface <joshua@boniface.me>
+#
+#    This program is free software: you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation, version 3.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License
+#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+###############################################################################
+
+# This script provides an example of a PVC monitoring plugin script. It will create
+# a simple plugin to check the network interfaces of the host, specifically for speed
+# and 802.3ad status (if applicable).
+
+# This script can thus be used as an example or reference implementation of a
+# PVC monitoring plugin script and expanded upon as required.
+
+# A monitoring plugin script must implement the class "MonitoringPluginScript" which
+# extends "MonitoringPlugin", providing the 3 functions indicated. Detailed explanation
+# of the role of each function is provided in context of the example; see the other
+# examples for more potential uses.
+
+# WARNING:
+#
+# This script will run in the context of the node daemon keepalives as root.
+# DO NOT install untrusted, unvetted plugins under any circumstances.
+
+
+# This import is always required here, as MonitoringPlugin is used by the
+# MonitoringPluginScript class
+from pvcnoded.objects.MonitoringInstance import MonitoringPlugin
+
+
+# A monitoring plugin script must always expose its nice name, which must be identical to
+# the file name
+PLUGIN_NAME = "nics"
+
+
+# The MonitoringPluginScript class must be named as such, and extend MonitoringPlugin.
+class MonitoringPluginScript(MonitoringPlugin):
+    def setup(self):
+        """
+        setup(): Perform special setup steps during node daemon startup
+
+        This step is optional and should be used sparingly.
+
+        If you wish for the plugin to not load in certain conditions, do any checks here
+        and return a non-None failure message to indicate the error.
+        """
+
+        pass
+
+    def run(self):
+        """
+        run(): Perform the check actions and return a PluginResult object
+        """
+
+        # Run any imports first
+        import daemon_lib.common as common
+        from re import match, search, findall
+
+        messages = list()
+        health_delta = 0
+
+        # Get a list of the various underlying devices
+        _core_nics = set()
+
+        for dev in [
+                self.config['bridge_dev'],
+                self.config['upstream_dev'],
+                self.config['cluster_dev'],
+                self.config['storage_dev'],
+        ]:
+            with open(f'/sys/class/net/{dev}/uevent', 'r') as uevent:
+                _devtype = uevent.readlines()[0].split('=')[-1].strip()
+
+            if _devtype == 'vlan':
+                with open(f"/proc/net/vlan/{dev}") as devfh:
+                    vlan_info = devfh.read().split('\n')
+                for line in vlan_info:
+                    if match(r'^Device:', line):
+                        dev = line.split()[-1]
+
+            _core_nics.add(dev)
+
+        core_nics = sorted(list(_core_nics))
+
+        for dev in core_nics:
+            with open(f'/sys/class/net/{dev}/uevent', 'r') as uevent:
+                _devtype = uevent.readlines()[0].split('=')[-1].strip()
+
+            if _devtype == "bond":
+                syspath = f"/proc/net/bonding/{dev}"
+
+                with open(syspath) as devfh:
+                    bonding_stats = devfh.read()
+
+                _, _mode, _info, *_slaves = bonding_stats.split('\n\n')
+
+                slave_interfaces = list()
+                for slavedev in _slaves:
+                    lines = slavedev.split('\n')
+                    for line in lines:
+                        if match(r'^Slave Interface:', line):
+                            interface_name = line.split()[-1]
+                        if match(r'^MII Status:', line):
+                            interface_status = line.split()[-1]
+                        if match(r'^Speed:', line):
+                            try:
+                                interface_speed_mbps = int(line.split()[-2])
+                            except Exception:
+                                interface_speed_mbps = 0
+                        if match(r'^Duplex:', line):
+                            interface_duplex = line.split()[-1]
+                    slave_interfaces.append((interface_name, interface_status, interface_speed_mbps, interface_duplex))
+
+                # Ensure at least 2 slave interfaces are up
+                slave_interface_up_count = 0
+                for slave_interface in slave_interfaces:
+                    if slave_interface[1] == 'up':
+                        slave_interface_up_count += 1
+                if slave_interface_up_count < 2:
+                    messages.append(f"{dev} DEGRADED with {slave_interface_up_count} active slaves")
+                    health_delta += 10
+                else:
+                    messages.append(f"{dev} OK with {slave_interface_up_count} active slaves")
+
+                # Get ethtool supported speeds for slave interfaces
+                supported_link_speeds = set()
+                for slave_interface in slave_interfaces:
+                    slave_dev = slave_interface[0]
+                    _, ethtool_stdout, _ = common.run_os_command(f"ethtool {slave_dev}")
+                    in_modes = False
+                    for line in ethtool_stdout.split('\n'):
+                        if search('Supported link modes:', line):
+                            in_modes = True
+                        if search('Supported pause frame use:', line):
+                            in_modes = False
+                            break
+                        if in_modes:
+                            speed = int(findall(r'\d+', line.split()[-1])[0])
+                            supported_link_speeds.add(speed)
+            else:
+                # Get ethtool supported speeds for interface
+                supported_link_speeds = set()
+                _, ethtool_stdout, _ = common.run_os_command(f"ethtool {dev}")
+                in_modes = False
+                for line in ethtool_stdout.split('\n'):
+                    if search('Supported link modes:', line):
+                        in_modes = True
+                    if search('Supported pause frame use:', line):
+                        in_modes = False
+                        break
+                    if in_modes:
+                        speed = int(findall(r'\d+', line.split()[-1])[0])
+                        supported_link_speeds.add(speed)
+
+            max_supported_link_speed = sorted(list(supported_link_speeds))[-1]
+
+            # Ensure interface is running at its maximum speed
+            with open(f"/sys/class/net/{dev}/speed") as devfh:
+                dev_speed = int(devfh.read())
+            if dev_speed < max_supported_link_speed:
+                messages.append(f"{dev} DEGRADED at {dev_speed} Mbps")
+                health_delta += 10
+            else:
+                messages.append(f"{dev} OK at {dev_speed} Mbps")
+
+        # Set the health delta in our local PluginResult object
+        self.plugin_result.set_health_delta(health_delta)
+
+        # Set the message in our local PluginResult object
+        self.plugin_result.set_message(', '.join(messages))
+
+        # Return our local PluginResult object
+        return self.plugin_result
+
+    def cleanup(self):
+        """
+        cleanup(): Perform special cleanup steps during node daemon termination
+
+        This step is optional and should be used sparingly.
+        """
+
+        pass
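Reviewer note on the `nics` plugin above: the speed check depends on parsing the "Supported link modes" section of `ethtool` output. The following is a minimal standalone sketch of that parsing, not the plugin's actual code. The sample output and the helper name `max_supported_speed` are assumptions for illustration; real `ethtool` output varies by driver and NIC.

```python
import re

# Illustrative sample only; real `ethtool <dev>` output varies by driver and NIC.
SAMPLE_ETHTOOL_OUTPUT = """Settings for eth0:
\tSupported ports: [ TP ]
\tSupported link modes:   10baseT/Half 10baseT/Full
\t                        100baseT/Half 100baseT/Full
\t                        1000baseT/Full
\tSupported pause frame use: Symmetric
\tSupports auto-negotiation: Yes
"""


def max_supported_speed(ethtool_stdout):
    """Return the highest supported link speed in Mbps, or None if none is found.

    Hypothetical helper; it mirrors the in_modes scan used by the plugin.
    """
    speeds = set()
    in_modes = False
    for line in ethtool_stdout.split("\n"):
        if "Supported link modes:" in line:
            in_modes = True
            # Keep only the text after the label on the first line
            line = line.split(":", 1)[1]
        elif "Supported pause frame use:" in line:
            # The link modes section ends here
            break
        if in_modes:
            # Each mode looks like "1000baseT/Full"; the leading digits are the speed
            for mode in line.split():
                m = re.match(r"\d+", mode)
                if m:
                    speeds.add(int(m.group()))
    return max(speeds) if speeds else None


print(max_supported_speed(SAMPLE_ETHTOOL_OUTPUT))
```

Returning `None` for an empty set avoids the `IndexError` that indexing an empty sorted list would raise when a device reports no link modes.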
--- a/node-daemon/plugins/psql
+++ b/node-daemon/plugins/psql
@ -0,0 +1,139 @@
+#!/usr/bin/env python3
+
+# psql.py - PVC Monitoring example plugin for Postgres/Patroni
+# Part of the Parallel Virtual Cluster (PVC) system
+#
+#    Copyright (C) 2018-2022 Joshua M. Boniface <joshua@boniface.me>
+#
+#    This program is free software: you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation, version 3.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License
+#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+###############################################################################
+
+# This script provides an example of a PVC monitoring plugin script. It will create
+# a simple plugin to check the Patroni PostgreSQL instance on the node for operation.
+
+# This script can thus be used as an example or reference implementation of a
+# PVC monitoring plugin script and expanded upon as required.
+
+# A monitoring plugin script must implement the class "MonitoringPluginScript" which
+# extends "MonitoringPlugin", providing the 3 functions indicated. Detailed explanation
+# of the role of each function is provided in context of the example; see the other
+# examples for more potential uses.
+
+# WARNING:
+#
+# This script will run in the context of the node daemon keepalives as root.
+# DO NOT install untrusted, unvetted plugins under any circumstances.
+
+
+# This import is always required here, as MonitoringPlugin is used by the
+# MonitoringPluginScript class
+from pvcnoded.objects.MonitoringInstance import MonitoringPlugin
+
+
+# A monitoring plugin script must always expose its nice name, which must be identical to
+# the file name
+PLUGIN_NAME = "psql"
+
+
+# The MonitoringPluginScript class must be named as such, and extend MonitoringPlugin.
+class MonitoringPluginScript(MonitoringPlugin):
+    def setup(self):
+        """
+        setup(): Perform special setup steps during node daemon startup
+
+        This step is optional and should be used sparingly.
+        """
+
+        pass
+
+    def run(self):
+        """
+        run(): Perform the check actions and return a PluginResult object
+        """
+
+        # Run any imports first
+        from psycopg2 import connect
+
+        conn_metadata = None
+        cur_metadata = None
+        conn_pdns = None
+        cur_pdns = None
+
+        # Set the health delta to 0 (no change)
+        health_delta = 0
+        # Craft a message that can be used by the clients
+        message = "Successfully connected to PostgreSQL databases on localhost"
+
+        # Check the Metadata database (primary)
+        try:
+            conn_metadata = connect(
+                host=self.this_node.name,
+                port=self.config["metadata_postgresql_port"],
+                dbname=self.config["metadata_postgresql_dbname"],
+                user=self.config["metadata_postgresql_user"],
+                password=self.config["metadata_postgresql_password"],
+            )
+            cur_metadata = conn_metadata.cursor()
+            cur_metadata.execute("""SELECT * FROM alembic_version""")
+            data = cur_metadata.fetchone()
+        except Exception as e:
+            health_delta = 50
+            err = str(e).split('\n')[0]
+            message = f"Failed to connect to PostgreSQL database {self.config['metadata_postgresql_dbname']}: {err}"
+        finally:
+            if cur_metadata is not None:
+                cur_metadata.close()
+            if conn_metadata is not None:
+                conn_metadata.close()
+
+        if health_delta == 0:
+            # Check the PowerDNS database (secondary)
+            try:
+                conn_pdns = connect(
+                    host=self.this_node.name,
+                    port=self.config["pdns_postgresql_port"],
+                    dbname=self.config["pdns_postgresql_dbname"],
+                    user=self.config["pdns_postgresql_user"],
+                    password=self.config["pdns_postgresql_password"],
+                )
+                cur_pdns = conn_pdns.cursor()
+                cur_pdns.execute("""SELECT * FROM supermasters""")
+                data = cur_pdns.fetchone()
+            except Exception as e:
+                health_delta = 50
+                err = str(e).split('\n')[0]
+                message = f"Failed to connect to PostgreSQL database {self.config['pdns_postgresql_dbname']}: {err}"
+            finally:
+                if cur_pdns is not None:
+                    cur_pdns.close()
+                if conn_pdns is not None:
+                    conn_pdns.close()
+
+        # Set the health delta in our local PluginResult object
+        self.plugin_result.set_health_delta(health_delta)
+
+        # Set the message in our local PluginResult object
+        self.plugin_result.set_message(message)
+
+        # Return our local PluginResult object
+        return self.plugin_result
+
+    def cleanup(self):
+        """
+        cleanup(): Perform special cleanup steps during node daemon termination
+
+        This step is optional and should be used sparingly.
+        """
+
+        pass
--- a/node-daemon/plugins/zkpr
+++ b/node-daemon/plugins/zkpr
@ -0,0 +1,107 @@
+#!/usr/bin/env python3
+
+# zkpr.py - PVC Monitoring example plugin for Zookeeper
+# Part of the Parallel Virtual Cluster (PVC) system
+#
+#    Copyright (C) 2018-2022 Joshua M. Boniface <joshua@boniface.me>
+#
+#    This program is free software: you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation, version 3.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License
+#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+###############################################################################
+
+# This script provides an example of a PVC monitoring plugin script. It will create
+# a simple plugin to check the Zookeeper instance on the node for operation.
+
+# This script can thus be used as an example or reference implementation of a
+# PVC monitoring plugin script and expanded upon as required.
+
+# A monitoring plugin script must implement the class "MonitoringPluginScript" which
+# extends "MonitoringPlugin", providing the 3 functions indicated. Detailed explanation
+# of the role of each function is provided in context of the example; see the other
+# examples for more potential uses.
+
+# WARNING:
+#
+# This script will run in the context of the node daemon keepalives as root.
+# DO NOT install untrusted, unvetted plugins under any circumstances.
+
+
+# This import is always required here, as MonitoringPlugin is used by the
+# MonitoringPluginScript class
+from pvcnoded.objects.MonitoringInstance import MonitoringPlugin
+
+
+# A monitoring plugin script must always expose its nice name, which must be identical to
+# the file name
+PLUGIN_NAME = "zkpr"
+
+
+# The MonitoringPluginScript class must be named as such, and extend MonitoringPlugin.
+class MonitoringPluginScript(MonitoringPlugin):
+    def setup(self):
+        """
+        setup(): Perform special setup steps during node daemon startup
+
+        This step is optional and should be used sparingly.
+
+        If you wish for the plugin to not load in certain conditions, do any checks here
+        and return a non-None failure message to indicate the error.
+        """
+
+        pass
+
+    def run(self):
+        """
+        run(): Perform the check actions and return a PluginResult object
+        """
+
+        # Run any imports first
+        from kazoo.client import KazooClient
+
+        zk_conn = None
+
+        # Set the health delta to 0 (no change)
+        health_delta = 0
+        # Craft a message that can be used by the clients
+        message = "Successfully connected to Zookeeper on localhost"
+
+        # Check the Zookeeper connection
+        try:
+            zk_conn = KazooClient(hosts=[f"{self.this_node.name}:2181"], timeout=1, read_only=True)
+            zk_conn.start(timeout=1)
+            data = zk_conn.get('/schema/version')
+        except Exception as e:
+            health_delta = 50
+            message = f"Failed to connect to Zookeeper: {e}"
+        finally:
+            if zk_conn is not None:
+                zk_conn.stop()
+                zk_conn.close()
+
+        # Set the health delta in our local PluginResult object
+        self.plugin_result.set_health_delta(health_delta)
+
+        # Set the message in our local PluginResult object
+        self.plugin_result.set_message(message)
+
+        # Return our local PluginResult object
+        return self.plugin_result
+
+    def cleanup(self):
+        """
+        cleanup(): Perform special cleanup steps during node daemon termination
+
+        This step is optional and should be used sparingly.
+        """
+
+        pass
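Reviewer note on the plugin files above: they are installed without a `.py` suffix (`nics`, `psql`, `zkpr`), so the regular import machinery will not find them by name. The sketch below shows one way such a file can be loaded by path using `importlib.machinery.SourceFileLoader`, the same loader class that `MonitoringInstance.py` uses later in this diff. The temporary directory, the `demo` file, and the helper name `load_plugin_script` are assumptions for illustration only.

```python
import importlib.machinery
import importlib.util
import os
import tempfile

# Hypothetical plugin body used only for this demonstration
PLUGIN_SOURCE = 'PLUGIN_NAME = "demo"\n'


def load_plugin_script(path):
    """Load a Python source file by path, regardless of its file extension."""
    loader = importlib.machinery.SourceFileLoader("plugin_script", path)
    spec = importlib.util.spec_from_loader(loader.name, loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as plugin_dir:
    # Note: the file name has no ".py" suffix, like the plugins above
    plugin_path = os.path.join(plugin_dir, "demo")
    with open(plugin_path, "w") as fh:
        fh.write(PLUGIN_SOURCE)
    plugin_script = load_plugin_script(plugin_path)

print(plugin_script.PLUGIN_NAME)
```

Importing `importlib.machinery` explicitly, rather than relying on `import importlib.util` to pull it in as a side effect, keeps the loader lookup independent of interpreter internals.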
--- a/node-daemon/pvcnoded.sample.yaml
+++ b/node-daemon/pvcnoded.sample.yaml
@ -128,6 +128,8 @@ pvc:
    configuration:
      # directories: PVC system directories
      directories:
+        # plugin_directory: Directory containing node monitoring plugins
+        plugin_directory: "/usr/share/pvc/plugins"
        # dynamic_directory: Temporary in-memory directory for active configurations
        dynamic_directory: "/run/pvc"
        # log_directory: Logging directory
@ -150,8 +152,8 @@ pvc:
        log_keepalives: True
        # log_keepalive_cluster_details: Enable or disable node status logging during keepalive
        log_keepalive_cluster_details: True
-        # log_keepalive_storage_details: Enable or disable node storage logging during keepalive
-        log_keepalive_storage_details: True
+        # log_keepalive_plugin_details: Enable or disable node health plugin logging during keepalive
+        log_keepalive_plugin_details: True
        # console_log_lines: Number of console log lines to store in Zookeeper per VM
        console_log_lines: 1000
        # node_log_lines: Number of node log lines to store in Zookeeper per node
--- a/node-daemon/pvcnoded/Daemon.py
+++ b/node-daemon/pvcnoded/Daemon.py
@ -27,6 +27,7 @@ import pvcnoded.util.services
 import pvcnoded.util.libvirt
 import pvcnoded.util.zookeeper

+import pvcnoded.objects.MonitoringInstance as MonitoringInstance
 import pvcnoded.objects.DNSAggregatorInstance as DNSAggregatorInstance
 import pvcnoded.objects.MetadataAPIInstance as MetadataAPIInstance
 import pvcnoded.objects.VMInstance as VMInstance
@ -48,7 +49,7 @@ import re
 import json

 # Daemon version
-version = "0.9.60"
+version = "0.9.65"


 ##########################################################
@ -58,6 +59,7 @@ version = "0.9.60"

 def entrypoint():
    keepalive_timer = None
+    monitoring_instance = None

    # Get our configuration
    config = pvcnoded.util.config.get_configuration()
@ -204,7 +206,7 @@ def entrypoint():

    # Define a cleanup function
    def cleanup(failure=False):
-        nonlocal logger, zkhandler, keepalive_timer, d_domain
+        nonlocal logger, zkhandler, keepalive_timer, d_domain, monitoring_instance

        logger.out("Terminating pvcnoded and cleaning up", state="s")

@ -253,6 +255,13 @@ def entrypoint():
        except Exception:
            pass

+        # Clean up any monitoring plugins that have cleanup
+        try:
+            logger.out("Performing monitoring plugin cleanup", state="s")
+            monitoring_instance.run_cleanups()
+        except Exception:
+            pass
+
        # Set stop state in Zookeeper
        zkhandler.write([(("node.state.daemon", config["node_hostname"]), "stop")])

@ -1015,9 +1024,14 @@ def entrypoint():
                        state="i",
                    )

+    # Set up the node monitoring instance
+    monitoring_instance = MonitoringInstance.MonitoringInstance(
+        zkhandler, config, logger, this_node
+    )
+
    # Start keepalived thread
    keepalive_timer = pvcnoded.util.keepalive.start_keepalive_timer(
-        logger, config, zkhandler, this_node
+        logger, config, zkhandler, this_node, monitoring_instance
    )

    # Tick loop; does nothing since everything is async
--- a/node-daemon/pvcnoded/objects/MonitoringInstance.py
+++ b/node-daemon/pvcnoded/objects/MonitoringInstance.py
@ -0,0 +1,411 @@
+#!/usr/bin/env python3
+
+# MonitoringInstance.py - Class implementing a PVC monitoring instance
+# Part of the Parallel Virtual Cluster (PVC) system
+#
+#    Copyright (C) 2018-2022 Joshua M. Boniface <joshua@boniface.me>
+#
+#    This program is free software: you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation, version 3.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License
+#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+###############################################################################
+
+import concurrent.futures
+import time
+import importlib.util
+
+from os import walk
+from datetime import datetime
+from json import dumps
+
+
+class PluginError(Exception):
+    """
+    An exception that results from a plugin failing setup
+    """
+
+    pass
+
+
+class PluginResult(object):
+    def __init__(self, zkhandler, config, logger, this_node, plugin_name):
+        self.zkhandler = zkhandler
+        self.config = config
+        self.logger = logger
+        self.this_node = this_node
+        self.plugin_name = plugin_name
+        self.current_time = int(time.time())
+        self.health_delta = 0
+        self.message = "N/A"
+        self.data = {}
+        self.runtime = "0.00"
+
+    def set_health_delta(self, new_delta):
+        self.health_delta = new_delta
+
+    def set_message(self, new_message):
+        self.message = new_message
+
+    def set_data(self, new_data):
+        self.data = new_data
+
+    def set_runtime(self, new_runtime):
+        self.runtime = new_runtime
+
+    def to_zookeeper(self):
+        self.zkhandler.write(
+            [
+                (
+                    (
+                        "node.monitoring.data",
+                        self.this_node.name,
+                        "monitoring_plugin.name",
+                        self.plugin_name,
+                    ),
+                    self.plugin_name,
+                ),
+                (
+                    (
+                        "node.monitoring.data",
+                        self.this_node.name,
+                        "monitoring_plugin.last_run",
+                        self.plugin_name,
+                    ),
+                    self.current_time,
+                ),
+                (
+                    (
+                        "node.monitoring.data",
+                        self.this_node.name,
+                        "monitoring_plugin.health_delta",
+                        self.plugin_name,
+                    ),
+                    self.health_delta,
+                ),
+                (
+                    (
+                        "node.monitoring.data",
+                        self.this_node.name,
+                        "monitoring_plugin.message",
+                        self.plugin_name,
+                    ),
+                    self.message,
+                ),
+                (
+                    (
+                        "node.monitoring.data",
+                        self.this_node.name,
+                        "monitoring_plugin.data",
+                        self.plugin_name,
+                    ),
+                    dumps(self.data),
+                ),
+                (
+                    (
+                        "node.monitoring.data",
+                        self.this_node.name,
+                        "monitoring_plugin.runtime",
+                        self.plugin_name,
+                    ),
+                    self.runtime,
+                ),
+            ]
+        )
+
+
+class MonitoringPlugin(object):
+    def __init__(self, zkhandler, config, logger, this_node, plugin_name):
+        self.zkhandler = zkhandler
+        self.config = config
+        self.logger = logger
+        self.this_node = this_node
+        self.plugin_name = plugin_name
+
+        self.plugin_result = PluginResult(
+            self.zkhandler,
+            self.config,
+            self.logger,
+            self.this_node,
+            self.plugin_name,
+        )
+
+    def __str__(self):
+        return self.plugin_name
+
+    #
+    # Helper functions; exposed to child MonitoringPluginScript instances
+    #
+    def log(self, message, state="d"):
+        """
+        Log a message to the PVC logger instance using the plugin name as a prefix
+        Takes "state" values as defined by the PVC logger instance, defaulting to debug:
+            "d": debug
+            "i": informational
+            "t": tick/keepalive
+            "w": warning
+            "e": error
+        """
+        if state == "d" and not self.config["debug"]:
+            return
+
+        self.logger.out(message, state=state, prefix=self.plugin_name)
+
+    #
+    # Primary class functions; implemented by the individual plugins
+    #
+    def setup(self):
+        """
+        setup(): Perform setup of the plugin; run once during daemon startup
+
+        This step is optional and should be used sparingly.
+
+        If you wish for the plugin to not load in certain conditions, do any checks here
+        and return a non-None failure message to indicate the error.
+        """
+        pass
+
+    def run(self):
+        """
+        run(): Run the plugin, returning a PluginResult object
+        """
+        return self.plugin_result
+
+    def cleanup(self):
+        """
+        cleanup(): Clean up after the plugin; run once during daemon shutdown
+        OPTIONAL
+        """
+        pass
+
+
+class MonitoringInstance(object):
+    def __init__(self, zkhandler, config, logger, this_node):
+        self.zkhandler = zkhandler
+        self.config = config
+        self.logger = logger
+        self.this_node = this_node
+
+        # Get a list of plugins from the plugin_directory
+        plugin_files = next(walk(self.config["plugin_directory"]), (None, None, []))[
+            2
+        ]  # [] if no file
+
+        self.all_plugins = list()
+        self.all_plugin_names = list()
+
+        successful_plugins = 0
+
+        # Load each plugin file into the all_plugins list
+        for plugin_file in sorted(plugin_files):
+            try:
+                self.logger.out(
+                    f"Loading monitoring plugin from {self.config['plugin_directory']}/{plugin_file}",
+                    state="i",
+                )
+                loader = importlib.machinery.SourceFileLoader(
+                    "plugin_script", f"{self.config['plugin_directory']}/{plugin_file}"
+                )
+                spec = importlib.util.spec_from_loader(loader.name, loader)
+                plugin_script = importlib.util.module_from_spec(spec)
+                spec.loader.exec_module(plugin_script)
+
+                plugin = plugin_script.MonitoringPluginScript(
+                    self.zkhandler,
+                    self.config,
+                    self.logger,
+                    self.this_node,
+                    plugin_script.PLUGIN_NAME,
+                )
+
+                failed_setup = plugin.setup()
+                if failed_setup is not None:
+                    raise PluginError(f"{failed_setup}")
+
+                # Create plugin key
+                self.zkhandler.write(
+                    [
+                        (
+                            (
+                                "node.monitoring.data",
+                                self.this_node.name,
+                                "monitoring_plugin.name",
+                                plugin.plugin_name,
+                            ),
+                            plugin.plugin_name,
+                        ),
+                        (
+                            (
+                                "node.monitoring.data",
+                                self.this_node.name,
+                                "monitoring_plugin.last_run",
+                                plugin.plugin_name,
+                            ),
+                            "0",
+                        ),
+                        (
+                            (
+                                "node.monitoring.data",
+                                self.this_node.name,
+                                "monitoring_plugin.health_delta",
+                                plugin.plugin_name,
+                            ),
+                            "0",
+                        ),
+                        (
+                            (
+                                "node.monitoring.data",
+                                self.this_node.name,
+                                "monitoring_plugin.message",
+                                plugin.plugin_name,
+                            ),
+                            "Initializing",
+                        ),
+                        (
+                            (
+                                "node.monitoring.data",
+                                self.this_node.name,
+                                "monitoring_plugin.data",
+                                plugin.plugin_name,
+                            ),
+                            dumps({}),
+                        ),
+                        (
+                            (
+                                "node.monitoring.data",
+                                self.this_node.name,
+                                "monitoring_plugin.runtime",
+                                plugin.plugin_name,
+                            ),
+                            "0.00",
+                        ),
+                    ]
+                )
+
+                self.all_plugins.append(plugin)
+                self.all_plugin_names.append(plugin.plugin_name)
+                successful_plugins += 1
+
+                self.logger.out(
+                    f"Successfully loaded monitoring plugin '{plugin.plugin_name}'",
+                    state="o",
+                )
+            except Exception as e:
+                self.logger.out(
+                    f"Failed to load monitoring plugin: {e}",
+                    state="w",
+                )
+
+        self.zkhandler.write(
+            [
+                (
+                    ("node.monitoring.plugins", self.this_node.name),
+                    " ".join(self.all_plugin_names),
+                ),
+            ]
+        )
+
+        if successful_plugins < 1:
+            return
+
+        # Clean up any old plugin data for which a plugin file no longer exists
+        for plugin_key in self.zkhandler.children(
+            ("node.monitoring.data", self.this_node.name)
+        ):
+            if plugin_key not in self.all_plugin_names:
+                self.zkhandler.delete(
+                    (
+                        "node.monitoring.data",
+                        self.this_node.name,
+                        "monitoring_plugin",
+                        plugin_key,
+                    )
+                )
+
+    def run_plugin(self, plugin):
+        time_start = datetime.now()
+        try:
+            result = plugin.run()
+        except Exception as e:
+            self.logger.out(
+                f"Monitoring plugin {plugin.plugin_name} failed: {type(e).__name__}: {e}",
+                state="e",
+            )
+            # Whatever it had, we try to return
+            return plugin.plugin_result
+        time_end = datetime.now()
+        time_delta = time_end - time_start
+        runtime = "{:0.02f}".format(time_delta.total_seconds())
+        result.set_runtime(runtime)
+        result.to_zookeeper()
+        return result
+
+    def run_plugins(self):
+        total_health = 100
+        if self.config["log_keepalive_plugin_details"]:
+            self.logger.out(
+                f"Running monitoring plugins: {', '.join([x.plugin_name for x in self.all_plugins])}",
+                state="t",
+            )
+        plugin_results = list()
+        with concurrent.futures.ThreadPoolExecutor(max_workers=99) as executor:
+            to_future_plugin_results = {
+                executor.submit(self.run_plugin, plugin): plugin
+                for plugin in self.all_plugins
+            }
+            for future in concurrent.futures.as_completed(to_future_plugin_results):
+                plugin_results.append(future.result())
+
+        for result in sorted(plugin_results, key=lambda x: x.plugin_name):
+            if (
+                self.config["log_keepalives"]
+                and self.config["log_keepalive_plugin_details"]
+            ):
+                self.logger.out(
+                    result.message + f" [-{result.health_delta}]",
+                    state="t",
+                    prefix=f"{result.plugin_name} ({result.runtime}s)",
+                )
+            total_health -= result.health_delta
+
+        if total_health < 0:
+            total_health = 0
+
+        self.zkhandler.write(
+            [
+                (
+                    ("node.monitoring.health", self.this_node.name),
+                    total_health,
+                ),
+            ]
+        )
+
+    def run_cleanup(self, plugin):
+        return plugin.cleanup()
+
+    def run_cleanups(self):
+        with concurrent.futures.ThreadPoolExecutor(max_workers=99) as executor:
+            to_future_plugin_results = {
+                executor.submit(self.run_cleanup, plugin): plugin
+                for plugin in self.all_plugins
+            }
+            for future in concurrent.futures.as_completed(to_future_plugin_results):
+                # This doesn't do anything, just lets us wait for them all to complete
+                pass
+        # Set the node health to None as no previous checks are now valid
+        self.zkhandler.write(
+            [
+                (
+                    ("node.monitoring.health", self.this_node.name),
+                    None,
+                ),
+            ]
+        )
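Reviewer note on `run_plugins()` above: node health starts at 100, each plugin result subtracts its health delta, and the total is clamped at 0. The following is a minimal sketch of that aggregation, assuming plain callables that return an integer delta in place of real plugin objects. The function name `aggregate_health` and the worker count are assumptions, not part of the daemon.

```python
import concurrent.futures


def aggregate_health(checks):
    """Run each check in its own thread and subtract its delta from 100.

    `checks` is a list of zero-argument callables returning an integer
    health delta; this is a hypothetical stand-in for plugin.run().
    """
    deltas = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
        futures = [executor.submit(check) for check in checks]
        for future in concurrent.futures.as_completed(futures):
            deltas.append(future.result())
    # Clamp at zero so several failing checks cannot produce a negative health
    return max(0, 100 - sum(deltas))


# Deltas of the sizes used by the plugins in this diff: 10 for a degraded
# NIC, 50 for a failed PostgreSQL or Zookeeper connection.
print(aggregate_health([lambda: 0, lambda: 10, lambda: 50]))
print(aggregate_health([lambda: 50, lambda: 50, lambda: 10]))
```

Because results are collected with `as_completed`, arrival order is not deterministic; summing the deltas makes the final value independent of that order.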
--- a/node-daemon/pvcnoded/objects/NodeInstance.py
+++ b/node-daemon/pvcnoded/objects/NodeInstance.py
@ -67,6 +67,7 @@ class NodeInstance(object):
        self.network_list = []
        self.domain_list = []
        # Node resources
+        self.health = 100
        self.domains_count = 0
        self.memused = 0
        self.memfree = 0
@ -224,6 +225,33 @@ class NodeInstance(object):
                        )
                        self.flush_thread.start()

+        try:
+
+            @self.zkhandler.zk_conn.DataWatch(
+                self.zkhandler.schema.path("node.monitoring.health", self.name)
+            )
+            def watch_node_health(data, stat, event=""):
+                if event and event.type == "DELETED":
+                    # The key has been deleted after existing before; terminate this watcher
+                    # because this class instance is about to be reaped in Daemon.py
+                    return False
+
+                try:
+                    data = data.decode("ascii")
+                except AttributeError:
+                    data = 100
+
+                try:
+                    data = int(data)
+                except ValueError:
+                    pass
+
+                if data != self.health:
+                    self.health = data
+
+        except Exception:
+            pass
+
        @self.zkhandler.zk_conn.DataWatch(
            self.zkhandler.schema.path("node.memory.free", self.name)
        )
@ -762,6 +790,19 @@ class NodeInstance(object):
                self.flush_stopper = False
                return

+            # Wait for a VM in "restart" or "shutdown" state to complete transition
+            while self.zkhandler.read(("domain.state", dom_uuid)) in [
+                "restart",
+                "shutdown",
+            ]:
+                self.logger.out(
+                    'Waiting 2s for VM state change completion for VM "{}"'.format(
+                        dom_uuid
+                    ),
+                    state="i",
+                )
+                time.sleep(2)
+
            self.logger.out(
                'Selecting target to migrate VM "{}"'.format(dom_uuid), state="i"
            )
@ -778,17 +819,19 @@ class NodeInstance(object):

            if target_node is None:
                self.logger.out(
-                    'Failed to find migration target for VM "{}"; shutting down and setting autostart flag'.format(
+                    'Failed to find migration target for running VM "{}"; shutting down and setting autostart flag'.format(
                        dom_uuid
                    ),
                    state="e",
                )
-                self.zkhandler.write(
-                    [
-                        (("domain.state", dom_uuid), "shutdown"),
-                        (("domain.meta.autostart", dom_uuid), "True"),
-                    ]
-                )
+
+                if self.zkhandler.read(("domain.state", dom_uuid)) in ["start"]:
+                    self.zkhandler.write(
+                        [
+                            (("domain.state", dom_uuid), "shutdown"),
+                            (("domain.meta.autostart", dom_uuid), "True"),
+                        ]
+                    )
            else:
                self.logger.out(
                    'Migrating VM "{}" to node "{}"'.format(dom_uuid, target_node),
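Note on the `watch_node_health` watcher above: the raw Zookeeper payload is decoded from bytes, falls back to 100 when there is no payload to decode, and is converted to an integer only when it is numeric. A minimal standalone sketch of that normalization follows; the function name `normalize_health` is hypothetical and the real logic lives inside the `DataWatch` callback.

```python
def normalize_health(data, default=100):
    """Mirror the watcher's decoding steps: bytes -> str -> int where possible."""
    try:
        data = data.decode("ascii")
    except AttributeError:
        # No bytes payload (for example None): fall back to the default
        data = default

    try:
        data = int(data)
    except ValueError:
        # Non-numeric strings (for example "None") are kept unchanged
        pass

    return data


print(normalize_health(b"85"))
print(normalize_health(None))
print(normalize_health(b"None"))
```

A non-numeric value is deliberately left as a string, which is what lets the keepalive output show "N/A" instead of a percentage.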
--- a/node-daemon/pvcnoded/util/config.py
+++ b/node-daemon/pvcnoded/util/config.py
@ -180,6 +180,9 @@ def get_configuration():
        raise MalformedConfigurationError(e)

    config_directories = {
+        "plugin_directory": o_directories.get(
+            "plugin_directory", "/usr/share/pvc/plugins"
+        ),
        "dynamic_directory": o_directories.get("dynamic_directory", None),
        "log_directory": o_directories.get("log_directory", None),
        "console_log_directory": o_directories.get("console_log_directory", None),
@ -225,8 +228,8 @@ def get_configuration():
        "log_keepalive_cluster_details": o_logging.get(
            "log_keepalive_cluster_details", False
        ),
-        "log_keepalive_storage_details": o_logging.get(
-            "log_keepalive_storage_details", False
+        "log_keepalive_plugin_details": o_logging.get(
+            "log_keepalive_plugin_details", False
        ),
        "console_log_lines": o_logging.get("console_log_lines", False),
        "node_log_lines": o_logging.get("node_log_lines", False),
--- a/node-daemon/pvcnoded/util/keepalive.py
+++ b/node-daemon/pvcnoded/util/keepalive.py
@ -51,7 +51,7 @@ libvirt_vm_states = {
 }


-def start_keepalive_timer(logger, config, zkhandler, this_node):
+def start_keepalive_timer(logger, config, zkhandler, this_node, monitoring_instance):
    keepalive_interval = config["keepalive_interval"]
    logger.out(
        f"Starting keepalive timer ({keepalive_interval} second interval)", state="s"
@ -59,7 +59,7 @@ def start_keepalive_timer(logger, config, zkhandler, this_node):
    keepalive_timer = BackgroundScheduler()
    keepalive_timer.add_job(
        node_keepalive,
-        args=(logger, config, zkhandler, this_node),
+        args=(logger, config, zkhandler, this_node, monitoring_instance),
        trigger="interval",
        seconds=keepalive_interval,
    )
@ -97,34 +97,12 @@ def collect_ceph_stats(logger, config, zkhandler, this_node, queue):
        logger.out("Failed to open connection to Ceph cluster: {}".format(e), state="e")
        return

-    if debug:
-        logger.out("Getting health stats from monitor", state="d", prefix="ceph-thread")
-
-    # Get Ceph cluster health for local status output
-    command = {"prefix": "health", "format": "json"}
-    try:
-        health_status = json.loads(
-            ceph_conn.mon_command(json.dumps(command), b"", timeout=1)[1]
-        )
-        ceph_health = health_status["status"]
-    except Exception as e:
-        logger.out("Failed to obtain Ceph health data: {}".format(e), state="e")
-        ceph_health = "HEALTH_UNKN"
-
-    if ceph_health in ["HEALTH_OK"]:
-        ceph_health_colour = logger.fmt_green
-    elif ceph_health in ["HEALTH_UNKN"]:
-        ceph_health_colour = logger.fmt_cyan
-    elif ceph_health in ["HEALTH_WARN"]:
-        ceph_health_colour = logger.fmt_yellow
-    else:
-        ceph_health_colour = logger.fmt_red
-
    # Primary-only functions
    if this_node.router_state == "primary":
+        # Get Ceph status information (pretty)
        if debug:
            logger.out(
-                "Set ceph health information in zookeeper (primary only)",
+                "Set Ceph status information in zookeeper (primary only)",
                state="d",
                prefix="ceph-thread",
            )
@ -138,9 +116,27 @@ def collect_ceph_stats(logger, config, zkhandler, this_node, queue):
        except Exception as e:
            logger.out("Failed to set Ceph status data: {}".format(e), state="e")

+        # Get Ceph health information (JSON)
        if debug:
            logger.out(
-                "Set ceph rados df information in zookeeper (primary only)",
+                "Set Ceph health information in zookeeper (primary only)",
+                state="d",
+                prefix="ceph-thread",
+            )
+
+        command = {"prefix": "health", "format": "json"}
+        ceph_health = ceph_conn.mon_command(json.dumps(command), b"", timeout=1)[
+            1
+        ].decode("ascii")
+        try:
+            zkhandler.write([("base.storage.health", str(ceph_health))])
+        except Exception as e:
+            logger.out("Failed to set Ceph health data: {}".format(e), state="e")
+
+        # Get Ceph df information (pretty)
+        if debug:
+            logger.out(
+                "Set Ceph rados df information in zookeeper (primary only)",
                state="d",
                prefix="ceph-thread",
            )
@ -408,8 +404,6 @@ def collect_ceph_stats(logger, config, zkhandler, this_node, queue):

    ceph_conn.shutdown()

-    queue.put(ceph_health_colour)
-    queue.put(ceph_health)
    queue.put(osds_this_node)

    if debug:
@ -648,10 +642,29 @@ def collect_vm_stats(logger, config, zkhandler, this_node, queue):


 # Keepalive update function
-def node_keepalive(logger, config, zkhandler, this_node):
+def node_keepalive(logger, config, zkhandler, this_node, monitoring_instance):
    debug = config["debug"]
-    if debug:
-        logger.out("Keepalive starting", state="d", prefix="main-thread")
+
+    # Display node information to the terminal
+    if config["log_keepalives"]:
+        if this_node.router_state == "primary":
+            cst_colour = logger.fmt_green
+        elif this_node.router_state == "secondary":
+            cst_colour = logger.fmt_blue
+        else:
+            cst_colour = logger.fmt_cyan
+        logger.out(
+            "{}{} keepalive @ {}{} [{}{}{}]".format(
+                logger.fmt_purple,
+                config["node_hostname"],
+                datetime.now(),
+                logger.fmt_end,
+                logger.fmt_bold + cst_colour,
+                this_node.router_state,
+                logger.fmt_end,
+            ),
+            state="t",
+        )

    # Set the migration selector in Zookeeper for clients to read
    if config["enable_hypervisor"]:
@ -777,16 +790,14 @@ def node_keepalive(logger, config, zkhandler, this_node):

    if config["enable_storage"]:
        try:
-            ceph_health_colour = ceph_thread_queue.get(
-                timeout=config["keepalive_interval"]
+            osds_this_node = ceph_thread_queue.get(
+                timeout=(config["keepalive_interval"] - 1)
            )
-            ceph_health = ceph_thread_queue.get(timeout=config["keepalive_interval"])
-            osds_this_node = ceph_thread_queue.get(timeout=config["keepalive_interval"])
        except Exception:
            logger.out("Ceph stats queue get exceeded timeout, continuing", state="w")
-            ceph_health_colour = logger.fmt_cyan
-            ceph_health = "UNKNOWN"
            osds_this_node = "?"
+    else:
+        osds_this_node = "0"

    # Set our information in zookeeper
    keepalive_time = int(time.time())
@ -816,60 +827,51 @@ def node_keepalive(logger, config, zkhandler, this_node):
    except Exception:
        logger.out("Failed to set keepalive data", state="e")

-    # Display node information to the terminal
+    # Run this here since monitoring plugins output directly
+    monitoring_instance.run_plugins()
+    # Allow the health value to update in the Node instance
+    time.sleep(0.1)
+
    if config["log_keepalives"]:
-        if this_node.router_state == "primary":
-            cst_colour = logger.fmt_green
-        elif this_node.router_state == "secondary":
-            cst_colour = logger.fmt_blue
+        if this_node.maintenance is True:
+            maintenance_colour = logger.fmt_blue
        else:
-            cst_colour = logger.fmt_cyan
-        logger.out(
-            "{}{} keepalive @ {}{} [{}{}{}]".format(
-                logger.fmt_purple,
-                config["node_hostname"],
-                datetime.now(),
-                logger.fmt_end,
-                logger.fmt_bold + cst_colour,
-                this_node.router_state,
-                logger.fmt_end,
-            ),
-            state="t",
-        )
+            maintenance_colour = logger.fmt_green
+
+        if isinstance(this_node.health, int):
+            if this_node.health > 90:
+                health_colour = logger.fmt_green
+            elif this_node.health > 50:
+                health_colour = logger.fmt_yellow
+            else:
+                health_colour = logger.fmt_red
+            health_text = str(this_node.health) + "%"
+
+        else:
+            health_colour = logger.fmt_blue
+            health_text = "N/A"
+
        if config["log_keepalive_cluster_details"]:
            logger.out(
-                "{bold}Maintenance:{nofmt} {maint}  "
-                "{bold}Active VMs:{nofmt} {domcount}  "
-                "{bold}Networks:{nofmt} {netcount}  "
+                "{bold}Maintenance:{nofmt} {maintenance_colour}{maintenance}{nofmt}  "
+                "{bold}Health:{nofmt} {health_colour}{health}{nofmt}  "
+                "{bold}VMs:{nofmt} {domcount}  "
+                "{bold}OSDs:{nofmt} {osdcount}  "
                "{bold}Load:{nofmt} {load}  "
-                "{bold}Memory [MiB]: VMs:{nofmt} {allocmem}  "
+                "{bold}Memory [MiB]: "
                "{bold}Used:{nofmt} {usedmem}  "
                "{bold}Free:{nofmt} {freemem}".format(
                    bold=logger.fmt_bold,
+                    maintenance_colour=maintenance_colour,
+                    health_colour=health_colour,
                    nofmt=logger.fmt_end,
-                    maint=this_node.maintenance,
+                    maintenance=this_node.maintenance,
+                    health=health_text,
                    domcount=this_node.domains_count,
-                    netcount=len(zkhandler.children("base.network")),
+                    osdcount=osds_this_node,
                    load=this_node.cpuload,
                    freemem=this_node.memfree,
                    usedmem=this_node.memused,
-                    allocmem=this_node.memalloc,
-                ),
-                state="t",
-            )
-        if config["enable_storage"] and config["log_keepalive_storage_details"]:
-            logger.out(
-                "{bold}Ceph cluster status:{nofmt} {health_colour}{health}{nofmt}  "
-                "{bold}Total OSDs:{nofmt} {total_osds}  "
-                "{bold}Node OSDs:{nofmt} {node_osds}  "
-                "{bold}Pools:{nofmt} {total_pools}  ".format(
-                    bold=logger.fmt_bold,
-                    health_colour=ceph_health_colour,
-                    nofmt=logger.fmt_end,
-                    health=ceph_health,
-                    total_osds=len(zkhandler.children("base.osd")),
-                    node_osds=osds_this_node,
-                    total_pools=len(zkhandler.children("base.pool")),
                ),
                state="t",
            )
@ -917,6 +919,3 @@ def node_keepalive(logger, config, zkhandler, this_node):
                            zkhandler.write(
                                [(("node.state.daemon", node_name), "dead")]
                            )
-
-    if debug:
-        logger.out("Keepalive finished", state="d", prefix="main-thread")
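Note on the keepalive output above: the health value is coloured by threshold (above 90, above 50, otherwise), and any non-integer value is shown as "N/A". The sketch below isolates that decision. The colour names are plain strings standing in for the `logger.fmt_*` attributes, and `health_display` is a hypothetical helper name.

```python
def health_display(health):
    """Return (colour, text) using the thresholds from the keepalive output."""
    if isinstance(health, int):
        if health > 90:
            colour = "green"
        elif health > 50:
            colour = "yellow"
        else:
            colour = "red"
        return colour, f"{health}%"
    # Anything that is not an integer (None, "None", ...) has no valid health
    return "blue", "N/A"


for value in (100, 90, 50, None):
    print(value, health_display(value))
```

Both comparisons are strict, so a value of exactly 90 is yellow and exactly 50 is red.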
--- a/test-cluster.sh
+++ b/test-cluster.sh
@ -1,31 +1,54 @@
 #!/usr/bin/env bash

-set -o errexit
-
 if [[ -z ${1} ]]; then
    echo "Please specify a cluster to run tests against."
    exit 1
 fi
 test_cluster="${1}"
+shift
+
+if [[ ${1} == "--test-dangerously" ]]; then
+    test_dangerously="y"
+else
+    test_dangerously=""
+fi

 _pvc() {
-    echo "> pvc --cluster ${test_cluster} $@"
-    pvc --quiet --cluster ${test_cluster} "$@"
+    echo "> pvc --connection ${test_cluster} $@"
+    pvc --quiet --connection ${test_cluster} "$@"
    sleep 1
 }

 time_start=$(date +%s)

+set -o errexit
+
+pushd $( git rev-parse --show-toplevel ) &>/dev/null
+
 # Cluster tests
-_pvc maintenance on
-_pvc maintenance off
+_pvc connection list
+_pvc connection detail
+
+_pvc cluster maintenance on
+_pvc cluster maintenance off
+_pvc cluster status
 backup_tmp=$(mktemp)
-_pvc task backup --file ${backup_tmp}
-_pvc task restore --yes --file ${backup_tmp}
+_pvc cluster backup --file ${backup_tmp}
+if [[ -n ${test_dangerously} ]]; then
+    # This is dangerous, so don't test it unless option given
+    _pvc cluster restore --yes --file ${backup_tmp}
+fi
 rm ${backup_tmp} || true

 # Provisioner tests
-_pvc provisioner profile list test
+_pvc provisioner profile list test || true
+_pvc provisioner template system add --vcpus 1 --vram 1024 --serial --vnc --vnc-bind 0.0.0.0 --node-limit hv1 --node-selector mem --node-autostart --migration-method live system-test || true
+_pvc provisioner template network add network-test || true
+_pvc provisioner template network vni add network-test 10000 || true
+_pvc provisioner template storage add storage-test || true
+_pvc provisioner template storage disk add --pool vms --size 8 --filesystem ext4 --mountpoint / storage-test sda || true
+_pvc provisioner script add script-test $( find . -name "3-debootstrap.py" ) || true
+_pvc provisioner profile add --profile-type provisioner --system-template system-test --network-template network-test --storage-template storage-test --userdata empty --script script-test --script-arg deb_release=bullseye test || true
 _pvc provisioner create --wait testx test
 sleep 30

@ -36,7 +59,7 @@ _pvc vm shutdown --yes --wait testx
 _pvc vm start testx
 sleep 30
 _pvc vm stop --yes testx
-_pvc vm disable testx
+_pvc vm disable --yes testx
 _pvc vm undefine --yes testx
 _pvc vm define --target hv3 --tag pvc-test ${vm_tmp}
 _pvc vm start testx
@ -49,21 +72,21 @@ _pvc vm unmigrate --wait testx
 sleep 5
 _pvc vm move --wait --target hv1 testx
 sleep 5
-_pvc vm meta testx --limit hv1 --selector vms --method live --profile test --no-autostart
+_pvc vm meta testx --limit hv1 --node-selector vms --method live --profile test --no-autostart
 _pvc vm tag add testx mytag
 _pvc vm tag get testx
 _pvc vm list --tag mytag
 _pvc vm tag remove testx mytag
 _pvc vm network get testx
-_pvc vm vcpu set testx 4
+_pvc vm vcpu set --no-restart testx 4
 _pvc vm vcpu get testx
-_pvc vm memory set testx 4096
+_pvc vm memory set --no-restart testx 4096
 _pvc vm memory get testx
-_pvc vm vcpu set testx 2
+_pvc vm vcpu set --no-restart testx 2
 _pvc vm memory set testx 2048 --restart --yes
-sleep 5
+sleep 15
 _pvc vm list testx
-_pvc vm info --long testx
+_pvc vm info --format long testx
 rm ${vm_tmp} || true

 # Node tests
@ -77,6 +100,7 @@ _pvc node flush --wait hv1
 _pvc node ready --wait hv1
 _pvc node list hv1
 _pvc node info hv1
+sleep 15

 # Network tests
 _pvc network add 10001 --description testing --type managed --domain testing.local --ipnet 10.100.100.0/24 --gateway 10.100.100.1 --dhcp --dhcp-start 10.100.100.100 --dhcp-end 10.100.100.199
@ -84,7 +108,7 @@ sleep 5
 _pvc vm network add --restart --yes testx 10001
 sleep 30
 _pvc vm network remove --restart --yes testx 10001
-sleep 5
+sleep 15

 _pvc network acl add 10001 --in --description test-acl --order 0 --rule "'ip daddr 10.0.0.0/8 counter'"
 _pvc network acl list 10001
@ -95,31 +119,34 @@ _pvc network dhcp remove --yes 10001 12:34:56:78:90:ab

 _pvc network modify --domain test10001.local 10001
 _pvc network list
-_pvc network info --long 10001
+_pvc network info --format long 10001

 # Network-VM interaction tests
 _pvc vm network add testx 10001 --model virtio --restart --yes
 sleep 30
 _pvc vm network get testx
 _pvc vm network remove testx 10001 --restart --yes
-sleep 5
+sleep 15

 _pvc network remove --yes 10001

 # Storage tests
 _pvc storage status
 _pvc storage util
-_pvc storage osd set noout
-_pvc storage osd out 0
-_pvc storage osd in 0
-_pvc storage osd unset noout
+if [[ -n ${test_dangerously} ]]; then
+    # This is dangerous, so don't test it unless option given
+    _pvc storage osd set noout
+    _pvc storage osd out 0
+    _pvc storage osd in 0
+    _pvc storage osd unset noout
+fi
 _pvc storage osd list
 _pvc storage pool add testing 64 --replcfg "copies=3,mincopies=2"
 sleep 5
 _pvc storage pool list
 _pvc storage volume add testing testx 1G
-_pvc storage volume resize testing testx 2G
-_pvc storage volume rename testing testx testerX
+_pvc storage volume resize --yes testing testx 2G
+_pvc storage volume rename --yes testing testx testerX
 _pvc storage volume clone testing testerX testerY
 _pvc storage volume list --pool testing
 _pvc storage volume snapshot add testing testerX asnapshotX
@ -132,7 +159,7 @@ _pvc vm volume add testx --type rbd --disk-id sdh --bus scsi testing/testerY --r
 sleep 30
 _pvc vm volume get testx
 _pvc vm volume remove testx testing/testerY --restart --yes
-sleep 5
+sleep 15

 _pvc storage volume remove --yes testing testerY
 _pvc storage volume remove --yes testing testerX
@ -142,6 +169,14 @@ _pvc storage pool remove --yes testing
 _pvc vm stop --yes testx
 _pvc vm remove --yes testx

+_pvc provisioner profile remove --yes test
+_pvc provisioner script remove --yes script-test
+_pvc provisioner template system remove --yes system-test
+_pvc provisioner template network remove --yes network-test
+_pvc provisioner template storage remove --yes storage-test
+
+popd
+
 time_end=$(date +%s)

 echo
Author	SHA1	Message	Date
Joshua M. Boniface	1e083d7652	Bump version to 0.9.65	2023-08-23 01:56:57 -04:00
Joshua M. Boniface	65d2b7869c	Restore original no-connection behavior Previously not specifying a connection when multiple were available would error. This restores that behaviour.	2023-08-23 01:38:50 -04:00
Joshua M. Boniface	66aee73f1d	Fix incorrect short flags in node list	2023-08-22 09:26:35 -04:00
Joshua M. Boniface	075dbe7cc9	Bump version to 0.9.64	2023-08-18 12:34:27 -04:00
Joshua M. Boniface	2ff7a6865b	Avoid none entries in VM state list	2023-08-18 12:34:27 -04:00
Joshua M. Boniface	2002394a51	Improve timing in test script	2023-08-18 11:58:13 -04:00
Joshua M. Boniface	0e8bdfad15	Improve testing with more tests	2023-08-18 11:44:39 -04:00
Joshua M. Boniface	b5f996febd	Fix bugs for node flush for stop/shutdown/restart Previously VMs in stop/shutdown/restart states wouldn't be properly handled during a node flush. This fixes the bugs and ensures that the transient VM states (shutdown/restart) are completed before proceeding, and then avoids setting a stopped/shutdown VM to shutdown/autostart.	2023-08-18 11:25:59 -04:00
Joshua M. Boniface	3a4914fa5e	Readd errexit to test script	2023-08-18 10:33:59 -04:00
Joshua M. Boniface	dcda7b5748	Revamp cluster test script	2023-08-17 23:01:38 -04:00
Joshua M. Boniface	ae7950e9b7	Fix bad import	2023-08-17 22:45:50 -04:00
Joshua M. Boniface	d769071799	Revamp behaviour of VM "--restart" options Previously, either "--restart" was specified or a prompt was given, with the prompt being ignored with "--unsafe" in favour of a reboot. This failed to provide an explicit way to prevent VM restarts with these commands, which might be desired in some non-interactive situations, and the interaction of "--unsafe" with this option was an undesired bug. This is now a complete binary flag with --restart and --no-restart versions, while still defaulting to a prompt if neither is specified. This allows full non-interactive control of this option.	2023-08-17 22:19:36 -04:00
Joshua M. Boniface	e298d10561	Ensure ACPI is included in Deb VMs	2023-08-17 11:16:08 -04:00
Joshua M. Boniface	fc8cf9ed44	Ensure consistency in variable names and fix bug	2023-08-17 11:09:51 -04:00
Joshua M. Boniface	4ccdd6347e	Move provisioner wait to helpers and fix	2023-08-17 10:26:19 -04:00
Joshua M. Boniface	b32f478633	Work around strange Python anomaly Apparently, `True` is both an instance of `int` and `bool`, which is a change and is very strange. Instead flip the conditional here.	2023-08-17 09:55:19 -04:00
Joshua M. Boniface	cf442fcc2d	Correct entrypoint for CLI package	2023-08-17 00:27:45 -04:00
Joshua M. Boniface	b753f85410	Update linting options for new CLI client	2023-08-16 23:55:44 -04:00
Joshua M. Boniface	d2bcaec28f	Move new CLI client into place	2023-08-16 23:55:27 -04:00
Joshua M. Boniface	a70273dbae	Move old CLI client out of the way	2023-08-16 23:54:51 -04:00
Joshua M. Boniface	30ebd6b42c	Add provisioner formatters	2023-08-16 23:48:56 -04:00
Joshua M. Boniface	b2e6feeba3	Add storage formatters	2023-08-16 22:46:13 -04:00
Joshua M. Boniface	c9b06ffdb2	Add network formatters	2023-08-10 00:58:36 -04:00
Joshua M. Boniface	a032dcc5c8	Add formatters for Node and VM, fix handling	2023-08-09 13:13:03 -04:00
Joshua M. Boniface	01122415f6	Add provisioner management commands TODO: Add proper new formatters as required	2023-08-09 11:44:43 -04:00
Joshua M. Boniface	bd3e3829b3	Add storage management commands TODO: Add proper new formatters as required	2023-08-09 10:51:44 -04:00
Joshua M. Boniface	e01bbe9764	Add network management commands TODO: Add proper new formatters as required	2023-07-03 00:18:07 -04:00
Joshua M. Boniface	3e7953531c	Add VM management commands TODO: Add proper new formatters as required	2023-07-02 01:03:09 -04:00
Joshua M. Boniface	c7b7ad0cf7	Fix key display and add stubs	2023-07-01 21:51:46 -04:00
Joshua M. Boniface	776daac267	Add node management commands	2023-05-05 02:10:02 -04:00
Joshua M. Boniface	653b95ee25	Normalize return messages for node commands	2023-05-04 17:02:46 -04:00
Joshua M. Boniface	59c9d89986	Port cluster management functions	2023-05-04 03:04:10 -04:00
Joshua M. Boniface	e294e1c087	Initial work on new CLI client rewrite 1. lib copied verbatim from existing client 2. initial reworking of Click to split logic from Click definitions	2023-05-02 17:28:52 -04:00
Joshua M. Boniface	4685ba1ec4	Move cli_lib to lib directory	2023-05-01 13:43:54 -04:00
Joshua M. Boniface	969091ed22	Another slight wording tweak	2023-05-01 11:03:58 -04:00
Joshua M. Boniface	148f04b256	Reword the sections to add clarity	2023-05-01 10:59:23 -04:00
Joshua M. Boniface	dc9e43fbee	Add a bit of shade	2023-05-01 10:56:42 -04:00
Joshua M. Boniface	d8dcec254d	Add another reference to Ganeti and Harvester	2023-05-01 10:54:42 -04:00
Joshua M. Boniface	3a90fda109	Bump version to 0.9.63	2023-04-28 14:47:04 -04:00
Joshua M. Boniface	78322f4de4	Improve size handling during volume add/resize	2023-04-28 12:16:16 -04:00
Joshua M. Boniface	c1782c5004	Add full/nearfull OSD health detection	2023-04-28 11:33:39 -04:00
Joshua M. Boniface	9114255af5	Add .update- obsolete configs to dpkg plugin	2023-04-10 15:39:40 -04:00
Joshua M. Boniface	b26bb5cb65	Mention Ganeti in the docs	2023-03-19 21:23:21 -04:00
Joshua M. Boniface	74c4ce3ec7	Increase timeout for connections to API	2023-03-14 09:19:13 -04:00
Joshua M. Boniface	2c3a3cdf52	Use try when watching health value in NodeInstance	2023-03-07 09:53:01 -05:00
Joshua M. Boniface	0b583bfdaf	Bump IPMI timeout to 2 seconds	2023-03-07 09:25:27 -05:00
Joshua M. Boniface	7c07fbefff	Adjust keepalive health printing and ordering	2023-02-24 11:08:30 -05:00
Joshua M. Boniface	202dc3ed59	Correct error handling if monitoring plugins fail	2023-02-24 10:19:41 -05:00
Joshua M. Boniface	8667f4d03b	Add documentation details about plugin logging	2023-02-23 22:24:07 -05:00
Joshua M. Boniface	4c2d99f8a6	Fix bug with SMART info	2023-02-23 13:21:23 -05:00
Joshua M. Boniface	bcff6650d0	Set timeout on IPMI command	2023-02-23 11:10:09 -05:00
Joshua M. Boniface	a11206253d	Fix ZK check location	2023-02-23 11:04:02 -05:00
Joshua M. Boniface	7f57c6dbf7	Adjust the main location too	2023-02-23 10:32:31 -05:00
Joshua M. Boniface	6865979e08	Show possible version minimum	2023-02-23 10:30:45 -05:00
Joshua M. Boniface	5126bc3272	Handle old clusters in cluster detail list	2023-02-23 10:28:55 -05:00
Joshua M. Boniface	765f0ef13d	Better handle N/A health from old versions	2023-02-23 10:22:00 -05:00
Joshua M. Boniface	fe258d9d56	Correct bad health text call for old clusters	2023-02-23 10:19:18 -05:00
Joshua M. Boniface	93d89a2414	Fix status when connecting to old clusters	2023-02-23 10:16:29 -05:00
Joshua M. Boniface	a49f3810d3	Set maintenance colour in cluster detail	2023-02-22 18:20:18 -05:00
Joshua M. Boniface	45ad3b9a17	Bump version to 0.9.62	2023-02-22 18:13:45 -05:00
Joshua M. Boniface	07623fad1a	Merge branch 'revamp-health' Add detailed health checking, status reporting, and enhancements to the PVC system. Closes #161 #154 #159	2023-02-22 18:12:35 -05:00
Joshua M. Boniface	8331b7ecd8	Add cluster detail list Adds a command to show a list of details including health and item counts for all configured clusters in the client.	2023-02-22 18:09:11 -05:00
Joshua M. Boniface	94d4ee5b9b	Lower default connect timeout to 1s	2023-02-22 18:09:01 -05:00
Joshua M. Boniface	e773211293	Add PVC version to cluster status output	2023-02-22 16:09:24 -05:00
Joshua M. Boniface	32c36c866b	Add additional plugins to manual	2023-02-22 15:02:08 -05:00
Joshua M. Boniface	dc4e56db4b	Add IPMI monitoring check	2023-02-22 15:02:08 -05:00
Joshua M. Boniface	e45b3108a2	Add health delta change to message output	2023-02-22 15:02:08 -05:00
Joshua M. Boniface	118237a53b	Fix bad string value for message	2023-02-22 15:02:08 -05:00
Joshua M. Boniface	9805681f94	Use consistent connection with other checks	2023-02-22 15:02:08 -05:00
Joshua M. Boniface	6c9abb2abe	Add Libvirtd monitoring check	2023-02-22 15:02:08 -05:00
Joshua M. Boniface	a1122c6e71	Add Zookeeper monitoring check	2023-02-22 15:02:08 -05:00
Joshua M. Boniface	3696f81597	Add PostgreSQL monitoring check	2023-02-22 15:02:08 -05:00
Joshua M. Boniface	5ca0d903b6	Adjust comment message	2023-02-22 15:02:08 -05:00
Joshua M. Boniface	6ddbde763e	Correct lint error E741	2023-02-22 12:21:29 -05:00
Joshua M. Boniface	626424b74a	Adjust Munin threshold values	2023-02-22 10:42:43 -05:00
Joshua M. Boniface	b3d99827f5	Add documentation about new health and plugins	2023-02-22 01:40:48 -05:00
Joshua M. Boniface	c9ceb3159b	Remove obsolete LINKSPEED variable	2023-02-22 01:04:25 -05:00
Joshua M. Boniface	6525a2568b	Adjust health delta of load to 50 This is a very bad situation and should be critical.	2023-02-22 01:03:12 -05:00
Joshua M. Boniface	09a005d3d7	Adjust health delta of EDAC Uncorrected to 50 This is a very bad situation and should be critical.	2023-02-22 01:01:54 -05:00
Joshua M. Boniface	96defebd0b	Add last item to swagger doc	2023-02-22 00:25:27 -05:00
Joshua M. Boniface	d00b8aa6cd	Add plugin directory and plugin details log fields	2023-02-22 00:19:05 -05:00
Joshua M. Boniface	e9aa545e9b	Update API specification	2023-02-22 00:06:52 -05:00
Joshua M. Boniface	fb0fcc0597	Update readme for Munin plugin	2023-02-18 00:00:04 -05:00
Joshua M. Boniface	3009f24910	Fix typo in var and flip conditional	2023-02-17 16:18:42 -05:00
Joshua M. Boniface	5ae836f1c5	Fix various issues with PVC Munin plugin	2023-02-17 15:41:16 -05:00
Joshua M. Boniface	70ba364f1d	Flip VM state condition to remove shutdown Don't cause health degradation for shutdown state, and flip the list around to make it clearer.	2023-02-16 20:32:33 -05:00
Joshua M. Boniface	eda1b95d5f	Update Munin plugin example	2023-02-16 16:06:00 -05:00
Joshua M. Boniface	3bd93563e6	Add CheckMK monitoring example plugins	2023-02-16 16:05:47 -05:00
Joshua M. Boniface	1f8561d59a	Format cluster health like node healths Make a cleaner construct here.	2023-02-16 12:33:36 -05:00
Joshua M. Boniface	a2efc83953	Exclude monitoring examples from flake8	2023-02-16 12:33:18 -05:00
Joshua M. Boniface	f2d2537e1c	Add JSON output format for node info	2023-02-15 21:35:44 -05:00
Joshua M. Boniface	1093ca6264	Disallow health less than 0	2023-02-15 16:50:24 -05:00
Joshua M. Boniface	15ff729f83	Fix comparison in maintenance check	2023-02-15 16:47:31 -05:00
Joshua M. Boniface	29584e5636	Add per-node health entries for 3rd party checks	2023-02-15 16:44:49 -05:00
Joshua M. Boniface	f4e8449356	Fix bugs and formatting of health messages	2023-02-15 16:28:56 -05:00
Joshua M. Boniface	388f6556c0	Remove extra text from packages plugin	2023-02-15 16:28:41 -05:00
Joshua M. Boniface	ec79acf061	Fix linting of cluster.py file	2023-02-15 15:48:31 -05:00
Joshua M. Boniface	6c7be492b8	Move Ceph health to global cluster health	2023-02-15 15:46:13 -05:00
Joshua M. Boniface	00586074cf	Modify cluster health to use new values	2023-02-15 15:45:43 -05:00
Joshua M. Boniface	f4eef30770	Add JSON health to cluster data	2023-02-15 15:26:57 -05:00
Joshua M. Boniface	8565cf26b3	Add disk monitoring plugin	2023-02-15 11:30:49 -05:00
Joshua M. Boniface	0ecf219910	Run setup during plugin loads	2023-02-15 10:11:38 -05:00
Joshua M. Boniface	0f4edc54d1	Use percentage in keepalive output	2023-02-15 01:56:02 -05:00
Joshua M. Boniface	ca91be51e1	Improve ethtool parsing speeds	2023-02-14 15:49:58 -05:00
Joshua M. Boniface	e29d0e89eb	Add NIC monitoring plugin	2023-02-14 15:43:52 -05:00
Joshua M. Boniface	14d29f2986	Adjust text on log message	2023-02-13 22:21:23 -05:00
Joshua M. Boniface	bc88d764b0	Add logging flag for monitoring plugin output	2023-02-13 22:04:39 -05:00
Joshua M. Boniface	a3c31564ca	Flip condition in EDAC check	2023-02-13 21:58:56 -05:00
Joshua M. Boniface	b07396c39a	Fix bugs if plugins fail to load	2023-02-13 21:51:48 -05:00
Joshua M. Boniface	71139fa66d	Add EDAC check plugin	2023-02-13 21:43:13 -05:00
Joshua M. Boniface	e6f9e6e0e8	Fix several bugs and optimize output	2023-02-13 16:36:15 -05:00
Joshua M. Boniface	1ea4800212	Set node health to None when restarting	2023-02-13 15:54:46 -05:00
Joshua M. Boniface	9c14d84bfc	Add node health value and send out API	2023-02-13 15:53:39 -05:00
Joshua M. Boniface	d8f346abdd	Move Ceph cluster health reporting to plugin Also removes several outputs from the normal keepalive that were superfluous/static so that the main output fits on one line.	2023-02-13 13:29:40 -05:00
Joshua M. Boniface	2ee52e44d3	Move Ceph cluster health reporting to plugin Also removes several outputs from the normal keepalive that were superfluous/static so that the main output fits on one line.	2023-02-13 12:13:56 -05:00
Joshua M. Boniface	3c742a827b	Initial implementation of monitoring plugin system	2023-02-13 12:06:26 -05:00
Joshua M. Boniface	aeb238f43c	Bump version to 0.9.61	2023-02-08 10:08:05 -05:00
Joshua M. Boniface	671a907236	Allow rename in disable state	2023-01-30 11:48:43 -05:00
Joshua M. Boniface	e945fd8590	Remove bad casting to int in string compare	2023-01-01 13:55:10 -05:00
@@ -1 +1 @@
-0.9.60
+0.9.65
				`@ -0,0 +1 @@`
				{"version": "9", "root": "", "base": {"root": "", "schema": "/schema", "schema.version": "/schema/version", "config": "/config", "config.maintenance": "/config/maintenance", "config.primary_node": "/config/primary_node", "config.primary_node.sync_lock": "/config/primary_node/sync_lock", "config.upstream_ip": "/config/upstream_ip", "config.migration_target_selector": "/config/migration_target_selector", "cmd": "/cmd", "cmd.node": "/cmd/nodes", "cmd.domain": "/cmd/domains", "cmd.ceph": "/cmd/ceph", "logs": "/logs", "node": "/nodes", "domain": "/domains", "network": "/networks", "storage": "/ceph", "storage.health": "/ceph/health", "storage.util": "/ceph/util", "osd": "/ceph/osds", "pool": "/ceph/pools", "volume": "/ceph/volumes", "snapshot": "/ceph/snapshots"}, "logs": {"node": "", "messages": "/messages"}, "node": {"name": "", "keepalive": "/keepalive", "mode": "/daemonmode", "data.active_schema": "/activeschema", "data.latest_schema": "/latestschema", "data.static": "/staticdata", "data.pvc_version": "/pvcversion", "running_domains": "/runningdomains", "count.provisioned_domains": "/domainscount", "count.networks": "/networkscount", "state.daemon": "/daemonstate", "state.router": "/routerstate", "state.domain": "/domainstate", "cpu.load": "/cpuload", "vcpu.allocated": "/vcpualloc", "memory.total": "/memtotal", "memory.used": "/memused", "memory.free": "/memfree", "memory.allocated": "/memalloc", "memory.provisioned": "/memprov", "ipmi.hostname": "/ipmihostname", "ipmi.username": "/ipmiusername", "ipmi.password": "/ipmipassword", "sriov": "/sriov", "sriov.pf": "/sriov/pf", "sriov.vf": "/sriov/vf", "monitoring.plugins": "/monitoring_plugins", "monitoring.data": "/monitoring_data", "monitoring.health": "/monitoring_health"}, "monitoring_plugin": {"name": "", "last_run": "/last_run", "health_delta": "/health_delta", "message": "/message", "data": "/data", "runtime": "/runtime"}, "sriov_pf": {"phy": "", "mtu": "/mtu", "vfcount": "/vfcount"}, "sriov_vf": {"phy": "", 
"pf": "/pf", "mtu": "/mtu", "mac": "/mac", "phy_mac": "/phy_mac", "config": "/config", "config.vlan_id": "/config/vlan_id", "config.vlan_qos": "/config/vlan_qos", "config.tx_rate_min": "/config/tx_rate_min", "config.tx_rate_max": "/config/tx_rate_max", "config.spoof_check": "/config/spoof_check", "config.link_state": "/config/link_state", "config.trust": "/config/trust", "config.query_rss": "/config/query_rss", "pci": "/pci", "pci.domain": "/pci/domain", "pci.bus": "/pci/bus", "pci.slot": "/pci/slot", "pci.function": "/pci/function", "used": "/used", "used_by": "/used_by"}, "domain": {"name": "", "xml": "/xml", "state": "/state", "profile": "/profile", "stats": "/stats", "node": "/node", "last_node": "/lastnode", "failed_reason": "/failedreason", "storage.volumes": "/rbdlist", "console.log": "/consolelog", "console.vnc": "/vnc", "meta.autostart": "/node_autostart", "meta.migrate_method": "/migration_method", "meta.node_selector": "/node_selector", "meta.node_limit": "/node_limit", "meta.tags": "/tags", "migrate.sync_lock": "/migrate_sync_lock"}, "tag": {"name": "", "type": "/type", "protected": "/protected"}, "network": {"vni": "", "type": "/nettype", "mtu": "/mtu", "rule": "/firewall_rules", "rule.in": "/firewall_rules/in", "rule.out": "/firewall_rules/out", "nameservers": "/name_servers", "domain": "/domain", "reservation": "/dhcp4_reservations", "lease": "/dhcp4_leases", "ip4.gateway": "/ip4_gateway", "ip4.network": "/ip4_network", "ip4.dhcp": "/dhcp4_flag", "ip4.dhcp_start": "/dhcp4_start", "ip4.dhcp_end": "/dhcp4_end", "ip6.gateway": "/ip6_gateway", "ip6.network": "/ip6_network", "ip6.dhcp": "/dhcp6_flag"}, "reservation": {"mac": "", "ip": "/ipaddr", "hostname": "/hostname"}, "lease": {"mac": "", "ip": "/ipaddr", "hostname": "/hostname", "expiry": "/expiry", "client_id": "/clientid"}, "rule": {"description": "", "rule": "/rule", "order": "/order"}, "osd": {"id": "", "node": "/node", "device": "/device", "db_device": "/db_device", "fsid": "/fsid", "ofsid": 
"/fsid/osd", "cfsid": "/fsid/cluster", "lvm": "/lvm", "vg": "/lvm/vg", "lv": "/lvm/lv", "stats": "/stats"}, "pool": {"name": "", "pgs": "/pgs", "tier": "/tier", "stats": "/stats"}, "volume": {"name": "", "stats": "/stats"}, "snapshot": {"name": "", "stats": "/stats"}}