Bump version to 0.9.83

Fix bad file paths
Avoid removing changelog file until the end
2023-12-01 17:37:42 -05:00 · 2023-12-01 17:25:12 -05:00 · 2023-12-01 17:23:43 -05:00 · 2023-12-01 03:58:21 -05:00 · 2023-12-01 02:01:24 -05:00 · 2023-12-01 01:53:33 -05:00
79 changed files with 2247 additions and 1375 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -1,6 +1,8 @@
 *.pyc
 *.tmp
 *.swp
+# Ignore swagger output (for pvc-docs instead)
+swagger.json
 # Ignore build artifacts
 debian/pvc-*/
 debian/*.log
--- a/.version
+++ b/.version
@@ -1 +1 @@
-0.9.81
+0.9.83
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,21 @@
 ## PVC Changelog

+###### [v0.9.83](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.83)
+
+  **Breaking Changes:** This release features a breaking change for the daemon config. A new unified "pvc.conf" file is required for all daemons (and the CLI client for Autobackup and API-on-this-host functionality), which will be written by the "pvc" role in the PVC Ansible framework. Using the "update-pvc-daemons" oneshot playbook from PVC Ansible is **required** to update to this release, as it will ensure this file is written to the proper place before deploying the new package versions, and also ensures that the old entries are cleaned up afterwards. In addition, this release fully splits the node worker and health subsystems into discrete daemons ("pvcworkerd" and "pvchealthd") and packages ("pvc-daemon-worker" and "pvc-daemon-health") respectively. The "pvc-daemon-node" package also now depends on both packages, and the "pvc-daemon-api" package can now be reliably used outside of the PVC nodes themselves (for instance, in a VM) without any strange cross-dependency issues.
+
+  * [All] Unifies all daemon (and on-node CLI task) configuration into a "pvc.conf" YAML configuration.
+  * [All] Splits the node worker subsystem into a discrete codebase and package ("pvc-daemon-worker"), still named "pvcworkerd".
+  * [All] Splits the node health subsystem into a discrete codebase and package ("pvc-daemon-health"), named "pvchealthd".
+  * [All] Improves Zookeeper node logging to avoid bugs and to support multiple simultaneous daemon writes.
+  * [All] Fixes several bugs in file logging and splits file logs by daemon.
+  * [Node Daemon] Improves several log messages to match new standards from Health daemon.
+  * [API Daemon] Reworks Celery task routing and handling to move all worker tasks to Worker daemon.
+
+###### [v0.9.82](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.82)
+
+  * [API Daemon] Fixes a bug where the Celery result_backend was not loading properly on Celery <5.2.x (Debian 10/11).
+
 ###### [v0.9.81](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.81)

  **Breaking Changes:** This large release features a number of major changes. While these should all be a seamless transition, the behaviour of several commands and the backend system for handling them has changed significantly, along with new dependencies from PVC Ansible. A full cluster configuration update via `pvc.yml` is recommended after installing this version. Redis is replaced with KeyDB on coordinator nodes as a Celery backend; this transition will be handled gracefully by the `pvc-ansible` playbooks, though note that KeyDB will be exposed on the Upstream interface. The Celery worker system is renamed `pvcworkerd`, is now active on all nodes (coordinator and non-coordinator), and is expanded to encompass several commands that previously used a similar, custom setup within the node daemons, including "pvc vm flush-locks" and all "pvc storage osd" tasks. The previously-mentioned CLI commands now all feature "--wait"/"--no-wait" flags, with wait showing a progress bar and status output of the task run. The "pvc cluster task" command can now be used for viewing all task types, replacing the previously-custom/specific "pvc provisioner status" command. All example provisioner scripts have been updated to leverage new helper functions in the Celery system; while updating these is optional, an administrator is recommended to do so for optimal log output behaviour.
--- a/api-daemon/pvcapid.sample.yaml
+++ b/api-daemon/pvcapid.sample.yaml
@@ -1,80 +0,0 @@
---
-# pvcapid configuration file example
-#
-# This configuration file specifies details for the PVC API daemon running on
-# this machine. Default values are not supported; the values in this sample
-# configuration are considered defaults and can be used as-is.
-#
-# Copy this example to /etc/pvc/pvcapid.conf and edit to your needs
-
-pvc:
-    # debug: Enable/disable API debug mode
-    debug: True
-    # coordinators: The list of cluster coordinator hostnames
-    coordinators:
-      - pvchv1
-      - pvchv2
-      - pvchv3
-    # api: Configuration of the API listener
-    api:
-        # listen_address: IP address(es) to listen on; use 0.0.0.0 for all interfaces
-        listen_address: "127.0.0.1"
-        # listen_port: TCP port to listen on, usually 7370
-        listen_port: "7370"
-        # authentication: Authentication and security settings
-        authentication:
-            # enabled: Enable or disable authentication (True/False)
-            enabled: False
-            # secret_key: Per-cluster secret key for API cookies; generate with uuidgen or pwgen
-            secret_key: ""
-            # tokens: a list of authentication tokens; leave as an empty list to disable authentication
-            tokens:
-                  # description: token description for management
-                - description: "testing"
-                  # token: random token for authentication; generate with uuidgen or pwgen
-                  token: ""
-        # ssl: SSL configuration
-        ssl:
-            # enabled: Enable or disable SSL operation (True/False)
-            enabled: False
-            # cert_file: SSL certificate file
-            cert_file: ""
-            # key_file: SSL certificate key file
-            key_file: ""
-    # provisioner: Configuration of the Provisioner API listener
-    provisioner:
-        # database: Backend database configuration
-        database:
-            # host: PostgreSQL hostname, usually 'localhost'
-            host: localhost
-            # port: PostgreSQL port, invariably '5432'
-            port: 5432
-            # name: PostgreSQL database name, invariably 'pvcapi'
-            name: pvcapi
-            # user: PostgreSQL username, invariably 'pvcapi'
-            user: pvcapi
-            # pass: PostgreSQL user password, randomly generated
-            pass: pvcapi
-        # queue: Celery backend queue using the PVC Zookeeper cluster
-        queue:
-            # host: Redis hostname, usually 'localhost'
-            host: localhost
-            # port: Redis port, invariably '6379'
-            port: 6379
-            # path: Redis queue path, invariably '/0'
-            path: /0
-        # ceph_cluster: Information about the Ceph storage cluster
-        ceph_cluster:
-            # storage_hosts: The list of hosts that the Ceph monitors are valid on; if empty (the default),
-            #                uses the list of coordinators
-            storage_hosts:
-              - pvchv1
-              - pvchv2
-              - pvchv3
-            # storage_domain: The storage domain name, concatenated with the coordinators list names
-            #                 to form monitor access strings
-            storage_domain: "pvc.storage"
-            # ceph_monitor_port: The port that the Ceph monitor on each coordinator listens on
-            ceph_monitor_port: 6789
-            # ceph_storage_secret_uuid: Libvirt secret UUID for Ceph storage access
-            ceph_storage_secret_uuid: ""
--- a/api-daemon/pvcapid.service
+++ b/api-daemon/pvcapid.service
@@ -8,7 +8,7 @@ After = network-online.target
 Type = simple
 WorkingDirectory = /usr/share/pvc
 Environment = PYTHONUNBUFFERED=true
-Environment = PVC_CONFIG_FILE=/etc/pvc/pvcapid.yaml
+Environment = PVC_CONFIG_FILE=/etc/pvc/pvc.conf
 ExecStart = /usr/share/pvc/pvcapid.py
 Restart = on-failure

--- a/api-daemon/pvcapid/Daemon.py
+++ b/api-daemon/pvcapid/Daemon.py
@@ -19,15 +19,15 @@
 #
 ###############################################################################

-import os
-import yaml

 from ssl import SSLContext, TLSVersion

 from distutils.util import strtobool as dustrtobool

+import daemon_lib.config as cfg
+
 # Daemon version
-version = "0.9.81"
+version = "0.9.83"

 # API version
 API_VERSION = 1.0
@@ -53,67 +53,11 @@ def strtobool(stringv):
 # Configuration Parsing
 ##########################################################

-# Parse the configuration file
-try:
-    pvcapid_config_file = os.environ["PVC_CONFIG_FILE"]
-except Exception:
-    print(
-        'Error: The "PVC_CONFIG_FILE" environment variable must be set before starting pvcapid.'
-    )
-    exit(1)

-print('Loading configuration from file "{}"'.format(pvcapid_config_file))
-
-# Read in the config
-try:
-    with open(pvcapid_config_file, "r") as cfgfile:
-        o_config = yaml.load(cfgfile, Loader=yaml.BaseLoader)
-except Exception as e:
-    print("ERROR: Failed to parse configuration file: {}".format(e))
-    exit(1)
-
-try:
-    # Create the config object
-    config = {
-        "debug": strtobool(o_config["pvc"]["debug"]),
-        "coordinators": o_config["pvc"]["coordinators"],
-        "listen_address": o_config["pvc"]["api"]["listen_address"],
-        "listen_port": int(o_config["pvc"]["api"]["listen_port"]),
-        "auth_enabled": strtobool(o_config["pvc"]["api"]["authentication"]["enabled"]),
-        "auth_secret_key": o_config["pvc"]["api"]["authentication"]["secret_key"],
-        "auth_tokens": o_config["pvc"]["api"]["authentication"]["tokens"],
-        "ssl_enabled": strtobool(o_config["pvc"]["api"]["ssl"]["enabled"]),
-        "ssl_key_file": o_config["pvc"]["api"]["ssl"]["key_file"],
-        "ssl_cert_file": o_config["pvc"]["api"]["ssl"]["cert_file"],
-        "database_host": o_config["pvc"]["provisioner"]["database"]["host"],
-        "database_port": int(o_config["pvc"]["provisioner"]["database"]["port"]),
-        "database_name": o_config["pvc"]["provisioner"]["database"]["name"],
-        "database_user": o_config["pvc"]["provisioner"]["database"]["user"],
-        "database_password": o_config["pvc"]["provisioner"]["database"]["pass"],
-        "queue_host": o_config["pvc"]["provisioner"]["queue"]["host"],
-        "queue_port": o_config["pvc"]["provisioner"]["queue"]["port"],
-        "queue_path": o_config["pvc"]["provisioner"]["queue"]["path"],
-        "storage_hosts": o_config["pvc"]["provisioner"]["ceph_cluster"][
-            "storage_hosts"
-        ],
-        "storage_domain": o_config["pvc"]["provisioner"]["ceph_cluster"][
-            "storage_domain"
-        ],
-        "ceph_monitor_port": o_config["pvc"]["provisioner"]["ceph_cluster"][
-            "ceph_monitor_port"
-        ],
-        "ceph_storage_secret_uuid": o_config["pvc"]["provisioner"]["ceph_cluster"][
-            "ceph_storage_secret_uuid"
-        ],
-    }
-
-    # Use coordinators as storage hosts if not explicitly specified
-    if not config["storage_hosts"]:
-        config["storage_hosts"] = config["coordinators"]
-
-except Exception as e:
-    print("ERROR: Failed to load configuration: {}".format(e))
-    exit(1)
+# Get our configuration
+config = cfg.get_configuration()
+config["daemon_name"] = "pvcapid"
+config["daemon_version"] = version


 ##########################################################
@@ -124,41 +68,43 @@ except Exception as e:
 def entrypoint():
    import pvcapid.flaskapi as pvc_api  # noqa: E402

-    if config["ssl_enabled"]:
+    if config["api_ssl_enabled"]:
        context = SSLContext()
        context.minimum_version = TLSVersion.TLSv1
        context.get_ca_certs()
-        context.load_cert_chain(config["ssl_cert_file"], keyfile=config["ssl_key_file"])
+        context.load_cert_chain(
+            config["api_ssl_cert_file"], keyfile=config["api_ssl_key_file"]
+        )
    else:
        context = None

    # Print our startup messages
    print("")
-    print("|----------------------------------------------------------|")
-    print("|                                                          |")
-    print("|           ███████████ ▜█▙      ▟█▛ █████ █ █ █           |")
-    print("|                    ██  ▜█▙    ▟█▛  ██                    |")
-    print("|           ███████████   ▜█▙  ▟█▛   ██                    |")
-    print("|           ██             ▜█▙▟█▛    ███████████           |")
-    print("|                                                          |")
-    print("|----------------------------------------------------------|")
-    print("| Parallel Virtual Cluster API daemon v{0: <19} |".format(version))
-    print("| Debug: {0: <49} |".format(str(config["debug"])))
-    print("| API version: v{0: <42} |".format(API_VERSION))
+    print("|--------------------------------------------------------------|")
+    print("|                                                              |")
+    print("|             ███████████ ▜█▙      ▟█▛ █████ █ █ █             |")
+    print("|                      ██  ▜█▙    ▟█▛  ██                      |")
+    print("|             ███████████   ▜█▙  ▟█▛   ██                      |")
+    print("|             ██             ▜█▙▟█▛    ███████████             |")
+    print("|                                                              |")
+    print("|--------------------------------------------------------------|")
+    print("| Parallel Virtual Cluster API daemon v{0: <23} |".format(version))
+    print("| Debug: {0: <53} |".format(str(config["debug"])))
+    print("| API version: v{0: <46} |".format(API_VERSION))
    print(
-        "| Listen: {0: <48} |".format(
-            "{}:{}".format(config["listen_address"], config["listen_port"])
+        "| Listen: {0: <52} |".format(
+            "{}:{}".format(config["api_listen_address"], config["api_listen_port"])
        )
    )
-    print("| SSL: {0: <51} |".format(str(config["ssl_enabled"])))
-    print("| Authentication: {0: <40} |".format(str(config["auth_enabled"])))
-    print("|----------------------------------------------------------|")
+    print("| SSL: {0: <55} |".format(str(config["api_ssl_enabled"])))
+    print("| Authentication: {0: <44} |".format(str(config["api_auth_enabled"])))
+    print("|--------------------------------------------------------------|")
    print("")

    pvc_api.celery_startup()
    pvc_api.app.run(
-        config["listen_address"],
-        config["listen_port"],
+        config["api_listen_address"],
+        config["api_listen_port"],
        threaded=True,
        ssl_context=context,
    )
--- a/api-daemon/pvcapid/flaskapi.py
+++ b/api-daemon/pvcapid/flaskapi.py
@@ -31,25 +31,12 @@ from uuid import uuid4
 from daemon_lib.common import getPrimaryNode
 from daemon_lib.zkhandler import ZKConnection
 from daemon_lib.node import get_list as get_node_list
-from daemon_lib.vm import (
-    vm_worker_flush_locks,
-    vm_worker_attach_device,
-    vm_worker_detach_device,
-)
-from daemon_lib.ceph import (
-    osd_worker_add_osd,
-    osd_worker_replace_osd,
-    osd_worker_refresh_osd,
-    osd_worker_remove_osd,
-    osd_worker_add_db_vg,
-)
+from daemon_lib.benchmark import list_benchmarks

 from pvcapid.Daemon import config, strtobool, API_VERSION

 import pvcapid.helper as api_helper
 import pvcapid.provisioner as api_provisioner
-import pvcapid.vmbuilder as api_vmbuilder
-import pvcapid.benchmark as api_benchmark
 import pvcapid.ova as api_ova

 from flask_sqlalchemy import SQLAlchemy
@@ -61,11 +48,11 @@ app = flask.Flask(__name__)
 # Set up SQLAlchemy backend
 app.config["SQLALCHEMY_TRACK_MODIFICATIONS"] = False
 app.config["SQLALCHEMY_DATABASE_URI"] = "postgresql://{}:{}@{}:{}/{}".format(
-    config["database_user"],
-    config["database_password"],
-    config["database_host"],
-    config["database_port"],
-    config["database_name"],
+    config["api_postgresql_user"],
+    config["api_postgresql_password"],
+    config["api_postgresql_host"],
+    config["api_postgresql_port"],
+    config["api_postgresql_dbname"],
 )

 if config["debug"]:
@@ -73,8 +60,8 @@ if config["debug"]:
 else:
    app.config["DEBUG"] = False

-if config["auth_enabled"]:
-    app.config["SECRET_KEY"] = config["auth_secret_key"]
+if config["api_auth_enabled"]:
+    app.config["SECRET_KEY"] = config["api_auth_secret_key"]

 # Create SQLAlchemy database
 db = SQLAlchemy(app)
@@ -99,41 +86,34 @@ def get_primary_node(zkhandler):
    return getPrimaryNode(zkhandler)


-# Set up Celery queue routing
-def route_task(name, args, kwargs, options, task=None, **kw):
-    print("----")
-    print(f"Incoming Celery task: '{name}' with args {args}, kwargs {kwargs}")
+# Set up Celery task ID generator
+# 1. Lets us make our own IDs (first section of UUID)
+# 2. Lets us distribute jobs to the required pvcworkerd instances
+def run_celery_task(task_name, **kwargs):
+    task_id = str(uuid4()).split("-")[0]

-    # If an explicit routing_key is set and it's in the kwargs of the function, use it to set the queue
-    if options["routing_key"] != "default" and options["routing_key"] in kwargs.keys():
-        run_on = kwargs[options["routing_key"]]
-        if run_on == "primary":
-            run_on = get_primary_node()
-    # Otherwise, use the primary node
+    if "run_on" in kwargs and kwargs["run_on"] != "primary":
+        run_on = kwargs["run_on"]
    else:
        run_on = get_primary_node()

-    print(f"Selected Celery worker: {run_on}")
-    print("----")
-
-    return run_on
-
-
-# Set up Celery task ID generator
-# WHY? We don't want to use UUIDs; they're too long and cumbersome. Instead, use a shorter partial UUID.
-def run_celery_task(task_def, **kwargs):
-    task_id = str(uuid4()).split("-")[0]
-    task = task_def.apply_async(
-        (),
-        kwargs,
-        task_id=task_id,
+    print(
+        f"Incoming pvcworkerd task: '{task_name}' ({task_id}) assigned to worker {run_on} with args {kwargs}"
    )
+
+    task = celery.send_task(
+        task_name,
+        task_id=task_id,
+        kwargs=kwargs,
+        queue=run_on,
+    )
+
    return task


 # Create celery definition
 celery_task_uri = "redis://{}:{}{}".format(
-    config["queue_host"], config["queue_port"], config["queue_path"]
+    config["keydb_host"], config["keydb_port"], config["keydb_path"]
 )
 celery = Celery(
    app.name,
@@ -141,15 +121,18 @@ celery = Celery(
    result_backend=celery_task_uri,
    result_extended=True,
 )
+app.config["broker_url"] = celery_task_uri
+app.config["result_backend"] = celery_task_uri
+celery.conf.update(app.config)


 def celery_startup():
-    app.config["CELERY_broker_url"] = celery_task_uri
-    app.config["result_backend"] = celery_task_uri
+    """
+    Runs when the API daemon starts, but not the Celery workers or the API doc generator
+    """
    app.config["task_queues"] = tuple(
        [Queue(h, routing_key=f"{h}.#") for h in get_all_nodes()]
    )
-    app.config["task_routes"] = (route_task,)
    celery.conf.update(app.config)


@@ -195,7 +178,7 @@ def Authenticator(function):
    @wraps(function)
    def authenticate(*args, **kwargs):
        # No authentication required
-        if not config["auth_enabled"]:
+        if not config["api_auth_enabled"]:
            return function(*args, **kwargs)
        # Session-based authentication
        if "token" in flask.session:
@@ -204,7 +187,7 @@ def Authenticator(function):
        if "X-Api-Key" in flask.request.headers:
            if any(
                token
-                for token in config["auth_tokens"]
+                for token in config["api_auth_tokens"]
                if flask.request.headers.get("X-Api-Key") == token.get("token")
            ):
                return function(*args, **kwargs)
@@ -216,171 +199,6 @@ def Authenticator(function):
    return authenticate


-#
-# Job functions
-#
-@celery.task(name="provisioner.create", bind=True, routing_key="run_on")
-def create_vm(
-    self,
-    vm_name=None,
-    profile_name=None,
-    define_vm=True,
-    start_vm=True,
-    script_run_args=[],
-    run_on="primary",
-):
-    return api_vmbuilder.create_vm(
-        self,
-        vm_name,
-        profile_name,
-        define_vm=define_vm,
-        start_vm=start_vm,
-        script_run_args=script_run_args,
-    )
-
-
-@celery.task(name="storage.benchmark", bind=True, routing_key="run_on")
-def run_benchmark(self, pool=None, run_on="primary"):
-    return api_benchmark.run_benchmark(self, pool)
-
-
-@celery.task(name="vm.flush_locks", bind=True, routing_key="run_on")
-def vm_flush_locks(self, domain=None, force_unlock=False, run_on="primary"):
-    @ZKConnection(config)
-    def run_vm_flush_locks(zkhandler, self, domain, force_unlock=False):
-        return vm_worker_flush_locks(zkhandler, self, domain, force_unlock=force_unlock)
-
-    return run_vm_flush_locks(self, domain, force_unlock=force_unlock)
-
-
-@celery.task(name="vm.device_attach", bind=True, routing_key="run_on")
-def vm_device_attach(self, domain=None, xml=None, run_on=None):
-    @ZKConnection(config)
-    def run_vm_device_attach(zkhandler, self, domain, xml):
-        return vm_worker_attach_device(zkhandler, self, domain, xml)
-
-    return run_vm_device_attach(self, domain, xml)
-
-
-@celery.task(name="vm.device_detach", bind=True, routing_key="run_on")
-def vm_device_detach(self, domain=None, xml=None, run_on=None):
-    @ZKConnection(config)
-    def run_vm_device_detach(zkhandler, self, domain, xml):
-        return vm_worker_detach_device(zkhandler, self, domain, xml)
-
-    return run_vm_device_detach(self, domain, xml)
-
-
-@celery.task(name="osd.add", bind=True, routing_key="run_on")
-def osd_add(
-    self,
-    device=None,
-    weight=None,
-    ext_db_ratio=None,
-    ext_db_size=None,
-    split_count=None,
-    run_on=None,
-):
-    @ZKConnection(config)
-    def run_osd_add(
-        zkhandler,
-        self,
-        run_on,
-        device,
-        weight,
-        ext_db_ratio=None,
-        ext_db_size=None,
-        split_count=None,
-    ):
-        return osd_worker_add_osd(
-            zkhandler,
-            self,
-            run_on,
-            device,
-            weight,
-            ext_db_ratio,
-            ext_db_size,
-            split_count,
-        )
-
-    return run_osd_add(
-        self, run_on, device, weight, ext_db_ratio, ext_db_size, split_count
-    )
-
-
-@celery.task(name="osd.replace", bind=True, routing_key="run_on")
-def osd_replace(
-    self,
-    osd_id=None,
-    new_device=None,
-    old_device=None,
-    weight=None,
-    ext_db_ratio=None,
-    ext_db_size=None,
-    run_on=None,
-):
-    @ZKConnection(config)
-    def run_osd_replace(
-        zkhandler,
-        self,
-        run_on,
-        osd_id,
-        new_device,
-        old_device=None,
-        weight=None,
-        ext_db_ratio=None,
-        ext_db_size=None,
-    ):
-        return osd_worker_replace_osd(
-            zkhandler,
-            self,
-            run_on,
-            osd_id,
-            new_device,
-            old_device,
-            weight,
-            ext_db_ratio,
-            ext_db_size,
-        )
-
-    return run_osd_replace(
-        self, run_on, osd_id, new_device, old_device, weight, ext_db_ratio, ext_db_size
-    )
-
-
-@celery.task(name="osd.refresh", bind=True, routing_key="run_on")
-def osd_refresh(self, osd_id=None, device=None, ext_db_flag=False, run_on=None):
-    @ZKConnection(config)
-    def run_osd_refresh(zkhandler, self, run_on, osd_id, device, ext_db_flag=False):
-        return osd_worker_refresh_osd(
-            zkhandler, self, run_on, osd_id, device, ext_db_flag
-        )
-
-    return run_osd_refresh(self, run_on, osd_id, device, ext_db_flag)
-
-
-@celery.task(name="osd.remove", bind=True, routing_key="run_on")
-def osd_remove(self, osd_id=None, force_flag=False, skip_zap_flag=False, run_on=None):
-    @ZKConnection(config)
-    def run_osd_remove(
-        zkhandler, self, run_on, osd_id, force_flag=False, skip_zap_flag=False
-    ):
-        return osd_worker_remove_osd(
-            zkhandler, self, run_on, osd_id, force_flag, skip_zap_flag
-        )
-
-    return run_osd_remove(self, run_on, osd_id, force_flag, skip_zap_flag)
-
-
-@celery.task(name="osd.add_db_vg", bind=True, routing_key="run_on")
-def osd_add_db_vg(self, device=None, run_on=None):
-    @ZKConnection(config)
-    def run_osd_add_db_vg(zkhandler, self, run_on, device):
-        return osd_worker_add_db_vg(zkhandler, self, run_on, device)
-
-    return run_osd_add_db_vg(self, run_on, device)
-
-
 ##########################################################
 # API Root/Authentication
 ##########################################################
@@ -465,12 +283,12 @@ class API_Login(Resource):
              type: object
              id: Message
        """
-        if not config["auth_enabled"]:
+        if not config["api_auth_enabled"]:
            return flask.redirect(Api.url_for(api, API_Root))

        if any(
            token
-            for token in config["auth_tokens"]
+            for token in config["api_auth_tokens"]
            if flask.request.values["token"] in token["token"]
        ):
            flask.session["token"] = flask.request.form["token"]
@@ -499,7 +317,7 @@ class API_Logout(Resource):
          302:
            description: Authentication disabled
        """
-        if not config["auth_enabled"]:
+        if not config["api_auth_enabled"]:
            return flask.redirect(Api.url_for(api, API_Root))

        flask.session.pop("token", None)
@@ -2455,7 +2273,7 @@ class API_VM_Locks(Resource):
        else:
            return vm_node_detail, retcode

-        task = run_celery_task(vm_flush_locks, domain=vm, run_on=vm_node)
+        task = run_celery_task("vm.flush_locks", domain=vm, run_on=vm_node)

        return (
            {"task_id": task.id, "task_name": "vm.flush_locks", "run_on": vm_node},
@@ -2590,7 +2408,7 @@ class API_VM_Device(Resource):
        else:
            return vm_node_detail, retcode

-        task = run_celery_task(vm_device_attach, domain=vm, xml=xml, run_on=vm_node)
+        task = run_celery_task("vm.device_attach", domain=vm, xml=xml, run_on=vm_node)

        return (
            {"task_id": task.id, "task_name": "vm.device_attach", "run_on": vm_node},
@@ -2644,7 +2462,7 @@ class API_VM_Device(Resource):
        else:
            return vm_node_detail, retcode

-        task = run_celery_task(vm_device_detach, domain=vm, xml=xml, run_on=vm_node)
+        task = run_celery_task("vm.device_detach", domain=vm, xml=xml, run_on=vm_node)

        return (
            {"task_id": task.id, "task_name": "vm.device_detach", "run_on": vm_node},
@@ -4360,7 +4178,7 @@ class API_Storage_Ceph_Benchmark(Resource):
                              type: string (integer)
                              description: The number of minor page faults during the test
        """
-        return api_benchmark.list_benchmarks(reqargs.get("job", None))
+        return list_benchmarks(config, reqargs.get("job", None))

    @RequestParser(
        [
@@ -4401,7 +4219,7 @@ class API_Storage_Ceph_Benchmark(Resource):
            }, 400

        task = run_celery_task(
-            run_benchmark, pool=reqargs.get("pool", None), run_on="primary"
+            "storage.benchmark", pool=reqargs.get("pool", None), run_on="primary"
        )
        return (
            {
@@ -4527,7 +4345,7 @@ class API_Storage_Ceph_OSDDB_Root(Resource):
        node = reqargs.get("node", None)

        task = run_celery_task(
-            osd_add_db_vg, device=reqargs.get("device", None), run_on=node
+            "osd.add_db_vg", device=reqargs.get("device", None), run_on=node
        )

        return (
@@ -4726,7 +4544,7 @@ class API_Storage_Ceph_OSD_Root(Resource):
        node = reqargs.get("node", None)

        task = run_celery_task(
-            osd_add,
+            "osd.add",
            device=reqargs.get("device", None),
            weight=reqargs.get("weight", None),
            ext_db_ratio=reqargs.get("ext_db_ratio", None),
@@ -4845,7 +4663,7 @@ class API_Storage_Ceph_OSD_Element(Resource):
            return osd_node_detail, retcode

        task = run_celery_task(
-            osd_replace,
+            "osd.replace",
            osd_id=osdid,
            new_device=reqargs.get("new_device"),
            old_device=reqargs.get("old_device", None),
@@ -4903,7 +4721,7 @@ class API_Storage_Ceph_OSD_Element(Resource):
            return osd_node_detail, retcode

        task = run_celery_task(
-            osd_refresh,
+            "osd.refresh",
            osd_id=osdid,
            device=reqargs.get("device", None),
            ext_db_flag=False,
@@ -4974,7 +4792,7 @@ class API_Storage_Ceph_OSD_Element(Resource):
            return osd_node_detail, retcode

        task = run_celery_task(
-            osd_remove,
+            "osd.remove",
            osd_id=osdid,
            force_flag=reqargs.get("force", False),
            run_on=node,
@@ -8503,7 +8321,7 @@ class API_Provisioner_Create_Root(Resource):
            start_vm = False

        task = run_celery_task(
-            create_vm,
+            "provisioner.create",
            vm_name=reqargs.get("name", None),
            profile_name=reqargs.get("profile", None),
            define_vm=define_vm,
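Review note on the reworked `run_celery_task` in flaskapi.py: before handing the task to Celery, it derives a short task ID from the first section of a UUID4 and picks the target queue from `run_on`, falling back to the primary node. The following is a minimal, Celery-free sketch of only that selection logic. The names `short_task_id` and `select_queue`, and the `primary_node` argument, are hypothetical and exist only for illustration; they are not part of the PVC codebase.

```python
from uuid import uuid4


def short_task_id():
    """Return the first section of a random UUID4 (8 hex characters)."""
    return str(uuid4()).split("-")[0]


def select_queue(task_kwargs, primary_node):
    """Pick the worker queue the same way the diff's run_celery_task does.

    primary_node stands in for the result of get_primary_node(); passing it
    in is an assumption made so the sketch runs without Zookeeper.
    """
    if "run_on" in task_kwargs and task_kwargs["run_on"] != "primary":
        return task_kwargs["run_on"]
    return primary_node


if __name__ == "__main__":
    print(short_task_id())
    print(select_queue({"domain": "vm1", "run_on": "hv2"}, "hv1"))
    print(select_queue({"pool": "vms", "run_on": "primary"}, "hv1"))
```

As in the diff, a `run_on` value of `None` is passed through unchanged rather than falling back to the primary node, which may be worth checking for the endpoints that take `node` from `reqargs.get("node", None)`.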
--- a/api-daemon/pvcapid/ova.py
+++ b/api-daemon/pvcapid/ova.py
@@ -48,11 +48,11 @@ import pvcapid.provisioner as provisioner
 # Database connections
 def open_database(config):
    conn = psycopg2.connect(
-        host=config["database_host"],
-        port=config["database_port"],
-        dbname=config["database_name"],
-        user=config["database_user"],
-        password=config["database_password"],
+        host=config["api_postgresql_host"],
+        port=config["api_postgresql_port"],
+        dbname=config["api_postgresql_dbname"],
+        user=config["api_postgresql_user"],
+        password=config["api_postgresql_password"],
    )
    cur = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
    return conn, cur
--- a/api-daemon/pvcapid/provisioner.py
+++ b/api-daemon/pvcapid/provisioner.py
@@ -63,11 +63,11 @@ class ProvisioningError(Exception):
 # Database connections
 def open_database(config):
    conn = psycopg2.connect(
-        host=config["database_host"],
-        port=config["database_port"],
-        dbname=config["database_name"],
-        user=config["database_user"],
-        password=config["database_password"],
+        host=config["api_postgresql_host"],
+        port=config["api_postgresql_port"],
+        dbname=config["api_postgresql_dbname"],
+        user=config["api_postgresql_user"],
+        password=config["api_postgresql_password"],
    )
    cur = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
    return conn, cur
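Review note on the config key renames: the API daemon files above switch from the old flat key names to prefixed ones returned by `cfg.get_configuration()`. The sketch below collects the renames visible in this diff into one table, which may help when checking out-of-tree code that still reads the old keys. `RENAMED_KEYS` and `translate_config` are hypothetical helpers for illustration only; the daemon itself does not do any such translation, and the list covers only keys that appear in the hunks shown here.

```python
# Old -> new config key names, as seen in the Daemon.py, flaskapi.py and
# provisioner.py hunks of this diff. Not an exhaustive list.
RENAMED_KEYS = {
    "listen_address": "api_listen_address",
    "listen_port": "api_listen_port",
    "auth_enabled": "api_auth_enabled",
    "auth_secret_key": "api_auth_secret_key",
    "auth_tokens": "api_auth_tokens",
    "ssl_enabled": "api_ssl_enabled",
    "ssl_cert_file": "api_ssl_cert_file",
    "ssl_key_file": "api_ssl_key_file",
    "database_host": "api_postgresql_host",
    "database_port": "api_postgresql_port",
    "database_name": "api_postgresql_dbname",
    "database_user": "api_postgresql_user",
    "database_password": "api_postgresql_password",
    "queue_host": "keydb_host",
    "queue_port": "keydb_port",
    "queue_path": "keydb_path",
}


def translate_config(old_config):
    """Return a copy of old_config with renamed keys; other keys pass through."""
    return {RENAMED_KEYS.get(key, key): value for key, value in old_config.items()}


if __name__ == "__main__":
    old = {"debug": True, "ssl_enabled": False, "database_name": "pvcapi"}
    print(translate_config(old))
```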
--- a/api-daemon/pvcworkerd.service
+++ b/api-daemon/pvcworkerd.service
@@ -1,16 +0,0 @@
-# Parallel Virtual Cluster Celery Worker daemon unit file
-
-[Unit]
-Description = Parallel Virtual Cluster Celery Worker daemon
-After = network-online.target
-
-[Service]
-Type = simple
-WorkingDirectory = /usr/share/pvc
-Environment = PYTHONUNBUFFERED=true
-Environment = PVC_CONFIG_FILE=/etc/pvc/pvcapid.yaml
-ExecStart = /usr/share/pvc/pvcworkerd.sh
-Restart = on-failure
-
-[Install]
-WantedBy = multi-user.target
--- a/build-and-deploy.sh
+++ b/build-and-deploy.sh
@@ -42,23 +42,33 @@ echo " done. Package version ${version}."

 # Install the client(s) locally
 echo -n "Installing client packages locally..."
-$SUDO dpkg -i ../pvc-client*_${version}*.deb &>/dev/null
+$SUDO dpkg -i --force-all ../pvc-client*_${version}*.deb &>/dev/null
 echo " done".

 for HOST in ${HOSTS[@]}; do
-    echo "> Deploying packages to host ${HOST}"
-    echo -n "Copying packages..."
+    echo -n "Copying packages to host ${HOST}..."
    ssh $HOST $SUDO rm -rf /tmp/pvc &>/dev/null
    ssh $HOST mkdir /tmp/pvc &>/dev/null
    scp ../pvc-*_${version}*.deb $HOST:/tmp/pvc/ &>/dev/null
    echo " done."
+done
+if [[ -z ${KEEP_ARTIFACTS} ]]; then
+    rm ../pvc*_${version}*
+fi
+
+for HOST in ${HOSTS[@]}; do
+    echo "> Deploying packages to host ${HOST}"
    echo -n "Installing packages..."
-    ssh $HOST $SUDO dpkg -i /tmp/pvc/{pvc-client-cli,pvc-daemon-common,pvc-daemon-api,pvc-daemon-node}*.deb &>/dev/null
+    ssh $HOST $SUDO dpkg -i --force-all /tmp/pvc/*.deb &>/dev/null
    ssh $HOST rm -rf /tmp/pvc &>/dev/null
    echo " done."
    echo -n "Restarting PVC daemons..."
    ssh $HOST $SUDO systemctl restart pvcapid &>/dev/null
+    sleep 2
    ssh $HOST $SUDO systemctl restart pvcworkerd &>/dev/null
+    sleep 2
+    ssh $HOST $SUDO systemctl restart pvchealthd &>/dev/null
+    sleep 2
    ssh $HOST $SUDO systemctl restart pvcnoded &>/dev/null
    echo " done."
    echo -n "Waiting for node daemon to be running..."
@ -68,8 +78,5 @@ for HOST in ${HOSTS[@]}; do
    done
    echo " done."
 done
-if [[ -z ${KEEP_ARTIFACTS} ]]; then
-    rm ../pvc*_${version}*
-fi

 popd &>/dev/null
--- a/build-unstable-deb.sh
+++ b/build-unstable-deb.sh
@ -13,9 +13,11 @@ echo ${new_ver} >&3
 tmpdir=$( mktemp -d )
 cp -a debian/changelog client-cli/setup.py ${tmpdir}/
 cp -a node-daemon/pvcnoded/Daemon.py ${tmpdir}/node-Daemon.py
+cp -a health-daemon/pvchealthd/Daemon.py ${tmpdir}/health-Daemon.py
+cp -a worker-daemon/pvcworkerd/Daemon.py ${tmpdir}/worker-Daemon.py
 cp -a api-daemon/pvcapid/Daemon.py ${tmpdir}/api-Daemon.py
 # Replace the "base" version with the git revision version
-sed -i "s/version = \"${base_ver}\"/version = \"${new_ver}\"/" node-daemon/pvcnoded/Daemon.py api-daemon/pvcapid/Daemon.py client-cli/setup.py
+sed -i "s/version = \"${base_ver}\"/version = \"${new_ver}\"/" node-daemon/pvcnoded/Daemon.py health-daemon/pvchealthd/Daemon.py worker-daemon/pvcworkerd/Daemon.py api-daemon/pvcapid/Daemon.py client-cli/setup.py
 sed -i "s/${base_ver}-0/${new_ver}/" debian/changelog 
 cat <<EOF > debian/changelog
 pvc (${new_ver}) unstable; urgency=medium
@ -33,6 +35,8 @@ dpkg-buildpackage -us -uc
 cp -a ${tmpdir}/changelog debian/changelog
 cp -a ${tmpdir}/setup.py client-cli/setup.py
 cp -a ${tmpdir}/node-Daemon.py node-daemon/pvcnoded/Daemon.py
+cp -a ${tmpdir}/health-Daemon.py health-daemon/pvchealthd/Daemon.py
+cp -a ${tmpdir}/worker-Daemon.py worker-daemon/pvcworkerd/Daemon.py
 cp -a ${tmpdir}/api-Daemon.py api-daemon/pvcapid/Daemon.py

 # Clean up
--- a/7
+++ b/7
@ -17,9 +17,10 @@ echo "# Write the changelog below; comments will be ignored" >> ${changelog_file
 $EDITOR ${changelog_file}

 changelog="$( cat ${changelog_file} | grep -v '^#' | sed 's/^*/  */' )"
-rm ${changelog_file}

 sed -i "s,version = \"${current_version}\",version = \"${new_version}\"," node-daemon/pvcnoded/Daemon.py
+sed -i "s,version = \"${current_version}\",version = \"${new_version}\"," health-daemon/pvchealthd/Daemon.py
+sed -i "s,version = \"${current_version}\",version = \"${new_version}\"," worker-daemon/pvcworkerd/Daemon.py
 sed -i "s,version = \"${current_version}\",version = \"${new_version}\"," api-daemon/pvcapid/Daemon.py
 sed -i "s,version=\"${current_version}\",version=\"${new_version}\"," client-cli/setup.py
 echo ${new_version} > .version
@ -46,11 +47,13 @@ echo -e "${deb_changelog_new}" >> ${deb_changelog_file}
 echo -e "${deb_changelog_orig}" >> ${deb_changelog_file}
 mv ${deb_changelog_file} debian/changelog

-git add node-daemon/pvcnoded/Daemon.py api-daemon/pvcapid/Daemon.py client-cli/setup.py debian/changelog CHANGELOG.md .version
+git add node-daemon/pvcnoded/Daemon.py health-daemon/pvchealthd/Daemon.py worker-daemon/pvcworkerd/Daemon.py api-daemon/pvcapid/Daemon.py client-cli/setup.py debian/changelog CHANGELOG.md .version
 git commit -v

 popd &>/dev/null

+rm ${changelog_file}
+
 echo
 echo "Release message:"
 echo
--- a/client-cli/autobackup.sample.yaml
+++ b/client-cli/autobackup.sample.yaml
@ -1,52 +0,0 @@
---
-# Root level configuration key
-autobackup:
-
-  # Backup root path on the node, used as the remote mountpoint
-  # Must be an absolute path beginning with '/'
-  # If remote_mount is enabled, the remote mount will be mounted on this directory
-  # If remote_mount is enabled, it is recommended to use a path under `/tmp` for this
-  # If remote_mount is disabled, a real filesystem must be mounted here (PVC system volumes are small!)
-  backup_root_path: "/tmp/backups"
-
-  # Suffix to the backup root path, used to allow multiple PVC systems to write to a single root path
-  # Must begin with '/'; leave empty to use the backup root path directly
-  # Note that most remote mount options can fake this if needed, but provided to ensure local compatability
-  backup_root_suffix: "/mycluster"
-
-  # VM tag(s) to back up
-  # Only VMs with at least one of the given tag(s) will be backed up; all others will be skipped
-  backup_tags:
-    - "backup"
-    - "mytag"
-
-  # Backup schedule: when and what format to take backups
-  backup_schedule:
-    full_interval:  7  # Number of total backups between full backups; others are incremental
-                       # > If this number is 1, every backup will be a full backup and no incremental
-                       #   backups will be taken
-                       # > If this number is 2, every second backup will be a full backup, etc.
-    full_retention: 2  # Keep this many full backups; the oldest will be deleted when a new one is
-                       # taken, along with all child incremental backups of that backup
-                       # > Should usually be at least 2 when using incrementals (full_interval > 1) to
-                       #   avoid there being too few backups after cleanup from a new full backup
-
-  # Automatic mount settings
-  # These settings permit running an arbitrary set of commands, ideally a "mount" command or similar, to
-  # ensure that a remote filesystem is mounted on the backup root path
-  # While the examples here show absolute paths, that is not required; they will run with the $PATH of the
-  # executing environment (either the "pvc" command on a CLI or a cron/systemd timer)
-  # A "{backup_root_path}" f-string/str.format type variable MAY be present in any cmds string to represent
-  # the above configured root backup path, which is interpolated at runtime
-  # If multiple commands are given, they will be executed in the order given; if no commands are given,
-  # nothing is executed, but the keys MUST be present
-  auto_mount:
-    enabled: no  # Enable automatic mount/unmount support
-    # These commands are executed at the start of the backup run and should mount a filesystem
-    mount_cmds:
-      # This example shows an NFS mount leveraging the backup_root_path variable
-      - "/usr/sbin/mount.nfs -o nfsvers=3 10.0.0.10:/backups {backup_root_path}"
-    # These commands are executed at the end of the backup run and should unmount a filesystem
-    unmount_cmds:
-      # This example shows a generic umount leveraging the backup_root_path variable
-      - "/usr/bin/umount {backup_root_path}"
--- a/client-cli/pvc/cli/cli.py
+++ b/client-cli/pvc/cli/cli.py
@ -5960,8 +5960,8 @@ def cli(
      "PVC_COLOUR": Force colour on the output even if Click determines it is not a console (e.g. with 'watch')

    If a "-c"/"--connection"/"PVC_CONNECTION" is not specified, the CLI will attempt to read a "local" connection
-    from the API configuration at "/etc/pvc/pvcapid.yaml". If no such configuration is found, the command will
-    abort with an error. This applies to all commands except those under "connection".
+    from the API configuration at "/etc/pvc/pvc.conf". If no such configuration is found, the command will abort
+    with an error. This applies to all commands except those under "connection".
    """

    global CLI_CONFIG
--- a/client-cli/pvc/cli/helpers.py
+++ b/client-cli/pvc/cli/helpers.py
@ -33,18 +33,18 @@ from subprocess import run, PIPE
 from sys import argv
 from syslog import syslog, openlog, closelog, LOG_AUTH
 from yaml import load as yload
-from yaml import BaseLoader, SafeLoader
+from yaml import SafeLoader

 import pvc.lib.provisioner
 import pvc.lib.vm
 import pvc.lib.node


-DEFAULT_STORE_DATA = {"cfgfile": "/etc/pvc/pvcapid.yaml"}
+DEFAULT_STORE_DATA = {"cfgfile": "/etc/pvc/pvc.conf"}
 DEFAULT_STORE_FILENAME = "pvc.json"
 DEFAULT_API_PREFIX = "/api/v1"
 DEFAULT_NODE_HOSTNAME = gethostname().split(".")[0]
-DEFAULT_AUTOBACKUP_FILENAME = "/etc/pvc/autobackup.yaml"
+DEFAULT_AUTOBACKUP_FILENAME = "/etc/pvc/pvc.conf"
 MAX_CONTENT_WIDTH = 120


@ -88,14 +88,15 @@ def read_config_from_yaml(cfgfile):

    try:
        with open(cfgfile) as fh:
-            api_config = yload(fh, Loader=BaseLoader)["pvc"]["api"]
+            api_config = yload(fh, Loader=SafeLoader)["api"]

-        host = api_config["listen_address"]
-        port = api_config["listen_port"]
-        scheme = "https" if strtobool(api_config["ssl"]["enabled"]) else "http"
+        host = api_config["listen"]["address"]
+        port = api_config["listen"]["port"]
+        scheme = "https" if api_config["ssl"]["enabled"] else "http"
        api_key = (
-            api_config["authentication"]["tokens"][0]["token"]
-            if strtobool(api_config["authentication"]["enabled"])
+            api_config["token"][0]["token"]
+            if api_config["authentication"]["enabled"]
+            and api_config["authentication"]["source"] == "token"
            else None
        )
    except KeyError:
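
Illustration of the helpers.py change above: with SafeLoader, YAML booleans arrive as native Python bools, so the strtobool() calls are no longer needed. A minimal sketch, assuming the "api" section has already been parsed into a dict; the function name connection_from_api_config and the sample values are hypothetical and not part of this commit.

```python
def connection_from_api_config(api_config):
    """Derive (host, port, scheme, api_key) from a parsed "api" section."""
    host = api_config["listen"]["address"]
    port = api_config["listen"]["port"]
    # SafeLoader yields real booleans, so a plain truth test is enough
    scheme = "https" if api_config["ssl"]["enabled"] else "http"
    auth = api_config["authentication"]
    # Only use a token when authentication is on and sourced from tokens
    api_key = (
        api_config["token"][0]["token"]
        if auth["enabled"] and auth["source"] == "token"
        else None
    )
    return host, port, scheme, api_key


# Hypothetical sample mirroring the key layout read by the diff above
sample_api = {
    "listen": {"address": "127.0.0.1", "port": 7370},
    "ssl": {"enabled": True},
    "authentication": {"enabled": True, "source": "token"},
    "token": [{"description": "example", "token": "s3cret"}],
}
result = connection_from_api_config(sample_api)
```

A missing key raises KeyError here, which matches the "except KeyError" handling in the hunk above.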
--- a/client-cli/pvc/lib/node.py
+++ b/client-cli/pvc/lib/node.py
@ -21,6 +21,8 @@

 import time

+from collections import deque
+
 import pvc.lib.ansiprint as ansiprint
 from pvc.lib.common import call_api

@ -107,7 +109,9 @@ def follow_node_log(config, node, lines=10):
    API schema: {"name":"{nodename}","data":"{node_log}"}
    """
    # We always grab 200 to match the follow call, but only _show_ `lines` number
-    params = {"lines": 200}
+    max_lines = 200
+
+    params = {"lines": max_lines}
    response = call_api(
        config, "get", "/node/{node}/log".format(node=node), params=params
    )
@ -117,43 +121,50 @@ def follow_node_log(config, node, lines=10):

    # Shrink the log buffer to length lines
    node_log = response.json()["data"]
-    shrunk_log = node_log.split("\n")[-int(lines) :]
-    loglines = "\n".join(shrunk_log)
+    full_log = node_log.split("\n")
+    shrunk_log = full_log[-int(lines) :]

    # Print the initial data and begin following
-    print(loglines, end="")
-    print("\n", end="")
+    for line in shrunk_log:
+        print(line)
+
+    # Create the deque we'll use to buffer loglines
+    loglines = deque(full_log, max_lines)

    while True:
+        # Wait half a second
+        time.sleep(0.5)
+
        # Grab the next line set (200 is a reasonable number of lines per half-second; any more are skipped)
        try:
-            params = {"lines": 200}
+            params = {"lines": max_lines}
            response = call_api(
                config, "get", "/node/{node}/log".format(node=node), params=params
            )
            new_node_log = response.json()["data"]
        except Exception:
            break
+
        # Split the new and old log strings into constituent lines
-        old_node_loglines = node_log.split("\n")
        new_node_loglines = new_node_log.split("\n")

-        # Set the node log to the new log value for the next iteration
-        node_log = new_node_log
+        # Find where in the new lines the last entry in the ongoing deque is
+        start_idx = 0
+        for idx, line in enumerate(new_node_loglines):
+            if line == loglines[-1]:
+                start_idx = idx

-        # Get the difference between the two sets of lines
-        old_node_loglines_set = set(old_node_loglines)
-        diff_node_loglines = [
-            x for x in new_node_loglines if x not in old_node_loglines_set
-        ]
+        # Get the new lines starting from the found index plus one
+        diff_node_loglines = new_node_loglines[start_idx + 1 :]

-        # If there's a difference, print it out
+        # If there's a difference, add the lines to the ongoing deque and print them out
        if len(diff_node_loglines) > 0:
-            print("\n".join(diff_node_loglines), end="")
-            print("\n", end="")
+            for line in diff_node_loglines:
+                loglines.append(line)
+                print(line)

-        # Wait half a second
-        time.sleep(0.5)
+        del new_node_loglines
+        del diff_node_loglines

    return True, ""
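
Illustration of the follow_node_log change above: instead of a set difference (which silently drops repeated log lines), the new code keeps a bounded deque and prints only what follows the last known line. A minimal sketch of that step in isolation; the helper name extract_new_lines and the sample data are hypothetical.

```python
from collections import deque


def extract_new_lines(buffer, fetched):
    """Return the lines in `fetched` that follow the last match of buffer[-1]."""
    start_idx = 0
    for idx, line in enumerate(fetched):
        if line == buffer[-1]:
            start_idx = idx
    # As in the diff: if no match is found, start_idx stays 0 and the
    # first fetched line is skipped
    return fetched[start_idx + 1 :]


# Bounded buffer: the oldest entries fall off once maxlen is reached
buffer = deque(["a", "b", "c"], 5)
fetched = ["b", "c", "d", "e"]

fresh = extract_new_lines(buffer, fetched)
for line in fresh:
    buffer.append(line)
    print(line)
```

Because the deque is created with a maximum length, memory use stays constant no matter how long the follow runs.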

--- a/client-cli/setup.py
+++ b/client-cli/setup.py
@ -2,7 +2,7 @@ from setuptools import setup

 setup(
    name="pvc",
-    version="0.9.81",
+    version="0.9.83",
    packages=["pvc.cli", "pvc.lib"],
    install_requires=[
        "Click",
--- a/api-daemon/pvcapid/benchmark.py
+++ b/api-daemon/pvcapid/benchmark.py
@ -25,9 +25,6 @@ import psycopg2.extras
 from datetime import datetime
 from json import loads, dumps

-from pvcapid.Daemon import config
-
-from daemon_lib.zkhandler import ZKHandler
 from daemon_lib.celery import start, fail, log_info, update, finish

 import daemon_lib.common as pvc_common
@ -135,11 +132,11 @@ def cleanup(job_name, db_conn=None, db_cur=None, zkhandler=None):
 # Database connections
 def open_database(config):
    conn = psycopg2.connect(
-        host=config["database_host"],
-        port=config["database_port"],
-        dbname=config["database_name"],
-        user=config["database_user"],
-        password=config["database_password"],
+        host=config["api_postgresql_host"],
+        port=config["api_postgresql_port"],
+        dbname=config["api_postgresql_dbname"],
+        user=config["api_postgresql_user"],
+        password=config["api_postgresql_password"],
    )
    cur = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
    return conn, cur
@ -152,7 +149,7 @@ def close_database(conn, cur, failed=False):
    conn.close()


-def list_benchmarks(job=None):
+def list_benchmarks(config, job=None):
    if job is not None:
        query = "SELECT * FROM {} WHERE job = %s;".format("storage_benchmarks")
        args = (job,)
@ -278,17 +275,8 @@ def run_benchmark_job(
    return jstdout


-def run_benchmark(self, pool):
+def worker_run_benchmark(zkhandler, celery, config, pool):
    # Phase 0 - connect to databases
-    try:
-        zkhandler = ZKHandler(config)
-        zkhandler.connect()
-    except Exception:
-        fail(
-            self,
-            "Failed to connect to Zookeeper",
-        )
-
    cur_time = datetime.now().isoformat(timespec="seconds")
    cur_primary = zkhandler.read("base.config.primary_node")
    job_name = f"{cur_time}_{cur_primary}"
@ -296,7 +284,7 @@ def run_benchmark(self, pool):
    current_stage = 0
    total_stages = 13
    start(
-        self,
+        celery,
        f"Running storage benchmark '{job_name}' on pool '{pool}'",
        current=current_stage,
        total=total_stages,
@ -312,13 +300,13 @@ def run_benchmark(self, pool):
            zkhandler=zkhandler,
        )
        fail(
-            self,
+            celery,
            "Failed to connect to Postgres",
        )

    current_stage += 1
    update(
-        self,
+        celery,
        "Storing running status in database",
        current=current_stage,
        total=total_stages,
@ -340,11 +328,11 @@ def run_benchmark(self, pool):
            db_cur=db_cur,
            zkhandler=zkhandler,
        )
-        fail(self, f"Failed to store running status: {e}", exception=BenchmarkError)
+        fail(celery, f"Failed to store running status: {e}", exception=BenchmarkError)

    current_stage += 1
    update(
-        self,
+        celery,
        "Creating benchmark volume",
        current=current_stage,
        total=total_stages,
@ -363,7 +351,7 @@ def run_benchmark(self, pool):
    for test in test_matrix:
        current_stage += 1
        update(
-            self,
+            celery,
            f"Running benchmark job '{test}'",
            current=current_stage,
            total=total_stages,
@ -381,7 +369,7 @@ def run_benchmark(self, pool):
    # Phase 3 - cleanup
    current_stage += 1
    update(
-        self,
+        celery,
        "Cleaning up benchmark volume",
        current=current_stage,
        total=total_stages,
@ -397,7 +385,7 @@ def run_benchmark(self, pool):

    current_stage += 1
    update(
-        self,
+        celery,
        "Storing results in database",
        current=current_stage,
        total=total_stages,
@ -415,7 +403,7 @@ def run_benchmark(self, pool):
            db_cur=db_cur,
            zkhandler=zkhandler,
        )
-        fail(self, f"Failed to store test results: {e}", exception=BenchmarkError)
+        fail(celery, f"Failed to store test results: {e}", exception=BenchmarkError)

    cleanup(
        job_name,
@ -426,7 +414,7 @@ def run_benchmark(self, pool):

    current_stage += 1
    return finish(
-        self,
+        celery,
        f"Storage benchmark {job_name} completed successfully",
        current=current_stage,
        total=total_stages,
--- a/daemon-common/config.py
+++ b/daemon-common/config.py
@ -0,0 +1,419 @@
+#!/usr/bin/env python3
+
+# config.py - Utility functions for pvcnoded configuration parsing
+# Part of the Parallel Virtual Cluster (PVC) system
+#
+#    Copyright (C) 2018-2022 Joshua M. Boniface <joshua@boniface.me>
+#
+#    This program is free software: you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation, version 3.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License
+#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+###############################################################################
+
+import os
+import subprocess
+import yaml
+
+from socket import gethostname
+from re import findall
+from psutil import cpu_count
+from ipaddress import ip_address, ip_network
+
+
+class MalformedConfigurationError(Exception):
+    """
+    An exception raised when parsing the PVC Node daemon configuration file
+    """
+
+    def __init__(self, error=None):
+        self.msg = f"ERROR: Configuration file is malformed: {error}"
+
+    def __str__(self):
+        return str(self.msg)
+
+
+def get_static_data():
+    """
+    Data that is obtained once at node startup for use later
+    """
+    staticdata = list()
+    staticdata.append(str(cpu_count()))  # CPU count
+    staticdata.append(
+        subprocess.run(["uname", "-r"], stdout=subprocess.PIPE)
+        .stdout.decode("ascii")
+        .strip()
+    )
+    staticdata.append(
+        subprocess.run(["uname", "-o"], stdout=subprocess.PIPE)
+        .stdout.decode("ascii")
+        .strip()
+    )
+    staticdata.append(
+        subprocess.run(["uname", "-m"], stdout=subprocess.PIPE)
+        .stdout.decode("ascii")
+        .strip()
+    )
+
+    return staticdata
+
+
+def get_configuration_path():
+    try:
+        _config_file = os.environ["PVC_CONFIG_FILE"]
+        if not os.path.exists(_config_file):
+            raise
+        config_file = _config_file
+    except Exception:
+        print('ERROR: The "PVC_CONFIG_FILE" environment variable must be set.')
+        os._exit(1)
+
+    return config_file
+
+
+def get_hostname():
+    node_fqdn = gethostname()
+    node_hostname = node_fqdn.split(".", 1)[0]
+    node_domain = "".join(node_fqdn.split(".", 1)[1:])
+    try:
+        node_id = findall(r"\d+", node_hostname)[-1]
+    except IndexError:
+        node_id = 0
+
+    return node_fqdn, node_hostname, node_domain, node_id
+
+
+def validate_floating_ip(config, network):
+    if network not in ["cluster", "storage", "upstream"]:
+        return False, f'Specified network type "{network}" is not valid'
+
+    floating_key = f"{network}_floating_ip"
+    network_key = f"{network}_network"
+
+    # Verify the network provided is valid
+    try:
+        network = ip_network(config[network_key])
+    except Exception:
+        return (
+            False,
+            f"Network address {config[network_key]} for {network_key} is not valid",
+        )
+
+    # Verify that the floating IP is valid (and in the network)
+    try:
+        floating_address = ip_address(config[floating_key].split("/")[0])
+        if floating_address not in list(network.hosts()):
+            raise
+    except Exception:
+        return (
+            False,
+            f"Floating address {config[floating_key]} for {floating_key} is not valid",
+        )
+
+    return True, ""
+
+
+def get_parsed_configuration(config_file):
+    print('Loading configuration from file "{}"'.format(config_file))
+
+    with open(config_file, "r") as cfgfh:
+        try:
+            o_config = yaml.load(cfgfh, Loader=yaml.SafeLoader)
+        except Exception as e:
+            print(f"ERROR: Failed to parse configuration file: {e}")
+            os._exit(1)
+
+    config = dict()
+
+    node_fqdn, node_hostname, node_domain, node_id = get_hostname()
+
+    config_thisnode = {
+        "node": node_hostname,
+        "node_hostname": node_hostname,
+        "node_fqdn": node_fqdn,
+        "node_domain": node_domain,
+        "node_id": node_id,
+    }
+    config = {**config, **config_thisnode}
+
+    try:
+        o_path = o_config["path"]
+        config_path = {
+            "plugin_directory": o_path.get(
+                "plugin_directory", "/usr/share/pvc/plugins"
+            ),
+            "dynamic_directory": o_path["dynamic_directory"],
+            "log_directory": o_path["system_log_directory"],
+            "console_log_directory": o_path["console_log_directory"],
+            "ceph_directory": o_path["ceph_directory"],
+        }
+        # Define our dynamic directory schema
+        config_path["dnsmasq_dynamic_directory"] = (
+            config_path["dynamic_directory"] + "/dnsmasq"
+        )
+        config_path["pdns_dynamic_directory"] = (
+            config_path["dynamic_directory"] + "/pdns"
+        )
+        config_path["nft_dynamic_directory"] = config_path["dynamic_directory"] + "/nft"
+        # Define our log directory schema
+        config_path["dnsmasq_log_directory"] = config_path["log_directory"] + "/dnsmasq"
+        config_path["pdns_log_directory"] = config_path["log_directory"] + "/pdns"
+        config_path["nft_log_directory"] = config_path["log_directory"] + "/nft"
+        config = {**config, **config_path}
+
+        o_subsystem = o_config["subsystem"]
+        config_subsystem = {
+            "enable_hypervisor": o_subsystem.get("enable_hypervisor", True),
+            "enable_networking": o_subsystem.get("enable_networking", True),
+            "enable_storage": o_subsystem.get("enable_storage", True),
+            "enable_worker": o_subsystem.get("enable_worker", True),
+            "enable_api": o_subsystem.get("enable_api", True),
+        }
+        config = {**config, **config_subsystem}
+
+        o_cluster = o_config["cluster"]
+        config_cluster = {
+            "cluster_name": o_cluster["name"],
+            "all_nodes": o_cluster["all_nodes"],
+            "coordinators": o_cluster["coordinator_nodes"],
+        }
+        config = {**config, **config_cluster}
+
+        o_cluster_networks = o_cluster["networks"]
+        for network_type in ["cluster", "storage", "upstream"]:
+            o_cluster_networks_specific = o_cluster_networks[network_type]
+            config_cluster_networks_specific = {
+                f"{network_type}_domain": o_cluster_networks_specific["domain"],
+                f"{network_type}_dev": o_cluster_networks_specific["device"],
+                f"{network_type}_mtu": o_cluster_networks_specific["mtu"],
+                f"{network_type}_network": o_cluster_networks_specific["ipv4"][
+                    "network_address"
+                ]
+                + "/"
+                + str(o_cluster_networks_specific["ipv4"]["netmask"]),
+                f"{network_type}_floating_ip": o_cluster_networks_specific["ipv4"][
+                    "floating_address"
+                ]
+                + "/"
+                + str(o_cluster_networks_specific["ipv4"]["netmask"]),
+                f"{network_type}_node_ip_selection": o_cluster_networks_specific[
+                    "node_ip_selection"
+                ],
+            }
+
+            if (
+                o_cluster_networks_specific["ipv4"].get("gateway_address", None)
+                is not None
+            ):
+                config[f"{network_type}_gateway"] = o_cluster_networks_specific["ipv4"][
+                    "gateway_address"
+                ]
+
+            result, msg = validate_floating_ip(
+                config_cluster_networks_specific, network_type
+            )
+            if not result:
+                raise MalformedConfigurationError(msg)
+
+            network = ip_network(
+                config_cluster_networks_specific[f"{network_type}_network"]
+            )
+
+            if (
+                config_cluster_networks_specific[f"{network_type}_node_ip_selection"]
+                == "by-id"
+            ):
+                address_id = int(node_id) - 1
+            else:
+                # This roundabout solution ensures the given IP is in the subnet and is something valid
+                address_id = [
+                    idx
+                    for idx, ip in enumerate(list(network.hosts()))
+                    if str(ip)
+                    == config_cluster_networks_specific[
+                        f"{network_type}_node_ip_selection"
+                    ]
+                ][0]
+
+            config_cluster_networks_specific[
+                f"{network_type}_dev_ip"
+            ] = f"{list(network.hosts())[address_id]}/{network.prefixlen}"
+
+            config = {**config, **config_cluster_networks_specific}
+
+        o_database = o_config["database"]
+        config_database = {
+            "zookeeper_port": o_database["zookeeper"]["port"],
+            "keydb_port": o_database["keydb"]["port"],
+            "keydb_host": o_database["keydb"]["hostname"],
+            "keydb_path": o_database["keydb"]["path"],
+            "api_postgresql_port": o_database["postgres"]["port"],
+            "api_postgresql_host": o_database["postgres"]["hostname"],
+            "api_postgresql_dbname": o_database["postgres"]["credentials"]["api"][
+                "database"
+            ],
+            "api_postgresql_user": o_database["postgres"]["credentials"]["api"][
+                "username"
+            ],
+            "api_postgresql_password": o_database["postgres"]["credentials"]["api"][
+                "password"
+            ],
+            "pdns_postgresql_port": o_database["postgres"]["port"],
+            "pdns_postgresql_host": o_database["postgres"]["hostname"],
+            "pdns_postgresql_dbname": o_database["postgres"]["credentials"]["dns"][
+                "database"
+            ],
+            "pdns_postgresql_user": o_database["postgres"]["credentials"]["dns"][
+                "username"
+            ],
+            "pdns_postgresql_password": o_database["postgres"]["credentials"]["dns"][
+                "password"
+            ],
+        }
+        config = {**config, **config_database}
+
+        o_timer = o_config["timer"]
+        config_timer = {
+            "vm_shutdown_timeout": int(o_timer.get("vm_shutdown_timeout", 180)),
+            "keepalive_interval": int(o_timer.get("keepalive_interval", 5)),
+            "monitoring_interval": int(o_timer.get("monitoring_interval", 60)),
+        }
+        config = {**config, **config_timer}
+
+        o_fencing = o_config["fencing"]
+        config_fencing = {
+            "disable_on_ipmi_failure": o_fencing["disable_on_ipmi_failure"],
+            "fence_intervals": int(o_fencing["intervals"].get("fence_intervals", 6)),
+            "suicide_intervals": int(o_fencing["intervals"].get("suicide_interval", 0)),
+            "successful_fence": o_fencing["actions"].get("successful_fence", None),
+            "failed_fence": o_fencing["actions"].get("failed_fence", None),
+            "ipmi_hostname": o_fencing["ipmi"]["hostname"].format(node_id=node_id),
+            "ipmi_username": o_fencing["ipmi"]["username"],
+            "ipmi_password": o_fencing["ipmi"]["password"],
+        }
+        config = {**config, **config_fencing}
+
+        o_migration = o_config["migration"]
+        config_migration = {
+            "migration_target_selector": o_migration.get("target_selector", "mem"),
+        }
+        config = {**config, **config_migration}
+
+        o_logging = o_config["logging"]
+        config_logging = {
+            "debug": o_logging.get("debug_logging", False),
+            "file_logging": o_logging.get("file_logging", False),
+            "stdout_logging": o_logging.get("stdout_logging", False),
+            "zookeeper_logging": o_logging.get("zookeeper_logging", False),
+            "log_colours": o_logging.get("log_colours", False),
+            "log_dates": o_logging.get("log_dates", False),
+            "log_keepalives": o_logging.get("log_keepalives", False),
+            "log_keepalive_cluster_details": o_logging.get(
+                "log_cluster_details", False
+            ),
+            "log_monitoring_details": o_logging.get("log_monitoring_details", False),
+            "console_log_lines": o_logging.get("console_log_lines", False),
+            "node_log_lines": o_logging.get("node_log_lines", False),
+        }
+        config = {**config, **config_logging}
+
+        o_guest_networking = o_config["guest_networking"]
+        config_guest_networking = {
+            "bridge_dev": o_guest_networking["bridge_device"],
+            "bridge_mtu": o_guest_networking["bridge_mtu"],
+            "enable_sriov": o_guest_networking.get("sriov_enable", False),
+            "sriov_device": o_guest_networking.get("sriov_device", list()),
+        }
+        config = {**config, **config_guest_networking}
+
+        o_ceph = o_config["ceph"]
+        config_ceph = {
+            "ceph_config_file": config["ceph_directory"]
+            + "/"
+            + o_ceph["ceph_config_file"],
+            "ceph_admin_keyring": config["ceph_directory"]
+            + "/"
+            + o_ceph["ceph_keyring_file"],
+            "ceph_monitor_port": o_ceph["monitor_port"],
+            "ceph_secret_uuid": o_ceph["secret_uuid"],
+            "storage_hosts": o_ceph.get("monitor_hosts", None),
+        }
+        config = {**config, **config_ceph}
+
+        o_api = o_config["api"]
+
+        o_api_listen = o_api["listen"]
+        config_api_listen = {
+            "api_listen_address": o_api_listen["address"],
+            "api_listen_port": o_api_listen["port"],
+        }
+        config = {**config, **config_api_listen}
+
+        o_api_authentication = o_api["authentication"]
+        config_api_authentication = {
+            "api_auth_enabled": o_api_authentication.get("enabled", False),
+            "api_auth_secret_key": o_api_authentication.get("secret_key", ""),
+            "api_auth_source": o_api_authentication.get("source", "token"),
+        }
+        config = {**config, **config_api_authentication}
+
+        o_api_ssl = o_api["ssl"]
+        config_api_ssl = {
+            "api_ssl_enabled": o_api_ssl.get("enabled", False),
+            "api_ssl_cert_file": o_api_ssl.get("certificate", None),
+            "api_ssl_key_file": o_api_ssl.get("private_key", None),
+        }
+        config = {**config, **config_api_ssl}
+
+        # Use coordinators as storage hosts if not explicitly specified
+        if not config["storage_hosts"] or len(config["storage_hosts"]) < 1:
+            config["storage_hosts"] = config["coordinators"]
+
+        # Set up our token list if specified
+        if config["api_auth_source"] == "token":
+            config["api_auth_tokens"] = o_api["token"]
+        else:
+            if config["api_auth_enabled"]:
+                print(
+                    "WARNING: No authentication method provided; disabling API authentication."
+                )
+                config["api_auth_enabled"] = False
+
+        # Add our node static data to the config
+        config["static_data"] = get_static_data()
+
+    except Exception as e:
+        raise MalformedConfigurationError(e)
+
+    return config
+
+
+def get_configuration():
+    """
+    Get the configuration.
+    """
+    pvc_config_file = get_configuration_path()
+    config = get_parsed_configuration(pvc_config_file)
+    return config
+
+
+def validate_directories(config):
+    if not os.path.exists(config["dynamic_directory"]):
+        os.makedirs(config["dynamic_directory"])
+        os.makedirs(config["dnsmasq_dynamic_directory"])
+        os.makedirs(config["pdns_dynamic_directory"])
+        os.makedirs(config["nft_dynamic_directory"])
+
+    if not os.path.exists(config["log_directory"]):
+        os.makedirs(config["log_directory"])
+        os.makedirs(config["dnsmasq_log_directory"])
+        os.makedirs(config["pdns_log_directory"])
+        os.makedirs(config["nft_log_directory"])
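Note on the configuration parser above: it flattens each nested section of "pvc.conf" into prefixed top-level keys, using subscripts for required entries and `.get()` with a default for optional ones. The following is a minimal sketch of that pattern in isolation, not the PVC implementation. The `flatten_api_section` helper and the sample dictionary are hypothetical and exist only for illustration.

```python
def flatten_api_section(o_config):
    """Flatten a nested "api" section into prefixed top-level keys.

    Hypothetical helper for illustration; it mirrors the pattern in the
    diff but is not part of the PVC codebase.
    """
    config = {}

    # Required sections are subscripted, so a missing one raises KeyError
    o_api = o_config["api"]

    o_api_listen = o_api["listen"]
    config_api_listen = {
        "api_listen_address": o_api_listen["address"],
        "api_listen_port": o_api_listen["port"],
    }
    # Merge the flattened section into the running config
    config = {**config, **config_api_listen}

    o_api_authentication = o_api["authentication"]
    config_api_authentication = {
        # Optional keys fall back to a default when absent
        "api_auth_enabled": o_api_authentication.get("enabled", False),
        "api_auth_secret_key": o_api_authentication.get("secret_key", ""),
        "api_auth_source": o_api_authentication.get("source", "token"),
    }
    config = {**config, **config_api_authentication}

    return config


# Sample input (made-up values, not taken from a real pvc.conf)
sample = {
    "api": {
        "listen": {"address": "127.0.0.1", "port": 7370},
        "authentication": {"enabled": True},
    }
}
flat = flatten_api_section(sample)
```

Because later merges overwrite earlier keys of the same name, the prefixes ("api_listen_", "api_auth_") are what keep sections from colliding in the flat dictionary.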
--- a/daemon-common/log.py
+++ b/daemon-common/log.py
@ -76,9 +76,11 @@ class Logger(object):
        self.config = config

        if self.config["file_logging"]:
-            self.logfile = self.config["log_directory"] + "/pvc.log"
+            self.logfile = (
+                self.config["log_directory"] + "/" + self.config["daemon_name"] + ".log"
+            )
            # We open the logfile for the duration of our session, but have a hup function
-            self.writer = open(self.logfile, "a", buffering=0)
+            self.writer = open(self.logfile, "a")

        self.last_colour = ""
        self.last_prompt = ""
@ -91,14 +93,12 @@ class Logger(object):
    # Provide a hup function to close and reopen the writer
    def hup(self):
        self.writer.close()
-        self.writer = open(self.logfile, "a", buffering=0)
+        self.writer = open(self.logfile, "a")

    # Provide a termination function so all messages are flushed before terminating the main daemon
    def terminate(self):
-        if self.config["file_logging"]:
-            self.writer.close()
        if self.config["zookeeper_logging"]:
-            self.out("Waiting 15s for Zookeeper message queue to drain", state="s")
+            self.out("Waiting for Zookeeper message queue to drain", state="s")

            tick_count = 0
            while not self.zookeeper_queue.empty():
@ -110,6 +110,9 @@ class Logger(object):
            self.zookeeper_logger.stop()
            self.zookeeper_logger.join()

+        if self.config["file_logging"]:
+            self.writer.close()
+
    # Output function
    def out(self, message, state=None, prefix=""):
        # Get the date
@ -139,20 +142,28 @@ class Logger(object):
        if prefix != "":
            prefix = prefix + " - "

-        # Assemble message string
-        message = colour + prompt + endc + date + prefix + message
-
        # Log to stdout
        if self.config["stdout_logging"]:
-            print(message)
+            # Assemble output string
+            output = colour + prompt + endc + date + prefix + message
+            print(output)

        # Log to file
        if self.config["file_logging"]:
-            self.writer.write(message + "\n")
+            # Assemble output string
+            output = colour + prompt + endc + date + prefix + message
+            self.writer.write(output + "\n")
+            self.writer.flush()

        # Log to Zookeeper
        if self.config["zookeeper_logging"]:
-            self.zookeeper_queue.put(message)
+            # Set the daemon value (only used here as others do not overlap with different daemons)
+            daemon = f"{self.config['daemon_name']}: "
+            # Expand to match all daemon names (pvcnoded, pvcworkerd, pvchealthd)
+            daemon = "{daemon: <12}".format(daemon=daemon)
+            # Assemble output string
+            output = daemon + colour + prompt + endc + date + prefix + message
+            self.zookeeper_queue.put(output)

        # Set last message variables
        self.last_colour = colour
@ -198,16 +209,9 @@ class ZookeeperLogger(Thread):
        self.zkhandler.write([("base.logs", ""), (("logs", self.node), "")])

    def run(self):
-        while not self.connected:
-            self.start_zkhandler()
-            sleep(1)
-
+        self.start_zkhandler()
        self.running = True
-        # Get the logs that are currently in Zookeeper and populate our deque
-        raw_logs = self.zkhandler.read(("logs.messages", self.node))
-        if raw_logs is None:
-            raw_logs = ""
-        logs = deque(raw_logs.split("\n"), self.max_lines)
+
        while self.running:
            # Get a new message
            try:
@ -222,25 +226,26 @@ class ZookeeperLogger(Thread):
                date = "{} ".format(datetime.now().strftime("%Y/%m/%d %H:%M:%S.%f"))
            else:
                date = ""
-            # Add the message to the deque
-            logs.append(f"{date}{message}")

-            tick_count = 0
-            while True:
-                try:
+            try:
+                with self.zkhandler.writelock(("logs.messages", self.node)):
+                    # Get the logs that are currently in Zookeeper and populate our deque
+                    cur_logs = self.zkhandler.read(("logs.messages", self.node))
+                    if cur_logs is None:
+                        cur_logs = ""
+
+                    logs = deque(cur_logs.split("\n"), self.max_lines - 1)
+
+                    # Add the message to the deque
+                    logs.append(f"{date}{message}")
+
                    # Write the updated messages into Zookeeper
                    self.zkhandler.write(
                        [(("logs.messages", self.node), "\n".join(logs))]
                    )
-                    break
-                except Exception:
-                    # The write failed (connection loss, etc.) so retry for 15 seconds
-                    sleep(0.5)
-                    tick_count += 1
-                    if tick_count > 30:
-                        break
-                    else:
-                        continue
+            except Exception:
+                continue
+
        return

    def stop(self):
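Note on the ZookeeperLogger change above: each message is now appended with a read-modify-write cycle held under a write lock, so several daemons can share one log key without overwriting each other. The following is a minimal sketch of that idea under stated assumptions. `FakeStore` and `append_log_line` are hypothetical names, a `threading.Lock` stands in for `zkhandler.writelock`, and no real Zookeeper API is used. Unlike the diff, the sketch bounds the deque at `max_lines` (the diff uses `max_lines - 1`) and skips the empty initial entry.

```python
from collections import deque
from threading import Lock


class FakeStore:
    """In-memory stand-in for a single Zookeeper key (hypothetical)."""

    def __init__(self):
        self.value = None
        self.lock = Lock()


def append_log_line(store, line, max_lines):
    """Append one line to the stored log, keeping at most max_lines lines."""
    # Hold the lock for the whole read-modify-write cycle so that
    # concurrent writers cannot interleave between the read and the write
    with store.lock:
        current = store.value if store.value is not None else ""
        # A bounded deque silently drops the oldest entries on overflow
        logs = deque(current.split("\n") if current else [], max_lines)
        logs.append(line)
        store.value = "\n".join(logs)


store = FakeStore()
for i in range(5):
    append_log_line(store, f"line{i}", 3)
```

Re-reading the stored value inside the lock on every append, instead of caching the deque for the life of the thread, is what allows more than one writer to share the key.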
--- a/api-daemon/pvcapid/vmbuilder.py
+++ b/api-daemon/pvcapid/vmbuilder.py
@ -31,8 +31,6 @@ import uuid

 from contextlib import contextmanager

-from pvcapid.Daemon import config
-
 from daemon_lib.zkhandler import ZKHandler
 from daemon_lib.celery import start, fail, log_info, log_warn, log_err, update, finish

@ -167,11 +165,11 @@ def chroot(destination):
 def open_db(config):
    try:
        conn = psycopg2.connect(
-            host=config["database_host"],
-            port=config["database_port"],
-            dbname=config["database_name"],
-            user=config["database_user"],
-            password=config["database_password"],
+            host=config["api_postgresql_host"],
+            port=config["api_postgresql_port"],
+            dbname=config["api_postgresql_name"],
+            user=config["api_postgresql_user"],
+            password=config["api_postgresql_password"],
        )
        cur = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
    except Exception:
@ -216,8 +214,14 @@ def open_zk(config):
 #
 # Main VM provisioning function - executed by the Celery worker
 #
-def create_vm(
-    celery, vm_name, vm_profile, define_vm=True, start_vm=True, script_run_args=[]
+def worker_create_vm(
+    celery,
+    config,
+    vm_name,
+    vm_profile,
+    define_vm=True,
+    start_vm=True,
+    script_run_args=[],
 ):
    current_stage = 0
    total_stages = 11
@ -326,9 +330,9 @@ def create_vm(
    vm_data["system_architecture"] = stdout.strip()

    monitor_list = list()
-    coordinator_names = config["storage_hosts"]
-    for coordinator in coordinator_names:
-        monitor_list.append("{}.{}".format(coordinator, config["storage_domain"]))
+    monitor_names = config["storage_hosts"]
+    for monitor in monitor_names:
+        monitor_list.append("{}.{}".format(monitor, config["storage_domain"]))
    vm_data["ceph_monitor_list"] = monitor_list
    vm_data["ceph_monitor_port"] = config["ceph_monitor_port"]
    vm_data["ceph_monitor_secret"] = config["ceph_storage_secret_uuid"]
--- a/debian/changelog
+++ b/debian/changelog
@ -1,3 +1,23 @@
+pvc (0.9.83-0) unstable; urgency=high
+
+  **Breaking Changes:** This release features a breaking change for the daemon config. A new unified "pvc.conf" file is required for all daemons (and the CLI client for Autobackup and API-on-this-host functionality), which will be written by the "pvc" role in the PVC Ansible framework. Using the "update-pvc-daemons" oneshot playbook from PVC Ansible is **required** to update to this release, as it will ensure this file is written to the proper place before deploying the new package versions, and also ensures that the old entries are cleaned up afterwards. In addition, this release fully splits the node worker and health subsystems into discrete daemons ("pvcworkerd" and "pvchealthd") and packages ("pvc-daemon-worker" and "pvc-daemon-health") respectively. The "pvc-daemon-node" package also now depends on both packages, and the "pvc-daemon-api" package can now be reliably used outside of the PVC nodes themselves (for instance, in a VM) without any strange cross-dependency issues.
+
+  * [All] Unifies all daemon (and on-node CLI task) configuration into a "pvc.conf" YAML configuration.
+  * [All] Splits the node worker subsystem into a discrete codebase and package ("pvc-daemon-worker"), still named "pvcworkerd".
+  * [All] Splits the node health subsystem into a discrete codebase and package ("pvc-daemon-health"), named "pvchealthd".
+  * [All] Improves Zookeeper node logging to avoid bugs and to support multiple simultaneous daemon writes.
+  * [All] Fixes several bugs in file logging and splits file logs by daemon.
+  * [Node Daemon] Improves several log messages to match new standards from Health daemon.
+  * [API Daemon] Reworks Celery task routing and handling to move all worker tasks to Worker daemon.
+
+ -- Joshua M. Boniface <joshua@boniface.me>  Fri, 01 Dec 2023 17:33:53 -0500
+
+pvc (0.9.82-0) unstable; urgency=high
+
+  * [API Daemon] Fixes a bug where the Celery result_backend was not loading properly on Celery <5.2.x (Debian 10/11).
+
+ -- Joshua M. Boniface <joshua@boniface.me>  Sat, 25 Nov 2023 15:38:50 -0500
+
 pvc (0.9.81-0) unstable; urgency=high

  **Breaking Changes:** This large release features a number of major changes. While these should all be a seamless transition, the behaviour of several commands and the backend system for handling them has changed significantly, along with new dependencies from PVC Ansible. A full cluster configuration update via `pvc.yml` is recommended after installing this version. Redis is replaced with KeyDB on coordinator nodes as a Celery backend; this transition will be handled gracefully by the `pvc-ansible` playbooks, though note that KeyDB will be exposed on the Upstream interface. The Celery worker system is renamed `pvcworkerd`, is now active on all nodes (coordinator and non-coordinator), and is expanded to encompass several commands that previously used a similar, custom setup within the node daemons, including "pvc vm flush-locks" and all "pvc storage osd" tasks. The previously-mentioned CLI commands now all feature "--wait"/"--no-wait" flags, with wait showing a progress bar and status output of the task run. The "pvc cluster task" command can now used for viewing all task types, replacing the previously-custom/specific "pvc provisioner status" command. All example provisioner scripts have been updated to leverage new helper functions in the Celery system; while updating these is optional, an administrator is recommended to do so for optimal log output behaviour.
--- a/debian/control
+++ b/debian/control
@ -4,20 +4,36 @@ Priority: optional
 Maintainer: Joshua Boniface <joshua@boniface.me>
 Standards-Version: 3.9.8
 Homepage: https://www.boniface.me
-X-Python3-Version: >= 3.2
+X-Python3-Version: >= 3.7

 Package: pvc-daemon-node
 Architecture: all
-Depends: systemd, pvc-daemon-common, python3-kazoo, python3-psutil, python3-apscheduler, python3-libvirt, python3-psycopg2, python3-dnspython, python3-yaml, python3-distutils, python3-rados, python3-gevent, ipmitool, libvirt-daemon-system, arping, vlan, bridge-utils, dnsmasq, nftables, pdns-server, pdns-backend-pgsql
-Description: Parallel Virtual Cluster node daemon (Python 3)
+Depends: systemd, pvc-daemon-common, pvc-daemon-health, pvc-daemon-worker, python3-kazoo, python3-psutil, python3-apscheduler, python3-libvirt, python3-psycopg2, python3-dnspython, python3-yaml, python3-distutils, python3-rados, python3-gevent, ipmitool, libvirt-daemon-system, arping, vlan, bridge-utils, dnsmasq, nftables, pdns-server, pdns-backend-pgsql
+Description: Parallel Virtual Cluster node daemon
 A KVM/Zookeeper/Ceph-based VM and private cloud manager
 .
 This package installs the PVC node daemon

+Package: pvc-daemon-health
+Architecture: all
+Depends: systemd, pvc-daemon-common, python3-kazoo, python3-psutil, python3-apscheduler, python3-yaml
+Description: Parallel Virtual Cluster health daemon
+ A KVM/Zookeeper/Ceph-based VM and private cloud manager
+ .
+ This package installs the PVC health monitoring daemon
+
+Package: pvc-daemon-worker
+Architecture: all
+Depends: systemd, pvc-daemon-common, python3-kazoo, python3-celery, python3-redis, python3-yaml, python-celery-common, fio
+Description: Parallel Virtual Cluster worker daemon
+ A KVM/Zookeeper/Ceph-based VM and private cloud manager
+ .
+ This package installs the PVC Celery task worker daemon
+
 Package: pvc-daemon-api
 Architecture: all
-Depends: systemd, pvc-daemon-common, python3-yaml, python3-flask, python3-flask-restful, python3-celery, python-celery-common, python3-distutils, python3-redis, python3-lxml, python3-flask-migrate, fio
-Description: Parallel Virtual Cluster API daemon (Python 3)
+Depends: systemd, pvc-daemon-common, python3-yaml, python3-flask, python3-flask-restful, python3-celery, python3-distutils, python3-redis, python3-lxml, python3-flask-migrate
+Description: Parallel Virtual Cluster API daemon
 A KVM/Zookeeper/Ceph-based VM and private cloud manager
 .
 This package installs the PVC API daemon
@ -25,7 +41,7 @@ Description: Parallel Virtual Cluster API daemon (Python 3)
 Package: pvc-daemon-common
 Architecture: all
 Depends: python3-kazoo, python3-psutil, python3-click, python3-lxml
-Description: Parallel Virtual Cluster common libraries (Python 3)
+Description: Parallel Virtual Cluster common libraries
 A KVM/Zookeeper/Ceph-based VM and private cloud manager
 .
 This package installs the common libraries for the daemon and API
@ -33,7 +49,7 @@ Description: Parallel Virtual Cluster common libraries (Python 3)
 Package: pvc-client-cli
 Architecture: all
 Depends: python3-requests, python3-requests-toolbelt, python3-yaml, python3-lxml, python3-click
-Description: Parallel Virtual Cluster CLI client (Python 3)
+Description: Parallel Virtual Cluster CLI client
 A KVM/Zookeeper/Ceph-based VM and private cloud manager
 .
 This package installs the PVC API command-line client
--- a/debian/pvc-client-cli.install
+++ b/debian/pvc-client-cli.install
@ -1 +0,0 @@
-client-cli/autobackup.sample.yaml usr/share/pvc
--- a/debian/pvc-daemon-api.install
+++ b/debian/pvc-daemon-api.install
@ -1,10 +1,7 @@
 api-daemon/pvcapid.py usr/share/pvc
 api-daemon/pvcapid-manage*.py usr/share/pvc
 api-daemon/pvc-api-db-upgrade usr/share/pvc
-api-daemon/pvcapid.sample.yaml usr/share/pvc
 api-daemon/pvcapid usr/share/pvc
 api-daemon/pvcapid.service lib/systemd/system
-api-daemon/pvcworkerd.service lib/systemd/system
-api-daemon/pvcworkerd.sh usr/share/pvc
 api-daemon/provisioner usr/share/pvc
 api-daemon/migrations usr/share/pvc
--- a/debian/pvc-daemon-api.postinst
+++ b/debian/pvc-daemon-api.postinst
@ -15,9 +15,6 @@ if systemctl is-active --quiet pvcworkerd.service; then
    systemctl start pvcworkerd.service
 fi

-if [ ! -f /etc/pvc/pvcapid.yaml ]; then
-    echo "NOTE: The PVC client API daemon (pvcapid.service) and the PVC Worker daemon (pvcworkerd.service) have not been started; create a config file at /etc/pvc/pvcapid.yaml, then run the database configuration (/usr/share/pvc/pvc-api-db-upgrade) and start them manually."
+if [ ! -f /etc/pvc/pvc.conf ]; then
+    echo "NOTE: The PVC client API daemon (pvcapid.service) and the PVC Worker daemon (pvcworkerd.service) have not been started; create a config file at /etc/pvc/pvc.conf, then run the database configuration (/usr/share/pvc/pvc-api-db-upgrade) and start them manually."
 fi
-
-# Clean up any old sample configs
-rm /etc/pvc/pvcapid.sample.yaml || true
--- a/debian/pvc-daemon-api.preinst
+++ b/debian/pvc-daemon-api.preinst
@ -0,0 +1,5 @@
+#!/bin/sh
+
+# Remove any cached CPython directories or files
+echo "Cleaning up existing CPython files"
+find /usr/share/pvc/pvcapid -type d -name "__pycache__" -exec rm -rf {} \; >/dev/null 2>&1 || true
--- a/debian/pvc-daemon-health.install
+++ b/debian/pvc-daemon-health.install
@ -0,0 +1,4 @@
+health-daemon/pvchealthd.py usr/share/pvc
+health-daemon/pvchealthd usr/share/pvc
+health-daemon/pvchealthd.service lib/systemd/system
+health-daemon/plugins usr/share/pvc
--- a/debian/pvc-daemon-health.postinst
+++ b/debian/pvc-daemon-health.postinst
@ -0,0 +1,14 @@
+#!/bin/sh
+
+# Reload systemd's view of the units
+systemctl daemon-reload
+
+# Enable the service
+systemctl enable /lib/systemd/system/pvchealthd.service
+
+# Inform administrator of the service restart/startup not occurring automatically
+if systemctl is-active --quiet pvchealthd.service; then
+    echo "NOTE: The PVC health daemon (pvchealthd.service) has not been restarted; this is up to the administrator."
+else
+    echo "NOTE: The PVC health daemon (pvchealthd.service) has not been started; create a config file at /etc/pvc/pvc.conf then start it."
+fi
--- a/debian/pvc-daemon-health.preinst
+++ b/debian/pvc-daemon-health.preinst
@ -0,0 +1,6 @@
+#!/bin/sh
+
+# Remove any cached CPython directories or files
+echo "Cleaning up existing CPython files"
+find /usr/share/pvc/pvchealthd -type d -name "__pycache__" -exec rm -rf {} \; >/dev/null 2>&1 || true
+find /usr/share/pvc/plugins -type d -name "__pycache__" -exec rm -rf {} \; >/dev/null 2>&1 || true
--- a/debian/pvc-daemon-health.prerm
+++ b/debian/pvc-daemon-health.prerm
@ -0,0 +1,4 @@
+#!/bin/sh
+
+# Disable the service
+systemctl disable pvchealthd.service
--- a/debian/pvc-daemon-node.install
+++ b/debian/pvc-daemon-node.install
@ -1,8 +1,6 @@
 node-daemon/pvcnoded.py usr/share/pvc
-node-daemon/pvcnoded.sample.yaml usr/share/pvc
 node-daemon/pvcnoded usr/share/pvc
 node-daemon/pvcnoded.service lib/systemd/system
 node-daemon/pvc.target lib/systemd/system
 node-daemon/pvcautoready.service lib/systemd/system
 node-daemon/monitoring usr/share/pvc
-node-daemon/plugins usr/share/pvc
--- a/debian/pvc-daemon-node.postinst
+++ b/debian/pvc-daemon-node.postinst
@ -12,8 +12,5 @@ systemctl enable /lib/systemd/system/pvc.target
 if systemctl is-active --quiet pvcnoded.service; then
    echo "NOTE: The PVC node daemon (pvcnoded.service) has not been restarted; this is up to the administrator."
 else
-    echo "NOTE: The PVC node daemon (pvcnoded.service) has not been started; create a config file at /etc/pvc/pvcnoded.yaml then start it."
+    echo "NOTE: The PVC node daemon (pvcnoded.service) has not been started; create a config file at /etc/pvc/pvc.conf then start it."
 fi
-
-# Clean up any old sample configs
-rm /etc/pvc/pvcnoded.sample.yaml || true
--- a/debian/pvc-daemon-node.preinst
+++ b/debian/pvc-daemon-node.preinst
@ -2,4 +2,4 @@

 # Remove any cached CPython directories or files
 echo "Cleaning up existing CPython files"
-find /usr/share/pvc -type d -name "__pycache__" -exec rm -rf {} \; &>/dev/null || true
+find /usr/share/pvc/pvcnoded -type d -name "__pycache__" -exec rm -rf {} \; >/dev/null 2>&1 || true
--- a/debian/pvc-daemon-worker.install
+++ b/debian/pvc-daemon-worker.install
@ -0,0 +1,4 @@
+worker-daemon/pvcworkerd.sh usr/share/pvc
+worker-daemon/pvcworkerd.py usr/share/pvc
+worker-daemon/pvcworkerd usr/share/pvc
+worker-daemon/pvcworkerd.service lib/systemd/system
--- a/debian/pvc-daemon-worker.postinst
+++ b/debian/pvc-daemon-worker.postinst
@ -0,0 +1,14 @@
+#!/bin/sh
+
+# Reload systemd's view of the units
+systemctl daemon-reload
+
+# Enable the service
+systemctl enable /lib/systemd/system/pvcworkerd.service
+
+# Inform administrator of the service restart/startup not occurring automatically
+if systemctl is-active --quiet pvcworkerd.service; then
+    echo "NOTE: The PVC worker daemon (pvcworkerd.service) has not been restarted; this is up to the administrator."
+else
+    echo "NOTE: The PVC worker daemon (pvcworkerd.service) has not been started; create a config file at /etc/pvc/pvc.conf then start it."
+fi
--- a/debian/pvc-daemon-worker.preinst
+++ b/debian/pvc-daemon-worker.preinst
@ -0,0 +1,5 @@
+#!/bin/sh
+
+# Remove any cached CPython directories or files
+echo "Cleaning up existing CPython files"
+find /usr/share/pvc/pvcworkerd -type d -name "__pycache__" -exec rm -rf {} \; >/dev/null 2>&1 || true
--- a/debian/pvc-daemon-worker.prerm
+++ b/debian/pvc-daemon-worker.prerm
@ -0,0 +1,4 @@
+#!/bin/sh
+
+# Disable the service
+systemctl disable pvcworkerd.service
--- a/2
+++ b/2
@ -8,7 +8,7 @@ import os
 import sys
 import json

-os.environ['PVC_CONFIG_FILE'] = "./api-daemon/pvcapid.sample.yaml"
+os.environ['PVC_CONFIG_FILE'] = "./pvc.sample.conf"

 sys.path.append('api-daemon')

--- a/health-daemon/daemon_lib
+++ b/health-daemon/daemon_lib
@ -0,0 +1 @@
+../daemon-common
--- a/health-daemon/plugins/disk
+++ b/health-daemon/plugins/disk
@ -39,7 +39,7 @@

 # This import is always required here, as MonitoringPlugin is used by the
 # MonitoringPluginScript class
-from pvcnoded.objects.MonitoringInstance import MonitoringPlugin
+from pvchealthd.objects.MonitoringInstance import MonitoringPlugin


 # A monitoring plugin script must always expose its nice name, which must be identical to
--- a/health-daemon/plugins/dpkg
+++ b/health-daemon/plugins/dpkg
@ -40,7 +40,7 @@

 # This import is always required here, as MonitoringPlugin is used by the
 # MonitoringPluginScript class
-from pvcnoded.objects.MonitoringInstance import MonitoringPlugin
+from pvchealthd.objects.MonitoringInstance import MonitoringPlugin


 # A monitoring plugin script must always expose its nice name, which must be identical to
--- a/health-daemon/plugins/edac
+++ b/health-daemon/plugins/edac
@ -38,7 +38,7 @@

 # This import is always required here, as MonitoringPlugin is used by the
 # MonitoringPluginScript class
-from pvcnoded.objects.MonitoringInstance import MonitoringPlugin
+from pvchealthd.objects.MonitoringInstance import MonitoringPlugin


 # A monitoring plugin script must always expose its nice name, which must be identical to
--- a/health-daemon/plugins/hwrd
+++ b/health-daemon/plugins/hwrd
@ -39,7 +39,7 @@

 # This import is always required here, as MonitoringPlugin is used by the
 # MonitoringPluginScript class
-from pvcnoded.objects.MonitoringInstance import MonitoringPlugin
+from pvchealthd.objects.MonitoringInstance import MonitoringPlugin


 # A monitoring plugin script must always expose its nice name, which must be identical to
--- a/health-daemon/plugins/ipmi
+++ b/health-daemon/plugins/ipmi
@ -38,7 +38,7 @@

 # This import is always required here, as MonitoringPlugin is used by the
 # MonitoringPluginScript class
-from pvcnoded.objects.MonitoringInstance import MonitoringPlugin
+from pvchealthd.objects.MonitoringInstance import MonitoringPlugin


 # A monitoring plugin script must always expose its nice name, which must be identical to
--- a/health-daemon/plugins/kydb
+++ b/health-daemon/plugins/kydb
@ -38,7 +38,7 @@

 # This import is always required here, as MonitoringPlugin is used by the
 # MonitoringPluginScript class
-from pvcnoded.objects.MonitoringInstance import MonitoringPlugin
+from pvchealthd.objects.MonitoringInstance import MonitoringPlugin


 # A monitoring plugin script must always expose its nice name, which must be identical to
@ -75,7 +75,7 @@ class MonitoringPluginScript(MonitoringPlugin):
        # Set the health delta to 0 (no change)
        health_delta = 0
        # Craft a message that can be used by the clients
-        message = "Successfully connected to Libvirtd on localhost"
+        message = "Successfully connected to KeyDB/Redis on localhost"

        # Check the Zookeeper connection
        try:
--- a/health-daemon/plugins/lbvt
+++ b/health-daemon/plugins/lbvt
@ -38,7 +38,7 @@

 # This import is always required here, as MonitoringPlugin is used by the
 # MonitoringPluginScript class
-from pvcnoded.objects.MonitoringInstance import MonitoringPlugin
+from pvchealthd.objects.MonitoringInstance import MonitoringPlugin


 # A monitoring plugin script must always expose its nice name, which must be identical to
--- a/health-daemon/plugins/load
+++ b/health-daemon/plugins/load
@ -38,7 +38,7 @@

 # This import is always required here, as MonitoringPlugin is used by the
 # MonitoringPluginScript class
-from pvcnoded.objects.MonitoringInstance import MonitoringPlugin
+from pvchealthd.objects.MonitoringInstance import MonitoringPlugin


 # A monitoring plugin script must always expose its nice name, which must be identical to
--- a/health-daemon/plugins/nics
+++ b/health-daemon/plugins/nics
@ -39,7 +39,7 @@

 # This import is always required here, as MonitoringPlugin is used by the
 # MonitoringPluginScript class
-from pvcnoded.objects.MonitoringInstance import MonitoringPlugin
+from pvchealthd.objects.MonitoringInstance import MonitoringPlugin


 # A monitoring plugin script must always expose its nice name, which must be identical to
--- a/health-daemon/plugins/psql
+++ b/health-daemon/plugins/psql
@ -38,7 +38,7 @@

 # This import is always required here, as MonitoringPlugin is used by the
 # MonitoringPluginScript class
-from pvcnoded.objects.MonitoringInstance import MonitoringPlugin
+from pvchealthd.objects.MonitoringInstance import MonitoringPlugin


 # A monitoring plugin script must always expose its nice name, which must be identical to
@ -67,8 +67,8 @@ class MonitoringPluginScript(MonitoringPlugin):
        # Run any imports first
        from psycopg2 import connect

-        conn_metadata = None
-        cur_metadata = None
+        conn_api = None
+        cur_api = None
        conn_dns = None
        cur_dns = None

@ -79,25 +79,25 @@ class MonitoringPluginScript(MonitoringPlugin):

        # Check the Metadata database (primary)
        try:
-            conn_metadata = connect(
+            conn_api = connect(
                host=self.this_node.name,
-                port=self.config["metadata_postgresql_port"],
-                dbname=self.config["metadata_postgresql_dbname"],
-                user=self.config["metadata_postgresql_user"],
-                password=self.config["metadata_postgresql_password"],
+                port=self.config["api_postgresql_port"],
+                dbname=self.config["api_postgresql_dbname"],
+                user=self.config["api_postgresql_user"],
+                password=self.config["api_postgresql_password"],
            )
-            cur_metadata = conn_metadata.cursor()
-            cur_metadata.execute("""SELECT * FROM alembic_version""")
-            data = cur_metadata.fetchone()
+            cur_api = conn_api.cursor()
+            cur_api.execute("""SELECT * FROM alembic_version""")
+            data = cur_api.fetchone()
        except Exception as e:
            health_delta = 50
            err = str(e).split('\n')[0]
-            message = f"Failed to connect to PostgreSQL database {self.config['metadata_postgresql_dbname']}: {err}"
+            message = f"Failed to connect to PostgreSQL database {self.config['api_postgresql_dbname']}: {err}"
        finally:
-            if cur_metadata is not None:
-                cur_metadata.close()
-            if conn_metadata is not None:
-                conn_metadata.close()
+            if cur_api is not None:
+                cur_api.close()
+            if conn_api is not None:
+                conn_api.close()

        if health_delta == 0:
            # Check the PowerDNS database (secondary)
--- a/health-daemon/plugins/psur
+++ b/health-daemon/plugins/psur
@ -38,7 +38,7 @@

 # This import is always required here, as MonitoringPlugin is used by the
 # MonitoringPluginScript class
-from pvcnoded.objects.MonitoringInstance import MonitoringPlugin
+from pvchealthd.objects.MonitoringInstance import MonitoringPlugin


 # A monitoring plugin script must always expose its nice name, which must be identical to
--- a/health-daemon/plugins/zkpr
+++ b/health-daemon/plugins/zkpr
@ -38,7 +38,7 @@

 # This import is always required here, as MonitoringPlugin is used by the
 # MonitoringPluginScript class
-from pvcnoded.objects.MonitoringInstance import MonitoringPlugin
+from pvchealthd.objects.MonitoringInstance import MonitoringPlugin


 # A monitoring plugin script must always expose its nice name, which must be identical to
--- a/health-daemon/pvchealthd.py
+++ b/health-daemon/pvchealthd.py
@ -0,0 +1,24 @@
+#!/usr/bin/env python3
+
+# pvchealthd.py - Health daemon startup stub
+# Part of the Parallel Virtual Cluster (PVC) system
+#
+#    Copyright (C) 2018-2022 Joshua M. Boniface <joshua@boniface.me>
+#
+#    This program is free software: you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation, version 3.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License
+#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+###############################################################################
+
+import pvchealthd.Daemon  # noqa: F401
+
+pvchealthd.Daemon.entrypoint()
--- a/health-daemon/pvchealthd.service
+++ b/health-daemon/pvchealthd.service
@ -0,0 +1,20 @@
+# Parallel Virtual Cluster health daemon unit file
+
+[Unit]
+Description = Parallel Virtual Cluster health daemon
+After = network.target
+Wants = network-online.target
+PartOf = pvc.target
+
+[Service]
+Type = simple
+WorkingDirectory = /usr/share/pvc
+Environment = PYTHONUNBUFFERED=true
+Environment = PVC_CONFIG_FILE=/etc/pvc/pvc.conf
+ExecStartPre = /bin/sleep 2
+ExecStart = /usr/share/pvc/pvchealthd.py
+ExecStopPost = /bin/sleep 2
+Restart = on-failure
+
+[Install]
+WantedBy = pvc.target
--- a/health-daemon/pvchealthd/Daemon.py
+++ b/health-daemon/pvchealthd/Daemon.py
@ -0,0 +1,145 @@
+#!/usr/bin/env python3
+
+# Daemon.py - Health daemon main entrypoint
+# Part of the Parallel Virtual Cluster (PVC) system
+#
+#    Copyright (C) 2018-2022 Joshua M. Boniface <joshua@boniface.me>
+#
+#    This program is free software: you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation, version 3.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License
+#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+###############################################################################
+
+import pvchealthd.util.zookeeper
+
+import pvchealthd.objects.MonitoringInstance as MonitoringInstance
+import pvchealthd.objects.NodeInstance as NodeInstance
+
+import daemon_lib.config as cfg
+import daemon_lib.log as log
+
+from time import sleep
+
+import os
+import signal
+
+# Daemon version
+version = "0.9.83"
+
+
+##########################################################
+# Entrypoint
+##########################################################
+
+
+def entrypoint():
+    monitoring_instance = None
+
+    # Get our configuration
+    config = cfg.get_configuration()
+    config["daemon_name"] = "pvchealthd"
+    config["daemon_version"] = version
+
+    # Set up the logger instance
+    logger = log.Logger(config)
+
+    # Print our startup message
+    logger.out("")
+    logger.out("|--------------------------------------------------------------|")
+    logger.out("|                                                              |")
+    logger.out("|             ███████████ ▜█▙      ▟█▛ █████ █ █ █             |")
+    logger.out("|                      ██  ▜█▙    ▟█▛  ██                      |")
+    logger.out("|             ███████████   ▜█▙  ▟█▛   ██                      |")
+    logger.out("|             ██             ▜█▙▟█▛    ███████████             |")
+    logger.out("|                                                              |")
+    logger.out("|--------------------------------------------------------------|")
+    logger.out("| Parallel Virtual Cluster health daemon v{0: <20} |".format(version))
+    logger.out("| Debug: {0: <53} |".format(str(config["debug"])))
+    logger.out("| FQDN: {0: <54} |".format(config["node_fqdn"]))
+    logger.out("| Host: {0: <54} |".format(config["node_hostname"]))
+    logger.out("| ID: {0: <56} |".format(config["node_id"]))
+    logger.out("| IPMI hostname: {0: <45} |".format(config["ipmi_hostname"]))
+    logger.out("| Machine details:                                             |")
+    logger.out("|   CPUs: {0: <52} |".format(config["static_data"][0]))
+    logger.out("|   Arch: {0: <52} |".format(config["static_data"][3]))
+    logger.out("|   OS: {0: <54} |".format(config["static_data"][2]))
+    logger.out("|   Kernel: {0: <50} |".format(config["static_data"][1]))
+    logger.out("|--------------------------------------------------------------|")
+    logger.out("")
+    logger.out(f'Starting pvchealthd on host {config["node_fqdn"]}', state="s")
+
+    # Connect to Zookeeper and return our handler and current schema version
+    zkhandler, _ = pvchealthd.util.zookeeper.connect(logger, config)
+
+    # Define a cleanup function
+    def cleanup(failure=False):
+        nonlocal logger, zkhandler, monitoring_instance
+
+        logger.out("Terminating pvchealthd and cleaning up", state="s")
+
+        # Shut down the monitoring system
+        try:
+            logger.out("Shutting down monitoring subsystem", state="s")
+            monitoring_instance.shutdown()
+        except Exception:
+            pass
+
+        # Close the Zookeeper connection
+        try:
+            zkhandler.disconnect(persistent=True)
+            del zkhandler
+        except Exception:
+            pass
+
+        logger.out("Terminated health daemon", state="s")
+        logger.terminate()
+
+        if failure:
+            retcode = 1
+        else:
+            retcode = 0
+
+        os._exit(retcode)
+
+    # Termination function
+    def term(signum="", frame=""):
+        cleanup(failure=False)
+
+    # Hangup (logrotate) function
+    def hup(signum="", frame=""):
+        if config["file_logging"]:
+            logger.hup()
+
+    # Handle signals gracefully
+    signal.signal(signal.SIGTERM, term)
+    signal.signal(signal.SIGINT, term)
+    signal.signal(signal.SIGQUIT, term)
+    signal.signal(signal.SIGHUP, hup)
+
+    this_node = NodeInstance.NodeInstance(
+        config["node_hostname"],
+        zkhandler,
+        config,
+        logger,
+    )
+
+    # Set up the node monitoring instance and thread
+    monitoring_instance = MonitoringInstance.MonitoringInstance(
+        zkhandler, config, logger, this_node
+    )
+
+    # Tick loop; does nothing since everything is async
+    while True:
+        try:
+            sleep(1)
+        except Exception:
+            break
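Note on the signal handling in Daemon.py: the handlers there call cleanup(), which ends the process with os._exit(), while the tick loop only sleeps. A minimal sketch of an alternative arrangement follows, in which the handler sets a flag and the loop watches it. This is not the commit's method; the names make_term_handler, install_handlers, and tick_loop are hypothetical and do not exist in PVC.

```python
import signal
import threading


def make_term_handler(stop_event):
    """Build a signal handler that only flags shutdown (hypothetical helper)."""

    def term(signum=None, frame=None):
        stop_event.set()

    return term


def install_handlers(handler):
    """Route the termination signals handled in Daemon.py to `handler`.

    Must be called from the main thread; SIGQUIT is POSIX-only.
    """
    for sig in (signal.SIGTERM, signal.SIGINT, signal.SIGQUIT):
        signal.signal(sig, handler)


def tick_loop(stop_event, interval=1.0):
    """Idle until the stop flag is set; return the number of idle ticks."""
    ticks = 0
    while not stop_event.wait(interval):
        ticks += 1
    return ticks
```

With this shape, cleanup would run after tick_loop() returns rather than inside the signal handler, at the cost of up to one interval of shutdown latency.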
--- a/health-daemon/pvchealthd/__init__.py
+++ b/health-daemon/pvchealthd/__init__.py
--- a/health-daemon/pvchealthd/objects/MonitoringInstance.py
+++ b/health-daemon/pvchealthd/objects/MonitoringInstance.py
@ -1,6 +1,6 @@
 #!/usr/bin/env python3

-# PluginInstance.py - Class implementing a PVC monitoring instance
+# MonitoringInstance.py - Class implementing a PVC monitor in pvchealthd
 # Part of the Parallel Virtual Cluster (PVC) system
 #
 #    Copyright (C) 2018-2022 Joshua M. Boniface <joshua@boniface.me>
@ -317,8 +317,17 @@ class MonitoringInstance(object):
        )

        if successful_plugins < 1:
+            self.logger.out(
+                "No plugins loaded; pvchealthd going into noop loop. Incorrect plugin directory? Fix and restart pvchealthd.",
+                state="e",
+            )
            return

+        self.logger.out(
+            f'{self.logger.fmt_cyan}Plugin list:{self.logger.fmt_end} {" ".join(self.all_plugin_names)}',
+            state="s",
+        )
+
        # Clean up any old plugin data for which a plugin file no longer exists
        plugins_data = self.zkhandler.children(
            ("node.monitoring.data", self.this_node.name)
@ -344,6 +353,7 @@ class MonitoringInstance(object):
    def shutdown(self):
        self.stop_check_timer()
        self.run_cleanups()
+        return

    def start_check_timer(self):
        check_interval = self.config["monitoring_interval"]
@ -395,6 +405,11 @@ class MonitoringInstance(object):
        active_coordinator_state = self.this_node.coordinator_state

        runtime_start = datetime.now()
+        self.logger.out(
+            "Starting monitoring healthcheck run",
+            state="t",
+        )
+
        total_health = 100
        plugin_results = list()
        with concurrent.futures.ThreadPoolExecutor(max_workers=99) as executor:
@ -406,10 +421,7 @@ class MonitoringInstance(object):
                plugin_results.append(future.result())

        for result in sorted(plugin_results, key=lambda x: x.plugin_name):
-            if (
-                self.config["log_keepalives"]
-                and self.config["log_keepalive_plugin_details"]
-            ):
+            if self.config["log_monitoring_details"]:
                self.logger.out(
                    result.message + f" [-{result.health_delta}]",
                    state="t",
--- a/health-daemon/pvchealthd/objects/NodeInstance.py
+++ b/health-daemon/pvchealthd/objects/NodeInstance.py
@ -0,0 +1,224 @@
+#!/usr/bin/env python3
+
+# NodeInstance.py - Class implementing a PVC node in pvchealthd
+# Part of the Parallel Virtual Cluster (PVC) system
+#
+#    Copyright (C) 2018-2022 Joshua M. Boniface <joshua@boniface.me>
+#
+#    This program is free software: you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation, version 3.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License
+#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+###############################################################################
+
+
+class NodeInstance(object):
+    # Initialization function
+    def __init__(
+        self,
+        name,
+        zkhandler,
+        config,
+        logger,
+    ):
+        # Passed-in variables on creation
+        self.name = name
+        self.zkhandler = zkhandler
+        self.config = config
+        self.logger = logger
+        # States
+        self.daemon_state = "stop"
+        self.coordinator_state = "client"
+        self.domain_state = "flushed"
+        # Node resources
+        self.health = 100
+        self.active_domains_count = 0
+        self.provisioned_domains_count = 0
+        self.memused = 0
+        self.memfree = 0
+        self.memalloc = 0
+        self.vcpualloc = 0
+
+        # Zookeeper handlers for changed states
+        @self.zkhandler.zk_conn.DataWatch(
+            self.zkhandler.schema.path("node.state.daemon", self.name)
+        )
+        def watch_node_daemonstate(data, stat, event=""):
+            if event and event.type == "DELETED":
+                # The key has been deleted after existing before; terminate this watcher
+                # because this class instance is about to be reaped in Daemon.py
+                return False
+
+            try:
+                data = data.decode("ascii")
+            except AttributeError:
+                data = "stop"
+
+            if data != self.daemon_state:
+                self.daemon_state = data
+
+        @self.zkhandler.zk_conn.DataWatch(
+            self.zkhandler.schema.path("node.state.router", self.name)
+        )
+        def watch_node_routerstate(data, stat, event=""):
+            if event and event.type == "DELETED":
+                # The key has been deleted after existing before; terminate this watcher
+                # because this class instance is about to be reaped in Daemon.py
+                return False
+
+            try:
+                data = data.decode("ascii")
+            except AttributeError:
+                data = "client"
+
+            if data != self.coordinator_state:
+                self.coordinator_state = data
+
+        @self.zkhandler.zk_conn.DataWatch(
+            self.zkhandler.schema.path("node.state.domain", self.name)
+        )
+        def watch_node_domainstate(data, stat, event=""):
+            if event and event.type == "DELETED":
+                # The key has been deleted after existing before; terminate this watcher
+                # because this class instance is about to be reaped in Daemon.py
+                return False
+
+            try:
+                data = data.decode("ascii")
+            except AttributeError:
+                data = "unknown"
+
+            if data != self.domain_state:
+                self.domain_state = data
+
+        @self.zkhandler.zk_conn.DataWatch(
+            self.zkhandler.schema.path("node.monitoring.health", self.name)
+        )
+        def watch_node_health(data, stat, event=""):
+            if event and event.type == "DELETED":
+                # The key has been deleted after existing before; terminate this watcher
+                # because this class instance is about to be reaped in Daemon.py
+                return False
+
+            try:
+                data = data.decode("ascii")
+            except AttributeError:
+                data = 100
+
+            try:
+                data = int(data)
+            except ValueError:
+                pass
+
+            if data != self.health:
+                self.health = data
+
+        @self.zkhandler.zk_conn.DataWatch(
+            self.zkhandler.schema.path("node.memory.free", self.name)
+        )
+        def watch_node_memfree(data, stat, event=""):
+            if event and event.type == "DELETED":
+                # The key has been deleted after existing before; terminate this watcher
+                # because this class instance is about to be reaped in Daemon.py
+                return False
+
+            try:
+                data = data.decode("ascii")
+            except AttributeError:
+                data = 0
+
+            if data != self.memfree:
+                self.memfree = data
+
+        @self.zkhandler.zk_conn.DataWatch(
+            self.zkhandler.schema.path("node.memory.used", self.name)
+        )
+        def watch_node_memused(data, stat, event=""):
+            if event and event.type == "DELETED":
+                # The key has been deleted after existing before; terminate this watcher
+                # because this class instance is about to be reaped in Daemon.py
+                return False
+
+            try:
+                data = data.decode("ascii")
+            except AttributeError:
+                data = 0
+
+            if data != self.memused:
+                self.memused = data
+
+        @self.zkhandler.zk_conn.DataWatch(
+            self.zkhandler.schema.path("node.memory.allocated", self.name)
+        )
+        def watch_node_memalloc(data, stat, event=""):
+            if event and event.type == "DELETED":
+                # The key has been deleted after existing before; terminate this watcher
+                # because this class instance is about to be reaped in Daemon.py
+                return False
+
+            try:
+                data = data.decode("ascii")
+            except AttributeError:
+                data = 0
+
+            if data != self.memalloc:
+                self.memalloc = data
+
+        @self.zkhandler.zk_conn.DataWatch(
+            self.zkhandler.schema.path("node.vcpu.allocated", self.name)
+        )
+        def watch_node_vcpualloc(data, stat, event=""):
+            if event and event.type == "DELETED":
+                # The key has been deleted after existing before; terminate this watcher
+                # because this class instance is about to be reaped in Daemon.py
+                return False
+
+            try:
+                data = data.decode("ascii")
+            except AttributeError:
+                data = 0
+
+            if data != self.vcpualloc:
+                self.vcpualloc = data
+
+        @self.zkhandler.zk_conn.DataWatch(
+            self.zkhandler.schema.path("node.running_domains", self.name)
+        )
+        def watch_node_runningdomains(data, stat, event=""):
+            if event and event.type == "DELETED":
+                # The key has been deleted after existing before; terminate this watcher
+                # because this class instance is about to be reaped in Daemon.py
+                return False
+
+            try:
+                data = data.decode("ascii").split()
+            except AttributeError:
+                data = []
+
+            if len(data) != self.active_domains_count:
+                self.active_domains_count = len(data)
+
+        @self.zkhandler.zk_conn.DataWatch(
+            self.zkhandler.schema.path("node.count.provisioned_domains", self.name)
+        )
+        def watch_node_domainscount(data, stat, event=""):
+            if event and event.type == "DELETED":
+                # The key has been deleted after existing before; terminate this watcher
+                # because this class instance is about to be reaped in Daemon.py
+                return False
+
+            try:
+                data = data.decode("ascii")
+            except AttributeError:
+                data = 0
+
+            if data != self.provisioned_domains_count:
+                self.provisioned_domains_count = data
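Note on the watchers in NodeInstance.py: each one repeats the same step of decoding the Zookeeper payload as ASCII and falling back to a default when the payload is None. A minimal sketch of that step as a standalone helper follows. The name decode_or_default and its cast parameter are hypothetical, not part of the PVC codebase; the optional integer cast mirrors what watch_node_health does, while the other watchers above keep the decoded string.

```python
def decode_or_default(data, default, cast=None):
    """Decode a Zookeeper payload, falling back to `default` when it is None.

    `cast` is an optional callable (for example int); if the cast raises
    ValueError, the decoded string is returned unchanged, as in
    watch_node_health.
    """
    try:
        value = data.decode("ascii")
    except AttributeError:
        # data was None (key missing or empty), so use the fallback
        return default

    if cast is not None:
        try:
            return cast(value)
        except ValueError:
            return value
    return value
```

A watcher body could then reduce to something like `self.health = decode_or_default(data, 100, cast=int)`.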
--- a/health-daemon/pvchealthd/objects/__init__.py
+++ b/health-daemon/pvchealthd/objects/__init__.py
--- a/health-daemon/pvchealthd/util/__init__.py
+++ b/health-daemon/pvchealthd/util/__init__.py
--- a/health-daemon/pvchealthd/util/zookeeper.py
+++ b/health-daemon/pvchealthd/util/zookeeper.py
@ -0,0 +1,187 @@
+#!/usr/bin/env python3
+
+# <Filename> - <Description>
+# zookeeper.py - Utility functions for pvchealthd Zookeeper connections
+# Part of the Parallel Virtual Cluster (PVC) system
+#
+#    Copyright (C) 2018-2022 Joshua M. Boniface <joshua@boniface.me>
+#
+#    This program is free software: you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation, version 3.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License
+#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+##############################################################################
+
+from daemon_lib.zkhandler import ZKHandler
+
+import os
+import time
+
+
+def connect(logger, config):
+    # Create an instance of the handler
+    zkhandler = ZKHandler(config, logger)
+
+    try:
+        logger.out(
+            "Connecting to Zookeeper on coordinator nodes {}".format(
+                config["coordinators"]
+            ),
+            state="i",
+        )
+        # Start connection
+        zkhandler.connect(persistent=True)
+    except Exception as e:
+        logger.out(
+            "ERROR: Failed to connect to Zookeeper cluster: {}".format(e), state="e"
+        )
+        os._exit(1)
+
+    logger.out("Validating Zookeeper schema", state="i")
+
+    try:
+        node_schema_version = int(
+            zkhandler.read(("node.data.active_schema", config["node_hostname"]))
+        )
+    except Exception:
+        node_schema_version = int(zkhandler.read("base.schema.version"))
+        zkhandler.write(
+            [
+                (
+                    ("node.data.active_schema", config["node_hostname"]),
+                    node_schema_version,
+                )
+            ]
+        )
+
+    # Load in the current node schema version
+    zkhandler.schema.load(node_schema_version)
+
+    # Record the latest installed schema version
+    latest_schema_version = zkhandler.schema.find_latest()
+    logger.out("Latest installed schema is {}".format(latest_schema_version), state="i")
+    zkhandler.write(
+        [(("node.data.latest_schema", config["node_hostname"]), latest_schema_version)]
+    )
+
+    # If we are the last node to get a schema update, fire the master update
+    if latest_schema_version > node_schema_version:
+        node_latest_schema_version = list()
+        for node in zkhandler.children("base.node"):
+            node_latest_schema_version.append(
+                int(zkhandler.read(("node.data.latest_schema", node)))
+            )
+
+        # This is true if every node's recorded latest schema version equals the latest version,
+        # i.e. all nodes have the latest schema installed and ready to load.
+        if node_latest_schema_version.count(latest_schema_version) == len(
+            node_latest_schema_version
+        ):
+            zkhandler.write([("base.schema.version", latest_schema_version)])
+
+    return zkhandler, node_schema_version
+
+
+def validate_schema(logger, zkhandler):
+    # Validate our schema against the active version
+    if not zkhandler.schema.validate(zkhandler, logger):
+        logger.out("Found schema violations, applying", state="i")
+        zkhandler.schema.apply(zkhandler)
+    else:
+        logger.out("Schema successfully validated", state="o")
+
+
+def setup_node(logger, config, zkhandler):
+    # Check if our node exists in Zookeeper, and create it if not
+    if config["daemon_mode"] == "coordinator":
+        init_routerstate = "secondary"
+    else:
+        init_routerstate = "client"
+
+    if zkhandler.exists(("node", config["node_hostname"])):
+        logger.out(
+            f"Node is {logger.fmt_green}present{logger.fmt_end} in Zookeeper", state="i"
+        )
+        # Update static data just in case it's changed
+        zkhandler.write(
+            [
+                (("node", config["node_hostname"]), config["daemon_mode"]),
+                (("node.mode", config["node_hostname"]), config["daemon_mode"]),
+                (("node.state.daemon", config["node_hostname"]), "init"),
+                (("node.state.router", config["node_hostname"]), init_routerstate),
+                (
+                    ("node.data.static", config["node_hostname"]),
+                    " ".join(config["static_data"]),
+                ),
+                (
+                    ("node.data.pvc_version", config["node_hostname"]),
+                    config["daemon_version"],
+                ),
+                (
+                    ("node.ipmi.hostname", config["node_hostname"]),
+                    config["ipmi_hostname"],
+                ),
+                (
+                    ("node.ipmi.username", config["node_hostname"]),
+                    config["ipmi_username"],
+                ),
+                (
+                    ("node.ipmi.password", config["node_hostname"]),
+                    config["ipmi_password"],
+                ),
+            ]
+        )
+    else:
+        logger.out(
+            f"Node is {logger.fmt_red}absent{logger.fmt_end} in Zookeeper; adding new node",
+            state="i",
+        )
+        keepalive_time = int(time.time())
+        zkhandler.write(
+            [
+                (("node", config["node_hostname"]), config["daemon_mode"]),
+                (("node.keepalive", config["node_hostname"]), str(keepalive_time)),
+                (("node.mode", config["node_hostname"]), config["daemon_mode"]),
+                (("node.state.daemon", config["node_hostname"]), "init"),
+                (("node.state.domain", config["node_hostname"]), "flushed"),
+                (("node.state.router", config["node_hostname"]), init_routerstate),
+                (
+                    ("node.data.static", config["node_hostname"]),
+                    " ".join(config["static_data"]),
+                ),
+                (
+                    ("node.data.pvc_version", config["node_hostname"]),
+                    config["daemon_version"],
+                ),
+                (
+                    ("node.ipmi.hostname", config["node_hostname"]),
+                    config["ipmi_hostname"],
+                ),
+                (
+                    ("node.ipmi.username", config["node_hostname"]),
+                    config["ipmi_username"],
+                ),
+                (
+                    ("node.ipmi.password", config["node_hostname"]),
+                    config["ipmi_password"],
+                ),
+                (("node.memory.total", config["node_hostname"]), "0"),
+                (("node.memory.used", config["node_hostname"]), "0"),
+                (("node.memory.free", config["node_hostname"]), "0"),
+                (("node.memory.allocated", config["node_hostname"]), "0"),
+                (("node.memory.provisioned", config["node_hostname"]), "0"),
+                (("node.vcpu.allocated", config["node_hostname"]), "0"),
+                (("node.cpu.load", config["node_hostname"]), "0.0"),
+                (("node.running_domains", config["node_hostname"]), "0"),
+                (("node.count.provisioned_domains", config["node_hostname"]), "0"),
+                (("node.count.networks", config["node_hostname"]), "0"),
+            ]
+        )
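Note on connect() in zookeeper.py: the base schema version is only bumped once every node reports the latest installed schema, which the code checks by comparing a count against the list length. A minimal sketch of that check in isolation follows; the function name all_nodes_at_latest is hypothetical, and the input list stands in for the values read from "node.data.latest_schema".

```python
def all_nodes_at_latest(node_latest_versions, latest_version):
    """Return True when every reported version equals latest_version.

    Uses the same count-versus-length comparison as connect(); an empty
    list therefore also yields True (0 == 0).
    """
    return node_latest_versions.count(latest_version) == len(node_latest_versions)
```

The expression `all(v == latest_version for v in node_latest_versions)` would give the same result, including for an empty list.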
--- a/node-daemon/pvcnoded.sample.yaml
+++ b/node-daemon/pvcnoded.sample.yaml
@ -1,216 +0,0 @@
---
-# pvcnoded configuration file example
-#
-# This configuration file specifies details for this node in PVC. Multiple node
-# blocks can be added but only the one matching the current system nodename will
-# be used by the local daemon. Default values are not supported; the values in
-# this sample configuration are considered defaults and, with adjustment of the
-# nodename section and coordinators list, can be used as-is on a Debian system.
-#
-# Copy this example to /etc/pvc/pvcnoded.conf and edit to your needs
-
-pvc:
-  # node: The (short) hostname of the node, set during provisioning
-  node: pvchv1
-  # debug: Enable or disable debug output
-  debug: False
-  # functions: The daemon functions to enable
-  functions:
-    # enable_hypervisor: Enable or disable hypervisor functionality
-    # This should never be False except in very advanced usecases
-    enable_hypervisor: True
-    # enable_networking: Enable or disable virtual networking and routing functionality
-    enable_networking: True
-    # enable_storage: Enable or disable Ceph storage management functionality
-    enable_storage: True
-    # enable_api: Enable or disable the API client, if installed, when node is Primary
-    enable_api: True
-  # cluster: Cluster-level configuration
-  cluster:
-    # coordinators: The list of cluster coordinator hostnames
-    coordinators:
-      - pvchv1
-      - pvchv2
-      - pvchv3
-    # networks: Cluster-level network configuration
-    # OPTIONAL if enable_networking: False
-    networks:
-      # upstream: Upstream routed network for in- and out-bound upstream networking
-      upstream:
-        # domain: Upstream domain name, may be None
-        domain: "mydomain.net"
-        # network: Upstream network block
-        network: "1.1.1.0/24"
-        # floating_ip: Upstream floating IP address for the primary coordinator
-        floating_ip: "1.1.1.10/24"
-        # gateway: Upstream static default gateway, if applicable
-        gateway: "1.1.1.1"
-      # cluster: Cluster internal network for node communication and client virtual networks
-      cluster:
-        # domain: Cluster internal domain name
-        domain: "pvc.local"
-        # network: Cluster internal network block
-        network: "10.255.0.0/24"
-        # floating_ip: Cluster internal floating IP address for the primary coordinator
-        floating_ip: "10.255.0.254/24"
-      # storage: Cluster internal network for storage traffic
-      storage:
-        # domain: Cluster storage domain name
-        domain: "pvc.storage"
-        # network: Cluster storage network block
-        network: "10.254.0.0/24"
-        # floating_ip: Cluster storage floating IP address for the primary coordinator
-        floating_ip: "10.254.0.254/24"
-  # coordinator: Coordinator-specific configuration
-  # OPTIONAL if enable_networking: False
-  coordinator:
-    # dns: DNS aggregator subsystem
-    dns:
-      # database: Patroni PostgreSQL database configuration
-      database:
-        # host: PostgreSQL hostname, invariably 'localhost'
-        host: localhost
-        # port: PostgreSQL port, invariably '5432'
-        port: 5432
-        # name: PostgreSQL database name, invariably 'pvcdns'
-        name: pvcdns
-        # user: PostgreSQL username, invariably 'pvcdns'
-        user: pvcdns
-        # pass: PostgreSQL user password, randomly generated
-        pass: pvcdns
-    # metadata: Metadata API subsystem
-    metadata:
-      # database: Patroni PostgreSQL database configuration
-      database:
-        # host: PostgreSQL hostname, invariably 'localhost'
-        host: localhost
-        # port: PostgreSQL port, invariably '5432'
-        port: 5432
-        # name: PostgreSQL database name, invariably 'pvcapi'
-        name: pvcapi
-        # user: PostgreSQL username, invariably 'pvcapi'
-        user: pvcapi
-        # pass: PostgreSQL user password, randomly generated
-        pass: pvcapi
-  # system: Local PVC instance configuration
-  system:
-    # intervals: Intervals for keepalives and fencing
-    intervals:
-      # vm_shutdown_timeout: Number of seconds for a VM to 'shutdown' before being forced off
-      vm_shutdown_timeout: 180
-      # keepalive_interval: Number of seconds between keepalive/status updates
-      keepalive_interval: 5
-      # monitoring_interval: Number of seconds between monitoring check updates
-      monitoring_interval: 60
-      # fence_intervals: Number of keepalive_intervals to declare a node dead and fence it
-      fence_intervals: 6
-      # suicide_intervals: Number of keepalive_intervals before a node considers itself dead and self-fences, 0 to disable
-      suicide_intervals: 0
-    # fencing: Node fencing configuration
-    fencing:
-      # actions: Actions to take after a fence trigger
-      actions:
-        # successful_fence: Action to take after successfully fencing a node, options: migrate, None
-        successful_fence: migrate
-        # failed_fence: Action to take after failing to fence a node, options: migrate, None
-        failed_fence: None
-      # ipmi: Local system IPMI options
-      ipmi:
-        # host: Hostname/IP of the local system's IPMI interface, must be reachable
-        host: pvchv1-lom
-        # user: Local system IPMI username
-        user: admin
-        # pass: Local system IPMI password
-        pass: Passw0rd
-    # migration: Migration option configuration
-    migration:
-      # target_selector: Criteria to select the ideal migration target, options: mem, memprov, load, vcpus, vms
-      target_selector: mem
-    # configuration: Local system configurations
-    configuration:
-      # directories: PVC system directories
-      directories:
-        # plugin_directory: Directory containing node monitoring plugins
-        plugin_directory: "/usr/share/pvc/plugins"
-        # dynamic_directory: Temporary in-memory directory for active configurations
-        dynamic_directory: "/run/pvc"
-        # log_directory: Logging directory
-        log_directory: "/var/log/pvc"
-        # console_log_directory: Libvirt console logging directory
-        console_log_directory: "/var/log/libvirt"
-      # logging: PVC logging configuration
-      logging:
-        # file_logging: Enable or disable logging to files under log_directory
-        file_logging: True
-        # stdout_logging: Enable or disable logging to stdout (i.e. journald)
-        stdout_logging: True
-        # zookeeper_logging: Enable or disable logging to Zookeeper (for `pvc node log` functionality)
-        zookeeper_logging: True
-        # log_colours: Enable or disable ANSI colours in log output
-        log_colours: True
-        # log_dates: Enable or disable date strings in log output
-        log_dates: True
-        # log_keepalives: Enable or disable keepalive logging
-        log_keepalives: True
-        # log_keepalive_cluster_details: Enable or disable node status logging during keepalive
-        log_keepalive_cluster_details: True
-        # log_keepalive_plugin_details: Enable or disable node health plugin logging during keepalive
-        log_keepalive_plugin_details: True
-        # console_log_lines: Number of console log lines to store in Zookeeper per VM
-        console_log_lines: 1000
-        # node_log_lines: Number of node log lines to store in Zookeeper per node
-        node_log_lines: 2000
-      # networking: PVC networking configuration
-      # OPTIONAL if enable_networking: False
-      networking:
-        # bridge_device: Underlying device to use for bridged vLAN networks; usually the device of <cluster>
-        bridge_device: ens4
-        # bridge_mtu: The MTU of the underlying device used for bridged vLAN networks, and thus the maximum
-        #             MTU of the overlying bridge devices.
-        bridge_mtu: 1500
-        # sriov_enable: Enable or disable (default if absent) SR-IOV network support
-        sriov_enable: False
-        # sriov_device: Underlying device(s) to use for SR-IOV networks; can be bridge_device or other NIC(s)
-        sriov_device:
-            # The physical device name
-          - phy: ens1f1 
-            # The preferred MTU of the physical device; OPTIONAL - defaults to the interface default if unset
-            mtu: 9000
-            # The number of VFs to enable on this device
-            # NOTE: This defines the maximum number of VMs which can be provisioned on this physical device; VMs
-            #       are allocated to these VFs manually by the administrator and thus all nodes should have the
-            #       same number
-            # NOTE: This value cannot be changed at runtime on Intel(R) NICs; the node will need to be restarted
-            #       if this value changes
-            vfcount: 8
-        # upstream: Upstream physical interface device
-        upstream:
-          # device: Upstream interface device name
-          device: ens4
-          # mtu: Upstream interface MTU; use 9000 for jumbo frames (requires switch support)
-          mtu: 1500
-          # address: Upstream interface IP address, options: by-id, <static>/<mask>
-          address: by-id
-        # cluster: Cluster (VNIC) physical interface device
-        cluster:
-          # device: Cluster (VNIC) interface device name
-          device: ens4
-          # mtu: Cluster (VNIC) interface MTU; use 9000 for jumbo frames (requires switch support)
-          mtu: 1500
-          # address: Cluster (VNIC) interface IP address, options: by-id, <static>/<mask>
-          address: by-id
-        # storage: Storage (Ceph OSD) physical interface device
-        storage:
-          # device: Storage (Ceph OSD) interface device name
-          device: ens4
-          # mtu: Storage (Ceph OSD) interface MTU; use 9000 for jumbo frames (requires switch support)
-          mtu: 1500
-          # address: Storage (Ceph OSD) interface IP address, options: by-id, <static>/<mask>
-          address: by-id
-      # storage: PVC storage configuration
-      # OPTIONAL if enable_storage: False
-      storage:
-        # ceph_config_file: The config file containing the Ceph cluster configuration
-        ceph_config_file: "/etc/ceph/ceph.conf"
-        # ceph_admin_keyring: The file containing the Ceph client admin keyring
-        ceph_admin_keyring: "/etc/ceph/ceph.client.admin.keyring"
--- a/node-daemon/pvcnoded.service
+++ b/node-daemon/pvcnoded.service
@ -10,7 +10,7 @@ PartOf = pvc.target
 Type = simple
 WorkingDirectory = /usr/share/pvc
 Environment = PYTHONUNBUFFERED=true
-Environment = PVCD_CONFIG_FILE=/etc/pvc/pvcnoded.yaml
+Environment = PVC_CONFIG_FILE=/etc/pvc/pvc.conf
 ExecStartPre = /bin/sleep 2
 ExecStart = /usr/share/pvc/pvcnoded.py
 ExecStopPost = /bin/sleep 2
--- a/node-daemon/pvcnoded/Daemon.py
+++ b/node-daemon/pvcnoded/Daemon.py
@ -20,14 +20,12 @@
 ###############################################################################

 import pvcnoded.util.keepalive
-import pvcnoded.util.config
 import pvcnoded.util.fencing
 import pvcnoded.util.networking
 import pvcnoded.util.services
 import pvcnoded.util.libvirt
 import pvcnoded.util.zookeeper

-import pvcnoded.objects.MonitoringInstance as MonitoringInstance
 import pvcnoded.objects.DNSAggregatorInstance as DNSAggregatorInstance
 import pvcnoded.objects.MetadataAPIInstance as MetadataAPIInstance
 import pvcnoded.objects.VMInstance as VMInstance
@ -36,6 +34,7 @@ import pvcnoded.objects.VXNetworkInstance as VXNetworkInstance
 import pvcnoded.objects.SRIOVVFInstance as SRIOVVFInstance
 import pvcnoded.objects.CephInstance as CephInstance

+import daemon_lib.config as cfg
 import daemon_lib.log as log
 import daemon_lib.common as common

@ -49,7 +48,7 @@ import re
 import json

 # Daemon version
-version = "0.9.81"
+version = "0.9.83"


 ##########################################################
@ -59,45 +58,40 @@ version = "0.9.81"

 def entrypoint():
    keepalive_timer = None
-    monitoring_instance = None

    # Get our configuration
-    config = pvcnoded.util.config.get_configuration()
-    config["pvcnoded_version"] = version
-
-    # Set some useful booleans for later (fewer characters)
-    debug = config["debug"]
-    if debug:
-        print("DEBUG MODE ENABLED")
+    config = cfg.get_configuration()
+    config["daemon_name"] = "pvcnoded"
+    config["daemon_version"] = version

    # Create and validate our directories
-    pvcnoded.util.config.validate_directories(config)
+    cfg.validate_directories(config)

    # Set up the logger instance
    logger = log.Logger(config)

    # Print our startup message
    logger.out("")
-    logger.out("|----------------------------------------------------------|")
-    logger.out("|                                                          |")
-    logger.out("|           ███████████ ▜█▙      ▟█▛ █████ █ █ █           |")
-    logger.out("|                    ██  ▜█▙    ▟█▛  ██                    |")
-    logger.out("|           ███████████   ▜█▙  ▟█▛   ██                    |")
-    logger.out("|           ██             ▜█▙▟█▛    ███████████           |")
-    logger.out("|                                                          |")
-    logger.out("|----------------------------------------------------------|")
-    logger.out("| Parallel Virtual Cluster node daemon v{0: <18} |".format(version))
-    logger.out("| Debug: {0: <49} |".format(str(config["debug"])))
-    logger.out("| FQDN: {0: <50} |".format(config["node_fqdn"]))
-    logger.out("| Host: {0: <50} |".format(config["node_hostname"]))
-    logger.out("| ID: {0: <52} |".format(config["node_id"]))
-    logger.out("| IPMI hostname: {0: <41} |".format(config["ipmi_hostname"]))
-    logger.out("| Machine details:                                         |")
-    logger.out("|   CPUs: {0: <48} |".format(config["static_data"][0]))
-    logger.out("|   Arch: {0: <48} |".format(config["static_data"][3]))
-    logger.out("|   OS: {0: <50} |".format(config["static_data"][2]))
-    logger.out("|   Kernel: {0: <46} |".format(config["static_data"][1]))
-    logger.out("|----------------------------------------------------------|")
+    logger.out("|--------------------------------------------------------------|")
+    logger.out("|                                                              |")
+    logger.out("|             ███████████ ▜█▙      ▟█▛ █████ █ █ █             |")
+    logger.out("|                      ██  ▜█▙    ▟█▛  ██                      |")
+    logger.out("|             ███████████   ▜█▙  ▟█▛   ██                      |")
+    logger.out("|             ██             ▜█▙▟█▛    ███████████             |")
+    logger.out("|                                                              |")
+    logger.out("|--------------------------------------------------------------|")
+    logger.out("| Parallel Virtual Cluster node daemon v{0: <22} |".format(version))
+    logger.out("| Debug: {0: <53} |".format(str(config["debug"])))
+    logger.out("| FQDN: {0: <54} |".format(config["node_fqdn"]))
+    logger.out("| Host: {0: <54} |".format(config["node_hostname"]))
+    logger.out("| ID: {0: <56} |".format(config["node_id"]))
+    logger.out("| IPMI hostname: {0: <45} |".format(config["ipmi_hostname"]))
+    logger.out("| Machine details:                                             |")
+    logger.out("|   CPUs: {0: <52} |".format(config["static_data"][0]))
+    logger.out("|   Arch: {0: <52} |".format(config["static_data"][3]))
+    logger.out("|   OS: {0: <54} |".format(config["static_data"][2]))
+    logger.out("|   Kernel: {0: <50} |".format(config["static_data"][1]))
+    logger.out("|--------------------------------------------------------------|")
    logger.out("")
    logger.out(f'Starting pvcnoded on host {config["node_fqdn"]}', state="s")

@ -206,7 +200,7 @@ def entrypoint():

    # Define a cleanup function
    def cleanup(failure=False):
-        nonlocal logger, zkhandler, keepalive_timer, d_domain, monitoring_instance
+        nonlocal logger, zkhandler, keepalive_timer, d_domain

        logger.out("Terminating pvcnoded and cleaning up", state="s")

@ -255,13 +249,6 @@ def entrypoint():
        except Exception:
            pass

-        # Shut down the monitoring system
-        try:
-            logger.out("Shutting down monitoring subsystem", state="s")
-            monitoring_instance.shutdown()
-        except Exception:
-            pass
-
        # Set stop state in Zookeeper
        zkhandler.write([(("node.state.daemon", config["node_hostname"]), "stop")])

@ -399,8 +386,8 @@ def entrypoint():

        node_list = new_node_list
        logger.out(
-            f'{logger.fmt_blue}Node list:{logger.fmt_end} {" ".join(node_list)}',
-            state="i",
+            f'{logger.fmt_cyan}Node list:{logger.fmt_end} {" ".join(node_list)}',
+            state="s",
        )

        # Update node objects lists
@ -567,8 +554,8 @@ def entrypoint():
            # Update the new list
            network_list = new_network_list
            logger.out(
-                f'{logger.fmt_blue}Network list:{logger.fmt_end} {" ".join(network_list)}',
-                state="i",
+                f'{logger.fmt_cyan}Network list:{logger.fmt_end} {" ".join(network_list)}',
+                state="s",
            )

            # Update node objects list
@ -898,8 +885,8 @@ def entrypoint():

                sriov_vf_list = sorted(new_sriov_vf_list)
                logger.out(
-                    f'{logger.fmt_blue}SR-IOV VF list:{logger.fmt_end} {" ".join(sriov_vf_list)}',
-                    state="i",
+                    f'{logger.fmt_cyan}SR-IOV VF list:{logger.fmt_end} {" ".join(sriov_vf_list)}',
+                    state="s",
                )

    if config["enable_hypervisor"]:
@ -925,8 +912,8 @@ def entrypoint():
            # Update the new list
            domain_list = new_domain_list
            logger.out(
-                f'{logger.fmt_blue}Domain list:{logger.fmt_end} {" ".join(domain_list)}',
-                state="i",
+                f'{logger.fmt_cyan}Domain list:{logger.fmt_end} {" ".join(domain_list)}',
+                state="s",
            )

            # Update node objects' list
@ -952,8 +939,8 @@ def entrypoint():
            # Update the new list
            osd_list = new_osd_list
            logger.out(
-                f'{logger.fmt_blue}OSD list:{logger.fmt_end} {" ".join(osd_list)}',
-                state="i",
+                f'{logger.fmt_cyan}OSD list:{logger.fmt_end} {" ".join(osd_list)}',
+                state="s",
            )

        # Pool objects
@ -977,8 +964,8 @@ def entrypoint():
            # Update the new list
            pool_list = new_pool_list
            logger.out(
-                f'{logger.fmt_blue}Pool list:{logger.fmt_end} {" ".join(pool_list)}',
-                state="i",
+                f'{logger.fmt_cyan}Pool list:{logger.fmt_end} {" ".join(pool_list)}',
+                state="s",
            )

            # Volume objects (in each pool)
@ -1009,15 +996,10 @@ def entrypoint():
                    # Update the new list
                    volume_list[pool] = new_volume_list
                    logger.out(
-                        f'{logger.fmt_blue}Volume list [{pool}]:{logger.fmt_end} {" ".join(volume_list[pool])}',
-                        state="i",
+                        f'{logger.fmt_cyan}Volume list [{pool}]:{logger.fmt_end} {" ".join(volume_list[pool])}',
+                        state="s",
                    )

-    # Set up the node monitoring instance and thread
-    monitoring_instance = MonitoringInstance.MonitoringInstance(
-        zkhandler, config, logger, this_node
-    )
-
    # Start keepalived thread
    keepalive_timer = pvcnoded.util.keepalive.start_keepalive_timer(
        logger, config, zkhandler, this_node
--- a/node-daemon/pvcnoded/dnsmasq-zookeeper-leases.py
+++ b/node-daemon/pvcnoded/dnsmasq-zookeeper-leases.py
@ -70,12 +70,12 @@ def get_client_id():
 def connect_zookeeper():
    # We expect the environ to contain the config file
    try:
-        pvcnoded_config_file = os.environ["PVCD_CONFIG_FILE"]
+        pvc_config_file = os.environ["PVC_CONFIG_FILE"]
    except Exception:
        # Default place
-        pvcnoded_config_file = "/etc/pvc/pvcnoded.yaml"
+        pvc_config_file = "/etc/pvc/pvc.conf"

-    with open(pvcnoded_config_file, "r") as cfgfile:
+    with open(pvc_config_file, "r") as cfgfile:
        try:
            o_config = yaml.load(cfgfile, yaml.SafeLoader)
        except Exception as e:
@ -87,7 +87,7 @@ def connect_zookeeper():

    try:
        zk_conn = kazoo.client.KazooClient(
-            hosts=o_config["pvc"]["cluster"]["coordinators"]
+            hosts=o_config["cluster"]["coordinator_nodes"]
        )
        zk_conn.start()
    except Exception as e:
--- a/node-daemon/pvcnoded/objects/MetadataAPIInstance.py
+++ b/node-daemon/pvcnoded/objects/MetadataAPIInstance.py
@ -131,11 +131,11 @@ class MetadataAPIInstance(object):
    # Helper functions
    def open_database(self):
        conn = psycopg2.connect(
-            host=self.config["metadata_postgresql_host"],
-            port=self.config["metadata_postgresql_port"],
-            dbname=self.config["metadata_postgresql_dbname"],
-            user=self.config["metadata_postgresql_user"],
-            password=self.config["metadata_postgresql_password"],
+            host=self.config["api_postgresql_host"],
+            port=self.config["api_postgresql_port"],
+            dbname=self.config["api_postgresql_dbname"],
+            user=self.config["api_postgresql_user"],
+            password=self.config["api_postgresql_password"],
        )
        cur = conn.cursor(cursor_factory=RealDictCursor)
        return conn, cur
--- a/node-daemon/pvcnoded/objects/VXNetworkInstance.py
+++ b/node-daemon/pvcnoded/objects/VXNetworkInstance.py
@ -743,10 +743,27 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
            )

            # Recreate the environment we need for dnsmasq
-            pvcnoded_config_file = os.environ["PVCD_CONFIG_FILE"]
+            pvc_config_file = None
+            try:
+                _config_file = "/etc/pvc/pvcnoded.yaml"
+                if not os.path.exists(_config_file):
+                    raise
+                pvc_config_file = _config_file
+            except Exception:
+                pass
+            try:
+                _config_file = os.environ["PVC_CONFIG_FILE"]
+                if not os.path.exists(_config_file):
+                    raise
+                pvc_config_file = _config_file
+            except Exception:
+                pass
+            if pvc_config_file is None:
+                raise Exception
+
            dhcp_environment = {
                "DNSMASQ_BRIDGE_INTERFACE": self.bridge_nic,
-                "PVCD_CONFIG_FILE": pvcnoded_config_file,
+                "PVC_CONFIG_FILE": pvc_config_file,
            }

            # Define the dnsmasq config fragments
--- a/node-daemon/pvcnoded/util/config.py
+++ b/node-daemon/pvcnoded/util/config.py
@ -1,431 +0,0 @@
-#!/usr/bin/env python3
-
-# config.py - Utility functions for pvcnoded configuration parsing
-# Part of the Parallel Virtual Cluster (PVC) system
-#
-#    Copyright (C) 2018-2022 Joshua M. Boniface <joshua@boniface.me>
-#
-#    This program is free software: you can redistribute it and/or modify
-#    it under the terms of the GNU General Public License as published by
-#    the Free Software Foundation, version 3.
-#
-#    This program is distributed in the hope that it will be useful,
-#    but WITHOUT ANY WARRANTY; without even the implied warranty of
-#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-#    GNU General Public License for more details.
-#
-#    You should have received a copy of the GNU General Public License
-#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
-#
-###############################################################################
-
-import daemon_lib.common as common
-
-import os
-import subprocess
-import yaml
-
-from socket import gethostname
-from re import findall
-from psutil import cpu_count
-from ipaddress import ip_address, ip_network
-from json import loads
-
-
-class MalformedConfigurationError(Exception):
-    """
-    An exception raised when parsing the PVC Node daemon configuration file
-    """
-
-    def __init__(self, error=None):
-        self.msg = f"ERROR: Configuration file is malformed: {error}"
-
-    def __str__(self):
-        return str(self.msg)
-
-
-def get_static_data():
-    """
-    Data that is obtained once at node startup for use later
-    """
-    staticdata = list()
-    staticdata.append(str(cpu_count()))  # CPU count
-    staticdata.append(
-        subprocess.run(["uname", "-r"], stdout=subprocess.PIPE)
-        .stdout.decode("ascii")
-        .strip()
-    )
-    staticdata.append(
-        subprocess.run(["uname", "-o"], stdout=subprocess.PIPE)
-        .stdout.decode("ascii")
-        .strip()
-    )
-    staticdata.append(
-        subprocess.run(["uname", "-m"], stdout=subprocess.PIPE)
-        .stdout.decode("ascii")
-        .strip()
-    )
-
-    return staticdata
-
-
-def get_configuration_path():
-    try:
-        return os.environ["PVCD_CONFIG_FILE"]
-    except KeyError:
-        print('ERROR: The "PVCD_CONFIG_FILE" environment variable must be set.')
-        os._exit(1)
-
-
-def get_hostname():
-    node_fqdn = gethostname()
-    node_hostname = node_fqdn.split(".", 1)[0]
-    node_domain = "".join(node_fqdn.split(".", 1)[1:])
-    try:
-        node_id = findall(r"\d+", node_hostname)[-1]
-    except IndexError:
-        node_id = 0
-
-    return node_fqdn, node_hostname, node_domain, node_id
-
-
-def validate_floating_ip(config, network):
-    if network not in ["cluster", "storage", "upstream"]:
-        return False, f'Specified network type "{network}" is not valid'
-
-    floating_key = f"{network}_floating_ip"
-    network_key = f"{network}_network"
-
-    # Verify the network provided is valid
-    try:
-        network = ip_network(config[network_key])
-    except Exception:
-        return (
-            False,
-            f"Network address {config[network_key]} for {network_key} is not valid",
-        )
-
-    # Verify that the floating IP is valid (and in the network)
-    try:
-        floating_address = ip_address(config[floating_key].split("/")[0])
-        if floating_address not in list(network.hosts()):
-            raise
-    except Exception:
-        return (
-            False,
-            f"Floating address {config[floating_key]} for {floating_key} is not valid",
-        )
-
-    return True, ""
-
-
-def get_configuration():
-    """
-    Parse the configuration of the node daemon.
-    """
-    pvcnoded_config_file = get_configuration_path()
-
-    print('Loading configuration from file "{}"'.format(pvcnoded_config_file))
-
-    with open(pvcnoded_config_file, "r") as cfgfile:
-        try:
-            o_config = yaml.load(cfgfile, Loader=yaml.SafeLoader)
-        except Exception as e:
-            print("ERROR: Failed to parse configuration file: {}".format(e))
-            os._exit(1)
-
-    node_fqdn, node_hostname, node_domain, node_id = get_hostname()
-
-    # Create the configuration dictionary
-    config = dict()
-
-    # Get the initial base configuration
-    try:
-        o_base = o_config["pvc"]
-        o_cluster = o_config["pvc"]["cluster"]
-    except Exception as e:
-        raise MalformedConfigurationError(e)
-
-    config_general = {
-        "node": o_base.get("node", node_hostname),
-        "node_hostname": node_hostname,
-        "node_fqdn": node_fqdn,
-        "node_domain": node_domain,
-        "node_id": node_id,
-        "coordinators": o_cluster.get("coordinators", list()),
-        "debug": o_base.get("debug", False),
-    }
-
-    config = {**config, **config_general}
-
-    # Get the functions configuration
-    try:
-        o_functions = o_config["pvc"]["functions"]
-    except Exception as e:
-        raise MalformedConfigurationError(e)
-
-    config_functions = {
-        "enable_hypervisor": o_functions.get("enable_hypervisor", False),
-        "enable_networking": o_functions.get("enable_networking", False),
-        "enable_storage": o_functions.get("enable_storage", False),
-        "enable_api": o_functions.get("enable_api", False),
-    }
-
-    config = {**config, **config_functions}
-
-    # Get the directory configuration
-    try:
-        o_directories = o_config["pvc"]["system"]["configuration"]["directories"]
-    except Exception as e:
-        raise MalformedConfigurationError(e)
-
-    config_directories = {
-        "plugin_directory": o_directories.get(
-            "plugin_directory", "/usr/share/pvc/plugins"
-        ),
-        "dynamic_directory": o_directories.get("dynamic_directory", None),
-        "log_directory": o_directories.get("log_directory", None),
-        "console_log_directory": o_directories.get("console_log_directory", None),
-    }
-
-    # Define our dynamic directory schema
-    config_directories["dnsmasq_dynamic_directory"] = (
-        config_directories["dynamic_directory"] + "/dnsmasq"
-    )
-    config_directories["pdns_dynamic_directory"] = (
-        config_directories["dynamic_directory"] + "/pdns"
-    )
-    config_directories["nft_dynamic_directory"] = (
-        config_directories["dynamic_directory"] + "/nft"
-    )
-
-    # Define our log directory schema
-    config_directories["dnsmasq_log_directory"] = (
-        config_directories["log_directory"] + "/dnsmasq"
-    )
-    config_directories["pdns_log_directory"] = (
-        config_directories["log_directory"] + "/pdns"
-    )
-    config_directories["nft_log_directory"] = (
-        config_directories["log_directory"] + "/nft"
-    )
-
-    config = {**config, **config_directories}
-
-    # Get the logging configuration
-    try:
-        o_logging = o_config["pvc"]["system"]["configuration"]["logging"]
-    except Exception as e:
-        raise MalformedConfigurationError(e)
-
-    config_logging = {
-        "file_logging": o_logging.get("file_logging", False),
-        "stdout_logging": o_logging.get("stdout_logging", False),
-        "zookeeper_logging": o_logging.get("zookeeper_logging", False),
-        "log_colours": o_logging.get("log_colours", False),
-        "log_dates": o_logging.get("log_dates", False),
-        "log_keepalives": o_logging.get("log_keepalives", False),
-        "log_keepalive_cluster_details": o_logging.get(
-            "log_keepalive_cluster_details", False
-        ),
-        "log_keepalive_plugin_details": o_logging.get(
-            "log_keepalive_plugin_details", False
-        ),
-        "console_log_lines": o_logging.get("console_log_lines", False),
-        "node_log_lines": o_logging.get("node_log_lines", False),
-    }
-
-    config = {**config, **config_logging}
-
-    # Get the interval configuration
-    try:
-        o_intervals = o_config["pvc"]["system"]["intervals"]
-    except Exception as e:
-        raise MalformedConfigurationError(e)
-
-    config_intervals = {
-        "vm_shutdown_timeout": int(o_intervals.get("vm_shutdown_timeout", 60)),
-        "keepalive_interval": int(o_intervals.get("keepalive_interval", 5)),
-        "monitoring_interval": int(o_intervals.get("monitoring_interval", 60)),
-        "fence_intervals": int(o_intervals.get("fence_intervals", 6)),
-        "suicide_intervals": int(o_intervals.get("suicide_interval", 0)),
-    }
-
-    config = {**config, **config_intervals}
-
-    # Get the fencing configuration
-    try:
-        o_fencing = o_config["pvc"]["system"]["fencing"]
-        o_fencing_actions = o_fencing["actions"]
-        o_fencing_ipmi = o_fencing["ipmi"]
-    except Exception as e:
-        raise MalformedConfigurationError(e)
-
-    config_fencing = {
-        "successful_fence": o_fencing_actions.get("successful_fence", None),
-        "failed_fence": o_fencing_actions.get("failed_fence", None),
-        "ipmi_hostname": o_fencing_ipmi.get(
-            "host", f"{node_hostname}-lom.{node_domain}"
-        ),
-        "ipmi_username": o_fencing_ipmi.get("user", "null"),
-        "ipmi_password": o_fencing_ipmi.get("pass", "null"),
-    }
-
-    config = {**config, **config_fencing}
-
-    # Get the migration configuration
-    try:
-        o_migration = o_config["pvc"]["system"]["migration"]
-    except Exception as e:
-        raise MalformedConfigurationError(e)
-
-    config_migration = {
-        "migration_target_selector": o_migration.get("target_selector", "mem"),
-    }
-
-    config = {**config, **config_migration}
-
-    if config["enable_networking"]:
-        # Get the node networks configuration
-        try:
-            o_networks = o_config["pvc"]["cluster"]["networks"]
-            o_network_cluster = o_networks["cluster"]
-            o_network_storage = o_networks["storage"]
-            o_network_upstream = o_networks["upstream"]
-            o_sysnetworks = o_config["pvc"]["system"]["configuration"]["networking"]
-            o_sysnetwork_cluster = o_sysnetworks["cluster"]
-            o_sysnetwork_storage = o_sysnetworks["storage"]
-            o_sysnetwork_upstream = o_sysnetworks["upstream"]
-        except Exception as e:
-            raise MalformedConfigurationError(e)
-
-        config_networks = {
-            "cluster_domain": o_network_cluster.get("domain", None),
-            "cluster_network": o_network_cluster.get("network", None),
-            "cluster_floating_ip": o_network_cluster.get("floating_ip", None),
-            "cluster_dev": o_sysnetwork_cluster.get("device", None),
-            "cluster_mtu": o_sysnetwork_cluster.get("mtu", None),
-            "cluster_dev_ip": o_sysnetwork_cluster.get("address", None),
-            "storage_domain": o_network_storage.get("domain", None),
-            "storage_network": o_network_storage.get("network", None),
-            "storage_floating_ip": o_network_storage.get("floating_ip", None),
-            "storage_dev": o_sysnetwork_storage.get("device", None),
-            "storage_mtu": o_sysnetwork_storage.get("mtu", None),
-            "storage_dev_ip": o_sysnetwork_storage.get("address", None),
-            "upstream_domain": o_network_upstream.get("domain", None),
-            "upstream_network": o_network_upstream.get("network", None),
-            "upstream_floating_ip": o_network_upstream.get("floating_ip", None),
-            "upstream_gateway": o_network_upstream.get("gateway", None),
-            "upstream_dev": o_sysnetwork_upstream.get("device", None),
-            "upstream_mtu": o_sysnetwork_upstream.get("mtu", None),
-            "upstream_dev_ip": o_sysnetwork_upstream.get("address", None),
-            "bridge_dev": o_sysnetworks.get("bridge_device", None),
-            "bridge_mtu": o_sysnetworks.get("bridge_mtu", None),
-            "enable_sriov": o_sysnetworks.get("sriov_enable", False),
-            "sriov_device": o_sysnetworks.get("sriov_device", list()),
-        }
-
-        if config_networks["bridge_mtu"] is None:
-            # Read the current MTU of bridge_dev and set bridge_mtu to it; avoids weird resets
-            retcode, stdout, stderr = common.run_os_command(
-                f"ip -json link show dev {config_networks['bridge_dev']}"
-            )
-            current_bridge_mtu = loads(stdout)[0]["mtu"]
-            print(
-                f"Config key bridge_mtu not explicitly set; using live MTU {current_bridge_mtu} from {config_networks['bridge_dev']}"
-            )
-            config_networks["bridge_mtu"] = current_bridge_mtu
-
-        config = {**config, **config_networks}
-
-        for network_type in ["cluster", "storage", "upstream"]:
-            result, msg = validate_floating_ip(config, network_type)
-            if not result:
-                raise MalformedConfigurationError(msg)
-
-            address_key = "{}_dev_ip".format(network_type)
-            network_key = f"{network_type}_network"
-            network = ip_network(config[network_key])
-            # With autoselection of addresses, construct an IP from the relevant network
-            if config[address_key] == "by-id":
-                # The NodeID starts at 1, but indexes start at 0
-                address_id = int(config["node_id"]) - 1
-                # Grab the nth address from the network
-                config[address_key] = "{}/{}".format(
-                    list(network.hosts())[address_id], network.prefixlen
-                )
-            # Validate the provided IP instead
-            else:
-                try:
-                    address = ip_address(config[address_key].split("/")[0])
-                    if address not in list(network.hosts()):
-                        raise
-                except Exception:
-                    raise MalformedConfigurationError(
-                        f"IP address {config[address_key]} for {address_key} is not valid"
-                    )
-
-        # Get the PowerDNS aggregator database configuration
-        try:
-            o_pdnsdb = o_config["pvc"]["coordinator"]["dns"]["database"]
-        except Exception as e:
-            raise MalformedConfigurationError(e)
-
-        config_pdnsdb = {
-            "pdns_postgresql_host": o_pdnsdb.get("host", None),
-            "pdns_postgresql_port": o_pdnsdb.get("port", None),
-            "pdns_postgresql_dbname": o_pdnsdb.get("name", None),
-            "pdns_postgresql_user": o_pdnsdb.get("user", None),
-            "pdns_postgresql_password": o_pdnsdb.get("pass", None),
-        }
-
-        config = {**config, **config_pdnsdb}
-
-        # Get the Cloud-Init Metadata database configuration
-        try:
-            o_metadatadb = o_config["pvc"]["coordinator"]["metadata"]["database"]
-        except Exception as e:
-            raise MalformedConfigurationError(e)
-
-        config_metadatadb = {
-            "metadata_postgresql_host": o_metadatadb.get("host", None),
-            "metadata_postgresql_port": o_metadatadb.get("port", None),
-            "metadata_postgresql_dbname": o_metadatadb.get("name", None),
-            "metadata_postgresql_user": o_metadatadb.get("user", None),
-            "metadata_postgresql_password": o_metadatadb.get("pass", None),
-        }
-
-        config = {**config, **config_metadatadb}
-
-    if config["enable_storage"]:
-        # Get the storage configuration
-        try:
-            o_storage = o_config["pvc"]["system"]["configuration"]["storage"]
-        except Exception as e:
-            raise MalformedConfigurationError(e)
-
-        config_storage = {
-            "ceph_config_file": o_storage.get("ceph_config_file", None),
-            "ceph_admin_keyring": o_storage.get("ceph_admin_keyring", None),
-        }
-
-        config = {**config, **config_storage}
-
-        # Add our node static data to the config
-        config["static_data"] = get_static_data()
-
-    return config
-
-
-def validate_directories(config):
-    if not os.path.exists(config["dynamic_directory"]):
-        os.makedirs(config["dynamic_directory"])
-        os.makedirs(config["dnsmasq_dynamic_directory"])
-        os.makedirs(config["pdns_dynamic_directory"])
-        os.makedirs(config["nft_dynamic_directory"])
-
-    if not os.path.exists(config["log_directory"]):
-        os.makedirs(config["log_directory"])
-        os.makedirs(config["dnsmasq_log_directory"])
-        os.makedirs(config["pdns_log_directory"])
-        os.makedirs(config["nft_log_directory"])
--- a/node-daemon/pvcnoded/util/keepalive.py
+++ b/node-daemon/pvcnoded/util/keepalive.py
@ -700,6 +700,10 @@ def node_keepalive(logger, config, zkhandler, this_node):
        active_coordinator_state = this_node.coordinator_state

        runtime_start = datetime.now()
+        logger.out(
+            "Starting node keepalive run",
+            state="t",
+        )

    # Set the migration selector in Zookeeper for clients to read
    if config["enable_hypervisor"]:
--- a/node-daemon/pvcnoded/util/services.py
+++ b/node-daemon/pvcnoded/util/services.py
@ -70,19 +70,27 @@ def start_ceph_mgr(logger, config):


 def start_keydb(logger, config):
-    if config["enable_api"] and config["daemon_mode"] == "coordinator":
+    if (config["enable_api"] or config["enable_worker"]) and config[
+        "daemon_mode"
+    ] == "coordinator":
        logger.out("Starting KeyDB daemon", state="i")
        # TODO: Move our handling out of Systemd and integrate it directly as a subprocess?
        common.run_os_command("systemctl start keydb-server.service")


-def start_api_worker(logger, config):
-    if config["enable_api"]:
+def start_workerd(logger, config):
+    if config["enable_worker"]:
        logger.out("Starting Celery Worker daemon", state="i")
        # TODO: Move our handling out of Systemd and integrate it directly as a subprocess?
        common.run_os_command("systemctl start pvcworkerd.service")


+def start_healthd(logger, config):
+    logger.out("Starting Health Monitoring daemon", state="i")
+    # TODO: Move our handling out of Systemd and integrate it directly as a subprocess?
+    common.run_os_command("systemctl start pvchealthd.service")
+
+
 def start_system_services(logger, config):
    start_zookeeper(logger, config)
    start_libvirtd(logger, config)
@ -91,7 +99,8 @@ def start_system_services(logger, config):
    start_ceph_mon(logger, config)
    start_ceph_mgr(logger, config)
    start_keydb(logger, config)
-    start_api_worker(logger, config)
+    start_workerd(logger, config)
+    start_healthd(logger, config)

-    logger.out("Waiting 10 seconds for daemons to start", state="s")
-    sleep(10)
+    logger.out("Waiting 5 seconds for daemons to start", state="s")
+    sleep(5)
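
The revised `start_keydb` condition in the hunk above can be read as a small predicate. The following is a minimal sketch, assuming a plain dict with the same keys the diff uses; the helper name `should_start_keydb` is hypothetical and is not part of PVC.

```python
def should_start_keydb(config):
    """Return True when KeyDB should be started on this node.

    Mirrors the condition in the diff: KeyDB is needed if either the API
    or the worker subsystem is enabled, but only on coordinator nodes.
    (Hypothetical helper for illustration; not a PVC function.)
    """
    needs_keydb = config["enable_api"] or config["enable_worker"]
    return bool(needs_keydb and config["daemon_mode"] == "coordinator")


# A coordinator with only the worker enabled: the old condition
# (enable_api alone) would have skipped KeyDB; the new one starts it.
worker_only = {
    "enable_api": False,
    "enable_worker": True,
    "daemon_mode": "coordinator",
}
print(should_start_keydb(worker_only))
```

Pulling the condition into a predicate like this is only one way to make the multi-line `if` easier to read; the diff itself keeps the check inline.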
--- a/node-daemon/pvcnoded/util/zookeeper.py
+++ b/node-daemon/pvcnoded/util/zookeeper.py
@ -123,7 +123,7 @@ def setup_node(logger, config, zkhandler):
                ),
                (
                    ("node.data.pvc_version", config["node_hostname"]),
-                    config["pvcnoded_version"],
+                    config["daemon_version"],
                ),
                (
                    ("node.ipmi.hostname", config["node_hostname"]),
@ -159,7 +159,7 @@ def setup_node(logger, config, zkhandler):
                ),
                (
                    ("node.data.pvc_version", config["node_hostname"]),
-                    config["pvcnoded_version"],
+                    config["daemon_version"],
                ),
                (
                    ("node.ipmi.hostname", config["node_hostname"]),
--- a/pvc.sample.conf
+++ b/pvc.sample.conf
@ -0,0 +1,452 @@
+---
+# PVC system configuration - example file
+#
+# This configuration file defines the details of a PVC cluster.
+# It is used by several daemons on the system, including pvcnoded, pvcapid, pvcworkerd, and pvchealthd.
+#
+# This file will normally be written by the PVC Ansible framework; this example is provided for reference
+
+# Paths configuration
+path:
+
+  # Plugin directory
+  plugin_directory: "/usr/share/pvc/plugins"
+
+  # Dynamic directory
+  dynamic_directory: "/run/pvc"
+
+  # System log directory
+  system_log_directory: "/var/log/pvc"
+
+  # VM Console log directory (set by Libvirt)
+  console_log_directory: "/var/log/libvirt"
+
+  # Ceph configuration directory (set by Ceph/Ansible)
+  ceph_directory: "/etc/ceph"
+
+# Subsystem configuration
+# Changing these values can be used to turn on or off various parts of PVC
+# Normally, all should be enabled ("yes") except in very custom clusters
+subsystem:
+
+    # Enable or disable hypervisor functionality
+    enable_hypervisor: yes
+
+    # Enable or disable virtual networking and routing functionality
+    enable_networking: yes
+
+    # Enable or disable Ceph storage management functionality
+    enable_storage: yes
+
+    # Enable or disable the worker client
+    enable_worker: yes
+
+    # Enable or disable the API client, if installed, when node is Primary
+    enable_api: yes
+
+# Cluster configuration
+cluster:
+
+  # The name of the cluster
+  name: pvc1
+
+  # The full list of nodes in this cluster
+  all_nodes:
+    - pvchv1
+    - pvchv2
+    - pvchv3
+
+  # The list of coordinator nodes in this cluster (subset of nodes)
+  coordinator_nodes:
+    - pvchv1
+    - pvchv2
+    - pvchv3
+
+  # Hardcoded networks (upstream/cluster/storage)
+  networks:
+
+    # Upstream network, used for inbound and outbound connectivity, API management, etc.
+    upstream:
+
+      # Domain name
+      domain: "mydomain.net"
+
+      # Device
+      device: ens4
+
+      # MTU
+      mtu: 1500
+
+      # IPv4 configuration
+      ipv4:
+
+        # CIDR netmask
+        netmask: 24
+
+        # Network address
+        network_address: 10.0.0.0
+
+        # Floating address
+        floating_address: 10.0.0.250
+
+        # Upstream/default gateway address
+        gateway_address: 10.0.0.254
+
+      # Node IP selection mechanism (either "by-id", or a static IP, no netmask, in the above network)
+      node_ip_selection: by-id
+
+    # Cluster network, used for inter-node communication (VM- and Network-layer), unrouted
+    cluster:
+
+      # Domain name
+      domain: "pvc.local"
+
+      # Device
+      device: ens4
+
+      # MTU
+      mtu: 1500
+
+      # IPv4 configuration
+      ipv4:
+      
+        # CIDR netmask
+        netmask: 24
+
+        # Network address
+        network_address: 10.0.1.0
+
+        # Floating address
+        floating_address: 10.0.1.250
+
+      # Node IP selection mechanism (either "by-id", or a static IP, no netmask, in the above network)
+      node_ip_selection: by-id
+
+    # Storage network, used for inter-node communication (Storage-layer), unrouted
+    storage:
+
+      # Domain name
+      domain: "storage.local"
+
+      # Device
+      device: ens4
+
+      # MTU
+      mtu: 1500
+
+      # IPv4 configuration
+      ipv4:
+
+        # CIDR netmask
+        netmask: 24
+
+        # Network address
+        network_address: 10.0.2.0
+
+        # Floating address
+        floating_address: 10.0.2.250
+
+      # Node IP selection mechanism (either "by-id", or a static IP, no netmask, in the above network)
+      node_ip_selection: by-id
+
+# Database configuration
+database:
+
+  # Zookeeper client configuration
+  zookeeper:
+
+    # Port number
+    port: 2181
+
+  # KeyDB/Redis client configuration
+  keydb:
+
+    # Port number
+    port: 6379
+
+    # Hostname; use `cluster` network floating IP address
+    hostname: 10.0.1.250
+
+    # Path, usually "/0"
+    path: "/0"
+
+  # PostgreSQL client configuration
+  postgres:
+    
+    # Port number
+    port: 5432
+
+    # Hostname; use `cluster` network floating IP address
+    hostname: 10.0.1.250
+
+    # Credentials
+    credentials:
+    
+      # API database
+      api:
+
+        # Database name
+        database: pvcapi
+
+        # Username
+        username: pvcapi
+
+        # Password
+        password: pvcapiPassw0rd
+
+      # DNS database
+      dns:
+      
+        # Database name
+        database: pvcdns
+
+        # Username
+        username: pvcdns
+
+        # Password
+        password: pvcdnsPassw0rd
+
+# Timer information
+timer:
+
+  # VM shutdown timeout (seconds)
+  vm_shutdown_timeout: 180
+
+  # Node keepalive interval (seconds)
+  keepalive_interval: 5
+
+  # Monitoring interval (seconds)
+  monitoring_interval: 60
+
+# Fencing configuration
+fencing:
+
+  # Disable fencing or not on IPMI failure at startup
+  # Setting this value to "no" will allow fencing to remain enabled even if IPMI does not respond during node daemon
+  # startup. This will allow future fencing to be attempted if IPMI later recovers.
+  disable_on_ipmi_failure: no
+
+  # Fencing intervals
+  intervals:
+
+    # Fence intervals (number of keepalives)
+    fence_intervals: 6
+
+    # Suicide intervals (number of keepalives; 0 to disable)
+    suicide_intervals: 0
+
+  # Fencing actions
+  actions:
+  
+    # Successful fence action ("migrate" or "none")
+    successful_fence: migrate
+
+    # Failed fence action ("migrate" or "none")
+    failed_fence: none
+
+  # IPMI details
+  ipmi:
+
+    # Hostname format; use a "{node_id}" entry for a template, or a literal hostname
+    hostname: "pvchv{node_id}-lom.mydomain.tld"
+
+    # IPMI username
+    username: admin
+
+    # IPMI password
+    password: S3cur3IPMIP4ssw0rd
+
+
+# VM migration configuration
+migration:
+
+  # Target selection default value (mem, memprov, load, vcpus, vms)
+  target_selector: mem
+
+# Logging configuration
+logging:
+
+  # Enable or disable debug logging (all services)
+  debug_logging: yes
+
+  # Enable or disable file logging
+  file_logging: no
+
+  # Enable or disable stdout logging (to journald)
+  stdout_logging: yes
+
+  # Enable or disable Zookeeper logging (for "pvc node log" functionality)
+  zookeeper_logging: yes
+
+  # Enable or disable ANSI colour sequences in logs
+  log_colours: yes
+
+  # Enable or disable dates in logs
+  log_dates: yes
+
+  # Enable or disable keepalive event logging
+  log_keepalives: yes
+
+  # Enable or disable cluster detail logging during keepalive events
+  log_cluster_details: yes
+
+  # Enable or disable monitoring detail logging during healthcheck events
+  log_monitoring_details: yes
+
+  # Number of VM console log lines to store in Zookeeper (per VM)
+  console_log_lines: 1000
+
+  # Number of node log lines to store in Zookeeper (per node)
+  node_log_lines: 2000
+
+# Guest networking configuration
+guest_networking:
+
+  # Bridge device for "bridged"-type networks
+  bridge_device: ens4
+
+  # Bridge device MTU
+  bridge_mtu: 1500
+
+  # Enable or disable SR-IOV functionality
+  sriov_enable: no
+
+  # SR-IOV configuration (list of PFs)
+  sriov_device:
+
+      # SR-IOV device; if this device isn't found, it is ignored on a given node
+    - device: ens1f1
+
+      # SR-IOV device MTU
+      mtu: 9000
+
+      # Number of VFs on this device
+      vfcount: 4
+
+# Ceph configuration
+ceph:
+
+  # Main config file name
+  ceph_config_file: "ceph.conf"
+
+  # Admin keyring file name
+  ceph_keyring_file: "ceph.client.admin.keyring"
+
+  # Monitor port, usually 6789
+  monitor_port: 6789
+
+  # Monitor host(s), enable only if you want to use hosts other than the coordinators
+  #monitor_hosts:
+  #  - pvchv1
+  #  - pvchv2
+  #  - pvchv3
+
+  # Storage secret UUID, generated during Ansible cluster bootstrap
+  secret_uuid: ""
+
+# API configuration
+api:
+
+  # API listening configuration
+  listen:
+
+    # Listen address, usually upstream floating IP
+    address: 10.0.0.250
+
+    # Listen port, usually 7370
+    port: 7370
+
+  # Authentication configuration
+  authentication:
+  
+    # Enable or disable authentication
+    enabled: yes
+
+    # Secret key for API cookies (long and secure password or UUID)
+    secret_key: "1234567890abcdefghijklmnopqrstuvwxyz"
+
+    # Authentication source (token, others in future)
+    source: token
+
+  # Token configuration
+  token:
+
+      # A friendly description
+    - description: "testing"
+        
+      # The token (long and secure password or UUID)
+      token: "1234567890abcdefghijklmnopqrstuvwxyz"
+
+  # SSL configuration
+  ssl:
+
+    # Enable or disable SSL operation
+    enabled: no
+
+    # Certificate file path
+    certificate: ""
+
+    # Private key file path
+    private_key: ""
+
+# Automatic backups
+autobackup:
+
+  # Backup root path on the node, used as the remote mountpoint
+  # Must be an absolute path beginning with '/'
+  # If remote_mount is enabled, the remote mount will be mounted on this directory
+  # If remote_mount is enabled, it is recommended to use a path under `/tmp` for this
+  # If remote_mount is disabled, a real filesystem must be mounted here (PVC system volumes are small!)
+  backup_root_path: "/tmp/backups"
+
+  # Suffix to the backup root path, used to allow multiple PVC systems to write to a single root path
+  # Must begin with '/'; leave empty to use the backup root path directly
+  # Note that most remote mount options can fake this if needed, but this is provided to ensure local compatibility
+  backup_root_suffix: "/mycluster"
+
+  # VM tag(s) to back up
+  # Only VMs with at least one of the given tag(s) will be backed up; all others will be skipped
+  backup_tags:
+    - "backup"
+    - "mytag"
+
+  # Backup schedule: when and what format to take backups
+  backup_schedule:
+
+    full_interval:  7  # Number of total backups between full backups; others are incremental
+                       # > If this number is 1, every backup will be a full backup and no incremental
+                       #   backups will be taken
+                       # > If this number is 2, every second backup will be a full backup, etc.
+
+    full_retention: 2  # Keep this many full backups; the oldest will be deleted when a new one is
+                       # taken, along with all child incremental backups of that backup
+                       # > Should usually be at least 2 when using incrementals (full_interval > 1) to
+                       #   avoid there being too few backups after cleanup from a new full backup
+
+  # Automatic mount settings
+  # These settings permit running an arbitrary set of commands, ideally a "mount" command or similar, to
+  # ensure that a remote filesystem is mounted on the backup root path
+  # While the examples here show absolute paths, that is not required; they will run with the $PATH of the
+  # executing environment (either the "pvc" command on a CLI or a cron/systemd timer)
+  # A "{backup_root_path}" f-string/str.format type variable MAY be present in any cmds string to represent
+  # the above configured root backup path, which is interpolated at runtime
+  # If multiple commands are given, they will be executed in the order given; if no commands are given,
+  # nothing is executed, but the keys MUST be present
+  auto_mount:
+
+    enabled: no  # Enable automatic mount/unmount support
+
+    # These commands are executed at the start of the backup run and should mount a filesystem
+    mount_cmds:
+
+      # This example shows an NFS mount leveraging the backup_root_path variable
+      - "/usr/sbin/mount.nfs -o nfsvers=3 10.0.0.10:/backups {backup_root_path}"
+
+    # These commands are executed at the end of the backup run and should unmount a filesystem
+    unmount_cmds:
+
+      # This example shows a generic umount leveraging the backup_root_path variable
+      - "/usr/bin/umount {backup_root_path}"
+
+# VIM modeline, requires "set modeline" in your VIMRC
+# vim: expandtab shiftwidth=2 tabstop=2 filetype=yaml
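
The `auto_mount` comments in the sample file describe a `{backup_root_path}` placeholder that is interpolated at runtime in `str.format` style. The following is a minimal sketch of that substitution, assuming the command lists have already been loaded from `pvc.conf`; the function name `render_commands` is hypothetical and this is not the code PVC itself uses.

```python
def render_commands(cmds, backup_root_path):
    """Substitute {backup_root_path} into each command string.

    Order is preserved, and an empty list yields an empty list, matching
    the "executed in the order given" and "nothing is executed" notes in
    the sample config. (Hypothetical helper for illustration.)
    """
    return [cmd.format(backup_root_path=backup_root_path) for cmd in cmds]


# Values copied from the sample configuration above
mount_cmds = [
    "/usr/sbin/mount.nfs -o nfsvers=3 10.0.0.10:/backups {backup_root_path}",
]
unmount_cmds = ["/usr/bin/umount {backup_root_path}"]

for cmd in render_commands(mount_cmds + unmount_cmds, "/tmp/backups"):
    print(cmd)
```

One caveat of plain `str.format`: a command containing literal braces (for example a shell `${VAR}` expansion) would need them doubled as `{{` and `}}`, otherwise the call raises an error.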
--- a/worker-daemon/daemon_lib
+++ b/worker-daemon/daemon_lib
@ -0,0 +1 @@
+../daemon-common
--- a/worker-daemon/pvcworkerd.py
+++ b/worker-daemon/pvcworkerd.py
@ -0,0 +1,24 @@
+#!/usr/bin/env python3
+
+# pvcworkerd.py - Worker daemon startup stub
+# Part of the Parallel Virtual Cluster (PVC) system
+#
+#    Copyright (C) 2018-2022 Joshua M. Boniface <joshua@boniface.me>
+#
+#    This program is free software: you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation, version 3.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License
+#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+###############################################################################
+
+import pvcworkerd.Daemon  # noqa: F401
+
+pvcworkerd.Daemon.entrypoint()
--- a/worker-daemon/pvcworkerd.service
+++ b/worker-daemon/pvcworkerd.service
@ -0,0 +1,20 @@
+# Parallel Virtual Cluster worker daemon unit file
+
+[Unit]
+Description = Parallel Virtual Cluster worker daemon
+After = network.target
+Wants = network-online.target
+PartOf = pvc.target
+
+[Service]
+Type = simple
+WorkingDirectory = /usr/share/pvc
+Environment = PYTHONUNBUFFERED=true
+Environment = PVC_CONFIG_FILE=/etc/pvc/pvc.conf
+ExecStartPre = /bin/sleep 2
+ExecStart = /usr/share/pvc/pvcworkerd.sh
+ExecStopPost = /bin/sleep 2
+Restart = on-failure
+
+[Install]
+WantedBy = pvc.target
--- a/worker-daemon/pvcworkerd.sh
+++ b/worker-daemon/pvcworkerd.sh
@ -25,10 +25,10 @@ CELERY_BIN="$( which celery )"
 # app arguments work in a non-backwards-compatible way with Celery 5.
 case "$( cat /etc/debian_version )" in
    10.*)
-        CELERY_ARGS="worker --app pvcapid.flaskapi.celery --events --concurrency 3 --hostname pvcworkerd@$(hostname -s) --queues $(hostname -s) --loglevel INFO"
+        CELERY_ARGS="worker --app pvcworkerd.Daemon.celery --events --concurrency 3 --hostname pvcworkerd@$(hostname -s) --queues $(hostname -s) --loglevel INFO"
    ;;
    *)
-        CELERY_ARGS="--app pvcapid.flaskapi.celery worker --events --concurrency 3 --hostname pvcworkerd@$(hostname -s) --queues $(hostname -s) --loglevel INFO"
+        CELERY_ARGS="--app pvcworkerd.Daemon.celery worker --events --concurrency 3 --hostname pvcworkerd@$(hostname -s) --queues $(hostname -s) --loglevel INFO"
    ;;
 esac

--- a/worker-daemon/pvcworkerd/Daemon.py
+++ b/worker-daemon/pvcworkerd/Daemon.py
@ -0,0 +1,237 @@
+#!/usr/bin/env python3
+
+# Daemon.py - PVC Node Worker daemon
+# Part of the Parallel Virtual Cluster (PVC) system
+#
+#    Copyright (C) 2018-2022 Joshua M. Boniface <joshua@boniface.me>
+#
+#    This program is free software: you can redistribute it and/or modify
+#    it under the terms of the GNU General Public License as published by
+#    the Free Software Foundation, version 3.
+#
+#    This program is distributed in the hope that it will be useful,
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#    GNU General Public License for more details.
+#
+#    You should have received a copy of the GNU General Public License
+#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+#
+###############################################################################
+
+from celery import Celery
+
+import daemon_lib.config as cfg
+
+from daemon_lib.zkhandler import ZKConnection
+from daemon_lib.vm import (
+    vm_worker_flush_locks,
+    vm_worker_attach_device,
+    vm_worker_detach_device,
+)
+from daemon_lib.ceph import (
+    osd_worker_add_osd,
+    osd_worker_replace_osd,
+    osd_worker_refresh_osd,
+    osd_worker_remove_osd,
+    osd_worker_add_db_vg,
+)
+from daemon_lib.benchmark import (
+    worker_run_benchmark,
+)
+from daemon_lib.vmbuilder import (
+    worker_create_vm,
+)
+
+# Daemon version
+version = "0.9.83"
+
+
+config = cfg.get_configuration()
+config["daemon_name"] = "pvcworkerd"
+config["daemon_version"] = version
+
+
+celery_task_uri = "redis://{}:{}{}".format(
+    config["keydb_host"], config["keydb_port"], config["keydb_path"]
+)
+celery = Celery(
+    "pvcworkerd",
+    broker=celery_task_uri,
+    result_backend=celery_task_uri,
+    result_extended=True,
+)
+
+
+#
+# Job functions
+#
+@celery.task(name="provisioner.create", bind=True, routing_key="run_on")
+def create_vm(
+    self,
+    vm_name=None,
+    profile_name=None,
+    define_vm=True,
+    start_vm=True,
+    script_run_args=[],
+    run_on="primary",
+):
+    return worker_create_vm(
+        self,
+        config,
+        vm_name,
+        profile_name,
+        define_vm=define_vm,
+        start_vm=start_vm,
+        script_run_args=script_run_args,
+    )
+
+
+@celery.task(name="storage.benchmark", bind=True, routing_key="run_on")
+def storage_benchmark(self, pool=None, run_on="primary"):
+    @ZKConnection(config)
+    def run_storage_benchmark(zkhandler, self, pool):
+        return worker_run_benchmark(zkhandler, self, config, pool)
+
+    return run_storage_benchmark(self, pool)
+
+
+@celery.task(name="vm.flush_locks", bind=True, routing_key="run_on")
+def vm_flush_locks(self, domain=None, force_unlock=False, run_on="primary"):
+    @ZKConnection(config)
+    def run_vm_flush_locks(zkhandler, self, domain, force_unlock=False):
+        return vm_worker_flush_locks(zkhandler, self, domain, force_unlock=force_unlock)
+
+    return run_vm_flush_locks(self, domain, force_unlock=force_unlock)
+
+
+@celery.task(name="vm.device_attach", bind=True, routing_key="run_on")
+def vm_device_attach(self, domain=None, xml=None, run_on=None):
+    @ZKConnection(config)
+    def run_vm_device_attach(zkhandler, self, domain, xml):
+        return vm_worker_attach_device(zkhandler, self, domain, xml)
+
+    return run_vm_device_attach(self, domain, xml)
+
+
+@celery.task(name="vm.device_detach", bind=True, routing_key="run_on")
+def vm_device_detach(self, domain=None, xml=None, run_on=None):
+    @ZKConnection(config)
+    def run_vm_device_detach(zkhandler, self, domain, xml):
+        return vm_worker_detach_device(zkhandler, self, domain, xml)
+
+    return run_vm_device_detach(self, domain, xml)
+
+
+@celery.task(name="osd.add", bind=True, routing_key="run_on")
+def osd_add(
+    self,
+    device=None,
+    weight=None,
+    ext_db_ratio=None,
+    ext_db_size=None,
+    split_count=None,
+    run_on=None,
+):
+    @ZKConnection(config)
+    def run_osd_add(
+        zkhandler,
+        self,
+        run_on,
+        device,
+        weight,
+        ext_db_ratio=None,
+        ext_db_size=None,
+        split_count=None,
+    ):
+        return osd_worker_add_osd(
+            zkhandler,
+            self,
+            run_on,
+            device,
+            weight,
+            ext_db_ratio,
+            ext_db_size,
+            split_count,
+        )
+
+    return run_osd_add(
+        self, run_on, device, weight, ext_db_ratio, ext_db_size, split_count
+    )
+
+
+@celery.task(name="osd.replace", bind=True, routing_key="run_on")
+def osd_replace(
+    self,
+    osd_id=None,
+    new_device=None,
+    old_device=None,
+    weight=None,
+    ext_db_ratio=None,
+    ext_db_size=None,
+    run_on=None,
+):
+    @ZKConnection(config)
+    def run_osd_replace(
+        zkhandler,
+        self,
+        run_on,
+        osd_id,
+        new_device,
+        old_device=None,
+        weight=None,
+        ext_db_ratio=None,
+        ext_db_size=None,
+    ):
+        return osd_worker_replace_osd(
+            zkhandler,
+            self,
+            run_on,
+            osd_id,
+            new_device,
+            old_device,
+            weight,
+            ext_db_ratio,
+            ext_db_size,
+        )
+
+    return run_osd_replace(
+        self, run_on, osd_id, new_device, old_device, weight, ext_db_ratio, ext_db_size
+    )
+
+
+@celery.task(name="osd.refresh", bind=True, routing_key="run_on")
+def osd_refresh(self, osd_id=None, device=None, ext_db_flag=False, run_on=None):
+    @ZKConnection(config)
+    def run_osd_refresh(zkhandler, self, run_on, osd_id, device, ext_db_flag=False):
+        return osd_worker_refresh_osd(
+            zkhandler, self, run_on, osd_id, device, ext_db_flag
+        )
+
+    return run_osd_refresh(self, run_on, osd_id, device, ext_db_flag)
+
+
+@celery.task(name="osd.remove", bind=True, routing_key="run_on")
+def osd_remove(self, osd_id=None, force_flag=False, skip_zap_flag=False, run_on=None):
+    @ZKConnection(config)
+    def run_osd_remove(
+        zkhandler, self, run_on, osd_id, force_flag=False, skip_zap_flag=False
+    ):
+        return osd_worker_remove_osd(
+            zkhandler, self, run_on, osd_id, force_flag, skip_zap_flag
+        )
+
+    return run_osd_remove(self, run_on, osd_id, force_flag, skip_zap_flag)
+
+
+@celery.task(name="osd.add_db_vg", bind=True, routing_key="run_on")
+def osd_add_db_vg(self, device=None, run_on=None):
+    @ZKConnection(config)
+    def run_osd_add_db_vg(zkhandler, self, run_on, device):
+        return osd_worker_add_db_vg(zkhandler, self, run_on, device)
+
+    return run_osd_add_db_vg(self, run_on, device)
+
+
+def entrypoint():
+    pass
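
The Celery broker and result backend in `Daemon.py` share one Redis-style URI built from three config keys. The following is a minimal sketch of that string construction using the KeyDB values from `pvc.sample.conf`. The function name `build_broker_uri` is hypothetical, and the mapping from the YAML `database.keydb` block to the flat `keydb_host`, `keydb_port` and `keydb_path` keys is assumed to be handled by the config parser.

```python
def build_broker_uri(config):
    """Build the redis:// URI the same way Daemon.py formats it.

    (Hypothetical helper for illustration; Daemon.py does this inline.)
    """
    return "redis://{}:{}{}".format(
        config["keydb_host"], config["keydb_port"], config["keydb_path"]
    )


# Sample values taken from the database.keydb section of pvc.sample.conf
sample_config = {
    "keydb_host": "10.0.1.250",
    "keydb_port": 6379,
    "keydb_path": "/0",
}
print(build_broker_uri(sample_config))  # redis://10.0.1.250:6379/0
```

Because the path is appended without a separator, it must begin with `/` (as in the sample's `"/0"`) for the resulting URI to be well formed.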
--- a/worker-daemon/pvcworkerd/__init__.py
+++ b/worker-daemon/pvcworkerd/__init__.py
Author	SHA1	Message	Date
Joshua M. Boniface	988de1218f	Bump version to 0.9.83	2023-12-01 17:37:42 -05:00
Joshua M. Boniface	0ffcbf3152	Fix bad file paths	2023-12-01 17:25:12 -05:00
Joshua M. Boniface	ad8d8cf7a7	Avoid removing changelog file until the end Avoids losing a changelog if something else fails.	2023-12-01 17:23:43 -05:00
Joshua M. Boniface	915a84ee3c	Fix psql check for new configs	2023-12-01 03:58:21 -05:00
Joshua M. Boniface	6315a068d1	Use SafeLoader for config load	2023-12-01 02:01:24 -05:00
Joshua M. Boniface	2afd064445	Update CLI to read from pvc.conf	2023-12-01 01:53:33 -05:00
Joshua M. Boniface	7cb9ebae6b	Remove legacy configuration handler This is not going to be needed.	2023-12-01 01:25:40 -05:00
Joshua M. Boniface	1fb0463dea	Adjust daemon service startup Add healthd, adjust workerd, lower waittime	2023-11-30 03:28:02 -05:00
Joshua M. Boniface	13549fc995	Depend pvcnoded on pvcworkerd	2023-11-30 03:24:01 -05:00
Joshua M. Boniface	102c3c3106	Port all Celery worker functions to discrete pkg Moves all tasks run by the Celery worker into a discrete package/module for easier installation. Also adjusts several parameters throughout to accomplish this.	2023-11-30 02:24:54 -05:00
Joshua M. Boniface	0c0fb65c62	Rework Flask API to route Celery tasks manually Avoids needing to define any of these tasks here; they can all be defined in the pvcworkerd code.	2023-11-30 00:40:09 -05:00
Joshua M. Boniface	03a738f878	Move config parser into daemon_lib And reformat/add config values for API.	2023-11-30 00:05:37 -05:00
Joshua M. Boniface	4df5fdbca6	Update description of example conf	2023-11-29 21:21:51 -05:00
Joshua M. Boniface	97eb63ebab	Clean up config naming and dead files	2023-11-29 21:21:51 -05:00
Joshua M. Boniface	4a2eba0961	Improve node output messages (from pvchealthd) 1. Output startup "list" entries in cyan with s state 2. Add start of keepalive run message	2023-11-29 21:21:51 -05:00
Joshua M. Boniface	077dd8708f	Add check start message	2023-11-29 21:21:51 -05:00
Joshua M. Boniface	b6b5786c3b	Output list in cyan (s state)	2023-11-29 21:21:51 -05:00
Joshua M. Boniface	ad738dec40	Clean up plugin pycache too	2023-11-29 21:21:51 -05:00
Joshua M. Boniface	d2b764a2c7	Output more details on startup	2023-11-29 21:21:51 -05:00
Joshua M. Boniface	b8aecd9c83	Wait less time between restarts	2023-11-29 21:21:51 -05:00
Joshua M. Boniface	11db3c5b20	Fix ordering during termination	2023-11-29 21:21:51 -05:00
Joshua M. Boniface	7a7c975eff	Ensure return from health shutdown	2023-11-29 21:21:51 -05:00
Joshua M. Boniface	fa12a3c9b1	Permit buffered log appending	2023-11-29 21:21:51 -05:00
Joshua M. Boniface	787f4216b3	Expand Zookeeper log daemon prefix to match	2023-11-29 21:21:51 -05:00
Joshua M. Boniface	647cba3cf5	Expand startup width for new daemon name	2023-11-29 21:21:51 -05:00
Joshua M. Boniface	921ecb3a05	Fix name in kydb plugin	2023-11-29 21:21:51 -05:00
Joshua M. Boniface	6a68cf665b	Wait between service restarts	2023-11-29 21:21:51 -05:00
Joshua M. Boniface	41f4e4fb2f	Split health monitoring into discrete daemon/pkg	2023-11-29 21:21:51 -05:00
Joshua M. Boniface	74a416165d	Move default autobackup config to pvc.conf	2023-11-29 21:21:37 -05:00
Joshua M. Boniface	83ceb41138	Add daemon name to Logger entries	2023-11-29 15:18:37 -05:00
Joshua M. Boniface	2e5958640a	Remove erroneous time from message	2023-11-29 15:12:41 -05:00
Joshua M. Boniface	d65b18f15b	Improve handling of loglines on client	2023-11-29 15:12:41 -05:00
Joshua M. Boniface	7abc697c8a	Improve Zookeeper log handling Ensures that messages are fully read before each append. Adds more Zookeeper hits, but ensures logs won't be overwritten by multiple daemons. Also don't use a set on the client side, to avoid "removing duplicate" entries erroneously.	2023-11-29 15:12:41 -05:00
Joshua M. Boniface	1adc3674b6	Install everything uploaded	2023-11-29 15:12:28 -05:00
Joshua M. Boniface	bd811408f9	Remove "Python 3" from package descriptions	2023-11-29 15:12:09 -05:00
Joshua M. Boniface	6090b286fe	Improve copying to avoid leaving stale files	2023-11-29 14:46:59 -05:00
Joshua M. Boniface	2545a7b744	Allow similar for IPMI hostnames	2023-11-28 16:09:01 -05:00
Joshua M. Boniface	ce907ff26a	Allow specifying static IPs instead of a file	2023-11-28 15:28:31 -05:00
Joshua M. Boniface	71e589e461	Remove superflous debug output This is printed in the startup logo block anyways.	2023-11-27 13:46:30 -05:00
Joshua M. Boniface	fc3d292081	Add missing subdirectory configs	2023-11-27 13:40:07 -05:00
Joshua M. Boniface	eab1ae873b	Ensure upstream_gateway key will exist	2023-11-27 13:37:57 -05:00
Joshua M. Boniface	eaf93cdf96	Readd missing subsystem configurations	2023-11-27 13:33:41 -05:00
Joshua M. Boniface	c8f4cbb39e	Fix node entry keys	2023-11-27 13:24:01 -05:00
Joshua M. Boniface	786fae7769	Improve logo output	2023-11-27 13:01:43 -05:00
Joshua M. Boniface	17f81e8296	Refactor pvcapid to use new configuration	2023-11-27 12:49:26 -05:00
Joshua M. Boniface	bcc57638a9	Refactor pvcnoded to use new configuration	2023-11-26 15:41:25 -05:00
Joshua M. Boniface	a593ee9c2e	Reorganize and add more configuration items	2023-11-26 15:32:53 -05:00
Joshua M. Boniface	2666e0603e	Update dnsmasq script to use new config file	2023-11-26 14:18:13 -05:00
Joshua M. Boniface	dab7396196	Move to unified pvc.conf configuration file	2023-11-26 14:16:21 -05:00
Joshua M. Boniface	460a2dd09f	Bump version to 0.9.82	2023-11-25 15:38:50 -05:00
Joshua M. Boniface	c30ea66d14	Add swagger document to gitignore	2023-11-25 15:37:01 -05:00
Joshua M. Boniface	24cabd3b99	Fix missing result_backend on Debian 10/11 For whatever reason, a Celery worker on <5.2.x was not picking these up. Move them back to the root of the module so they are properly picked up on these older versions but still prevents calling the routing functions during an API doc generation.	2023-11-25 15:35:25 -05:00