Bump version to 0.9.11

Fix a bad reference
Fix bad links
2021-01-05 15:58:26 -05:00 · 2021-01-03 23:29:43 -05:00 · 2021-01-03 16:56:13 -05:00 · 2021-01-03 16:50:18 -05:00 · 2021-01-03 16:41:15 -05:00 · 2021-01-03 16:38:07 -05:00
19 changed files with 672 additions and 181 deletions
--- a/README.md
+++ b/README.md
@@ -20,6 +20,20 @@ To get started with PVC, please see the [About](https://parallelvirtualcluster.r

 ## Changelog

+#### v0.9.11
+
+  * Documentation updates
+  * Adds VNC information to VM info
+  * Goes back to external Ceph commands for disk usage
+
+#### v0.9.10
+
+  * Moves OSD stats uploading to primary, eliminating reporting failures while hosts are down
+  * Documentation updates
+  * Significantly improves RBD locking behaviour in several situations, eliminating cold-cluster start issues and failed VM boot-ups after crashes
+  * Fixes some timeout delays with fencing
+  * Fixes bug in validating YAML provisioner userdata
+
 #### v0.9.9

  * Adds documentation updates
--- a/api-daemon/pvcapid/flaskapi.py
+++ b/api-daemon/pvcapid/flaskapi.py
@@ -1023,6 +1023,15 @@ class API_VM_Root(Resource):
                console:
                  type: string
                  descritpion: The serial console type of the VM
+                vnc:
+                  type: object
+                  properties:
+                    listen:
+                      type: string
+                      description: The active VNC listen address or 'None'
+                    port:
+                      type: string
+                      description: The active VNC port or 'None'
                emulator:
                  type: string
                  description: The binary emulator of the VM
--- a/client-cli/cli_lib/vm.py
+++ b/client-cli/cli_lib/vm.py
@@ -1071,6 +1071,11 @@ def format_info(config, domain_information, long_output):
    ainformation.append('{}vCPUs:{}              {}'.format(ansiprint.purple(), ansiprint.end(), domain_information['vcpu']))
    ainformation.append('{}Topology (S/C/T):{}   {}'.format(ansiprint.purple(), ansiprint.end(), domain_information['vcpu_topology']))

+    if domain_information['vnc'].get('listen', 'None') != 'None' and domain_information['vnc'].get('port', 'None') != 'None':
+        ainformation.append('')
+        ainformation.append('{}VNC listen:{}         {}'.format(ansiprint.purple(), ansiprint.end(), domain_information['vnc']['listen']))
+        ainformation.append('{}VNC port:{}           {}'.format(ansiprint.purple(), ansiprint.end(), domain_information['vnc']['port']))
+
    if long_output is True:
        # Virtualization information
        ainformation.append('')
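
The guard added to `format_info` above prints the VNC block only when both fields hold a real value. A minimal standalone sketch of that logic follows, with the ANSI colour helpers omitted. The helper name `vnc_lines` is hypothetical and not part of the PVC codebase.

```python
def vnc_lines(domain_information):
    """Return the VNC display lines, or an empty list if VNC is inactive.

    Hypothetical helper for illustration only; the real CLI appends to
    `ainformation` inline and colours the labels with ansiprint.
    """
    # Assumption: tolerate a missing 'vnc' key (the diff indexes it directly)
    vnc = domain_information.get('vnc', {})
    listen = vnc.get('listen', 'None')
    port = vnc.get('port', 'None')
    # The daemon reports the string 'None' (not the None object) when unset
    if listen == 'None' or port == 'None':
        return []
    return [
        '',
        'VNC listen:         {}'.format(listen),
        'VNC port:           {}'.format(port),
    ]
```

Unlike the code in the diff, this sketch reads the `vnc` key with `.get()`, so a response from an older API without that key yields no output instead of raising `KeyError`. That is an assumption about desirable behaviour, not something the commit itself does.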
--- a/client-cli/pvc.py
+++ b/client-cli/pvc.py
@@ -3320,7 +3320,7 @@ def provisioner_userdata_add(name, filename):
    userdata = filename.read()
    filename.close()
    try:
-        yaml.load(userdata, Loader=yaml.FullLoader)
+        yaml.load(userdata, Loader=yaml.SafeLoader)
    except Exception as e:
        click.echo("Error: Userdata document is malformed")
        cleanup(False, e)
@@ -3397,7 +3397,7 @@ def provisioner_userdata_modify(name, filename, editor):
        filename.close()

    try:
-        yaml.load(userdata, Loader=yaml.FullLoader)
+        yaml.load(userdata, Loader=yaml.SafeLoader)
    except Exception as e:
        click.echo("Error: Userdata document is malformed")
        cleanup(False, e)
--- a/daemon-common/common.py
+++ b/daemon-common/common.py
@@ -267,6 +267,13 @@ def getInformationFromXML(zk_conn, uuid):
    except Exception:
        domain_profile = None

+    try:
+        domain_vnc = zkhandler.readdata(zk_conn, '/domains/{}/vnc'.format(uuid))
+        domain_vnc_listen, domain_vnc_port = domain_vnc.split(':')
+    except Exception:
+        domain_vnc_listen = 'None'
+        domain_vnc_port = 'None'
+
    parsed_xml = getDomainXML(zk_conn, uuid)

    try:
@@ -312,6 +319,10 @@ def getInformationFromXML(zk_conn, uuid):
        'arch': domain_arch,
        'machine': domain_machine,
        'console': domain_console,
+        'vnc': {
+            'listen': domain_vnc_listen,
+            'port': domain_vnc_port
+        },
        'emulator': domain_emulator,
        'features': domain_features,
        'disks': domain_disks,
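
The VNC lookup added to `getInformationFromXML` above relies on tuple unpacking failing for anything that is not exactly `listen:port`. A minimal sketch of that behaviour in isolation follows. The function name `parse_vnc` is hypothetical, and `raw` stands in for the value read from the `/domains/<uuid>/vnc` key.

```python
def parse_vnc(raw):
    """Split a 'listen:port' string into its parts, falling back to 'None'.

    Hypothetical helper for illustration; the daemon does this inline and
    catches a bare Exception rather than the specific types used here.
    """
    try:
        listen, port = raw.split(':')
    except (AttributeError, ValueError):
        # AttributeError: raw is not a string (e.g. the key was unreadable)
        # ValueError: empty string, or more/fewer than one colon
        listen, port = 'None', 'None'
    return {'listen': listen, 'port': port}
```

Note that the empty string written by `define_vm` for a new VM takes the fallback path, because `''.split(':')` yields a single element. A value containing more than one colon, such as an IPv6 listen address, also falls back to `'None'`. Whether that case matters depends on how the node daemon populates the key, which this diff does not show.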
--- a/daemon-common/vm.py
+++ b/daemon-common/vm.py
@@ -207,6 +207,7 @@ def define_vm(zk_conn, config_data, target_node, node_limit, node_selector, node
        '/domains/{}/consolelog'.format(dom_uuid): '',
        '/domains/{}/rbdlist'.format(dom_uuid): formatted_rbd_list,
        '/domains/{}/profile'.format(dom_uuid): profile,
+        '/domains/{}/vnc'.format(dom_uuid): '',
        '/domains/{}/xml'.format(dom_uuid): config_data
    })

--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,21 @@
+pvc (0.9.11-0) unstable; urgency=high
+
+  * Documentation updates
+  * Adds VNC information to VM info
+  * Goes back to external Ceph commands for disk usage
+
+ -- Joshua M. Boniface <joshua@boniface.me>  Tue, 05 Jan 2021 15:58:26 -0500
+
+pvc (0.9.10-0) unstable; urgency=high
+
+  * Moves OSD stats uploading to primary, eliminating reporting failures while hosts are down
+  * Documentation updates
+  * Significantly improves RBD locking behaviour in several situations, eliminating cold-cluster start issues and failed VM boot-ups after crashes
+  * Fixes some timeout delays with fencing
+  * Fixes bug in validating YAML provisioner userdata
+
+ -- Joshua M. Boniface <joshua@boniface.me>  Tue, 15 Dec 2020 10:45:15 -0500
+
 pvc (0.9.9-0) unstable; urgency=high

  * Adds documentation updates
--- a/docs/about.md
+++ b/docs/about.md
@@ -41,31 +41,31 @@ PVC is build from a number of other, open source components. The main system its

 Virtual machines themselves are run with the Linux KVM subsystem via the Libvirt virtual machine management library. This provides the maximum flexibility and compatibility for running various guest operating systems in multiple modes (fully-virtualized, para-virtualized, virtio-enabled, etc.).

-To manage cluster state, PVC uses Zookeeper. This is an Apache project designed to provide a highly-available and always-consistent key-value database. The various daemons all connect to the distributed Zookeeper database to both obtain details about cluster state, and to manage that state. For instance the node daemon watches Zookeeper for information on what VMs to run, networks to create, etc., while the API writes information to Zookeeper in response to requests.
+To manage cluster state, PVC uses Zookeeper. This is an Apache project designed to provide a highly-available and always-consistent key-value database. The various daemons all connect to the distributed Zookeeper database to both obtain details about cluster state, and to manage that state. For instance the node daemon watches Zookeeper for information on what VMs to run, networks to create, etc., while the API writes to or reads information from Zookeeper in response to requests. The Zookeeper database is the glue which holds the cluster together.

-Additional relational database functionality, specifically for the DNS aggregation subsystem and the VM provisioner, is provided by the PostgreSQL database and the Patroni management tool, which provides automatic clustering and failover for PostgreSQL database instances.
+Additional relational database functionality, specifically for the managed network DNS aggregation subsystem and the VM provisioner, is provided by the PostgreSQL database system and the Patroni management tool, which provides automatic clustering and failover for PostgreSQL database instances.

-Node network routing for managed networks providing EBGP VXLAN and route-learning is provided by FRRouting, a descendant project of Quaaga and GNU Zebra.
+Node network routing for managed networks providing EBGP VXLAN and route-learning is provided by FRRouting, a descendant project of Quagga and GNU Zebra. Upstream routers can use this interface to learn routes to cluster networks as well.

-The storage subsystem is provided by Ceph, a distributed object-based storage subsystem with extensive scalability, self-managing, and self-healing functionality. The Ceph RBD (Rados Block Device) subsystem is used to provide VM block devices similar to traditional LVM or ZFS zvols, but in a distributed, shared-storage manner.
+The storage subsystem is provided by Ceph, a distributed object-based storage subsystem with extensive scalability, self-managing, and self-healing functionality. The Ceph RBD (RADOS Block Device) subsystem is used to provide VM block devices similar to traditional LVM or ZFS zvols, but in a distributed, shared-storage manner.

 All the components are designed to be run on top of Debian GNU/Linux, specifically Debian 10.X "Buster", with the SystemD system service manager. This OS provides a stable base to run the various other subsystems while remaining truly Free Software, while SystemD provides functionality such as automatic daemon restarting and complex startup/shutdown ordering.

 ## Cluster Architecture

-A PVC cluster is based around "nodes", which are physical servers on which the various daemons, storage, networks, and virtual machines run. Each node is self-contained and is able to perform any and all cluster functions if needed; there is no segmentation of function between different types of physical hosts. Ideally, all nodes in a cluster will be identical in specifications, but in some situations mismatched nodes are acceptable, with limitations.
+A PVC cluster is based around "nodes", which are physical servers on which the various daemons, storage, networks, and virtual machines run. Each node is self-contained and is able to perform any and all cluster functions if needed and configured to do so; there is no strict segmentation of function between different "types" of physical hosts. Ideally, all nodes in a cluster will be identical in specifications, but in some situations mismatched nodes are acceptable, with limitations.

-A subset of the nodes, called "coordinators", are statically configured to provide additional services for the cluster. For instance, all databases, FRRouting instances, and Ceph management daemons run only on the set of cluster coordinators. At cluster bootstrap, 1 (testing-only), 3 (small clusters), or 5 (large clusters) nodes may be chosen as the coordinators. Other nodes can then be added as "hypervisor" nodes, which then provide only block device (storage) and VM (compute) functionality by connecting to the set of coordinators. This limits the scaling problem of the databases while ensuring there is still maximum redundancy and resiliency for the core cluster services. 
+A subset of the nodes, called "coordinators", are statically configured to provide services for the cluster. For instance, all databases, FRRouting instances, and Ceph management daemons run only on the set of cluster coordinators. At cluster bootstrap, 1 (testing-only), 3 (small clusters), or 5 (large clusters) nodes may be chosen as the coordinators. Other nodes can then be added as "hypervisor" nodes, which then provide only block device (storage) and VM (compute) functionality by connecting to the set of coordinators. This limits the scaling problem of the databases while ensuring there is still maximum redundancy and resiliency for the core cluster services. 

-Additional nodes can be added to the cluster either as coordinators, or as hypervisors, by adding them to the Ansible configuration and running it against the full set of nodes. Note that the number of coordinators must always be odd, and more than 5 coordinators are normally unnecessary and can cause issues with the database; it is thus normally advisable to add any nodes beyond the initial set as hypervisors instead of coordinators. Nodes can be removed from service, but this is a manual process and should not be attempted unless absolutely required; the Ceph subsystem in particular is sensitive to changes in the coordinator nodes.
+Additional nodes can be added to the cluster either as coordinators, or as hypervisors, by adding them to the Ansible configuration and running it against the full set of nodes. Note that the number of coordinators must always be odd, and more than 5 coordinators are normally unnecessary and can cause issues with the database; it is thus normally advisable to add any nodes beyond the initial set as hypervisors instead of coordinators. Nodes can be removed from service, but this is a manual process and should not be attempted unless absolutely required; the Ceph subsystem in particular is sensitive to changes in the coordinator nodes. Nodes can also be upgraded or replaced dynamically and without interrupting the cluster, allowing for seamless hardware maintenance, upgrades, and even replacement, as cluster state configuration is held cluster-wide.

 During runtime, one coordinator is elected the "primary" for the cluster. This designation can shift dynamically in response to cluster events, or be manually migrated by an administrator. The coordinator takes on a number of roles for which only one host may be active at once, for instance to provide DHCP services to managed client networks or to interface with the API.

-Nodes are networked together via a set of statically-configured networks. At a minimum, 2 discrete networks are required, with an optional 3rd.
+Nodes are networked together via a set of statically-configured, simple layer-2 networks. At a minimum, 2 discrete networks are required, with an optional 3rd.

- * The "upstream" network is the primary network for the nodes, and provides functions such as upstream Internet access, routing to and from the cluster nodes, and management via the API; it may be either a firewalled public or NAT'd RFC1918 network, but should never be exposed directly to the Internet.
+ * The "upstream" network is the primary network for the nodes, and provides functions such as upstream Internet access, routing to and from the cluster nodes, and management via the API; it may be either a firewalled public or NAT'd RFC1918 network, but should never be exposed directly to the Internet. It should also contain, or be able to route to, the IPMI BMC management interfaces of the node chassis.
 * The "cluster" network is an unrouted RFC1918 network which provides inter-node communication for managed client network traffic (VXLANs), cross-node routing, VM migration and failover, and database replication and access.
- * The "storage" network is another unrouted RFC1918 network which provides a dedicated logical and/or physical link between the nodes for storage traffic, including VM block device storage traffic, inter-OSD replication traffic, and Ceph heartbeat traffic, thus allowing it to be completely isolated from the other networks for maximum performance. This network can be optionally colocated with the "cluster" network, by specifying the same device for both, and can be further combined by specifying the same IP for both to completely collapse the "cluster" and "storage" networks. This may be ideal to simply management of small clusters.
+ * The "storage" network is another unrouted RFC1918 network which provides a dedicated logical and/or physical link between the nodes for storage traffic, including VM block device storage traffic, inter-OSD replication traffic, and Ceph heartbeat traffic, thus allowing it to be completely isolated from the other networks for maximum performance. This network can be optionally colocated with the "cluster" network, by specifying the same device for both, and can be further combined by specifying the same IP for both to completely collapse the "cluster" and "storage" networks. A collapsed cluster+storage configuration may be ideal to simplify management of small clusters, or a split configuration can be used to provide flexibility for large or demanding high-performance clusters - this choice is left to the administrator based on their needs.
 
 Within each network is a single "floating" IP address which follows the primary coordinator, providing a single interface to the cluster. Once configured, the cluster is then able to create additional networks of two kinds, "bridged" traditional vLANs and "managed" routed VXLANs, to provide network access to VMs.

@@ -79,15 +79,15 @@ The API client is a Flask-based RESTful API and is the core interface to PVC. By

 The API generally accepts all requests as HTTP form requests following standard RESTful guidelines, supporting arguments in the URI string or, with limited exceptions, in the message body. The API returns JSON response bodies to all requests consisting either of the information requested, or a `{ "message": "text" }` construct to pass informational status messages back to the client.

-The API client manual can be found at the [API manual page](/manuals/api), and the full API documentation can be found at the [API reference page](/manuals/api-reference.html).
+The API client manual can be found at the [API manual page](/manuals/api), and the full API details can be found in the [API reference specification](/manuals/api-reference.html).

 ### Direct Bindings

-The API client uses a dedicated set of Python libraries, packaged as the `pvc-daemon-common` Debian package, to communicate with the cluster. It is thus possible to build custom Python clients that directly interface with the PVC cluster, without having to get "into the weeds" of the Zookeeper or PostgreSQL databases.
+The API client uses a dedicated set of Python libraries, packaged as the `pvc-daemon-common` Debian package, to communicate with the cluster. One can thus use these libraries to build custom Python clients that directly interface with the PVC cluster, without having to get "into the weeds" of the Zookeeper or PostgreSQL databases.

 ### CLI Client

-The CLI client is a Python Click application, which provides a convenient CLI interface to the API client. It supports connecting to multiple clusters from a single instance, with or without authentication and over both HTTP or HTTPS, including a special "local" cluster if the client determines that an API configuration exists on the local host. Information about the configured clusters is stored in a local JSON document, and a default cluster can be set with an environment variable.
+The CLI client is a Python Click application, which provides a convenient CLI interface to the API client. It supports connecting to multiple clusters from a single instance, with or without authentication and over both HTTP or HTTPS, including a special "local" cluster if the client determines that an API configuration exists on the local host. Information about the configured clusters is stored in a local JSON document, and a default cluster can be set with an environment variable. The CLI client can thus be run either on PVC nodes themselves, or on other, remote systems which can then interface with cluster(s) over the network.

 The CLI client is self-documenting using the `-h`/`--help` arguments throughout, easing the administrator learning curve and providing easy access to command details. A short manual can also be found at the [CLI manual page](/manuals/cli).

@@ -97,6 +97,8 @@ The overall management, deployment, bootstrapping, and configuring of nodes is a

 The Ansible configuration and architecture manual can be found at the [Ansible manual page](/manuals/ansible).

+The [getting started documentation](/getting-started) provides a walkthrough of using these tools to bootstrap a new cluster.
+
 ## Frequently Asked Questions

 ### General Questions
@@ -116,12 +118,13 @@ PVC might be right for you if:
 1. You need KVM-based VMs.
 2. You want management of storage and networking (a.k.a. "batteries-included") in the same tool.
 3. You want hypervisor-level redundancy, able to tolerate hypervisor downtime seamlessly, for all elements of the stack.
+4. You have a requirement of at least 3 nodes' worth of compute and storage.

-I built PVC for my homelab first, found a perfect use-case with my employer, and think it might be useful to you too.
+If all you want is a simple home server solution, or you demand scalability beyond a few dozen compute nodes, PVC is likely not what you're looking for. Its sweet spot is specifically in the 3-9 node range, for instance in an advanced homelab, for SMBs or small ISPs with a relatively small server stack, or for MSPs looking to deploy small on-premises clusters at low cost.

 #### Is 3 hypervisors really the minimum?

-For a redundant cluster, yes. PVC requires a majority quorum for proper operation at various levels, and the smallest possible majority quorum is 2-of-3; thus 3 nodes is the safe minimum. That said, you can run PVC on a single node for testing/lab purposes without host-level redundancy, should you wish to do so, and it might also be possible to run 2 "main" systems with a 3rd "quorum observer" hosting only the management tools but no VMs, however this is not officially supported.
+For a redundant cluster, yes. PVC requires a majority quorum for proper operation at various levels, and the smallest possible majority quorum is 2-of-3; thus 3 nodes is the smallest safe minimum. That said, you can run PVC on a single node for testing/lab purposes without host-level redundancy, should you wish to do so, and it might also be possible to run 2 "main" systems with a 3rd "quorum observer" hosting only the management tools but no VMs; however these options are not officially supported, as PVC is designed primarily for 3+ node operation.

 ### Feature Questions

@@ -133,6 +136,10 @@ No, not directly. PVC supports only KVM VMs. To run containers, you would need t

 Not yet. Right now, PVC management is done exclusively with the CLI interface to the API. A WebUI can and likely will be built in the future, but I'm not a frontend developer and I do not consider this a personal priority. As of late 2020 the API is generally stable, so I would welcome 3rd party assistance here.

+#### I want feature X, does it fit with PVC?
+
+That depends on the specific feature. I will limit features to those that align with the overall goals of PVC, that is to say, to provide an easy-to-use hyperconverged virtualization system focused on redundancy. If a feature suits this goal it is likely to be considered; if it does not, it will not. PVC is rapidly approaching the completion of its 1.0 roadmap, which I consider feature-complete for the primary use case, and future versions may expand in scope.
+
 ### Storage Questions

 #### Can I use RAID-5/RAID-6 with PVC?
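
The quorum reasoning in the `docs/about.md` changes above (a 2-of-3 minimum and an odd number of coordinators) follows from ordinary majority arithmetic. A small sketch of that arithmetic is given here. The function names are illustrative only and are not PVC APIs.

```python
def majority_quorum(coordinators):
    """Smallest number of coordinators that forms a strict majority."""
    return coordinators // 2 + 1


def tolerated_failures(coordinators):
    """How many coordinators can fail while a majority still remains."""
    return coordinators - majority_quorum(coordinators)


if __name__ == '__main__':
    for n in (1, 2, 3, 4, 5):
        print(n, majority_quorum(n), tolerated_failures(n))
```

With 3 coordinators the quorum is 2 and one failure is tolerated; with 4 the quorum rises to 3 while the tolerated failures stay at one. An even count therefore adds a member without adding resilience, which is consistent with the documentation's requirement that the coordinator count be odd.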
--- a/docs/cluster-architecture.md
+++ b/docs/cluster-architecture.md
@@ -46,8 +46,6 @@ The following table provides bare-minimum, recommended, and optimal specificatio
 | Total RAM (n-1) | 32GB | 96GB | 128GB |
 | Total disk space | 200GB | 400GB | 800GB |

-Of these totals, some amount of CPU and RAM will be used by the storage subsystem and the PVC daemons themselves, meaning that the total available for virtual machines is slightly less. Generally, each OSD data disk will consume 1 vCPU at load and 1-2GB RAM, so nodes should be sized not only according to the VM workload, but the number of storage disks per node. Additionally the coordinator databases will use additional RAM and CPU resources of up to 1-4GB per node, though there is generally little need to spec coordinators any larger than non-coordinator nodes and the VM automatic node selection process will take used RAM into account by default.
-
 ### System Disks

 The system disk(s) chosen are important to consider, especially for coordinators. Ideally, an SSD, or two SSDs in RAID-1/mirroring are recommended for system disks. This helps ensure optimal performance for the system (e.g. swap space) and PVC components such as databases as well as the Ceph caches.
@@ -62,6 +60,18 @@ The general rule for available resource capacity planning can be though of as "1

 For memory provisioning of VMs, PVC will warn the administrator, via a Degraded cluster state, if the "n-1" RAM quantity is exceeded by the total maximum allocation of all running VMs. This situation can be worked around with sufficient swap space on nodes to ensure there is overflow, however the warning cannot be overridden. If nodes are of mismatched sizes, the "n-1" RAM quantity is calculated by removing (one of) the largest node in the cluster and adding the remaining nodes' RAM counts together.

+### System Memory Utilization
+
+By default, several components of PVC outside of VMs will have large memory allocations, most notably Ceph OSD processes and Zookeeper database processes. These processes should be considered when selecting the RAM allocation of nodes, and adjusted in the Ansible `group_vars` if lower defaults are required.
+
+#### Ceph OSD processes
+
+By default, PVC will allow several GB (up to 4-6GB) of RAM allocation per OSD to maximize the available cache space and hence disk performance. This can be lowered as far as 939MB should the administrator require due to a low RAM configuration, but no further due to Ceph limitations; therefore at least 1GB of memory per storage OSD is required even in the most limited case.
+
+#### Zookeeper processes
+
+By default, the Java heap and stack sizes are set to 256MB and 512MB respectively, yielding a memory usage of 500+MB after several days or weeks of uptime. This can be lowered to 32MB or less for lightly-used clusters should the administrator require due to a low RAM configuration.
+
 ### Operating System and Architecture

 As an underlying OS, only Debian GNU/Linux 10.x "Buster" is supported by PVC. This is the operating system installed by the PVC [node installer](https://github.com/parallelvirtualcluster/pvc-installer) and expected by the PVC [Ansible configuration system](https://github.com/parallelvirtualcluster/pvc-ansible). Ubuntu or other Debian-derived distributions may work, but are not officially supported. PVC also makes use of a custom repository to provide the PVC software and an updated version of Ceph beyond what is available in the base operating system, and this is only compatible officially with Debian 10 "Buster". PVC will, in the future, upgrade to future versions of Debian based on their release schedule and testing; releases may be skipped for official support if required. As a general rule, using the current versions of the official node installer and Ansible repository is the preferred and only supported method for deploying PVC.
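
The "n-1" RAM rule described in the `docs/cluster-architecture.md` changes above (remove one of the largest nodes, then sum the rest) can be sketched as follows. The function names and the comparison are illustrative assumptions, not PVC's actual implementation.

```python
def n_minus_1_ram(node_ram_gb):
    """Total RAM remaining after removing one of the largest nodes."""
    if not node_ram_gb:
        return 0
    # max() picks one largest node even if several share that size
    return sum(node_ram_gb) - max(node_ram_gb)


def exceeds_n_minus_1(node_ram_gb, vm_allocations_gb):
    """True if the total VM allocation is above the n-1 quantity."""
    return sum(vm_allocations_gb) > n_minus_1_ram(node_ram_gb)
```

For three 64GB nodes the n-1 quantity is 128GB; for mismatched nodes of 32GB, 64GB, and 128GB it is 96GB, since the 128GB node is the one removed. This sketch ignores the RAM consumed by OSD and Zookeeper processes, which the section above says should also be budgeted for.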
--- a/docs/index.md
+++ b/docs/index.md
@@ -18,6 +18,20 @@ To get started with PVC, please see the [About](https://parallelvirtualcluster.r

 ## Changelog

+#### v0.9.11
+
+  * Documentation updates
+  * Adds VNC information to VM info
+  * Goes back to external Ceph commands for disk usage
+
+#### v0.9.10
+
+  * Moves OSD stats uploading to primary, eliminating reporting failures while hosts are down
+  * Documentation updates
+  * Significantly improves RBD locking behaviour in several situations, eliminating cold-cluster start issues and failed VM boot-ups after crashes
+  * Fixes some timeout delays with fencing
+  * Fixes bug in validating YAML provisioner userdata
+
 #### v0.9.9

  * Adds documentation updates
--- a/docs/manuals/ansible.md
+++ b/docs/manuals/ansible.md
@@ -6,7 +6,7 @@ The PVC Ansible setup and management framework is written in Ansible. It consist

 The Base role configures a node to a specific, standard base Debian system, with a number of PVC-specific tweaks. Some examples include:

-* Installing the custom PVC repository at Boniface Labs.
+* Installing the custom PVC repository hosted at Boniface Labs.

 * Removing several unnecessary packages and installing numerous additional packages.

@@ -22,6 +22,8 @@ The Base role configures a node to a specific, standard base Debian system, with

 The end result is a standardized "PVC node" system ready to have the daemons installed by the PVC role.

+The Base role is optional: if an administrator so chooses, they can bypass this role and configure things manually. That said, for the proper functioning of the PVC role, the Base role should always be applied first.
+
 ## PVC role

 The PVC role configures all the dependencies of PVC, including storage, networking, and databases, then installs the PVC daemon itself. Specifically, it will, in order:
@@ -30,21 +32,19 @@ The PVC role configures all the dependencies of PVC, including storage, networki

 * Install, configure, and if `bootstrap=yes` is set, bootstrap a Zookeeper cluster (coordinators only).

-* Install, configure, and if `bootstrap=yes` is set`, bootstrap a Patroni PostgreSQL cluster for the PowerDNS aggregator (coordinators only).
+* Install, configure, and if `bootstrap=yes` is set, bootstrap a Patroni PostgreSQL cluster for the PowerDNS aggregator (coordinators only).

 * Install and configure Libvirt.

 * Install and configure FRRouting.

-* Install and configure the main PVC daemon and API client, including initializing the PVC cluster (`pvc task init`).
+* Install and configure the main PVC daemon and API client.
+
+* If `bootstrap=yes` is set, initialize the PVC cluster (`pvc task init`).

 ## Completion

-Once the entire playbook has run for the first time against a given host, the host will be rebooted to apply all the configured services. On startup, the system should immediately launch the PVC daemon, check in to the Zookeeper cluster, and become ready. The node will be in `flushed` state on its first boot; the administrator will need to run `pvc node unflush <node>` to set the node into active state ready to handle virtual machines.
-
-# PVC Ansible configuration manual
-
-This manual documents the various `group_vars` configuration options for the `pvc-ansible` framework. We assume that the administrator is generally familiar with Ansible and its operation.
+Once the entire playbook has run for the first time against a given host, the host will be rebooted to apply all the configured services. On startup, the system should immediately launch the PVC daemon, check in to the Zookeeper cluster, and become ready. The node will be in `flushed` state on its first boot; the administrator will need to run `pvc node unflush <node>` to set the node into active state ready to handle virtual machines. On the first bootstrap run, the administrator will also have to configure storage block devices (OSDs), networks, etc. For full details, see [the main getting started page](/getting-started).

 ## General usage

@@ -62,7 +62,7 @@ Create a `group_vars/<cluster>` folder to hold the cluster configuration variabl

 ### Bootstrapping a cluster

-Before bootstrapping a cluster, see the section on [PVC Ansible configuration variables](/manuals/ansible#pvc-ansible-configuration-variables) to configure the cluster.
+Before bootstrapping a cluster, see the section on [PVC Ansible configuration variables](/manuals/ansible/#pvc-ansible-configuration-variables) to configure the cluster.

 Bootstrapping a cluster can be done using the main `pvc.yml` playbook. Generally, a bootstrap run should be limited to the coordinators of the cluster to avoid potential race conditions or strange bootstrap behaviour. The special variable `bootstrap=yes` must be set to indicate that a cluster bootstrap is to be requested.

@@ -74,7 +74,13 @@ Adding new nodes to an existing cluster can be done using the main `pvc.yml` pla

 ### Reconfiguration and software updates

-After modifying configuration settings in the `group_vars`, or to update PVC to the latest version on a release, deployment of updated cluster can be done using the main `pvc.yml` playbook. The configuration should be updated if required, then the playbook run against all hosts in the cluster with no special flags or limits.
+For general, day-to-day software updates such as base system updates or upgrading to newer PVC versions, a special playbook, `oneshot/update-pvc-cluster.yml`, is provided. This playbook will gracefully update and upgrade all PVC nodes in the cluster, flush them, reboot them, and then unflush them. This operation should be completely transparent to VMs on the cluster.
+
+For more advanced updates, such as changing configurations in the `group_vars`, the main `pvc.yml` playbook can be used to deploy the changes across all hosts. Note that this may cause downtime due to node reboots if certain configurations change, and it is not recommended to use this process frequently.
+
+# PVC Ansible configuration manual
+
+This manual documents the various `group_vars` configuration options for the `pvc-ansible` framework. We assume that the administrator is generally familiar with Ansible and its operation.

 ## PVC Ansible configuration variables

@ -96,10 +102,14 @@ Example configuration:

 ```
 ---
+cluster_group: mycluster
+timezone_location: Canada/Eastern
 local_domain: upstream.local
+
 username_ipmi_host: "pvc"
 passwd_ipmi_host: "MyPassword2019"

+passwd_root: MySuperSecretPassword   # Not actually used by the playbook, but good for reference
 passwdhash_root: "$6$shadowencryptedpassword"

 logrotate_keepcount: 7
@ -118,13 +128,19 @@ admin_users:
      - "ssh-ed25519 MyKey 2019-06"

 networks:
-  "upstream":
+  "bondU":
    device: "bondU"
    type: "bond"
    bond_mode: "802.3ad"
    bond_devices:
      - "enp1s0f0"
      - "enp1s0f1"
+    mtu: 9000
+
+  "upstream":
+    device: "vlan1000"
+    type: "vlan"
+    raw_device: "bondU"
    mtu: 1500
    domain: "{{ local_domain }}"
    subnet: "192.168.100.0/24"
@ -144,12 +160,24 @@ networks:
    device: "vlan1002"
    type: "vlan"
    raw_device: "bondU"
-    mtu: 1500
+    mtu: 9000
    domain: "pvc-storage.local"
    subnet: "10.0.1.0/24"
    floating_ip: "10.0.1.254/24"
 ```

+#### `cluster_group`
+
+* *required*
+
+The name of the Ansible PVC cluster group in the `hosts` inventory.
+
+#### `timezone_location`
+
+* *required*
+
+The TZ database format name of the local timezone, e.g. `America/Toronto` or `Canada/Eastern`.
+
 #### `local_domain`

 * *required*
@ -172,6 +200,12 @@ The IPMI password, in plain text, used by PVC to communicate with the node manag

 Generate using `pwgen -s 16` and adjusting length as required.

+#### `passwd_root`
+
+* *ignored*
+
+The plain-text root password corresponding to `passwdhash_root`. It is not used by the playbook and is kept only for reference.
+
 #### `passwdhash_root`

 * *required*
@ -240,9 +274,13 @@ A list of SSH public key strings, in `authorized_keys` line format, for the user

 * *required*

-A dictionary of networks to configure on the nodes. Three networks are required by all PVC clusters, though additional networks may be configured here as well.
+A dictionary of networks to configure on the nodes.

-The three required networks are: `upstream`, `cluster`, `storage`.
+The key will be used to "name" the interface file under `/etc/network/interfaces.d`, but otherwise the `device` is the real name of the device (e.g. `iface [device] inet ...`).
+
+The three required networks are: `upstream`, `cluster`, `storage`. If `storage` is configured identically to `cluster`, the two networks will be collapsed into one; for details on this, please see the [documentation about the storage network](/cluster-architecture/#storage-connecting-ceph-daemons-with-each-other-and-with-osds).
+
+Additional networks can also be specified here to automate their configuration. In the above example, a "bondU" interface is configured, which the remaining required networks use as their `raw_device`.

 Within each `network` element, the following options may be specified:

@ -250,7 +288,7 @@ Within each `network` element, the following options may be specified:

 * *required*

-The network device name.
+The real network device name.

 ##### `type`

@ -321,18 +359,33 @@ pvc_log_keepalive_cluster_details: True
 pvc_log_keepalive_storage_details: True
 pvc_log_console_lines: 1000

+pvc_vm_shutdown_timeout: 180
+pvc_keepalive_interval: 5
+pvc_fence_intervals: 6
+pvc_suicide_intervals: 0
+pvc_fence_successful_action: migrate
+pvc_fence_failed_action: None
+
+pvc_osd_memory_limit: 4294967296
+pvc_zookeeper_heap_limit: 256M
+pvc_zookeeper_stack_limit: 512M
+
 pvc_api_listen_address: "0.0.0.0"
 pvc_api_listen_port: "7370"
-pvc_api_enable_authentication: False
 pvc_api_secret_key: ""
+
+pvc_api_enable_authentication: False
 pvc_api_tokens:
  - description: "myuser"
    token: ""
+
 pvc_api_enable_ssl: False
+pvc_api_ssl_cert_path: /etc/ssl/pvc/cert.pem
 pvc_api_ssl_cert: >
  -----BEGIN CERTIFICATE-----
  MIIxxx
  -----END CERTIFICATE-----
+pvc_api_ssl_key_path: /etc/ssl/pvc/key.pem
 pvc_api_ssl_key: >
  -----BEGIN PRIVATE KEY-----
  MIIxxx
@ -343,6 +396,9 @@ pvc_ceph_storage_secret_uuid: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
 pvc_dns_database_name: "pvcdns"
 pvc_dns_database_user: "pvcdns"
 pvc_dns_database_password: "xxxxxxxx"
+pvc_api_database_name: "pvcapi"
+pvc_api_database_user: "pvcapi"
+pvc_api_database_password: "xxxxxxxx"
 pvc_replication_database_user: "replicator"
 pvc_replication_database_password: "xxxxxxxx"
 pvc_superuser_database_user: "postgres"
@ -393,6 +449,8 @@ pvc_nodes:
    ipmi_user: "{{ username_ipmi_host }}"
    ipmi_password: "{{ passwd_ipmi_host }}"

+pvc_bridge_device: bondU
+
 pvc_upstream_device: "{{ networks['upstream']['device'] }}"
 pvc_upstream_mtu: "{{ networks['upstream']['mtu'] }}"
 pvc_upstream_domain: "{{ networks['upstream']['domain'] }}"
@ -413,19 +471,23 @@ pvc_storage_floatingip: "{{ networks['storage']['floating_ip'] }}"

 #### `pvc_log_to_file`

-* *required*
+* *optional*

 Whether to log PVC output to the file `/var/log/pvc/pvc.log`. Must be one of, unquoted: `True`, `False`.

+If unset, a default value of "False" is set in the role defaults.
+
 #### `pvc_log_to_stdout`

-* *required*
+* *optional*

 Whether to log PVC output to stdout, i.e. `journald`. Must be one of, unquoted: `True`, `False`.

+If unset, a default value of "True" is set in the role defaults.
+
 #### `pvc_log_colours`

-* *required*
+* *optional*

 Whether to include ANSI coloured prompts (`>>>`) for status in the log output. Must be one of, unquoted: `True`, `False`.

@ -433,39 +495,153 @@ Requires `journalctl -o cat` or file logging in order to be visible and useful.

 If set to False, the prompts will instead be text values.

+If unset, a default value of "True" is set in the role defaults.
+
 #### `pvc_log_dates`

-* *required*
+* *optional*

 Whether to include dates in the log output. Must be one of, unquoted: `True`, `False`.

 Requires `journalctl -o cat` or file logging in order to be visible and useful (and not clutter the logs with duplicate dates).

+If unset, a default value of "False" is set in the role defaults.
+
 #### `pvc_log_keepalives`

-* *required*
+* *optional*

-Whether to log keepalive messages. Must be one of, unquoted: `True`, `False`.
+Whether to log the regular keepalive messages. Must be one of, unquoted: `True`, `False`.
+
+If unset, a default value of "True" is set in the role defaults.

 #### `pvc_log_keepalive_cluster_details`

-* *required*
+* *optional*
 * *ignored* if `pvc_log_keepalives` is `False`

 Whether to log cluster and node details during keepalive messages. Must be one of, unquoted: `True`, `False`.

+If unset, a default value of "True" is set in the role defaults.
+
 #### `pvc_log_keepalive_storage_details`

-* *required*
+* *optional*
 * *ignored* if `pvc_log_keepalives` is `False`

 Whether to log storage cluster details during keepalive messages. Must be one of, unquoted: `True`, `False`.

+If unset, a default value of "True" is set in the role defaults.
+
 #### `pvc_log_console_lines`

-* *required*
+* *optional*

-The number of output console lines to log for each VM.
+The number of output console lines to log for each VM, to be used by the console log endpoints (`pvc vm log`).
+
+If unset, a default value of "1000" is set in the role defaults.
+
+#### `pvc_vm_shutdown_timeout`
+
+* *optional*
+
+The number of seconds to wait for a VM to `shutdown` before it is forced off.
+
+A value of "0" disables this functionality.
+
+If unset, a default value of "180" is set in the role defaults.
+
+#### `pvc_keepalive_interval`
+
+* *optional*
+
+The number of seconds between node keepalives.
+
+If unset, a default value of "5" is set in the role defaults.
+
+**WARNING**: Changing this value is not recommended except in exceptional circumstances.
+
+#### `pvc_fence_intervals`
+
+* *optional*
+
+The number of keepalive intervals to be missed before other nodes consider a node `dead` and trigger the fencing process. The total time elapsed will be `pvc_keepalive_interval * pvc_fence_intervals`.
+
+If unset, a default value of "6" is set in the role defaults.
+
+**NOTE**: This is not the total time until a node is fenced. A node has a further 6 (hardcoded) `pvc_keepalive_interval`s ("saving throw" attempts) to try to send a keepalive before it is actually fenced. Thus, with the default values, this works out to a total of 60 +/- 5 seconds between a node crashing, and it being fenced. An administrator of a very important cluster may want to set this lower, perhaps to 2, or even 1, leaving only the "saving throws", though this is not recommended for most clusters, due to timing overhead from various other subsystems.
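
The timing arithmetic in the note above can be sketched in a few lines of Python. This is only an illustration of the calculation, not code from PVC; the function name `fence_timing` and the constant `SAVING_THROW_INTERVALS` are hypothetical, with the value 6 taken from the hardcoded count quoted in the note.

```python
# Hypothetical helper illustrating the fence timing arithmetic; not part of PVC.

# Number of extra "saving throw" keepalive intervals, per the note above.
SAVING_THROW_INTERVALS = 6


def fence_timing(keepalive_interval=5, fence_intervals=6):
    """Return (seconds until a node is considered dead, seconds until it is fenced)."""
    dead_after = keepalive_interval * fence_intervals
    fenced_after = dead_after + keepalive_interval * SAVING_THROW_INTERVALS
    return dead_after, fenced_after


if __name__ == "__main__":
    # Role defaults: pvc_keepalive_interval=5, pvc_fence_intervals=6
    print(fence_timing())
    # A more aggressive setting, as discussed in the note
    print(fence_timing(fence_intervals=2))
```

The results are nominal values; as the note says, the real elapsed time can vary by roughly one keepalive interval in either direction.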
+
+#### `pvc_suicide_intervals`
+
+* *optional*
+
+The number of keepalive intervals without the ability to send a keepalive before a node considers *itself* to be dead and reboots itself.
+
+A value of "0" disables this functionality.
+
+If unset, a default value of "0" is set in the role defaults.
+
+**WARNING**: This option is provided to allow additional flexibility in fencing behaviour. Normally, it is not safe to set a `pvc_fence_failed_action` of `migrate`, since if the other nodes cannot fence a node its VMs cannot be safely started on other nodes. This would also apply to nodes without IPMI-over-LAN which could not be fenced normally. This option provides an alternative way to guarantee this safety, at least in situations where the node can still reliably shut itself down (i.e. it is not hard-locked). The administrator should however take special care and thoroughly test their system before using these alternative fencing options in production, as the results could be disastrous.
+
+#### `pvc_fence_successful_action`
+
+* *optional*
+
+The action the cluster should take upon a successful node fence with respect to running VMs.  Must be one of, unquoted: `migrate`, `None`.
+
+If unset, a default value of "migrate" is set in the role defaults.
+
+An administrator can set the value "None" to disable automatic VM recovery migrations after a node fence.
+
+#### `pvc_fence_failed_action`
+
+* *optional*
+
+The action the cluster should take upon a failed node fence with respect to running VMs. Must be one of, unquoted: `migrate`, `None`.
+
+If unset, a default value of "None" is set in the role defaults.
+
+**WARNING**: See the warning in the above `pvc_suicide_intervals` section for details on the purpose of this option. Do not set this option to "migrate" unless you have also set `pvc_suicide_intervals` to a non-"0" value and understand the caveats and risks.
+
+#### `pvc_fence_migrate_target_selector`
+
+* *optional*
+
+The migration selector to use when running a `migrate` command after a node fence. Must be one of, unquoted: `mem`, `load`, `vcpu`, `vms`.
+
+If unset, a default value of "mem" is set in the role defaults.
+
+**NOTE**: These values map to the standard VM meta `selector` options, and determine how nodes select where to run the migrated VMs.
+
+#### `pvc_osd_memory_limit`
+
+* *optional*
+
+The memory limit, in bytes, to pass to the Ceph OSD processes. Only set once, during cluster bootstrap; subsequent changes to this value must be manually made in the `files/*/ceph.conf` static configuration for the cluster in question.
+
+If unset, a default value of "4294967296" (i.e. 4GB) is set in the role defaults.
+
+As per Ceph documentation, the minimum value possible is "939524096" (i.e. ~1GB), and the default matches the Ceph system default. Setting a lower value is only recommended for systems with relatively low memory availability, where the default of 4GB per OSD is too large; it is recommended to increase the total system memory first before tweaking this setting to ensure optimal storage performance across all workloads.
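
Since `pvc_osd_memory_limit` is given in bytes, it can be convenient to compute the value rather than type it by hand. The following is a minimal sketch; the helper name `osd_memory_limit` is hypothetical, and the minimum is simply the figure quoted above.

```python
# Hypothetical helper for computing a pvc_osd_memory_limit value in bytes.

GIB = 1024 ** 3
# Minimum value quoted in the paragraph above.
OSD_MEMORY_MINIMUM = 939524096


def osd_memory_limit(gib):
    """Convert a size in GiB to bytes, rejecting values below the quoted minimum."""
    limit = int(gib * GIB)
    if limit < OSD_MEMORY_MINIMUM:
        raise ValueError(
            "{} bytes is below the minimum of {} bytes".format(limit, OSD_MEMORY_MINIMUM)
        )
    return limit


if __name__ == "__main__":
    print(osd_memory_limit(4))  # the default of 4GB
    print(osd_memory_limit(2))  # a reduced limit for a low-memory node
```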
+
+#### `pvc_zookeeper_heap_limit`
+
+* *optional*
+
+The memory limit to pass to the Zookeeper Java process for its heap.
+
+If unset, a default value of "256M" is set in the role defaults.
+
+The administrator may set this to a lower value on memory-constrained systems or if the memory usage of the Zookeeper process becomes excessive.
+
+#### `pvc_zookeeper_stack_limit`
+
+* *optional*
+
+The memory limit to pass to the Zookeeper Java process for its stack.
+
+If unset, a default value of "512M" is set in the role defaults.
+
+The administrator may set this to a lower value on memory-constrained systems or if the memory usage of the Zookeeper process becomes excessive.

 #### `pvc_api_listen_address`

@ -519,17 +695,33 @@ Generate using `uuidgen` or `pwgen -s 32` and adjusting length as required.

 Whether to enable SSL for the PVC API. Must be one of, unquoted: `True`, `False`.

+#### `pvc_api_ssl_cert_path`
+
+* *optional*
+* *required* if `pvc_api_enable_ssl` is `True` and `pvc_api_ssl_cert` is not set.
+
+The path to an (existing) SSL certificate on the node system for the PVC API to use.
+
 #### `pvc_api_ssl_cert`

-* *required* if `pvc_api_enable_ssl` is `True`
+* *optional*
+* *required* if `pvc_api_enable_ssl` is `True` and `pvc_api_ssl_cert_path` is not set.

-The SSL certificate, in text form, for the PVC API to use.
+The SSL certificate, in text form, for the PVC API to use. Will be installed to `/etc/pvc/api-cert.pem` on the node system.
+
+#### `pvc_api_ssl_key_path`
+
+* *optional*
+* *required* if `pvc_api_enable_ssl` is `True` and `pvc_api_ssl_key` is not set.
+
+The path to an (existing) SSL private key on the node system for the PVC API to use.

 #### `pvc_api_ssl_key`

-* *required* if `pvc_api_enable_ssl` is `True`
+* *optional*
+* *required* if `pvc_api_enable_ssl` is `True` and `pvc_api_ssl_key_path` is not set.

-The SSL private key, in text form, for the PVC API to use.
+The SSL private key, in text form, for the PVC API to use. Will be installed to `/etc/pvc/api-key.pem` on the node system.
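
The either/or requirement described for the four SSL variables above can be checked before running the playbook. The sketch below is an illustration only, assuming the `group_vars` have already been loaded into a Python dictionary; the function name `check_api_ssl` is hypothetical and this is not how the role itself validates its input.

```python
# Hypothetical pre-flight check for the PVC API SSL variables; not part of pvc-ansible.


def check_api_ssl(config):
    """Return a list of problems found in the SSL-related variables of `config`."""
    problems = []
    # Nothing to check unless SSL is enabled.
    if not config.get("pvc_api_enable_ssl", False):
        return problems
    for kind in ("cert", "key"):
        inline_var = "pvc_api_ssl_{}".format(kind)
        path_var = "pvc_api_ssl_{}_path".format(kind)
        # One of the inline text or the on-node path must be provided.
        if not config.get(inline_var) and not config.get(path_var):
            problems.append("one of {} or {} must be set".format(inline_var, path_var))
    return problems


if __name__ == "__main__":
    example = {
        "pvc_api_enable_ssl": True,
        "pvc_api_ssl_cert_path": "/etc/ssl/pvc/cert.pem",
    }
    for problem in check_api_ssl(example):
        print(problem)
```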

 #### `pvc_ceph_storage_secret_uuid`

@ -559,6 +751,26 @@ The password of the PVC DNS aggregator database user.

 Generate using `pwgen -s 16` and adjusting length as required.

+#### `pvc_api_database_name`
+
+* *required*
+
+The name of the PVC API database.
+
+#### `pvc_api_database_user`
+
+* *required*
+
+The username of the PVC API database user.
+
+#### `pvc_api_database_password`
+
+* *required*
+
+The password of the PVC API database user.
+
+Generate using `pwgen -s 16` and adjusting length as required.
+
 #### `pvc_replication_database_user`

 * *required*
@ -589,10 +801,12 @@ Generate using `pwgen -s 16` and adjusting length as required.

 #### `pvc_asn`

-* *required*
+* *optional*

 The private autonomous system number used for BGP updates to upstream routers.

+A default value of "65001" is set in the role defaults if left unset.
+
 #### `pvc_routers`

 A list of upstream routers to communicate BGP routes to.
@ -681,6 +895,12 @@ The IPMI username for the node management controller. Unless a per-host override

 The IPMI password for the node management controller. Unless a per-host override is required, should usually use the previously-configured global `passwordname_ipmi_host`. All notes from that entry apply.

+#### `pvc_bridge_device`
+
+* *required*
+
+The device name of the underlying network interface to be used for "bridged"-type client networks. For each "bridged"-type network, an IEEE 802.1Q VLAN and bridge will be created on top of this device to pass these networks. In most cases, using the reflexive `networks['cluster']['raw_device']` or `networks['upstream']['raw_device']` from the Base role is sufficient.
+
 #### `pvc_<network>_*`

 The next set of entries is hard-coded to use the values from the global `networks` list. It should not need to be changed under most circumstances. Refer to the previous sections for specific notes about each entry.
--- a/docs/manuals/provisioner.md
+++ b/docs/manuals/provisioner.md
@ -1,72 +1,89 @@
-# PVC Provisioner manual
+# PVC Provisioner Manual

-The PVC provisioner is a subsection of the main PVC API. IT interfaces directly with the Zookeeper database using the common client functions, and with the Patroni PostgreSQL database to store details. The provisioner also interfaces directly with the Ceph storage cluster, for mapping volumes, creating filesystems, and installing guests.
+The PVC provisioner is a subsection of the main PVC API. It interfaces directly with the Zookeeper database using the common client functions, and with the Patroni PostgreSQL database to store details. The provisioner also interfaces directly with the Ceph storage cluster, for mapping volumes, creating filesystems, and installing guests.

 Details of the Provisioner API interface can be found in [the API manual](/manuals/api).

+- [PVC Provisioner Manual](#pvc-provisioner-manual)
+  * [Overview](#overview)
+  * [PVC Provisioner concepts](#pvc-provisioner-concepts)
+    + [Templates](#templates)
+    + [Userdata](#cloud-init-userdata)
+    + [Scripts](#provisioning-scripts)
+    + [Profiles](#profiles)
+  * [Deploying VMs from provisioner scripts](#deploying-vms-from-provisioner-scripts)
+  * [Deploying VMs from OVA images](#deploying-vms-from-ova-images)
+    + [Uploading an OVA](#uploading-an-ova)
+    + [OVA limitations](#ova-limitations)
+
 ## Overview

 The purpose of the Provisioner API is to provide a convenient way for administrators to automate the creation of new virtual machines on the PVC cluster.

-The Provisioner allows the administrator to constuct descriptions of VMs, called profiles, which include system resource specifications, network interfaces, disks, cloud-init userdata, and installation scripts. These profiles are highly modular, allowing the administrator to specify arbitrary combinations of the mentioned VM features with which to build new VMs.
+The Provisioner allows the administrator to construct descriptions of VMs, called profiles, which include system resource specifications, network interfaces, disks, cloud-init userdata, and installation scripts. These profiles are highly modular, allowing the administrator to specify arbitrary combinations of the mentioned VM features with which to build new VMs.

 The provisioner supports creating VMs based off of installation scripts, by cloning existing volumes, and by uploading OVA image templates to the cluster.

 Examples in the following sections use the CLI exclusively for demonstration purposes. For details of the underlying API calls, please see the [API interface reference](/manuals/api-reference.html).

-# Deploying VMs from OVA images
+Use of the PVC Provisioner is not required. Administrators can always perform their own installation tasks, and the provisioner is not specially integrated, calling various other API commands as though they were run from the CLI or API.

-PVC supports deploying virtual machines from industry-standard OVA images. OVA images can be uploaded to the cluster with the `pvc provisioner ova` commands, and deployed via the created profile(s) using the `pvc provisioner create` command. Additionally, the profile(s) can be modified to suite your specific needs via the provisioner template system detailed below.
+# PVC Provisioner concepts

-# Deploying VMs from provisioner scripts
-
-PVC supports deploying virtual machines using administrator-provided scripts, using templates, profiles, and Cloud-init userdata to control the deployment process as desired. This deployment method permits the administrator to deploy POSIX-like systems such as Linux or BSD directly from a companion tool such as `debootstrap` on-demand and with maximum flexibility.
+Before explaining how to create VMs using either OVA images or installer scripts, we must discuss the concepts used to construct the PVC provisioner system.

 ## Templates

-The PVC Provisioner features three categories of templates to specify the resources allocated to the virtual machine. They are: System Templates, Network Templates, and Disk Templates.
+Templates are the building blocks of VMs. Each template type specifies part of the configuration of a VM, and when combined together later into profiles, provide a full description of the VM resources.
+
+Templates are used to provide flexibility for the administrator. For instance, one could specify some standard core resources for different VMs, but then specify a different set of storage devices and networks for each one. This flexibility is at the heart of this system, allowing the administrator to construct a complex set of VM configurations from a few basic templates.
+
+The PVC Provisioner features three types of templates: System Templates, Network Templates, and Disk Templates.

 ### System Templates

-System templates specify the basic resources of the virtual machine: vCPUs, memory, and configuration metadata (e.g. serial/VNC consoles, migration methods, node limits, etc.). Each profile requires a single system template.
+System templates specify the basic resources of the virtual machine: vCPUs, memory, serial/VNC consoles, and PVC configuration metadata (migration methods, node limits, etc.). Each profile requires a single system template.

-The simplest templates will specify a number of vCPUs and the amount of vRAM; additional details can be specified if required.
+The simplest valid template will specify a number of vCPUs and an amount of vRAM; additional details are optional and can be specified if required.

-Serial consoles permit the use of the `pvc vm log` functionality via console logfiles in `/var/log/libvirt`.
+Serial consoles are required to make use of the `pvc vm log` functionality, via console logfiles in `/var/log/libvirt` on the nodes. VMs without a serial console show an empty log. Note that the guest operating system must also be configured to provide output to this serial console for this functionality to work as expected.

 VNC consoles permit graphical access to the VM. By default, the VNC interface listens only on 127.0.0.1 on its parent node; the VNC bind configuration can override this to listen on other interfaces, including `0.0.0.0` for all.

+PVC does not currently support SPICE or any other non-VNC consoles.
+
+#### Examples
+
 ```
 $ pvc provisioner template system list
 Using cluster "local" - Host: "10.0.0.1:7370"  Scheme: "http"  Prefix: "/api/v1"

 System templates:

-Name        ID  vCPUs  vRAM [MB]  Consoles: Serial  VNC    VNC bind   Metadata: Limit     Selector    Autostart  
-ext-lg      80  4      8192                 False   False  None                 None      None        False
-ext-lg-ser  81  4      8192                 True    False  None                 None      None        False
-ext-lg-vnc  82  4      8192                 False   True   0.0.0.0              None      None        False
-ext-sm-lim  83  1      1024                 True    False  None                 pvchv1    mem         True
+Name        ID  vCPUs  vRAM [MB]  Consoles: Serial  VNC    VNC bind   Metadata: Limit           Selector    Autostart  Migration
+ext-lg      80  4      8192                 False   False  None                 None            None        False      None
+ext-lg-ser  81  4      8192                 True    False  None                 None            None        False      None
+ext-lg-vnc  82  4      8192                 False   True   0.0.0.0              None            None        False      None
+ext-sm-lim  83  1      1024                 True    False  None                 pvchv1,pvchv2   mem         True       live
 ```

+* The first example specifies a template with 4 vCPUs and 8GB of RAM. It has no serial or VNC consoles, and no non-default metadata, forming the most basic possible system template.
+
+* The second example specifies a template with the same vCPU and RAM quantities as the first, but with a serial console as well. VMs using this template will be able to make use of `pvc vm log` as long as their guest operating system is configured to use it.
+
+* The third example specifies a template with an alternate console to the second, in this case a VNC console bound to `0.0.0.0` (all interfaces). VNC ports are always auto-selected due to the dynamic nature of PVC, and the administrator can connect to them once the VM is running by determining the port on the hosting hypervisor (e.g. with `netstat -tl`).
+
+* The fourth example shows the ability to set PVC cluster metadata in a system template. VMs with this template will be forcibly limited to running on the hypervisors `pvchv1` and `pvchv2`, but no others, will explicitly use the `mem` (free memory) selector when choosing migration or deployment targets, will be set to automatically start on reboot of their hypervisor, and will be limited to live migration between nodes. For full details on what these options mean, see `pvc vm meta -h`.
+
 ### Network Templates

-Network template specify which PVC networks the virtual machine is bound to, as well as the method used to calculate MAC addresses for VM interfaces. Networks are specified by their VNI ID within PVC.
+Network templates specify which PVC networks the virtual machine will be bound to, as well as the method used to calculate MAC addresses for VM interfaces. Networks are specified by their VNI ID within PVC.

-A template requires at least one network VNI to be valid.
+A network template requires at least one network VNI to be valid, and is created in two stages. First, `pvc provisioner template network add` adds the template itself, along with the optional MAC template. Second, `pvc provisioner template network vni add` adds a VNI into the network template. VNIs are always shown and created in the order added; to move networks around they must be removed then re-added in the proper order; this will not affect existing VMs provisioned with the template.

-```
-$ pvc provisioner template network list
-Using cluster "local" - Host: "10.0.0.1:7370"  Scheme: "http"  Prefix: "/api/v1"
+In some cases, it may be useful for the administrator to specify a static MAC address pattern for a set of VMs, for instance if they must get consistent DHCP reservations between rebuilds. Such a MAC address template can be specified when adding a new network template, using a standardized layout and set of interpolated variables. This is an optional feature; if no MAC template is specified, VMs will be configured with random MAC addresses for each interface at deploy time.

-Network templates:
-
-Name      ID  MAC template  Network VNIs
-ext-101   80  None          101
-ext-11X   81  None          110,1101
-```
-
-In some cases, it may be useful for the administrator to specify a static MAC address pattern for a set of VMs, for instance if they must get consistent DHCP reservations between rebuilds. Such a MAC address template can be specified when adding a new network template, using a standardized layout and set of interpolated variables. For example:
+#### Examples

 ```
 $ pvc provisioner template network list
@ -75,49 +92,67 @@ Using cluster "local" - Host: "10.0.0.1:7370"  Scheme: "http"  Prefix: "/api/v1"
 Network templates:

 Name       ID  MAC template                  Network VNIs
-fixed-mac  82  {prefix}:XX:XX:{vmid}{netid}  1000,1001
+ext-101    80  None                          101
+ext-11X    81  None                          110,1101
+fixed-mac  82  {prefix}:ff:ff:{vmid}{netid}  1000,1001,1002
 ```

-The {prefix} variable is replaced by the provisioner with a standard prefix ("52:54:01"), which is different from the randomly-generated MAC prefix ("52:54:00") to avoid accidental overlap of MAC addresses.
+* The first example shows a simple single-VNI network with no MAC template.

-The {vmid} variable is replaced by a single hexidecimal digit representing the VM's ID, the numerical suffix portion of its name; VMs without a suffix numeral have ID 0. VMs with IDs greater than 15 (hexidecimal "f") will wrap back to 0.
+* The second example shows a dual-VNI network with no MAC template. Note the ordering; as mentioned, the first VNI will be provisioned on `eth0`, the second VNI on `eth1`, etc.

-The {netid} variable is replaced by the sequential identifier, starting at 0, of the network VNI of the interface; for example, the first interface is 0, the second is 1, etc. Like te VM ID, network IDs greater than 15 (hexidecimal "f") will wrap back to 0.
+* The third example shows a triple-VNI network with a MAC template. The variable names shown are literal, while the `f` values are user-configurable and must be set to valid hexadecimal values by the administrator to uniquely identify the MAC address (in this case, using `ff:ff` for that segment). The variables are interpolated at deploy time as follows:

-The four X digits are use-configurable. Use these digits to uniquely define the MAC address.
+    * The `{prefix}` variable is replaced by the provisioner with a standard prefix (`52:54:01`), which is different from the randomly-generated MAC prefix (`52:54:00`) to avoid accidental overlap of MAC addresses. These OUI prefixes are not assigned to any vendor by the IEEE and thus should not conflict with any (real, standards-compliant) devices on the network.

-The location of the two per-VM variables can be adjusted at the administrator's discretion, or removed if not required (e.g. a single-network template, or template for a single VM). In such situations, be careful to avoid accidental overlap with other templates' variable portions.
+    * The `{vmid}` variable is replaced by a single hexadecimal digit representing the VM's ID, the numerical suffix portion of its name (e.g. `myvm2` will have ID 2); VMs without a suffix numeral in their names have ID 0. VMs with IDs greater than 15 (hexadecimal `f`) will wrap back to 0, so a single MAC template should never be used by more than 16 VMs (numbered 0-15).
+
+    * The `{netid}` variable is replaced by a single hexadecimal digit representing the sequential identifier, starting at 0, of the interface within the template (i.e. the first interface is 0, the second is 1, etc.). Like the VM ID, network IDs greater than 15 (hexadecimal `f`) will wrap back to 0, so a single VM should never have more than 16 interfaces.
+
+    * The location of the two per-VM variables can be adjusted at the administrator's discretion, or removed if not required (e.g. a single-network template, or template for a single VM). In such situations, be careful to avoid accidental overlap with other templates' variable portions.
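
The interpolation rules above can be illustrated with a short Python sketch. This is not the provisioner's actual implementation; the function name `render_mac` is hypothetical, and wrapping is assumed to be a simple modulo 16 of the ID.

```python
# Hypothetical illustration of MAC template interpolation; not the provisioner's code.
import re

# Standard prefix for templated MACs, as described above.
TEMPLATE_PREFIX = "52:54:01"


def render_mac(template, vm_name, net_index):
    """Fill in {prefix}, {vmid} and {netid} for one interface of one VM."""
    # The VM ID is the numerical suffix of the VM name, or 0 if there is none.
    match = re.search(r"(\d+)$", vm_name)
    vm_id = int(match.group(1)) if match else 0
    return template.format(
        prefix=TEMPLATE_PREFIX,
        vmid="{:x}".format(vm_id % 16),       # single hex digit, assumed to wrap modulo 16
        netid="{:x}".format(net_index % 16),  # single hex digit, assumed to wrap modulo 16
    )


if __name__ == "__main__":
    template = "{prefix}:ff:ff:{vmid}{netid}"
    # Three interfaces, matching the three VNIs of the `fixed-mac` example
    for index in range(3):
        print(render_mac(template, "myvm2", index))
```

With this sketch, a VM named `myvm2` using the `fixed-mac` template would receive the last octets `20`, `21` and `22` for its three interfaces, which shows why the number of VMs and interfaces per template should stay within 16.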

 ### Disk Templates

-Disk templates specify the disk layout, including filesystem and mountpoint for scripted deployments, for the VM. Disks are specified by their virtual disk ID in Libvirt, and sizes are always specified in GB. Disks may also reference other storage volumes, which will then be cloned during provisioning.
+Disk templates specify the disk layout, including filesystem and mountpoint for scripted deployments, for the VM. Disks are specified by their virtual disk ID in Libvirt, in either `sdX` or `vdX` format, and sizes are always specified in GB. Disks may also reference other storage volumes, which will then be cloned during provisioning.

 For additional flexibility, the volume filesystem and mountpoint are optional; such volumes will be created and attached to the VM but will not be modified during provisioning.

+All storage volumes created by the provisioner at deploy time, regardless of source or type, will be named in the format `<vmname>_<id>`, for instance `myvm_sda`.
+
+#### Examples
+
 ```
 $ pvc provisioner template storage list
 Using cluster "local" - Host: "10.0.0.1:7370"  Scheme: "http"  Prefix: "/api/v1"

 Storage templates:

-Name           ID  Disk ID  Pool     Source Volume  Size [GB]  Filesystem  Arguments  Mountpoint
+Name           ID  Disk ID  Pool  Source Volume  Size [GB]  Filesystem  Arguments  Mountpoint
 standard-ext4  21
-                   sda      blsevm   None           2          ext4        -L=root    /
-                   sdb      blsevm   None           4          ext4        -L=var     /var
-                   sdc      blsevm   None           4          ext4        -L=log     /var/log
+                   sda      vms   None           2          ext4        -L=root    /
+                   sdb      vms   None           4          ext4        -L=var     /var
+                   sdc      vms   None           4          ext4        -L=log     /var/log
 large-cloned   22
-                   sda      blsevm   template_sda   None       None        None       None
-                   sdb      blsevm   None           40         None        None       None
+                   sda      vms   template_sda   None       None        None       None
+                   sdb      vms   None           40         None        None       None
 ```

+* The first example shows a template with a simple 3-disk layout suitable for most Linux distributions. Each volume is in pool `vms`, with an `ext4` filesystem, an argument specifying a disk label, and a mountpoint at which the volume will be mounted when deploying the VM. All 3 volumes will be created at deploy time. When deploying VMs using the Scripts detailed below, this is the normal format that storage templates should take, ensuring that all block devices are formatted and mounted in the proper place for the script to take over and install the operating system to them.
+
+* The second example shows both a cloned volume and a blank volume. At deploy time, the Source Volume for the `sda` device will be cloned and attached to the VM at `sda`. The second volume will be created at deploy time, but will not be formatted or mounted, and will thus show as an empty block device inside the VM. This type of storage template is better suited to VMs that do not use the Script install method, and are instead cloned from a source volume, either that of another VM or a manually-uploaded disk image.
+
+* Unformatted block devices as shown in the second example can be used in any type of storage template, though care should be taken to consider their purpose; unformatted block devices are completely ignored by the Script at deploy time.
+
 ## Cloud-Init Userdata

-PVC allows the sending of arbitrary cloud-init userdata to VMs on bootup. It uses an Amazon AWS EC2-style metadata service to delivery basic VM information and this userdata to the VMs, based dynamically on the assigned profile of the VM at boot time.
+PVC allows the sending of arbitrary cloud-init userdata to VMs on boot-up. It uses an Amazon AWS EC2-style metadata service, listening at the link-local IP `169.254.169.254` on port `80`, to deliver basic VM information and this userdata to the VMs. The metadata to be sent is determined dynamically from the assigned profile of the VM at boot time.

-Both single-function and multipart cloud-init userdata is supported. Examples can be found at `/usr/share/pvc/provisioner/examples` on a PVC node.
+Both single-function and multipart cloud-init userdata is supported. Full examples can be found under `/usr/share/pvc/provisioner/examples` on any PVC coordinator node.

 The default userdata document "empty" can be used to skip userdata for a profile.

+#### Examples
+
 ```
 $ pvc provisioner userdata list
 Using cluster "local" - Host: "10.0.0.1:7370"  Scheme: "http"  Prefix: "/api/v1"
@ -131,15 +166,53 @@ basic-ssh   11  Content-Type: text/cloud-config; charset="us-ascii"
                [...]
 ```

+* The first example is the default, always-present `empty` document, which is sent to invalid VMs if requested, or can be configured explicitly for profiles that do not require cloud-init userdata, instead of leaving that section of the profile as `None`.
+
+* The second, truncated, example is the start of a normal single-function userdata document. For full details on the contents of these documents, see the cloud-init documentation.
+
 ## Provisioning Scripts

-The PVC provisioner provides a scripting framework in order to automate VM installation. This is generally the most useful with UNIX-like systems which can be installed over the network via shell scripts. For instance, the script might install a Debian VM using `debootstrap`.
+The PVC provisioner provides a scripting framework in order to automate VM installation. This is generally the most useful with UNIX-like systems which can be installed over the network via shell scripts. For instance, the script might install a Debian VM using `debootstrap` or a Red Hat VM using `rpmstrap`. The PVC Ansible system will automatically install `debootstrap` on coordinator nodes, to allow out-of-the-box deployment of Debian-based VMs with `debootstrap` and the example script shipped with PVC (see below); any other deployment tool must be installed separately onto all PVC coordinator nodes, or installed by the script itself (e.g. using `os.system('apt-get install ...')`, `requests` to download a script, etc.).

-Provisioner scripts are written in Python 3 and are called in a standardized way during the provisioning sequence. A single function called `install` is called during the provisioning sequence to perform OS installation and basic configuration.
+Provisioner scripts are written in Python 3 and are called in a standardized way during the provisioning sequence. A single function called `install` is called during the provisioning sequence to perform arbitrary tasks. At execution time, the script is passed several default keyword arguments detailed below, and can also be passed arbitrary arguments defined either in the provisioner profile, or on the `provisioner create` CLI.

-*A WARNING*: It's important to remember that these provisioning scripts will run with the same privileges as the provisioner API daemon (usually root) on the system running the daemon. THIS MAY POSE A SECURITY RISK. However, the intent is that administrators of the cluster are the only ones allowed to specify these scripts, and that they check them thoroughly when adding them to the system as well as limit access to the provisioning API to trusted sources. If neither of these conditions are possible, for instance if arbitrary users must specify custom scripts without administrator oversight, then the PVC provisoner may not be ideal.
+A full example script to perform a `debootstrap` Debian installation can be found under `/usr/share/pvc/provisioner/examples` on any PVC coordinator node.

-The default script "empty" can be used to skip scripted installation for a profile. Additionally, profiles with no valid disk mountpoints skip scripted installation.
+The default script "empty" can be used to skip scripted installation for a profile. Additionally, profiles with no disk mountpoints (and specifically, no root `/` mountpoint) will skip scripted installation.
+
+**WARNING**: It is important to remember that these provisioning scripts will run with the same privileges as the provisioner API daemon (usually root) on the system running the daemon. THIS MAY POSE A SECURITY RISK. However, the intent is that administrators of the cluster are the only ones allowed to specify these scripts, and that they check them thoroughly when adding them to the system, as well as limit access to the provisioning API to trusted sources. If either of these conditions cannot be met, for instance if arbitrary users must specify custom scripts without administrator oversight, then the PVC provisioner script system may not be ideal.
+
+**NOTE**: It is often required to perform a `chroot` to perform some aspects of the install process. The PVC script fully supports this, though it is relatively complicated. The example script details how to achieve this.
+
+#### The `install` function
+
+The `install` function is the main entrypoint for a provisioning script, and is the only part of the script that is explicitly called. The provisioner calls this function after setting up the temporary install directory and mounting the volumes. The script can thus perform any tasks required to install the VM; once it finishes, the main provisioner resumes control to unmount the volumes and finish the VM creation.
+
+It is good practice in these scripts to "fail through", since terminating the script abruptly would affect the entire provisioning flow and thus may leave the half-provisioned VM in an undefined state. Care should be taken to `try`/`except` possible errors, and attempt to finish the script execution (or `return`) even if some aspect fails.
+
+This function is passed a number of keyword arguments that it can then use during installation. These include those specified by the administrator in the profile, on the CLI at deploy time, as well as a number of default arguments:
+
+##### `vm_name`
+
+The `vm_name` keyword argument contains the name of the new VM from PVC's perspective.
+
+##### `vm_id`
+
+The `vm_id` keyword argument contains the VM identifier (the last numeral of the VM name, or `0` for a VM that does not end in a numeral).
+
+##### `temporary_directory`
+
+The `temporary_directory` keyword argument contains the path to the temporary directory on which the new VM's disks are mounted. The function *must* perform any installation steps to/under this directory.
+
+##### `disks`
+
+The `disks` keyword argument contains a Python list of the configured disks, as dictionaries of values as specified in the Disk template. The function may use these values as appropriate, for instance to specify an `/etc/fstab`.
+
+##### `networks`
+
+The `networks` keyword argument contains a Python list of the configured networks, as dictionaries of values as specified in the Network template. The function may use these values as appropriate, for instance to write an `/etc/network/interfaces` file.
+
+#### Examples

 ```
 $ pvc provisioner script list
@ -149,43 +222,23 @@ Name         ID  Script
 empty        1
 debootstrap  2   #!/usr/bin/env python3

-                 # debootstrap_script.py - PVC Provisioner example script for Debootstrap
-                 # Part of the Parallel Virtual Cluster (PVC) system
+                 def install(**kwargs):
+                     vm_name = kwargs['vm_name']
                 [...]
 ```

-### `install` function
+* The first example is the default, always-present `empty` document, which is used if the VM does not specify a valid root mountpoint, or can be configured explicitly for profiles that do not require scripts, instead of leaving that section of the profile as `None`.

-The `install` function is the main entrypoing for a provisioning script, and is the only part of the script that is explicitly called. The provisioner calls this function after setting up the temporary install directory and mounting the volumes. Thus, this script can then perform any sort of tasks required in the VM to install it, and then finishes.
-
-This function is passed a number of keyword arguments that it can then use during installation. These include those specified by the administrator in the profile, as well as a number of default arguments:
-
-###### `vm_name`
-
-The `vm_name` keyword argument contains the full name of the new VM from PVC's perspective.
-
-###### `vm_id`
-
-The `vm_id` keyword argument contains the VM identifier (the last numeral of the VM name, or `0` for a VM that does not end in a numeral).
-
-###### `temporary_directory`
-
-The `temporary_directory` keyword argument contains the path to the temporary directory on which the new VM's disks are mounted. The function *must* perform any installation steps to/under this directory.
-
-###### `disks`
-
-The `disks` keyword argument contains a Python list of the configured disks, as dictionaries of values as specified in the Disk template. The function *may* use these values as appropriate, for instance to specify an `/etc/fstab`.
-
-###### `networks`
-
-The `networks` keyword argument contains a Python list of the configured networks, as dictionaries of values as specified in the Network template. The function *may* use these values as appropriate, for instance to write an `/etc/network/interfaces` file.
+* The second, truncated, example is the start of a normal Python install script. The full example is provided in the folder mentioned above on all PVC coordinator nodes.

 ## Profiles

-Provisioner profiles combine the templates, userdata, and scripts together into dynamic configurations which are then applied to the VM when provisioned. The VM retains the record of this profile name in its configuration for the full lifetime of the VM on the cluster, most generally for cloud-init functionality.
+Provisioner profiles combine the templates, userdata, and scripts together into dynamic configurations which are then applied to the VM when provisioned. The VM retains the record of this profile name in its configuration for the full lifetime of the VM on the cluster; this is primarily used for cloud-init functionality, but may also serve as a convenient administrator reference.

 Additional arguments to the installation script can be specified along with the profile, to allow further customization of the installation if required.

+#### Examples
+
 ```
 $ pvc provisioner profile list
 Using cluster "local" - Host: "10.0.0.1:7370"  Scheme: "http"  Prefix: "/api/v1"
@ -194,9 +247,9 @@ Name        ID  Templates: System      Network  Storage        Data: Userdata
 std-large   41             ext-lg-ser  ext-101  standard-ext4        basic-ssh  debootstrap  deb_release=buster
 ```

-## Creating VMs with the Provisioner
+# Deploying VMs from provisioner scripts

-Creating VMs with the provisioner requires specifying a VM name and a profile to use.
+Once a profile with a Script value is defined, creating VMs with the provisioner is as simple as specifying a VM name and a profile to use.

 ```
 $ pvc provisioner create test1 std-large
@ -228,7 +281,7 @@ af1d0682-53e8-4141-982f-f672e2f23261  active   celery@pvchv1      test1  std-lar
 43d57a2d-8d0d-42f6-90df-cc39956825a9  pending  celery@pvchv1      testX  std-large  False    False
 ```

-Additionally, the `--wait` option can be given to the create command. This will cause the command to block and providing a visual progress indicator while the provisioning occurs.
+The `--wait` option can be given to the create command. This will cause the command to block and provide a visual progress indicator while the provisioning occurs.

 ```
 $ pvc provisioner create test2 std-large
@ -243,7 +296,7 @@ Waiting for task to start..... done.
 SUCCESS: VM "test2" with profile "std-large" has been provisioned and started successfully
 ```

-The administrator can also specify whether or not to automatically define and start the VM when launching a provisioner job, using the `--define`/`--no-define` and `--start`/`--no-start` options. The default is to define and start a VM. `--no-define` implies `--no-start` as there would be no VM to start.
+The administrator can also specify whether or not to automatically define and start the VM when launching a provisioner job, using the `--define`/`--no-define` and `--start`/`--no-start` options. The default is to define and start a VM. `--no-define` implies `--no-start` as there would be no VM to start. Using `--no-start` can be useful if other tasks must be performed before starting the VM for the first time, and `--no-define` can be useful for creating "template" VMs which would then be cloned by other profiles.

 ```
 $ pvc provisioner create test3 std-large --no-define
@ -252,8 +305,49 @@ Using cluster "local" - Host: "10.0.0.1:7370"  Scheme: "http"  Prefix: "/api/v1"
 Task ID: 43d57a2d-8d0d-42f6-90df-cc39956825a9
 ```

-A VM set to do so will be defined on the cluster early in the provisioning process, before creating disks or executing the provisioning script, and with the special status "provision". Once completed, if the VM is not set to start automatically, the state will remain "provision" (with the VM not running) until its state is explicitly changed wit the client (or via autostart when its node returns to ready state).
+Finally, the administrator may specify further, one-time script arguments at install time, to further tune the VM installation (e.g. setting an FQDN or some conditional to trigger additional steps in the script).

-Provisioning jobs are tied to the node that spawned them. If the primary node changes, provisioning jobs will continue to run against that node until they are completed or interrupted, but the active API (now on the new primary node) will not have access to any status data from these jobs, until the primary node status is returned to the original host. The CLI will warn the administrator of this if there are active jobs while running node primary or secondary commands.
+```
+$ pvc provisioner create test4 std-large --script-arg vm_fqdn=testhost.example.tld --script-arg my_foo=True
+Using cluster "local" - Host: "10.0.0.1:7370"  Scheme: "http"  Prefix: "/api/v1"

-Provisioning jobs cannot be cancelled, either before they start or during execution. The administrator should always let an invalid job either complete or fail out automatically, then remove the erroneous VM with the vm remove command.
+Task ID: 39639f8c-4866-49de-8c51-4179edec0194
+```
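
Inside the script, such one-time arguments would arrive alongside the default keyword arguments. The sketch below shows one way to read them defensively; it assumes the values are delivered as strings (so `my_foo=True` arrives as `'True'`), which should be verified against the example script, and the helper name `read_script_args` is hypothetical.

```python
def read_script_args(**kwargs):
    """Extract the optional arguments from the example above (hypothetical helper)."""
    # Optional argument: fall back to None when not supplied
    vm_fqdn = kwargs.get('vm_fqdn')
    # Assumed to arrive as a string, so convert to a real boolean explicitly
    my_foo = str(kwargs.get('my_foo', 'False')).strip().lower() in ('true', 'yes', '1')
    return vm_fqdn, my_foo


print(read_script_args(vm_name='test4', vm_fqdn='testhost.example.tld', my_foo='True'))
```

Using `kwargs.get()` with defaults keeps the same script usable by profiles and deployments that do not pass these arguments.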
+
+**NOTE**: A VM that is set to do so will be defined on the cluster early in the provisioning process, before creating disks or executing the provisioning script, with the special status `provision`. Once completed, if the VM is not set to start automatically, the state will remain `provision`, with the VM not running, until its state is explicitly changed with the client (or via autostart when its node returns to `ready` state).
+
+**NOTE**: Provisioning jobs are tied to the node that spawned them. If the primary node changes, provisioning jobs will continue to run against that node until they are completed, interrupted, or fail, but the active API (now on the new primary node) will not have access to any status data from these jobs, until the primary node status is returned to the original host. The CLI will warn the administrator of this if there are active jobs while running `node primary` or `node secondary` commands.
+
+**NOTE**: Provisioning jobs cannot be cancelled, either before they start or during execution. The administrator should always let an invalid job either complete or fail out automatically, then remove the erroneous VM with the `vm remove` command.
+
+# Deploying VMs from OVA images
+
+PVC supports deploying virtual machines from industry-standard OVA images. OVA images can be uploaded to the cluster with the `pvc provisioner ova` commands, and deployed via the created profile(s) using the `pvc provisioner create` command detailed above for scripted installs; the process is the same in both cases. Additionally, the profile(s) can be modified to suit your specific needs after creation.
+
+## Uploading an OVA
+
+Once the OVA is uploaded to the cluster with the `pvc provisioner ova upload` command, it will be visible in two different places:
+
+* In `pvc provisioner ova list`, one can see all uploaded OVA images as well as details on their disk configurations.
+
+* In `pvc provisioner profile list`, a new profile will be visible which matches the OVA `NAME` from the upload. This profile will have a "Source" of `OVA <NAME>`, and a system template of the same name. This system template will contain the basic configuration of the VM. You may notice that the other templates and data are set to `N/A`. For full details on this, see the next section.
+
+## OVA limitations
+
+PVC does not implement a *complete* OVA framework. While all basic elements of the OVA are included, the following areas require special attention.
+
+### Networks
+
+Because the PVC provisioner has its own conception of networks separate from the OVA profiles, the administrator must perform this mapping themselves, by first creating a network template, and the required networks on the PVC cluster, and then modifying the profile of the resulting OVA.
+
+The provisioner profile for the OVA can be safely modified to include this new network template at any time, and the resulting VM will be provisioned with these networks.
+
+This setup was chosen specifically to eliminate corner cases. Most OVA images include a single, "default" network interface, and expect the administrator of the hypervisor to modify this later. You can of course do this, but since PVC has its own conception of networks already in the provisioner, it makes more sense to ignore what the OVA specifies, and allow the administrator full control over this portion of the VM config, before deployment. It is thus always important to be aware of the network requirements of your OVA images, especially if they require specific network configurations, and then create a network template to match.
+
+### Storage
+
+During import, PVC splits the OVA into its constituent parts, including any disk images (usually VMDK-formatted). It will then create a separate PVC storage volume for each disk image. These storage volumes are then converted at deployment time from the VMDK format to the PVC native raw format based on their included size in the OVA. Once the storage volume for an actual VM deployment is created, it can then be resized as needed.
+
+Because of this, OVA profiles do not include storage templates like other PVC profiles. A storage template can still be added to such a profile, and the block devices will be added after the main block devices. However, this is generally not recommended; it is far better to modify the OVA to add additional volume(s) before uploading it instead.
+
+**WARNING**: Never adjust the sizes of the OVA VMDK-formatted storage volumes (named `ova_<NAME>_sdX`) or remove them without removing the OVA itself in the provisioner; doing so will prevent the deployment of the OVA, specifically the conversion of the images to raw format at deploy time, and render the OVA profile useless.
--- a/docs/manuals/swagger.json
+++ b/docs/manuals/swagger.json
@ -1302,6 +1302,19 @@
                    "description": "The topology of the assigned vCPUs in Sockets/Cores/Threads format",
                    "type": "string"
                },
+                "vnc": {
+                    "properties": {
+                        "listen": {
+                            "description": "The active VNC listen address or 'None'",
+                            "type": "string"
+                        },
+                        "port": {
+                            "description": "The active VNC port or 'None'",
+                            "type": "string"
+                        }
+                    },
+                    "type": "object"
+                },
                "xml": {
                    "description": "The raw Libvirt XML definition of the VM",
                    "type": "string"
--- a/node-daemon/pvcnoded/DNSAggregatorInstance.py
+++ b/node-daemon/pvcnoded/DNSAggregatorInstance.py
@ -76,8 +76,8 @@ class PowerDNSInstance(object):
        self.dns_server_daemon = None

        # Floating upstreams
-        self.vni_ipaddr, self.vni_cidrnetmask = self.config['vni_floating_ip'].split('/')
-        self.upstream_ipaddr, self.upstream_cidrnetmask = self.config['upstream_floating_ip'].split('/')
+        self.vni_floatingipaddr, self.vni_cidrnetmask = self.config['vni_floating_ip'].split('/')
+        self.upstream_floatingipaddr, self.upstream_cidrnetmask = self.config['upstream_floating_ip'].split('/')

    def start(self):
        self.logger.out(
@ -93,7 +93,7 @@ class PowerDNSInstance(object):
            '--disable-syslog=yes',              # Log only to stdout (which is then captured)
            '--disable-axfr=no',                 # Allow AXFRs
            '--allow-axfr-ips=0.0.0.0/0',        # Allow AXFRs to anywhere
-            '--local-address={},{}'.format(self.vni_ipaddr, self.upstream_ipaddr),  # Listen on floating IPs
+            '--local-address={},{}'.format(self.vni_floatingipaddr, self.upstream_floatingipaddr),  # Listen on floating IPs
            '--local-port=53',                   # On port 53
            '--log-dns-details=on',              # Log details
            '--loglevel=3',                      # Log info
--- a/node-daemon/pvcnoded/Daemon.py
+++ b/node-daemon/pvcnoded/Daemon.py
@ -54,7 +54,7 @@ import pvcnoded.CephInstance as CephInstance
 import pvcnoded.MetadataAPIInstance as MetadataAPIInstance

 # Version string for startup output
-version = '0.9.9'
+version = '0.9.11'

 ###############################################################################
 # PVCD - node daemon startup program
@ -1105,9 +1105,9 @@ def collect_ceph_stats(queue):
            logger.out("Set pool information in zookeeper (primary only)", state='d', prefix='ceph-thread')

        # Get pool info
-        command = {"prefix": "df", "format": "json"}
+        retcode, stdout, stderr = common.run_os_command('ceph df --format json', timeout=1)
        try:
-            ceph_pool_df_raw = json.loads(ceph_conn.mon_command(json.dumps(command), b'', timeout=1)[1])['pools']
+            ceph_pool_df_raw = json.loads(stdout)['pools']
        except Exception as e:
            logger.out('Failed to obtain Pool data (ceph df): {}'.format(e), state='w')
            ceph_pool_df_raw = []
@ -1275,6 +1275,8 @@ def collect_ceph_stats(queue):
        osd_stats = dict()

        for osd in osd_list:
+            if d_osd[osd].node == myhostname:
+                osds_this_node += 1
            try:
                this_dump = osd_dump[osd]
                this_dump.update(osd_df[osd])
@ -1284,12 +1286,12 @@ def collect_ceph_stats(queue):
                # One or more of the status commands timed out, just continue
                logger.out('Failed to parse OSD stats into dictionary: {}'.format(e), state='w')

-        # Trigger updates for each OSD on this node
-        if debug:
-            logger.out("Trigger updates for each OSD on this node", state='d', prefix='ceph-thread')
+        # Upload OSD data for the cluster (primary-only)
+        if this_node.router_state == 'primary':
+            if debug:
+                logger.out("Trigger updates for each OSD", state='d', prefix='ceph-thread')

-        for osd in osd_list:
-            if d_osd[osd].node == myhostname:
+            for osd in osd_list:
                try:
                    stats = json.dumps(osd_stats[osd])
                    zkhandler.writedata(zk_conn, {
@ -1298,7 +1300,6 @@ def collect_ceph_stats(queue):
                except KeyError as e:
                    # One or more of the status commands timed out, just continue
                    logger.out('Failed to upload OSD stats from dictionary: {}'.format(e), state='w')
-                osds_this_node += 1

    ceph_conn.shutdown()

@ -1355,9 +1356,11 @@ def collect_vm_stats(queue):
                if instance.getdom() is not None:
                    try:
                        if instance.getdom().state()[0] != libvirt.VIR_DOMAIN_RUNNING:
+                            logger.out("VM {} has failed".format(instance.domname), state='w', prefix='vm-thread')
                            raise
                    except Exception:
                        # Toggle a state "change"
+                        logger.out("Resetting state to {} for VM {}".format(instance.getstate(), instance.domname), state='i', prefix='vm-thread')
                        zkhandler.writedata(zk_conn, {'/domains/{}/state'.format(domain): instance.getstate()})
        elif instance.getnode() == this_node.name:
            memprov += instance.getmemory()
--- a/node-daemon/pvcnoded/NodeInstance.py
+++ b/node-daemon/pvcnoded/NodeInstance.py
@ -64,17 +64,28 @@ class NodeInstance(object):
        self.vcpualloc = 0
        # Floating IP configurations
        if self.config['enable_networking']:
-            self.vni_dev = self.config['vni_dev']
-            self.vni_ipaddr, self.vni_cidrnetmask = self.config['vni_floating_ip'].split('/')
            self.upstream_dev = self.config['upstream_dev']
-            self.upstream_ipaddr, self.upstream_cidrnetmask = self.config['upstream_floating_ip'].split('/')
+            self.upstream_floatingipaddr = self.config['upstream_floating_ip'].split('/')[0]
+            self.upstream_ipaddr, self.upstream_cidrnetmask = self.config['upstream_dev_ip'].split('/')
+            self.vni_dev = self.config['vni_dev']
+            self.vni_floatingipaddr = self.config['vni_floating_ip'].split('/')[0]
+            self.vni_ipaddr, self.vni_cidrnetmask = self.config['vni_dev_ip'].split('/')
+            self.storage_dev = self.config['storage_dev']
+            self.storage_floatingipaddr = self.config['storage_floating_ip'].split('/')[0]
+            self.storage_ipaddr, self.storage_cidrnetmask = self.config['storage_dev_ip'].split('/')
        else:
-            self.vni_dev = None
-            self.vni_ipaddr = None
-            self.vni_cidrnetmask = None
            self.upstream_dev = None
+            self.upstream_floatingipaddr = None
            self.upstream_ipaddr = None
            self.upstream_cidrnetmask = None
+            self.vni_dev = None
+            self.vni_floatingipaddr = None
+            self.vni_ipaddr = None
+            self.vni_cidrnetmask = None
+            self.storage_dev = None
+            self.storage_floatingipaddr = None
+            self.storage_ipaddr = None
+            self.storage_cidrnetmask = None
        # Threads
        self.flush_thread = None
        # Flags
@ -349,13 +360,13 @@ class NodeInstance(object):
        # 1. Add Upstream floating IP
        self.logger.out(
            'Creating floating upstream IP {}/{} on interface {}'.format(
-                self.upstream_ipaddr,
+                self.upstream_floatingipaddr,
                self.upstream_cidrnetmask,
                'brupstream'
            ),
            state='o'
        )
-        common.createIPAddress(self.upstream_ipaddr, self.upstream_cidrnetmask, 'brupstream')
+        common.createIPAddress(self.upstream_floatingipaddr, self.upstream_cidrnetmask, 'brupstream')
        self.logger.out('Releasing write lock for synchronization phase C', state='i')
        zkhandler.writedata(self.zk_conn, {'/locks/primary_node': ''})
        lock.release()
@ -367,16 +378,25 @@ class NodeInstance(object):
        lock.acquire()
        self.logger.out('Acquired write lock for synchronization phase D', state='o')
        time.sleep(0.2)  # Time for reader to acquire the lock
-        # 2. Add Cluster floating IP
+        # 2. Add Cluster & Storage floating IP
        self.logger.out(
            'Creating floating management IP {}/{} on interface {}'.format(
-                self.vni_ipaddr,
+                self.vni_floatingipaddr,
                self.vni_cidrnetmask,
                'brcluster'
            ),
            state='o'
        )
-        common.createIPAddress(self.vni_ipaddr, self.vni_cidrnetmask, 'brcluster')
+        common.createIPAddress(self.vni_floatingipaddr, self.vni_cidrnetmask, 'brcluster')
+        self.logger.out(
+            'Creating floating storage IP {}/{} on interface {}'.format(
+                self.storage_floatingipaddr,
+                self.storage_cidrnetmask,
+                'brstorage'
+            ),
+            state='o'
+        )
+        common.createIPAddress(self.storage_floatingipaddr, self.storage_cidrnetmask, 'brstorage')
        self.logger.out('Releasing write lock for synchronization phase D', state='i')
        zkhandler.writedata(self.zk_conn, {'/locks/primary_node': ''})
        lock.release()
@ -541,13 +561,13 @@ class NodeInstance(object):
        # 5. Remove Upstream floating IP
        self.logger.out(
            'Removing floating upstream IP {}/{} from interface {}'.format(
-                self.upstream_ipaddr,
+                self.upstream_floatingipaddr,
                self.upstream_cidrnetmask,
                'brupstream'
            ),
            state='o'
        )
-        common.removeIPAddress(self.upstream_ipaddr, self.upstream_cidrnetmask, 'brupstream')
+        common.removeIPAddress(self.upstream_floatingipaddr, self.upstream_cidrnetmask, 'brupstream')
        self.logger.out('Releasing read lock for synchronization phase C', state='i')
        lock.release()
        self.logger.out('Released read lock for synchronization phase C', state='o')
@ -557,16 +577,25 @@ class NodeInstance(object):
        self.logger.out('Acquiring read lock for synchronization phase D', state='i')
        lock.acquire()
        self.logger.out('Acquired read lock for synchronization phase D', state='o')
-        # 6. Remove Cluster floating IP
+        # 6. Remove Cluster & Storage floating IP
        self.logger.out(
            'Removing floating management IP {}/{} from interface {}'.format(
-                self.vni_ipaddr,
+                self.vni_floatingipaddr,
                self.vni_cidrnetmask,
                'brcluster'
            ),
            state='o'
        )
-        common.removeIPAddress(self.vni_ipaddr, self.vni_cidrnetmask, 'brcluster')
+        common.removeIPAddress(self.vni_floatingipaddr, self.vni_cidrnetmask, 'brcluster')
+        self.logger.out(
+            'Removing floating storage IP {}/{} from interface {}'.format(
+                self.storage_floatingipaddr,
+                self.storage_cidrnetmask,
+                'brstorage'
+            ),
+            state='o'
+        )
+        common.removeIPAddress(self.storage_floatingipaddr, self.storage_cidrnetmask, 'brstorage')
        self.logger.out('Releasing read lock for synchronization phase D', state='i')
        lock.release()
        self.logger.out('Released read lock for synchronization phase D', state='o')
--- a/node-daemon/pvcnoded/VMConsoleWatcherInstance.py
+++ b/node-daemon/pvcnoded/VMConsoleWatcherInstance.py
@ -62,13 +62,13 @@ class VMConsoleWatcherInstance(object):
    def start(self):
        self.thread_stopper.clear()
        self.thread = Thread(target=self.run, args=(), kwargs={})
-        self.logger.out('Starting VM log parser', state='i', prefix='Domain {}:'.format(self.domuuid))
+        self.logger.out('Starting VM log parser', state='i', prefix='Domain {}'.format(self.domuuid))
        self.thread.start()

    # Stop execution thread
    def stop(self):
        if self.thread and self.thread.is_alive():
-            self.logger.out('Stopping VM log parser', state='i', prefix='Domain {}:'.format(self.domuuid))
+            self.logger.out('Stopping VM log parser', state='i', prefix='Domain {}'.format(self.domuuid))
            self.thread_stopper.set()
            # Do one final flush
            self.update()
--- a/node-daemon/pvcnoded/VMInstance.py
+++ b/node-daemon/pvcnoded/VMInstance.py
@ -27,6 +27,8 @@ import json

 from threading import Thread

+from xml.etree import ElementTree
+
 import pvcnoded.zkhandler as zkhandler
 import pvcnoded.common as common

@ -35,7 +37,7 @@ import pvcnoded.VMConsoleWatcherInstance as VMConsoleWatcherInstance
 import daemon_lib.common as daemon_common


-def flush_locks(zk_conn, logger, dom_uuid):
+def flush_locks(zk_conn, logger, dom_uuid, this_node=None):
    logger.out('Flushing RBD locks for VM "{}"'.format(dom_uuid), state='i')
    # Get the list of RBD images
    rbd_list = zkhandler.readdata(zk_conn, '/domains/{}/rbdlist'.format(dom_uuid)).split(',')
@ -56,11 +58,18 @@ def flush_locks(zk_conn, logger, dom_uuid):
        if lock_list:
            # Loop through the locks
            for lock in lock_list:
+                if this_node is not None and zkhandler.readdata(zk_conn, '/domains/{}/state'.format(dom_uuid)) != 'stop' and lock['address'].split(':')[0] != this_node.storage_ipaddr:
+                    logger.out('RBD lock does not belong to this host (lock owner: {}): freeing this lock would be unsafe, aborting'.format(lock['address'].split(':')[0]), state='e')
+                    zkhandler.writedata(zk_conn, {'/domains/{}/state'.format(dom_uuid): 'fail'})
+                    zkhandler.writedata(zk_conn, {'/domains/{}/failedreason'.format(dom_uuid): 'Could not safely free RBD lock {} ({}) on volume {}; stop VM and flush locks manually'.format(lock['id'], lock['address'], rbd)})
+                    break
                # Free the lock
                lock_remove_retcode, lock_remove_stdout, lock_remove_stderr = common.run_os_command('rbd lock remove {} "{}" "{}"'.format(rbd, lock['id'], lock['locker']))
                if lock_remove_retcode != 0:
-                    logger.out('Failed to free RBD lock "{}" on volume "{}"\n{}'.format(lock['id'], rbd, lock_remove_stderr), state='e')
-                    continue
+                    logger.out('Failed to free RBD lock "{}" on volume "{}": {}'.format(lock['id'], rbd, lock_remove_stderr), state='e')
+                    zkhandler.writedata(zk_conn, {'/domains/{}/state'.format(dom_uuid): 'fail'})
+                    zkhandler.writedata(zk_conn, {'/domains/{}/failedreason'.format(dom_uuid): 'Could not free RBD lock {} ({}) on volume {}: {}'.format(lock['id'], lock['address'], rbd, lock_remove_stderr)})
+                    break
                logger.out('Freed RBD lock "{}" on volume "{}"'.format(lock['id'], rbd), state='o')

    return True
@ -74,15 +83,14 @@ def run_command(zk_conn, logger, this_node, data):
    # Flushing VM RBD locks
    if command == 'flush_locks':
        dom_uuid = args
-        # If this node is taking over primary state, wait until it's done
-        while this_node.router_state == 'takeover':
-            time.sleep(1)
-        if this_node.router_state == 'primary':
+
+        # Verify that the VM is set to run on this node
+        if this_node.d_domain[dom_uuid].getnode() == this_node.name:
            # Lock the command queue
            zk_lock = zkhandler.writelock(zk_conn, '/cmd/domains')
            with zk_lock:
-                # Add the OSD
-                result = flush_locks(zk_conn, logger, dom_uuid)
+                # Flush the lock
+                result = flush_locks(zk_conn, logger, dom_uuid, this_node)
                # Command succeeded
                if result:
                    # Update the command queue
@ -202,6 +210,21 @@ class VMInstance(object):
            except Exception as e:
                self.logger.out('Error removing domain from list: {}'.format(e), state='e')

+    # Update the VNC live data
+    def update_vnc(self):
+        if self.dom is not None:
+            live_xml = ElementTree.fromstring(self.dom.XMLDesc(0))
+            graphics = live_xml.find('./devices/graphics')
+            if graphics is not None:
+                self.logger.out('Updating VNC data', state='i', prefix='Domain {}'.format(self.domuuid))
+                port = graphics.get('port', '')
+                listen = graphics.get('listen', '')
+                zkhandler.writedata(self.zk_conn, {'/domains/{}/vnc'.format(self.domuuid): '{}:{}'.format(listen, port)})
+            else:
+                zkhandler.writedata(self.zk_conn, {'/domains/{}/vnc'.format(self.domuuid): ''})
+        else:
+            zkhandler.writedata(self.zk_conn, {'/domains/{}/vnc'.format(self.domuuid): ''})
+
    # Start up the VM
    def start_vm(self):
        # Start the log watcher
@ -225,6 +248,17 @@ class VMInstance(object):
        except Exception:
            curstate = 'notstart'

+        # Handle situations where the VM crashed or the node unexpectedly rebooted
+        if self.getdom() is None or self.getdom().state()[0] != libvirt.VIR_DOMAIN_RUNNING:
+            # Flush locks
+            self.logger.out('Flushing RBD locks', state='i', prefix='Domain {}'.format(self.domuuid))
+            flush_locks(self.zk_conn, self.logger, self.domuuid, self.this_node)
+            if zkhandler.readdata(self.zk_conn, '/domains/{}/state'.format(self.domuuid)) == 'fail':
+                lv_conn.close()
+                self.dom = None
+                self.instart = False
+                return
+
        if curstate == libvirt.VIR_DOMAIN_RUNNING:
            # If it is running just update the model
            self.addDomainToList()
@ -243,7 +277,10 @@ class VMInstance(object):
                self.logger.out('Failed to create VM', state='e', prefix='Domain {}'.format(self.domuuid))
                zkhandler.writedata(self.zk_conn, {'/domains/{}/state'.format(self.domuuid): 'fail'})
                zkhandler.writedata(self.zk_conn, {'/domains/{}/failedreason'.format(self.domuuid): str(e)})
+                lv_conn.close()
                self.dom = None
+                self.instart = False
+                return

        lv_conn.close()

@ -719,7 +756,8 @@ class VMInstance(object):
                        self.removeDomainFromList()
                        # Stop the log watcher
                        self.console_log_instance.stop()
-
+                # Update the VNC information
+                self.update_vnc()
            else:
                # Conditional pass three - Is this VM currently running on this node
                if running == libvirt.VIR_DOMAIN_RUNNING:
--- a/node-daemon/pvcnoded/fencing.py
+++ b/node-daemon/pvcnoded/fencing.py
@ -124,6 +124,11 @@ def rebootViaIPMI(ipmi_hostname, ipmi_user, ipmi_password, logger):
    )
    ipmi_reset_retcode, ipmi_reset_stdout, ipmi_reset_stderr = common.run_os_command(ipmi_command_reset)

+    if ipmi_reset_retcode != 0:
+        logger.out('Failed to reboot dead node', state='e')
+        print(ipmi_reset_stderr)
+        return False
+
    time.sleep(2)

    # Ensure the node is powered on
@ -139,14 +144,14 @@ def rebootViaIPMI(ipmi_hostname, ipmi_user, ipmi_password, logger):
        )
        ipmi_start_retcode, ipmi_start_stdout, ipmi_start_stderr = common.run_os_command(ipmi_command_start)

-    # Declare success or failure
-    if ipmi_reset_retcode == 0:
-        logger.out('Successfully rebooted dead node', state='o')
-        return True
-    else:
-        logger.out('Failed to reboot dead node', state='e')
-        print(ipmi_reset_stderr)
-        return False
+        if ipmi_start_retcode != 0:
+            logger.out('Failed to start powered-off dead node', state='e')
+            print(ipmi_start_stderr)
+            return False
+
+    # Declare success
+    logger.out('Successfully rebooted dead node', state='o')
+    return True


 #
Author	SHA1	Message	Date
Joshua M. Boniface	9fbe35fd24	Bump version to 0.9.11	2021-01-05 15:58:26 -05:00
Joshua M. Boniface	09fdb5da26	Fix a bad reference	2021-01-03 23:29:43 -05:00
Joshua M. Boniface	80a436d7b6	Fix bad links	2021-01-03 16:56:13 -05:00
Joshua M. Boniface	74a28f2edd	Improve documentation of networks	2021-01-03 16:50:18 -05:00
Joshua M. Boniface	b8aba3a498	Use proper type for passwd_root option	2021-01-03 16:41:15 -05:00
Joshua M. Boniface	8b584bc545	Update Ansible manual with recent changes	2021-01-03 16:38:07 -05:00
Joshua M. Boniface	a24724d9f0	Use external ceph cmd for ceph df	2020-12-26 14:04:21 -05:00
Joshua M. Boniface	d22a5aa7f2	Move information about memory utilization	2020-12-21 00:20:01 -05:00
Joshua M. Boniface	78c017d51d	Remove erroneous extra colon in log output	2020-12-20 16:06:35 -05:00
Joshua M. Boniface	1b6613c280	Add live VNC information to domain output: Sets in the node daemon, returns via the API, and shows in the CLI, information about the live VNC listen address and port for VNC-enabled VMs. Closes #115	2020-12-20 16:00:55 -05:00
Joshua M. Boniface	9aeb86246a	Add even further documentation tweaks	2020-12-19 03:32:25 -05:00
Joshua M. Boniface	6abb8b2456	Clarify additional information in the cluster docs	2020-12-19 03:20:29 -05:00
Joshua M. Boniface	d6ef722997	Fix bad log message	2020-12-15 10:51:52 -05:00
Joshua M. Boniface	518d699c15	Bump version to 0.9.10	2020-12-15 10:45:15 -05:00
Joshua M. Boniface	ac3ef3d792	Revamp fencing order: Prevents unnecessarily excessive timeouts if IPMI connections time out; before, would have to go through 3 timed out commands at ~20s each before failure was registered; reduced to 1 if the first times out.	2020-12-15 02:48:25 -05:00
Joshua M. Boniface	37c3b4ef80	Validate provisioner userdata with SafeLoader: Given the issues with FullLoader and its eventual deprecation, just use SafeLoader instead. Any well-formatted Userdata document should conform.	2020-12-15 00:30:20 -05:00
Joshua M. Boniface	3705daff43	Better handle failing RBD lock frees: If the VM is not in a stop state, failing to free the lock is now considered a fatal error and will put the domain into fail state, aborting the start. This is better than being unsafe or trying to start a VM which will fail to boot due to read-only volumes.	2020-12-14 16:04:38 -05:00
Joshua M. Boniface	7c99a7bda7	Safely reset RBD locks on failed VMs: Should correct issues on cold start as well as if a VM crashes uncleanly, which would prevent the VM from starting due to stale RBD locks. This implementation has four parts: (1) Update how IP addresses are handled, specifically by replacing all previous instances of "vni_ipaddr" with "vni_floatingipaddr", and then adding the "vni_ipaddr" with the real data for this node's IPs. Also include the storage IPs in this where they weren't before, so each this_node actually has the local IPs plus floating IPs. This enables the next two steps. (2) Modify flush_locks to take this_node as an argument, and update the run_command function to only operate against this node, rather than on the primary coordinator. (3) Have the flush_locks check each lock against the current node, to verify that the lock is actually held by the current node. This is the only way to do this safely. During fencing, we override this by not passing a this_node which bypasses this check. (4) Have the VM start do the check for VM failure/startup and execute a flush_locks before actually starting the VM.	2020-12-14 15:53:18 -05:00
Joshua M. Boniface	68d87c0b99	Formatting fixes and typos	2020-12-13 16:38:30 -05:00
Joshua M. Boniface	1af7c545b2	Fix broken link	2020-12-13 04:45:38 -05:00
Joshua M. Boniface	9a1b86bbbf	Fix capitalization issue	2020-12-13 04:42:46 -05:00
Joshua M. Boniface	5b0066da3f	Update provisioner manual	2020-12-13 04:35:42 -05:00
Joshua M. Boniface	89c7e225a0	Move OSD stats uploading to primary only: Instead of each node uploading its own OSD stats, which would not work if the PVC daemon wasn't running, have the primary upload stats for all OSDs in the cluster.	2020-12-09 02:46:09 -05:00
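
The lock-ownership rule described in commit 7c99a7bda7 and applied in `flush_locks` can be sketched in isolation as below. This is only an illustrative reduction, not the daemon's code: the helper names `lock_owner_ip` and `is_safe_to_free` are hypothetical, and the lock address is assumed to have the IPv4 form `IP:port/nonce`, which is what the `split(':')[0]` in the diff implies.

```python
from typing import Optional


def lock_owner_ip(lock_address: str) -> str:
    """Return the IP part of a lock address such as '10.0.1.1:0/12345' (assumed format)."""
    return lock_address.split(':')[0]


def is_safe_to_free(lock_address: str, vm_state: str, node_storage_ip: Optional[str]) -> bool:
    """Mirror the condition in the diff: a lock is unsafe to free only when a node
    is supplied, the VM is not stopped, and the lock is held from another host."""
    # No node supplied (the fencing case in the commit message): check is bypassed
    if node_storage_ip is None:
        return True
    # A VM in 'stop' state is treated as safe to have its locks flushed
    if vm_state == 'stop':
        return True
    # Otherwise the lock must have been taken from this node's storage IP
    return lock_owner_ip(lock_address) == node_storage_ip
```

With this shape, a caller would set the domain to `fail` and stop processing as soon as `is_safe_to_free` returns `False`, which matches the `break` after the two `writedata` calls in the diff.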