Massive rejigger into single daemon

Completely restructure the daemon code to move the 4 discrete daemons
into a single daemon that can be run on every hypervisor. Introduce the
idea of a static list of "coordinator" nodes which are configured at
install time to run Zookeeper and FRR in router mode, and which are
allowed to take on client network management duties (gateway, DHCP, DNS,
etc.) while also allowing them to run VMs (i.e. no dedicated "router"
nodes required).
This commit is contained in:
2018-10-14 02:01:35 -04:00
parent 25df845769
commit f198f62563
57 changed files with 1726 additions and 2307 deletions

View File

@ -2,65 +2,71 @@
#### NOTICE FOR GITHUB
This software is still incomplete, and should be considered pre-alpha and not suitable for production use! Not all features described below are implemented, and I will be committing directly to master until they are. If you wish to test out PVC, the following table details the currently-working features, but be warned that functionality may change regularly. Use the tag `v0.3` for a stable implementation of the working features.
* Working features: pvcvd, cli-client (for VM and hypervisor management)
* In progress features: pvcrd, pvcnd, cli-client support for the aforementioned
* Unstarted features: pvcpd, api-client, web-client
This software is still incomplete, and should be considered pre-alpha and not suitable for production use! Not all features described below are implemented, and I will be committing directly to master until they are (version 1.0).
[![pipeline status](https://git.bonifacelabs.ca/bonifacelabs/pvc/badges/master/pipeline.svg)](https://git.bonifacelabs.ca/bonifacelabs/pvc/commits/master)
![Logo](https://git.bonifacelabs.ca/uploads/-/system/project/avatar/135/pvc_logo.png)
PVC is a suite of Python 3 tools to manage virtualized clusters. It provides a fully-functional private cloud based on the priciple that "PVC is not hyperscale". It is designed to be administrator-friendly while powerful, but without the feature bloat and complexity of tools like OpenStack that are designed to support public clouds. With PVC, an administrator can provision, manage, and update a cluster of dozens or more hypervisors running thousands of VMs using a simple CLI tool, HTTP API, or web interface. PVC is based entirely on Debian GNU/Linux and Free-and-Open-Source tools, providing the glue to provision and manage the cluster.
PVC is a suite of Python 3 tools to manage virtualized clusters. It provides a fully-functional private cloud based on the priciple that "PVC is not hyperscale". It is designed to be administrator-friendly while powerful, but without the feature bloat and complexity of tools like OpenStack that are designed to support public clouds. With PVC, an administrator can provision, manage, and update a cluster of dozens or more hypervisors running thousands of VMs using a simple CLI tool, HTTP API, or web interface. PVC is based entirely on Debian GNU/Linux and Free-and-Open-Source tools, providing the glue to bootstrap, provision and manage the cluster. Just add physical servers.
## Architecture overview
A PVC deployment ("cluster") consists of a standard physical layout and suite of daemons to manage the physical elements. The cluster is backed by a Zookeeper instance running on a subset of the machines and which all daemons communicate with to coordinate state.
A PVC deployment ("cluster") consists of a cluster of hosts which share duties using a single daemon. The cluster is backed by a Zookeeper instance running on a subset of the machines and which all daemons communicate with to coordinate state.
### Physical infrastructure
A cluster consists of two main kinds of physical servers - routers and hypervisors. A cluster will normally have two routers in a failover pair, and at least three hypervisors.
The PVC system depends on a cluster of 3 or more physical servers. Each server must have the capability to run storage, client networks, and VMs, and a subset of these servers are configured at install time to also act as routers for the cluster.
Router nodes may be less powerful than full hypervisors; they act primarily as the gateway for VM networks and handles inter-network ACLs. While they are not strictly required, a proper deployment with all functionality will require them.
The underlying networking is left up to the administrator; the only requirement is that all routers and hypervisors must be reachable by each other. In the simplest deployment, all physical nodes may be connected to a single dumb switch. All inter-VM networking is handled dynamically via software-defined networking within the cluster itself and is handled transparently above the underlying network layer. More advanced configurations may be specified during cluster initalization, including upstream networks, storage networks, and advanced node-level network configuration (vLANs, bonds, etc.)
Hypervisor nodes should be scaled at the administrator's discretion; they may be low-power and scaled out, or high-power and scaled up. PVC provides a straightforward automated provisioning system to expand the cluster as required.
The coordinator hosts [see below] require an additional upstream network. These hosts advertise BGP routes to the cluster networks on their upstream interface, and accept traffic destined to the clients; they route between themselves to reach VMs out the primary gateway node, so all coordinators are valid route targets. The router components of the daemon makes no effort to perform NAT or Internet gateway functions; an upstream router should be configured for this purpose.
The underlying networking is left up to the administrator; the only requirement is that all routers and hypervisors must be reachable by each other. In the simplest deployment, all physical nodes may be connected to a single dumb switch. All inter-VM networking is handled dynamically via software-defined networking within the cluster itself and is handled transparently above the underlying network layer. More advanced configurations may be specified during cluster initalization.
PVC supports fencing of nodes when they do not update the Zookeeper database in a fixed, configurable time, to provide automated recovery from node failures. This feature requires IPMI networked BMC support, and credentials should be specified in in the configuration. Preparing IPMI for PVC's use is left to the administrator.
### Software infrastructure
The core functionality of PVC is obtained via Zookeeper. During cluster initalization, the administrator must set either 3 or 5 hypervisors to act as the Zookeeper coordination subcluster. These hypervisors are special in the cluster and should not be removed after creation. This configuration prevents Zookeeper cluster size bloat as the cluster grows while still providing adequate redundancy for Zookeeper.
The PVC server-side infrastructure consists of a single daemon, `pvcd`, which manages each node based on connectivity to the Zookeeper cluster. All nodes are capable of running virtual machines, Ceph storage OSDs, and passing traffic to virtual machines via configured networks.
All daemons communicate with Zookeeper to obtain state, and update Zookeper as required, providing a high degree of self-management. Most major failure conditions are handled transparently by the cluster.
A subset of the nodes are designated at install time to act as "coordinator" hosts for the cluster. By default, 3 or 5 nodes can be designated as coordinators; 3 is ideal for small deployments (<30 hypervisors) while 5 allow for much larger scaling. These coordinators run additional functions for the cluster beyond VMs and storage, mainly:
FRRouting is used to manage virtual networking via BGP EVPN, and Libvirt is used to manage virtual machines.
* running Zookeeper itself, acting as the central database for the cluster.
* running FRRouting in BGP server mode, performing route reflector and upstream routing functionality.
* running Ceph monitor and manager daemons for the storage cluster.
* acting as client network gateways, DHCP, and DNS servers.
* acting as provisioning servers for nodes and VMs.
PVC itself is composed of four daemons:
A single coordinator elects itself "primary" to perform this duty at startup, and passes it off on shutdown; this can be modified manually by the administrator. The primary coordinator handles provisioning and client network functionality (gateway, DHCP, DNS) for the whole cluster, which the "secondary" coordinators can take over automatically if needed. While this architecture can suffer from tromboning when there is a larger inter-network traffic flow, it preserves a consistent and simple layer-2 model inside each client network for administrative simplicity.
* Virtualization
* Network
* Router
* Provisioning
New nodes can be added dynamically; once running, the cluster supports the PXE booting of additional hypervisors which are then self-configured and added to the cluster via the provisioning framework. This framework also allows for the quick deployment of VMs based off Ceph-stored images and templates.
#### Virtualization
The core external components are:
The virtualization daemon (`pvcvd`, package `pvc-virtualization-daemon`) manages QEMU/KVM virtual machines on hypervisor nodes. Domain configurations are stored in Zookeeper and VMs are dynamically created on hypervisor nodes based on Zookeeper configuration values. The virtualization daemon handles all stages of the VM lifecycle, including triggering startup, restart, graceful ACPI shutdown, and forceful termination.
#### Zookeeper
By default, each VM lives on a particular "home" node, and can be live migrated away either temporarily (`migrate`) or permanently (`move`). During provisioning and normal `migrate`/`move` commands, the selection of the target hypervisor is dynamic, based on administator-configurable variables.
Zookeeper is the primary database of the cluster, running on the coordinator nodes. All activity in the cluster is mediated by Zookeeper: clients read and write data to it, and daemons determine and update object configuration and state from it. The bootstrap tool initializes the cluster on the initial set of coordinator hosts, and once configured requires manual administrative action to modify; future version using Zookeeper 3.5 may offer self-managing functionality.
#### Network
Coordinator hosts automatically attempt to start the Zookeeper daemon when they start up, if it has been shut down. If the Zookeeper cluster connection is lost, all clients will pause state update operations while waiting to reconnect. Note that fencing may be triggered if only one node loses Zookeeper connectivity, as the paused operations will prevent keepalives from being sent to the cluster. Take care when rebooting coordinator nodes so that the Zookeeper cluster continues to function normally.
The network daemon (`pvcnd`, package `pvc-network-daemon`) manages the hypervisor-side virtual networking for the cluster. It is responsible for provisioning VXLAN devices on hypervisor nodes for VM network access.
#### FRRouting
#### Router
FRRouting is used to provide BGP for management of client networks. It makes use of BGP EVPN to allow dynamic, software-defined VXLAN client networks presenting as simple layer-2 networks. VMs inside a particular client network can communicate directly as if they shared a switch. FRRouting also provides upstream BGP, allowing routes to the dynamic client networks to be learned by upstream routers.
The router daemon (`pvcrd`, package `pvc-router-daemon`) manages the router-side virtual networking for the cluster. It includes functionality for managing the gateways of each virtual network, as well as providing network ACLs and IP forwarding to an upsteam, and DHCP for client networks.
#### dnsmasq
#### Provisioning
dnsmasq is used by the coordinator nodes to provide DHCP and DNS support for client networks. An individual instance is started on the primary coordinator for each network, handling that network specifically.
The provisioning daemon (`pvcpd`, package `pvc-provisioning-daemon`) manages the setup and creation of new physical nodes, new virtual machines, as well as handling updates of the cluster. The provisioning daemon can be run on any nodes, but is normally run on the routers to simplify administration.
#### PowerDNS
PowerDNS is used by the coordinator nodes to aggregate client DNS records from the dnsmasq instances and present a complete picture of the cluster DNS to clients and the outside world. An instance runs on the primary coordinator aggregating dnsmasq entries, which can then be sent to other DNS servers via AXFR, including the in-cluster DNS servers usable by clients, which also make use of PowerDNS.
#### Libvirt
Libvirt is used to manage virtual machines in the cluster. It uses the TCP communication mode to perform live migrations between nodes and must be listening on daemon startup.
#### Ceph
Ceph provides the storage infrastructure to the cluster using RBD block devices. OSDs live in each node and VM disks are stored in copies of 3 across the cluster, ensuring a high degree of resiliency. The monitor and manager functions run on the coordinator nodes for scalability.
### Client interfaces
@ -91,6 +97,10 @@ While not specifically an interface, the Python functions used by the above inte
## Changelog
#### 0.4
* Recombination of daemons and expansion of functionality into client network management and routing.
#### 0.3
* Major revisions to expand functionality.