Compare commits

33 Commits:

* c64e888d30
* f1249452e5
* 0a93f526e0
* 7c9512fb22
* e88b97f3a9
* 709c9cb73e
* f41c5176be
* 38e43b46c3
* ed9c37982a
* 0f24184b78
* 1ba37fe33d
* 1a05077b10
* 57c28376a6
* e781d742e6
* 6c6d1508a1
* 741dafb26b
* 032d3ebf18
* 5d9e83e8ed
* ad0bd8649f
* 9b5e53e4b6
* 9617660342
* ab0a1e0946
* 7c116b2fbc
* 1023c55087
* 9235187c6f
* 0c94f1b4f8
* 44a4f0e1f7
* 5d53a3e529
* 35e22cb50f
* a3171b666b
* 48e41d7b05
* d6aecf195e
* 9329784010
@@ -1,5 +1,14 @@
## PVC Changelog

###### [v0.9.86](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.86)

* [API Daemon] Significantly improves the performance of several commands via async Zookeeper calls and removal of superfluous backend calls (a minimal sketch of the batched-read pattern follows this list).
* [Docs] Improves the project README and updates screenshot images to show the current output and more functionality.
* [API Daemon/CLI] Corrects some bugs in VM metainformation output.
* [Node Daemon] Fixes resource reporting bugs from 0.9.81 and properly clears node resource numbers on a fence.
* [Health Daemon] Adds a wait during pvchealthd startup until the node is in run state, to avoid erroneous faults during node bootup.
* [API Daemon] Fixes an incorrect reference to the legacy pvcapid.yaml file in the migration script.
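
The async improvement above centers on the `zkhandler.read_many()` helper seen throughout this diff: callers collect every Zookeeper key up front, then fetch them in one batched pass instead of issuing one blocking `read()` per key. A minimal sketch of the pattern, with the `zkhandler` and `node_list` names assumed from the surrounding code:

```python
# Hedged sketch of the batched-read pattern used throughout this diff.
# Collect every key to fetch first...
keys = []
for node in node_list:  # node_list: assumed list of node names
    keys += [
        ("node.state.daemon", node),
        ("node.state.domain", node),
    ]

# ...then fetch them all in one batched async pass; read_many() returns
# values in the same order as the requested keys.
values = zkhandler.read_many(keys)

# Each node contributed 2 keys, so slice the flat result list per node.
for nidx, node in enumerate(node_list):
    daemon_state, domain_state = values[nidx * 2 : nidx * 2 + 2]
```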

###### [v0.9.85](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.85)

* [Packaging] Fixes a dependency bug introduced in 0.9.84.
README.md (71 lines changed)
@@ -1,5 +1,5 @@
<p align="center">
<img alt="Logo banner" src="docs/images/pvc_logo_black.png"/>
<img alt="Logo banner" src="images/pvc_logo_black.png"/>
<br/><br/>
<a href="https://github.com/parallelvirtualcluster/pvc"><img alt="License" src="https://img.shields.io/github/license/parallelvirtualcluster/pvc"/></a>
<a href="https://github.com/psf/black"><img alt="Code style: Black" src="https://img.shields.io/badge/code%20style-black-000000.svg"/></a>
@@ -23,37 +23,62 @@ Installation of PVC is accomplished by two main components: a [Node installer IS

Just give it physical servers, and it will run your VMs without you having to think about it, all in just an hour or two of setup time.


## What is it based on?

The core node and API daemons, as well as the CLI API client, are written in Python 3 and are fully Free Software (GNU GPL v3). In addition to these, PVC makes use of the following software tools to provide a holistic hyperconverged infrastructure solution:

* Debian GNU/Linux as the base OS.
* Linux KVM, QEMU, and Libvirt for VM management.
* Linux `ip`, FRRouting, NFTables, DNSMasq, and PowerDNS for network management.
* Ceph for storage management.
* Apache Zookeeper for the primary cluster state database.
* Patroni PostgreSQL manager for the secondary relation databases (DNS aggregation, Provisioner configuration).


## Getting Started

To get started with PVC, please see the [About](https://docs.parallelvirtualcluster.org/en/latest/about/) page for general information about the project, and the [Getting Started](https://docs.parallelvirtualcluster.org/en/latest/getting-started/) page for details on configuring your first cluster.

To get started with PVC, please see the [About](https://docs.parallelvirtualcluster.org/en/latest/about-pvc/) page for general information about the project, and the [Getting Started](https://docs.parallelvirtualcluster.org/en/latest/deployment/getting-started/) page for details on configuring your first cluster.

## Changelog

View the changelog in [CHANGELOG.md](CHANGELOG.md).

View the changelog in [CHANGELOG.md](CHANGELOG.md). **Please note that any breaking changes are announced here; ensure you read the changelog before upgrading!**

## Screenshots

While PVC's API and internals aren't very screenshot-worthy, here is some example output of the CLI tool.
These screenshots show some of the available functionality of the PVC system and CLI as of PVC v0.9.85.

<p><img alt="Node listing" src="docs/images/pvc-nodes.png"/><br/><i>Listing the nodes in a cluster</i></p>
<p><img alt="0. Integrated help" src="images/0-integrated-help.png"/><br/>
<i>The CLI features an integrated, fully-featured help system to show details about every possible command.</i>
</p>

<p><img alt="Network listing" src="docs/images/pvc-networks.png"/><br/><i>Listing the networks in a cluster, showing 3 bridged and 1 IPv4-only managed networks</i></p>
<p><img alt="1. Connection management" src="images/1-connection-management.png"/><br/>
<i>A single CLI instance can manage multiple clusters, including a quick detail view, and will default to a "local" connection if an "/etc/pvc/pvc.conf" file is found; sensitive API keys are hidden by default.</i>
</p>

<p><img alt="VM listing and migration" src="docs/images/pvc-migration.png"/><br/><i>Listing a limited set of VMs and migrating one with status updates</i></p>
<p><img alt="2. Cluster details and output formats" src="images/2-cluster-details-and-output-formats.png"/><br/>
<i>PVC can show the key details of your cluster at a glance, including health, persistent fault events, and key resources; the CLI can output both in pretty human format and JSON for easier machine parsing in scripts.</i>
</p>

<p><img alt="Node logs" src="docs/images/pvc-nodelog.png"/><br/><i>Viewing the logs of a node (keepalives and VM [un]migration)</i></p>
<p><img alt="3. Node information" src="images/3-node-information.png"/><br/>
<i>PVC can show details about the nodes in the cluster, including their live health and resource utilization.</i>
</p>

<p><img alt="4. VM information" src="images/4-vm-information.png"/><br/>
<i>PVC can show details about the VMs in the cluster, including their state, resource allocations, current hosting node, and metadata.</i>
</p>

<p><img alt="5. VM details" src="images/5-vm-details.png"/><br/>
<i>In addition to the above basic details, PVC can also show extensive information about a running VM's devices and other resource utilization.</i>
</p>

<p><img alt="6. Network information" src="images/6-network-information.png"/><br/>
<i>PVC has two major client network types, and ensures a consistent configuration of client networks across the entire cluster; managed networks can feature DHCP, DNS, firewall, and other functionality including DHCP reservations.</i>
</p>

<p><img alt="7. Storage information" src="images/7-storage-information.png"/><br/>
<i>PVC provides a convenient abstracted view of the underlying Ceph system and can manage all core aspects of it.</i>
</p>

<p><img alt="8. VM and node logs" src="images/8-vm-and-node-logs.png"/><br/>
<i>PVC can display logs from VM serial consoles (if properly configured) and nodes in-client to facilitate quick troubleshooting.</i>
</p>

<p><img alt="9. VM and worker tasks" src="images/9-vm-and-worker-tasks.png"/><br/>
<i>PVC provides full VM lifecycle management, as well as long-running worker-based commands (in this example, clearing a VM's storage locks).</i>
</p>

<p><img alt="10. Provisioner" src="images/10-provisioner.png"/><br/>
<i>PVC features an extensively customizable and configurable VM provisioner system, including EC2-compatible CloudInit support, allowing you to define flexible VM profiles and provision new VMs with a single command.</i>
</p>

<p><img alt="11. Prometheus and Grafana dashboard" src="images/11-prometheus-grafana.png"/><br/>
<i>PVC features several monitoring integration examples under "node-daemon/monitoring", including CheckMK, Munin, and, most recently, Prometheus, including an example Grafana dashboard for cluster monitoring and alerting.</i>
</p>
@@ -3,7 +3,7 @@
# Apply PVC database migrations
# Part of the Parallel Virtual Cluster (PVC) system

export PVC_CONFIG_FILE="/etc/pvc/pvcapid.yaml"
export PVC_CONFIG_FILE="/etc/pvc/pvc.conf"

if [[ ! -f ${PVC_CONFIG_FILE} ]]; then
    echo "Create a configuration file at ${PVC_CONFIG_FILE} before upgrading the database."
@@ -27,7 +27,7 @@ from distutils.util import strtobool as dustrtobool
import daemon_lib.config as cfg

# Daemon version
version = "0.9.85"
version = "0.9.86"

# API version
API_VERSION = 1.0
@@ -131,165 +131,12 @@ def cluster_metrics(zkhandler):
    Format status data from cluster_status into Prometheus-compatible metrics
    """

    # Get general cluster information
    status_retflag, status_data = pvc_cluster.get_info(zkhandler)
    if not status_retflag:
        return "Error: Status data threw error", 400

    faults_retflag, faults_data = pvc_faults.get_list(zkhandler)
    if not faults_retflag:
        return "Error: Faults data threw error", 400

    node_retflag, node_data = pvc_node.get_list(zkhandler)
    if not node_retflag:
        return "Error: Node data threw error", 400

    vm_retflag, vm_data = pvc_vm.get_list(zkhandler)
    if not vm_retflag:
        return "Error: VM data threw error", 400

    osd_retflag, osd_data = pvc_ceph.get_list_osd(zkhandler)
    if not osd_retflag:
        return "Error: OSD data threw error", 400

    output_lines = list()

    output_lines.append("# HELP pvc_info PVC cluster information")
    output_lines.append("# TYPE pvc_info gauge")
    output_lines.append(
        f"pvc_info{{primary_node=\"{status_data['primary_node']}\", version=\"{status_data['pvc_version']}\", upstream_ip=\"{status_data['upstream_ip']}\"}} 1"
    )

    output_lines.append("# HELP pvc_cluster_maintenance PVC cluster maintenance state")
    output_lines.append("# TYPE pvc_cluster_maintenance gauge")
    output_lines.append(
        f"pvc_cluster_maintenance {1 if bool(strtobool(status_data['maintenance'])) else 0}"
    )

    output_lines.append("# HELP pvc_cluster_health PVC cluster health status")
    output_lines.append("# TYPE pvc_cluster_health gauge")
    output_lines.append(f"pvc_cluster_health {status_data['cluster_health']['health']}")

    output_lines.append("# HELP pvc_cluster_faults PVC cluster new faults")
    output_lines.append("# TYPE pvc_cluster_faults gauge")
    fault_map = dict()
    for fault_type in pvc_common.fault_state_combinations:
        fault_map[fault_type] = 0
    for fault in faults_data:
        fault_map[fault["status"]] += 1
    for fault_type in fault_map:
        output_lines.append(
            f'pvc_cluster_faults{{status="{fault_type}"}} {fault_map[fault_type]}'
        )

    # output_lines.append("# HELP pvc_cluster_faults PVC cluster health faults")
    # output_lines.append("# TYPE pvc_cluster_faults gauge")
    # for fault_msg in status_data["cluster_health"]["messages"]:
    #     output_lines.append(
    #         f"pvc_cluster_faults{{id=\"{fault_msg['id']}\", message=\"{fault_msg['text']}\"}} {fault_msg['health_delta']}"
    #     )

    output_lines.append("# HELP pvc_node_health PVC cluster node health status")
    output_lines.append("# TYPE pvc_node_health gauge")
    for node in status_data["node_health"]:
        if isinstance(status_data["node_health"][node]["health"], int):
            output_lines.append(
                f"pvc_node_health{{node=\"{node}\"}} {status_data['node_health'][node]['health']}"
            )

    output_lines.append("# HELP pvc_node_daemon_states PVC Node daemon state counts")
    output_lines.append("# TYPE pvc_node_daemon_states gauge")
    node_daemon_state_map = dict()
    for state in set([s.split(",")[0] for s in pvc_common.node_state_combinations]):
        node_daemon_state_map[state] = 0
    for node in node_data:
        node_daemon_state_map[node["daemon_state"]] += 1
    for state in node_daemon_state_map:
        output_lines.append(
            f'pvc_node_daemon_states{{state="{state}"}} {node_daemon_state_map[state]}'
        )

    output_lines.append("# HELP pvc_node_domain_states PVC Node domain state counts")
    output_lines.append("# TYPE pvc_node_domain_states gauge")
    node_domain_state_map = dict()
    for state in set([s.split(",")[1] for s in pvc_common.node_state_combinations]):
        node_domain_state_map[state] = 0
    for node in node_data:
        node_domain_state_map[node["domain_state"]] += 1
    for state in node_domain_state_map:
        output_lines.append(
            f'pvc_node_domain_states{{state="{state}"}} {node_domain_state_map[state]}'
        )

    output_lines.append("# HELP pvc_vm_states PVC VM state counts")
    output_lines.append("# TYPE pvc_vm_states gauge")
    vm_state_map = dict()
    for state in set(pvc_common.vm_state_combinations):
        vm_state_map[state] = 0
    for vm in vm_data:
        vm_state_map[vm["state"]] += 1
    for state in vm_state_map:
        output_lines.append(f'pvc_vm_states{{state="{state}"}} {vm_state_map[state]}')

    output_lines.append("# HELP pvc_osd_up_states PVC OSD up state counts")
    output_lines.append("# TYPE pvc_osd_up_states gauge")
    osd_up_state_map = dict()
    for state in set([s.split(",")[0] for s in pvc_common.ceph_osd_state_combinations]):
        osd_up_state_map[state] = 0
    for osd in osd_data:
        if osd["stats"]["up"] > 0:
            osd_up_state_map["up"] += 1
        else:
            osd_up_state_map["down"] += 1
    for state in osd_up_state_map:
        output_lines.append(
            f'pvc_osd_up_states{{state="{state}"}} {osd_up_state_map[state]}'
        )

    output_lines.append("# HELP pvc_osd_in_states PVC OSD in state counts")
    output_lines.append("# TYPE pvc_osd_in_states gauge")
    osd_in_state_map = dict()
    for state in set([s.split(",")[1] for s in pvc_common.ceph_osd_state_combinations]):
        osd_in_state_map[state] = 0
    for osd in osd_data:
        if osd["stats"]["in"] > 0:
            osd_in_state_map["in"] += 1
        else:
            osd_in_state_map["out"] += 1
    for state in osd_in_state_map:
        output_lines.append(
            f'pvc_osd_in_states{{state="{state}"}} {osd_in_state_map[state]}'
        )

    output_lines.append("# HELP pvc_nodes PVC Node count")
    output_lines.append("# TYPE pvc_nodes gauge")
    output_lines.append(f"pvc_nodes {status_data['nodes']['total']}")

    output_lines.append("# HELP pvc_vms PVC VM count")
    output_lines.append("# TYPE pvc_vms gauge")
    output_lines.append(f"pvc_vms {status_data['vms']['total']}")

    output_lines.append("# HELP pvc_osds PVC OSD count")
    output_lines.append("# TYPE pvc_osds gauge")
    output_lines.append(f"pvc_osds {status_data['osds']['total']}")

    output_lines.append("# HELP pvc_networks PVC Network count")
    output_lines.append("# TYPE pvc_networks gauge")
    output_lines.append(f"pvc_networks {status_data['networks']}")

    output_lines.append("# HELP pvc_pools PVC Storage Pool count")
    output_lines.append("# TYPE pvc_pools gauge")
    output_lines.append(f"pvc_pools {status_data['pools']}")

    output_lines.append("# HELP pvc_volumes PVC Storage Volume count")
    output_lines.append("# TYPE pvc_volumes gauge")
    output_lines.append(f"pvc_volumes {status_data['volumes']}")

    output_lines.append("# HELP pvc_snapshots PVC Storage Snapshot count")
    output_lines.append("# TYPE pvc_snapshots gauge")
    output_lines.append(f"pvc_snapshots {status_data['snapshots']}")

    return "\n".join(output_lines) + "\n", 200
    retflag, retdata = pvc_cluster.get_metrics(zkhandler)
    if retflag:
        retcode = 200
    else:
        retcode = 400
    return retdata, retcode


@pvc_common.Profiler(config)
@@ -249,6 +249,8 @@ def getOutputColours(node_information):
        daemon_state_colour = ansiprint.yellow()
    elif node_information["daemon_state"] == "dead":
        daemon_state_colour = ansiprint.red() + ansiprint.bold()
    elif node_information["daemon_state"] == "fenced":
        daemon_state_colour = ansiprint.red()
    else:
        daemon_state_colour = ansiprint.blue()
@@ -1659,24 +1659,26 @@ def format_info(config, domain_information, long_output):
    )

    if not domain_information.get("node_selector"):
        formatted_node_selector = "False"
        formatted_node_selector = "Default"
    else:
        formatted_node_selector = domain_information["node_selector"]
        formatted_node_selector = str(domain_information["node_selector"]).title()

    if not domain_information.get("node_limit"):
        formatted_node_limit = "False"
        formatted_node_limit = "Any"
    else:
        formatted_node_limit = ", ".join(domain_information["node_limit"])

    if not domain_information.get("node_autostart"):
        autostart_colour = ansiprint.blue()
        formatted_node_autostart = "False"
    else:
        formatted_node_autostart = domain_information["node_autostart"]
        autostart_colour = ansiprint.green()
        formatted_node_autostart = "True"

    if not domain_information.get("migration_method"):
        formatted_migration_method = "any"
        formatted_migration_method = "Any"
    else:
        formatted_migration_method = domain_information["migration_method"]
        formatted_migration_method = str(domain_information["migration_method"]).title()

    ainformation.append(
        "{}Migration selector:{} {}".format(
@@ -1689,8 +1691,12 @@ def format_info(config, domain_information, long_output):
        )
    )
    ainformation.append(
        "{}Autostart:{} {}".format(
            ansiprint.purple(), ansiprint.end(), formatted_node_autostart
        "{}Autostart:{} {}{}{}".format(
            ansiprint.purple(),
            ansiprint.end(),
            autostart_colour,
            formatted_node_autostart,
            ansiprint.end(),
        )
    )
    ainformation.append(
@@ -1736,13 +1742,17 @@ def format_info(config, domain_information, long_output):
        domain_information["tags"], key=lambda t: t["type"] + t["name"]
    ):
        ainformation.append(
            " {tags_name: <{tags_name_length}} {tags_type: <{tags_type_length}} {tags_protected: <{tags_protected_length}}".format(
            " {tags_name: <{tags_name_length}} {tags_type: <{tags_type_length}} {tags_protected_colour}{tags_protected: <{tags_protected_length}}{end}".format(
                tags_name_length=tags_name_length,
                tags_type_length=tags_type_length,
                tags_protected_length=tags_protected_length,
                tags_name=tag["name"],
                tags_type=tag["type"],
                tags_protected=str(tag["protected"]),
                tags_protected_colour=ansiprint.green()
                if tag["protected"]
                else ansiprint.blue(),
                end=ansiprint.end(),
            )
        )
    else:
@@ -2,7 +2,7 @@ from setuptools import setup

setup(
    name="pvc",
    version="0.9.85",
    version="0.9.86",
    packages=["pvc.cli", "pvc.lib"],
    install_requires=[
        "Click",
@@ -215,14 +215,26 @@ def getClusterOSDList(zkhandler):


def getOSDInformation(zkhandler, osd_id):
    # Get the devices
    osd_fsid = zkhandler.read(("osd.ofsid", osd_id))
    osd_node = zkhandler.read(("osd.node", osd_id))
    osd_device = zkhandler.read(("osd.device", osd_id))
    osd_is_split = bool(strtobool(zkhandler.read(("osd.is_split", osd_id))))
    osd_db_device = zkhandler.read(("osd.db_device", osd_id))
    (
        osd_fsid,
        osd_node,
        osd_device,
        _osd_is_split,
        osd_db_device,
        osd_stats_raw,
    ) = zkhandler.read_many(
        [
            ("osd.ofsid", osd_id),
            ("osd.node", osd_id),
            ("osd.device", osd_id),
            ("osd.is_split", osd_id),
            ("osd.db_device", osd_id),
            ("osd.stats", osd_id),
        ]
    )

    osd_is_split = bool(strtobool(_osd_is_split))
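    # Note: Zookeeper stores booleans as "True"/"False" strings, so strtobool
    # is needed to turn the raw string back into 0/1 for the bool() cast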
    # Parse the stats data
    osd_stats_raw = zkhandler.read(("osd.stats", osd_id))
    osd_stats = dict(json.loads(osd_stats_raw))

    osd_information = {
@@ -308,13 +320,18 @@ def get_list_osd(zkhandler, limit=None, is_fuzzy=True):
#
def getPoolInformation(zkhandler, pool):
    # Parse the stats data
    pool_stats_raw = zkhandler.read(("pool.stats", pool))
    (pool_stats_raw, tier, pgs,) = zkhandler.read_many(
        [
            ("pool.stats", pool),
            ("pool.tier", pool),
            ("pool.pgs", pool),
        ]
    )

    pool_stats = dict(json.loads(pool_stats_raw))
    volume_count = len(getCephVolumes(zkhandler, pool))
    tier = zkhandler.read(("pool.tier", pool))
    if tier is None:
        tier = "default"
    pgs = zkhandler.read(("pool.pgs", pool))

    pool_information = {
        "name": pool,
@@ -19,14 +19,12 @@
#
###############################################################################

from distutils.util import strtobool
from json import loads

import daemon_lib.common as common
import daemon_lib.faults as faults
import daemon_lib.vm as pvc_vm
import daemon_lib.node as pvc_node
import daemon_lib.network as pvc_network
import daemon_lib.ceph as pvc_ceph


def set_maintenance(zkhandler, maint_state):
@@ -45,9 +43,7 @@ def set_maintenance(zkhandler, maint_state):
        return True, "Successfully set cluster in normal mode"


def getClusterHealthFromFaults(zkhandler):
    faults_list = faults.getAllFaults(zkhandler)

def getClusterHealthFromFaults(zkhandler, faults_list):
    unacknowledged_faults = [fault for fault in faults_list if fault["status"] != "ack"]

    # Generate total cluster health numbers
@@ -217,20 +213,40 @@ def getClusterHealth(zkhandler, node_list, vm_list, ceph_osd_list):


def getNodeHealth(zkhandler, node_list):
    # Get the health state of all nodes
    node_health_reads = list()
    for node in node_list:
        node_health_reads += [
            ("node.monitoring.health", node),
            ("node.monitoring.plugins", node),
        ]
    all_node_health_details = zkhandler.read_many(node_health_reads)
    # Parse out the Node health details
    node_health = dict()
    for index, node in enumerate(node_list):
    for nidx, node in enumerate(node_list):
        # Split the large list of return values by the IDX of this node
        # Each node result is 2 fields long
        pos_start = nidx * 2
        pos_end = nidx * 2 + 2
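        # Illustrative example (names assumed): for node_list = ["hv1", "hv2"],
        # all_node_health_details is [hv1_health, hv1_plugins, hv2_health,
        # hv2_plugins], so nidx=1 slices out elements [2:4]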
        node_health_value, node_health_plugins = tuple(
            all_node_health_details[pos_start:pos_end]
        )
        node_health_details = pvc_node.getNodeHealthDetails(
            zkhandler, node, node_health_plugins.split()
        )

        node_health_messages = list()
        node_health_value = node["health"]
        for entry in node["health_details"]:
        for entry in node_health_details:
            if entry["health_delta"] > 0:
                node_health_messages.append(f"'{entry['name']}': {entry['message']}")

        node_health_entry = {
            "health": node_health_value,
            "health": int(node_health_value)
            if isinstance(node_health_value, int)
            else node_health_value,
            "messages": node_health_messages,
        }

        node_health[node["name"]] = node_health_entry
        node_health[node] = node_health_entry

    return node_health

@@ -239,78 +255,156 @@ def getClusterInformation(zkhandler):
    # Get cluster maintenance state
    maintenance_state = zkhandler.read("base.config.maintenance")

    # Get node information object list
    retcode, node_list = pvc_node.get_list(zkhandler, None)

    # Get primary node
    primary_node = common.getPrimaryNode(zkhandler)

    # Get PVC version of primary node
    pvc_version = "0.0.0"
    for node in node_list:
        if node["name"] == primary_node:
            pvc_version = node["pvc_version"]

    # Get vm information object list
    retcode, vm_list = pvc_vm.get_list(zkhandler, None, None, None, None)

    # Get network information object list
    retcode, network_list = pvc_network.get_list(zkhandler, None, None)

    # Get storage information object list
    retcode, ceph_osd_list = pvc_ceph.get_list_osd(zkhandler, None)
    retcode, ceph_pool_list = pvc_ceph.get_list_pool(zkhandler, None)
    retcode, ceph_volume_list = pvc_ceph.get_list_volume(zkhandler, None, None)
    retcode, ceph_snapshot_list = pvc_ceph.get_list_snapshot(
        zkhandler, None, None, None
    maintenance_state, primary_node = zkhandler.read_many(
        [
            ("base.config.maintenance"),
            ("base.config.primary_node"),
        ]
    )

    # Determine, for each subsection, the total count
    # Get PVC version of primary node
    pvc_version = zkhandler.read(("node.data.pvc_version", primary_node))

    # Get the list of Nodes
    node_list = zkhandler.children("base.node")
    node_count = len(node_list)
    vm_count = len(vm_list)
    network_count = len(network_list)
    ceph_osd_count = len(ceph_osd_list)
    ceph_pool_count = len(ceph_pool_list)
    ceph_volume_count = len(ceph_volume_list)
    ceph_snapshot_count = len(ceph_snapshot_list)

    # Format the Node states
    # Get the daemon and domain states of all Nodes
    node_state_reads = list()
    for node in node_list:
        node_state_reads += [
            ("node.state.daemon", node),
            ("node.state.domain", node),
        ]
    all_node_states = zkhandler.read_many(node_state_reads)
    # Parse out the Node states
    node_data = list()
    formatted_node_states = {"total": node_count}
    for state in common.node_state_combinations:
        state_count = 0
        for node in node_list:
            node_state = f"{node['daemon_state']},{node['domain_state']}"
            if node_state == state:
                state_count += 1
        if state_count > 0:
            formatted_node_states[state] = state_count
    for nidx, node in enumerate(node_list):
        # Split the large list of return values by the IDX of this node
        # Each node result is 2 fields long
        pos_start = nidx * 2
        pos_end = nidx * 2 + 2
        node_daemon_state, node_domain_state = tuple(all_node_states[pos_start:pos_end])
        node_data.append(
            {
                "name": node,
                "daemon_state": node_daemon_state,
                "domain_state": node_domain_state,
            }
        )
        node_state = f"{node_daemon_state},{node_domain_state}"
        # Add to the count for this node's state
        if node_state in common.node_state_combinations:
            if formatted_node_states.get(node_state) is not None:
                formatted_node_states[node_state] += 1
            else:
                formatted_node_states[node_state] = 1

    # Format the VM states
    # Get the list of VMs
    vm_list = zkhandler.children("base.domain")
    vm_count = len(vm_list)
    # Get the states of all VMs
    vm_state_reads = list()
    for vm in vm_list:
        vm_state_reads += [
            ("domain", vm),
            ("domain.state", vm),
        ]
    all_vm_states = zkhandler.read_many(vm_state_reads)
    # Parse out the VM states
    vm_data = list()
    formatted_vm_states = {"total": vm_count}
    for state in common.vm_state_combinations:
        state_count = 0
        for vm in vm_list:
            if vm["state"] == state:
                state_count += 1
        if state_count > 0:
            formatted_vm_states[state] = state_count
    for vidx, vm in enumerate(vm_list):
        # Split the large list of return values by the IDX of this VM
        # Each VM result is 2 fields long
        pos_start = vidx * 2
        pos_end = vidx * 2 + 2
        vm_name, vm_state = tuple(all_vm_states[pos_start:pos_end])
        vm_data.append(
            {
                "uuid": vm,
                "name": vm_name,
                "state": vm_state,
            }
        )
        # Add to the count for this VM's state
        if vm_state in common.vm_state_combinations:
            if formatted_vm_states.get(vm_state) is not None:
                formatted_vm_states[vm_state] += 1
            else:
                formatted_vm_states[vm_state] = 1

    # Format the OSD states
    # Get the list of Ceph OSDs
    ceph_osd_list = zkhandler.children("base.osd")
    ceph_osd_count = len(ceph_osd_list)
    # Get the states of all OSDs ("stat" is not a typo since we're reading stats; states are in
    # the stats JSON object)
    osd_stat_reads = list()
    for osd in ceph_osd_list:
        osd_stat_reads += [("osd.stats", osd)]
    all_osd_stats = zkhandler.read_many(osd_stat_reads)
    # Parse out the OSD states
    osd_data = list()
    formatted_osd_states = {"total": ceph_osd_count}
    up_texts = {1: "up", 0: "down"}
    in_texts = {1: "in", 0: "out"}
    formatted_osd_states = {"total": ceph_osd_count}
    for state in common.ceph_osd_state_combinations:
        state_count = 0
        for ceph_osd in ceph_osd_list:
            ceph_osd_state = f"{up_texts[ceph_osd['stats']['up']]},{in_texts[ceph_osd['stats']['in']]}"
            if ceph_osd_state == state:
                state_count += 1
        if state_count > 0:
            formatted_osd_states[state] = state_count
    for oidx, osd in enumerate(ceph_osd_list):
        # Split the large list of return values by the IDX of this OSD
        # Each OSD result is 1 field long, so just use the IDX
        _osd_stats = all_osd_stats[oidx]
        # We have to load this JSON object and get our up/in states from it
        osd_stats = loads(_osd_stats)
        # Get our states
        osd_up = up_texts[osd_stats["up"]]
        osd_in = in_texts[osd_stats["in"]]
        osd_data.append(
            {
                "id": osd,
                "up": osd_up,
                "in": osd_in,
            }
        )
        osd_state = f"{osd_up},{osd_in}"
        # Add to the count for this OSD's state
        if osd_state in common.ceph_osd_state_combinations:
            if formatted_osd_states.get(osd_state) is not None:
                formatted_osd_states[osd_state] += 1
            else:
                formatted_osd_states[osd_state] = 1

    # Get the list of Networks
    network_list = zkhandler.children("base.network")
    network_count = len(network_list)

    # Get the list of Ceph pools
    ceph_pool_list = zkhandler.children("base.pool")
    ceph_pool_count = len(ceph_pool_list)

    # Get the list of Ceph volumes
    ceph_volume_list = list()
    for pool in ceph_pool_list:
        ceph_volume_list_pool = zkhandler.children(("volume", pool))
        if ceph_volume_list_pool is not None:
            ceph_volume_list += [f"{pool}/{volume}" for volume in ceph_volume_list_pool]
    ceph_volume_count = len(ceph_volume_list)

    # Get the list of Ceph snapshots
    ceph_snapshot_list = list()
    for volume in ceph_volume_list:
        ceph_snapshot_list_volume = zkhandler.children(("snapshot", volume))
        if ceph_snapshot_list_volume is not None:
            ceph_snapshot_list += [
                f"{volume}@{snapshot}" for snapshot in ceph_snapshot_list_volume
            ]
    ceph_snapshot_count = len(ceph_snapshot_list)

    # Get the list of faults
    faults_data = faults.getAllFaults(zkhandler)

    # Format the status data
    cluster_information = {
        "cluster_health": getClusterHealthFromFaults(zkhandler),
        "cluster_health": getClusterHealthFromFaults(zkhandler, faults_data),
        "node_health": getNodeHealth(zkhandler, node_list),
        "maintenance": maintenance_state,
        "primary_node": primary_node,
@@ -323,6 +417,12 @@ def getClusterInformation(zkhandler):
        "pools": ceph_pool_count,
        "volumes": ceph_volume_count,
        "snapshots": ceph_snapshot_count,
        "detail": {
            "node": node_data,
            "vm": vm_data,
            "osd": osd_data,
            "faults": faults_data,
        },
    }

    return cluster_information
@@ -337,6 +437,157 @@ def get_info(zkhandler):
        return False, "ERROR: Failed to obtain cluster information!"


def get_metrics(zkhandler):
    # Get general cluster information
    status_retflag, status_data = get_info(zkhandler)
    if not status_retflag:
        return False, "Error: Status data threw error"

    faults_data = status_data["detail"]["faults"]
    node_data = status_data["detail"]["node"]
    vm_data = status_data["detail"]["vm"]
    osd_data = status_data["detail"]["osd"]

    output_lines = list()

    output_lines.append("# HELP pvc_info PVC cluster information")
    output_lines.append("# TYPE pvc_info gauge")
    output_lines.append(
        f"pvc_info{{primary_node=\"{status_data['primary_node']}\", version=\"{status_data['pvc_version']}\", upstream_ip=\"{status_data['upstream_ip']}\"}} 1"
    )

    output_lines.append("# HELP pvc_cluster_maintenance PVC cluster maintenance state")
    output_lines.append("# TYPE pvc_cluster_maintenance gauge")
    output_lines.append(
        f"pvc_cluster_maintenance {1 if bool(strtobool(status_data['maintenance'])) else 0}"
    )

    output_lines.append("# HELP pvc_cluster_health PVC cluster health status")
    output_lines.append("# TYPE pvc_cluster_health gauge")
    output_lines.append(f"pvc_cluster_health {status_data['cluster_health']['health']}")

    output_lines.append("# HELP pvc_cluster_faults PVC cluster new faults")
    output_lines.append("# TYPE pvc_cluster_faults gauge")
    fault_map = dict()
    for fault_type in common.fault_state_combinations:
        fault_map[fault_type] = 0
    for fault in faults_data:
        fault_map[fault["status"]] += 1
    for fault_type in fault_map:
        output_lines.append(
            f'pvc_cluster_faults{{status="{fault_type}"}} {fault_map[fault_type]}'
        )

    # output_lines.append("# HELP pvc_cluster_faults PVC cluster health faults")
    # output_lines.append("# TYPE pvc_cluster_faults gauge")
    # for fault_msg in status_data["cluster_health"]["messages"]:
    #     output_lines.append(
    #         f"pvc_cluster_faults{{id=\"{fault_msg['id']}\", message=\"{fault_msg['text']}\"}} {fault_msg['health_delta']}"
    #     )

    output_lines.append("# HELP pvc_node_health PVC cluster node health status")
    output_lines.append("# TYPE pvc_node_health gauge")
    for node in status_data["node_health"]:
        if isinstance(status_data["node_health"][node]["health"], int):
            output_lines.append(
                f"pvc_node_health{{node=\"{node}\"}} {status_data['node_health'][node]['health']}"
            )

    output_lines.append("# HELP pvc_node_daemon_states PVC Node daemon state counts")
    output_lines.append("# TYPE pvc_node_daemon_states gauge")
    node_daemon_state_map = dict()
    for state in set([s.split(",")[0] for s in common.node_state_combinations]):
        node_daemon_state_map[state] = 0
    for node in node_data:
        node_daemon_state_map[node["daemon_state"]] += 1
    for state in node_daemon_state_map:
        output_lines.append(
            f'pvc_node_daemon_states{{state="{state}"}} {node_daemon_state_map[state]}'
        )

    output_lines.append("# HELP pvc_node_domain_states PVC Node domain state counts")
    output_lines.append("# TYPE pvc_node_domain_states gauge")
    node_domain_state_map = dict()
    for state in set([s.split(",")[1] for s in common.node_state_combinations]):
        node_domain_state_map[state] = 0
    for node in node_data:
        node_domain_state_map[node["domain_state"]] += 1
    for state in node_domain_state_map:
        output_lines.append(
            f'pvc_node_domain_states{{state="{state}"}} {node_domain_state_map[state]}'
        )

    output_lines.append("# HELP pvc_vm_states PVC VM state counts")
    output_lines.append("# TYPE pvc_vm_states gauge")
    vm_state_map = dict()
    for state in set(common.vm_state_combinations):
        vm_state_map[state] = 0
    for vm in vm_data:
        vm_state_map[vm["state"]] += 1
    for state in vm_state_map:
        output_lines.append(f'pvc_vm_states{{state="{state}"}} {vm_state_map[state]}')

    output_lines.append("# HELP pvc_osd_up_states PVC OSD up state counts")
    output_lines.append("# TYPE pvc_osd_up_states gauge")
    osd_up_state_map = dict()
    for state in set([s.split(",")[0] for s in common.ceph_osd_state_combinations]):
        osd_up_state_map[state] = 0
    for osd in osd_data:
        if osd["up"] == "up":
            osd_up_state_map["up"] += 1
        else:
            osd_up_state_map["down"] += 1
    for state in osd_up_state_map:
        output_lines.append(
            f'pvc_osd_up_states{{state="{state}"}} {osd_up_state_map[state]}'
        )

    output_lines.append("# HELP pvc_osd_in_states PVC OSD in state counts")
    output_lines.append("# TYPE pvc_osd_in_states gauge")
    osd_in_state_map = dict()
    for state in set([s.split(",")[1] for s in common.ceph_osd_state_combinations]):
        osd_in_state_map[state] = 0
    for osd in osd_data:
        if osd["in"] == "in":
            osd_in_state_map["in"] += 1
        else:
            osd_in_state_map["out"] += 1
    for state in osd_in_state_map:
        output_lines.append(
            f'pvc_osd_in_states{{state="{state}"}} {osd_in_state_map[state]}'
        )

    output_lines.append("# HELP pvc_nodes PVC Node count")
    output_lines.append("# TYPE pvc_nodes gauge")
    output_lines.append(f"pvc_nodes {status_data['nodes']['total']}")

    output_lines.append("# HELP pvc_vms PVC VM count")
    output_lines.append("# TYPE pvc_vms gauge")
    output_lines.append(f"pvc_vms {status_data['vms']['total']}")

    output_lines.append("# HELP pvc_osds PVC OSD count")
    output_lines.append("# TYPE pvc_osds gauge")
    output_lines.append(f"pvc_osds {status_data['osds']['total']}")

    output_lines.append("# HELP pvc_networks PVC Network count")
    output_lines.append("# TYPE pvc_networks gauge")
    output_lines.append(f"pvc_networks {status_data['networks']}")

    output_lines.append("# HELP pvc_pools PVC Storage Pool count")
    output_lines.append("# TYPE pvc_pools gauge")
    output_lines.append(f"pvc_pools {status_data['pools']}")

    output_lines.append("# HELP pvc_volumes PVC Storage Volume count")
    output_lines.append("# TYPE pvc_volumes gauge")
    output_lines.append(f"pvc_volumes {status_data['volumes']}")

    output_lines.append("# HELP pvc_snapshots PVC Storage Snapshot count")
    output_lines.append("# TYPE pvc_snapshots gauge")
    output_lines.append(f"pvc_snapshots {status_data['snapshots']}")

    return True, "\n".join(output_lines) + "\n"


def cluster_initialize(zkhandler, overwrite=False):
    # Abort if we've initialized the cluster before
    if zkhandler.exists("base.config.primary_node") and not overwrite:
@@ -401,13 +401,23 @@ def getDomainTags(zkhandler, dom_uuid):
    """
    tags = list()

    for tag in zkhandler.children(("domain.meta.tags", dom_uuid)):
        tag_type = zkhandler.read(("domain.meta.tags", dom_uuid, "tag.type", tag))
        protected = bool(
            strtobool(
                zkhandler.read(("domain.meta.tags", dom_uuid, "tag.protected", tag))
            )
        )
    all_tags = zkhandler.children(("domain.meta.tags", dom_uuid))

    tag_reads = list()
    for tag in all_tags:
        tag_reads += [
            ("domain.meta.tags", dom_uuid, "tag.type", tag),
            ("domain.meta.tags", dom_uuid, "tag.protected", tag),
        ]
    all_tag_data = zkhandler.read_many(tag_reads)

    for tidx, tag in enumerate(all_tags):
        # Split the large list of return values by the IDX of this tag
        # Each tag result is 2 fields long
        pos_start = tidx * 2
        pos_end = tidx * 2 + 2
        tag_type, protected = tuple(all_tag_data[pos_start:pos_end])
        protected = bool(strtobool(protected))
        tags.append({"name": tag, "type": tag_type, "protected": protected})

    return tags
@@ -422,19 +432,34 @@ def getDomainMetadata(zkhandler, dom_uuid):

    The UUID must be validated before calling this function!
    """
    domain_node_limit = zkhandler.read(("domain.meta.node_limit", dom_uuid))
    domain_node_selector = zkhandler.read(("domain.meta.node_selector", dom_uuid))
    domain_node_autostart = zkhandler.read(("domain.meta.autostart", dom_uuid))
    domain_migration_method = zkhandler.read(("domain.meta.migrate_method", dom_uuid))
    (
        domain_node_limit,
        domain_node_selector,
        domain_node_autostart,
        domain_migration_method,
    ) = zkhandler.read_many(
        [
            ("domain.meta.node_limit", dom_uuid),
            ("domain.meta.node_selector", dom_uuid),
            ("domain.meta.autostart", dom_uuid),
            ("domain.meta.migrate_method", dom_uuid),
        ]
    )

    if not domain_node_limit:
        domain_node_limit = None
    else:
        domain_node_limit = domain_node_limit.split(",")

    if not domain_node_selector or domain_node_selector == "none":
        domain_node_selector = None

    if not domain_node_autostart:
        domain_node_autostart = None

    if not domain_migration_method or domain_migration_method == "none":
        domain_migration_method = None

    return (
        domain_node_limit,
        domain_node_selector,
@@ -451,10 +476,25 @@ def getInformationFromXML(zkhandler, uuid):
    Gather information about a VM from the Libvirt XML configuration in the Zookeeper database
    and return a dict() containing it.
    """
    domain_state = zkhandler.read(("domain.state", uuid))
    domain_node = zkhandler.read(("domain.node", uuid))
    domain_lastnode = zkhandler.read(("domain.last_node", uuid))
    domain_failedreason = zkhandler.read(("domain.failed_reason", uuid))
    (
        domain_state,
        domain_node,
        domain_lastnode,
        domain_failedreason,
        domain_profile,
        domain_vnc,
        stats_data,
    ) = zkhandler.read_many(
        [
            ("domain.state", uuid),
            ("domain.node", uuid),
            ("domain.last_node", uuid),
            ("domain.failed_reason", uuid),
            ("domain.profile", uuid),
            ("domain.console.vnc", uuid),
            ("domain.stats", uuid),
        ]
    )

    (
        domain_node_limit,
@@ -462,19 +502,17 @@ def getInformationFromXML(zkhandler, uuid):
        domain_node_autostart,
        domain_migration_method,
    ) = getDomainMetadata(zkhandler, uuid)
    domain_tags = getDomainTags(zkhandler, uuid)
    domain_profile = zkhandler.read(("domain.profile", uuid))

    domain_vnc = zkhandler.read(("domain.console.vnc", uuid))
    domain_tags = getDomainTags(zkhandler, uuid)

    if domain_vnc:
        domain_vnc_listen, domain_vnc_port = domain_vnc.split(":")
    else:
        domain_vnc_listen = "None"
        domain_vnc_port = "None"
        domain_vnc_listen = None
        domain_vnc_port = None

    parsed_xml = getDomainXML(zkhandler, uuid)

    stats_data = zkhandler.read(("domain.stats", uuid))
    if stats_data is not None:
        try:
            stats_data = loads(stats_data)
@@ -491,6 +529,7 @@ def getInformationFromXML(zkhandler, uuid):
        domain_vcpu,
        domain_vcputopo,
    ) = getDomainMainDetails(parsed_xml)

    domain_networks = getDomainNetworks(parsed_xml, stats_data)

    (
@@ -95,12 +95,24 @@ def getFault(zkhandler, fault_id):
        return None

    fault_id = fault_id
    fault_last_time = zkhandler.read(("faults.last_time", fault_id))
    fault_first_time = zkhandler.read(("faults.first_time", fault_id))
    fault_ack_time = zkhandler.read(("faults.ack_time", fault_id))
    fault_status = zkhandler.read(("faults.status", fault_id))
    fault_delta = int(zkhandler.read(("faults.delta", fault_id)))
    fault_message = zkhandler.read(("faults.message", fault_id))

    (
        fault_last_time,
        fault_first_time,
        fault_ack_time,
        fault_status,
        fault_delta,
        fault_message,
    ) = zkhandler.read_many(
        [
            ("faults.last_time", fault_id),
            ("faults.first_time", fault_id),
            ("faults.ack_time", fault_id),
            ("faults.status", fault_id),
            ("faults.delta", fault_id),
            ("faults.message", fault_id),
        ]
    )

    # Acknowledged faults have a delta of 0
    if fault_ack_time != "":
@@ -112,7 +124,7 @@ def getFault(zkhandler, fault_id):
        "first_reported": fault_first_time,
        "acknowledged_at": fault_ack_time,
        "status": fault_status,
        "health_delta": fault_delta,
        "health_delta": int(fault_delta),
        "message": fault_message,
    }

@@ -126,11 +138,42 @@ def getAllFaults(zkhandler, sort_key="last_reported"):

    all_faults = zkhandler.children(("base.faults"))

    faults_detail = list()

    faults_reads = list()
    for fault_id in all_faults:
        fault_detail = getFault(zkhandler, fault_id)
        faults_detail.append(fault_detail)
        faults_reads += [
            ("faults.last_time", fault_id),
            ("faults.first_time", fault_id),
            ("faults.ack_time", fault_id),
            ("faults.status", fault_id),
            ("faults.delta", fault_id),
            ("faults.message", fault_id),
        ]
    all_faults_data = list(zkhandler.read_many(faults_reads))

    faults_detail = list()
    for fidx, fault_id in enumerate(all_faults):
        # Split the large list of return values by the IDX of this fault
        # Each fault result is 6 fields long
        pos_start = fidx * 6
        pos_end = fidx * 6 + 6
        (
            fault_last_time,
            fault_first_time,
            fault_ack_time,
            fault_status,
            fault_delta,
            fault_message,
        ) = tuple(all_faults_data[pos_start:pos_end])
        fault_output = {
            "id": fault_id,
            "last_reported": fault_last_time,
            "first_reported": fault_first_time,
            "acknowledged_at": fault_ack_time,
            "status": fault_status,
            "health_delta": int(fault_delta),
            "message": fault_message,
        }
        faults_detail.append(fault_output)

    sorted_faults = sorted(faults_detail, key=lambda x: x[sort_key])
    # Sort newest-first for time-based sorts
@@ -142,19 +142,37 @@ def getNetworkACLs(zkhandler, vni, _direction):


def getNetworkInformation(zkhandler, vni):
    description = zkhandler.read(("network", vni))
    nettype = zkhandler.read(("network.type", vni))
    mtu = zkhandler.read(("network.mtu", vni))
    domain = zkhandler.read(("network.domain", vni))
    name_servers = zkhandler.read(("network.nameservers", vni))
    ip6_network = zkhandler.read(("network.ip6.network", vni))
    ip6_gateway = zkhandler.read(("network.ip6.gateway", vni))
    dhcp6_flag = zkhandler.read(("network.ip6.dhcp", vni))
    ip4_network = zkhandler.read(("network.ip4.network", vni))
    ip4_gateway = zkhandler.read(("network.ip4.gateway", vni))
    dhcp4_flag = zkhandler.read(("network.ip4.dhcp", vni))
    dhcp4_start = zkhandler.read(("network.ip4.dhcp_start", vni))
    dhcp4_end = zkhandler.read(("network.ip4.dhcp_end", vni))
    (
        description,
        nettype,
        mtu,
        domain,
        name_servers,
        ip6_network,
        ip6_gateway,
        dhcp6_flag,
        ip4_network,
        ip4_gateway,
        dhcp4_flag,
        dhcp4_start,
        dhcp4_end,
    ) = zkhandler.read_many(
        [
            ("network", vni),
            ("network.type", vni),
            ("network.mtu", vni),
            ("network.domain", vni),
            ("network.nameservers", vni),
            ("network.ip6.network", vni),
            ("network.ip6.gateway", vni),
            ("network.ip6.dhcp", vni),
            ("network.ip4.network", vni),
            ("network.ip4.gateway", vni),
            ("network.ip4.dhcp", vni),
            ("network.ip4.dhcp_start", vni),
            ("network.ip4.dhcp_end", vni),
        ]
    )

    # Construct a data structure to represent the data
    network_information = {
@@ -818,31 +836,45 @@ def getSRIOVVFInformation(zkhandler, node, vf):
    if not zkhandler.exists(("node.sriov.vf", node, "sriov_vf", vf)):
        return []

    pf = zkhandler.read(("node.sriov.vf", node, "sriov_vf.pf", vf))
    mtu = zkhandler.read(("node.sriov.vf", node, "sriov_vf.mtu", vf))
    mac = zkhandler.read(("node.sriov.vf", node, "sriov_vf.mac", vf))
    vlan_id = zkhandler.read(("node.sriov.vf", node, "sriov_vf.config.vlan_id", vf))
    vlan_qos = zkhandler.read(("node.sriov.vf", node, "sriov_vf.config.vlan_qos", vf))
    tx_rate_min = zkhandler.read(
        ("node.sriov.vf", node, "sriov_vf.config.tx_rate_min", vf)
    (
        pf,
        mtu,
        mac,
        vlan_id,
        vlan_qos,
        tx_rate_min,
        tx_rate_max,
        link_state,
        spoof_check,
        trust,
        query_rss,
        pci_domain,
        pci_bus,
        pci_slot,
        pci_function,
        used,
        used_by_domain,
    ) = zkhandler.read_many(
        [
            ("node.sriov.vf", node, "sriov_vf.pf", vf),
            ("node.sriov.vf", node, "sriov_vf.mtu", vf),
            ("node.sriov.vf", node, "sriov_vf.mac", vf),
            ("node.sriov.vf", node, "sriov_vf.config.vlan_id", vf),
            ("node.sriov.vf", node, "sriov_vf.config.vlan_qos", vf),
            ("node.sriov.vf", node, "sriov_vf.config.tx_rate_min", vf),
            ("node.sriov.vf", node, "sriov_vf.config.tx_rate_max", vf),
            ("node.sriov.vf", node, "sriov_vf.config.link_state", vf),
            ("node.sriov.vf", node, "sriov_vf.config.spoof_check", vf),
            ("node.sriov.vf", node, "sriov_vf.config.trust", vf),
            ("node.sriov.vf", node, "sriov_vf.config.query_rss", vf),
            ("node.sriov.vf", node, "sriov_vf.pci.domain", vf),
            ("node.sriov.vf", node, "sriov_vf.pci.bus", vf),
            ("node.sriov.vf", node, "sriov_vf.pci.slot", vf),
            ("node.sriov.vf", node, "sriov_vf.pci.function", vf),
            ("node.sriov.vf", node, "sriov_vf.used", vf),
            ("node.sriov.vf", node, "sriov_vf.used_by", vf),
        ]
    )
    tx_rate_max = zkhandler.read(
        ("node.sriov.vf", node, "sriov_vf.config.tx_rate_max", vf)
    )
    link_state = zkhandler.read(
        ("node.sriov.vf", node, "sriov_vf.config.link_state", vf)
    )
    spoof_check = zkhandler.read(
        ("node.sriov.vf", node, "sriov_vf.config.spoof_check", vf)
    )
    trust = zkhandler.read(("node.sriov.vf", node, "sriov_vf.config.trust", vf))
    query_rss = zkhandler.read(("node.sriov.vf", node, "sriov_vf.config.query_rss", vf))
    pci_domain = zkhandler.read(("node.sriov.vf", node, "sriov_vf.pci.domain", vf))
    pci_bus = zkhandler.read(("node.sriov.vf", node, "sriov_vf.pci.bus", vf))
    pci_slot = zkhandler.read(("node.sriov.vf", node, "sriov_vf.pci.slot", vf))
    pci_function = zkhandler.read(("node.sriov.vf", node, "sriov_vf.pci.function", vf))
    used = zkhandler.read(("node.sriov.vf", node, "sriov_vf.used", vf))
    used_by_domain = zkhandler.read(("node.sriov.vf", node, "sriov_vf.used_by", vf))

    vf_information = {
        "phy": vf,
@ -26,69 +26,134 @@ import json
|
||||
import daemon_lib.common as common
|
||||
|
||||
|
||||
def getNodeInformation(zkhandler, node_name):
|
||||
"""
|
||||
Gather information about a node from the Zookeeper database and return a dict() containing it.
|
||||
"""
|
||||
node_daemon_state = zkhandler.read(("node.state.daemon", node_name))
|
||||
node_coordinator_state = zkhandler.read(("node.state.router", node_name))
|
||||
node_domain_state = zkhandler.read(("node.state.domain", node_name))
|
||||
node_static_data = zkhandler.read(("node.data.static", node_name)).split()
|
||||
node_pvc_version = zkhandler.read(("node.data.pvc_version", node_name))
|
||||
node_cpu_count = int(node_static_data[0])
|
||||
node_kernel = node_static_data[1]
|
||||
node_os = node_static_data[2]
|
||||
node_arch = node_static_data[3]
|
||||
node_vcpu_allocated = int(zkhandler.read(("node.vcpu.allocated", node_name)))
|
||||
node_mem_total = int(zkhandler.read(("node.memory.total", node_name)))
|
||||
node_mem_allocated = int(zkhandler.read(("node.memory.allocated", node_name)))
|
||||
node_mem_provisioned = int(zkhandler.read(("node.memory.provisioned", node_name)))
|
||||
node_mem_used = int(zkhandler.read(("node.memory.used", node_name)))
|
||||
node_mem_free = int(zkhandler.read(("node.memory.free", node_name)))
|
||||
node_load = float(zkhandler.read(("node.cpu.load", node_name)))
|
||||
node_domains_count = int(
|
||||
zkhandler.read(("node.count.provisioned_domains", node_name))
|
||||
)
|
||||
node_running_domains = zkhandler.read(("node.running_domains", node_name)).split()
|
||||
try:
|
||||
node_health = int(zkhandler.read(("node.monitoring.health", node_name)))
|
||||
except Exception:
|
||||
node_health = "N/A"
|
||||
try:
|
||||
node_health_plugins = zkhandler.read(
|
||||
("node.monitoring.plugins", node_name)
|
||||
).split()
|
||||
except Exception:
|
||||
node_health_plugins = list()
|
||||
|
||||
node_health_details = list()
|
||||
def getNodeHealthDetails(zkhandler, node_name, node_health_plugins):
|
||||
plugin_reads = list()
|
||||
for plugin in node_health_plugins:
|
||||
plugin_last_run = zkhandler.read(
|
||||
("node.monitoring.data", node_name, "monitoring_plugin.last_run", plugin)
|
||||
)
|
||||
plugin_health_delta = zkhandler.read(
|
||||
plugin_reads += [
|
||||
(
|
||||
"node.monitoring.data",
|
||||
node_name,
|
||||
"monitoring_plugin.last_run",
|
||||
plugin,
|
||||
),
|
||||
(
|
||||
"node.monitoring.data",
|
||||
node_name,
|
||||
"monitoring_plugin.health_delta",
|
||||
plugin,
|
||||
)
|
||||
)
|
||||
plugin_message = zkhandler.read(
|
||||
("node.monitoring.data", node_name, "monitoring_plugin.message", plugin)
|
||||
)
|
||||
plugin_data = zkhandler.read(
|
||||
("node.monitoring.data", node_name, "monitoring_plugin.data", plugin)
|
||||
)
|
||||
),
|
||||
(
|
||||
"node.monitoring.data",
|
||||
node_name,
|
||||
"monitoring_plugin.message",
|
||||
plugin,
|
||||
),
|
||||
(
|
||||
"node.monitoring.data",
|
||||
node_name,
|
||||
"monitoring_plugin.data",
|
||||
plugin,
|
||||
),
|
||||
]
|
||||
all_plugin_data = list(zkhandler.read_many(plugin_reads))
|
||||
|
||||
node_health_details = list()
|
||||
for pidx, plugin in enumerate(node_health_plugins):
|
||||
# Split the large list of return values by the IDX of this plugin
|
||||
# Each plugin result is 4 fields long
|
||||
pos_start = pidx * 4
|
||||
pos_end = pidx * 4 + 4
|
||||
(
|
||||
plugin_last_run,
|
||||
plugin_health_delta,
|
||||
plugin_message,
|
||||
plugin_data,
|
||||
) = tuple(all_plugin_data[pos_start:pos_end])
|
||||
plugin_output = {
|
||||
"name": plugin,
|
||||
"last_run": int(plugin_last_run),
|
||||
"last_run": int(plugin_last_run) if plugin_last_run is not None else None,
|
||||
"health_delta": int(plugin_health_delta),
|
||||
"message": plugin_message,
|
||||
"data": json.loads(plugin_data),
|
||||
}
|
||||
node_health_details.append(plugin_output)
|
||||
|
||||
return node_health_details
|
||||
|
||||
|
||||
def getNodeInformation(zkhandler, node_name):
    """
    Gather information about a node from the Zookeeper database and return a dict() containing it.
    """

    (
        node_daemon_state,
        node_coordinator_state,
        node_domain_state,
        node_pvc_version,
        _node_static_data,
        _node_vcpu_allocated,
        _node_mem_total,
        _node_mem_allocated,
        _node_mem_provisioned,
        _node_mem_used,
        _node_mem_free,
        _node_load,
        _node_domains_count,
        _node_running_domains,
        _node_health,
        _node_health_plugins,
    ) = zkhandler.read_many(
        [
            ("node.state.daemon", node_name),
            ("node.state.router", node_name),
            ("node.state.domain", node_name),
            ("node.data.pvc_version", node_name),
            ("node.data.static", node_name),
            ("node.vcpu.allocated", node_name),
            ("node.memory.total", node_name),
            ("node.memory.allocated", node_name),
            ("node.memory.provisioned", node_name),
            ("node.memory.used", node_name),
            ("node.memory.free", node_name),
            ("node.cpu.load", node_name),
            ("node.count.provisioned_domains", node_name),
            ("node.running_domains", node_name),
            ("node.monitoring.health", node_name),
            ("node.monitoring.plugins", node_name),
        ]
    )

    node_static_data = _node_static_data.split()
    node_cpu_count = int(node_static_data[0])
    node_kernel = node_static_data[1]
    node_os = node_static_data[2]
    node_arch = node_static_data[3]

    node_vcpu_allocated = int(_node_vcpu_allocated)
    node_mem_total = int(_node_mem_total)
    node_mem_allocated = int(_node_mem_allocated)
    node_mem_provisioned = int(_node_mem_provisioned)
    node_mem_used = int(_node_mem_used)
    node_mem_free = int(_node_mem_free)
    node_load = float(_node_load)
    node_domains_count = int(_node_domains_count)
    node_running_domains = _node_running_domains.split()

    try:
        node_health = int(_node_health)
    except Exception:
        node_health = "N/A"

    try:
        node_health_plugins = _node_health_plugins.split()
    except Exception:
        node_health_plugins = list()

    node_health_details = getNodeHealthDetails(
        zkhandler, node_name, node_health_plugins
    )

    # Construct a data structure to represent the data
    node_information = {
        "name": node_name,
@@ -269,6 +334,8 @@ def get_list(
):
    node_list = []
    full_node_list = zkhandler.children("base.node")
    if full_node_list is None:
        full_node_list = list()
    full_node_list.sort()

    if is_fuzzy and limit:
@@ -19,6 +19,7 @@
#
###############################################################################

import asyncio
import os
import time
import uuid

@@ -239,10 +240,41 @@ class ZKHandler(object):
                # This path is invalid; this is likely due to missing schema entries, so return None
                return None

            return self.zk_conn.get(path)[0].decode(self.encoding)
            res = self.zk_conn.get(path)
            return res[0].decode(self.encoding)
        except NoNodeError:
            return None

    async def read_async(self, key):
        """
        Read data from a key asynchronously
        """
        try:
            path = self.get_schema_path(key)
            if path is None:
                # This path is invalid; this is likely due to missing schema entries, so return None
                return None

            val = self.zk_conn.get_async(path)
            data = val.get()
            return data[0].decode(self.encoding)
        except NoNodeError:
            return None

    async def _read_many(self, keys):
        """
        Async runner for read_many
        """
        res = await asyncio.gather(*(self.read_async(key) for key in keys))
        return tuple(res)

    def read_many(self, keys):
        """
        Read data from several keys, asynchronously. Returns a tuple of all key values once all
        reads are complete.
        """
        return asyncio.run(self._read_many(keys))

    def write(self, kvpairs):
        """
        Create or update one or more keys' data
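For callers, read_many() turns what used to be N separate blocking reads into a single batched call, and it returns values in exactly the order the keys were passed, which is what lets getNodeHealthDetails() above slice its results per plugin. A minimal usage sketch (the node name hv1 and this particular key selection are illustrative, not from the diff):

    # Hedged sketch: fetch several node attributes in one read_many() call
    keys = [
        ("node.memory.total", "hv1"),
        ("node.memory.used", "hv1"),
        ("node.cpu.load", "hv1"),
    ]
    mem_total, mem_used, cpu_load = zkhandler.read_many(keys)
    # Missing or schema-less keys come back as None, mirroring read()
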
debian/changelog (vendored, 11 changed lines)

@@ -1,3 +1,14 @@
pvc (0.9.86-0) unstable; urgency=high

  * [API Daemon] Significantly improves the performance of several commands via async Zookeeper calls and removal of superfluous backend calls.
  * [Docs] Improves the project README and updates screenshot images to show the current output and more functionality.
  * [API Daemon/CLI] Corrects some bugs in VM metainformation output.
  * [Node Daemon] Fixes resource reporting bugs from 0.9.81 and properly clears node resource numbers on a fence.
  * [Health Daemon] Adds a wait during pvchealthd startup until the node is in run state, to avoid erroneous faults during node bootup.
  * [API Daemon] Fixes an incorrect reference to legacy pvcapid.yaml file in migration script.

 -- Joshua M. Boniface <joshua@boniface.me>  Thu, 14 Dec 2023 14:46:29 -0500

pvc (0.9.85-0) unstable; urgency=high

  * [Packaging] Fixes a dependency bug introduced in 0.9.84
[4 binary images removed: 88 KiB, 41 KiB, 300 KiB, 42 KiB]
@@ -33,7 +33,7 @@ import os
import signal

# Daemon version
version = "0.9.85"
version = "0.9.86"


##########################################################

@@ -80,6 +80,11 @@ def entrypoint():
    # Connect to Zookeeper and return our handler and current schema version
    zkhandler, _ = pvchealthd.util.zookeeper.connect(logger, config)

    logger.out("Waiting for node daemon to be operating", state="s")
    while zkhandler.read(("node.state.daemon", config["node_hostname"])) != "run":
        sleep(5)
    logger.out("Node daemon in run state, continuing health daemon startup", state="s")

    # Define a cleanup function
    def cleanup(failure=False):
        nonlocal logger, zkhandler, monitoring_instance
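The new startup gate simply polls the node daemon's own state key every five seconds until it reports run, so pvchealthd never evaluates health plugins against a node that is still booting. The same pattern extracted as a standalone helper (a minimal sketch; the helper name and default interval are illustrative):

    from time import sleep

    def wait_for_node_run(zkhandler, node_hostname, interval=5):
        # Block until the node daemon on this host reports the "run" state
        while zkhandler.read(("node.state.daemon", node_hostname)) != "run":
            sleep(interval)
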
BIN images/0-integrated-help.png (new file, 100 KiB)
BIN images/1-connection-management.png (new file, 50 KiB)
BIN images/10-provisioner.png (new file, 124 KiB)
BIN images/11-prometheus-grafana.png (new file, 168 KiB)
BIN images/2-cluster-details-and-output-formats.png (new file, 140 KiB)
BIN images/3-node-information.png (new file, 97 KiB)
BIN images/4-vm-information.png (new file, 109 KiB)
BIN images/5-vm-details.png (new file, 136 KiB)
BIN images/6-network-information.png (new file, 118 KiB)
BIN images/7-storage-information.png (new file, 166 KiB)
BIN images/8-vm-and-node-logs.png (new file, 177 KiB)
BIN images/9-vm-and-worker-tasks.png (new file, 67 KiB)
[1 binary image modified: 49 KiB before, 49 KiB after]
@@ -70,7 +70,7 @@ def check_pvc(item, params, section):
    summary = f"Cluster health is {cluster_health}% (maintenance {maintenance})"

    if len(cluster_messages) > 0:
        details = ", ".join(cluster_messages)
        details = ", ".join([m["text"] for m in cluster_messages])

    if cluster_health <= 50 and maintenance == "off":
        state = State.CRIT
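This one-line change accounts for cluster health messages now arriving as structured objects rather than bare strings, so only each entry's text field is joined into the Checkmk detail line. A small illustration (the sample message contents are assumed, not taken from the diff):

    # Hedged sketch: each message is now a dict, so join only its "text" field
    cluster_messages = [
        {"text": "Node hv1 was at 50% health"},
        {"text": "Plugin disk reported a warning"},
    ]
    details = ", ".join([m["text"] for m in cluster_messages])
    # -> 'Node hv1 was at 50% health, Plugin disk reported a warning'
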
@@ -2555,7 +2555,9 @@
    ],
    "refresh": "5s",
    "schemaVersion": 38,
    "tags": [],
    "tags": [
        "pvc"
    ],
    "templating": {
        "list": [
            {

@@ -2592,6 +2594,6 @@
    "timezone": "",
    "title": "PVC Cluster",
    "uid": "fbddd9f9-aadb-4c97-8aea-57c29e5de234",
    "version": 56,
    "version": 57,
    "weekStart": ""
}
@@ -48,7 +48,7 @@ import re
import json

# Daemon version
version = "0.9.85"
version = "0.9.86"


##########################################################
@@ -115,6 +115,27 @@ def fence_node(node_name, zkhandler, config, logger):
    ):
        migrateFromFencedNode(zkhandler, node_name, config, logger)

    # Reset all node resource values
    logger.out(
        f"Resetting all resource values for dead node {node_name} to zero",
        state="i",
        prefix=f"fencing {node_name}",
    )
    zkhandler.write(
        [
            (("node.running_domains", node_name), "0"),
            (("node.count.provisioned_domains", node_name), "0"),
            (("node.cpu.load", node_name), "0"),
            (("node.vcpu.allocated", node_name), "0"),
            (("node.memory.total", node_name), "0"),
            (("node.memory.used", node_name), "0"),
            (("node.memory.free", node_name), "0"),
            (("node.memory.allocated", node_name), "0"),
            (("node.memory.provisioned", node_name), "0"),
            (("node.monitoring.health", node_name), None),
        ]
    )


# Migrate hosts away from a fenced node
def migrateFromFencedNode(zkhandler, node_name, config, logger):
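The reset uses zkhandler.write()'s batched form, a single list of ((schema_key, node_name), value) pairs, so all of a fenced node's counters clear together. Note that node.monitoring.health is set to None rather than "0": per the getNodeInformation() logic above, a non-integer health value surfaces as "N/A" rather than a misleadingly perfect score. The same pattern in miniature (the node name hv1 is illustrative):

    # Hedged sketch: batch-clear a fenced node's counters in one write
    zkhandler.write(
        [
            (("node.cpu.load", "hv1"), "0"),
            (("node.monitoring.health", "hv1"), None),  # displays as "N/A"
        ]
    )
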
@@ -477,6 +477,10 @@ def collect_vm_stats(logger, config, zkhandler, this_node, queue):
    fixed_d_domain = this_node.d_domain.copy()
    for domain, instance in fixed_d_domain.items():
        if domain in this_node.domain_list:
            # Add the allocated memory to our memalloc value
            memalloc += instance.getmemory()
            memprov += instance.getmemory()
            vcpualloc += instance.getvcpus()
        if instance.getstate() == "start" and instance.getnode() == this_node.name:
            if instance.getdom() is not None:
                try:

@@ -532,11 +536,6 @@ def collect_vm_stats(logger, config, zkhandler, this_node, queue):
                    continue
                domain_memory_stats = domain.memoryStats()
                domain_cpu_stats = domain.getCPUStats(True)[0]

                # Add the allocated memory to our memalloc value
                memalloc += instance.getmemory()
                memprov += instance.getmemory()
                vcpualloc += instance.getvcpus()
            except Exception as e:
                if debug:
                    try:

@@ -701,7 +700,7 @@ def node_keepalive(logger, config, zkhandler, this_node):

    runtime_start = datetime.now()
    logger.out(
        "Starting node keepalive run at {datetime.now()}",
        f"Starting node keepalive run at {datetime.now()}",
        state="t",
    )
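Two fixes land here. First, the per-VM resource accounting moves out of the libvirt statistics block, so allocated memory and vCPUs are counted for every domain assigned to the node even when its libvirt object cannot be queried, addressing the resource-reporting regression from 0.9.81. Second, the keepalive log call gains the f prefix it was missing; without it, Python prints the literal braces instead of the timestamp:

    from datetime import datetime

    print("Starting node keepalive run at {datetime.now()}")
    # -> Starting node keepalive run at {datetime.now()}
    print(f"Starting node keepalive run at {datetime.now()}")
    # -> Starting node keepalive run at 2023-12-14 14:46:29.000000 (for example)
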
@@ -44,7 +44,7 @@ from daemon_lib.vmbuilder import (
)

# Daemon version
version = "0.9.85"
version = "0.9.86"


config = cfg.get_configuration()