Compare commits

...

225 Commits

Author SHA1 Message Date
71e4d0b32a Bump version to 0.9.28 2021-07-19 09:29:34 -04:00
f16bad4691 Revamp confirmation options for vm modify
Before, "-y"/"--yes" only confirmed the reboot portion. Instead, modify
this to confirm both the diff portion and the restart portion, and add
separate flags to bypass one or the other independently, ensuring the
administrator has lots of flexibility. UNSAFE mode implies "-y" so both
would be auto-confirmed if that option is set.
2021-07-19 00:25:43 -04:00
15d92c483f Bump version to 0.9.27 2021-07-19 00:03:40 -04:00
7dd17e71e7 Fix bug with VM editing with file
Current config is needed for the diff but it was in a conditional.
2021-07-19 00:02:19 -04:00
5be968123f Readd 1 second queue get timeout
Otherwise daemon stops will sometimes inexplicably block.
2021-07-18 22:17:57 -04:00
99fd7ebe63 Fix excessive CPU due to looping 2021-07-18 22:06:50 -04:00
cffc96d156 Fix failure in creating base keys 2021-07-18 21:00:23 -04:00
602093029c Bump version to 0.9.26 2021-07-18 20:49:52 -04:00
bd7a773d6b Add node log following functionality 2021-07-18 20:37:53 -04:00
8d671b3422 Add some tag tests to test-cluster.sh 2021-07-18 20:37:37 -04:00
2358ad6bbe Reduce the number of lines per call
500 was a lot every half second; 200 seems more reasonable. Even a fast
kernel boot should generate < 200 lines in half a second.
2021-07-18 20:23:45 -04:00
a0e9b57d39 Increase log line frequency 2021-07-18 20:19:59 -04:00
2d48127e9c Use even better/faster set comparison 2021-07-18 20:18:35 -04:00
55f2b00366 Add some spaces for better readability 2021-07-18 20:18:23 -04:00
ba257048ad Improve output formatting of node logs 2021-07-18 20:06:08 -04:00
b770e15a91 Fix final termination of logger
We need to do a bit more finagling with the logger on termination to
ensure that all messages are written and the queue drained before
actually terminating.
2021-07-18 19:53:00 -04:00
e23a65128a Remove del of logger item 2021-07-18 19:03:47 -04:00
982dfd52c6 Adjust date output format 2021-07-18 19:00:54 -04:00
3a2478ee0c Cleanly terminate logger on cleanup 2021-07-18 18:57:44 -04:00
a088aa4484 Add node log functions to API and CLI 2021-07-18 18:54:28 -04:00
323c7c41ae Implement node logging into Zookeeper
Adds the ability to send node daemon logs to Zookeeper to facilitate a
command like "pvc node log", similar to "pvc vm log". Each node stores
its logs in a separate tree under "/logs" which can then be combined or
queried. By default, set by config, only 2000 lines are kept.
2021-07-18 17:11:43 -04:00
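
A minimal sketch of the idea (using kazoo directly; the key path, node name, and line format here are illustrative, not the actual PVC schema):

```python
from collections import deque

from kazoo.client import KazooClient

MAX_LINES = 2000  # default retention, set by config

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

node = "hv1"                                 # placeholder node name
log_key = "/logs/{}/messages".format(node)   # placeholder key layout
zk.ensure_path(log_key)

# Keep a ring buffer of the most recent lines and write it as one value,
# so readers (e.g. "pvc node log") always see at most MAX_LINES entries.
recent = deque(maxlen=MAX_LINES)

def emit(line):
    recent.append(line)
    zk.set(log_key, "\n".join(recent).encode())

emit("2021-07-18 17:11:43 -0400 pvcnoded: keepalive OK")
zk.stop()
```
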
cd1db3d587 Ensure node name is part of config 2021-07-18 16:38:58 -04:00
401f102344 Add serial BIOS to default libvirt schema 2021-07-15 10:45:14 -04:00
4ac020888b Add some tag tests to test-cluster.sh 2021-07-14 15:02:03 -04:00
8f3b68d48a Mention multiple option for tags in VM define 2021-07-14 01:12:10 -04:00
6d4c26c8d8 Don't show tag line in info if no tags 2021-07-14 00:59:24 -04:00
75fb60b1b4 Add VM list filtering by tag
Uses same method as state or node filtering, rather than altering how
the main LIMIT field works.
2021-07-14 00:59:20 -04:00
9ea9ac3b8a Revamp tag handling and display
Add an additional protected class, limit manipulation to one at a time,
and ensure future flexibility. Also makes display consistent with other
VM elements.
2021-07-13 22:39:52 -04:00
27f1758791 Add tags manipulation to API
Also fixes some checks for Metadata too since these two actions are
almost identical, and adds tags to define endpoint.
2021-07-13 19:05:33 -04:00
c0a3467b70 Simplify VM metadata reads
Directly call the new common getDomainMetadata function to avoid
excessive Zookeeper calls for this information.
2021-07-13 19:05:33 -04:00
9a199992a1 Add functions for manipulating VM tags
Adds tags to schema (v3), to VM definition, adds function to modify
tags, adds function to get tags, and adds tags to VM data output.

Tags will enable more granular classification of VMs based either on
administrator configuration or from automated system events.
2021-07-13 19:05:33 -04:00
c6d552ae57 Rework success checks for IPMI fencing
Previously, if the node failed to restart, it was declared a "bad fence"
and no further action would be taken. However, there are some
situations, for instance critical hardware failures, where intelligent
systems will not attempt (or succeed at) starting up the node in such a
case, which would result in dead, known-offline nodes without recovery.

Tweak this behaviour somewhat. The main path of Reboot -> Check On ->
Success + fence-flush is retained, but some additional side-paths are
now defined:

1. We attempt to power "on" the chassis 1 second after the reboot, just
in case it is off and can be recovered. We then wait another 2 seconds
and check the power status (as we did before).

2. If the reboot succeeded, follow this series of choices:

    a. If the chassis is on, the fence succeeded.

    b. If the chassis is off, the fence "succeeded" as well.

    c. If the chassis is in some other state, the fence failed.

3. If the reboot failed, follow this series of choices:

    a. If the chassis is off, the fence itself failed, but we can treat
    it as "succeeded"" since the chassis is in a known-offline state.
    This is the most likely situation when there is a critical hardware
    failure, and the server's IPMI does not allow itself to start back
    up again.

    b. If the chassis is in any other state ("on" or unknown), the fence
    itself failed and we must treat this as a fence failure.

Overall, this should alleviate the aforementioned issue of a critical
failure rendering the node persistently "off" not triggering a
fence-flush and ensure fencing is more robust.
2021-07-13 17:54:41 -04:00
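
A minimal sketch of the decision flow described above (illustrative only, not the actual fencing code):

```python
def fence_succeeded(reboot_ok, chassis_state):
    """Interpret an IPMI fence attempt per the decision tree above."""
    if reboot_ok:
        # 2a/2b: chassis "on" or "off" both count as a successful fence;
        # 2c: any other state is a failure.
        return chassis_state in ("on", "off")
    # 3a: reboot failed but the chassis is off: a known-offline state, so
    # treat it as success and allow the fence-flush to proceed.
    # 3b: any other state ("on" or unknown) is a true fence failure.
    return chassis_state == "off"
```
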
2e9f6ac201 Bump version to 0.9.25 2021-07-11 23:19:09 -04:00
f09849bedf Don't overwrite shutdown state on termination
Just a minor quibble and not really impactful.
2021-07-11 23:18:14 -04:00
8c975e5c46 Add chroot context manager example to debootstrap
Closes #132
2021-07-11 23:10:41 -04:00
c76149141f Only log ZK connections when persistent
Prevents spam in the API logs.
2021-07-10 23:35:49 -04:00
f00c4d07f4 Add date output to keepalive
Helps track when there is a log follow in "-o cat" mode.
2021-07-10 23:24:59 -04:00
20b66c10e1 Move two more commands to Rados library 2021-07-10 17:28:42 -04:00
cfeba50b17 Revert "Return to all command-based Ceph gathering"
This reverts commit 65d14ccd92.

This was actually a bad idea. For inexplicable reasons, running these
Ceph commands manually (not even via Python, but in a normal shell)
takes two to three orders of magnitude longer than running them with the
Rados module, so long in fact that some basic commands like "ceph
health" would sometimes take longer than the 1 second timeout to
complete. The Rados commands would however take about 1ms instead.

Despite the occasional issues when monitors drop out, the Rados module
is clearly far superior to the shell commands for any moderately-loaded
Ceph cluster. We can look into solving timeouts another way (perhaps
with Processes instead of Threads) at a later time.

Rados module "ceph health":
    b'{"checks":{},"status":"HEALTH_OK"}'
    0.001204 (s)
    b'{"checks":{},"status":"HEALTH_OK"}'
    0.001258 (s)
Command "ceph health":
    joshua@hv1.c.bonilan.net ~ $ time ceph health >/dev/null
    real    0m0.772s
    user    0m0.707s
    sys     0m0.046s
    joshua@hv1.c.bonilan.net ~ $ time ceph health >/dev/null
    real    0m0.796s
    user    0m0.728s
    sys     0m0.054s
2021-07-10 03:47:45 -04:00
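
For reference, a hedged example of the Rados-module call being compared here (python3-rados; the conffile path and timeout are illustrative):

```python
import json
import time

import rados  # python3-rados, shipped with Ceph

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()

# Equivalent of "ceph health" via the monitor command interface.
cmd = json.dumps({"prefix": "health", "format": "json"})
start = time.time()
retcode, output, status = cluster.mon_command(cmd, b"", timeout=1)
print(output)
print("{:.6f} (s)".format(time.time() - start))

cluster.shutdown()
```
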
0699c48d10 Fix bad schema path name 2021-07-09 16:47:09 -04:00
551bae2518 Bump version to 0.9.24 2021-07-09 15:58:36 -04:00
4832245d9c Handle non-RBD disks and non-RBD errors better 2021-07-09 15:48:57 -04:00
2138f2f59f Fail VM removal on disk removal failures
Prevents bad states where the VM is "removed" but some of its disks
remain due to e.g. stuck watchers.

Rearrange the sequence so it goes stop, delete disks, then delete VM,
and then return a failure if any of the disk(s) fail to remove, allowing
the task to be rerun after fixing the problem.
2021-07-09 15:39:06 -04:00
d1d355a96b Avoid errors if stats data is None 2021-07-09 13:13:54 -04:00
2b5dc286ab Correct failure to get ceph_health data 2021-07-09 13:10:28 -04:00
c0c9327a7d Return an empty log if the value is None 2021-07-09 13:08:00 -04:00
5ffabcfef5 Avoid failing if we can't get the future data 2021-07-09 13:05:37 -04:00
330cf14638 Remove return statements in keepalive collectors
These seem to bork the keepalive timer process, so just remove them and
let it continue to press on.
2021-07-09 13:04:17 -04:00
9d0eb20197 Mention UUID matching in vm list help 2021-07-09 11:51:20 -04:00
3f5b7045a2 Allow raw listing of cluster names in CLI 2021-07-09 10:53:20 -04:00
80fe96b24d Add some additional docstrings 2021-07-07 12:28:08 -04:00
80f04ce8ee Remove connection renewal in state handler
Regenerating the ZK connection was fraught with issues, including
duplicate connections, strange failures to reconnect, and various other
wonkiness.

Instead let Kazoo handle states sensibly. Kazoo moves to SUSPENDED state
when it loses connectivity, and stays there indefinitely (based on
cursory tests). And Kazoo seems to always resume from this just fine on
its own. Thus all that hackery did nothing but complicate reconnection.

This therefore turns the listener into a purely informational function,
providing logs of when/why it failed, and we also add some additional
output messages during initial connection and final disconnection.
2021-07-07 11:55:12 -04:00
65d14ccd92 Return to all command-based Ceph gathering
Using the Rados module was very problematic, specifically because it had
no sensible timeout parameters and thus would hang for many seconds.
This has poor implications since it blocks further keepalives.

Instead, remove the Rados usage entirely and go back completely to using
manual OS commands to gather this information. While this may cause PID
exhaustion more quickly it's worthwhile to avoid failure scenarios when
Ceph stats time out.

Closes #137
2021-07-06 11:30:45 -04:00
adc022f55d Add missing install of pvcapid-worker.sh 2021-07-06 09:40:42 -04:00
7082982a33 Bump version to 0.9.23 2021-07-05 23:40:32 -04:00
5b6ef71909 Ensure daemon mode is updated on startup
Fixes the side effect of the previous bug during deploys of 0.9.22.
2021-07-05 23:39:23 -04:00
a8c28786dd Better handle empty ipaths in schema
When trying to write to sub-item paths that don't yet exist, the
previous method would just blindly write to whatever the root key is,
which is never what we actually want.

Instead, check explicitly for a "base path" situation, and handle that.
Then, if we try to get a subpath that isn't valid, return None. Finally
in the various functions, if the path is None, just continue (or return
false/None) and (try to) chug along.
2021-07-05 23:35:03 -04:00
be7b0be8ed Fix typo in schema path name 2021-07-05 23:23:23 -04:00
c45804e8c1 Revert "Return none if a schema path is not found"
This reverts commit b1fcf6a4a5.
2021-07-05 23:16:39 -04:00
b1fcf6a4a5 Return none if a schema path is not found
This can cause overwriting of unintended keys, so should not be
happening. Will have to find the bugs this causes.
2021-07-05 17:15:55 -04:00
47f39a1a2a Fix ordering issue in test-cluster script 2021-07-05 15:14:34 -04:00
54f82a3ea0 Fix bug in VM network list with SR-IOV 2021-07-05 15:14:01 -04:00
37cd278bc2 Bump version to 0.9.22 2021-07-05 14:18:51 -04:00
47a522f8af Use manual zkhandler creation in Benchmark job
Like the other Celery job this does not work properly with the
ZKConnection decorator due to conflicting "self", so just connect
manually exactly like the provisioner task does.
2021-07-05 14:12:56 -04:00
087c23859c Adjust layout of Provisioner lists output
Use the same header format as the others.
2021-07-05 14:06:22 -04:00
6c21a52714 Adjust layout of Ceph/storage lists output
Use the same header format as node, VM, and network lists.
2021-07-05 12:57:18 -04:00
afde436cd0 Adjust layout of Network lists output
Use the same header format as node and VM lists.
2021-07-05 11:48:39 -04:00
1fe71969ca Adjust layout of VM list output
Matches the new node list output format with the additional header line,
as well as revamps some other aspects:

    1. Adjusts the UUID to be under the name in the info output.
    2. Removes the UUID from the list output to save space, because this
       is generally not needed in day-to-day quick-list output.
    3. Renames the "Node" header to "Current" to better reflect what
       that column actually means and avoid conflicting with the parent
       header.
2021-07-05 10:52:48 -04:00
2b04df22a6 Add PVC version to node information output
Also adjusts the layout of the node list output to avoid excessively
long lines. Adds another header line with categories and spacing dashes
for easier visual parsing.
2021-07-05 10:45:20 -04:00
a69105569f Add node PVC version data to Node information
Allows API client to see the currently-active version of the node
daemon.
2021-07-05 09:57:38 -04:00
21a1a7da9e Fix bad schema reference
Not sure how this didn't cause an issue until now, but the wrong key
path was used and this was getting unexpected data with the newly-added
version string instead of the proper mode string.
2021-07-05 09:53:51 -04:00
e44f3d623e Remove unnecessary try/except blocks from VM reads
The zkhandler read() function takes care of ensuring there is a None
value returned if these fail, so these aren't required. Makes the code a
fair bit more readable here.
2021-07-02 12:01:58 -04:00
f0fd3d3f0e Make extra sure VMs terminate when told
When doing a stop_vm or terminate_vm, check again after 0.2 seconds
and try re-terminating if it's still running. Covers cases where a VM
doesn't stop if given the 'stop' state.
2021-07-02 11:40:34 -04:00
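
A minimal libvirt sketch of that re-check (the connection URI and VM name are placeholders):

```python
import time

import libvirt  # python3-libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("testvm")  # placeholder VM name

dom.destroy()            # hard-stop the VM
time.sleep(0.2)
if dom.isActive():       # still running? terminate it again
    dom.destroy()
```
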
f12de6727d Adjust logo slightly and add debug state 2021-07-02 02:32:08 -04:00
e94f5354e6 Update startup messages with new ASCII logo 2021-07-02 02:21:30 -04:00
c51023ba81 Add profiler to keepalive function 2021-07-02 01:55:15 -04:00
61465ef38f Add profiler to several other functions in API 2021-07-02 01:53:19 -04:00
64c6b04352 Ensure all edited files are restored 2021-07-02 01:50:25 -04:00
20542c3653 Add profiler to cluster status function 2021-07-01 17:35:29 -04:00
00b503824e Set unstable version in API and CLI too 2021-07-01 17:35:11 -04:00
43009486ae Move Ceph pool/volume list assembly to thread pool
Same reasons as the VM list, though less impactful.
2021-07-01 17:33:13 -04:00
58789f1db4 Move VM list assembly to thread pool
This helps parallelize the numerous Zookeeper calls a little bit, at
least within the bounds of the GIL, to improve performance when getting
a large list of VMs. The max_workers value is capped at 32 to avoid
causing too many threads during concurrent executions, but still
provides a noticeable speedup (on the order of 0.2-0.4 seconds with 75
VMs, scaling up further as counts grow).
2021-07-01 17:32:47 -04:00
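
A sketch of the thread-pool approach, assuming a hypothetical per-VM helper that performs the Zookeeper reads:

```python
from concurrent.futures import ThreadPoolExecutor

def get_vm_information(zkhandler, vm_uuid):
    # Hypothetical per-VM helper: several Zookeeper reads (key path illustrative).
    return {"uuid": vm_uuid, "state": zkhandler.read(f"/domains/{vm_uuid}/state")}

def get_vm_list(zkhandler, vm_uuids):
    # Cap the worker count so concurrent API requests don't spawn unbounded threads.
    max_workers = min(32, len(vm_uuids) or 1)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(get_vm_information, zkhandler, uuid) for uuid in vm_uuids]
        return [future.result() for future in futures]
```
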
baf4c3fbc7 Add performance profiler function
Usable anywhere that the global daemon "config" parameter can be passed
in (e.g. pvcapid/helper.py, pvcnoded/Daemon.py, etc.). Stores results in
a subdirectory of the PVC logdir called "profiler" if this directory can
be created, or prints results.

The debug config parameter ensures that the profiler can be added to
functions and not run unless the server is explicitly in debug mode.
Might not be useful as I don't initially plan to add this to every
function (only when investigating performance problems), but this
flexibility allows that to change later.
2021-07-01 14:01:33 -04:00
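
A minimal sketch of such a profiler decorator using cProfile (the config keys and output layout here are assumptions, not the actual helper):

```python
import cProfile
import pstats
from functools import wraps
from pathlib import Path

def profiler(config):
    """Profile the wrapped function, but only when the daemon is in debug mode."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if not config.get("debug", False):  # assumed config key
                return func(*args, **kwargs)
            profile = cProfile.Profile()
            result = profile.runcall(func, *args, **kwargs)
            outdir = Path(config.get("log_directory", "/tmp")) / "profiler"  # assumed key
            try:
                outdir.mkdir(parents=True, exist_ok=True)
                profile.dump_stats(outdir / f"{func.__name__}.prof")
            except OSError:
                # Fall back to printing results if the directory can't be created.
                pstats.Stats(profile).sort_stats("cumulative").print_stats(20)
            return result
        return wrapper
    return decorator
```
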
e093efceb1 Add NoNodeError handlers in ZK locks
Instead of looping 5+ times acquiring an impossible lock on a
nonexistent key, just fail on a different error and return failure
immediately.

This is likely a major corner case that shouldn't happen, but better to
be safe than 500.
2021-07-01 01:17:38 -04:00
a080598781 Avoid superfluous ZK exists calls
These cause a major (2x) slowdown in read calls since Zookeeper
connections are expensive/slow. Instead, just try the thing and return
None if there's no key there.

Also wrap the children command in similar error handling since that did
not exist and could likely cause some bugs at some point.
2021-07-01 01:15:51 -04:00
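
A sketch of that pattern with kazoo, where the read itself handles the missing key instead of a separate exists() round trip:

```python
from kazoo.exceptions import NoNodeError

def read(zk_conn, key):
    # One round trip: just try the read and map "no such key" to None.
    try:
        data, _stat = zk_conn.get(key)
        return data.decode() if data is not None else ""
    except NoNodeError:
        return None

def children(zk_conn, key):
    # Wrap get_children the same way so a missing key can't raise.
    try:
        return zk_conn.get_children(key)
    except NoNodeError:
        return None
```
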
39e82ee426 Cast base schema version to int
Or all our comparisons will fail later and nodes can't start.
2021-06-30 09:40:33 -04:00
fe0a1d582a Bump version to 0.9.21 2021-06-29 19:21:31 -04:00
64feabf7c0 Fix adds in bump-version 2021-06-29 19:20:13 -04:00
cc841898b4 Pad dimensions of logos slightly 2021-06-29 19:16:35 -04:00
de5599b800 Revert "Try dark image instead"
This reverts commit 87f963df4c.
2021-06-29 19:11:38 -04:00
87f963df4c Try dark image instead 2021-06-29 19:09:39 -04:00
de23c8c57e Add background colours to logos 2021-06-29 19:08:10 -04:00
c62a0a6c6a Revamp introduction text 2021-06-29 18:47:01 -04:00
12212177ef Update to new logo 2021-06-29 18:41:36 -04:00
6adaf1f669 Fix incorrect handling of deletions in init 2021-06-29 18:41:02 -04:00
b05c93e260 Fix bad return from initialize call 2021-06-29 18:31:56 -04:00
ffdd6bf3f8 Fix typo in command argument 2021-06-29 18:22:39 -04:00
aae9ae2e80 Fix incorrect handling of overwrite flag 2021-06-29 18:22:01 -04:00
f91c07fdcf Re-add UUID limit matching for full UUIDs
This *was* valuable when passing a full UUID in, so go back to that.
Verify first that the limit string is an actual UUID, and then compare
against it if applicable.
2021-06-28 12:27:43 -04:00
4e2a1c3e52 Add worker wrapper to fix Deb incompatibility
Celery 5.x introduced a new worker argument format that is not
backwards-compatible with the older Celery 4.x format. This created a
conundrum since we use one service unit for both Debian 10 (4.x) and
Debian 11 (5.x). Instead of worse hacks, create a wrapper script to
start the worker with the correct arguments instead.
2021-06-28 12:19:29 -04:00
dbfa339cfb Ensure postinst and prerm always succeed 2021-06-23 20:35:40 -04:00
c54f66efa8 Limit match only on VM name
I can see no possible reason to want to limit against UUIDs, but
supporting that means matching is not what one would expect, since a
random UUID could match the limit. So only limit based on the name.
2021-06-23 19:17:35 -04:00
cd860bae6b Optimize VM list in API
With many VMs this slows down linearly. Rework it a bit so there are
fewer calls to getInformationFromXML and so the processing could happen
in parallel at some point.
2021-06-23 19:14:26 -04:00
bbb132414c Restore shebang and don't do store if completion 2021-06-23 05:26:50 -04:00
04fa63f081 Only hit the network endpoint once
Otherwise this is hit for every VM which gets very slow very fast.
2021-06-23 05:15:48 -04:00
f248d579df Convert pvc-client-cli into a proper Python module
Also fixes up the Debian packaging such that this works how I would
want, with proper module installation while leaving everything else
untouched. Finally implements automatic installation and removal of the
BASH completion for the PVC command.
2021-06-23 05:03:19 -04:00
f0db631947 Ignore root-level venv for testing 2021-06-23 02:15:49 -04:00
e91a597591 Merge branch 'sriov'
Implement SR-IOV support in PVC.

Closes #130
2021-06-23 00:58:44 -04:00
8d21da9041 Add some additional interaction tests 2021-06-22 22:08:51 -04:00
1ae34c1960 Fix bad messages in volume remove 2021-06-22 04:31:02 -04:00
75f2560217 Add documentation on SR-IOV client networks 2021-06-22 04:20:38 -04:00
7d2b7441c2 Mention SR-IOV in the Daemon and Ansible manuals 2021-06-22 03:55:19 -04:00
5ec198bf98 Update API doc with remaining items 2021-06-22 03:47:27 -04:00
e6b26745ce Adjust some help messages in pvc.py 2021-06-22 03:40:21 -04:00
3490ecbb59 Remove explicit ZK address from Patronictl command 2021-06-22 03:31:06 -04:00
2928d695c9 Ensure migration method is updated on state changes 2021-06-22 03:20:15 -04:00
7d2a3b5361 Ensure Macvtap NICs can use a model
Defaults to virtio like a bridged NIC. Otherwise performance is abysmal.
2021-06-22 02:38:16 -04:00
07dbd55f03 Use list comprehension to compare against source 2021-06-22 02:31:14 -04:00
26dd24e3f5 Ensure MTU is set on VF when starting up 2021-06-22 02:26:14 -04:00
6cd0ccf0ad Fix network check on VM config modification 2021-06-22 02:21:55 -04:00
1787a970ab Fix bug in address check format string 2021-06-22 02:21:32 -04:00
e623909a43 Store PHY MAC for VFs and restore after free 2021-06-22 00:56:47 -04:00
60e1da09dd Don't try any shenanigans when updating NICs
Trying to do this on the VMInstance side had problems because we can't
differentiate the 3 types of migration there. So, just update this in
the API side and hope everything goes well.

This introduces an edge bug: if a VM is using a macvtap SR-IOV device,
and then tries to migrate, and the migrate is aborted, the NIC lists
will be inconsistent.

When I revamp the VMInstance in the future, I should be able to correct
this, but for now we'll have to live with that edgecase.
2021-06-22 00:00:50 -04:00
dc560c1dcb Better handle retcodes in migrate update 2021-06-21 23:46:47 -04:00
68c7481aa2 Ensure offline migrations update SR-IOV NIC states 2021-06-21 23:35:52 -04:00
7d42fba373 Ensure being in migrate doesn't abort shutdown 2021-06-21 23:28:53 -04:00
b532bc9104 Add missing managed flag for hostdev 2021-06-21 23:22:36 -04:00
24ce361a04 Ensure SR-IOV NIC states are updated on migration 2021-06-21 23:18:34 -04:00
eeb83da97d Add support for SR-IOV NICs to VMs 2021-06-21 23:18:22 -04:00
93c2fdec93 Swap order of networks and disks in provisioner
Done to make the resulting config match the expectations when using "vm
network add", which is that networks are below disks, not above.

Not a functional change, just ensures the VM XML is consistent after
many changes.
2021-06-21 21:59:57 -04:00
904337b677 Fix busted changelog from previous commit 2021-06-21 21:19:06 -04:00
64d1a37b3c Add PCIe device paths to SR-IOV VF information
This will be used when adding VM network interfaces of type hostdev.
2021-06-21 21:08:46 -04:00
13cc0f986f Implement SR-IOV VF config set
Also fixes some random bugs, adds proper interface sorting, and assorted
tweaks.
2021-06-21 18:40:11 -04:00
e13baf8bd3 Add initial SR-IOV list/info to CLI 2021-06-21 17:12:53 -04:00
ae480d6cc1 Add SR-IOV listing/info endpoints to API 2021-06-21 17:12:45 -04:00
33195c3c29 Ensure VF list is sorted 2021-06-21 17:11:48 -04:00
a697c2db2e Add SRIOV PF and VF listing to API 2021-06-21 01:42:55 -04:00
ca11dbf491 Sort the list of VFs for easier parsing 2021-06-21 01:40:05 -04:00
e8bd1bf2c4 Ensure used/used_by are set on creation 2021-06-21 01:25:38 -04:00
bff6d71e18 Add logging to SRIOVVFInstance and fix bug 2021-06-17 02:02:41 -04:00
57b041dc62 Ensure default for vLAN and QOS is 0 not empty 2021-06-17 01:54:37 -04:00
509afd4d05 Add hostdev net_type to handler as well 2021-06-17 01:52:58 -04:00
5607a6bb62 Avoid overwriting VF data
Ensures that the configuration of a VF is not overwritten in Zookeeper
on a node restart. The SRIOVVFInstance handlers were modified to start
with None values, so that the DataWatch statements will always trigger
updates to the live system interfaces on daemon startup, thus ensuring
that the config stored in Zookeeper is applied to the system on startup
(mostly relevant after a cold boot or if the API changes them during a
daemon restart).
2021-06-17 01:45:22 -04:00
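
A sketch of the kazoo DataWatch pattern this describes; starting the cached value at None means the first callback always re-applies the stored config to the live interface (paths and device names are hypothetical):

```python
import subprocess

from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

# Hypothetical PF/VF and key path; the real schema lives under the node's tree.
pf, vfid = "ens4f0", "3"
vlan_path = "/nodes/hv1/sriov/vf/{}v{}/config/vlan_id".format(pf, vfid)

last_vlan = None  # starting from None guarantees the first callback applies the config

@zk.DataWatch(vlan_path)
def apply_vlan(data, stat):
    global last_vlan
    new_vlan = data.decode() if data is not None else None
    if new_vlan is not None and new_vlan != last_vlan:
        last_vlan = new_vlan
        # Re-apply the stored config to the live interface on every change
        # (and once at startup), mirroring the behaviour described above.
        subprocess.run(["ip", "link", "set", pf, "vf", vfid, "vlan", new_vlan], check=False)
```
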
8f1af2a642 Ignore hostdev interfaces in VM net stat gathering
Prevents errors if a SR-IOV hostdev interface is configured until this
is more defined.
2021-06-17 01:33:11 -04:00
e7b6a3eac1 Implement SR-IOV PF and VF instances
Adds support for the node daemon managing SR-IOV PF and VF instances.

PFs are added to Zookeeper automatically based on the config at startup
during network configuration, and are otherwise completely static. PFs
are automatically removed from Zookeeper, along with all corresponding
VFs, should the PF phy device be removed from the configuration.

VFs are configured based on the (autocreated) VFs of each PF device,
added to Zookeeper, and then a new class instance, SRIOVVFInstance, is
used to watch them for configuration changes. This will enable the
runtime management of VF settings by the API. The set of keys ensures
that both configuration and details of the NIC can be tracked.

Most keys are self-explanatory, especially for PFs and the basic keys
for VFs. The configuration tree is also self-explanatory, being based
entirely on the options available in the `ip link set {dev} vf` command.

Two additional keys are also present: `used` and `used_by`, which will
be able to track the (boolean) state of usage, as well as the VM that
uses a given VIF. Since the VM side implementation will support both
macvtap and direct "hostdev" assignments, this will ensure that this
state can be tracked on both the VF and the VM side.
2021-06-17 01:33:03 -04:00
0ad6d55dff Add initial SR-IOV support to node daemon
Adds configuration values for enabled flag and SR-IOV devices to the
configuration and sets up the initial SR-IOV configuration on daemon
startup (inserting the module, configuring the VF count, etc.).
2021-06-15 22:56:09 -04:00
eada5db5e4 Add diagram and info about invalid georedundancy 2021-06-15 10:20:42 -04:00
164becd3ef Fix info and list matching 2021-06-15 02:32:34 -04:00
e4a65230a1 Just do the shutdown command itself 2021-06-15 02:32:14 -04:00
da48304d4a Avoid hackery in VNI list and support direct type 2021-06-15 00:31:13 -04:00
f540dd320b Allow VNI for "direct" type vNICs 2021-06-15 00:27:01 -04:00
284c581845 Ensure shutdown migrations actually time out
Without this a VM that fails to respond to a shutdown will just spin
forever, blocking state changes.
2021-06-15 00:23:15 -04:00
7b85d5e3f3 Stop VM before removing 2021-06-14 21:44:17 -04:00
23318524b9 Ensure validate writes a valid schema version 2021-06-14 21:27:37 -04:00
5f11b3198b Fix base schema None issue in handler too 2021-06-14 21:13:40 -04:00
953e46055a Fix issue with loading None version schema 2021-06-14 21:09:55 -04:00
96f1d7df83 Fix bad quote 2021-06-14 20:36:28 -04:00
d2bcfe5cf7 Bump version to 0.9.20 2021-06-14 18:06:27 -04:00
ef1701b4c8 Handle an additional exception case 2021-06-14 17:15:40 -04:00
08dc756549 Actually disable the pvcapid service
Prevents it from trying to start itself during updates or reboots on
non-primary coordinators.
2021-06-14 17:13:22 -04:00
0a9c0c1ccb Use a nicer reload method on hot schema update
Instead of exiting and trusting systemd to restart us, instead leverage
the os.execv() call to reload the process in the current PID context.

Also improves the log messages so it's very clear what's going on.
2021-06-14 17:10:21 -04:00
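
A minimal sketch of an os.execv() self-reload, which replaces the running process image while keeping the same PID so systemd never sees the daemon exit:

```python
import os
import sys

def reload_daemon():
    # Flush anything buffered before the process image is replaced.
    sys.stdout.flush()
    sys.stderr.flush()
    # Re-exec the same interpreter with the same arguments; the PID is kept,
    # so systemd does not register a restart.
    os.execv(sys.executable, [sys.executable] + sys.argv)
```
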
e34a7d4d2a Handle hot reloads properly
A hot reload isn't possible due to DataWatch and ChildrenWatch
constructs, so we instead need to terminate the daemon to "apply" the
schema update. Thus we use exit code 150 (Application defined in LSB)
and reorder some of the elements of the schema validation to ensure
things happen in the right order.
2021-06-14 12:52:43 -04:00
ddd3eeedda Remove needless literal_eval statements 2021-06-14 01:46:30 -04:00
6fdc6674cf Fix grabbing existing version
The schema `version = ` now messes this up.
2021-06-14 01:40:10 -04:00
78453a173c Add functional testing script
Since trying to unit test this monstrous program at this point is a
daunting task, instead create a functional testing script. Can be
theoretically run against any cluster with an appropriate "test"
provisioner profile, but I mostly just run it against my own.
2021-06-14 01:14:20 -04:00
20c773413c Fix bug in snapshot rename 2021-06-14 00:55:26 -04:00
49f4feb482 Fix typo bug in key rename 2021-06-14 00:51:45 -04:00
a2205bec13 Allow VM dump to file directly
Similar to the cluster backup task.
2021-06-13 22:32:54 -04:00
7727221b59 Correctly use the Click file in backups 2021-06-13 22:17:35 -04:00
30a160d5ff Fix invalid type_key 2021-06-13 21:20:10 -04:00
1cbc66dccf Fix bugs in lease listing 2021-06-13 21:10:42 -04:00
bbd903e568 Fix bad schema name 2021-06-13 21:02:44 -04:00
1f49bfa1b2 Fix name of schema element 2021-06-13 20:56:17 -04:00
9511dc9864 Correct issue with invalid ACL ordering 2021-06-13 20:55:28 -04:00
3013973975 Fix bad schema names 2021-06-13 20:32:41 -04:00
8269930d40 Fix bad entry in network add 2021-06-13 18:22:13 -04:00
647bce2a22 Ensure we don't grab None data 2021-06-13 16:43:25 -04:00
ae79113f7c Correct key typo and add error handler 2021-06-13 15:49:30 -04:00
3bad3de720 Verify if key exists before reading 2021-06-13 15:39:43 -04:00
d2f93b3a2e Fix call to celery 2021-06-13 14:56:09 -04:00
680c62a6e4 Fix schema path call and version check 2021-06-13 14:46:30 -04:00
26b1f531e9 Fix bad variable interpolation 2021-06-13 14:37:23 -04:00
be9f1e8636 Use more compatible is_alive in thread 2021-06-13 14:36:27 -04:00
88a1d89501 Fix bad key name 2021-06-13 14:29:54 -04:00
7110a42e5f Add final schema elements after refactoring 2021-06-13 14:26:17 -04:00
01c82f5d19 Move backup and restore into common 2021-06-13 14:25:51 -04:00
059230d369 Convert vm.py to new ZK schema handler 2021-06-13 13:41:21 -04:00
f6e37906a9 Convert node.py to new ZK schema handler 2021-06-13 13:18:34 -04:00
0a162b304a Convert network.py to new ZK schema handler 2021-06-12 18:40:25 -04:00
f071343333 Add DHCP lease schema and temp workaround 2021-06-12 18:22:43 -04:00
01c762a362 Convert common.py to new ZK schema handler 2021-06-12 17:59:09 -04:00
9b1bd8476f Convert cluster.py to new ZK schema handler 2021-06-12 17:11:32 -04:00
6d00ec07b5 Convert ceph.py to new ZK schema handler 2021-06-12 17:09:29 -04:00
247ae4fe2d Fix pre-refactor path bug 2021-06-10 01:18:33 -04:00
b694945010 Fix incorrect name bug 2021-06-10 01:11:14 -04:00
b1c13c9fc1 Fix another bug with read call 2021-06-10 01:08:18 -04:00
75fc40a1e8 Fix bug with nkipath 2021-06-10 01:00:40 -04:00
2aa7f87ca9 Fix bug in creating child path keys 2021-06-10 00:55:54 -04:00
5273c4ebfa Fix bug with encoding raw creates 2021-06-10 00:52:07 -04:00
8dc9fd6dcb Fix bug with sub self command path/key 2021-06-10 00:49:01 -04:00
058c2ceef3 Convert VXNetworkInstance to new ZK schema handler 2021-06-10 00:36:18 -04:00
e7d60260a0 Fix typo in CephInstance path 2021-06-10 00:36:02 -04:00
f030ed974c Correct schema and handling of network subkeys
Required a bit of refactoring in the validation code to ensure we have
direct access, without relying on the translations done in the normal
zkhandler functions.
2021-06-10 00:35:42 -04:00
9985e1dadd Add support for 2-level dynamic keys 2021-06-09 23:52:21 -04:00
85aba7cc18 Convert VMInstance to new ZK schema handler 2021-06-09 23:15:08 -04:00
7e42118e6f Adjust lock schema in NodeInstance and VMInstance
Removes a superfluous lock and puts the sync_lock keys in more usable
places.
2021-06-09 22:51:00 -04:00
24663a3333 Add missing VM schema entry 2021-06-09 22:12:24 -04:00
2704badfbe Convert VMConsole... to new ZK schema handler 2021-06-09 22:08:32 -04:00
450bf6b153 Convert NodeInstance to new ZK schema handler 2021-06-09 22:07:32 -04:00
b94fe88405 Convert fencing to new ZK schema handler 2021-06-09 21:29:01 -04:00
610f6e8f2c Convert CephInstance to new ZK schema handler 2021-06-09 21:17:09 -04:00
f913f42a6d Replace schema paths with updated zkhandler 2021-06-09 20:29:42 -04:00
a9a57533a7 Integrate schema handling within ZKHandler
Abstracts away the schema management, especially when doing actions, to
prevent duplication in other areas.
2021-06-09 13:23:57 -04:00
76c37e6628 Tweak some field names slightly and add missing 2021-06-09 09:58:18 -04:00
0a04adf8f9 Allow empty sub_paths 2021-06-09 01:54:29 -04:00
ae269bdfde Add scripts to generate ZK migration JSON 2021-06-09 00:04:38 -04:00
f2b55ba937 Fix some bugs with migrations 2021-06-09 00:04:16 -04:00
e475552391 Fix some bugs with hot reload 2021-06-09 00:03:26 -04:00
5540bdc86b Add automatic schema upgrade to nodes
Performs an automatic schema upgrade when all nodes are updated to the
latest version.

Addresses #129
2021-06-08 23:35:39 -04:00
3c102b3769 Add per-node schema tracking
This will allow nodes to start with their own schema versions, and then
be updated simultaneously by the API.

References #129
2021-06-08 23:35:39 -04:00
a4aaf89681 Add ZKSchema loading and validation to Daemon
Also removes some previous hack migrations from pre-0.9.19.

Addresses #129
2021-06-08 23:35:39 -04:00
602dd7b714 Update version 0 schema and add full validation
Addresses #129
2021-06-08 23:35:39 -04:00
126f0742cd Add Zookeeper schema manager to zkhandler
Adds a new class, ZKSchema, to handle schema management in Zookeeper in
an automated and consistent way. This should solve several issues:

1. Pain in managing changes to ZK keys
2. Pain in handling those changes during live upgrades
3. Simplifying the codebase to remove hardcoded ZK paths

The current master schema for PVC 0.9.19 is committed as version 0.

Addresses #129
2021-06-08 23:35:39 -04:00
5843d8aff4 Fix fence call to findTargetNode 2021-06-08 23:34:49 -04:00
fb78be3d8d Add mentions of Debian Bullseye support 2021-06-06 18:09:16 -04:00
76 changed files with 6451 additions and 9797 deletions

.gitignore

@ -1,3 +1,8 @@
*.pyc
*.tmp
*.swp
# Ignore build artifacts
debian/pvc-*/
debian/*.log
debian/*.substvars
debian/files

.version (new file)

@ -0,0 +1 @@
0.9.28


@ -672,8 +672,3 @@ may consider it more useful to permit linking proprietary applications with
the library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License. But first, please read
<https://www.gnu.org/licenses/why-not-lgpl.html>.
----
Logo contains elements Copyright Anjara Begue via <http://www.vecteezy.com>
which are released under a Creative Commons Attribution license.

README.md

@ -1,25 +1,121 @@
# PVC - The Parallel Virtual Cluster system
<p align="center">
<img alt="Logo banner" src="https://git.bonifacelabs.ca/uploads/-/system/project/avatar/135/pvc_logo.png"/>
<img alt="Logo banner" src="docs/images/pvc_logo_black.png"/>
<br/><br/>
<a href="https://github.com/parallelvirtualcluster/pvc"><img alt="License" src="https://img.shields.io/github/license/parallelvirtualcluster/pvc"/></a>
<a href="https://github.com/parallelvirtualcluster/pvc/releases"><img alt="Release" src="https://img.shields.io/github/release-pre/parallelvirtualcluster/pvc"/></a>
<a href="https://parallelvirtualcluster.readthedocs.io/en/latest/?badge=latest"><img alt="Documentation Status" src="https://readthedocs.org/projects/parallelvirtualcluster/badge/?version=latest"/></a>
</p>
**NOTICE FOR GITHUB**: This repository is a read-only mirror of the PVC repositories from my personal GitLab instance. Pull requests submitted here will not be merged. Issues submitted here will however be treated as authoritative.
## What is PVC?
PVC is a KVM+Ceph+Zookeeper-based, Free Software, scalable, redundant, self-healing, and self-managing private cloud solution designed with administrator simplicity in mind. It is built from the ground-up to be redundant at the host layer, allowing the cluster to gracefully handle the loss of nodes or their components, both due to hardware failure or due to maintenance. It is able to scale from a minimum of 3 nodes up to 12 or more nodes, while retaining performance and flexibility, allowing the administrator to build a small cluster today and grow it as needed.
PVC is a virtual machine-based hyperconverged infrastructure (HCI) virtualization cluster solution that is fully Free Software, scalable, redundant, self-healing, self-managing, and designed for administrator simplicity. It is an alternative to other HCI solutions such as Harvester, Nutanix, and VMWare, as well as to other common virtualization stacks such as ProxMox and OpenStack.
PVC is a complete HCI solution, built from well-known and well-trusted Free Software tools, to assist an administrator in creating and managing a cluster of servers to run virtual machines, as well as self-managing several important aspects including storage failover, node failure and recovery, virtual machine failure and recovery, and network plumbing. It is designed to act consistently, reliably, and unobtrusively, letting the administrator concentrate on more important things.
PVC is highly scalable. From a minimum (production) node count of 3, up to 12 or more, and supporting many dozens of VMs, PVC scales along with your workload and requirements. Deploy a cluster once and grow it as your needs expand.
As a consequence of its features, PVC makes administrating very high-uptime VMs extremely easy, featuring VM live migration, built-in always-enabled shared storage with transparent multi-node replication, and consistent network plumbing throughout the cluster. Nodes can also be seamlessly removed from or added to service, with zero VM downtime, to facilitate maintenance, upgrades, or other work.
PVC also features an optional, fully customizable VM provisioning framework, designed to automate and simplify VM deployments using custom provisioning profiles, scripts, and CloudInit userdata API support.
Installation of PVC is accomplished by two main components: a [Node installer ISO](https://github.com/parallelvirtualcluster/pvc-installer) which creates on-demand installer ISOs, and an [Ansible role framework](https://github.com/parallelvirtualcluster/pvc-ansible) to configure, bootstrap, and administrate the nodes. Once up, the cluster is managed via an HTTP REST API, accessible via a Python Click CLI client or WebUI.
Just give it physical servers, and it will run your VMs without you having to think about it, all in just an hour or two of setup time.
## What is it based on?
The core node and API daemons, as well as the CLI API client, are written in Python 3 and are fully Free Software (GNU GPL v3). In addition to these, PVC makes use of the following software tools to provide a holistic hyperconverged infrastructure solution:
* Debian GNU/Linux as the base OS.
* Linux KVM, QEMU, and Libvirt for VM management.
* Linux `ip`, FRRouting, NFTables, DNSMasq, and PowerDNS for network management.
* Ceph for storage management.
* Apache Zookeeper for the primary cluster state database.
* Patroni PostgreSQL manager for the secondary relational databases (DNS aggregation, Provisioner configuration).
The major goal of PVC is to be administrator friendly, providing the power of Enterprise-grade private clouds like OpenStack, Nutanix, and VMWare to homelabbers, SMBs, and small ISPs, without the cost or complexity. It believes in picking the best tool for a job and abstracting it behind the cluster as a whole, freeing the administrator from the boring and time-consuming task of selecting the best component, and letting them get on with the things that really matter. Administration can be done from a simple CLI or via a RESTful API capable of building full-featured web frontends or additional applications, taking a self-documenting approach to keep the administrator learning curve as low as possible. Setup is easy and straightforward with an [ISO-based node installer](https://git.bonifacelabs.ca/parallelvirtualcluster/pvc-installer) and [Ansible role framework](https://git.bonifacelabs.ca/parallelvirtualcluster/pvc-ansible) designed to get a cluster up and running as quickly as possible. Build your cloud in an hour, grow it as you need, and never worry about it: just add physical servers.
## Getting Started
To get started with PVC, please see the [About](https://parallelvirtualcluster.readthedocs.io/en/latest/about/) page for general information about the project, and the [Getting Started](https://parallelvirtualcluster.readthedocs.io/en/latest/getting-started/) page for details on configuring your cluster.
To get started with PVC, please see the [About](https://parallelvirtualcluster.readthedocs.io/en/latest/about/) page for general information about the project, and the [Getting Started](https://parallelvirtualcluster.readthedocs.io/en/latest/getting-started/) page for details on configuring your first cluster.
## Changelog
#### v0.9.28
* [CLI Client] Revamp confirmation options for "vm modify" command
#### v0.9.27
* [CLI Client] Fixes a bug with vm modify command when passed a file
#### v0.9.26
* [Node Daemon] Corrects some bad assumptions about fencing results during hardware failures
* [All] Implements VM tagging functionality
* [All] Implements Node log access via PVC functionality
#### v0.9.25
* [Node Daemon] Returns to Rados library calls for Ceph due to performance problems
* [Node Daemon] Adds a date output to keepalive messages
* [Daemons] Configures ZK connection logging only for persistent connections
* [API Provisioner] Add context manager-based chroot to Debootstrap example script
* [Node Daemon] Fixes a bug where shutdown daemon state was overwritten
#### v0.9.24
* [Node Daemon] Removes Rados module polling of Ceph cluster and returns to command-based polling for timeout purposes, and removes some flaky return statements
* [Node Daemon] Removes flaky Zookeeper connection renewals that caused problems
* [CLI Client] Allow raw lists of clusters from `pvc cluster list`
* [API Daemon] Fixes several issues when getting VM data without stats
* [API Daemon] Fixes issues with removing VMs while disks are still in use (failed provisioning, etc.)
#### v0.9.23
* [Daemons] Fixes a critical overwriting bug in zkhandler when schema paths are not yet valid
* [Node Daemon] Ensures the daemon mode is updated on every startup (fixes the side effect of the above bug in 0.9.22)
#### v0.9.22
* [API Daemon] Drastically improves performance when getting large lists (e.g. VMs)
* [Daemons] Adds profiler functions for use in debug mode
* [Daemons] Improves reliability of ZK locking
* [Daemons] Adds the new logo in ASCII form to the Daemon startup message
* [Node Daemon] Fixes bug where VMs would sometimes not stop
* [Node Daemon] Code cleanups in various classes
* [Node Daemon] Fixes a bug when reading node schema data
* [All] Adds node PVC version information to the list output
* [CLI Client] Improves the style and formatting of list output including a new header line
* [API Worker] Fixes a bug that prevented the storage benchmark job from running
#### v0.9.21
* [API Daemon] Ensures VMs stop before removing them
* [Node Daemon] Fixes a bug with VM shutdowns not timing out
* [Documentation] Adds information about georedundancy caveats
* [All] Adds support for SR-IOV NICs (hostdev and macvtap) and surrounding documentation
* [Node Daemon] Fixes a bug where shutdown aborted migrations unexpectedly
* [Node Daemon] Fixes a bug where the migration method was not updated realtime
* [Node Daemon] Adjusts the Patroni commands to remove reference to Zookeeper path
* [CLI Client] Adjusts several help messages and fixes some typos
* [CLI Client] Converts the CLI client to a proper Python module
* [API Daemon] Improves VM list performance
* [API Daemon] Adjusts VM list matching criteria (only matches against the UUID if it's a full UUID)
* [API Worker] Fixes incompatibility between Deb 10 and 11 in launching Celery worker
* [API Daemon] Corrects several bugs with initialization command
* [Documentation] Adds a shiny new logo and revamps introduction text
#### v0.9.20
* [Daemons] Implemented a Zookeeper schema handler and version 0 schema
* [Daemons] Completes major refactoring of codebase to make use of the schema handler
* [Daemons] Adds support for dynamic schema changes and "hot reloading" of pvcnoded processes
* [Daemons] Adds a functional testing script for verifying operation against a test cluster
* [Daemons, CLI] Fixes several minor bugs found by the above script
* [Daemons, CLI] Add support for Debian 11 "Bullseye"
#### v0.9.19
* [CLI] Corrects some flawed conditionals


@ -34,6 +34,29 @@
# with that.
import os
from contextlib import contextmanager
# Create a chroot context manager
# This can be used later in the script to chroot to the destination directory
# for instance to run commands within the target.
@contextmanager
def chroot_target(destination):
    try:
        real_root = os.open("/", os.O_RDONLY)
        os.chroot(destination)
        fake_root = os.open("/", os.O_RDONLY)
        os.fchdir(fake_root)
        yield
    finally:
        os.fchdir(real_root)
        os.chroot(".")
        os.fchdir(real_root)
        os.close(fake_root)
        os.close(real_root)
        del fake_root
        del real_root
# Installation function - performs a debootstrap install of a Debian system
# Note that the only arguments are keyword arguments.
@ -193,40 +216,25 @@ GRUB_DISABLE_LINUX_UUID=false
fh.write(data)
# Chroot, do some in-root tasks, then exit the chroot
# EXITING THE CHROOT IS VERY IMPORTANT OR THE FOLLOWING STAGES OF THE PROVISIONER
# WILL FAIL IN UNEXPECTED WAYS! Keep this in mind when using chroot in your scripts.
real_root = os.open("/", os.O_RDONLY)
os.chroot(temporary_directory)
fake_root = os.open("/", os.O_RDONLY)
os.fchdir(fake_root)
# Install and update GRUB
os.system(
"grub-install --force /dev/rbd/{}/{}_{}".format(root_disk['pool'], vm_name, root_disk['disk_id'])
)
os.system(
"update-grub"
)
# Set a really dumb root password [TEMPORARY]
os.system(
"echo root:test123 | chpasswd"
)
# Enable cloud-init target on (first) boot
# NOTE: Your user-data should handle this and disable it once done, or things get messy.
# That cloud-init won't run without this hack seems like a bug... but even the official
# Debian cloud images are affected, so who knows.
os.system(
"systemctl enable cloud-init.target"
)
# Restore our original root/exit the chroot
# EXITING THE CHROOT IS VERY IMPORTANT OR THE FOLLOWING STAGES OF THE PROVISIONER
# WILL FAIL IN UNEXPECTED WAYS! Keep this in mind when using chroot in your scripts.
os.fchdir(real_root)
os.chroot(".")
os.fchdir(real_root)
os.close(fake_root)
os.close(real_root)
    with chroot_target(temporary_directory):
        # Install and update GRUB
        os.system(
            "grub-install --force /dev/rbd/{}/{}_{}".format(root_disk['pool'], vm_name, root_disk['disk_id'])
        )
        os.system(
            "update-grub"
        )
        # Set a really dumb root password [TEMPORARY]
        os.system(
            "echo root:test123 | chpasswd"
        )
        # Enable cloud-init target on (first) boot
        # NOTE: Your user-data should handle this and disable it once done, or things get messy.
        # That cloud-init won't run without this hack seems like a bug... but even the official
        # Debian cloud images are affected, so who knows.
        os.system(
            "systemctl enable cloud-init.target"
        )
# Unmount the bound devfs
os.system(
@ -235,8 +243,4 @@ GRUB_DISABLE_LINUX_UUID=false
)
)
# Clean up file handles so paths can be unmounted
del fake_root
del real_root
# Everything else is done via cloud-init user-data


@ -29,7 +29,7 @@
# This script will run under root privileges as the provisioner does. Be careful
# with that.
# Installation function - performs a debootstrap install of a Debian system
# Installation function - performs no actions then returns
# Note that the only arguments are keyword arguments.
def install(**kwargs):
# The provisioner has already mounted the disks on kwargs['temporary_directory'].

api-daemon/pvcapid-manage-zk.py (new executable file)

@ -0,0 +1,24 @@
#!/usr/bin/env python3
# pvcapid-manage-zk.py - PVC Zookeeper migration generator
# Part of the Parallel Virtual Cluster (PVC) system
#
# Copyright (C) 2018-2021 Joshua M. Boniface <joshua@boniface.me>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, version 3.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
#
###############################################################################
from daemon_lib.zkhandler import ZKSchema
ZKSchema.write()


@ -9,7 +9,7 @@ Type = simple
WorkingDirectory = /usr/share/pvc
Environment = PYTHONUNBUFFERED=true
Environment = PVC_CONFIG_FILE=/etc/pvc/pvcapid.yaml
ExecStart = /usr/bin/celery worker -A pvcapid.flaskapi.celery --concurrency 1 --loglevel INFO
ExecStart = /usr/share/pvc/pvcapid-worker.sh
Restart = on-failure
[Install]

api-daemon/pvcapid-worker.sh (new executable file)

@ -0,0 +1,40 @@
#!/usr/bin/env bash
# pvcapid-worker.sh - API Celery worker daemon startup stub
# Part of the Parallel Virtual Cluster (PVC) system
#
# Copyright (C) 2018-2021 Joshua M. Boniface <joshua@boniface.me>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, version 3.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
#
###############################################################################
CELERY_BIN="$( which celery )"
# This absolute hackery is needed because Celery got the bright idea to change how their
# app arguments work in a non-backwards-compatible way with Celery 5.
case "$( cat /etc/debian_version )" in
10.*)
CELERY_ARGS="worker --app pvcapid.flaskapi.celery --concurrency 1 --loglevel INFO"
;;
11.*)
CELERY_ARGS="--app pvcapid.flaskapi.celery worker --concurrency 1 --loglevel INFO"
;;
*)
echo "Invalid Debian version found!"
exit 1
;;
esac
${CELERY_BIN} ${CELERY_ARGS}
exit $?


@ -25,7 +25,7 @@ import yaml
from distutils.util import strtobool as dustrtobool
# Daemon version
version = '0.9.19'
version = '0.9.28'
# API version
API_VERSION = 1.0
@ -117,21 +117,21 @@ def entrypoint():
# Print our startup messages
print('')
print('|--------------------------------------------------|')
print('| ######## ## ## ###### |')
print('| ## ## ## ## ## ## |')
print('| ## ## ## ## ## |')
print('| ######## ## ## ## |')
print('| ## ## ## ## |')
print('| ## ## ## ## ## |')
print('| ## ### ###### |')
print('|--------------------------------------------------|')
print('| Parallel Virtual Cluster API daemon v{0: <11} |'.format(version))
print('| API version: v{0: <34} |'.format(API_VERSION))
print('| Listen: {0: <40} |'.format('{}:{}'.format(config['listen_address'], config['listen_port'])))
print('| SSL: {0: <43} |'.format(str(config['ssl_enabled'])))
print('| Authentication: {0: <32} |'.format(str(config['auth_enabled'])))
print('|--------------------------------------------------|')
print('|----------------------------------------------------------|')
print('| |')
print('| ███████████ ▜█▙ ▟█▛ █████ █ █ █ |')
print('| ██ ▜█▙ ▟█▛ ██ |')
print('| ███████████ ▜█▙ ▟█▛ ██ |')
print('| ██ ▜█▙▟█▛ ███████████ |')
print('| |')
print('|----------------------------------------------------------|')
print('| Parallel Virtual Cluster API daemon v{0: <19} |'.format(version))
print('| Debug: {0: <49} |'.format(str(config['debug'])))
print('| API version: v{0: <42} |'.format(API_VERSION))
print('| Listen: {0: <48} |'.format('{}:{}'.format(config['listen_address'], config['listen_port'])))
print('| SSL: {0: <51} |'.format(str(config['ssl_enabled'])))
print('| Authentication: {0: <40} |'.format(str(config['auth_enabled'])))
print('|----------------------------------------------------------|')
print('')
pvc_api.app.run(config['listen_address'], config['listen_port'], threaded=True, ssl_context=context)


@ -24,7 +24,7 @@ import psycopg2.extras
from pvcapid.Daemon import config
from daemon_lib.zkhandler import ZKConnection
from daemon_lib.zkhandler import ZKHandler
import daemon_lib.common as pvc_common
import daemon_lib.ceph as pvc_ceph
@ -103,8 +103,7 @@ def list_benchmarks(job=None):
return {'message': 'No benchmark found.'}, 404
@ZKConnection(config)
def run_benchmark(self, zkhandler, pool):
def run_benchmark(self, pool):
# Runtime imports
import time
import json
@ -123,6 +122,13 @@ def run_benchmark(self, zkhandler, pool):
print('FATAL - failed to connect to Postgres')
raise Exception
try:
zkhandler = ZKHandler(config)
zkhandler.connect()
except Exception:
print('FATAL - failed to connect to Zookeeper')
raise Exception
print("Storing running status for job '{}' in database".format(cur_time))
try:
query = "INSERT INTO storage_benchmarks (job, result) VALUES (%s, %s);"
@ -445,4 +451,7 @@ def run_benchmark(self, zkhandler, pool):
raise BenchmarkError("Failed to store test results: {}".format(e), cur_time=cur_time, db_conn=db_conn, db_cur=db_cur, zkhandler=zkhandler)
close_database(db_conn, db_cur)
zkhandler.disconnect()
del zkhandler
return {'status': "Storage benchmark '{}' completed successfully.", 'current': 3, 'total': 3}


@ -299,13 +299,12 @@ class API_Initialize(Resource):
400:
description: Bad request
"""
if reqargs.get('overwrite', False):
if reqargs.get('overwrite', 'False') == 'True':
overwrite_flag = True
if api_helper.initialize_cluster(overwrite=overwrite_flag):
return {"message": "Successfully initialized a new PVC cluster"}, 200
else:
return {"message": "PVC cluster already initialized"}, 400
overwrite_flag = False
return api_helper.initialize_cluster(overwrite=overwrite_flag)
api.add_resource(API_Initialize, '/initialize')
@ -536,6 +535,9 @@ class API_Node_Root(Resource):
domain_state:
type: string
description: The current domain (VM) state
pvc_version:
type: string
description: The current running PVC node daemon version
cpu_count:
type: integer
description: The number of available CPU cores
@ -590,7 +592,7 @@ class API_Node_Root(Resource):
name: limit
type: string
required: false
description: A search limit; fuzzy by default, use ^/$ to force exact matches
description: A search limit in the name, tags, or an exact UUID; fuzzy by default, use ^/$ to force exact matches
- in: query
name: daemon_state
type: string
@ -832,6 +834,52 @@ class API_Node_DomainState(Resource):
api.add_resource(API_Node_DomainState, '/node/<node>/domain-state')
# /node/<node>/log
class API_Node_Log(Resource):
@RequestParser([
{'name': 'lines'}
])
@Authenticator
def get(self, node, reqargs):
"""
Return the recent logs of {node}
---
tags:
- node
parameters:
- in: query
name: lines
type: integer
required: false
description: The number of lines to retrieve
responses:
200:
description: OK
schema:
type: object
id: NodeLog
properties:
name:
type: string
description: The name of the Node
data:
type: string
description: The recent log text
404:
description: Node not found
schema:
type: object
id: Message
"""
return api_helper.node_log(
node,
reqargs.get('lines', None)
)
api.add_resource(API_Node_Log, '/node/<node>/log')
##########################################################
# Client API - VM
##########################################################
@ -842,6 +890,7 @@ class API_VM_Root(Resource):
{'name': 'limit'},
{'name': 'node'},
{'name': 'state'},
{'name': 'tag'},
])
@Authenticator
def get(self, reqargs):
@ -890,6 +939,22 @@ class API_VM_Root(Resource):
migration_method:
type: string
description: The preferred migration method (live, shutdown, none)
tags:
type: array
description: The tag(s) of the VM
items:
type: object
id: VMTag
properties:
name:
type: string
description: The name of the tag
type:
type: string
description: The type of the tag (user, system)
protected:
type: boolean
description: Whether the tag is protected or not
description:
type: string
description: The description of the VM
@ -1074,7 +1139,7 @@ class API_VM_Root(Resource):
name: limit
type: string
required: false
description: A name search limit; fuzzy by default, use ^/$ to force exact matches
description: A search limit in the name, tags, or an exact UUID; fuzzy by default, use ^/$ to force exact matches
- in: query
name: node
type: string
@ -1085,6 +1150,11 @@ class API_VM_Root(Resource):
type: string
required: false
description: Limit list to VMs in this state
- in: query
name: tag
type: string
required: false
description: Limit list to VMs with this tag
responses:
200:
description: OK
@ -1096,6 +1166,7 @@ class API_VM_Root(Resource):
return api_helper.vm_list(
reqargs.get('node', None),
reqargs.get('state', None),
reqargs.get('tag', None),
reqargs.get('limit', None)
)
@ -1105,6 +1176,8 @@ class API_VM_Root(Resource):
{'name': 'selector', 'choices': ('mem', 'vcpus', 'load', 'vms', 'none'), 'helptext': "A valid selector must be specified"},
{'name': 'autostart'},
{'name': 'migration_method', 'choices': ('live', 'shutdown', 'none'), 'helptext': "A valid migration_method must be specified"},
{'name': 'user_tags', 'action': 'append'},
{'name': 'protected_tags', 'action': 'append'},
{'name': 'xml', 'required': True, 'helptext': "A Libvirt XML document must be specified"},
])
@Authenticator
@ -1156,6 +1229,20 @@ class API_VM_Root(Resource):
- live
- shutdown
- none
- in: query
name: user_tags
type: array
required: false
description: The user tag(s) of the VM
items:
type: string
- in: query
name: protected_tags
type: array
required: false
description: The protected user tag(s) of the VM
items:
type: string
responses:
200:
description: OK
@ -1168,13 +1255,22 @@ class API_VM_Root(Resource):
type: object
id: Message
"""
user_tags = reqargs.get('user_tags', None)
if user_tags is None:
user_tags = []
protected_tags = reqargs.get('protected_tags', None)
if protected_tags is None:
protected_tags = []
return api_helper.vm_define(
reqargs.get('xml'),
reqargs.get('node', None),
reqargs.get('limit', None),
reqargs.get('selector', 'none'),
bool(strtobool(reqargs.get('autostart', 'false'))),
reqargs.get('migration_method', 'none')
reqargs.get('migration_method', 'none'),
user_tags,
protected_tags
)
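Since user_tags and protected_tags are declared with 'action': 'append', a client can repeat those query parameters to attach several tags at define time. A hedged sketch, reusing the same assumed base URL and key header, and assuming the Libvirt XML document is accepted as form data:

import requests

API_BASE = 'http://localhost:7370/api/v1'  # assumed API location
HEADERS = {'X-Api-Key': 'changeme'}         # assumed authentication header and token

with open('myvm.xml') as f:                 # illustrative Libvirt XML file
    xml_doc = f.read()

resp = requests.post(
    '{}/vm'.format(API_BASE),
    headers=HEADERS,
    # Repeated parameters map onto the 'append' RequestParser arguments
    params=[
        ('user_tags', 'web'),
        ('user_tags', 'staging'),
        ('protected_tags', 'billing'),
    ],
    data={'xml': xml_doc},                  # assumption: xml accepted as form data
)
print(resp.status_code, resp.json().get('message'))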
@ -1201,7 +1297,7 @@ class API_VM_Element(Resource):
type: object
id: Message
"""
return api_helper.vm_list(None, None, vm, is_fuzzy=False)
return api_helper.vm_list(None, None, None, vm, is_fuzzy=False)
@RequestParser([
{'name': 'limit'},
@ -1209,6 +1305,8 @@ class API_VM_Element(Resource):
{'name': 'selector', 'choices': ('mem', 'vcpus', 'load', 'vms', 'none'), 'helptext': "A valid selector must be specified"},
{'name': 'autostart'},
{'name': 'migration_method', 'choices': ('live', 'shutdown', 'none'), 'helptext': "A valid migration_method must be specified"},
{'name': 'user_tags', 'action': 'append'},
{'name': 'protected_tags', 'action': 'append'},
{'name': 'xml', 'required': True, 'helptext': "A Libvirt XML document must be specified"},
])
@Authenticator
@ -1263,6 +1361,20 @@ class API_VM_Element(Resource):
- live
- shutdown
- none
- in: query
name: user_tags
type: array
required: false
description: The user tag(s) of the VM
items:
type: string
- in: query
name: protected_tags
type: array
required: false
description: The protected user tag(s) of the VM
items:
type: string
responses:
200:
description: OK
@ -1275,13 +1387,22 @@ class API_VM_Element(Resource):
type: object
id: Message
"""
user_tags = reqargs.get('user_tags', None)
if user_tags is None:
user_tags = []
protected_tags = reqargs.get('protected_tags', None)
if protected_tags is None:
protected_tags = []
return api_helper.vm_define(
reqargs.get('xml'),
reqargs.get('node', None),
reqargs.get('limit', None),
reqargs.get('selector', 'none'),
bool(strtobool(reqargs.get('autostart', 'false'))),
reqargs.get('migration_method', 'none')
reqargs.get('migration_method', 'none'),
user_tags,
protected_tags
)
@RequestParser([
@ -1399,7 +1520,7 @@ class API_VM_Metadata(Resource):
type: string
description: The preferred migration method (live, shutdown, none)
404:
description: Not found
description: VM not found
schema:
type: object
id: Message
@ -1467,6 +1588,11 @@ class API_VM_Metadata(Resource):
schema:
type: object
id: Message
404:
description: VM not found
schema:
type: object
id: Message
"""
return api_helper.update_vm_meta(
vm,
@ -1481,6 +1607,99 @@ class API_VM_Metadata(Resource):
api.add_resource(API_VM_Metadata, '/vm/<vm>/meta')
# /vm/<vm>/tags
class API_VM_Tags(Resource):
@Authenticator
def get(self, vm):
"""
Return the tags of {vm}
---
tags:
- vm
responses:
200:
description: OK
schema:
type: object
id: VMTags
properties:
name:
type: string
description: The name of the VM
tags:
type: array
description: The tag(s) of the VM
items:
type: object
id: VMTag
404:
description: VM not found
schema:
type: object
id: Message
"""
return api_helper.get_vm_tags(vm)
@RequestParser([
{'name': 'action', 'choices': ('add', 'remove'), 'helptext': "A valid action must be specified"},
{'name': 'tag'},
{'name': 'protected'}
])
@Authenticator
def post(self, vm, reqargs):
"""
Set the tags of {vm}
---
tags:
- vm
parameters:
- in: query
name: action
type: string
required: true
description: The action to perform with the tag
enum:
- add
- remove
- in: query
name: tag
type: string
required: true
description: The text value of the tag
- in: query
name: protected
type: boolean
required: false
default: false
description: Set the protected state of the tag
responses:
200:
description: OK
schema:
type: object
id: Message
400:
description: Bad request
schema:
type: object
id: Message
404:
description: VM not found
schema:
type: object
id: Message
"""
return api_helper.update_vm_tag(
vm,
reqargs.get('action'),
reqargs.get('tag'),
reqargs.get('protected', False)
)
api.add_resource(API_VM_Tags, '/vm/<vm>/tags')
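A minimal sketch of driving the new tags endpoint, with the same illustrative base URL and key header as above: add a protected tag, then read the VM's tags back.

import requests

API_BASE = 'http://localhost:7370/api/v1'  # assumed API location
HEADERS = {'X-Api-Key': 'changeme'}         # assumed authentication header and token

vm = 'myvm'                                 # illustrative VM name

# POST /vm/<vm>/tags with action=add, tag=..., protected=true
resp = requests.post(
    '{}/vm/{}/tags'.format(API_BASE, vm),
    headers=HEADERS,
    params={'action': 'add', 'tag': 'billing', 'protected': 'true'},
)
print(resp.status_code, resp.json().get('message'))

# GET /vm/<vm>/tags returns the VMTags object: {'name': ..., 'tags': [...]}
resp = requests.get('{}/vm/{}/tags'.format(API_BASE, vm), headers=HEADERS)
print(resp.json())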
# /vm/<vm>/state
class API_VM_State(Resource):
@Authenticator
@ -2719,6 +2938,301 @@ class API_Network_ACL_Element(Resource):
api.add_resource(API_Network_ACL_Element, '/network/<vni>/acl/<description>')
##########################################################
# Client API - SR-IOV
##########################################################
# /sriov
class API_SRIOV_Root(Resource):
@Authenticator
def get(self):
pass
api.add_resource(API_SRIOV_Root, '/sriov')
# /sriov/pf
class API_SRIOV_PF_Root(Resource):
@RequestParser([
{'name': 'node', 'required': True, 'helptext': "A valid node must be specified."},
])
@Authenticator
def get(self, reqargs):
"""
Return a list of SR-IOV PFs on a given node
---
tags:
- network / sriov
responses:
200:
description: OK
schema:
type: object
id: sriov_pf
properties:
phy:
type: string
description: The name of the SR-IOV PF device
mtu:
type: string
description: The MTU of the SR-IOV PF device
vfs:
type: array
items:
type: string
description: The PHY name of a VF of this PF
"""
return api_helper.sriov_pf_list(reqargs.get('node'))
api.add_resource(API_SRIOV_PF_Root, '/sriov/pf')
# /sriov/pf/<node>
class API_SRIOV_PF_Node(Resource):
@Authenticator
def get(self, node):
"""
Return a list of SR-IOV PFs on node {node}
---
tags:
- network / sriov
responses:
200:
description: OK
schema:
$ref: '#/definitions/sriov_pf'
"""
return api_helper.sriov_pf_list(node)
api.add_resource(API_SRIOV_PF_Node, '/sriov/pf/<node>')
# /sriov/vf
class API_SRIOV_VF_Root(Resource):
@RequestParser([
{'name': 'node', 'required': True, 'helptext': "A valid node must be specified."},
{'name': 'pf', 'required': False, 'helptext': "A PF parent may be specified."},
])
@Authenticator
def get(self, reqargs):
"""
Return a list of SR-IOV VFs on a given node, optionally limited to those in the specified PF
---
tags:
- network / sriov
responses:
200:
description: OK
schema:
type: object
id: sriov_vf
properties:
phy:
type: string
description: The name of the SR-IOV VF device
pf:
type: string
description: The name of the SR-IOV PF parent of this VF device
mtu:
type: integer
description: The current MTU of the VF device
mac:
type: string
description: The current MAC address of the VF device
config:
type: object
id: sriov_vf_config
properties:
vlan_id:
type: string
description: The tagged vLAN ID of the SR-IOV VF device
vlan_qos:
type: string
description: The QOS group of the tagged vLAN
tx_rate_min:
type: string
description: The minimum TX rate of the SR-IOV VF device
tx_rate_max:
type: string
description: The maximum TX rate of the SR-IOV VF device
spoof_check:
type: boolean
description: Whether device spoof checking is enabled or disabled
link_state:
type: string
description: The current SR-IOV VF link state (either enabled, disabled, or auto)
trust:
type: boolean
description: Whether guest device trust is enabled or disabled
query_rss:
type: boolean
description: Whether VF RSS querying is enabled or disabled
usage:
type: object
id: sriov_vf_usage
properties:
used:
type: boolean
description: Whether the SR-IOV VF is currently used by a VM or not
domain:
type: string
description: The UUID of the domain the SR-IOV VF is currently used by
"""
return api_helper.sriov_vf_list(reqargs.get('node'), reqargs.get('pf', None))
api.add_resource(API_SRIOV_VF_Root, '/sriov/vf')
# /sriov/vf/<node>
class API_SRIOV_VF_Node(Resource):
@RequestParser([
{'name': 'pf', 'required': False, 'helptext': "A PF parent may be specified."},
])
@Authenticator
def get(self, node, reqargs):
"""
Return a list of SR-IOV VFs on node {node}, optionally limited to those in the specified PF
---
tags:
- network / sriov
responses:
200:
description: OK
schema:
$ref: '#/definitions/sriov_vf'
"""
return api_helper.sriov_vf_list(node, reqargs.get('pf', None))
api.add_resource(API_SRIOV_VF_Node, '/sriov/vf/<node>')
# /sriov/vf/<node>/<vf>
class API_SRIOV_VF_Element(Resource):
@Authenticator
def get(self, node, vf):
"""
Return information about {vf} on {node}
---
tags:
- network / sriov
responses:
200:
description: OK
schema:
$ref: '#/definitions/sriov_vf'
404:
description: Not found
schema:
type: object
id: Message
"""
vf_list = list()
full_vf_list, _ = api_helper.sriov_vf_list(node)
for vf_element in full_vf_list:
if vf_element['phy'] == vf:
vf_list.append(vf_element)
if len(vf_list) == 1:
return vf_list, 200
else:
return {'message': "No VF '{}' found on node '{}'".format(vf, node)}, 404
@RequestParser([
{'name': 'vlan_id'},
{'name': 'vlan_qos'},
{'name': 'tx_rate_min'},
{'name': 'tx_rate_max'},
{'name': 'link_state', 'choices': ('auto', 'enable', 'disable'), 'helptext': "A valid state must be specified"},
{'name': 'spoof_check'},
{'name': 'trust'},
{'name': 'query_rss'},
])
@Authenticator
def put(self, node, vf, reqargs):
"""
Set the configuration of {vf} on {node}
---
tags:
- network / sriov
parameters:
- in: query
name: vlan_id
type: integer
required: false
description: The vLAN ID for vLAN tagging (0 is disabled)
- in: query
name: vlan_qos
type: integer
required: false
description: The vLAN QOS priority (0 is disabled)
- in: query
name: tx_rate_min
type: integer
required: false
description: The minimum TX rate (0 is disabled)
- in: query
name: tx_rate_max
type: integer
required: false
description: The maximum TX rate (0 is disabled)
- in: query
name: link_state
type: string
required: false
description: The administrative link state
enum:
- auto
- enable
- disable
- in: query
name: spoof_check
type: boolean
required: false
description: Enable or disable spoof checking
- in: query
name: trust
type: boolean
required: false
description: Enable or disable VF user trust
- in: query
name: query_rss
type: boolean
required: false
description: Enable or disable query RSS support
responses:
200:
description: OK
schema:
type: object
id: Message
400:
description: Bad request
schema:
type: object
id: Message
"""
return api_helper.update_sriov_vf_config(
node,
vf,
reqargs.get('vlan_id', None),
reqargs.get('vlan_qos', None),
reqargs.get('tx_rate_min', None),
reqargs.get('tx_rate_max', None),
reqargs.get('link_state', None),
reqargs.get('spoof_check', None),
reqargs.get('trust', None),
reqargs.get('query_rss', None),
)
api.add_resource(API_SRIOV_VF_Element, '/sriov/vf/<node>/<vf>')
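For illustration, a sketch of configuring a VF through the new element endpoint; only the parameters actually passed are changed, since update_sriov_vf_config receives None for anything omitted. The URL, key header, node, and device names are assumptions.

import requests

API_BASE = 'http://localhost:7370/api/v1'  # assumed API location
HEADERS = {'X-Api-Key': 'changeme'}         # assumed authentication header and token

node = 'hv1'       # assumed node name
vf = 'ens1f0v3'    # assumed VF PHY name

# PUT /sriov/vf/<node>/<vf>: set a vLAN ID and disable spoof checking
resp = requests.put(
    '{}/sriov/vf/{}/{}'.format(API_BASE, node, vf),
    headers=HEADERS,
    params={'vlan_id': 100, 'spoof_check': 'false'},
)
print(resp.status_code, resp.json().get('message'))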
##########################################################
# Client API - Storage
##########################################################

View File

@ -45,89 +45,33 @@ def initialize_cluster(zkhandler, overwrite=False):
"""
Initialize a new cluster
"""
# Abort if we've initialized the cluster before
if zkhandler.exists('/config/primary_node') and not overwrite:
return False
retflag, retmsg = pvc_cluster.cluster_initialize(zkhandler, overwrite)
if overwrite:
# Delete the existing keys; ignore any errors
status = zkhandler.delete([
'/config'
'/nodes',
'/domains',
'/networks',
'/ceph',
'/ceph/osds',
'/ceph/pools',
'/ceph/volumes',
'/ceph/snapshots',
'/cmd',
'/cmd/domains',
'/cmd/ceph',
'/locks',
'/locks/flush_lock',
'/locks/primary_node'
], recursive=True)
retmsg = {
'message': retmsg
}
if retflag:
retcode = 200
else:
retcode = 400
if not status:
return False
# Create the root keys
status = zkhandler.write([
('/config', ''),
('/config/primary_node', 'none'),
('/config/upstream_ip', 'none'),
('/config/maintenance', 'False'),
('/config/migration_target_selector', 'none'),
('/nodes', ''),
('/domains', ''),
('/networks', ''),
('/ceph', ''),
('/ceph/osds', ''),
('/ceph/pools', ''),
('/ceph/volumes', ''),
('/ceph/snapshots', ''),
('/cmd', ''),
('/cmd/domains', ''),
('/cmd/ceph', ''),
('/locks', ''),
('/locks/flush_lock', ''),
('/locks/primary_node', ''),
])
return status
return retmsg, retcode
@ZKConnection(config)
def backup_cluster(zkhandler):
# Dictionary of values to come
cluster_data = dict()
retflag, retdata = pvc_cluster.cluster_backup(zkhandler)
def get_data(path):
data = zkhandler.read(path)
children = zkhandler.children(path)
if retflag:
retcode = 200
retdata = json.dumps(retdata)
else:
retcode = 400
retdata = {
'message': retdata
}
cluster_data[path] = data
if children:
if path == '/':
child_prefix = '/'
else:
child_prefix = path + '/'
for child in children:
if child_prefix + child == '/zookeeper':
# We must skip the built-in /zookeeper tree
continue
if child_prefix + child == '/patroni':
# We must skip the /patroni tree
continue
get_data(child_prefix + child)
get_data('/')
return json.dumps(cluster_data), 200
return retdata, retcode
@ZKConnection(config)
@ -135,26 +79,25 @@ def restore_cluster(zkhandler, cluster_data_raw):
try:
cluster_data = json.loads(cluster_data_raw)
except Exception as e:
return {"message": "Failed to parse JSON data: {}.".format(e)}, 400
return {'message': 'ERROR: Failed to parse JSON data: {}'.format(e)}, 400
# Build a key+value list
kv = []
for key in cluster_data:
data = cluster_data[key]
kv.append((key, data))
retflag, retdata = pvc_cluster.cluster_restore(zkhandler, cluster_data)
# Close the Zookeeper connection
result = zkhandler.write(kv)
if result:
return {'message': 'Restore completed successfully.'}, 200
retdata = {
'message': retdata
}
if retflag:
retcode = 200
else:
return {'message': 'Restore failed.'}, 500
retcode = 400
return retdata, retcode
#
# Cluster functions
#
@pvc_common.Profiler(config)
@ZKConnection(config)
def cluster_status(zkhandler):
"""
@ -186,6 +129,7 @@ def cluster_maintenance(zkhandler, maint_state='false'):
#
# Node functions
#
@pvc_common.Profiler(config)
@ZKConnection(config)
def node_list(zkhandler, limit=None, daemon_state=None, coordinator_state=None, domain_state=None, is_fuzzy=True):
"""
@ -363,6 +307,34 @@ def node_ready(zkhandler, node, wait):
return output, retcode
@ZKConnection(config)
def node_log(zkhandler, node, lines=None):
"""
Return the current logs for Node.
"""
# Default to 10 lines of log if not set
try:
lines = int(lines)
except (TypeError, ValueError):
lines = 10
retflag, retdata = pvc_node.get_node_log(zkhandler, node, lines)
if retflag:
retcode = 200
retdata = {
'name': node,
'data': retdata
}
else:
retcode = 400
retdata = {
'message': retdata
}
return retdata, retcode
#
# VM functions
#
@ -376,12 +348,13 @@ def vm_is_migrated(zkhandler, vm):
return retdata
@pvc_common.Profiler(config)
@ZKConnection(config)
def vm_state(zkhandler, vm):
"""
Return the state of virtual machine VM.
"""
retflag, retdata = pvc_vm.get_list(zkhandler, None, None, vm, is_fuzzy=False)
retflag, retdata = pvc_vm.get_list(zkhandler, None, None, None, vm, is_fuzzy=False)
if retflag:
if retdata:
@ -404,12 +377,13 @@ def vm_state(zkhandler, vm):
return retdata, retcode
@pvc_common.Profiler(config)
@ZKConnection(config)
def vm_node(zkhandler, vm):
"""
Return the current node of virtual machine VM.
"""
retflag, retdata = pvc_vm.get_list(zkhandler, None, None, vm, is_fuzzy=False)
retflag, retdata = pvc_vm.get_list(zkhandler, None, None, None, vm, is_fuzzy=False)
if retflag:
if retdata:
@ -461,12 +435,13 @@ def vm_console(zkhandler, vm, lines=None):
return retdata, retcode
@pvc_common.Profiler(config)
@ZKConnection(config)
def vm_list(zkhandler, node=None, state=None, limit=None, is_fuzzy=True):
def vm_list(zkhandler, node=None, state=None, tag=None, limit=None, is_fuzzy=True):
"""
Return a list of VMs with limit LIMIT.
"""
retflag, retdata = pvc_vm.get_list(zkhandler, node, state, limit, is_fuzzy)
retflag, retdata = pvc_vm.get_list(zkhandler, node, state, tag, limit, is_fuzzy)
if retflag:
if retdata:
@ -486,7 +461,7 @@ def vm_list(zkhandler, node=None, state=None, limit=None, is_fuzzy=True):
@ZKConnection(config)
def vm_define(zkhandler, xml, node, limit, selector, autostart, migration_method):
def vm_define(zkhandler, xml, node, limit, selector, autostart, migration_method, user_tags=[], protected_tags=[]):
"""
Define a VM from Libvirt XML in the PVC cluster.
"""
@ -497,7 +472,65 @@ def vm_define(zkhandler, xml, node, limit, selector, autostart, migration_method
except Exception as e:
return {'message': 'XML is malformed or incorrect: {}'.format(e)}, 400
retflag, retdata = pvc_vm.define_vm(zkhandler, new_cfg, node, limit, selector, autostart, migration_method, profile=None)
tags = list()
for tag in user_tags:
tags.append({'name': tag, 'type': 'user', 'protected': False})
for tag in protected_tags:
tags.append({'name': tag, 'type': 'user', 'protected': True})
retflag, retdata = pvc_vm.define_vm(zkhandler, new_cfg, node, limit, selector, autostart, migration_method, profile=None, tags=tags)
if retflag:
retcode = 200
else:
retcode = 400
output = {
'message': retdata.replace('\"', '\'')
}
return output, retcode
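As a concrete illustration of the loops above: user_tags=['web'] and protected_tags=['billing'] (example values) yield the following list, which is what pvc_vm.define_vm receives as its tags argument.

# Illustrative result for user_tags=['web'], protected_tags=['billing']
tags = [
    {'name': 'web', 'type': 'user', 'protected': False},
    {'name': 'billing', 'type': 'user', 'protected': True},
]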
@pvc_common.Profiler(config)
@ZKConnection(config)
def get_vm_meta(zkhandler, vm):
"""
Get metadata of a VM.
"""
dom_uuid = pvc_vm.getDomainUUID(zkhandler, vm)
if not dom_uuid:
return {"message": "VM not found."}, 404
domain_node_limit, domain_node_selector, domain_node_autostart, domain_migrate_method = pvc_common.getDomainMetadata(zkhandler, dom_uuid)
retcode = 200
retdata = {
'name': vm,
'node_limit': domain_node_limit,
'node_selector': domain_node_selector,
'node_autostart': domain_node_autostart,
'migration_method': domain_migrate_method
}
return retdata, retcode
@ZKConnection(config)
def update_vm_meta(zkhandler, vm, limit, selector, autostart, provisioner_profile, migration_method):
"""
Update metadata of a VM.
"""
dom_uuid = pvc_vm.getDomainUUID(zkhandler, vm)
if not dom_uuid:
return {"message": "VM not found."}, 404
if autostart is not None:
try:
autostart = bool(strtobool(autostart))
except Exception:
autostart = False
retflag, retdata = pvc_vm.modify_vm_metadata(zkhandler, vm, limit, selector, autostart, provisioner_profile, migration_method)
if retflag:
retcode = 200
@ -511,47 +544,38 @@ def vm_define(zkhandler, xml, node, limit, selector, autostart, migration_method
@ZKConnection(config)
def get_vm_meta(zkhandler, vm):
def get_vm_tags(zkhandler, vm):
"""
Get metadata of a VM.
Get the tags of a VM.
"""
retflag, retdata = pvc_vm.get_list(zkhandler, None, None, vm, is_fuzzy=False)
dom_uuid = pvc_vm.getDomainUUID(zkhandler, vm)
if not dom_uuid:
return {"message": "VM not found."}, 404
if retflag:
if retdata:
retcode = 200
retdata = {
'name': vm,
'node_limit': retdata['node_limit'],
'node_selector': retdata['node_selector'],
'node_autostart': retdata['node_autostart'],
'migration_method': retdata['migration_method']
}
else:
retcode = 404
retdata = {
'message': 'VM not found.'
}
else:
retcode = 400
retdata = {
'message': retdata
}
tags = pvc_common.getDomainTags(zkhandler, dom_uuid)
retcode = 200
retdata = {
'name': vm,
'tags': tags
}
return retdata, retcode
@ZKConnection(config)
def update_vm_meta(zkhandler, vm, limit, selector, autostart, provisioner_profile, migration_method):
def update_vm_tag(zkhandler, vm, action, tag, protected=False):
"""
Update metadata of a VM.
Update a tag of a VM.
"""
if autostart is not None:
try:
autostart = bool(strtobool(autostart))
except Exception:
autostart = False
retflag, retdata = pvc_vm.modify_vm_metadata(zkhandler, vm, limit, selector, autostart, provisioner_profile, migration_method)
if action not in ['add', 'remove']:
return {"message": "Tag action must be one of 'add', 'remove'."}, 400
dom_uuid = pvc_vm.getDomainUUID(zkhandler, vm)
if not dom_uuid:
return {"message": "VM not found."}, 404
retflag, retdata = pvc_vm.modify_vm_tag(zkhandler, vm, action, tag, protected=protected)
if retflag:
retcode = 200
@ -804,7 +828,7 @@ def vm_flush_locks(zkhandler, vm):
"""
Flush locks of a (stopped) VM.
"""
retflag, retdata = pvc_vm.get_list(zkhandler, None, None, vm, is_fuzzy=False)
retflag, retdata = pvc_vm.get_list(zkhandler, None, None, None, vm, is_fuzzy=False)
if retdata[0].get('state') not in ['stop', 'disable']:
return {"message": "VM must be stopped to flush locks"}, 400
@ -825,6 +849,7 @@ def vm_flush_locks(zkhandler, vm):
#
# Network functions
#
@pvc_common.Profiler(config)
@ZKConnection(config)
def net_list(zkhandler, limit=None, is_fuzzy=True):
"""
@ -916,6 +941,7 @@ def net_remove(zkhandler, network):
return output, retcode
@pvc_common.Profiler(config)
@ZKConnection(config)
def net_dhcp_list(zkhandler, network, limit=None, static=False):
"""
@ -976,6 +1002,7 @@ def net_dhcp_remove(zkhandler, network, macaddress):
return output, retcode
@pvc_common.Profiler(config)
@ZKConnection(config)
def net_acl_list(zkhandler, network, limit=None, direction=None, is_fuzzy=True):
"""
@ -1036,6 +1063,82 @@ def net_acl_remove(zkhandler, network, description):
return output, retcode
#
# SR-IOV functions
#
@pvc_common.Profiler(config)
@ZKConnection(config)
def sriov_pf_list(zkhandler, node):
"""
List all PFs on a given node.
"""
retflag, retdata = pvc_network.get_list_sriov_pf(zkhandler, node)
if retflag:
if retdata:
retcode = 200
else:
retcode = 404
retdata = {
'message': 'PF not found.'
}
else:
retcode = 400
retdata = {
'message': retdata
}
return retdata, retcode
@pvc_common.Profiler(config)
@ZKConnection(config)
def sriov_vf_list(zkhandler, node, pf=None):
"""
List all VFs on a given node, optionally limited to PF.
"""
retflag, retdata = pvc_network.get_list_sriov_vf(zkhandler, node, pf)
if retflag:
if retdata:
retcode = 200
else:
retcode = 404
retdata = {
'message': 'VF not found.'
}
else:
retcode = 400
retdata = {
'message': retdata
}
return retdata, retcode
@ZKConnection(config)
def update_sriov_vf_config(zkhandler, node, vf, vlan_id, vlan_qos, tx_rate_min, tx_rate_max, link_state, spoof_check, trust, query_rss):
"""
Update configuration of a VF on NODE.
"""
retflag, retdata = pvc_network.set_sriov_vf_config(zkhandler, node, vf, vlan_id, vlan_qos, tx_rate_min, tx_rate_max, link_state, spoof_check, trust, query_rss)
if retflag:
retcode = 200
else:
retcode = 400
output = {
'message': retdata.replace('\"', '\'')
}
return output, retcode
#
# Ceph functions
#
@ -1069,6 +1172,7 @@ def ceph_util(zkhandler):
return retdata, retcode
@pvc_common.Profiler(config)
@ZKConnection(config)
def ceph_osd_list(zkhandler, limit=None):
"""
@ -1225,6 +1329,7 @@ def ceph_osd_unset(zkhandler, option):
return output, retcode
@pvc_common.Profiler(config)
@ZKConnection(config)
def ceph_pool_list(zkhandler, limit=None, is_fuzzy=True):
"""
@ -1285,6 +1390,7 @@ def ceph_pool_remove(zkhandler, name):
return output, retcode
@pvc_common.Profiler(config)
@ZKConnection(config)
def ceph_volume_list(zkhandler, pool=None, limit=None, is_fuzzy=True):
"""
@ -1539,6 +1645,7 @@ def ceph_volume_upload(zkhandler, pool, volume, img_type):
return output, retcode
@pvc_common.Profiler(config)
@ZKConnection(config)
def ceph_volume_snapshot_list(zkhandler, pool=None, volume=None, limit=None, is_fuzzy=True):
"""

View File

@ -41,6 +41,7 @@ libvirt_header = """<domain type='kvm'>
<bootmenu enable='yes'/>
<boot dev='cdrom'/>
<boot dev='hd'/>
<bios useserial='yes' rebootTimeout='5'/>
</os>
<features>
<acpi/>

View File

@ -1323,6 +1323,30 @@ def create_vm(self, vm_name, vm_profile, define_vm=True, start_vm=True, script_r
vm_architecture=system_architecture
)
# Add disk devices
monitor_list = list()
coordinator_names = config['storage_hosts']
for coordinator in coordinator_names:
monitor_list.append("{}.{}".format(coordinator, config['storage_domain']))
ceph_storage_secret = config['ceph_storage_secret_uuid']
for volume in vm_data['volumes']:
vm_schema += libvirt_schema.devices_disk_header.format(
ceph_storage_secret=ceph_storage_secret,
disk_pool=volume['pool'],
vm_name=vm_name,
disk_id=volume['disk_id']
)
for monitor in monitor_list:
vm_schema += libvirt_schema.devices_disk_coordinator.format(
coordinator_name=monitor,
coordinator_ceph_mon_port=config['ceph_monitor_port']
)
vm_schema += libvirt_schema.devices_disk_footer
vm_schema += libvirt_schema.devices_vhostmd
# Add network devices
network_id = 0
for network in vm_data['networks']:
@ -1364,30 +1388,6 @@ def create_vm(self, vm_name, vm_profile, define_vm=True, start_vm=True, script_r
network_id += 1
# Add disk devices
monitor_list = list()
coordinator_names = config['storage_hosts']
for coordinator in coordinator_names:
monitor_list.append("{}.{}".format(coordinator, config['storage_domain']))
ceph_storage_secret = config['ceph_storage_secret_uuid']
for volume in vm_data['volumes']:
vm_schema += libvirt_schema.devices_disk_header.format(
ceph_storage_secret=ceph_storage_secret,
disk_pool=volume['pool'],
vm_name=vm_name,
disk_id=volume['disk_id']
)
for monitor in monitor_list:
vm_schema += libvirt_schema.devices_disk_coordinator.format(
coordinator_name=monitor,
coordinator_ceph_mon_port=config['ceph_monitor_port']
)
vm_schema += libvirt_schema.devices_disk_footer
vm_schema += libvirt_schema.devices_vhostmd
# Add default devices
vm_schema += libvirt_schema.devices_default

View File

@ -10,9 +10,11 @@ new_ver="${base_ver}~git-$(git rev-parse --short HEAD)"
echo ${new_ver} >&3
# Back up the existing changelog and Daemon.py files
tmpdir=$( mktemp -d )
cp -a debian/changelog node-daemon/pvcnoded/Daemon.py ${tmpdir}/
cp -a debian/changelog client-cli/setup.py ${tmpdir}/
cp -a node-daemon/pvcnoded/Daemon.py ${tmpdir}/node-Daemon.py
cp -a api-daemon/pvcapid/Daemon.py ${tmpdir}/api-Daemon.py
# Replace the "base" version with the git revision version
sed -i "s/version = '${base_ver}'/version = '${new_ver}'/" node-daemon/pvcnoded/Daemon.py
sed -i "s/version = '${base_ver}'/version = '${new_ver}'/" node-daemon/pvcnoded/Daemon.py api-daemon/pvcapid/Daemon.py client-cli/setup.py
sed -i "s/${base_ver}-0/${new_ver}/" debian/changelog
cat <<EOF > debian/changelog
pvc (${new_ver}) unstable; urgency=medium
@ -27,7 +29,10 @@ dh_make -p pvc_${new_ver} --createorig --single --yes
dpkg-buildpackage -us -uc
# Restore original changelog and Daemon.py files
cp -a ${tmpdir}/changelog debian/changelog
cp -a ${tmpdir}/Daemon.py node-daemon/pvcnoded/Daemon.py
cp -a ${tmpdir}/setup.py client-cli/setup.py
cp -a ${tmpdir}/node-Daemon.py node-daemon/pvcnoded/Daemon.py
cp -a ${tmpdir}/api-Daemon.py api-daemon/pvcapid/Daemon.py
# Clean up
rm -r ${tmpdir}
dh_clean

View File

@ -7,7 +7,7 @@ if [[ -z ${new_version} ]]; then
exit 1
fi
current_version="$( grep 'version = ' node-daemon/pvcnoded/Daemon.py | awk -F "'" '{ print $2 }' )"
current_version="$( cat .version )"
echo "${current_version} -> ${new_version}"
changelog_file=$( mktemp )
@ -18,6 +18,8 @@ changelog="$( cat ${changelog_file} | grep -v '^#' | sed 's/^*/ */' )"
sed -i "s,version = '${current_version}',version = '${new_version}'," node-daemon/pvcnoded/Daemon.py
sed -i "s,version = '${current_version}',version = '${new_version}'," api-daemon/pvcapid/Daemon.py
sed -i "s,version='${current_version}',version='${new_version}'," client-cli/setup.py
echo ${new_version} > .version
readme_tmpdir=$( mktemp -d )
cp README.md ${readme_tmpdir}/
@ -47,7 +49,7 @@ echo -e "${deb_changelog_new}" >> ${deb_changelog_file}
echo -e "${deb_changelog_orig}" >> ${deb_changelog_file}
mv ${deb_changelog_file} debian/changelog
git add node-daemon/pvcnoded/Daemon.py api-daemon/pvcapid/Daemon.py README.md docs/index.md debian/changelog
git add node-daemon/pvcnoded/Daemon.py api-daemon/pvcapid/Daemon.py client-cli/setup.py README.md docs/index.md debian/changelog .version
git commit -v
echo

View File


@ -24,8 +24,8 @@ import math
from requests_toolbelt.multipart.encoder import MultipartEncoder, MultipartEncoderMonitor
import cli_lib.ansiprint as ansiprint
from cli_lib.common import UploadProgressBar, call_api
import pvc.cli_lib.ansiprint as ansiprint
from pvc.cli_lib.common import UploadProgressBar, call_api
#
# Supplemental functions
@ -419,6 +419,21 @@ def format_list_osd(osd_list):
osd_rddata_length = _osd_rddata_length
# Format the output header
osd_list_output.append('{bold}{osd_header: <{osd_header_length}} {state_header: <{state_header_length}} {details_header: <{details_header_length}} {read_header: <{read_header_length}} {write_header: <{write_header_length}}{end_bold}'.format(
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
osd_header_length=osd_id_length + osd_node_length + 1,
state_header_length=osd_up_length + osd_in_length + 1,
details_header_length=osd_size_length + osd_pgs_length + osd_weight_length + osd_reweight_length + osd_used_length + osd_free_length + osd_util_length + osd_var_length + 7,
read_header_length=osd_rdops_length + osd_rddata_length + 1,
write_header_length=osd_wrops_length + osd_wrdata_length + 1,
osd_header='OSDs ' + ''.join(['-' for _ in range(5, osd_id_length + osd_node_length)]),
state_header='State ' + ''.join(['-' for _ in range(6, osd_up_length + osd_in_length)]),
details_header='Details ' + ''.join(['-' for _ in range(8, osd_size_length + osd_pgs_length + osd_weight_length + osd_reweight_length + osd_used_length + osd_free_length + osd_util_length + osd_var_length + 6)]),
read_header='Read ' + ''.join(['-' for _ in range(5, osd_rdops_length + osd_rddata_length)]),
write_header='Write ' + ''.join(['-' for _ in range(6, osd_wrops_length + osd_wrdata_length)]))
)
osd_list_output.append('{bold}\
{osd_id: <{osd_id_length}} \
{osd_node: <{osd_node_length}} \
@ -428,13 +443,13 @@ def format_list_osd(osd_list):
{osd_pgs: <{osd_pgs_length}} \
{osd_weight: <{osd_weight_length}} \
{osd_reweight: <{osd_reweight_length}} \
Sp: {osd_used: <{osd_used_length}} \
{osd_used: <{osd_used_length}} \
{osd_free: <{osd_free_length}} \
{osd_util: <{osd_util_length}} \
{osd_var: <{osd_var_length}} \
Rd: {osd_rdops: <{osd_rdops_length}} \
{osd_rdops: <{osd_rdops_length}} \
{osd_rddata: <{osd_rddata_length}} \
Wr: {osd_wrops: <{osd_wrops_length}} \
{osd_wrops: <{osd_wrops_length}} \
{osd_wrdata: <{osd_wrdata_length}} \
{end_bold}'.format(
bold=ansiprint.bold(),
@ -495,13 +510,13 @@ Wr: {osd_wrops: <{osd_wrops_length}} \
{osd_pgs: <{osd_pgs_length}} \
{osd_weight: <{osd_weight_length}} \
{osd_reweight: <{osd_reweight_length}} \
{osd_used: <{osd_used_length}} \
{osd_used: <{osd_used_length}} \
{osd_free: <{osd_free_length}} \
{osd_util: <{osd_util_length}} \
{osd_var: <{osd_var_length}} \
{osd_rdops: <{osd_rdops_length}} \
{osd_rdops: <{osd_rdops_length}} \
{osd_rddata: <{osd_rddata_length}} \
{osd_wrops: <{osd_wrops_length}} \
{osd_wrops: <{osd_wrops_length}} \
{osd_wrdata: <{osd_wrdata_length}} \
{end_bold}'.format(
bold='',
@ -648,7 +663,7 @@ def format_list_pool(pool_list):
pool_name_length = 5
pool_id_length = 3
pool_used_length = 5
pool_usedpct_length = 5
pool_usedpct_length = 6
pool_free_length = 5
pool_num_objects_length = 6
pool_num_clones_length = 7
@ -737,19 +752,32 @@ def format_list_pool(pool_list):
pool_read_data_length = _pool_read_data_length
# Format the output header
pool_list_output.append('{bold}{pool_header: <{pool_header_length}} {objects_header: <{objects_header_length}} {read_header: <{read_header_length}} {write_header: <{write_header_length}}{end_bold}'.format(
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
pool_header_length=pool_id_length + pool_name_length + pool_used_length + pool_usedpct_length + pool_free_length + 4,
objects_header_length=pool_num_objects_length + pool_num_clones_length + pool_num_copies_length + pool_num_degraded_length + 3,
read_header_length=pool_read_ops_length + pool_read_data_length + 1,
write_header_length=pool_write_ops_length + pool_write_data_length + 1,
pool_header='Pools ' + ''.join(['-' for _ in range(6, pool_id_length + pool_name_length + pool_used_length + pool_usedpct_length + pool_free_length + 3)]),
objects_header='Objects ' + ''.join(['-' for _ in range(8, pool_num_objects_length + pool_num_clones_length + pool_num_copies_length + pool_num_degraded_length + 2)]),
read_header='Read ' + ''.join(['-' for _ in range(5, pool_read_ops_length + pool_read_data_length)]),
write_header='Write ' + ''.join(['-' for _ in range(6, pool_write_ops_length + pool_write_data_length)]))
)
pool_list_output.append('{bold}\
{pool_id: <{pool_id_length}} \
{pool_name: <{pool_name_length}} \
{pool_used: <{pool_used_length}} \
{pool_usedpct: <{pool_usedpct_length}} \
{pool_free: <{pool_free_length}} \
Obj: {pool_objects: <{pool_objects_length}} \
{pool_objects: <{pool_objects_length}} \
{pool_clones: <{pool_clones_length}} \
{pool_copies: <{pool_copies_length}} \
{pool_degraded: <{pool_degraded_length}} \
Rd: {pool_read_ops: <{pool_read_ops_length}} \
{pool_read_ops: <{pool_read_ops_length}} \
{pool_read_data: <{pool_read_data_length}} \
Wr: {pool_write_ops: <{pool_write_ops_length}} \
{pool_write_ops: <{pool_write_ops_length}} \
{pool_write_data: <{pool_write_data_length}} \
{end_bold}'.format(
bold=ansiprint.bold(),
@ -770,7 +798,7 @@ Wr: {pool_write_ops: <{pool_write_ops_length}} \
pool_id='ID',
pool_name='Name',
pool_used='Used',
pool_usedpct='%',
pool_usedpct='Used%',
pool_free='Free',
pool_objects='Count',
pool_clones='Clones',
@ -790,13 +818,13 @@ Wr: {pool_write_ops: <{pool_write_ops_length}} \
{pool_used: <{pool_used_length}} \
{pool_usedpct: <{pool_usedpct_length}} \
{pool_free: <{pool_free_length}} \
{pool_objects: <{pool_objects_length}} \
{pool_objects: <{pool_objects_length}} \
{pool_clones: <{pool_clones_length}} \
{pool_copies: <{pool_copies_length}} \
{pool_degraded: <{pool_degraded_length}} \
{pool_read_ops: <{pool_read_ops_length}} \
{pool_read_ops: <{pool_read_ops_length}} \
{pool_read_data: <{pool_read_data_length}} \
{pool_write_ops: <{pool_write_ops_length}} \
{pool_write_ops: <{pool_write_ops_length}} \
{pool_write_data: <{pool_write_data_length}} \
{end_bold}'.format(
bold='',
@ -1057,6 +1085,15 @@ def format_list_volume(volume_list):
volume_features_length = _volume_features_length
# Format the output header
volume_list_output.append('{bold}{volume_header: <{volume_header_length}} {details_header: <{details_header_length}}{end_bold}'.format(
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
volume_header_length=volume_name_length + volume_pool_length + 1,
details_header_length=volume_size_length + volume_objects_length + volume_order_length + volume_format_length + volume_features_length + 4,
volume_header='Volumes ' + ''.join(['-' for _ in range(8, volume_name_length + volume_pool_length)]),
details_header='Details ' + ''.join(['-' for _ in range(8, volume_size_length + volume_objects_length + volume_order_length + volume_format_length + volume_features_length + 3)]))
)
volume_list_output.append('{bold}\
{volume_name: <{volume_name_length}} \
{volume_pool: <{volume_pool_length}} \
@ -1084,7 +1121,7 @@ def format_list_volume(volume_list):
volume_features='Features')
)
for volume_information in volume_list:
for volume_information in sorted(volume_list, key=lambda v: v['pool'] + v['name']):
volume_list_output.append('{bold}\
{volume_name: <{volume_name_length}} \
{volume_pool: <{volume_pool_length}} \
@ -1112,7 +1149,7 @@ def format_list_volume(volume_list):
volume_features=','.join(volume_information['stats']['features']))
)
return '\n'.join(sorted(volume_list_output))
return '\n'.join(volume_list_output)
#
@ -1263,6 +1300,13 @@ def format_list_snapshot(snapshot_list):
snapshot_pool_length = _snapshot_pool_length
# Format the output header
snapshot_list_output.append('{bold}{snapshot_header: <{snapshot_header_length}}{end_bold}'.format(
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
snapshot_header_length=snapshot_name_length + snapshot_volume_length + snapshot_pool_length + 2,
snapshot_header='Snapshots ' + ''.join(['-' for _ in range(10, snapshot_name_length + snapshot_volume_length + snapshot_pool_length + 1)]))
)
snapshot_list_output.append('{bold}\
{snapshot_name: <{snapshot_name_length}} \
{snapshot_volume: <{snapshot_volume_length}} \
@ -1278,7 +1322,7 @@ def format_list_snapshot(snapshot_list):
snapshot_pool='Pool')
)
for snapshot_information in snapshot_list:
for snapshot_information in sorted(snapshot_list, key=lambda s: s['pool'] + s['volume'] + s['snapshot']):
snapshot_name = snapshot_information['snapshot']
snapshot_volume = snapshot_information['volume']
snapshot_pool = snapshot_information['pool']
@ -1297,7 +1341,7 @@ def format_list_snapshot(snapshot_list):
snapshot_pool=snapshot_pool)
)
return '\n'.join(sorted(snapshot_list_output))
return '\n'.join(snapshot_list_output)
#
@ -1365,6 +1409,11 @@ def format_list_benchmark(config, benchmark_information):
benchmark_bandwidth_length[test] = 7
benchmark_iops_length[test] = 6
benchmark_seq_bw_length = 15
benchmark_seq_iops_length = 10
benchmark_rand_bw_length = 15
benchmark_rand_iops_length = 10
for benchmark in benchmark_information:
benchmark_job = benchmark['job']
_benchmark_job_length = len(benchmark_job)
@ -1373,53 +1422,68 @@ def format_list_benchmark(config, benchmark_information):
if benchmark['benchmark_result'] == 'Running':
continue
benchmark_data = json.loads(benchmark['benchmark_result'])
benchmark_bandwidth = dict()
benchmark_iops = dict()
for test in ["seq_read", "seq_write", "rand_read_4K", "rand_write_4K"]:
benchmark_data = json.loads(benchmark['benchmark_result'])
benchmark_bandwidth[test] = format_bytes_tohuman(int(benchmark_data[test]['overall']['bandwidth']) * 1024)
benchmark_iops[test] = format_ops_tohuman(int(benchmark_data[test]['overall']['iops']))
_benchmark_bandwidth_length = len(benchmark_bandwidth[test]) + 1
if _benchmark_bandwidth_length > benchmark_bandwidth_length[test]:
benchmark_bandwidth_length[test] = _benchmark_bandwidth_length
seq_benchmark_bandwidth = "{} / {}".format(benchmark_bandwidth['seq_read'], benchmark_bandwidth['seq_write'])
seq_benchmark_iops = "{} / {}".format(benchmark_iops['seq_read'], benchmark_iops['seq_write'])
rand_benchmark_bandwidth = "{} / {}".format(benchmark_bandwidth['rand_read_4K'], benchmark_bandwidth['rand_write_4K'])
rand_benchmark_iops = "{} / {}".format(benchmark_iops['rand_read_4K'], benchmark_iops['rand_write_4K'])
_benchmark_iops_length = len(benchmark_iops[test]) + 1
if _benchmark_iops_length > benchmark_bandwidth_length[test]:
benchmark_iops_length[test] = _benchmark_iops_length
_benchmark_seq_bw_length = len(seq_benchmark_bandwidth) + 1
if _benchmark_seq_bw_length > benchmark_seq_bw_length:
benchmark_seq_bw_length = _benchmark_seq_bw_length
_benchmark_seq_iops_length = len(seq_benchmark_iops) + 1
if _benchmark_seq_iops_length > benchmark_seq_iops_length:
benchmark_seq_iops_length = _benchmark_seq_iops_length
_benchmark_rand_bw_length = len(rand_benchmark_bandwidth) + 1
if _benchmark_rand_bw_length > benchmark_rand_bw_length:
benchmark_rand_bw_length = _benchmark_rand_bw_length
_benchmark_rand_iops_length = len(rand_benchmark_iops) + 1
if _benchmark_rand_iops_length > benchmark_rand_iops_length:
benchmark_rand_iops_length = _benchmark_rand_iops_length
# Format the output header line 1
benchmark_list_output.append('{bold}\
{benchmark_job: <{benchmark_job_length}} \
{seq_header: <{seq_header_length}} \
{rand_header: <{rand_header_length}} \
{benchmark_job: <{benchmark_job_length}} \
{seq_header: <{seq_header_length}} \
{rand_header: <{rand_header_length}}\
{end_bold}'.format(
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
benchmark_job_length=benchmark_job_length,
seq_header_length=benchmark_bandwidth_length['seq_read'] + benchmark_bandwidth_length['seq_write'] + benchmark_iops_length['seq_read'] + benchmark_iops_length['seq_write'] + 3,
rand_header_length=benchmark_bandwidth_length['rand_read_4K'] + benchmark_bandwidth_length['rand_write_4K'] + benchmark_iops_length['rand_read_4K'] + benchmark_iops_length['rand_write_4K'] + 2,
benchmark_job='Benchmark Job',
seq_header='Sequential (4M blocks):',
rand_header='Random (4K blocks):')
seq_header_length=benchmark_seq_bw_length + benchmark_seq_iops_length + 1,
rand_header_length=benchmark_rand_bw_length + benchmark_rand_iops_length + 1,
benchmark_job='Benchmarks ' + ''.join(['-' for _ in range(11, benchmark_job_length - 1)]),
seq_header='Sequential (4M blocks) ' + ''.join(['-' for _ in range(23, benchmark_seq_bw_length + benchmark_seq_iops_length)]),
rand_header='Random (4K blocks) ' + ''.join(['-' for _ in range(19, benchmark_rand_bw_length + benchmark_rand_iops_length)]))
)
benchmark_list_output.append('{bold}\
{benchmark_job: <{benchmark_job_length}} \
{seq_benchmark_bandwidth: <{seq_benchmark_bandwidth_length}} \
{seq_benchmark_iops: <{seq_benchmark_iops_length}} \
{benchmark_job: <{benchmark_job_length}} \
{seq_benchmark_bandwidth: <{seq_benchmark_bandwidth_length}} \
{seq_benchmark_iops: <{seq_benchmark_iops_length}} \
{rand_benchmark_bandwidth: <{rand_benchmark_bandwidth_length}} \
{rand_benchmark_iops: <{rand_benchmark_iops_length}} \
{rand_benchmark_iops: <{rand_benchmark_iops_length}}\
{end_bold}'.format(
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
benchmark_job_length=benchmark_job_length,
seq_benchmark_bandwidth_length=benchmark_bandwidth_length['seq_read'] + benchmark_bandwidth_length['seq_write'] + 2,
seq_benchmark_iops_length=benchmark_iops_length['seq_read'] + benchmark_iops_length['seq_write'],
rand_benchmark_bandwidth_length=benchmark_bandwidth_length['rand_read_4K'] + benchmark_bandwidth_length['rand_write_4K'] + 1,
rand_benchmark_iops_length=benchmark_iops_length['rand_read_4K'] + benchmark_iops_length['rand_write_4K'],
benchmark_job='',
seq_benchmark_bandwidth_length=benchmark_seq_bw_length,
seq_benchmark_iops_length=benchmark_seq_iops_length,
rand_benchmark_bandwidth_length=benchmark_rand_bw_length,
rand_benchmark_iops_length=benchmark_rand_iops_length,
benchmark_job='Job',
seq_benchmark_bandwidth='R/W Bandwidth/s',
seq_benchmark_iops='R/W IOPS',
rand_benchmark_bandwidth='R/W Bandwidth/s',
@ -1448,19 +1512,19 @@ def format_list_benchmark(config, benchmark_information):
rand_benchmark_iops = "{} / {}".format(benchmark_iops['rand_read_4K'], benchmark_iops['rand_write_4K'])
benchmark_list_output.append('{bold}\
{benchmark_job: <{benchmark_job_length}} \
{seq_benchmark_bandwidth: <{seq_benchmark_bandwidth_length}} \
{seq_benchmark_iops: <{seq_benchmark_iops_length}} \
{benchmark_job: <{benchmark_job_length}} \
{seq_benchmark_bandwidth: <{seq_benchmark_bandwidth_length}} \
{seq_benchmark_iops: <{seq_benchmark_iops_length}} \
{rand_benchmark_bandwidth: <{rand_benchmark_bandwidth_length}} \
{rand_benchmark_iops: <{rand_benchmark_iops_length}} \
{rand_benchmark_iops: <{rand_benchmark_iops_length}}\
{end_bold}'.format(
bold='',
end_bold='',
benchmark_job_length=benchmark_job_length,
seq_benchmark_bandwidth_length=benchmark_bandwidth_length['seq_read'] + benchmark_bandwidth_length['seq_write'] + 2,
seq_benchmark_iops_length=benchmark_iops_length['seq_read'] + benchmark_iops_length['seq_write'],
rand_benchmark_bandwidth_length=benchmark_bandwidth_length['rand_read_4K'] + benchmark_bandwidth_length['rand_write_4K'] + 1,
rand_benchmark_iops_length=benchmark_iops_length['rand_read_4K'] + benchmark_iops_length['rand_write_4K'],
seq_benchmark_bandwidth_length=benchmark_seq_bw_length,
seq_benchmark_iops_length=benchmark_seq_iops_length,
rand_benchmark_bandwidth_length=benchmark_rand_bw_length,
rand_benchmark_iops_length=benchmark_rand_iops_length,
benchmark_job=benchmark_job,
seq_benchmark_bandwidth=seq_benchmark_bandwidth,
seq_benchmark_iops=seq_benchmark_iops,

View File

@ -21,8 +21,8 @@
import json
import cli_lib.ansiprint as ansiprint
from cli_lib.common import call_api
import pvc.cli_lib.ansiprint as ansiprint
from pvc.cli_lib.common import call_api
def initialize(config, overwrite=False):

View File

@ -20,8 +20,8 @@
###############################################################################
import re
import cli_lib.ansiprint as ansiprint
from cli_lib.common import call_api
import pvc.cli_lib.ansiprint as ansiprint
from pvc.cli_lib.common import call_api
def isValidMAC(macaddr):
@ -360,7 +360,6 @@ def net_acl_add(config, net, direction, description, rule, order):
def net_acl_remove(config, net, description):
"""
Remove a network ACL
@ -378,27 +377,135 @@ def net_acl_remove(config, net, description):
return retstatus, response.json().get('message', '')
#
# SR-IOV functions
#
def net_sriov_pf_list(config, node):
"""
List all PFs on NODE
API endpoint: GET /api/v1/sriov/pf/<node>
API arguments: node={node}
API schema: [{json_data_object},{json_data_object},etc.]
"""
response = call_api(config, 'get', '/sriov/pf/{}'.format(node))
if response.status_code == 200:
return True, response.json()
else:
return False, response.json().get('message', '')
def net_sriov_vf_set(config, node, vf, vlan_id, vlan_qos, tx_rate_min, tx_rate_max, link_state, spoof_check, trust, query_rss):
"""
Modify the configuration of an SR-IOV VF
API endpoint: PUT /api/v1/sriov/vf/<node>/<vf>
API arguments: vlan_id={vlan_id}, vlan_qos={vlan_qos}, tx_rate_min={tx_rate_min}, tx_rate_max={tx_rate_max},
link_state={link_state}, spoof_check={spoof_check}, trust={trust}, query_rss={query_rss}
API schema: {"message": "{data}"}
"""
params = dict()
# Update any params that we've sent
if vlan_id is not None:
params['vlan_id'] = vlan_id
if vlan_qos is not None:
params['vlan_qos'] = vlan_qos
if tx_rate_min is not None:
params['tx_rate_min'] = tx_rate_min
if tx_rate_max is not None:
params['tx_rate_max'] = tx_rate_max
if link_state is not None:
params['link_state'] = link_state
if spoof_check is not None:
params['spoof_check'] = spoof_check
if trust is not None:
params['trust'] = trust
if query_rss is not None:
params['query_rss'] = query_rss
# Write the new configuration to the API
response = call_api(config, 'put', '/sriov/vf/{node}/{vf}'.format(node=node, vf=vf), params=params)
if response.status_code == 200:
retstatus = True
else:
retstatus = False
return retstatus, response.json().get('message', '')
def net_sriov_vf_list(config, node, pf=None):
"""
List all VFs on NODE, optionally limited by PF
API endpoint: GET /api/v1/sriov/vf/<node>
API arguments: node={node}, pf={pf}
API schema: [{json_data_object},{json_data_object},etc.]
"""
params = dict()
params['pf'] = pf
response = call_api(config, 'get', '/sriov/vf/{}'.format(node), params=params)
if response.status_code == 200:
return True, response.json()
else:
return False, response.json().get('message', '')
def net_sriov_vf_info(config, node, vf):
"""
Get info about VF on NODE
API endpoint: GET /api/v1/sriov/vf/<node>/<vf>
API arguments:
API schema: [{json_data_object}]
"""
response = call_api(config, 'get', '/sriov/vf/{}/{}'.format(node, vf))
if response.status_code == 200:
if isinstance(response.json(), list) and len(response.json()) != 1:
# No exact match; return not found
return False, "VF not found."
else:
# Return a single instance if the response is a list
if isinstance(response.json(), list):
return True, response.json()[0]
# This shouldn't happen, but is here just in case
else:
return True, response.json()
else:
return False, response.json().get('message', '')
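A hedged usage sketch for the CLI-library helpers above; the import path mirrors the new pvc.cli_lib namespace, while the config keys and values are assumptions modelled on how call_api() is typically configured:

from pvc.cli_lib.network import net_sriov_vf_list, net_sriov_vf_set

# All values below are illustrative assumptions
config = {
    'debug': False,
    'api_scheme': 'http',
    'api_host': 'localhost',
    'api_port': 7370,
    'api_prefix': '/api/v1',
    'api_key': 'changeme',
}

# List the VFs of one PF on a node, then adjust a single VF's vLAN and link state
ok, vf_list = net_sriov_vf_list(config, 'hv1', pf='ens1f0')
if ok:
    for vf in vf_list:
        print(vf['phy'], vf['usage']['used'])

ok, msg = net_sriov_vf_set(
    config, 'hv1', 'ens1f0v3',
    vlan_id=100, vlan_qos=None, tx_rate_min=None, tx_rate_max=None,
    link_state='auto', spoof_check=None, trust=None, query_rss=None,
)
print(msg)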
#
# Output display functions
#
def getOutputColours(network_information):
if network_information['ip6']['network'] != "None":
v6_flag_colour = ansiprint.green()
def getColour(value):
if value in ['True', "start"]:
return ansiprint.green()
elif value in ["restart", "shutdown"]:
return ansiprint.yellow()
elif value in ["stop", "fail"]:
return ansiprint.red()
else:
v6_flag_colour = ansiprint.blue()
if network_information['ip4']['network'] != "None":
v4_flag_colour = ansiprint.green()
else:
v4_flag_colour = ansiprint.blue()
return ansiprint.blue()
if network_information['ip6']['dhcp_flag'] == "True":
dhcp6_flag_colour = ansiprint.green()
else:
dhcp6_flag_colour = ansiprint.blue()
if network_information['ip4']['dhcp_flag'] == "True":
dhcp4_flag_colour = ansiprint.green()
else:
dhcp4_flag_colour = ansiprint.blue()
def getOutputColours(network_information):
v6_flag_colour = getColour(network_information['ip6']['network'])
v4_flag_colour = getColour(network_information['ip4']['network'])
dhcp6_flag_colour = getColour(network_information['ip6']['dhcp_flag'])
dhcp4_flag_colour = getColour(network_information['ip4']['dhcp_flag'])
return v6_flag_colour, v4_flag_colour, dhcp6_flag_colour, dhcp4_flag_colour
@ -492,6 +599,14 @@ def format_list(config, network_list):
net_domain_length = _net_domain_length
# Format the string (header)
network_list_output.append('{bold}{networks_header: <{networks_header_length}} {config_header: <{config_header_length}}{end_bold}'.format(
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
networks_header_length=net_vni_length + net_description_length + 1,
config_header_length=net_nettype_length + net_domain_length + net_v6_flag_length + net_dhcp6_flag_length + net_v4_flag_length + net_dhcp4_flag_length + 6,
networks_header='Networks ' + ''.join(['-' for _ in range(9, net_vni_length + net_description_length)]),
config_header='Config ' + ''.join(['-' for _ in range(7, net_nettype_length + net_domain_length + net_v6_flag_length + net_dhcp6_flag_length + net_v4_flag_length + net_dhcp4_flag_length + 5)]))
)
network_list_output.append('{bold}\
{net_vni: <{net_vni_length}} \
{net_description: <{net_description_length}} \
@ -522,7 +637,7 @@ def format_list(config, network_list):
net_dhcp4_flag='DHCPv4')
)
for network_information in network_list:
for network_information in sorted(network_list, key=lambda n: int(n['vni'])):
v6_flag_colour, v4_flag_colour, dhcp6_flag_colour, dhcp4_flag_colour = getOutputColours(network_information)
if network_information['ip4']['network'] != "None":
v4_flag = 'True'
@ -569,7 +684,7 @@ def format_list(config, network_list):
colour_off=ansiprint.end())
)
return '\n'.join(sorted(network_list_output))
return '\n'.join(network_list_output)
def format_list_dhcp(dhcp_lease_list):
@ -579,7 +694,7 @@ def format_list_dhcp(dhcp_lease_list):
lease_hostname_length = 9
lease_ip4_address_length = 11
lease_mac_address_length = 13
lease_timestamp_length = 13
lease_timestamp_length = 10
for dhcp_lease_information in dhcp_lease_list:
# hostname column
_lease_hostname_length = len(str(dhcp_lease_information['hostname'])) + 1
@ -593,8 +708,19 @@ def format_list_dhcp(dhcp_lease_list):
_lease_mac_address_length = len(str(dhcp_lease_information['mac_address'])) + 1
if _lease_mac_address_length > lease_mac_address_length:
lease_mac_address_length = _lease_mac_address_length
# timestamp column
_lease_timestamp_length = len(str(dhcp_lease_information['timestamp'])) + 1
if _lease_timestamp_length > lease_timestamp_length:
lease_timestamp_length = _lease_timestamp_length
# Format the string (header)
dhcp_lease_list_output.append('{bold}{lease_header: <{lease_header_length}}{end_bold}'.format(
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
lease_header_length=lease_hostname_length + lease_ip4_address_length + lease_mac_address_length + lease_timestamp_length + 3,
lease_header='Leases ' + ''.join(['-' for _ in range(7, lease_hostname_length + lease_ip4_address_length + lease_mac_address_length + lease_timestamp_length + 2)]))
)
dhcp_lease_list_output.append('{bold}\
{lease_hostname: <{lease_hostname_length}} \
{lease_ip4_address: <{lease_ip4_address_length}} \
@ -613,7 +739,7 @@ def format_list_dhcp(dhcp_lease_list):
lease_timestamp='Timestamp')
)
for dhcp_lease_information in dhcp_lease_list:
for dhcp_lease_information in sorted(dhcp_lease_list, key=lambda l: l['hostname']):
dhcp_lease_list_output.append('{bold}\
{lease_hostname: <{lease_hostname_length}} \
{lease_ip4_address: <{lease_ip4_address_length}} \
@ -632,7 +758,7 @@ def format_list_dhcp(dhcp_lease_list):
lease_timestamp=str(dhcp_lease_information['timestamp']))
)
return '\n'.join(sorted(dhcp_lease_list_output))
return '\n'.join(dhcp_lease_list_output)
def format_list_acl(acl_list):
@ -662,6 +788,13 @@ def format_list_acl(acl_list):
acl_rule_length = _acl_rule_length
# Format the string (header)
acl_list_output.append('{bold}{acl_header: <{acl_header_length}}{end_bold}'.format(
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
acl_header_length=acl_direction_length + acl_order_length + acl_description_length + acl_rule_length + 3,
acl_header='ACLs ' + ''.join(['-' for _ in range(5, acl_direction_length + acl_order_length + acl_description_length + acl_rule_length + 2)]))
)
acl_list_output.append('{bold}\
{acl_direction: <{acl_direction_length}} \
{acl_order: <{acl_order_length}} \
@ -680,7 +813,7 @@ def format_list_acl(acl_list):
acl_rule='Rule')
)
for acl_information in acl_list:
for acl_information in sorted(acl_list, key=lambda l: l['direction'] + str(l['order'])):
acl_list_output.append('{bold}\
{acl_direction: <{acl_direction_length}} \
{acl_order: <{acl_order_length}} \
@ -699,4 +832,264 @@ def format_list_acl(acl_list):
acl_rule=acl_information['rule'])
)
return '\n'.join(sorted(acl_list_output))
return '\n'.join(acl_list_output)
def format_list_sriov_pf(pf_list):
# The maximum column width of the VFs column
max_vfs_length = 70
# Handle when we get an empty entry
if not pf_list:
pf_list = list()
pf_list_output = []
# Determine optimal column widths
pf_phy_length = 6
pf_mtu_length = 4
pf_vfs_length = 4
for pf_information in pf_list:
# phy column
_pf_phy_length = len(str(pf_information['phy'])) + 1
if _pf_phy_length > pf_phy_length:
pf_phy_length = _pf_phy_length
# mtu column
_pf_mtu_length = len(str(pf_information['mtu'])) + 1
if _pf_mtu_length > pf_mtu_length:
pf_mtu_length = _pf_mtu_length
# vfs column
_pf_vfs_length = len(str(', '.join(pf_information['vfs']))) + 1
if _pf_vfs_length > pf_vfs_length:
pf_vfs_length = _pf_vfs_length
# We handle columnizing very long lists later
if pf_vfs_length > max_vfs_length:
pf_vfs_length = max_vfs_length
# Format the string (header)
pf_list_output.append('{bold}{pf_header: <{pf_header_length}}{end_bold}'.format(
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
pf_header_length=pf_phy_length + pf_mtu_length + pf_vfs_length + 2,
pf_header='PFs ' + ''.join(['-' for _ in range(4, pf_phy_length + pf_mtu_length + pf_vfs_length + 1)]))
)
pf_list_output.append('{bold}\
{pf_phy: <{pf_phy_length}} \
{pf_mtu: <{pf_mtu_length}} \
{pf_vfs: <{pf_vfs_length}} \
{end_bold}'.format(
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
pf_phy_length=pf_phy_length,
pf_mtu_length=pf_mtu_length,
pf_vfs_length=pf_vfs_length,
pf_phy='Device',
pf_mtu='MTU',
pf_vfs='VFs')
)
for pf_information in sorted(pf_list, key=lambda p: p['phy']):
# Figure out how to nicely columnize our list
nice_vfs_list = [list()]
vfs_lines = 0
cur_vfs_length = 0
for vfs in pf_information['vfs']:
vfs_len = len(vfs)
cur_vfs_length += vfs_len + 2 # for the comma and space
if cur_vfs_length > max_vfs_length:
cur_vfs_length = 0
vfs_lines += 1
nice_vfs_list.append(list())
nice_vfs_list[vfs_lines].append(vfs)
# Append the lines
pf_list_output.append('{bold}\
{pf_phy: <{pf_phy_length}} \
{pf_mtu: <{pf_mtu_length}} \
{pf_vfs: <{pf_vfs_length}} \
{end_bold}'.format(
bold='',
end_bold='',
pf_phy_length=pf_phy_length,
pf_mtu_length=pf_mtu_length,
pf_vfs_length=pf_vfs_length,
pf_phy=pf_information['phy'],
pf_mtu=pf_information['mtu'],
pf_vfs=', '.join(nice_vfs_list[0]))
)
if len(nice_vfs_list) > 1:
for idx in range(1, len(nice_vfs_list)):
pf_list_output.append('{bold}\
{pf_phy: <{pf_phy_length}} \
{pf_mtu: <{pf_mtu_length}} \
{pf_vfs: <{pf_vfs_length}} \
{end_bold}'.format(
bold='',
end_bold='',
pf_phy_length=pf_phy_length,
pf_mtu_length=pf_mtu_length,
pf_vfs_length=pf_vfs_length,
pf_phy='',
pf_mtu='',
pf_vfs=', '.join(nice_vfs_list[idx]))
)
return '\n'.join(pf_list_output)
def format_list_sriov_vf(vf_list):
# Handle when we get an empty entry
if not vf_list:
vf_list = list()
vf_list_output = []
# Determine optimal column widths
vf_phy_length = 4
vf_pf_length = 3
vf_mtu_length = 4
vf_mac_length = 11
vf_used_length = 5
vf_domain_length = 5
for vf_information in vf_list:
# phy column
_vf_phy_length = len(str(vf_information['phy'])) + 1
if _vf_phy_length > vf_phy_length:
vf_phy_length = _vf_phy_length
# pf column
_vf_pf_length = len(str(vf_information['pf'])) + 1
if _vf_pf_length > vf_pf_length:
vf_pf_length = _vf_pf_length
# mtu column
_vf_mtu_length = len(str(vf_information['mtu'])) + 1
if _vf_mtu_length > vf_mtu_length:
vf_mtu_length = _vf_mtu_length
# mac column
_vf_mac_length = len(str(vf_information['mac'])) + 1
if _vf_mac_length > vf_mac_length:
vf_mac_length = _vf_mac_length
# used column
_vf_used_length = len(str(vf_information['usage']['used'])) + 1
if _vf_used_length > vf_used_length:
vf_used_length = _vf_used_length
# domain column
_vf_domain_length = len(str(vf_information['usage']['domain'])) + 1
if _vf_domain_length > vf_domain_length:
vf_domain_length = _vf_domain_length
# Format the string (header)
vf_list_output.append('{bold}{vf_header: <{vf_header_length}}{end_bold}'.format(
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
vf_header_length=vf_phy_length + vf_pf_length + vf_mtu_length + vf_mac_length + vf_used_length + vf_domain_length + 5,
vf_header='VFs ' + ''.join(['-' for _ in range(4, vf_phy_length + vf_pf_length + vf_mtu_length + vf_mac_length + vf_used_length + vf_domain_length + 4)]))
)
vf_list_output.append('{bold}\
{vf_phy: <{vf_phy_length}} \
{vf_pf: <{vf_pf_length}} \
{vf_mtu: <{vf_mtu_length}} \
{vf_mac: <{vf_mac_length}} \
{vf_used: <{vf_used_length}} \
{vf_domain: <{vf_domain_length}} \
{end_bold}'.format(
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
vf_phy_length=vf_phy_length,
vf_pf_length=vf_pf_length,
vf_mtu_length=vf_mtu_length,
vf_mac_length=vf_mac_length,
vf_used_length=vf_used_length,
vf_domain_length=vf_domain_length,
vf_phy='Device',
vf_pf='PF',
vf_mtu='MTU',
vf_mac='MAC Address',
vf_used='Used',
vf_domain='Domain')
)
for vf_information in sorted(vf_list, key=lambda v: v['phy']):
vf_domain = vf_information['usage']['domain']
if not vf_domain:
vf_domain = 'N/A'
vf_list_output.append('{bold}\
{vf_phy: <{vf_phy_length}} \
{vf_pf: <{vf_pf_length}} \
{vf_mtu: <{vf_mtu_length}} \
{vf_mac: <{vf_mac_length}} \
{vf_used: <{vf_used_length}} \
{vf_domain: <{vf_domain_length}} \
{end_bold}'.format(
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
vf_phy_length=vf_phy_length,
vf_pf_length=vf_pf_length,
vf_mtu_length=vf_mtu_length,
vf_mac_length=vf_mac_length,
vf_used_length=vf_used_length,
vf_domain_length=vf_domain_length,
vf_phy=vf_information['phy'],
vf_pf=vf_information['pf'],
vf_mtu=vf_information['mtu'],
vf_mac=vf_information['mac'],
vf_used=vf_information['usage']['used'],
vf_domain=vf_domain)
)
return '\n'.join(vf_list_output)
def format_info_sriov_vf(config, vf_information, node):
if not vf_information:
return "No VF found"
# Get information on the using VM if applicable
if vf_information['usage']['used'] == 'True' and vf_information['usage']['domain']:
vm_information = call_api(config, 'get', '/vm/{vm}'.format(vm=vf_information['usage']['domain'])).json()
if isinstance(vm_information, list) and len(vm_information) > 0:
vm_information = vm_information[0]
else:
vm_information = None
# Format a nice output: do this line-by-line then concat the elements at the end
ainformation = []
ainformation.append('{}SR-IOV VF information:{}'.format(ansiprint.bold(), ansiprint.end()))
ainformation.append('')
# Basic information
ainformation.append('{}PHY:{} {}'.format(ansiprint.purple(), ansiprint.end(), vf_information['phy']))
ainformation.append('{}PF:{} {} @ {}'.format(ansiprint.purple(), ansiprint.end(), vf_information['pf'], node))
ainformation.append('{}MTU:{} {}'.format(ansiprint.purple(), ansiprint.end(), vf_information['mtu']))
ainformation.append('{}MAC Address:{} {}'.format(ansiprint.purple(), ansiprint.end(), vf_information['mac']))
ainformation.append('')
# Configuration information
ainformation.append('{}vLAN ID:{} {}'.format(ansiprint.purple(), ansiprint.end(), vf_information['config']['vlan_id']))
ainformation.append('{}vLAN QOS priority:{} {}'.format(ansiprint.purple(), ansiprint.end(), vf_information['config']['vlan_qos']))
ainformation.append('{}Minimum TX Rate:{} {}'.format(ansiprint.purple(), ansiprint.end(), vf_information['config']['tx_rate_min']))
ainformation.append('{}Maximum TX Rate:{} {}'.format(ansiprint.purple(), ansiprint.end(), vf_information['config']['tx_rate_max']))
ainformation.append('{}Link State:{} {}'.format(ansiprint.purple(), ansiprint.end(), vf_information['config']['link_state']))
ainformation.append('{}Spoof Checking:{} {}{}{}'.format(ansiprint.purple(), ansiprint.end(), getColour(vf_information['config']['spoof_check']), vf_information['config']['spoof_check'], ansiprint.end()))
ainformation.append('{}VF User Trust:{} {}{}{}'.format(ansiprint.purple(), ansiprint.end(), getColour(vf_information['config']['trust']), vf_information['config']['trust'], ansiprint.end()))
ainformation.append('{}Query RSS Config:{} {}{}{}'.format(ansiprint.purple(), ansiprint.end(), getColour(vf_information['config']['query_rss']), vf_information['config']['query_rss'], ansiprint.end()))
ainformation.append('')
# PCIe bus information
ainformation.append('{}PCIe domain:{} {}'.format(ansiprint.purple(), ansiprint.end(), vf_information['pci']['domain']))
ainformation.append('{}PCIe bus:{} {}'.format(ansiprint.purple(), ansiprint.end(), vf_information['pci']['bus']))
ainformation.append('{}PCIe slot:{} {}'.format(ansiprint.purple(), ansiprint.end(), vf_information['pci']['slot']))
ainformation.append('{}PCIe function:{} {}'.format(ansiprint.purple(), ansiprint.end(), vf_information['pci']['function']))
ainformation.append('')
# Usage information
ainformation.append('{}VF Used:{} {}{}{}'.format(ansiprint.purple(), ansiprint.end(), getColour(vf_information['usage']['used']), vf_information['usage']['used'], ansiprint.end()))
if vf_information['usage']['used'] == 'True' and vm_information is not None:
ainformation.append('{}Using Domain:{} {} ({}) ({}{}{})'.format(ansiprint.purple(), ansiprint.end(), vf_information['usage']['domain'], vm_information['name'], getColour(vm_information['state']), vm_information['state'], ansiprint.end()))
else:
ainformation.append('{}Using Domain:{} N/A'.format(ansiprint.purple(), ansiprint.end()))
# Join it all together
return '\n'.join(ainformation)

View File

@ -19,8 +19,10 @@
#
###############################################################################
import cli_lib.ansiprint as ansiprint
from cli_lib.common import call_api
import time
import pvc.cli_lib.ansiprint as ansiprint
from pvc.cli_lib.common import call_api
#
@ -69,6 +71,89 @@ def node_domain_state(config, node, action, wait):
return retstatus, response.json().get('message', '')
def view_node_log(config, node, lines=100):
"""
Return node log lines from the API (and display them in a pager in the main CLI)
API endpoint: GET /node/{node}/log
API arguments: lines={lines}
API schema: {"name":"{node}","data":"{node_log}"}
"""
params = {
'lines': lines
}
response = call_api(config, 'get', '/node/{node}/log'.format(node=node), params=params)
if response.status_code != 200:
return False, response.json().get('message', '')
node_log = response.json()['data']
# Shrink the log buffer to length lines
shrunk_log = node_log.split('\n')[-lines:]
loglines = '\n'.join(shrunk_log)
return True, loglines
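A hedged usage sketch of the one-shot call above; the node name is hypothetical, and the pager call assumes click is available, as in the main CLI:
# Illustrative only: fetch the last 50 log lines from a hypothetical node and page them.
import click
retcode, retdata = view_node_log(config, 'hv1', lines=50)
if retcode:
    click.echo_via_pager(retdata)
else:
    click.echo(retdata)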
def follow_node_log(config, node, lines=10):
"""
Return and follow node log lines from the API
API endpoint: GET /node/{node}/log
API arguments: lines={lines}
API schema: {"name":"{nodename}","data":"{node_log}"}
"""
# We always grab 200 to match the follow call, but only _show_ `lines` number
params = {
'lines': 200
}
response = call_api(config, 'get', '/node/{node}/log'.format(node=node), params=params)
if response.status_code != 200:
return False, response.json().get('message', '')
# Shrink the log buffer to length lines
node_log = response.json()['data']
shrunk_log = node_log.split('\n')[-int(lines):]
loglines = '\n'.join(shrunk_log)
# Print the initial data and begin following
print(loglines, end='')
print('\n', end='')
while True:
# Grab the next line set (200 is a reasonable number of lines per half-second; any more are skipped)
try:
params = {
'lines': 200
}
response = call_api(config, 'get', '/node/{node}/log'.format(node=node), params=params)
new_node_log = response.json()['data']
except Exception:
break
# Split the new and old log strings into constituent lines
old_node_loglines = node_log.split('\n')
new_node_loglines = new_node_log.split('\n')
# Set the node log to the new log value for the next iteration
node_log = new_node_log
# Get the difference between the two sets of lines
old_node_loglines_set = set(old_node_loglines)
diff_node_loglines = [x for x in new_node_loglines if x not in old_node_loglines_set]
# If there's a difference, print it out
if len(diff_node_loglines) > 0:
print('\n'.join(diff_node_loglines), end='')
print('\n', end='')
# Wait half a second
time.sleep(0.5)
return True, ''
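The follow loop above prints only lines that were absent from the previous poll, using a set difference. A minimal self-contained sketch of that step, with made-up log lines:
# Illustrative sketch of the diffing step used above; the sample data is hypothetical.
old_log = "first line\nsecond line"
new_log = "first line\nsecond line\nthird line"
old_lines = set(old_log.split('\n'))
new_lines = new_log.split('\n')
diff_lines = [line for line in new_lines if line not in old_lines]
if diff_lines:
    print('\n'.join(diff_lines))  # prints "third line"
# Note: a line that repeats verbatim in a later poll is suppressed by this approach,
# an acceptable trade-off for a half-second polling interval.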
def node_info(config, node):
"""
Get information about node
@ -169,6 +254,7 @@ def format_info(node_information, long_output):
ainformation = []
# Basic information
ainformation.append('{}Name:{} {}'.format(ansiprint.purple(), ansiprint.end(), node_information['name']))
ainformation.append('{}PVC Version:{} {}'.format(ansiprint.purple(), ansiprint.end(), node_information['pvc_version']))
ainformation.append('{}Daemon State:{} {}{}{}'.format(ansiprint.purple(), ansiprint.end(), daemon_state_colour, node_information['daemon_state'], ansiprint.end()))
ainformation.append('{}Coordinator State:{} {}{}{}'.format(ansiprint.purple(), ansiprint.end(), coordinator_state_colour, node_information['coordinator_state'], ansiprint.end()))
ainformation.append('{}Domain State:{} {}{}{}'.format(ansiprint.purple(), ansiprint.end(), domain_state_colour, node_information['domain_state'], ansiprint.end()))
@ -204,9 +290,10 @@ def format_list(node_list, raw):
# Determine optimal column widths
node_name_length = 5
pvc_version_length = 8
daemon_state_length = 7
coordinator_state_length = 12
domain_state_length = 8
domain_state_length = 7
domains_count_length = 4
cpu_count_length = 6
load_length = 5
@ -220,6 +307,10 @@ def format_list(node_list, raw):
_node_name_length = len(node_information['name']) + 1
if _node_name_length > node_name_length:
node_name_length = _node_name_length
# node_pvc_version column
_pvc_version_length = len(node_information.get('pvc_version', 'N/A')) + 1
if _pvc_version_length > pvc_version_length:
pvc_version_length = _pvc_version_length
# daemon_state column
_daemon_state_length = len(node_information['daemon_state']) + 1
if _daemon_state_length > daemon_state_length:
@ -268,11 +359,27 @@ def format_list(node_list, raw):
# Format the string (header)
node_list_output.append(
'{bold}{node_name: <{node_name_length}} \
St: {daemon_state_colour}{node_daemon_state: <{daemon_state_length}}{end_colour} {coordinator_state_colour}{node_coordinator_state: <{coordinator_state_length}}{end_colour} {domain_state_colour}{node_domain_state: <{domain_state_length}}{end_colour} \
Res: {node_domains_count: <{domains_count_length}} {node_cpu_count: <{cpu_count_length}} {node_load: <{load_length}} \
Mem (M): {node_mem_total: <{mem_total_length}} {node_mem_used: <{mem_used_length}} {node_mem_free: <{mem_free_length}} {node_mem_allocated: <{mem_alloc_length}} {node_mem_provisioned: <{mem_prov_length}}{end_bold}'.format(
'{bold}{node_header: <{node_header_length}} {state_header: <{state_header_length}} {resource_header: <{resource_header_length}} {memory_header: <{memory_header_length}}{end_bold}'.format(
node_header_length=node_name_length + pvc_version_length + 1,
state_header_length=daemon_state_length + coordinator_state_length + domain_state_length + 2,
resource_header_length=domains_count_length + cpu_count_length + load_length + 2,
memory_header_length=mem_total_length + mem_used_length + mem_free_length + mem_alloc_length + mem_prov_length + 4,
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
node_header='Nodes ' + ''.join(['-' for _ in range(6, node_name_length + pvc_version_length)]),
state_header='States ' + ''.join(['-' for _ in range(7, daemon_state_length + coordinator_state_length + domain_state_length + 1)]),
resource_header='Resources ' + ''.join(['-' for _ in range(10, domains_count_length + cpu_count_length + load_length + 1)]),
memory_header='Memory (M) ' + ''.join(['-' for _ in range(11, mem_total_length + mem_used_length + mem_free_length + mem_alloc_length + mem_prov_length + 3)])
)
)
node_list_output.append(
'{bold}{node_name: <{node_name_length}} {node_pvc_version: <{pvc_version_length}} \
{daemon_state_colour}{node_daemon_state: <{daemon_state_length}}{end_colour} {coordinator_state_colour}{node_coordinator_state: <{coordinator_state_length}}{end_colour} {domain_state_colour}{node_domain_state: <{domain_state_length}}{end_colour} \
{node_domains_count: <{domains_count_length}} {node_cpu_count: <{cpu_count_length}} {node_load: <{load_length}} \
{node_mem_total: <{mem_total_length}} {node_mem_used: <{mem_used_length}} {node_mem_free: <{mem_free_length}} {node_mem_allocated: <{mem_alloc_length}} {node_mem_provisioned: <{mem_prov_length}}{end_bold}'.format(
node_name_length=node_name_length,
pvc_version_length=pvc_version_length,
daemon_state_length=daemon_state_length,
coordinator_state_length=coordinator_state_length,
domain_state_length=domain_state_length,
@ -291,6 +398,7 @@ Mem (M): {node_mem_total: <{mem_total_length}} {node_mem_used: <{mem_used_length
domain_state_colour='',
end_colour='',
node_name='Name',
node_pvc_version='Version',
node_daemon_state='Daemon',
node_coordinator_state='Coordinator',
node_domain_state='Domain',
@ -306,14 +414,15 @@ Mem (M): {node_mem_total: <{mem_total_length}} {node_mem_used: <{mem_used_length
)
# Format the string (elements)
for node_information in node_list:
for node_information in sorted(node_list, key=lambda n: n['name']):
daemon_state_colour, coordinator_state_colour, domain_state_colour, mem_allocated_colour, mem_provisioned_colour = getOutputColours(node_information)
node_list_output.append(
'{bold}{node_name: <{node_name_length}} \
{daemon_state_colour}{node_daemon_state: <{daemon_state_length}}{end_colour} {coordinator_state_colour}{node_coordinator_state: <{coordinator_state_length}}{end_colour} {domain_state_colour}{node_domain_state: <{domain_state_length}}{end_colour} \
{node_domains_count: <{domains_count_length}} {node_cpu_count: <{cpu_count_length}} {node_load: <{load_length}} \
{node_mem_total: <{mem_total_length}} {node_mem_used: <{mem_used_length}} {node_mem_free: <{mem_free_length}} {mem_allocated_colour}{node_mem_allocated: <{mem_alloc_length}}{end_colour} {mem_provisioned_colour}{node_mem_provisioned: <{mem_prov_length}}{end_colour}{end_bold}'.format(
'{bold}{node_name: <{node_name_length}} {node_pvc_version: <{pvc_version_length}} \
{daemon_state_colour}{node_daemon_state: <{daemon_state_length}}{end_colour} {coordinator_state_colour}{node_coordinator_state: <{coordinator_state_length}}{end_colour} {domain_state_colour}{node_domain_state: <{domain_state_length}}{end_colour} \
{node_domains_count: <{domains_count_length}} {node_cpu_count: <{cpu_count_length}} {node_load: <{load_length}} \
{node_mem_total: <{mem_total_length}} {node_mem_used: <{mem_used_length}} {node_mem_free: <{mem_free_length}} {mem_allocated_colour}{node_mem_allocated: <{mem_alloc_length}}{end_colour} {mem_provisioned_colour}{node_mem_provisioned: <{mem_prov_length}}{end_colour}{end_bold}'.format(
node_name_length=node_name_length,
pvc_version_length=pvc_version_length,
daemon_state_length=daemon_state_length,
coordinator_state_length=coordinator_state_length,
domain_state_length=domain_state_length,
@ -334,6 +443,7 @@ Mem (M): {node_mem_total: <{mem_total_length}} {node_mem_used: <{mem_used_length
mem_provisioned_colour=mem_allocated_colour,
end_colour=ansiprint.end(),
node_name=node_information['name'],
node_pvc_version=node_information.get('pvc_version', 'N/A'),
node_daemon_state=node_information['daemon_state'],
node_coordinator_state=node_information['coordinator_state'],
node_domain_state=node_information['domain_state'],
@ -348,4 +458,4 @@ Mem (M): {node_mem_total: <{mem_total_length}} {node_mem_used: <{mem_used_length
)
)
return '\n'.join(sorted(node_list_output))
return '\n'.join(node_list_output)

View File

@ -19,12 +19,10 @@
#
###############################################################################
import ast
from requests_toolbelt.multipart.encoder import MultipartEncoder, MultipartEncoderMonitor
import cli_lib.ansiprint as ansiprint
from cli_lib.common import UploadProgressBar, call_api
import pvc.cli_lib.ansiprint as ansiprint
from pvc.cli_lib.common import UploadProgressBar, call_api
#
@ -721,10 +719,10 @@ def task_status(config, task_id=None, is_watching=False):
task['type'] = task_type
task['worker'] = task_host
task['id'] = task_job.get('id')
task_args = ast.literal_eval(task_job.get('args'))
task_args = task_job.get('args')
task['vm_name'] = task_args[0]
task['vm_profile'] = task_args[1]
task_kwargs = ast.literal_eval(task_job.get('kwargs'))
task_kwargs = task_job.get('kwargs')
task['vm_define'] = str(bool(task_kwargs['define_vm']))
task['vm_start'] = str(bool(task_kwargs['start_vm']))
task_data.append(task)
@ -755,22 +753,16 @@ def format_list_template(template_data, template_type=None):
normalized_template_data = template_data
if 'system' in template_types:
ainformation.append('System templates:')
ainformation.append('')
ainformation.append(format_list_template_system(normalized_template_data['system_templates']))
if len(template_types) > 1:
ainformation.append('')
if 'network' in template_types:
ainformation.append('Network templates:')
ainformation.append('')
ainformation.append(format_list_template_network(normalized_template_data['network_templates']))
if len(template_types) > 1:
ainformation.append('')
if 'storage' in template_types:
ainformation.append('Storage templates:')
ainformation.append('')
ainformation.append(format_list_template_storage(normalized_template_data['storage_templates']))
return '\n'.join(ainformation)
@ -783,13 +775,13 @@ def format_list_template_system(template_data):
template_list_output = []
# Determine optimal column widths
template_name_length = 5
template_id_length = 3
template_name_length = 15
template_id_length = 5
template_vcpu_length = 6
template_vram_length = 10
template_vram_length = 9
template_serial_length = 7
template_vnc_length = 4
template_vnc_bind_length = 10
template_vnc_bind_length = 9
template_node_limit_length = 6
template_node_selector_length = 9
template_node_autostart_length = 10
@ -842,16 +834,33 @@ def format_list_template_system(template_data):
template_migration_method_length = _template_migration_method_length
# Format the string (header)
template_list_output_header = '{bold}{template_name: <{template_name_length}} {template_id: <{template_id_length}} \
template_list_output.append('{bold}{template_header: <{template_header_length}} {resources_header: <{resources_header_length}} {consoles_header: <{consoles_header_length}} {metadata_header: <{metadata_header_length}}{end_bold}'.format(
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
template_header_length=template_name_length + template_id_length + 1,
resources_header_length=template_vcpu_length + template_vram_length + 1,
consoles_header_length=template_serial_length + template_vnc_length + template_vnc_bind_length + 2,
metadata_header_length=template_node_limit_length + template_node_selector_length + template_node_autostart_length + template_migration_method_length + 3,
template_header='System Templates ' + ''.join(['-' for _ in range(17, template_name_length + template_id_length)]),
resources_header='Resources ' + ''.join(['-' for _ in range(10, template_vcpu_length + template_vram_length)]),
consoles_header='Consoles ' + ''.join(['-' for _ in range(9, template_serial_length + template_vnc_length + template_vnc_bind_length + 1)]),
metadata_header='Metadata ' + ''.join(['-' for _ in range(9, template_node_limit_length + template_node_selector_length + template_node_autostart_length + template_migration_method_length + 2)]))
)
template_list_output.append('{bold}{template_name: <{template_name_length}} {template_id: <{template_id_length}} \
{template_vcpu: <{template_vcpu_length}} \
{template_vram: <{template_vram_length}} \
Console: {template_serial: <{template_serial_length}} \
{template_serial: <{template_serial_length}} \
{template_vnc: <{template_vnc_length}} \
{template_vnc_bind: <{template_vnc_bind_length}} \
Meta: {template_node_limit: <{template_node_limit_length}} \
{template_node_limit: <{template_node_limit_length}} \
{template_node_selector: <{template_node_selector_length}} \
{template_node_autostart: <{template_node_autostart_length}} \
{template_migration_method: <{template_migration_method_length}}{end_bold}'.format(
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
template_state_colour='',
end_colour='',
template_name_length=template_name_length,
template_id_length=template_id_length,
template_vcpu_length=template_vcpu_length,
@ -863,14 +872,10 @@ Meta: {template_node_limit: <{template_node_limit_length}} \
template_node_selector_length=template_node_selector_length,
template_node_autostart_length=template_node_autostart_length,
template_migration_method_length=template_migration_method_length,
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
template_state_colour='',
end_colour='',
template_name='Name',
template_id='ID',
template_vcpu='vCPUs',
template_vram='vRAM [MB]',
template_vram='vRAM [M]',
template_serial='Serial',
template_vnc='VNC',
template_vnc_bind='VNC bind',
@ -878,6 +883,7 @@ Meta: {template_node_limit: <{template_node_limit_length}} \
template_node_selector='Selector',
template_node_autostart='Autostart',
template_migration_method='Migration')
)
# Format the string (elements)
for template in sorted(template_data, key=lambda i: i.get('name', None)):
@ -885,10 +891,10 @@ Meta: {template_node_limit: <{template_node_limit_length}} \
'{bold}{template_name: <{template_name_length}} {template_id: <{template_id_length}} \
{template_vcpu: <{template_vcpu_length}} \
{template_vram: <{template_vram_length}} \
{template_serial: <{template_serial_length}} \
{template_serial: <{template_serial_length}} \
{template_vnc: <{template_vnc_length}} \
{template_vnc_bind: <{template_vnc_bind_length}} \
{template_node_limit: <{template_node_limit_length}} \
{template_node_limit: <{template_node_limit_length}} \
{template_node_selector: <{template_node_selector_length}} \
{template_node_autostart: <{template_node_autostart_length}} \
{template_migration_method: <{template_migration_method_length}}{end_bold}'.format(
@ -919,9 +925,7 @@ Meta: {template_node_limit: <{template_node_limit_length}} \
)
)
return '\n'.join([template_list_output_header] + template_list_output)
return True, ''
return '\n'.join(template_list_output)
def format_list_template_network(template_template):
@ -931,8 +935,8 @@ def format_list_template_network(template_template):
template_list_output = []
# Determine optimal column widths
template_name_length = 5
template_id_length = 3
template_name_length = 18
template_id_length = 5
template_mac_template_length = 13
template_networks_length = 10
@ -962,7 +966,16 @@ def format_list_template_network(template_template):
template_networks_length = _template_networks_length
# Format the string (header)
template_list_output_header = '{bold}{template_name: <{template_name_length}} {template_id: <{template_id_length}} \
template_list_output.append('{bold}{template_header: <{template_header_length}} {details_header: <{details_header_length}}{end_bold}'.format(
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
template_header_length=template_name_length + template_id_length + 1,
details_header_length=template_mac_template_length + template_networks_length + 1,
template_header='Network Templates ' + ''.join(['-' for _ in range(18, template_name_length + template_id_length)]),
details_header='Details ' + ''.join(['-' for _ in range(8, template_mac_template_length + template_networks_length)]))
)
template_list_output.append('{bold}{template_name: <{template_name_length}} {template_id: <{template_id_length}} \
{template_mac_template: <{template_mac_template_length}} \
{template_networks: <{template_networks_length}}{end_bold}'.format(
template_name_length=template_name_length,
@ -975,6 +988,7 @@ def format_list_template_network(template_template):
template_id='ID',
template_mac_template='MAC template',
template_networks='Network VNIs')
)
# Format the string (elements)
for template in sorted(template_template, key=lambda i: i.get('name', None)):
@ -995,7 +1009,7 @@ def format_list_template_network(template_template):
)
)
return '\n'.join([template_list_output_header] + template_list_output)
return '\n'.join(template_list_output)
def format_list_template_storage(template_template):
@ -1005,12 +1019,12 @@ def format_list_template_storage(template_template):
template_list_output = []
# Determine optimal column widths
template_name_length = 5
template_id_length = 3
template_name_length = 18
template_id_length = 5
template_disk_id_length = 8
template_disk_pool_length = 8
template_disk_pool_length = 5
template_disk_source_length = 14
template_disk_size_length = 10
template_disk_size_length = 9
template_disk_filesystem_length = 11
template_disk_fsargs_length = 10
template_disk_mountpoint_length = 10
@ -1056,7 +1070,16 @@ def format_list_template_storage(template_template):
template_disk_mountpoint_length = _template_disk_mountpoint_length
# Format the string (header)
template_list_output_header = '{bold}{template_name: <{template_name_length}} {template_id: <{template_id_length}} \
template_list_output.append('{bold}{template_header: <{template_header_length}} {details_header: <{details_header_length}}{end_bold}'.format(
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
template_header_length=template_name_length + template_id_length + 1,
details_header_length=template_disk_id_length + template_disk_pool_length + template_disk_source_length + template_disk_size_length + template_disk_filesystem_length + template_disk_fsargs_length + template_disk_mountpoint_length + 7,
template_header='Storage Templates ' + ''.join(['-' for _ in range(18, template_name_length + template_id_length)]),
details_header='Details ' + ''.join(['-' for _ in range(8, template_disk_id_length + template_disk_pool_length + template_disk_source_length + template_disk_size_length + template_disk_filesystem_length + template_disk_fsargs_length + template_disk_mountpoint_length + 6)]))
)
template_list_output.append('{bold}{template_name: <{template_name_length}} {template_id: <{template_id_length}} \
{template_disk_id: <{template_disk_id_length}} \
{template_disk_pool: <{template_disk_pool_length}} \
{template_disk_source: <{template_disk_source_length}} \
@ -1080,10 +1103,11 @@ def format_list_template_storage(template_template):
template_disk_id='Disk ID',
template_disk_pool='Pool',
template_disk_source='Source Volume',
template_disk_size='Size [GB]',
template_disk_size='Size [G]',
template_disk_filesystem='Filesystem',
template_disk_fsargs='Arguments',
template_disk_mountpoint='Mountpoint')
)
# Format the string (elements)
for template in sorted(template_template, key=lambda i: i.get('name', None)):
@ -1130,7 +1154,7 @@ def format_list_template_storage(template_template):
)
)
return '\n'.join([template_list_output_header] + template_list_output)
return '\n'.join(template_list_output)
def format_list_userdata(userdata_data, lines=None):
@ -1140,8 +1164,9 @@ def format_list_userdata(userdata_data, lines=None):
userdata_list_output = []
# Determine optimal column widths
userdata_name_length = 5
userdata_id_length = 3
userdata_name_length = 12
userdata_id_length = 5
userdata_document_length = 92 - userdata_name_length - userdata_id_length
for userdata in userdata_data:
# userdata_name column
@ -1154,7 +1179,14 @@ def format_list_userdata(userdata_data, lines=None):
userdata_id_length = _userdata_id_length
# Format the string (header)
userdata_list_output_header = '{bold}{userdata_name: <{userdata_name_length}} {userdata_id: <{userdata_id_length}} \
userdata_list_output.append('{bold}{userdata_header: <{userdata_header_length}}{end_bold}'.format(
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
userdata_header_length=userdata_name_length + userdata_id_length + userdata_document_length + 2,
userdata_header='Userdata ' + ''.join(['-' for _ in range(9, userdata_name_length + userdata_id_length + userdata_document_length + 1)]))
)
userdata_list_output.append('{bold}{userdata_name: <{userdata_name_length}} {userdata_id: <{userdata_id_length}} \
{userdata_data}{end_bold}'.format(
userdata_name_length=userdata_name_length,
userdata_id_length=userdata_id_length,
@ -1163,6 +1195,7 @@ def format_list_userdata(userdata_data, lines=None):
userdata_name='Name',
userdata_id='ID',
userdata_data='Document')
)
# Format the string (elements)
for data in sorted(userdata_data, key=lambda i: i.get('name', None)):
@ -1204,7 +1237,7 @@ def format_list_userdata(userdata_data, lines=None):
)
)
return '\n'.join([userdata_list_output_header] + userdata_list_output)
return '\n'.join(userdata_list_output)
def format_list_script(script_data, lines=None):
@ -1214,8 +1247,9 @@ def format_list_script(script_data, lines=None):
script_list_output = []
# Determine optimal column widths
script_name_length = 5
script_id_length = 3
script_name_length = 12
script_id_length = 5
script_data_length = 92 - script_name_length - script_id_length
for script in script_data:
# script_name column
@ -1228,7 +1262,14 @@ def format_list_script(script_data, lines=None):
script_id_length = _script_id_length
# Format the string (header)
script_list_output_header = '{bold}{script_name: <{script_name_length}} {script_id: <{script_id_length}} \
script_list_output.append('{bold}{script_header: <{script_header_length}}{end_bold}'.format(
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
script_header_length=script_name_length + script_id_length + script_data_length + 2,
script_header='Script ' + ''.join(['-' for _ in range(7, script_name_length + script_id_length + script_data_length + 1)]))
)
script_list_output.append('{bold}{script_name: <{script_name_length}} {script_id: <{script_id_length}} \
{script_data}{end_bold}'.format(
script_name_length=script_name_length,
script_id_length=script_id_length,
@ -1237,6 +1278,7 @@ def format_list_script(script_data, lines=None):
script_name='Name',
script_id='ID',
script_data='Script')
)
# Format the string (elements)
for script in sorted(script_data, key=lambda i: i.get('name', None)):
@ -1278,7 +1320,7 @@ def format_list_script(script_data, lines=None):
)
)
return '\n'.join([script_list_output_header] + script_list_output)
return '\n'.join(script_list_output)
def format_list_ova(ova_data):
@ -1288,8 +1330,8 @@ def format_list_ova(ova_data):
ova_list_output = []
# Determine optimal column widths
ova_name_length = 5
ova_id_length = 3
ova_name_length = 18
ova_id_length = 5
ova_disk_id_length = 8
ova_disk_size_length = 10
ova_disk_pool_length = 5
@ -1329,7 +1371,16 @@ def format_list_ova(ova_data):
ova_disk_volume_name_length = _ova_disk_volume_name_length
# Format the string (header)
ova_list_output_header = '{bold}{ova_name: <{ova_name_length}} {ova_id: <{ova_id_length}} \
ova_list_output.append('{bold}{ova_header: <{ova_header_length}} {details_header: <{details_header_length}}{end_bold}'.format(
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
ova_header_length=ova_name_length + ova_id_length + 1,
details_header_length=ova_disk_id_length + ova_disk_size_length + ova_disk_pool_length + ova_disk_volume_format_length + ova_disk_volume_name_length + 4,
ova_header='OVAs ' + ''.join(['-' for _ in range(5, ova_name_length + ova_id_length)]),
details_header='Details ' + ''.join(['-' for _ in range(8, ova_disk_id_length + ova_disk_size_length + ova_disk_pool_length + ova_disk_volume_format_length + ova_disk_volume_name_length + 3)]))
)
ova_list_output.append('{bold}{ova_name: <{ova_name_length}} {ova_id: <{ova_id_length}} \
{ova_disk_id: <{ova_disk_id_length}} \
{ova_disk_size: <{ova_disk_size_length}} \
{ova_disk_pool: <{ova_disk_pool_length}} \
@ -1351,6 +1402,7 @@ def format_list_ova(ova_data):
ova_disk_pool='Pool',
ova_disk_volume_format='Format',
ova_disk_volume_name='Source Volume')
)
# Format the string (elements)
for ova in sorted(ova_data, key=lambda i: i.get('name', None)):
@ -1391,7 +1443,7 @@ def format_list_ova(ova_data):
)
)
return '\n'.join([ova_list_output_header] + ova_list_output)
return '\n'.join(ova_list_output)
def format_list_profile(profile_data):
@ -1411,8 +1463,8 @@ def format_list_profile(profile_data):
profile_list_output = []
# Determine optimal column widths
profile_name_length = 5
profile_id_length = 3
profile_name_length = 18
profile_id_length = 5
profile_source_length = 7
profile_system_template_length = 7
@ -1420,6 +1472,7 @@ def format_list_profile(profile_data):
profile_storage_template_length = 8
profile_userdata_length = 9
profile_script_length = 7
profile_arguments_length = 18
for profile in profile_data:
# profile_name column
@ -1456,11 +1509,22 @@ def format_list_profile(profile_data):
profile_script_length = _profile_script_length
# Format the string (header)
profile_list_output_header = '{bold}{profile_name: <{profile_name_length}} {profile_id: <{profile_id_length}} {profile_source: <{profile_source_length}} \
Templates: {profile_system_template: <{profile_system_template_length}} \
profile_list_output.append('{bold}{profile_header: <{profile_header_length}} {templates_header: <{templates_header_length}} {data_header: <{data_header_length}}{end_bold}'.format(
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
profile_header_length=profile_name_length + profile_id_length + profile_source_length + 2,
templates_header_length=profile_system_template_length + profile_network_template_length + profile_storage_template_length + 2,
data_header_length=profile_userdata_length + profile_script_length + profile_arguments_length + 2,
profile_header='Profiles ' + ''.join(['-' for _ in range(9, profile_name_length + profile_id_length + profile_source_length + 1)]),
templates_header='Templates ' + ''.join(['-' for _ in range(10, profile_system_template_length + profile_network_template_length + profile_storage_template_length + 1)]),
data_header='Data ' + ''.join(['-' for _ in range(5, profile_userdata_length + profile_script_length + profile_arguments_length + 1)]))
)
profile_list_output.append('{bold}{profile_name: <{profile_name_length}} {profile_id: <{profile_id_length}} {profile_source: <{profile_source_length}} \
{profile_system_template: <{profile_system_template_length}} \
{profile_network_template: <{profile_network_template_length}} \
{profile_storage_template: <{profile_storage_template_length}} \
Data: {profile_userdata: <{profile_userdata_length}} \
{profile_userdata: <{profile_userdata_length}} \
{profile_script: <{profile_script_length}} \
{profile_arguments}{end_bold}'.format(
profile_name_length=profile_name_length,
@ -1482,15 +1546,19 @@ Data: {profile_userdata: <{profile_userdata_length}} \
profile_userdata='Userdata',
profile_script='Script',
profile_arguments='Script Arguments')
)
# Format the string (elements)
for profile in sorted(profile_data, key=lambda i: i.get('name', None)):
arguments_list = ', '.join(profile['arguments'])
if not arguments_list:
arguments_list = 'N/A'
profile_list_output.append(
'{bold}{profile_name: <{profile_name_length}} {profile_id: <{profile_id_length}} {profile_source: <{profile_source_length}} \
{profile_system_template: <{profile_system_template_length}} \
{profile_system_template: <{profile_system_template_length}} \
{profile_network_template: <{profile_network_template_length}} \
{profile_storage_template: <{profile_storage_template_length}} \
{profile_userdata: <{profile_userdata_length}} \
{profile_userdata: <{profile_userdata_length}} \
{profile_script: <{profile_script_length}} \
{profile_arguments}{end_bold}'.format(
profile_name_length=profile_name_length,
@ -1511,11 +1579,11 @@ Data: {profile_userdata: <{profile_userdata_length}} \
profile_storage_template=profile['storage_template'],
profile_userdata=profile['userdata'],
profile_script=profile['script'],
profile_arguments=', '.join(profile['arguments'])
profile_arguments=arguments_list,
)
)
return '\n'.join([profile_list_output_header] + profile_list_output)
return '\n'.join(profile_list_output)
def format_list_task(task_data):
@ -1524,17 +1592,21 @@ def format_list_task(task_data):
# Determine optimal column widths
task_id_length = 7
task_type_length = 7
task_worker_length = 7
task_vm_name_length = 5
task_vm_profile_length = 8
task_vm_define_length = 8
task_vm_start_length = 7
task_worker_length = 8
for task in task_data:
# task_id column
_task_id_length = len(str(task['id'])) + 1
if _task_id_length > task_id_length:
task_id_length = _task_id_length
# task_worker column
_task_worker_length = len(str(task['worker'])) + 1
if _task_worker_length > task_worker_length:
task_worker_length = _task_worker_length
# task_type column
_task_type_length = len(str(task['type'])) + 1
if _task_type_length > task_type_length:
@ -1555,15 +1627,20 @@ def format_list_task(task_data):
_task_vm_start_length = len(str(task['vm_start'])) + 1
if _task_vm_start_length > task_vm_start_length:
task_vm_start_length = _task_vm_start_length
# task_worker column
_task_worker_length = len(str(task['worker'])) + 1
if _task_worker_length > task_worker_length:
task_worker_length = _task_worker_length
# Format the string (header)
task_list_output_header = '{bold}{task_id: <{task_id_length}} {task_type: <{task_type_length}} \
task_list_output.append('{bold}{task_header: <{task_header_length}} {vms_header: <{vms_header_length}}{end_bold}'.format(
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
task_header_length=task_id_length + task_type_length + task_worker_length + 2,
vms_header_length=task_vm_name_length + task_vm_profile_length + task_vm_define_length + task_vm_start_length + 3,
task_header='Tasks ' + ''.join(['-' for _ in range(6, task_id_length + task_type_length + task_worker_length + 1)]),
vms_header='VM Details ' + ''.join(['-' for _ in range(11, task_vm_name_length + task_vm_profile_length + task_vm_define_length + task_vm_start_length + 2)]))
)
task_list_output.append('{bold}{task_id: <{task_id_length}} {task_type: <{task_type_length}} \
{task_worker: <{task_worker_length}} \
VM: {task_vm_name: <{task_vm_name_length}} \
{task_vm_name: <{task_vm_name_length}} \
{task_vm_profile: <{task_vm_profile_length}} \
{task_vm_define: <{task_vm_define_length}} \
{task_vm_start: <{task_vm_start_length}}{end_bold}'.format(
@ -1583,13 +1660,14 @@ VM: {task_vm_name: <{task_vm_name_length}} \
task_vm_profile='Profile',
task_vm_define='Define?',
task_vm_start='Start?')
)
# Format the string (elements)
for task in sorted(task_data, key=lambda i: i.get('type', None)):
task_list_output.append(
'{bold}{task_id: <{task_id_length}} {task_type: <{task_type_length}} \
{task_worker: <{task_worker_length}} \
{task_vm_name: <{task_vm_name_length}} \
{task_vm_name: <{task_vm_name_length}} \
{task_vm_profile: <{task_vm_profile_length}} \
{task_vm_define: <{task_vm_define_length}} \
{task_vm_start: <{task_vm_start_length}}{end_bold}'.format(
@ -1612,4 +1690,4 @@ VM: {task_vm_name: <{task_vm_name_length}} \
)
)
return '\n'.join([task_list_output_header] + task_list_output)
return '\n'.join(task_list_output)

View File

@ -22,8 +22,8 @@
import time
import re
import cli_lib.ansiprint as ansiprint
from cli_lib.common import call_api, format_bytes, format_metric
import pvc.cli_lib.ansiprint as ansiprint
from pvc.cli_lib.common import call_api, format_bytes, format_metric
#
@ -54,12 +54,12 @@ def vm_info(config, vm):
return False, response.json().get('message', '')
def vm_list(config, limit, target_node, target_state):
def vm_list(config, limit, target_node, target_state, target_tag):
"""
Get list information about VMs (limited by {limit}, {target_node}, or {target_state})
API endpoint: GET /api/v1/vm
API arguments: limit={limit}, node={target_node}, state={target_state}
API arguments: limit={limit}, node={target_node}, state={target_state}, tag={target_tag}
API schema: [{json_data_object},{json_data_object},etc.]
"""
params = dict()
@ -69,6 +69,8 @@ def vm_list(config, limit, target_node, target_state):
params['node'] = target_node
if target_state:
params['state'] = target_state
if target_tag:
params['tag'] = target_tag
response = call_api(config, 'get', '/vm', params=params)
@ -78,12 +80,12 @@ def vm_list(config, limit, target_node, target_state):
return False, response.json().get('message', '')
def vm_define(config, xml, node, node_limit, node_selector, node_autostart, migration_method):
def vm_define(config, xml, node, node_limit, node_selector, node_autostart, migration_method, user_tags, protected_tags):
"""
Define a new VM on the cluster
API endpoint: POST /vm
API arguments: xml={xml}, node={node}, limit={node_limit}, selector={node_selector}, autostart={node_autostart}, migration_method={migration_method}
API arguments: xml={xml}, node={node}, limit={node_limit}, selector={node_selector}, autostart={node_autostart}, migration_method={migration_method}, user_tags={user_tags}, protected_tags={protected_tags}
API schema: {"message":"{data}"}
"""
params = {
@ -91,7 +93,9 @@ def vm_define(config, xml, node, node_limit, node_selector, node_autostart, migr
'limit': node_limit,
'selector': node_selector,
'autostart': node_autostart,
'migration_method': migration_method
'migration_method': migration_method,
'user_tags': user_tags,
'protected_tags': protected_tags
}
data = {
'xml': xml
@ -155,7 +159,7 @@ def vm_metadata(config, vm, node_limit, node_selector, node_autostart, migration
"""
Modify PVC metadata of a VM
API endpoint: GET /vm/{vm}/meta, POST /vm/{vm}/meta
API endpoint: POST /vm/{vm}/meta
API arguments: limit={node_limit}, selector={node_selector}, autostart={node_autostart}, migration_method={migration_method} profile={provisioner_profile}
API schema: {"message":"{data}"}
"""
@ -188,6 +192,119 @@ def vm_metadata(config, vm, node_limit, node_selector, node_autostart, migration
return retstatus, response.json().get('message', '')
def vm_tags_get(config, vm):
"""
Get PVC tags of a VM
API endpoint: GET /vm/{vm}/tags
API arguments:
API schema: {{"name": "{name}", "type": "{type}"},...}
"""
response = call_api(config, 'get', '/vm/{vm}/tags'.format(vm=vm))
if response.status_code == 200:
retstatus = True
retdata = response.json()
else:
retstatus = False
retdata = response.json().get('message', '')
return retstatus, retdata
def vm_tag_set(config, vm, action, tag, protected=False):
"""
Modify PVC tags of a VM
API endpoint: POST /vm/{vm}/tags
API arguments: action={action}, tag={tag}, protected={protected}
API schema: {"message":"{data}"}
"""
params = {
'action': action,
'tag': tag,
'protected': protected
}
# Update the tags
response = call_api(config, 'post', '/vm/{vm}/tags'.format(vm=vm), params=params)
if response.status_code == 200:
retstatus = True
else:
retstatus = False
return retstatus, response.json().get('message', '')
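A brief, hedged usage sketch of the tag helpers above; the VM name and tag value are hypothetical, and the 'tags' key is an assumption about the response shape consumed by the CLI:
# Illustrative only: add a protected tag to a VM, then display its tags.
retcode, retmsg = vm_tag_set(config, 'testvm1', 'add', 'backup-daily', protected=True)
print(retmsg)
retcode, retdata = vm_tags_get(config, 'testvm1')
if retcode:
    print(format_vm_tags(config, 'testvm1', retdata.get('tags', [])))  # assumes a 'tags' list in the response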
def format_vm_tags(config, name, tags):
"""
Format the output of a tags dictionary in a nice table
"""
if len(tags) < 1:
return "No tags found."
output_list = []
name_length = 5
_name_length = len(name) + 1
if _name_length > name_length:
name_length = _name_length
tags_name_length = 4
tags_type_length = 5
tags_protected_length = 10
for tag in tags:
_tags_name_length = len(tag['name']) + 1
if _tags_name_length > tags_name_length:
tags_name_length = _tags_name_length
_tags_type_length = len(tag['type']) + 1
if _tags_type_length > tags_type_length:
tags_type_length = _tags_type_length
_tags_protected_length = len(str(tag['protected'])) + 1
if _tags_protected_length > tags_protected_length:
tags_protected_length = _tags_protected_length
output_list.append(
'{bold}{tags_name: <{tags_name_length}} \
{tags_type: <{tags_type_length}} \
{tags_protected: <{tags_protected_length}}{end_bold}'.format(
name_length=name_length,
tags_name_length=tags_name_length,
tags_type_length=tags_type_length,
tags_protected_length=tags_protected_length,
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
tags_name='Name',
tags_type='Type',
tags_protected='Protected'
)
)
for tag in sorted(tags, key=lambda t: t['name']):
output_list.append(
'{bold}{tags_name: <{tags_name_length}} \
{tags_type: <{tags_type_length}} \
{tags_protected: <{tags_protected_length}}{end_bold}'.format(
name_length=name_length,
tags_type_length=tags_type_length,
tags_name_length=tags_name_length,
tags_protected_length=tags_protected_length,
bold='',
end_bold='',
tags_name=tag['name'],
tags_type=tag['type'],
tags_protected=str(tag['protected'])
)
)
return '\n'.join(output_list)
def vm_remove(config, vm, delete_disks=False):
"""
Remove a VM
@ -501,7 +618,7 @@ def format_vm_memory(config, name, memory):
return '\n'.join(output_list)
def vm_networks_add(config, vm, network, macaddr, model, restart):
def vm_networks_add(config, vm, network, macaddr, model, sriov, sriov_mode, restart):
"""
Add a new network to the VM
@ -512,19 +629,21 @@ def vm_networks_add(config, vm, network, macaddr, model, restart):
from lxml.objectify import fromstring
from lxml.etree import tostring
from random import randint
import cli_lib.network as pvc_network
import pvc.cli_lib.network as pvc_network
# Verify that the provided network is valid
retcode, retdata = pvc_network.net_info(config, network)
if not retcode:
# Ignore the three special networks
if network not in ['upstream', 'cluster', 'storage']:
return False, "Network {} is not present in the cluster.".format(network)
# Verify that the provided network is valid (not in SR-IOV mode)
if not sriov:
retcode, retdata = pvc_network.net_info(config, network)
if not retcode:
# Ignore the three special networks
if network not in ['upstream', 'cluster', 'storage']:
return False, "Network {} is not present in the cluster.".format(network)
if network in ['upstream', 'cluster', 'storage']:
br_prefix = 'br'
else:
br_prefix = 'vmbr'
# Set the bridge prefix
if network in ['upstream', 'cluster', 'storage']:
br_prefix = 'br'
else:
br_prefix = 'vmbr'
status, domain_information = vm_info(config, vm)
if not status:
@ -551,24 +670,74 @@ def vm_networks_add(config, vm, network, macaddr, model, restart):
octetC=random_octet_C
)
device_string = '<interface type="bridge"><mac address="{macaddr}"/><source bridge="{bridge}"/><model type="{model}"/></interface>'.format(
macaddr=macaddr,
bridge="{}{}".format(br_prefix, network),
model=model
)
# Add an SR-IOV network
if sriov:
valid, sriov_vf_information = pvc_network.net_sriov_vf_info(config, domain_information['node'], network)
if not valid:
return False, 'Specified SR-IOV VF "{}" does not exist on VM node "{}".'.format(network, domain_information['node'])
# Add a hostdev (direct PCIe) SR-IOV network
if sriov_mode == 'hostdev':
bus_address = 'domain="0x{pci_domain}" bus="0x{pci_bus}" slot="0x{pci_slot}" function="0x{pci_function}"'.format(
pci_domain=sriov_vf_information['pci']['domain'],
pci_bus=sriov_vf_information['pci']['bus'],
pci_slot=sriov_vf_information['pci']['slot'],
pci_function=sriov_vf_information['pci']['function'],
)
device_string = '<interface type="hostdev" managed="yes"><mac address="{macaddr}"/><source><address type="pci" {bus_address}/></source><sriov_device>{network}</sriov_device></interface>'.format(
macaddr=macaddr,
bus_address=bus_address,
network=network
)
# Add a macvtap SR-IOV network
elif sriov_mode == 'macvtap':
device_string = '<interface type="direct"><mac address="{macaddr}"/><source dev="{network}" mode="passthrough"/><model type="{model}"/></interface>'.format(
macaddr=macaddr,
network=network,
model=model
)
else:
return False, "ERROR: Invalid SR-IOV mode specified."
# Add a normal bridged PVC network
else:
device_string = '<interface type="bridge"><mac address="{macaddr}"/><source bridge="{bridge}"/><model type="{model}"/></interface>'.format(
macaddr=macaddr,
bridge="{}{}".format(br_prefix, network),
model=model
)
device_xml = fromstring(device_string)
last_interface = None
all_interfaces = parsed_xml.devices.find('interface')
if all_interfaces is None:
all_interfaces = []
for interface in all_interfaces:
last_interface = re.match(r'[vm]*br([0-9a-z]+)', interface.source.attrib.get('bridge')).group(1)
if last_interface == network:
return False, 'Network {} is already configured for VM {}.'.format(network, vm)
if last_interface is not None:
for interface in parsed_xml.devices.find('interface'):
if last_interface == re.match(r'[vm]*br([0-9a-z]+)', interface.source.attrib.get('bridge')).group(1):
if sriov:
if sriov_mode == 'hostdev':
if interface.attrib.get('type') == 'hostdev':
interface_address = 'domain="{pci_domain}" bus="{pci_bus}" slot="{pci_slot}" function="{pci_function}"'.format(
pci_domain=interface.source.address.attrib.get('domain'),
pci_bus=interface.source.address.attrib.get('bus'),
pci_slot=interface.source.address.attrib.get('slot'),
pci_function=interface.source.address.attrib.get('function')
)
if interface_address == bus_address:
return False, 'Network "{}" is already configured for VM "{}".'.format(network, vm)
elif sriov_mode == 'macvtap':
if interface.attrib.get('type') == 'direct':
interface_dev = interface.source.attrib.get('dev')
if interface_dev == network:
return False, 'Network "{}" is already configured for VM "{}".'.format(network, vm)
else:
if interface.attrib.get('type') == 'bridge':
interface_vni = re.match(r'[vm]*br([0-9a-z]+)', interface.source.attrib.get('bridge')).group(1)
if interface_vni == network:
return False, 'Network "{}" is already configured for VM "{}".'.format(network, vm)
# Add the interface at the end of the list (or, right above emulator)
if len(all_interfaces) > 0:
for idx, interface in enumerate(parsed_xml.devices.find('interface')):
if idx == len(all_interfaces) - 1:
interface.addnext(device_xml)
else:
parsed_xml.devices.find('emulator').addprevious(device_xml)
@ -581,7 +750,7 @@ def vm_networks_add(config, vm, network, macaddr, model, restart):
return vm_modify(config, vm, new_xml, restart)
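For reference, the three device_string branches above render roughly as follows; the MAC address, device name, and PCIe address values are hypothetical examples:
# bridge:  <interface type="bridge"><mac address="52:54:00:ab:cd:ef"/><source bridge="vmbr100"/><model type="virtio"/></interface>
# macvtap: <interface type="direct"><mac address="52:54:00:ab:cd:ef"/><source dev="ens4f0v3" mode="passthrough"/><model type="virtio"/></interface>
# hostdev: <interface type="hostdev" managed="yes"><mac address="52:54:00:ab:cd:ef"/><source><address type="pci" domain="0x0000" bus="0x03" slot="0x00" function="0x4"/></source><sriov_device>ens4f0v3</sriov_device></interface>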
def vm_networks_remove(config, vm, network, restart):
def vm_networks_remove(config, vm, network, sriov, restart):
"""
Remove a network from the VM
@ -605,17 +774,33 @@ def vm_networks_remove(config, vm, network, restart):
except Exception:
return False, 'ERROR: Failed to parse XML data.'
changed = False
for interface in parsed_xml.devices.find('interface'):
if_vni = re.match(r'[vm]*br([0-9a-z]+)', interface.source.attrib.get('bridge')).group(1)
if network == if_vni:
interface.getparent().remove(interface)
if sriov:
if interface.attrib.get('type') == 'hostdev':
if_dev = str(interface.sriov_device)
if network == if_dev:
interface.getparent().remove(interface)
changed = True
elif interface.attrib.get('type') == 'direct':
if_dev = str(interface.source.attrib.get('dev'))
if network == if_dev:
interface.getparent().remove(interface)
changed = True
else:
if_vni = re.match(r'[vm]*br([0-9a-z]+)', interface.source.attrib.get('bridge')).group(1)
if network == if_vni:
interface.getparent().remove(interface)
changed = True
if changed:
try:
new_xml = tostring(parsed_xml, pretty_print=True)
except Exception:
return False, 'ERROR: Failed to dump XML data.'
try:
new_xml = tostring(parsed_xml, pretty_print=True)
except Exception:
return False, 'ERROR: Failed to dump XML data.'
return vm_modify(config, vm, new_xml, restart)
return vm_modify(config, vm, new_xml, restart)
else:
return False, 'ERROR: Network "{}" does not exist on VM.'.format(network)
def vm_networks_get(config, vm):
@ -645,7 +830,14 @@ def vm_networks_get(config, vm):
for interface in parsed_xml.devices.find('interface'):
mac_address = interface.mac.attrib.get('address')
model = interface.model.attrib.get('type')
network = re.match(r'[vm]*br([0-9a-z]+)', interface.source.attrib.get('bridge')).group(1)
interface_type = interface.attrib.get('type')
if interface_type == 'bridge':
network = re.search(r'[vm]*br([0-9a-z]+)', interface.source.attrib.get('bridge')).group(1)
elif interface_type == 'direct':
network = 'macvtap:{}'.format(interface.source.attrib.get('dev'))
elif interface_type == 'hostdev':
network = 'hostdev:{}'.format(interface.source.attrib.get('dev'))
network_data.append((network, mac_address, model))
return True, network_data
@ -732,7 +924,7 @@ def vm_volumes_add(config, vm, volume, disk_id, bus, disk_type, restart):
from lxml.objectify import fromstring
from lxml.etree import tostring
from copy import deepcopy
import cli_lib.ceph as pvc_ceph
import pvc.cli_lib.ceph as pvc_ceph
if disk_type == 'rbd':
# Verify that the provided volume is valid
@ -844,7 +1036,7 @@ def vm_volumes_remove(config, vm, volume, restart):
xml = domain_information.get('xml', None)
if xml is None:
return False, "VM does not have a valid XML doccument."
return False, "VM does not have a valid XML document."
try:
parsed_xml = fromstring(xml)
@ -1023,9 +1215,9 @@ def follow_console_log(config, vm, lines=10):
API arguments: lines={lines}
API schema: {"name":"{vmname}","data":"{console_log}"}
"""
# We always grab 500 to match the follow call, but only _show_ `lines` number
# We always grab 200 to match the follow call, but only _show_ `lines` number
params = {
'lines': 500
'lines': 200
}
response = call_api(config, 'get', '/vm/{vm}/console'.format(vm=vm), params=params)
@ -1041,10 +1233,10 @@ def follow_console_log(config, vm, lines=10):
print(loglines, end='')
while True:
# Grab the next line set (500 is a reasonable number of lines per second; any more are skipped)
# Grab the next line set (200 is a reasonable number of lines per half-second; any more are skipped)
try:
params = {
'lines': 500
'lines': 200
}
response = call_api(config, 'get', '/vm/{vm}/console'.format(vm=vm), params=params)
new_console_log = response.json()['data']
@ -1053,8 +1245,10 @@ def follow_console_log(config, vm, lines=10):
# Split the new and old log strings into constituent lines
old_console_loglines = console_log.split('\n')
new_console_loglines = new_console_log.split('\n')
# Set the console log to the new log value for the next iteration
console_log = new_console_log
# Remove the lines from the old log until we hit the first line of the new log; this
# ensures that the old log is a string that we can remove from the new log entirely
for index, line in enumerate(old_console_loglines, start=0):
@ -1069,8 +1263,8 @@ def follow_console_log(config, vm, lines=10):
# If there's a difference, print it out
if diff_console_log:
print(diff_console_log, end='')
# Wait a second
time.sleep(1)
# Wait half a second
time.sleep(0.5)
return True, ''
@ -1084,8 +1278,8 @@ def format_info(config, domain_information, long_output):
ainformation.append('{}Virtual machine information:{}'.format(ansiprint.bold(), ansiprint.end()))
ainformation.append('')
# Basic information
ainformation.append('{}UUID:{} {}'.format(ansiprint.purple(), ansiprint.end(), domain_information['uuid']))
ainformation.append('{}Name:{} {}'.format(ansiprint.purple(), ansiprint.end(), domain_information['name']))
ainformation.append('{}UUID:{} {}'.format(ansiprint.purple(), ansiprint.end(), domain_information['uuid']))
ainformation.append('{}Description:{} {}'.format(ansiprint.purple(), ansiprint.end(), domain_information['description']))
ainformation.append('{}Profile:{} {}'.format(ansiprint.purple(), ansiprint.end(), domain_information['profile']))
ainformation.append('{}Memory (M):{} {}'.format(ansiprint.purple(), ansiprint.end(), domain_information['memory']))
@ -1173,19 +1367,64 @@ def format_info(config, domain_information, long_output):
ainformation.append('{}Autostart:{} {}'.format(ansiprint.purple(), ansiprint.end(), formatted_node_autostart))
ainformation.append('{}Migration Method:{} {}'.format(ansiprint.purple(), ansiprint.end(), formatted_migration_method))
# Tag list
tags_name_length = 5
tags_type_length = 5
tags_protected_length = 10
for tag in domain_information['tags']:
_tags_name_length = len(tag['name']) + 1
if _tags_name_length > tags_name_length:
tags_name_length = _tags_name_length
_tags_type_length = len(tag['type']) + 1
if _tags_type_length > tags_type_length:
tags_type_length = _tags_type_length
_tags_protected_length = len(str(tag['protected'])) + 1
if _tags_protected_length > tags_protected_length:
tags_protected_length = _tags_protected_length
if len(domain_information['tags']) > 0:
ainformation.append('')
ainformation.append('{purple}Tags:{end} {bold}{tags_name: <{tags_name_length}} {tags_type: <{tags_type_length}} {tags_protected: <{tags_protected_length}}{end}'.format(
purple=ansiprint.purple(),
bold=ansiprint.bold(),
end=ansiprint.end(),
tags_name_length=tags_name_length,
tags_type_length=tags_type_length,
tags_protected_length=tags_protected_length,
tags_name='Name',
tags_type='Type',
tags_protected='Protected'
))
for tag in sorted(domain_information['tags'], key=lambda t: t['type'] + t['name']):
ainformation.append(' {tags_name: <{tags_name_length}} {tags_type: <{tags_type_length}} {tags_protected: <{tags_protected_length}}'.format(
tags_name_length=tags_name_length,
tags_type_length=tags_type_length,
tags_protected_length=tags_protected_length,
tags_name=tag['name'],
tags_type=tag['type'],
tags_protected=str(tag['protected'])
))
else:
ainformation.append('')
ainformation.append('{purple}Tags:{end} N/A'.format(
purple=ansiprint.purple(),
bold=ansiprint.bold(),
end=ansiprint.end(),
))
# Network list
net_list = []
cluster_net_list = call_api(config, 'get', '/network').json()
for net in domain_information['networks']:
# Split out just the numerical (VNI) part of the brXXXX name
net_vnis = re.findall(r'\d+', net['source'])
if net_vnis:
net_vni = net_vnis[0]
else:
net_vni = re.sub('br', '', net['source'])
response = call_api(config, 'get', '/network/{net}'.format(net=net_vni))
if response.status_code != 200 and net_vni not in ['cluster', 'storage', 'upstream']:
net_list.append(ansiprint.red() + net_vni + ansiprint.end() + ' [invalid]')
net_vni = net['vni']
if net_vni not in ['cluster', 'storage', 'upstream'] and not re.match(r'^macvtap:.*', net_vni) and not re.match(r'^hostdev:.*', net_vni):
if int(net_vni) not in [net['vni'] for net in cluster_net_list]:
net_list.append(ansiprint.red() + net_vni + ansiprint.end() + ' [invalid]')
else:
net_list.append(net_vni)
else:
net_list.append(net_vni)
@ -1213,17 +1452,31 @@ def format_info(config, domain_information, long_output):
width=name_length
))
ainformation.append('')
ainformation.append('{}Interfaces:{} {}ID Type Source Model MAC Data (r/w) Packets (r/w) Errors (r/w){}'.format(ansiprint.purple(), ansiprint.end(), ansiprint.bold(), ansiprint.end()))
ainformation.append('{}Interfaces:{} {}ID Type Source Model MAC Data (r/w) Packets (r/w) Errors (r/w){}'.format(ansiprint.purple(), ansiprint.end(), ansiprint.bold(), ansiprint.end()))
for net in domain_information['networks']:
ainformation.append(' {0: <3} {1: <7} {2: <10} {3: <8} {4: <18} {5: <12} {6: <15} {7: <12}'.format(
net_type = net['type']
net_source = net['source']
net_mac = net['mac']
if net_type in ['direct', 'hostdev']:
net_model = 'N/A'
net_bytes = 'N/A'
net_packets = 'N/A'
net_errors = 'N/A'
elif net_type in ['bridge']:
net_model = net['model']
net_bytes = '/'.join([str(format_bytes(net.get('rd_bytes', 0))), str(format_bytes(net.get('wr_bytes', 0)))])
net_packets = '/'.join([str(format_metric(net.get('rd_packets', 0))), str(format_metric(net.get('wr_packets', 0)))])
net_errors = '/'.join([str(format_metric(net.get('rd_errors', 0))), str(format_metric(net.get('wr_errors', 0)))])
ainformation.append(' {0: <3} {1: <8} {2: <12} {3: <8} {4: <18} {5: <12} {6: <15} {7: <12}'.format(
domain_information['networks'].index(net),
net['type'],
net['source'],
net['model'],
net['mac'],
'/'.join([str(format_bytes(net.get('rd_bytes', 0))), str(format_bytes(net.get('wr_bytes', 0)))]),
'/'.join([str(format_metric(net.get('rd_packets', 0))), str(format_metric(net.get('wr_packets', 0)))]),
'/'.join([str(format_metric(net.get('rd_errors', 0))), str(format_metric(net.get('wr_errors', 0)))]),
net_type,
net_source,
net_model,
net_mac,
net_bytes,
net_packets,
net_errors
))
# Controller list
ainformation.append('')
@ -1242,15 +1495,17 @@ def format_list(config, vm_list, raw):
# Network list
net_list = []
for net in domain_information['networks']:
# Split out just the numerical (VNI) part of the brXXXX name
net_vnis = re.findall(r'\d+', net['source'])
if net_vnis:
net_vni = net_vnis[0]
else:
net_vni = re.sub('br', '', net['source'])
net_list.append(net_vni)
net_list.append(net['vni'])
return net_list
# Function to get tag names and return a nicer list
def getNiceTagName(domain_information):
# Tag list
tag_list = []
for tag in sorted(domain_information['tags'], key=lambda t: t['type'] + t['name']):
tag_list.append(tag['name'])
return tag_list
# Handle raw mode since it just lists the names
if raw:
ainformation = list()
@ -1263,15 +1518,16 @@ def format_list(config, vm_list, raw):
# Determine optimal column widths
# Dynamic columns: node_name, node, migrated
vm_name_length = 5
vm_uuid_length = 37
vm_state_length = 6
vm_tags_length = 5
vm_nets_length = 9
vm_ram_length = 8
vm_vcpu_length = 6
vm_node_length = 8
vm_migrated_length = 10
vm_migrated_length = 9
for domain_information in vm_list:
net_list = getNiceNetID(domain_information)
tag_list = getNiceTagName(domain_information)
# vm_name column
_vm_name_length = len(domain_information['name']) + 1
if _vm_name_length > vm_name_length:
@ -1280,6 +1536,10 @@ def format_list(config, vm_list, raw):
_vm_state_length = len(domain_information['state']) + 1
if _vm_state_length > vm_state_length:
vm_state_length = _vm_state_length
# vm_tags column
_vm_tags_length = len(','.join(tag_list)) + 1
if _vm_tags_length > vm_tags_length:
vm_tags_length = _vm_tags_length
# vm_nets column
_vm_nets_length = len(','.join(net_list)) + 1
if _vm_nets_length > vm_nets_length:
@ -1295,15 +1555,29 @@ def format_list(config, vm_list, raw):
# Format the string (header)
vm_list_output.append(
'{bold}{vm_name: <{vm_name_length}} {vm_uuid: <{vm_uuid_length}} \
'{bold}{vm_header: <{vm_header_length}} {resource_header: <{resource_header_length}} {node_header: <{node_header_length}}{end_bold}'.format(
vm_header_length=vm_name_length + vm_state_length + vm_tags_length + 2,
resource_header_length=vm_nets_length + vm_ram_length + vm_vcpu_length + 2,
node_header_length=vm_node_length + vm_migrated_length + 1,
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
vm_header='VMs ' + ''.join(['-' for _ in range(4, vm_name_length + vm_state_length + vm_tags_length + 1)]),
resource_header='Resources ' + ''.join(['-' for _ in range(10, vm_nets_length + vm_ram_length + vm_vcpu_length + 1)]),
node_header='Node ' + ''.join(['-' for _ in range(5, vm_node_length + vm_migrated_length)])
)
)
vm_list_output.append(
'{bold}{vm_name: <{vm_name_length}} \
{vm_state_colour}{vm_state: <{vm_state_length}}{end_colour} \
{vm_tags: <{vm_tags_length}} \
{vm_networks: <{vm_nets_length}} \
{vm_memory: <{vm_ram_length}} {vm_vcpu: <{vm_vcpu_length}} \
{vm_node: <{vm_node_length}} \
{vm_migrated: <{vm_migrated_length}}{end_bold}'.format(
vm_name_length=vm_name_length,
vm_uuid_length=vm_uuid_length,
vm_state_length=vm_state_length,
vm_tags_length=vm_tags_length,
vm_nets_length=vm_nets_length,
vm_ram_length=vm_ram_length,
vm_vcpu_length=vm_vcpu_length,
@ -1314,20 +1588,21 @@ def format_list(config, vm_list, raw):
vm_state_colour='',
end_colour='',
vm_name='Name',
vm_uuid='UUID',
vm_state='State',
vm_tags='Tags',
vm_networks='Networks',
vm_memory='RAM (M)',
vm_vcpu='vCPUs',
vm_node='Node',
vm_node='Current',
vm_migrated='Migrated'
)
)
# Keep track of nets we found to be valid to cut down on duplicate API hits
valid_net_list = []
# Get a list of cluster networks for validity comparisons
cluster_net_list = call_api(config, 'get', '/network').json()
# Format the string (elements)
for domain_information in vm_list:
for domain_information in sorted(vm_list, key=lambda v: v['name']):
if domain_information['state'] == 'start':
vm_state_colour = ansiprint.green()
elif domain_information['state'] == 'restart':
@ -1342,29 +1617,27 @@ def format_list(config, vm_list, raw):
vm_state_colour = ansiprint.blue()
# Handle colouring for an invalid network config
raw_net_list = getNiceNetID(domain_information)
net_list = []
net_list = getNiceNetID(domain_information)
tag_list = getNiceTagName(domain_information)
if len(tag_list) < 1:
tag_list = ['N/A']
vm_net_colour = ''
for net_vni in raw_net_list:
if net_vni not in valid_net_list:
response = call_api(config, 'get', '/network/{net}'.format(net=net_vni))
if response.status_code != 200 and net_vni not in ['cluster', 'storage', 'upstream']:
for net_vni in net_list:
if net_vni not in ['cluster', 'storage', 'upstream'] and not re.match(r'^macvtap:.*', net_vni) and not re.match(r'^hostdev:.*', net_vni):
if int(net_vni) not in [net['vni'] for net in cluster_net_list]:
vm_net_colour = ansiprint.red()
else:
valid_net_list.append(net_vni)
net_list.append(net_vni)
vm_list_output.append(
'{bold}{vm_name: <{vm_name_length}} {vm_uuid: <{vm_uuid_length}} \
'{bold}{vm_name: <{vm_name_length}} \
{vm_state_colour}{vm_state: <{vm_state_length}}{end_colour} \
{vm_tags: <{vm_tags_length}} \
{vm_net_colour}{vm_networks: <{vm_nets_length}}{end_colour} \
{vm_memory: <{vm_ram_length}} {vm_vcpu: <{vm_vcpu_length}} \
{vm_node: <{vm_node_length}} \
{vm_migrated: <{vm_migrated_length}}{end_bold}'.format(
vm_name_length=vm_name_length,
vm_uuid_length=vm_uuid_length,
vm_state_length=vm_state_length,
vm_tags_length=vm_tags_length,
vm_nets_length=vm_nets_length,
vm_ram_length=vm_ram_length,
vm_vcpu_length=vm_vcpu_length,
@ -1375,8 +1648,8 @@ def format_list(config, vm_list, raw):
vm_state_colour=vm_state_colour,
end_colour=ansiprint.end(),
vm_name=domain_information['name'],
vm_uuid=domain_information['uuid'],
vm_state=domain_information['state'],
vm_tags=','.join(tag_list),
vm_net_colour=vm_net_colour,
vm_networks=','.join(net_list),
vm_memory=domain_information['memory'],
@ -1386,4 +1659,4 @@ def format_list(config, vm_list, raw):
)
)
return '\n'.join(sorted(vm_list_output))
return '\n'.join(vm_list_output)


@ -34,16 +34,17 @@ from distutils.util import strtobool
from functools import wraps
import cli_lib.ansiprint as ansiprint
import cli_lib.cluster as pvc_cluster
import cli_lib.node as pvc_node
import cli_lib.vm as pvc_vm
import cli_lib.network as pvc_network
import cli_lib.ceph as pvc_ceph
import cli_lib.provisioner as pvc_provisioner
import pvc.cli_lib.ansiprint as ansiprint
import pvc.cli_lib.cluster as pvc_cluster
import pvc.cli_lib.node as pvc_node
import pvc.cli_lib.vm as pvc_vm
import pvc.cli_lib.network as pvc_network
import pvc.cli_lib.ceph as pvc_ceph
import pvc.cli_lib.provisioner as pvc_provisioner
myhostname = socket.gethostname().split('.')[0]
zk_host = ''
is_completion = True if os.environ.get('_PVC_COMPLETE', '') == 'complete' else False
default_store_data = {
'cfgfile': '/etc/pvc/pvcapid.yaml'
@ -133,32 +134,31 @@ def update_store(store_path, store_data):
fh.write(json.dumps(store_data, sort_keys=True, indent=4))
pvc_client_dir = os.environ.get('PVC_CLIENT_DIR', None)
home_dir = os.environ.get('HOME', None)
if pvc_client_dir:
store_path = '{}'.format(pvc_client_dir)
elif home_dir:
store_path = '{}/.config/pvc'.format(home_dir)
else:
print('WARNING: No client or home config dir found, using /tmp instead')
store_path = '/tmp/pvc'
if not is_completion:
pvc_client_dir = os.environ.get('PVC_CLIENT_DIR', None)
home_dir = os.environ.get('HOME', None)
if pvc_client_dir:
store_path = '{}'.format(pvc_client_dir)
elif home_dir:
store_path = '{}/.config/pvc'.format(home_dir)
else:
print('WARNING: No client or home config dir found, using /tmp instead')
store_path = '/tmp/pvc'
if not os.path.isdir(store_path):
os.makedirs(store_path)
if not os.path.isfile(store_path + '/pvc-cli.json'):
update_store(store_path, {"local": default_store_data})
if not os.path.isdir(store_path):
os.makedirs(store_path)
if not os.path.isfile(store_path + '/pvc-cli.json'):
update_store(store_path, {"local": default_store_data})
CONTEXT_SETTINGS = dict(help_option_names=['-h', '--help'], max_content_width=120)
def cleanup(retcode, retmsg):
if retmsg != '':
click.echo(retmsg)
if retcode is True:
if retmsg != '':
click.echo(retmsg)
exit(0)
else:
if retmsg != '':
click.echo(retmsg)
exit(1)
@ -251,7 +251,11 @@ def cluster_remove(name):
# pvc cluster list
###############################################################################
@click.command(name='list', short_help='List all available clusters.')
def cluster_list():
@click.option(
'-r', '--raw', 'raw', is_flag=True, default=False,
help='Display the raw list of cluster names only.'
)
def cluster_list(raw):
"""
List all the available PVC clusters configured in this CLI instance.
"""
@ -302,27 +306,28 @@ def cluster_list():
if _api_key_length > api_key_length:
api_key_length = _api_key_length
# Display the data nicely
click.echo("Available clusters:")
click.echo()
click.echo(
'{bold}{name: <{name_length}} {description: <{description_length}} {address: <{address_length}} {port: <{port_length}} {scheme: <{scheme_length}} {api_key: <{api_key_length}}{end_bold}'.format(
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
name="Name",
name_length=name_length,
description="Description",
description_length=description_length,
address="Address",
address_length=address_length,
port="Port",
port_length=port_length,
scheme="Scheme",
scheme_length=scheme_length,
api_key="API Key",
api_key_length=api_key_length
if not raw:
# Display the data nicely
click.echo("Available clusters:")
click.echo()
click.echo(
'{bold}{name: <{name_length}} {description: <{description_length}} {address: <{address_length}} {port: <{port_length}} {scheme: <{scheme_length}} {api_key: <{api_key_length}}{end_bold}'.format(
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
name="Name",
name_length=name_length,
description="Description",
description_length=description_length,
address="Address",
address_length=address_length,
port="Port",
port_length=port_length,
scheme="Scheme",
scheme_length=scheme_length,
api_key="API Key",
api_key_length=api_key_length
)
)
)
for cluster in clusters:
cluster_details = clusters[cluster]
@ -341,24 +346,27 @@ def cluster_list():
if not api_key:
api_key = 'N/A'
click.echo(
'{bold}{name: <{name_length}} {description: <{description_length}} {address: <{address_length}} {port: <{port_length}} {scheme: <{scheme_length}} {api_key: <{api_key_length}}{end_bold}'.format(
bold='',
end_bold='',
name=cluster,
name_length=name_length,
description=description,
description_length=description_length,
address=address,
address_length=address_length,
port=port,
port_length=port_length,
scheme=scheme,
scheme_length=scheme_length,
api_key=api_key,
api_key_length=api_key_length
if not raw:
click.echo(
'{bold}{name: <{name_length}} {description: <{description_length}} {address: <{address_length}} {port: <{port_length}} {scheme: <{scheme_length}} {api_key: <{api_key_length}}{end_bold}'.format(
bold='',
end_bold='',
name=cluster,
name_length=name_length,
description=description,
description_length=description_length,
address=address,
address_length=address_length,
port=port,
port_length=port_length,
scheme=scheme,
scheme_length=scheme_length,
api_key=api_key,
api_key_length=api_key_length
)
)
)
else:
click.echo(cluster)
# Validate that the cluster is set for a given command
@ -532,6 +540,43 @@ def node_unflush(node, wait):
cleanup(retcode, retmsg)
###############################################################################
# pvc node log
###############################################################################
@click.command(name='log', short_help='Show logs of a node.')
@click.argument(
'node'
)
@click.option(
'-l', '--lines', 'lines', default=None, show_default=False,
help='Display this many log lines from the end of the log buffer. [default: 1000; with follow: 10]'
)
@click.option(
'-f', '--follow', 'follow', is_flag=True, default=False,
help='Follow the log buffer; output may be delayed by a few seconds relative to the live system. The --lines value defaults to 10 for the initial output.'
)
@cluster_req
def node_log(node, lines, follow):
"""
Show the log buffer of node NODE in a pager or continuously.
"""
# Set the default here so we can handle it
if lines is None:
if follow:
lines = 10
else:
lines = 1000
if follow:
retcode, retmsg = pvc_node.follow_node_log(config, node, lines)
else:
retcode, retmsg = pvc_node.view_node_log(config, node, lines)
click.echo_via_pager(retmsg)
retmsg = ''
cleanup(retcode, retmsg)
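As a rough usage sketch (not part of this diff; 'hv1' is a hypothetical node name and config is the populated CLI configuration), the helpers this command wraps can also be called directly:

    # Fetch the last 200 buffered lines once, without following
    retcode, retmsg = pvc_node.view_node_log(config, 'hv1', 200)
    if retcode:
        print(retmsg)
    # Or follow the buffer until interrupted, starting with the last 10 lines
    retcode, retmsg = pvc_node.follow_node_log(config, 'hv1', 10)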
###############################################################################
# pvc node info
###############################################################################
@ -630,11 +675,21 @@ def cli_vm():
type=click.Choice(['none', 'live', 'shutdown']),
help='The preferred migration method of the VM between nodes; saved with VM.'
)
@click.option(
'-g', '--tag', 'user_tags',
default=[], multiple=True,
help='User tag for the VM; can be specified multiple times, once per tag.'
)
@click.option(
'-G', '--protected-tag', 'protected_tags',
default=[], multiple=True,
help='Protected user tag for the VM; can be specified multiple times, once per tag.'
)
@click.argument(
'vmconfig', type=click.File()
)
@cluster_req
def vm_define(vmconfig, target_node, node_limit, node_selector, node_autostart, migration_method):
def vm_define(vmconfig, target_node, node_limit, node_selector, node_autostart, migration_method, user_tags, protected_tags):
"""
Define a new virtual machine from Libvirt XML configuration file VMCONFIG.
"""
@ -650,7 +705,7 @@ def vm_define(vmconfig, target_node, node_limit, node_selector, node_autostart,
except Exception:
cleanup(False, 'Error: XML is malformed or invalid')
retcode, retmsg = pvc_vm.vm_define(config, new_cfg, target_node, node_limit, node_selector, node_autostart, migration_method)
retcode, retmsg = pvc_vm.vm_define(config, new_cfg, target_node, node_limit, node_selector, node_autostart, migration_method, user_tags, protected_tags)
cleanup(retcode, retmsg)
@ -674,7 +729,7 @@ def vm_define(vmconfig, target_node, node_limit, node_selector, node_autostart,
@click.option(
'-m', '--method', 'migration_method', default='none', show_default=True,
type=click.Choice(['none', 'live', 'shutdown']),
help='The preferred migration method of the VM between nodes; saved with VM.'
help='The preferred migration method of the VM between nodes.'
)
@click.option(
'-p', '--profile', 'provisioner_profile', default=None, show_default=False,
@ -709,9 +764,19 @@ def vm_meta(domain, node_limit, node_selector, node_autostart, migration_method,
help='Immediately restart VM to apply new config.'
)
@click.option(
'-y', '--yes', 'confirm_flag',
'-d', '--confirm-diff', 'confirm_diff_flag',
is_flag=True, default=False,
help='Confirm the restart'
help='Confirm the diff.'
)
@click.option(
'-c', '--confirm-restart', 'confirm_restart_flag',
is_flag=True, default=False,
help='Confirm the restart.'
)
@click.option(
'-y', '--yes', 'confirm_all_flag',
is_flag=True, default=False,
help='Confirm the diff and the restart.'
)
@click.argument(
'domain'
@ -719,7 +784,7 @@ def vm_meta(domain, node_limit, node_selector, node_autostart, migration_method,
@click.argument(
'cfgfile', type=click.File(), default=None, required=False
)
def vm_modify(domain, cfgfile, editor, restart, confirm_flag):
def vm_modify(domain, cfgfile, editor, restart, confirm_diff_flag, confirm_restart_flag, confirm_all_flag):
"""
Modify existing virtual machine DOMAIN, either in-editor or with replacement CONFIG. DOMAIN may be a UUID or name.
"""
@ -733,12 +798,12 @@ def vm_modify(domain, cfgfile, editor, restart, confirm_flag):
dom_name = vm_information.get('name')
if editor is True:
# Grab the current config
current_vm_cfg_raw = vm_information.get('xml')
xml_data = etree.fromstring(current_vm_cfg_raw)
current_vm_cfgfile = etree.tostring(xml_data, pretty_print=True).decode('utf8').strip()
# Grab the current config
current_vm_cfg_raw = vm_information.get('xml')
xml_data = etree.fromstring(current_vm_cfg_raw)
current_vm_cfgfile = etree.tostring(xml_data, pretty_print=True).decode('utf8').strip()
if editor is True:
new_vm_cfgfile = click.edit(text=current_vm_cfgfile, require_save=True, extension='.xml')
if new_vm_cfgfile is None:
click.echo('Aborting with no modifications.')
@ -776,9 +841,10 @@ def vm_modify(domain, cfgfile, editor, restart, confirm_flag):
except Exception as e:
cleanup(False, 'Error: XML is malformed or invalid: {}'.format(e))
click.confirm('Write modifications to cluster?', abort=True)
if not confirm_diff_flag and not confirm_all_flag and not config['unsafe']:
click.confirm('Write modifications to cluster?', abort=True)
if restart and not confirm_flag and not config['unsafe']:
if restart and not confirm_restart_flag and not confirm_all_flag and not config['unsafe']:
try:
click.confirm('Restart VM {}'.format(domain), prompt_suffix='? ', abort=True)
except Exception:
@ -1103,6 +1169,90 @@ def vm_flush_locks(domain):
cleanup(retcode, retmsg)
###############################################################################
# pvc vm tag
###############################################################################
@click.group(name='tag', short_help='Manage tags of a virtual machine.', context_settings=CONTEXT_SETTINGS)
def vm_tags():
"""
Manage the tags of a virtual machine in the PVC cluster.
"""
pass
###############################################################################
# pvc vm tag get
###############################################################################
@click.command(name='get', short_help='Get the current tags of a virtual machine.')
@click.argument(
'domain'
)
@click.option(
'-r', '--raw', 'raw', is_flag=True, default=False,
help='Display the raw value only without formatting.'
)
@cluster_req
def vm_tags_get(domain, raw):
"""
Get the current tags of the virtual machine DOMAIN.
"""
retcode, retdata = pvc_vm.vm_tags_get(config, domain)
if retcode:
if not raw:
retdata = pvc_vm.format_vm_tags(config, domain, retdata['tags'])
else:
if len(retdata['tags']) > 0:
retdata = '\n'.join([tag['name'] for tag in retdata['tags']])
else:
retdata = 'No tags found.'
cleanup(retcode, retdata)
###############################################################################
# pvc vm tag add
###############################################################################
@click.command(name='add', short_help='Add new tags to a virtual machine.')
@click.argument(
'domain'
)
@click.argument(
'tag'
)
@click.option(
'-p', '--protected', 'protected', is_flag=True, required=False, default=False,
help="Set this tag as protected; protected tags cannot be removed."
)
@cluster_req
def vm_tags_add(domain, tag, protected):
"""
Add TAG to the virtual machine DOMAIN.
"""
retcode, retmsg = pvc_vm.vm_tag_set(config, domain, 'add', tag, protected)
cleanup(retcode, retmsg)
###############################################################################
# pvc vm tag remove
###############################################################################
@click.command(name='remove', short_help='Remove tags from a virtual machine.')
@click.argument(
'domain'
)
@click.argument(
'tag'
)
@cluster_req
def vm_tags_remove(domain, tag):
"""
Remove TAG from the virtual machine DOMAIN.
"""
retcode, retmsg = pvc_vm.vm_tag_set(config, domain, 'remove', tag)
cleanup(retcode, retmsg)
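A brief sketch of the helper behind both tag commands ('testvm' and the tag names are hypothetical; config is the CLI configuration):

    # Add a protected tag, then remove an ordinary one
    retcode, retmsg = pvc_vm.vm_tag_set(config, 'testvm', 'add', 'backup-daily', True)
    retcode, retmsg = pvc_vm.vm_tag_set(config, 'testvm', 'remove', 'staging')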
###############################################################################
# pvc vm vcpu
###############################################################################
@ -1311,15 +1461,24 @@ def vm_network_get(domain, raw):
'domain'
)
@click.argument(
'vni'
'net'
)
@click.option(
'-a', '--macaddr', 'macaddr', default=None,
help='Use this MAC address instead of random generation; must be a valid MAC address in colon-deliniated format.'
help='Use this MAC address instead of random generation; must be a valid MAC address in colon-delimited format.'
)
@click.option(
'-m', '--model', 'model', default='virtio',
help='The model for the interface; must be a valid libvirt model.'
'-m', '--model', 'model', default='virtio', show_default=True,
help='The model for the interface; must be a valid libvirt model. Not used for "netdev" SR-IOV NETs.'
)
@click.option(
'-s', '--sriov', 'sriov', is_flag=True, default=False,
help='Identify that NET is an SR-IOV device name and not a VNI. Required for adding SR-IOV NETs.'
)
@click.option(
'-d', '--sriov-mode', 'sriov_mode', default='macvtap', show_default=True,
type=click.Choice(['hostdev', 'macvtap']),
help='For SR-IOV NETs, the SR-IOV network device mode.'
)
@click.option(
'-r', '--restart', 'restart', is_flag=True, default=False,
@ -1331,9 +1490,18 @@ def vm_network_get(domain, raw):
help='Confirm the restart'
)
@cluster_req
def vm_network_add(domain, vni, macaddr, model, restart, confirm_flag):
def vm_network_add(domain, net, macaddr, model, sriov, sriov_mode, restart, confirm_flag):
"""
Add the network VNI to the virtual machine DOMAIN. Networks are always added to the end of the current list of networks in the virtual machine.
Add the network NET to the virtual machine DOMAIN. Networks are always added to the end of the current list of networks in the virtual machine.
NET may be a PVC network VNI, which is added as a bridged device, or an SR-IOV VF device connected in the given mode.
NOTE: Adding a SR-IOV network device in the "hostdev" mode has the following caveats:
1. The VM will not be able to be live migrated; it must be shut down to migrate between nodes. The VM metadata will be updated to force this.
2. If an identical SR-IOV VF device is not present on the target node, post-migration startup will fail. It may be prudent to use a node limit here.
"""
if restart and not confirm_flag and not config['unsafe']:
try:
@ -1341,7 +1509,7 @@ def vm_network_add(domain, vni, macaddr, model, restart, confirm_flag):
except Exception:
restart = False
retcode, retmsg = pvc_vm.vm_networks_add(config, domain, vni, macaddr, model, restart)
retcode, retmsg = pvc_vm.vm_networks_add(config, domain, net, macaddr, model, sriov, sriov_mode, restart)
if retcode and not restart:
retmsg = retmsg + " Changes will be applied on next VM start/restart."
cleanup(retcode, retmsg)
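An illustrative sketch of the two call shapes the helper accepts (the VM name, VNI, and VF device name are hypothetical):

    # Bridged PVC network, by VNI, with the default virtio model
    retcode, retmsg = pvc_vm.vm_networks_add(config, 'testvm', '100', None, 'virtio', False, 'macvtap', False)
    # SR-IOV VF in hostdev mode; note the live-migration caveats above
    retcode, retmsg = pvc_vm.vm_networks_add(config, 'testvm', 'ens1f0v0', None, 'virtio', True, 'hostdev', False)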
@ -1355,7 +1523,11 @@ def vm_network_add(domain, vni, macaddr, model, restart, confirm_flag):
'domain'
)
@click.argument(
'vni'
'net'
)
@click.option(
'-s', '--sriov', 'sriov', is_flag=True, default=False,
help='Identify that NET is an SR-IOV device name and not a VNI. Required for removing SR-IOV NETs.'
)
@click.option(
'-r', '--restart', 'restart', is_flag=True, default=False,
@ -1367,9 +1539,11 @@ def vm_network_add(domain, vni, macaddr, model, restart, confirm_flag):
help='Confirm the restart'
)
@cluster_req
def vm_network_remove(domain, vni, restart, confirm_flag):
def vm_network_remove(domain, net, sriov, restart, confirm_flag):
"""
Remove the network VNI to the virtual machine DOMAIN.
Remove the network NET from the virtual machine DOMAIN.
NET may be a PVC network VNI, which is added as a bridged device, or an SR-IOV VF device connected in the given mode.
"""
if restart and not confirm_flag and not config['unsafe']:
try:
@ -1377,7 +1551,7 @@ def vm_network_remove(domain, vni, restart, confirm_flag):
except Exception:
restart = False
retcode, retmsg = pvc_vm.vm_networks_remove(config, domain, vni, restart)
retcode, retmsg = pvc_vm.vm_networks_remove(config, domain, net, sriov, restart)
if retcode and not restart:
retmsg = retmsg + " Changes will be applied on next VM start/restart."
cleanup(retcode, retmsg)
@ -1484,7 +1658,7 @@ def vm_volume_add(domain, volume, disk_id, bus, disk_type, restart, confirm_flag
'domain'
)
@click.argument(
'vni'
'volume'
)
@click.option(
'-r', '--restart', 'restart', is_flag=True, default=False,
@ -1496,9 +1670,9 @@ def vm_volume_add(domain, volume, disk_id, bus, disk_type, restart, confirm_flag
help='Confirm the restart'
)
@cluster_req
def vm_volume_remove(domain, vni, restart, confirm_flag):
def vm_volume_remove(domain, volume, restart, confirm_flag):
"""
Remove the volume VNI to the virtual machine DOMAIN.
Remove VOLUME from the virtual machine DOMAIN; VOLUME must be a file path or RBD path in 'pool/volume' format.
"""
if restart and not confirm_flag and not config['unsafe']:
try:
@ -1506,7 +1680,7 @@ def vm_volume_remove(domain, vni, restart, confirm_flag):
except Exception:
restart = False
retcode, retmsg = pvc_vm.vm_volumes_remove(config, domain, vni, restart)
retcode, retmsg = pvc_vm.vm_volumes_remove(config, domain, volume, restart)
if retcode and not restart:
retmsg = retmsg + " Changes will be applied on next VM start/restart."
cleanup(retcode, retmsg)
@ -1576,24 +1750,34 @@ def vm_info(domain, long_output):
# pvc vm dump
###############################################################################
@click.command(name='dump', short_help='Dump a virtual machine XML to stdout.')
@click.option(
'-f', '--file', 'filename',
default=None, type=click.File(mode='w'),
help='Write VM XML to this file.'
)
@click.argument(
'domain'
)
@cluster_req
def vm_dump(domain):
def vm_dump(filename, domain):
"""
Dump the Libvirt XML definition of virtual machine DOMAIN to stdout, or to a file with the "-f"/"--file" option. DOMAIN may be a UUID or name.
"""
retcode, vm_information = pvc_vm.vm_info(config, domain)
if not retcode or not vm_information.get('name', None):
retcode, retdata = pvc_vm.vm_info(config, domain)
if not retcode or not retdata.get('name', None):
cleanup(False, 'ERROR: Could not find VM "{}"!'.format(domain))
# Grab the current config
current_vm_cfg_raw = vm_information.get('xml')
current_vm_cfg_raw = retdata.get('xml')
xml_data = etree.fromstring(current_vm_cfg_raw)
current_vm_cfgfile = etree.tostring(xml_data, pretty_print=True).decode('utf8')
click.echo(current_vm_cfgfile.strip())
xml = current_vm_cfgfile.strip()
if filename is not None:
filename.write(xml)
cleanup(retcode, 'VM XML written to "{}".'.format(filename.name))
else:
cleanup(retcode, xml)
###############################################################################
@ -1611,19 +1795,23 @@ def vm_dump(domain):
'-s', '--state', 'target_state', default=None,
help='Limit list to VMs in the specified state.'
)
@click.option(
'-g', '--tag', 'target_tag', default=None,
help='Limit list to VMs with the specified tag.'
)
@click.option(
'-r', '--raw', 'raw', is_flag=True, default=False,
help='Display the raw list of VM names only.'
)
@cluster_req
def vm_list(target_node, target_state, limit, raw):
def vm_list(target_node, target_state, target_tag, limit, raw):
"""
List all virtual machines; optionally only match names matching regex LIMIT.
List all virtual machines; optionally only match names or full UUIDs matching regex LIMIT.
NOTE: Red-coloured network lists indicate one or more configured networks are missing/invalid.
"""
retcode, retdata = pvc_vm.vm_list(config, limit, target_node, target_state)
retcode, retdata = pvc_vm.vm_list(config, limit, target_node, target_state, target_tag)
if retcode:
retdata = pvc_vm.format_list(config, retdata, raw)
else:
@ -2093,6 +2281,154 @@ def net_acl_list(net, limit, direction):
cleanup(retcode, retdata)
###############################################################################
# pvc network sriov
###############################################################################
@click.group(name='sriov', short_help='Manage SR-IOV network resources.', context_settings=CONTEXT_SETTINGS)
def net_sriov():
"""
Manage SR-IOV network resources on nodes (PFs and VFs).
"""
pass
###############################################################################
# pvc network sriov pf
###############################################################################
@click.group(name='pf', short_help='Manage PF devices.', context_settings=CONTEXT_SETTINGS)
def net_sriov_pf():
"""
Manage SR-IOV PF devices on nodes.
"""
pass
###############################################################################
# pvc network sriov pf list
###############################################################################
@click.command(name='list', short_help='List PF devices.')
@click.argument(
'node'
)
@cluster_req
def net_sriov_pf_list(node):
"""
List all SR-IOV PFs on NODE.
"""
retcode, retdata = pvc_network.net_sriov_pf_list(config, node)
if retcode:
retdata = pvc_network.format_list_sriov_pf(retdata)
cleanup(retcode, retdata)
###############################################################################
# pvc network sriov vf
###############################################################################
@click.group(name='vf', short_help='Manage VF devices.', context_settings=CONTEXT_SETTINGS)
def net_sriov_vf():
"""
Manage SR-IOV VF devices on nodes.
"""
pass
###############################################################################
# pvc network sriov vf set
###############################################################################
@click.command(name='set', short_help='Set VF device properties.')
@click.option(
'--vlan-id', 'vlan_id', default=None, show_default=False,
help='The vLAN ID for vLAN tagging.'
)
@click.option(
'--qos-prio', 'vlan_qos', default=None, show_default=False,
help='The vLAN QOS priority.'
)
@click.option(
'--tx-min', 'tx_rate_min', default=None, show_default=False,
help='The minimum TX rate.'
)
@click.option(
'--tx-max', 'tx_rate_max', default=None, show_default=False,
help='The maximum TX rate.'
)
@click.option(
'--link-state', 'link_state', default=None, show_default=False,
type=click.Choice(['auto', 'enable', 'disable']),
help='The administrative link state.'
)
@click.option(
'--spoof-check/--no-spoof-check', 'spoof_check', is_flag=True, default=None, show_default=False,
help='Enable or disable spoof checking.'
)
@click.option(
'--trust/--no-trust', 'trust', is_flag=True, default=None, show_default=False,
help='Enable or disable VF user trust.'
)
@click.option(
'--query-rss/--no-query-rss', 'query_rss', is_flag=True, default=None, show_default=False,
help='Enable or disable query RSS support.'
)
@click.argument(
'node'
)
@click.argument(
'vf'
)
@cluster_req
def net_sriov_vf_set(node, vf, vlan_id, vlan_qos, tx_rate_min, tx_rate_max, link_state, spoof_check, trust, query_rss):
"""
Set a property of SR-IOV VF on NODE.
"""
if vlan_id is None and vlan_qos is None and tx_rate_min is None and tx_rate_max is None and link_state is None and spoof_check is None and trust is None and query_rss is None:
cleanup(False, 'At least one configuration property must be specified to update.')
retcode, retmsg = pvc_network.net_sriov_vf_set(config, node, vf, vlan_id, vlan_qos, tx_rate_min, tx_rate_max, link_state, spoof_check, trust, query_rss)
cleanup(retcode, retmsg)
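As an illustrative call (hypothetical node and VF names), setting only the VLAN ID and disabling spoof checking while leaving every other property untouched:

    retcode, retmsg = pvc_network.net_sriov_vf_set(
        config, 'hv1', 'ens1f0v0', '100', None, None, None, None, False, None, None)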
###############################################################################
# pvc network sriov vf list
###############################################################################
@click.command(name='list', short_help='List VF devices.')
@click.argument(
'node'
)
@click.argument(
'pf', default=None, required=False
)
@cluster_req
def net_sriov_vf_list(node, pf):
"""
List all SR-IOV VFs on NODE, optionally limited to device PF.
"""
retcode, retdata = pvc_network.net_sriov_vf_list(config, node, pf)
if retcode:
retdata = pvc_network.format_list_sriov_vf(retdata)
cleanup(retcode, retdata)
###############################################################################
# pvc network sriov vf info
###############################################################################
@click.command(name='info', short_help='Show details of a VF device.')
@click.argument(
'node'
)
@click.argument(
'vf'
)
@cluster_req
def net_sriov_vf_info(node, vf):
"""
Show details of the SR-IOV VF on NODE.
"""
retcode, retdata = pvc_network.net_sriov_vf_info(config, node, vf)
if retcode:
retdata = pvc_network.format_info_sriov_vf(config, retdata, node)
cleanup(retcode, retdata)
###############################################################################
# pvc storage
###############################################################################
@ -4252,7 +4588,7 @@ def cli_task():
@click.command(name='backup', short_help='Create JSON backup of cluster.')
@click.option(
'-f', '--file', 'filename',
default=None, type=click.File(),
default=None, type=click.File(mode='w'),
help='Write backup data to this file.'
)
@cluster_req
@ -4262,11 +4598,14 @@ def task_backup(filename):
"""
retcode, retdata = pvc_cluster.backup(config)
if filename:
with open(filename, 'wb') as fh:
fh.write(retdata)
retdata = 'Data written to {}'.format(filename)
cleanup(retcode, retdata)
if retcode:
if filename is not None:
json.dump(json.loads(retdata), filename)
cleanup(retcode, 'Backup written to "{}".'.format(filename.name))
else:
cleanup(retcode, retdata)
else:
cleanup(retcode, retdata)
###############################################################################
@ -4305,7 +4644,7 @@ def task_restore(filename, confirm_flag):
###############################################################################
@click.command(name='init', short_help='Initialize a new cluster.')
@click.option(
'-o', '--overwite', 'overwrite_flag',
'-o', '--overwrite', 'overwrite_flag',
is_flag=True, default=False,
help='Remove and overwrite any existing data'
)
@ -4416,9 +4755,14 @@ cli_node.add_command(node_primary)
cli_node.add_command(node_flush)
cli_node.add_command(node_ready)
cli_node.add_command(node_unflush)
cli_node.add_command(node_log)
cli_node.add_command(node_info)
cli_node.add_command(node_list)
vm_tags.add_command(vm_tags_get)
vm_tags.add_command(vm_tags_add)
vm_tags.add_command(vm_tags_remove)
vm_vcpu.add_command(vm_vcpu_get)
vm_vcpu.add_command(vm_vcpu_set)
@ -4449,6 +4793,7 @@ cli_vm.add_command(vm_move)
cli_vm.add_command(vm_migrate)
cli_vm.add_command(vm_unmigrate)
cli_vm.add_command(vm_flush_locks)
cli_vm.add_command(vm_tags)
cli_vm.add_command(vm_vcpu)
cli_vm.add_command(vm_memory)
cli_vm.add_command(vm_network)
@ -4464,6 +4809,7 @@ cli_network.add_command(net_info)
cli_network.add_command(net_list)
cli_network.add_command(net_dhcp)
cli_network.add_command(net_acl)
cli_network.add_command(net_sriov)
net_dhcp.add_command(net_dhcp_list)
net_dhcp.add_command(net_dhcp_add)
@ -4473,6 +4819,15 @@ net_acl.add_command(net_acl_add)
net_acl.add_command(net_acl_remove)
net_acl.add_command(net_acl_list)
net_sriov.add_command(net_sriov_pf)
net_sriov.add_command(net_sriov_vf)
net_sriov_pf.add_command(net_sriov_pf_list)
net_sriov_vf.add_command(net_sriov_vf_list)
net_sriov_vf.add_command(net_sriov_vf_info)
net_sriov_vf.add_command(net_sriov_vf_set)
ceph_benchmark.add_command(ceph_benchmark_run)
ceph_benchmark.add_command(ceph_benchmark_info)
ceph_benchmark.add_command(ceph_benchmark_list)

client-cli/setup.py (new file, 20 lines added)

@ -0,0 +1,20 @@
from setuptools import setup
setup(
name='pvc',
version='0.9.28',
packages=['pvc', 'pvc.cli_lib'],
install_requires=[
'Click',
'PyYAML',
'lxml',
'colorama',
'requests',
'requests-toolbelt'
],
entry_points={
'console_scripts': [
'pvc = pvc.pvc:cli',
],
},
)


@ -25,6 +25,8 @@ import json
import time
import math
from concurrent.futures import ThreadPoolExecutor
import daemon_lib.vm as vm
import daemon_lib.common as common
@ -35,29 +37,29 @@ import daemon_lib.common as common
# Verify OSD is valid in cluster
def verifyOSD(zkhandler, osd_id):
return zkhandler.exists('/ceph/osds/{}'.format(osd_id))
return zkhandler.exists(('osd', osd_id))
# Verify Pool is valid in cluster
def verifyPool(zkhandler, name):
return zkhandler.exists('/ceph/pools/{}'.format(name))
return zkhandler.exists(('pool', name))
# Verify Volume is valid in cluster
def verifyVolume(zkhandler, pool, name):
return zkhandler.exists('/ceph/volumes/{}/{}'.format(pool, name))
return zkhandler.exists(('volume', f'{pool}/{name}'))
# Verify Snapshot is valid in cluster
def verifySnapshot(zkhandler, pool, volume, name):
return zkhandler.exists('/ceph/snapshots/{}/{}/{}'.format(pool, volume, name))
return zkhandler.exists(('snapshot', f'{pool}/{volume}/{name}'))
# Verify OSD path is valid in cluster
def verifyOSDBlock(zkhandler, node, device):
for osd in zkhandler.children('/ceph/osds'):
osd_node = zkhandler.read('/ceph/osds/{}/node'.format(osd))
osd_device = zkhandler.read('/ceph/osds/{}/device'.format(osd))
for osd in zkhandler.children('base.osd'):
osd_node = zkhandler.read(('osd.node', osd))
osd_device = zkhandler.read(('osd.device', osd))
if node == osd_node and device == osd_device:
return osd
return None
@ -144,8 +146,8 @@ def format_pct_tohuman(datapct):
# Status functions
#
def get_status(zkhandler):
primary_node = zkhandler.read('/config/primary_node')
ceph_status = zkhandler.read('/ceph').rstrip()
primary_node = zkhandler.read('base.config.primary_node')
ceph_status = zkhandler.read('base.storage').rstrip()
# Create a data structure for the information
status_data = {
@ -157,8 +159,8 @@ def get_status(zkhandler):
def get_util(zkhandler):
primary_node = zkhandler.read('/config/primary_node')
ceph_df = zkhandler.read('/ceph/util').rstrip()
primary_node = zkhandler.read('base.config.primary_node')
ceph_df = zkhandler.read('base.storage.util').rstrip()
# Create a data structure for the information
status_data = {
@ -174,12 +176,12 @@ def get_util(zkhandler):
#
def getClusterOSDList(zkhandler):
# Get a list of OSD IDs by listing the children of the OSD tree
return zkhandler.children('/ceph/osds')
return zkhandler.children('base.osd')
def getOSDInformation(zkhandler, osd_id):
# Parse the stats data
osd_stats_raw = zkhandler.read('/ceph/osds/{}/stats'.format(osd_id))
osd_stats_raw = zkhandler.read(('osd.stats', osd_id))
osd_stats = dict(json.loads(osd_stats_raw))
osd_information = {
@ -203,14 +205,15 @@ def add_osd(zkhandler, node, device, weight):
# Tell the cluster to create a new OSD for the host
add_osd_string = 'osd_add {},{},{}'.format(node, device, weight)
zkhandler.write([('/cmd/ceph', add_osd_string)])
zkhandler.write([
('base.cmd.ceph', add_osd_string)
])
# Wait 1/2 second for the cluster to get the message and start working
time.sleep(0.5)
# Acquire a read lock, so we get the return exclusively
lock = zkhandler.readlock('/cmd/ceph')
with lock:
with zkhandler.readlock('base.cmd.ceph'):
try:
result = zkhandler.read('/cmd/ceph').split()[0]
result = zkhandler.read('base.cmd.ceph').split()[0]
if result == 'success-osd_add':
message = 'Created new OSD with block device "{}" on node "{}".'.format(device, node)
success = True
@ -222,10 +225,11 @@ def add_osd(zkhandler, node, device, weight):
success = False
# Acquire a write lock to ensure things go smoothly
lock = zkhandler.writelock('/cmd/ceph')
with lock:
with zkhandler.writelock('base.cmd.ceph'):
time.sleep(0.5)
zkhandler.write([('/cmd/ceph', '')])
zkhandler.write([
('base.cmd.ceph', '')
])
return success, message
@ -236,14 +240,15 @@ def remove_osd(zkhandler, osd_id):
# Tell the cluster to remove an OSD
remove_osd_string = 'osd_remove {}'.format(osd_id)
zkhandler.write([('/cmd/ceph', remove_osd_string)])
zkhandler.write([
('base.cmd.ceph', remove_osd_string)
])
# Wait 1/2 second for the cluster to get the message and start working
time.sleep(0.5)
# Acquire a read lock, so we get the return exclusively
lock = zkhandler.readlock('/cmd/ceph')
with lock:
with zkhandler.readlock('base.cmd.ceph'):
try:
result = zkhandler.read('/cmd/ceph').split()[0]
result = zkhandler.read('base.cmd.ceph').split()[0]
if result == 'success-osd_remove':
message = 'Removed OSD "{}" from the cluster.'.format(osd_id)
success = True
@ -255,10 +260,11 @@ def remove_osd(zkhandler, osd_id):
message = 'ERROR Command ignored by node.'
# Acquire a write lock to ensure things go smoothly
lock = zkhandler.writelock('/cmd/ceph')
with lock:
with zkhandler.writelock('base.cmd.ceph'):
time.sleep(0.5)
zkhandler.write([('/cmd/ceph', '')])
zkhandler.write([
('base.cmd.ceph', '')
])
return success, message
@ -303,7 +309,7 @@ def unset_osd(zkhandler, option):
def get_list_osd(zkhandler, limit, is_fuzzy=True):
osd_list = []
full_osd_list = zkhandler.children('/ceph/osds')
full_osd_list = zkhandler.children('base.osd')
if is_fuzzy and limit:
# Implicitly assume fuzzy limits
@ -330,7 +336,7 @@ def get_list_osd(zkhandler, limit, is_fuzzy=True):
#
def getPoolInformation(zkhandler, pool):
# Parse the stats data
pool_stats_raw = zkhandler.read('/ceph/pools/{}/stats'.format(pool))
pool_stats_raw = zkhandler.read(('pool.stats', pool))
pool_stats = dict(json.loads(pool_stats_raw))
volume_count = len(getCephVolumes(zkhandler, pool))
@ -375,11 +381,11 @@ def add_pool(zkhandler, name, pgs, replcfg):
# 4. Add the new pool to Zookeeper
zkhandler.write([
('/ceph/pools/{}'.format(name), ''),
('/ceph/pools/{}/pgs'.format(name), pgs),
('/ceph/pools/{}/stats'.format(name), '{}'),
('/ceph/volumes/{}'.format(name), ''),
('/ceph/snapshots/{}'.format(name), ''),
(('pool', name), ''),
(('pool.pgs', name), pgs),
(('pool.stats', name), '{}'),
(('volume', name), ''),
(('snapshot', name), ''),
])
return True, 'Created RBD pool "{}" with {} PGs'.format(name, pgs)
@ -390,7 +396,7 @@ def remove_pool(zkhandler, name):
return False, 'ERROR: No pool with name "{}" is present in the cluster.'.format(name)
# 1. Remove pool volumes
for volume in zkhandler.children('/ceph/volumes/{}'.format(name)):
for volume in zkhandler.children(('volume', name)):
remove_volume(zkhandler, name, volume)
# 2. Remove the pool
@ -399,32 +405,46 @@ def remove_pool(zkhandler, name):
return False, 'ERROR: Failed to remove pool "{}": {}'.format(name, stderr)
# 3. Delete pool from Zookeeper
zkhandler.delete('/ceph/pools/{}'.format(name))
zkhandler.delete('/ceph/volumes/{}'.format(name))
zkhandler.delete('/ceph/snapshots/{}'.format(name))
zkhandler.delete([
('pool', name),
('volume', name),
('snapshot', name),
])
return True, 'Removed RBD pool "{}" and all volumes.'.format(name)
def get_list_pool(zkhandler, limit, is_fuzzy=True):
pool_list = []
full_pool_list = zkhandler.children('/ceph/pools')
full_pool_list = zkhandler.children('base.pool')
if limit:
if not is_fuzzy:
limit = '^' + limit + '$'
get_pool_info = dict()
for pool in full_pool_list:
is_limit_match = False
if limit:
try:
if re.match(limit, pool):
pool_list.append(getPoolInformation(zkhandler, pool))
is_limit_match = True
except Exception as e:
return False, 'Regex Error: {}'.format(e)
else:
pool_list.append(getPoolInformation(zkhandler, pool))
is_limit_match = True
return True, sorted(pool_list, key=lambda x: int(x['stats']['id']))
get_pool_info[pool] = True if is_limit_match else False
pool_execute_list = [pool for pool in full_pool_list if get_pool_info[pool]]
pool_data_list = list()
with ThreadPoolExecutor(max_workers=32, thread_name_prefix='pool_list') as executor:
futures = []
for pool in pool_execute_list:
futures.append(executor.submit(getPoolInformation, zkhandler, pool))
for future in futures:
pool_data_list.append(future.result())
return True, sorted(pool_data_list, key=lambda x: int(x['stats']['id']))
#
@ -433,12 +453,12 @@ def get_list_pool(zkhandler, limit, is_fuzzy=True):
def getCephVolumes(zkhandler, pool):
volume_list = list()
if not pool:
pool_list = zkhandler.children('/ceph/pools')
pool_list = zkhandler.children('base.pool')
else:
pool_list = [pool]
for pool_name in pool_list:
for volume_name in zkhandler.children('/ceph/volumes/{}'.format(pool_name)):
for volume_name in zkhandler.children(('volume', pool_name)):
volume_list.append('{}/{}'.format(pool_name, volume_name))
return volume_list
@ -446,7 +466,7 @@ def getCephVolumes(zkhandler, pool):
def getVolumeInformation(zkhandler, pool, volume):
# Parse the stats data
volume_stats_raw = zkhandler.read('/ceph/volumes/{}/{}/stats'.format(pool, volume))
volume_stats_raw = zkhandler.read(('volume.stats', f'{pool}/{volume}'))
volume_stats = dict(json.loads(volume_stats_raw))
# Format the size to something nicer
volume_stats['size'] = format_bytes_tohuman(volume_stats['size'])
@ -481,9 +501,9 @@ def add_volume(zkhandler, pool, name, size):
# 3. Add the new volume to Zookeeper
zkhandler.write([
('/ceph/volumes/{}/{}'.format(pool, name), ''),
('/ceph/volumes/{}/{}/stats'.format(pool, name), volstats),
('/ceph/snapshots/{}/{}'.format(pool, name), ''),
(('volume', f'{pool}/{name}'), ''),
(('volume.stats', f'{pool}/{name}'), volstats),
(('snapshot', f'{pool}/{name}'), ''),
])
return True, 'Created RBD volume "{}/{}" ({}).'.format(pool, name, size)
@ -504,9 +524,9 @@ def clone_volume(zkhandler, pool, name_src, name_new):
# 3. Add the new volume to Zookeeper
zkhandler.write([
('/ceph/volumes/{}/{}'.format(pool, name_new), ''),
('/ceph/volumes/{}/{}/stats'.format(pool, name_new), volstats),
('/ceph/snapshots/{}/{}'.format(pool, name_new), ''),
(('volume', f'{pool}/{name_new}'), ''),
(('volume.stats', f'{pool}/{name_new}'), volstats),
(('snapshot', f'{pool}/{name_new}'), ''),
])
return True, 'Cloned RBD volume "{}" to "{}" in pool "{}"'.format(name_src, name_new, pool)
@ -551,9 +571,9 @@ def resize_volume(zkhandler, pool, name, size):
# 3. Add the new volume to Zookeeper
zkhandler.write([
('/ceph/volumes/{}/{}'.format(pool, name), ''),
('/ceph/volumes/{}/{}/stats'.format(pool, name), volstats),
('/ceph/snapshots/{}/{}'.format(pool, name), ''),
(('volume', f'{pool}/{name}'), ''),
(('volume.stats', f'{pool}/{name}'), volstats),
(('snapshot', f'{pool}/{name}'), ''),
])
return True, 'Resized RBD volume "{}" to size "{}" in pool "{}".'.format(name, size, pool)
@ -570,8 +590,8 @@ def rename_volume(zkhandler, pool, name, new_name):
# 2. Rename the volume in Zookeeper
zkhandler.rename([
('/ceph/volumes/{}/{}'.format(pool, name), '/ceph/volumes/{}/{}'.format(pool, new_name)),
('/ceph/snapshots/{}/{}'.format(pool, name), '/ceph/snapshots/{}/{}'.format(pool, new_name))
(('volume', f'{pool}/{name}'), ('volume', f'{pool}/{new_name}')),
(('snapshot', f'{pool}/{name}'), ('snapshot', f'{pool}/{new_name}')),
])
# 3. Get volume stats
@ -580,7 +600,7 @@ def rename_volume(zkhandler, pool, name, new_name):
# 4. Update the volume stats in Zookeeper
zkhandler.write([
('/ceph/volumes/{}/{}/stats'.format(pool, new_name), volstats)
(('volume.stats', f'{pool}/{new_name}'), volstats),
])
return True, 'Renamed RBD volume "{}" to "{}" in pool "{}".'.format(name, new_name, pool)
@ -591,7 +611,7 @@ def remove_volume(zkhandler, pool, name):
return False, 'ERROR: No volume with name "{}" is present in pool "{}".'.format(name, pool)
# 1. Remove volume snapshots
for snapshot in zkhandler.children('/ceph/snapshots/{}/{}'.format(pool, name)):
for snapshot in zkhandler.children(('snapshot', f'{pool}/{name}')):
remove_snapshot(zkhandler, pool, name, snapshot)
# 2. Remove the volume
@ -600,8 +620,10 @@ def remove_volume(zkhandler, pool, name):
return False, 'ERROR: Failed to remove RBD volume "{}" in pool "{}": {}'.format(name, pool, stderr)
# 3. Delete volume from Zookeeper
zkhandler.delete('/ceph/volumes/{}/{}'.format(pool, name))
zkhandler.delete('/ceph/snapshots/{}/{}'.format(pool, name))
zkhandler.delete([
('volume', f'{pool}/{name}'),
('snapshot', f'{pool}/{name}'),
])
return True, 'Removed RBD volume "{}" in pool "{}".'.format(name, pool)
@ -644,7 +666,6 @@ def unmap_volume(zkhandler, pool, name):
def get_list_volume(zkhandler, pool, limit, is_fuzzy=True):
volume_list = []
if pool and not verifyPool(zkhandler, pool):
return False, 'ERROR: No pool with name "{}" is present in the cluster.'.format(pool)
@ -660,18 +681,36 @@ def get_list_volume(zkhandler, pool, limit, is_fuzzy=True):
if not re.match(r'.*\$', limit):
limit = limit + '.*'
get_volume_info = dict()
for volume in full_volume_list:
pool_name, volume_name = volume.split('/')
is_limit_match = False
# Check on limit
if limit:
# Try to match the limit against the volume name
try:
if re.match(limit, volume_name):
volume_list.append(getVolumeInformation(zkhandler, pool_name, volume_name))
is_limit_match = True
except Exception as e:
return False, 'Regex Error: {}'.format(e)
else:
volume_list.append(getVolumeInformation(zkhandler, pool_name, volume_name))
is_limit_match = True
return True, sorted(volume_list, key=lambda x: str(x['name']))
get_volume_info[volume] = True if is_limit_match else False
# Obtain our volume data in a thread pool
volume_execute_list = [volume for volume in full_volume_list if get_volume_info[volume]]
volume_data_list = list()
with ThreadPoolExecutor(max_workers=32, thread_name_prefix='volume_list') as executor:
futures = []
for volume in volume_execute_list:
pool_name, volume_name = volume.split('/')
futures.append(executor.submit(getVolumeInformation, zkhandler, pool_name, volume_name))
for future in futures:
volume_data_list.append(future.result())
return True, sorted(volume_data_list, key=lambda x: str(x['name']))
#
@ -689,7 +728,7 @@ def getCephSnapshots(zkhandler, pool, volume):
volume_list = ['{}/{}'.format(volume_pool, volume_name)]
for volume_entry in volume_list:
for snapshot_name in zkhandler.children('/ceph/snapshots/{}'.format(volume_entry)):
for snapshot_name in zkhandler.children(('snapshot', volume_entry)):
snapshot_list.append('{}@{}'.format(volume_entry, snapshot_name))
return snapshot_list
@ -706,18 +745,18 @@ def add_snapshot(zkhandler, pool, volume, name):
# 2. Add the snapshot to Zookeeper
zkhandler.write([
('/ceph/snapshots/{}/{}/{}'.format(pool, volume, name), ''),
('/ceph/snapshots/{}/{}/{}/stats'.format(pool, volume, name), '{}')
(('snapshot', f'{pool}/{volume}/{name}'), ''),
(('snapshot.stats', f'{pool}/{volume}/{name}'), '{}'),
])
# 3. Update the count of snapshots on this volume
volume_stats_raw = zkhandler.read('/ceph/volumes/{}/{}/stats'.format(pool, volume))
volume_stats_raw = zkhandler.read(('volume.stats', f'{pool}/{volume}'))
volume_stats = dict(json.loads(volume_stats_raw))
# Increment the snapshot count
volume_stats['snapshot_count'] = volume_stats['snapshot_count'] + 1
volume_stats_raw = json.dumps(volume_stats)
zkhandler.write([
('/ceph/volumes/{}/{}/stats'.format(pool, volume), volume_stats_raw)
(('volume.stats', f'{pool}/{volume}'), volume_stats_raw),
])
return True, 'Created RBD snapshot "{}" of volume "{}" in pool "{}".'.format(name, volume, pool)
@ -730,13 +769,13 @@ def rename_snapshot(zkhandler, pool, volume, name, new_name):
return False, 'ERROR: No snapshot with name "{}" is present for volume "{}" in pool "{}".'.format(name, volume, pool)
# 1. Rename the snapshot
retcode, stdout, stderr = common.run_os_command('rbd snap rename {}/{}@{} {}'.format(pool, volume, name, new_name))
retcode, stdout, stderr = common.run_os_command('rbd snap rename {pool}/{volume}@{name} {pool}/{volume}@{new_name}'.format(pool=pool, volume=volume, name=name, new_name=new_name))
if retcode:
return False, 'ERROR: Failed to rename RBD snapshot "{}" to "{}" for volume "{}" in pool "{}": {}'.format(name, new_name, volume, pool, stderr)
# 2. Rename the snapshot in ZK
zkhandler.rename([
('/ceph/snapshots/{}/{}/{}'.format(pool, volume, name), '/ceph/snapshots/{}/{}/{}'.format(pool, volume, new_name))
(('snapshot', f'{pool}/{volume}/{name}'), ('snapshot', f'{pool}/{volume}/{new_name}')),
])
return True, 'Renamed RBD snapshot "{}" to "{}" for volume "{}" in pool "{}".'.format(name, new_name, volume, pool)
@ -754,16 +793,18 @@ def remove_snapshot(zkhandler, pool, volume, name):
return False, 'Failed to remove RBD snapshot "{}" of volume "{}" in pool "{}": {}'.format(name, volume, pool, stderr)
# 2. Delete snapshot from Zookeeper
zkhandler.delete('/ceph/snapshots/{}/{}/{}'.format(pool, volume, name))
zkhandler.delete([
('snapshot', f'{pool}/{volume}/{name}')
])
# 3. Update the count of snapshots on this volume
volume_stats_raw = zkhandler.read('/ceph/volumes/{}/{}/stats'.format(pool, volume))
volume_stats_raw = zkhandler.read(('volume.stats', f'{pool}/{volume}'))
volume_stats = dict(json.loads(volume_stats_raw))
# Decrement the snapshot count
volume_stats['snapshot_count'] = volume_stats['snapshot_count'] - 1
volume_stats_raw = json.dumps(volume_stats)
zkhandler.write([
('/ceph/volumes/{}/{}/stats'.format(pool, volume), volume_stats_raw)
(('volume.stats', f'{pool}/{volume}'), volume_stats_raw)
])
return True, 'Removed RBD snapshot "{}" of volume "{}" in pool "{}".'.format(name, volume, pool)


@ -29,7 +29,7 @@ import daemon_lib.ceph as pvc_ceph
def set_maintenance(zkhandler, maint_state):
current_maint_state = zkhandler.read('/config/maintenance')
current_maint_state = zkhandler.read('base.config.maintenance')
if maint_state == current_maint_state:
if maint_state == 'true':
return True, 'Cluster is already in maintenance mode'
@ -38,19 +38,19 @@ def set_maintenance(zkhandler, maint_state):
if maint_state == 'true':
zkhandler.write([
('/config/maintenance', 'true')
('base.config.maintenance', 'true')
])
return True, 'Successfully set cluster in maintenance mode'
else:
zkhandler.write([
('/config/maintenance', 'false')
('base.config.maintenance', 'false')
])
return True, 'Successfully set cluster in normal mode'
def getClusterInformation(zkhandler):
# Get cluster maintenance state
maint_state = zkhandler.read('/config/maintenance')
maint_state = zkhandler.read('base.config.maintenance')
# List of messages to display to the clients
cluster_health_msg = []
@ -60,7 +60,7 @@ def getClusterInformation(zkhandler):
retcode, node_list = pvc_node.get_list(zkhandler, None)
# Get vm information object list
retcode, vm_list = pvc_vm.get_list(zkhandler, None, None, None)
retcode, vm_list = pvc_vm.get_list(zkhandler, None, None, None, None)
# Get network information object list
retcode, network_list = pvc_network.get_list(zkhandler, None, None)
@ -168,7 +168,7 @@ def getClusterInformation(zkhandler):
cluster_health = 'Optimal'
# Find out our storage health from Ceph
ceph_status = zkhandler.read('/ceph').split('\n')
ceph_status = zkhandler.read('base.storage').split('\n')
ceph_health = ceph_status[2].split()[-1]
# Parse the status output to get the health indicators
@ -239,7 +239,7 @@ def getClusterInformation(zkhandler):
'storage_health': storage_health,
'storage_health_msg': storage_health_msg,
'primary_node': common.getPrimaryNode(zkhandler),
'upstream_ip': zkhandler.read('/config/upstream_ip'),
'upstream_ip': zkhandler.read('base.config.upstream_ip'),
'nodes': formatted_node_states,
'vms': formatted_vm_states,
'networks': network_count,
@ -259,3 +259,81 @@ def get_info(zkhandler):
return True, cluster_information
else:
return False, 'ERROR: Failed to obtain cluster information!'
def cluster_initialize(zkhandler, overwrite=False):
# Abort if we've initialized the cluster before
if zkhandler.exists('base.config.primary_node') and not overwrite:
return False, 'ERROR: Cluster contains data and overwrite not set.'
if overwrite:
# Delete the existing keys
for key in zkhandler.schema.keys('base'):
if key == 'root':
# Don't delete the root key
continue
status = zkhandler.delete('base.{}'.format(key), recursive=True)
if not status:
return False, 'ERROR: Failed to delete data in cluster; running nodes perhaps?'
# Create the root keys
zkhandler.schema.apply(zkhandler)
return True, 'Successfully initialized cluster'
def cluster_backup(zkhandler):
# Dictionary of values to come
cluster_data = dict()
def get_data(path):
data = zkhandler.read(path)
children = zkhandler.children(path)
cluster_data[path] = data
if children:
if path == '/':
child_prefix = '/'
else:
child_prefix = path + '/'
for child in children:
if child_prefix + child == '/zookeeper':
# We must skip the built-in /zookeeper tree
continue
if child_prefix + child == '/patroni':
# We must skip the /patroni tree
continue
get_data(child_prefix + child)
try:
get_data('/')
except Exception as e:
return False, 'ERROR: Failed to obtain backup: {}'.format(e)
return True, cluster_data
def cluster_restore(zkhandler, cluster_data):
# Build a key+value list
kv = []
schema_version = None
for key in cluster_data:
if key == zkhandler.schema.path('base.schema.version'):
schema_version = cluster_data[key]
data = cluster_data[key]
kv.append((key, data))
if int(schema_version) != int(zkhandler.schema.version):
return False, 'ERROR: Schema version of backup ({}) does not match cluster schema version ({}).'.format(schema_version, zkhandler.schema.version)
# Write the restored key/value data back to Zookeeper
result = zkhandler.write(kv)
if result:
return True, 'Restore completed successfully.'
else:
return False, 'Restore failed.'
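cluster_backup returns a flat dict mapping Zookeeper paths to their values, and cluster_restore expects that same structure back, refusing to write if the embedded schema version does not match the running cluster. A minimal sketch of how a CLI or API consumer might persist and replay such a backup, assuming these helpers live alongside the functions above (the file path is illustrative):

    import json

    def backup_to_file(zkhandler, path='/tmp/pvc-backup.json'):
        # Dump the flat path -> value dict returned by cluster_backup to disk
        retcode, cluster_data = cluster_backup(zkhandler)
        if not retcode:
            return False, cluster_data
        with open(path, 'w') as fh:
            json.dump(cluster_data, fh)
        return True, 'Backup written to {}'.format(path)

    def restore_from_file(zkhandler, path='/tmp/pvc-backup.json'):
        # cluster_restore validates the stored schema version before writing anything
        with open(path, 'r') as fh:
            return cluster_restore(zkhandler, json.load(fh))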

View File

@ -26,9 +26,63 @@ import subprocess
import signal
from json import loads
from re import match as re_match
from re import split as re_split
from distutils.util import strtobool
from threading import Thread
from shlex import split as shlex_split
from functools import wraps
###############################################################################
# Performance Profiler decorator
###############################################################################
# Get performance statistics on a function or class
class Profiler(object):
def __init__(self, config):
self.is_debug = config['debug']
self.pvc_logdir = '/var/log/pvc'
def __call__(self, function):
if not callable(function):
return
if not self.is_debug:
return function
@wraps(function)
def profiler_wrapper(*args, **kwargs):
import cProfile
import pstats
from os import path, makedirs
from datetime import datetime
if not path.exists(self.pvc_logdir):
print('Profiler: Requested profiling of {} but no log dir present; printing instead.'.format(str(function.__name__)))
log_result = False
else:
log_result = True
profiler_logdir = '{}/profiler'.format(self.pvc_logdir)
if not path.exists(profiler_logdir):
makedirs(profiler_logdir)
pr = cProfile.Profile()
pr.enable()
ret = function(*args, **kwargs)
pr.disable()
stats = pstats.Stats(pr)
stats.sort_stats(pstats.SortKey.TIME)
if log_result:
stats.dump_stats(filename='{}/{}_{}.log'.format(profiler_logdir, str(function.__name__), str(datetime.now()).replace(' ', '_')))
else:
print('Profiler stats for function {} at {}:'.format(str(function.__name__), str(datetime.now())))
stats.print_stats()
return ret
return profiler_wrapper
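The Profiler decorator is a no-op unless the daemon's debug flag is set; when it is, the wrapped call runs under cProfile and the stats are dumped beneath /var/log/pvc/profiler (or printed if that directory is missing). A small illustrative sketch of opting a function in (the config dict and the decorated function are hypothetical):

    config = {'debug': True}

    @Profiler(config)
    def build_keepalive_payload():
        # Some periodic work worth profiling in debug mode
        return sum(i * i for i in range(100000))

    build_keepalive_payload()
    # With debug enabled and /var/log/pvc present, a pstats dump named
    # build_keepalive_payload_<timestamp>.log lands in /var/log/pvc/profiler;
    # otherwise the stats are printed to stdout.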
###############################################################################
@ -133,7 +187,7 @@ def validateUUID(dom_uuid):
#
def getDomainXML(zkhandler, dom_uuid):
try:
xml = zkhandler.read('/domains/{}/xml'.format(dom_uuid))
xml = zkhandler.read(('domain.xml', dom_uuid))
except Exception:
return None
@ -253,34 +307,37 @@ def getDomainDiskList(zkhandler, dom_uuid):
#
# Get domain information from XML
# Get a list of domain tags
#
def getInformationFromXML(zkhandler, uuid):
def getDomainTags(zkhandler, dom_uuid):
"""
Gather information about a VM from the Libvirt XML configuration in the Zookeeper database
and return a dict() containing it.
"""
domain_state = zkhandler.read('/domains/{}/state'.format(uuid))
domain_node = zkhandler.read('/domains/{}/node'.format(uuid))
domain_lastnode = zkhandler.read('/domains/{}/lastnode'.format(uuid))
domain_failedreason = zkhandler.read('/domains/{}/failedreason'.format(uuid))
Get a list of tags for domain dom_uuid
try:
domain_node_limit = zkhandler.read('/domains/{}/node_limit'.format(uuid))
except Exception:
domain_node_limit = None
try:
domain_node_selector = zkhandler.read('/domains/{}/node_selector'.format(uuid))
except Exception:
domain_node_selector = None
try:
domain_node_autostart = zkhandler.read('/domains/{}/node_autostart'.format(uuid))
except Exception:
domain_node_autostart = None
try:
domain_migration_method = zkhandler.read('/domains/{}/migration_method'.format(uuid))
except Exception:
domain_migration_method = None
The UUID must be validated before calling this function!
"""
tags = list()
for tag in zkhandler.children(('domain.meta.tags', dom_uuid)):
tag_type = zkhandler.read(('domain.meta.tags', dom_uuid, 'tag.type', tag))
protected = bool(strtobool(zkhandler.read(('domain.meta.tags', dom_uuid, 'tag.protected', tag))))
tags.append({'name': tag, 'type': tag_type, 'protected': protected})
return tags
#
# Get a set of domain metadata
#
def getDomainMetadata(zkhandler, dom_uuid):
"""
Get the domain metadata for domain dom_uuid
The UUID must be validated before calling this function!
"""
domain_node_limit = zkhandler.read(('domain.meta.node_limit', dom_uuid))
domain_node_selector = zkhandler.read(('domain.meta.node_selector', dom_uuid))
domain_node_autostart = zkhandler.read(('domain.meta.autostart', dom_uuid))
domain_migration_method = zkhandler.read(('domain.meta.migrate_method', dom_uuid))
if not domain_node_limit:
domain_node_limit = None
@ -290,23 +347,42 @@ def getInformationFromXML(zkhandler, uuid):
if not domain_node_autostart:
domain_node_autostart = None
try:
domain_profile = zkhandler.read('/domains/{}/profile'.format(uuid))
except Exception:
domain_profile = None
return domain_node_limit, domain_node_selector, domain_node_autostart, domain_migration_method
try:
domain_vnc = zkhandler.read('/domains/{}/vnc'.format(uuid))
#
# Get domain information from XML
#
def getInformationFromXML(zkhandler, uuid):
"""
Gather information about a VM from the Libvirt XML configuration in the Zookeeper database
and return a dict() containing it.
"""
domain_state = zkhandler.read(('domain.state', uuid))
domain_node = zkhandler.read(('domain.node', uuid))
domain_lastnode = zkhandler.read(('domain.last_node', uuid))
domain_failedreason = zkhandler.read(('domain.failed_reason', uuid))
domain_node_limit, domain_node_selector, domain_node_autostart, domain_migration_method = getDomainMetadata(zkhandler, uuid)
domain_tags = getDomainTags(zkhandler, uuid)
domain_profile = zkhandler.read(('domain.profile', uuid))
domain_vnc = zkhandler.read(('domain.console.vnc', uuid))
if domain_vnc:
domain_vnc_listen, domain_vnc_port = domain_vnc.split(':')
except Exception:
else:
domain_vnc_listen = 'None'
domain_vnc_port = 'None'
parsed_xml = getDomainXML(zkhandler, uuid)
try:
stats_data = loads(zkhandler.read('/domains/{}/stats'.format(uuid)))
except Exception:
stats_data = zkhandler.read(('domain.stats', uuid))
if stats_data is not None:
try:
stats_data = loads(stats_data)
except Exception:
stats_data = {}
else:
stats_data = {}
domain_uuid, domain_name, domain_description, domain_memory, domain_vcpu, domain_vcputopo = getDomainMainDetails(parsed_xml)
@ -335,6 +411,7 @@ def getInformationFromXML(zkhandler, uuid):
'node_selector': domain_node_selector,
'node_autostart': bool(strtobool(domain_node_autostart)),
'migration_method': domain_migration_method,
'tags': domain_tags,
'description': domain_description,
'profile': domain_profile,
'memory': int(domain_memory),
@ -372,23 +449,28 @@ def getDomainNetworks(parsed_xml, stats_data):
net_type = device.attrib.get('type')
except Exception:
net_type = None
try:
net_mac = device.mac.attrib.get('address')
except Exception:
net_mac = None
try:
net_bridge = device.source.attrib.get(net_type)
except Exception:
net_bridge = None
try:
net_model = device.model.attrib.get('type')
except Exception:
net_model = None
try:
net_stats_list = [x for x in stats_data.get('net_stats', []) if x.get('bridge') == net_bridge]
net_stats = net_stats_list[0]
except Exception:
net_stats = {}
net_rd_bytes = net_stats.get('rd_bytes', 0)
net_rd_packets = net_stats.get('rd_packets', 0)
net_rd_errors = net_stats.get('rd_errors', 0)
@ -397,9 +479,19 @@ def getDomainNetworks(parsed_xml, stats_data):
net_wr_packets = net_stats.get('wr_packets', 0)
net_wr_errors = net_stats.get('wr_errors', 0)
net_wr_drops = net_stats.get('wr_drops', 0)
if net_type == 'direct':
net_vni = 'macvtap:' + device.source.attrib.get('dev')
net_bridge = device.source.attrib.get('dev')
elif net_type == 'hostdev':
net_vni = 'hostdev:' + str(device.sriov_device)
net_bridge = str(device.sriov_device)
else:
net_vni = re_match(r'[vm]*br([0-9a-z]+)', net_bridge).group(1)
net_obj = {
'type': net_type,
'vni': re_match(r'[vm]*br([0-9a-z]+)', net_bridge).group(1),
'vni': net_vni,
'mac': net_mac,
'source': net_bridge,
'model': net_model,
@ -439,7 +531,7 @@ def getDomainControllers(parsed_xml):
# Verify node is valid in cluster
#
def verifyNode(zkhandler, node):
return zkhandler.exists('/nodes/{}'.format(node))
return zkhandler.exists(('node', node))
#
@ -449,7 +541,7 @@ def getPrimaryNode(zkhandler):
failcount = 0
while True:
try:
primary_node = zkhandler.read('/config/primary_node')
primary_node = zkhandler.read('base.config.primary_node')
except Exception:
primary_node = 'none'
@ -473,7 +565,7 @@ def getPrimaryNode(zkhandler):
def findTargetNode(zkhandler, dom_uuid):
# Determine VM node limits; set config value if read fails
try:
node_limit = zkhandler.read('/domains/{}/node_limit'.format(dom_uuid)).split(',')
node_limit = zkhandler.read(('domain.meta.node_limit', dom_uuid)).split(',')
if not any(node_limit):
node_limit = None
except Exception:
@ -481,13 +573,13 @@ def findTargetNode(zkhandler, dom_uuid):
# Determine VM search field or use default; set config value if read fails
try:
search_field = zkhandler.read('/domains/{}/node_selector'.format(dom_uuid))
search_field = zkhandler.read(('domain.meta.node_selector', dom_uuid))
except Exception:
search_field = None
# If our search field is invalid, use the default
if search_field is None or search_field == 'None':
search_field = zkhandler.read('/config/migration_target_selector')
search_field = zkhandler.read('base.config.migration_target_selector')
# Execute the search
if search_field == 'mem':
@ -508,15 +600,15 @@ def findTargetNode(zkhandler, dom_uuid):
#
def getNodes(zkhandler, node_limit, dom_uuid):
valid_node_list = []
full_node_list = zkhandler.children('/nodes')
current_node = zkhandler.read('/domains/{}/node'.format(dom_uuid))
full_node_list = zkhandler.children('base.node')
current_node = zkhandler.read(('domain.node', dom_uuid))
for node in full_node_list:
if node_limit and node not in node_limit:
continue
daemon_state = zkhandler.read('/nodes/{}/daemonstate'.format(node))
domain_state = zkhandler.read('/nodes/{}/domainstate'.format(node))
daemon_state = zkhandler.read(('node.state.daemon', node))
domain_state = zkhandler.read(('node.state.domain', node))
if node == current_node:
continue
@ -538,9 +630,9 @@ def findTargetNodeMem(zkhandler, node_limit, dom_uuid):
node_list = getNodes(zkhandler, node_limit, dom_uuid)
for node in node_list:
memprov = int(zkhandler.read('/nodes/{}/memprov'.format(node)))
memused = int(zkhandler.read('/nodes/{}/memused'.format(node)))
memfree = int(zkhandler.read('/nodes/{}/memfree'.format(node)))
memprov = int(zkhandler.read(('node.memory.provisioned', node)))
memused = int(zkhandler.read(('node.memory.used', node)))
memfree = int(zkhandler.read(('node.memory.free', node)))
memtotal = memused + memfree
provfree = memtotal - memprov
@ -560,7 +652,7 @@ def findTargetNodeLoad(zkhandler, node_limit, dom_uuid):
node_list = getNodes(zkhandler, node_limit, dom_uuid)
for node in node_list:
load = float(zkhandler.read('/nodes/{}/cpuload'.format(node)))
load = float(zkhandler.read(('node.cpu.load', node)))
if load < least_load:
least_load = load
@ -578,7 +670,7 @@ def findTargetNodeVCPUs(zkhandler, node_limit, dom_uuid):
node_list = getNodes(zkhandler, node_limit, dom_uuid)
for node in node_list:
vcpus = int(zkhandler.read('/nodes/{}/vcpualloc'.format(node)))
vcpus = int(zkhandler.read(('node.vcpu.allocated', node)))
if vcpus < least_vcpus:
least_vcpus = vcpus
@ -596,7 +688,7 @@ def findTargetNodeVMs(zkhandler, node_limit, dom_uuid):
node_list = getNodes(zkhandler, node_limit, dom_uuid)
for node in node_list:
vms = int(zkhandler.read('/nodes/{}/domainscount'.format(node)))
vms = int(zkhandler.read(('node.count.provisioned_domains', node)))
if vms < least_vms:
least_vms = vms
@ -681,3 +773,25 @@ def removeIPAddress(ipaddr, cidrnetmask, dev):
dev
)
)
#
# Sort a set of interface names (e.g. ens1f1v10)
#
def sortInterfaceNames(interface_names):
# We can't handle non-list inputs
if not isinstance(interface_names, list):
return interface_names
def atoi(text):
return int(text) if text.isdigit() else text
def natural_keys(text):
"""
alist.sort(key=natural_keys) sorts in human order
http://nedbatchelder.com/blog/200712/human_sorting.html
(See Toothy's implementation in the comments)
"""
return [atoi(c) for c in re_split(r'(\d+)', text)]
return sorted(interface_names, key=natural_keys)
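For reference, the natural sort compares the numeric runs as integers rather than strings, so VF-style names order the way an administrator would expect (interface names illustrative):

    names = ['ens1f1v10', 'ens1f1v2', 'ens1f0v1', 'ens1f1v1']
    print(sortInterfaceNames(names))
    # ['ens1f0v1', 'ens1f1v1', 'ens1f1v2', 'ens1f1v10'] -- 'v2' sorts before 'v10'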

View File

@ -1,6 +1,6 @@
#!/usr/bin/env python3
# log.py - Output (stdout + logfile) functions
# log.py - PVC daemon logger functions
# Part of the Parallel Virtual Cluster (PVC) system
#
# Copyright (C) 2018-2021 Joshua M. Boniface <joshua@boniface.me>
@ -19,7 +19,12 @@
#
###############################################################################
import datetime
from collections import deque
from threading import Thread
from queue import Queue
from datetime import datetime
from daemon_lib.zkhandler import ZKHandler
class Logger(object):
@ -77,17 +82,32 @@ class Logger(object):
self.last_colour = ''
self.last_prompt = ''
if self.config['zookeeper_logging']:
self.zookeeper_logger = ZookeeperLogger(config)
self.zookeeper_logger.start()
# Provide a hup function to close and reopen the writer
def hup(self):
self.writer.close()
self.writer = open(self.logfile, 'a', buffering=0)
# Provide a termination function so all messages are flushed before terminating the main daemon
def terminate(self):
if self.config['file_logging']:
self.writer.close()
if self.config['zookeeper_logging']:
self.out("Waiting for Zookeeper message queue to drain", state='s')
while not self.zookeeper_logger.queue.empty():
pass
self.zookeeper_logger.stop()
self.zookeeper_logger.join()
# Output function
def out(self, message, state=None, prefix=''):
# Get the date
if self.config['log_dates']:
date = '{} - '.format(datetime.datetime.now().strftime('%Y/%m/%d %H:%M:%S.%f'))
date = '{} '.format(datetime.now().strftime('%Y/%m/%d %H:%M:%S.%f'))
else:
date = ''
@ -123,6 +143,71 @@ class Logger(object):
if self.config['file_logging']:
self.writer.write(message + '\n')
# Log to Zookeeper
if self.config['zookeeper_logging']:
self.zookeeper_logger.queue.put(message)
# Set last message variables
self.last_colour = colour
self.last_prompt = prompt
class ZookeeperLogger(Thread):
"""
Defines a threaded writer for Zookeeper logs. Threading prevents the blocking of other
daemon events while the records are written. They will be eventually consistent.
"""
def __init__(self, config):
self.config = config
self.node = self.config['node']
self.max_lines = self.config['node_log_lines']
self.queue = Queue()
self.zkhandler = None
self.start_zkhandler()
# Ensure the root keys for this are instantiated
self.zkhandler.write([
('base.logs', ''),
(('logs', self.node), '')
])
self.running = False
Thread.__init__(self, args=(), kwargs=None)
def start_zkhandler(self):
# We must open our own dedicated Zookeeper instance because we can't guarantee one already exists when this starts
if self.zkhandler is not None:
try:
self.zkhandler.disconnect()
except Exception:
pass
self.zkhandler = ZKHandler(self.config, logger=None)
self.zkhandler.connect(persistent=True)
def run(self):
self.running = True
# Get the logs that are currently in Zookeeper and populate our deque
raw_logs = self.zkhandler.read(('logs.messages', self.node))
if raw_logs is None:
raw_logs = ''
logs = deque(raw_logs.split('\n'), self.max_lines)
while self.running:
# Get a new message
try:
message = self.queue.get(timeout=1)
if not message:
continue
except Exception:
continue
if not self.config['log_dates']:
# We want to log dates here, even if the log_dates config is not set
date = '{} '.format(datetime.now().strftime('%Y/%m/%d %H:%M:%S.%f'))
else:
date = ''
# Add the message to the deque
logs.append(f'{date}{message}')
# Write the updated messages into Zookeeper
self.zkhandler.write([(('logs.messages', self.node), '\n'.join(logs))])
return
def stop(self):
self.running = False
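Each node's log ring is kept as a single newline-joined string under its logs.messages key, capped at node_log_lines entries by the deque. A minimal sketch of how a consumer could read back the tail of a node's log, assuming an already-connected zkhandler:

    def tail_node_log(zkhandler, node, lines=20):
        # The node's messages are stored as one newline-joined string
        raw_logs = zkhandler.read(('logs.messages', node))
        if raw_logs is None:
            return []
        return raw_logs.split('\n')[-lines:]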

View File

@ -0,0 +1 @@
{"version": "0", "root": "", "base": {"root": "", "schema": "/schema", "schema.version": "/schema/version", "config": "/config", "config.maintenance": "/config/maintenance", "config.primary_node": "/config/primary_node", "config.primary_node.sync_lock": "/config/primary_node/sync_lock", "config.upstream_ip": "/config/upstream_ip", "config.migration_target_selector": "/config/migration_target_selector", "cmd": "/cmd", "cmd.node": "/cmd/nodes", "cmd.domain": "/cmd/domains", "cmd.ceph": "/cmd/ceph", "node": "/nodes", "domain": "/domains", "network": "/networks", "storage": "/ceph", "storage.util": "/ceph/util", "osd": "/ceph/osds", "pool": "/ceph/pools", "volume": "/ceph/volumes", "snapshot": "/ceph/snapshots"}, "node": {"name": "", "keepalive": "/keepalive", "mode": "/daemonmode", "data.active_schema": "/activeschema", "data.latest_schema": "/latestschema", "data.static": "/staticdata", "running_domains": "/runningdomains", "count.provisioned_domains": "/domainscount", "count.networks": "/networkscount", "state.daemon": "/daemonstate", "state.router": "/routerstate", "state.domain": "/domainstate", "cpu.load": "/cpuload", "vcpu.allocated": "/vcpualloc", "memory.total": "/memtotal", "memory.used": "/memused", "memory.free": "/memfree", "memory.allocated": "/memalloc", "memory.provisioned": "/memprov", "ipmi.hostname": "/ipmihostname", "ipmi.username": "/ipmiusername", "ipmi.password": "/ipmipassword"}, "domain": {"name": "", "xml": "/xml", "state": "/state", "profile": "/profile", "stats": "/stats", "node": "/node", "last_node": "/lastnode", "failed_reason": "/failedreason", "storage.volumes": "/rbdlist", "console.log": "/consolelog", "console.vnc": "/vnc", "meta.autostart": "/node_autostart", "meta.migrate_method": "/migration_method", "meta.node_selector": "/node_selector", "meta.node_limit": "/node_limit", "migrate.sync_lock": "/migrate_sync_lock"}, "network": {"vni": "", "type": "/nettype", "rule": "/firewall_rules", "rule.in": "/firewall_rules/in", "rule.out": "/firewall_rules/out", "nameservers": "/name_servers", "domain": "/domain", "reservation": "/dhcp4_reservations", "lease": "/dhcp4_leases", "ip4.gateway": "/ip4_gateway", "ip4.network": "/ip4_network", "ip4.dhcp": "/dhcp4_flag", "ip4.dhcp_start": "/dhcp4_start", "ip4.dhcp_end": "/dhcp4_end", "ip6.gateway": "/ip6_gateway", "ip6.network": "/ip6_network", "ip6.dhcp": "/dhcp6_flag"}, "reservation": {"mac": "", "ip": "/ipaddr", "hostname": "/hostname"}, "lease": {"mac": "", "ip": "/ipaddr", "hostname": "/hostname", "expiry": "/expiry", "client_id": "/clientid"}, "rule": {"description": "", "rule": "/rule", "order": "/order"}, "osd": {"id": "", "node": "/node", "device": "/device", "stats": "/stats"}, "pool": {"name": "", "pgs": "/pgs", "stats": "/stats"}, "volume": {"name": "", "stats": "/stats"}, "snapshot": {"name": "", "stats": "/stats"}}

View File

@ -0,0 +1 @@
{"version": "1", "root": "", "base": {"root": "", "schema": "/schema", "schema.version": "/schema/version", "config": "/config", "config.maintenance": "/config/maintenance", "config.primary_node": "/config/primary_node", "config.primary_node.sync_lock": "/config/primary_node/sync_lock", "config.upstream_ip": "/config/upstream_ip", "config.migration_target_selector": "/config/migration_target_selector", "cmd": "/cmd", "cmd.node": "/cmd/nodes", "cmd.domain": "/cmd/domains", "cmd.ceph": "/cmd/ceph", "node": "/nodes", "domain": "/domains", "network": "/networks", "storage": "/ceph", "storage.util": "/ceph/util", "osd": "/ceph/osds", "pool": "/ceph/pools", "volume": "/ceph/volumes", "snapshot": "/ceph/snapshots"}, "node": {"name": "", "keepalive": "/keepalive", "mode": "/daemonmode", "data.active_schema": "/activeschema", "data.latest_schema": "/latestschema", "data.static": "/staticdata", "running_domains": "/runningdomains", "count.provisioned_domains": "/domainscount", "count.networks": "/networkscount", "state.daemon": "/daemonstate", "state.router": "/routerstate", "state.domain": "/domainstate", "cpu.load": "/cpuload", "vcpu.allocated": "/vcpualloc", "memory.total": "/memtotal", "memory.used": "/memused", "memory.free": "/memfree", "memory.allocated": "/memalloc", "memory.provisioned": "/memprov", "ipmi.hostname": "/ipmihostname", "ipmi.username": "/ipmiusername", "ipmi.password": "/ipmipassword", "sriov": "/sriov", "sriov.pf": "/sriov/pf", "sriov.vf": "/sriov/vf"}, "sriov_pf": {"phy": "", "mtu": "/mtu", "vfcount": "/vfcount"}, "sriov_vf": {"phy": "", "pf": "/pf", "mtu": "/mtu", "mac": "/mac", "phy_mac": "/phy_mac", "config": "/config", "config.vlan_id": "/config/vlan_id", "config.vlan_qos": "/config/vlan_qos", "config.tx_rate_min": "/config/tx_rate_min", "config.tx_rate_max": "/config/tx_rate_max", "config.spoof_check": "/config/spoof_check", "config.link_state": "/config/link_state", "config.trust": "/config/trust", "config.query_rss": "/config/query_rss", "pci": "/pci", "pci.domain": "/pci/domain", "pci.bus": "/pci/bus", "pci.slot": "/pci/slot", "pci.function": "/pci/function", "used": "/used", "used_by": "/used_by"}, "domain": {"name": "", "xml": "/xml", "state": "/state", "profile": "/profile", "stats": "/stats", "node": "/node", "last_node": "/lastnode", "failed_reason": "/failedreason", "storage.volumes": "/rbdlist", "console.log": "/consolelog", "console.vnc": "/vnc", "meta.autostart": "/node_autostart", "meta.migrate_method": "/migration_method", "meta.node_selector": "/node_selector", "meta.node_limit": "/node_limit", "migrate.sync_lock": "/migrate_sync_lock"}, "network": {"vni": "", "type": "/nettype", "rule": "/firewall_rules", "rule.in": "/firewall_rules/in", "rule.out": "/firewall_rules/out", "nameservers": "/name_servers", "domain": "/domain", "reservation": "/dhcp4_reservations", "lease": "/dhcp4_leases", "ip4.gateway": "/ip4_gateway", "ip4.network": "/ip4_network", "ip4.dhcp": "/dhcp4_flag", "ip4.dhcp_start": "/dhcp4_start", "ip4.dhcp_end": "/dhcp4_end", "ip6.gateway": "/ip6_gateway", "ip6.network": "/ip6_network", "ip6.dhcp": "/dhcp6_flag"}, "reservation": {"mac": "", "ip": "/ipaddr", "hostname": "/hostname"}, "lease": {"mac": "", "ip": "/ipaddr", "hostname": "/hostname", "expiry": "/expiry", "client_id": "/clientid"}, "rule": {"description": "", "rule": "/rule", "order": "/order"}, "osd": {"id": "", "node": "/node", "device": "/device", "stats": "/stats"}, "pool": {"name": "", "pgs": "/pgs", "stats": "/stats"}, "volume": {"name": "", "stats": "/stats"}, "snapshot": 
{"name": "", "stats": "/stats"}}

View File

@ -0,0 +1 @@
{"version": "2", "root": "", "base": {"root": "", "schema": "/schema", "schema.version": "/schema/version", "config": "/config", "config.maintenance": "/config/maintenance", "config.primary_node": "/config/primary_node", "config.primary_node.sync_lock": "/config/primary_node/sync_lock", "config.upstream_ip": "/config/upstream_ip", "config.migration_target_selector": "/config/migration_target_selector", "cmd": "/cmd", "cmd.node": "/cmd/nodes", "cmd.domain": "/cmd/domains", "cmd.ceph": "/cmd/ceph", "node": "/nodes", "domain": "/domains", "network": "/networks", "storage": "/ceph", "storage.util": "/ceph/util", "osd": "/ceph/osds", "pool": "/ceph/pools", "volume": "/ceph/volumes", "snapshot": "/ceph/snapshots"}, "node": {"name": "", "keepalive": "/keepalive", "mode": "/daemonmode", "data.active_schema": "/activeschema", "data.latest_schema": "/latestschema", "data.static": "/staticdata", "data.pvc_version": "/pvcversion", "running_domains": "/runningdomains", "count.provisioned_domains": "/domainscount", "count.networks": "/networkscount", "state.daemon": "/daemonstate", "state.router": "/routerstate", "state.domain": "/domainstate", "cpu.load": "/cpuload", "vcpu.allocated": "/vcpualloc", "memory.total": "/memtotal", "memory.used": "/memused", "memory.free": "/memfree", "memory.allocated": "/memalloc", "memory.provisioned": "/memprov", "ipmi.hostname": "/ipmihostname", "ipmi.username": "/ipmiusername", "ipmi.password": "/ipmipassword", "sriov": "/sriov", "sriov.pf": "/sriov/pf", "sriov.vf": "/sriov/vf"}, "sriov_pf": {"phy": "", "mtu": "/mtu", "vfcount": "/vfcount"}, "sriov_vf": {"phy": "", "pf": "/pf", "mtu": "/mtu", "mac": "/mac", "phy_mac": "/phy_mac", "config": "/config", "config.vlan_id": "/config/vlan_id", "config.vlan_qos": "/config/vlan_qos", "config.tx_rate_min": "/config/tx_rate_min", "config.tx_rate_max": "/config/tx_rate_max", "config.spoof_check": "/config/spoof_check", "config.link_state": "/config/link_state", "config.trust": "/config/trust", "config.query_rss": "/config/query_rss", "pci": "/pci", "pci.domain": "/pci/domain", "pci.bus": "/pci/bus", "pci.slot": "/pci/slot", "pci.function": "/pci/function", "used": "/used", "used_by": "/used_by"}, "domain": {"name": "", "xml": "/xml", "state": "/state", "profile": "/profile", "stats": "/stats", "node": "/node", "last_node": "/lastnode", "failed_reason": "/failedreason", "storage.volumes": "/rbdlist", "console.log": "/consolelog", "console.vnc": "/vnc", "meta.autostart": "/node_autostart", "meta.migrate_method": "/migration_method", "meta.node_selector": "/node_selector", "meta.node_limit": "/node_limit", "migrate.sync_lock": "/migrate_sync_lock"}, "network": {"vni": "", "type": "/nettype", "rule": "/firewall_rules", "rule.in": "/firewall_rules/in", "rule.out": "/firewall_rules/out", "nameservers": "/name_servers", "domain": "/domain", "reservation": "/dhcp4_reservations", "lease": "/dhcp4_leases", "ip4.gateway": "/ip4_gateway", "ip4.network": "/ip4_network", "ip4.dhcp": "/dhcp4_flag", "ip4.dhcp_start": "/dhcp4_start", "ip4.dhcp_end": "/dhcp4_end", "ip6.gateway": "/ip6_gateway", "ip6.network": "/ip6_network", "ip6.dhcp": "/dhcp6_flag"}, "reservation": {"mac": "", "ip": "/ipaddr", "hostname": "/hostname"}, "lease": {"mac": "", "ip": "/ipaddr", "hostname": "/hostname", "expiry": "/expiry", "client_id": "/clientid"}, "rule": {"description": "", "rule": "/rule", "order": "/order"}, "osd": {"id": "", "node": "/node", "device": "/device", "stats": "/stats"}, "pool": {"name": "", "pgs": "/pgs", "stats": "/stats"}, "volume": {"name": "", 
"stats": "/stats"}, "snapshot": {"name": "", "stats": "/stats"}}

View File

@ -0,0 +1 @@
{"version": "3", "root": "", "base": {"root": "", "schema": "/schema", "schema.version": "/schema/version", "config": "/config", "config.maintenance": "/config/maintenance", "config.primary_node": "/config/primary_node", "config.primary_node.sync_lock": "/config/primary_node/sync_lock", "config.upstream_ip": "/config/upstream_ip", "config.migration_target_selector": "/config/migration_target_selector", "cmd": "/cmd", "cmd.node": "/cmd/nodes", "cmd.domain": "/cmd/domains", "cmd.ceph": "/cmd/ceph", "node": "/nodes", "domain": "/domains", "network": "/networks", "storage": "/ceph", "storage.util": "/ceph/util", "osd": "/ceph/osds", "pool": "/ceph/pools", "volume": "/ceph/volumes", "snapshot": "/ceph/snapshots"}, "node": {"name": "", "keepalive": "/keepalive", "mode": "/daemonmode", "data.active_schema": "/activeschema", "data.latest_schema": "/latestschema", "data.static": "/staticdata", "data.pvc_version": "/pvcversion", "running_domains": "/runningdomains", "count.provisioned_domains": "/domainscount", "count.networks": "/networkscount", "state.daemon": "/daemonstate", "state.router": "/routerstate", "state.domain": "/domainstate", "cpu.load": "/cpuload", "vcpu.allocated": "/vcpualloc", "memory.total": "/memtotal", "memory.used": "/memused", "memory.free": "/memfree", "memory.allocated": "/memalloc", "memory.provisioned": "/memprov", "ipmi.hostname": "/ipmihostname", "ipmi.username": "/ipmiusername", "ipmi.password": "/ipmipassword", "sriov": "/sriov", "sriov.pf": "/sriov/pf", "sriov.vf": "/sriov/vf"}, "sriov_pf": {"phy": "", "mtu": "/mtu", "vfcount": "/vfcount"}, "sriov_vf": {"phy": "", "pf": "/pf", "mtu": "/mtu", "mac": "/mac", "phy_mac": "/phy_mac", "config": "/config", "config.vlan_id": "/config/vlan_id", "config.vlan_qos": "/config/vlan_qos", "config.tx_rate_min": "/config/tx_rate_min", "config.tx_rate_max": "/config/tx_rate_max", "config.spoof_check": "/config/spoof_check", "config.link_state": "/config/link_state", "config.trust": "/config/trust", "config.query_rss": "/config/query_rss", "pci": "/pci", "pci.domain": "/pci/domain", "pci.bus": "/pci/bus", "pci.slot": "/pci/slot", "pci.function": "/pci/function", "used": "/used", "used_by": "/used_by"}, "domain": {"name": "", "xml": "/xml", "state": "/state", "profile": "/profile", "stats": "/stats", "node": "/node", "last_node": "/lastnode", "failed_reason": "/failedreason", "storage.volumes": "/rbdlist", "console.log": "/consolelog", "console.vnc": "/vnc", "meta.autostart": "/node_autostart", "meta.migrate_method": "/migration_method", "meta.node_selector": "/node_selector", "meta.node_limit": "/node_limit", "meta.tags": "/tags", "migrate.sync_lock": "/migrate_sync_lock"}, "tag": {"name": "", "type": "/type", "protected": "/protected"}, "network": {"vni": "", "type": "/nettype", "rule": "/firewall_rules", "rule.in": "/firewall_rules/in", "rule.out": "/firewall_rules/out", "nameservers": "/name_servers", "domain": "/domain", "reservation": "/dhcp4_reservations", "lease": "/dhcp4_leases", "ip4.gateway": "/ip4_gateway", "ip4.network": "/ip4_network", "ip4.dhcp": "/dhcp4_flag", "ip4.dhcp_start": "/dhcp4_start", "ip4.dhcp_end": "/dhcp4_end", "ip6.gateway": "/ip6_gateway", "ip6.network": "/ip6_network", "ip6.dhcp": "/dhcp6_flag"}, "reservation": {"mac": "", "ip": "/ipaddr", "hostname": "/hostname"}, "lease": {"mac": "", "ip": "/ipaddr", "hostname": "/hostname", "expiry": "/expiry", "client_id": "/clientid"}, "rule": {"description": "", "rule": "/rule", "order": "/order"}, "osd": {"id": "", "node": "/node", "device": "/device", "stats": 
"/stats"}, "pool": {"name": "", "pgs": "/pgs", "stats": "/stats"}, "volume": {"name": "", "stats": "/stats"}, "snapshot": {"name": "", "stats": "/stats"}}

View File

@ -0,0 +1 @@
{"version": "4", "root": "", "base": {"root": "", "schema": "/schema", "schema.version": "/schema/version", "config": "/config", "config.maintenance": "/config/maintenance", "config.primary_node": "/config/primary_node", "config.primary_node.sync_lock": "/config/primary_node/sync_lock", "config.upstream_ip": "/config/upstream_ip", "config.migration_target_selector": "/config/migration_target_selector", "cmd": "/cmd", "cmd.node": "/cmd/nodes", "cmd.domain": "/cmd/domains", "cmd.ceph": "/cmd/ceph", "logs": "/logs", "node": "/nodes", "domain": "/domains", "network": "/networks", "storage": "/ceph", "storage.util": "/ceph/util", "osd": "/ceph/osds", "pool": "/ceph/pools", "volume": "/ceph/volumes", "snapshot": "/ceph/snapshots"}, "logs": {"node": "", "messages": "/messages"}, "node": {"name": "", "keepalive": "/keepalive", "mode": "/daemonmode", "data.active_schema": "/activeschema", "data.latest_schema": "/latestschema", "data.static": "/staticdata", "data.pvc_version": "/pvcversion", "running_domains": "/runningdomains", "count.provisioned_domains": "/domainscount", "count.networks": "/networkscount", "state.daemon": "/daemonstate", "state.router": "/routerstate", "state.domain": "/domainstate", "cpu.load": "/cpuload", "vcpu.allocated": "/vcpualloc", "memory.total": "/memtotal", "memory.used": "/memused", "memory.free": "/memfree", "memory.allocated": "/memalloc", "memory.provisioned": "/memprov", "ipmi.hostname": "/ipmihostname", "ipmi.username": "/ipmiusername", "ipmi.password": "/ipmipassword", "sriov": "/sriov", "sriov.pf": "/sriov/pf", "sriov.vf": "/sriov/vf"}, "sriov_pf": {"phy": "", "mtu": "/mtu", "vfcount": "/vfcount"}, "sriov_vf": {"phy": "", "pf": "/pf", "mtu": "/mtu", "mac": "/mac", "phy_mac": "/phy_mac", "config": "/config", "config.vlan_id": "/config/vlan_id", "config.vlan_qos": "/config/vlan_qos", "config.tx_rate_min": "/config/tx_rate_min", "config.tx_rate_max": "/config/tx_rate_max", "config.spoof_check": "/config/spoof_check", "config.link_state": "/config/link_state", "config.trust": "/config/trust", "config.query_rss": "/config/query_rss", "pci": "/pci", "pci.domain": "/pci/domain", "pci.bus": "/pci/bus", "pci.slot": "/pci/slot", "pci.function": "/pci/function", "used": "/used", "used_by": "/used_by"}, "domain": {"name": "", "xml": "/xml", "state": "/state", "profile": "/profile", "stats": "/stats", "node": "/node", "last_node": "/lastnode", "failed_reason": "/failedreason", "storage.volumes": "/rbdlist", "console.log": "/consolelog", "console.vnc": "/vnc", "meta.autostart": "/node_autostart", "meta.migrate_method": "/migration_method", "meta.node_selector": "/node_selector", "meta.node_limit": "/node_limit", "meta.tags": "/tags", "migrate.sync_lock": "/migrate_sync_lock"}, "tag": {"name": "", "type": "/type", "protected": "/protected"}, "network": {"vni": "", "type": "/nettype", "rule": "/firewall_rules", "rule.in": "/firewall_rules/in", "rule.out": "/firewall_rules/out", "nameservers": "/name_servers", "domain": "/domain", "reservation": "/dhcp4_reservations", "lease": "/dhcp4_leases", "ip4.gateway": "/ip4_gateway", "ip4.network": "/ip4_network", "ip4.dhcp": "/dhcp4_flag", "ip4.dhcp_start": "/dhcp4_start", "ip4.dhcp_end": "/dhcp4_end", "ip6.gateway": "/ip6_gateway", "ip6.network": "/ip6_network", "ip6.dhcp": "/dhcp6_flag"}, "reservation": {"mac": "", "ip": "/ipaddr", "hostname": "/hostname"}, "lease": {"mac": "", "ip": "/ipaddr", "hostname": "/hostname", "expiry": "/expiry", "client_id": "/clientid"}, "rule": {"description": "", "rule": "/rule", "order": "/order"}, 
"osd": {"id": "", "node": "/node", "device": "/device", "stats": "/stats"}, "pool": {"name": "", "pgs": "/pgs", "stats": "/stats"}, "volume": {"name": "", "stats": "/stats"}, "snapshot": {"name": "", "stats": "/stats"}}

View File

@ -21,17 +21,19 @@
import re
import daemon_lib.common as common
#
# Cluster search functions
#
def getClusterNetworkList(zkhandler):
# Get a list of VNIs by listing the children of /networks
vni_list = zkhandler.children('/networks')
vni_list = zkhandler.children('base.network')
description_list = []
# For each VNI, get the corresponding description from the data
for vni in vni_list:
description_list.append(zkhandler.read('/networks/{}'.format(vni)))
description_list.append(zkhandler.read(('network', vni)))
return vni_list, description_list
@ -91,14 +93,14 @@ def getNetworkDescription(zkhandler, network):
def getNetworkDHCPLeases(zkhandler, vni):
# Get a list of DHCP leases by listing the children of /networks/<vni>/dhcp4_leases
dhcp4_leases = zkhandler.children('/networks/{}/dhcp4_leases'.format(vni))
return sorted(dhcp4_leases)
leases = zkhandler.children(('network.lease', vni))
return sorted(leases)
def getNetworkDHCPReservations(zkhandler, vni):
# Get a list of DHCP reservations by listing the children of /networks/<vni>/dhcp4_reservations
dhcp4_reservations = zkhandler.children('/networks/{}/dhcp4_reservations'.format(vni))
return sorted(dhcp4_reservations)
reservations = zkhandler.children(('network.reservation', vni))
return sorted(reservations)
def getNetworkACLs(zkhandler, vni, _direction):
@ -110,32 +112,39 @@ def getNetworkACLs(zkhandler, vni, _direction):
full_acl_list = []
for direction in directions:
unordered_acl_list = zkhandler.children('/networks/{}/firewall_rules/{}'.format(vni, direction))
unordered_acl_list = zkhandler.children((f'network.rule.{direction}', vni))
if len(unordered_acl_list) < 1:
continue
ordered_acls = dict()
for acl in unordered_acl_list:
order = zkhandler.read('/networks/{}/firewall_rules/{}/{}/order'.format(vni, direction, acl))
order = zkhandler.read((f'network.rule.{direction}', vni, 'rule.order', acl))
if order is None:
continue
ordered_acls[order] = acl
for order in sorted(ordered_acls.keys()):
rule = zkhandler.read('/networks/{}/firewall_rules/{}/{}/rule'.format(vni, direction, acl))
rule = zkhandler.read((f'network.rule.{direction}', vni, 'rule.rule', acl))
if rule is None:
continue
full_acl_list.append({'direction': direction, 'order': int(order), 'description': ordered_acls[order], 'rule': rule})
return full_acl_list
def getNetworkInformation(zkhandler, vni):
description = zkhandler.read('/networks/{}'.format(vni))
nettype = zkhandler.read('/networks/{}/nettype'.format(vni))
domain = zkhandler.read('/networks/{}/domain'.format(vni))
name_servers = zkhandler.read('/networks/{}/name_servers'.format(vni))
ip6_network = zkhandler.read('/networks/{}/ip6_network'.format(vni))
ip6_gateway = zkhandler.read('/networks/{}/ip6_gateway'.format(vni))
dhcp6_flag = zkhandler.read('/networks/{}/dhcp6_flag'.format(vni))
ip4_network = zkhandler.read('/networks/{}/ip4_network'.format(vni))
ip4_gateway = zkhandler.read('/networks/{}/ip4_gateway'.format(vni))
dhcp4_flag = zkhandler.read('/networks/{}/dhcp4_flag'.format(vni))
dhcp4_start = zkhandler.read('/networks/{}/dhcp4_start'.format(vni))
dhcp4_end = zkhandler.read('/networks/{}/dhcp4_end'.format(vni))
description = zkhandler.read(('network', vni))
nettype = zkhandler.read(('network.type', vni))
domain = zkhandler.read(('network.domain', vni))
name_servers = zkhandler.read(('network.nameservers', vni))
ip6_network = zkhandler.read(('network.ip6.network', vni))
ip6_gateway = zkhandler.read(('network.ip6.gateway', vni))
dhcp6_flag = zkhandler.read(('network.ip6.dhcp', vni))
ip4_network = zkhandler.read(('network.ip4.network', vni))
ip4_gateway = zkhandler.read(('network.ip4.gateway', vni))
dhcp4_flag = zkhandler.read(('network.ip4.dhcp', vni))
dhcp4_start = zkhandler.read(('network.ip4.dhcp_start', vni))
dhcp4_end = zkhandler.read(('network.ip4.dhcp_end', vni))
# Construct a data structure to represent the data
network_information = {
@ -162,17 +171,17 @@ def getNetworkInformation(zkhandler, vni):
def getDHCPLeaseInformation(zkhandler, vni, mac_address):
# Check whether this is a dynamic or static lease
if zkhandler.exists('/networks/{}/dhcp4_leases/{}'.format(vni, mac_address)):
type_key = 'dhcp4_leases'
elif zkhandler.exists('/networks/{}/dhcp4_reservations/{}'.format(vni, mac_address)):
type_key = 'dhcp4_reservations'
if zkhandler.exists(('network.lease', vni, 'lease', mac_address)):
type_key = 'lease'
elif zkhandler.exists(('network.reservation', vni, 'reservation', mac_address)):
type_key = 'reservation'
else:
return {}
return None
hostname = zkhandler.read('/networks/{}/{}/{}/hostname'.format(vni, type_key, mac_address))
ip4_address = zkhandler.read('/networks/{}/{}/{}/ipaddr'.format(vni, type_key, mac_address))
if type_key == 'dhcp4_leases':
timestamp = zkhandler.read('/networks/{}/{}/{}/expiry'.format(vni, type_key, mac_address))
hostname = zkhandler.read((f'network.{type_key}', vni, f'{type_key}.hostname', mac_address))
ip4_address = zkhandler.read((f'network.{type_key}', vni, f'{type_key}.ip', mac_address))
if type_key == 'lease':
timestamp = zkhandler.read((f'network.{type_key}', vni, f'{type_key}.expiry', mac_address))
else:
timestamp = 'static'
@ -186,14 +195,14 @@ def getDHCPLeaseInformation(zkhandler, vni, mac_address):
return lease_information
def getACLInformation(zkhandler, vni, direction, description):
order = zkhandler.read('/networks/{}/firewall_rules/{}/{}/order'.format(vni, direction, description))
rule = zkhandler.read('/networks/{}/firewall_rules/{}/{}/rule'.format(vni, direction, description))
def getACLInformation(zkhandler, vni, direction, acl):
order = zkhandler.read((f'network.rule.{direction}', vni, 'rule.order', acl))
rule = zkhandler.read((f'network.rule.{direction}', vni, 'rule.rule', acl))
# Construct a data structure to represent the data
acl_information = {
'order': order,
'description': description,
'description': acl,
'rule': rule,
'direction': direction
}
@ -201,12 +210,7 @@ def getACLInformation(zkhandler, vni, direction, description):
def isValidMAC(macaddr):
allowed = re.compile(r"""
(
^([0-9A-F]{2}[:]){5}([0-9A-F]{2})$
)
""",
re.VERBOSE | re.IGNORECASE)
allowed = re.compile(r'(^([0-9A-F]{2}[:]){5}([0-9A-F]{2})$)', re.VERBOSE | re.IGNORECASE)
if allowed.match(macaddr):
return True
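The collapsed pattern accepts only colon-separated, six-octet MACs, case-insensitively; a few illustrative inputs:

    print(isValidMAC('52:54:00:ab:cd:ef'))  # True
    print(isValidMAC('52-54-00-AB-CD-EF'))  # False (dashes are not accepted)
    print(isValidMAC('52:54:00:ab:cd'))     # False (only five octets)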
@ -239,10 +243,11 @@ def add_network(zkhandler, vni, description, nettype,
return False, 'ERROR: DHCPv4 start and end addresses are required for a DHCPv4-enabled network.'
# Check if a network with this VNI or description already exists
if zkhandler.exists('/networks/{}'.format(vni)):
if zkhandler.exists(('network', vni)):
return False, 'ERROR: A network with VNI "{}" already exists!'.format(vni)
for network in zkhandler.children('/networks'):
network_description = zkhandler.read('/networks/{}'.format(network))
for network in zkhandler.children('base.network'):
network_description = zkhandler.read(('network', network))
if network_description == description:
return False, 'ERROR: A network with description "{}" already exists!'.format(description)
@ -255,31 +260,34 @@ def add_network(zkhandler, vni, description, nettype,
else:
dhcp6_flag = 'False'
if nettype == 'managed' and not domain:
if nettype in ['managed'] and not domain:
domain = '{}.local'.format(description)
# Add the new network to Zookeeper
zkhandler.write([
('/networks/{}'.format(vni), description),
('/networks/{}/nettype'.format(vni), nettype),
('/networks/{}/domain'.format(vni), domain),
('/networks/{}/name_servers'.format(vni), name_servers),
('/networks/{}/ip6_network'.format(vni), ip6_network),
('/networks/{}/ip6_gateway'.format(vni), ip6_gateway),
('/networks/{}/dhcp6_flag'.format(vni), dhcp6_flag),
('/networks/{}/ip4_network'.format(vni), ip4_network),
('/networks/{}/ip4_gateway'.format(vni), ip4_gateway),
('/networks/{}/dhcp4_flag'.format(vni), dhcp4_flag),
('/networks/{}/dhcp4_start'.format(vni), dhcp4_start),
('/networks/{}/dhcp4_end'.format(vni), dhcp4_end),
('/networks/{}/dhcp4_leases'.format(vni), ''),
('/networks/{}/dhcp4_reservations'.format(vni), ''),
('/networks/{}/firewall_rules'.format(vni), ''),
('/networks/{}/firewall_rules/in'.format(vni), ''),
('/networks/{}/firewall_rules/out'.format(vni), '')
result = zkhandler.write([
(('network', vni), description),
(('network.type', vni), nettype),
(('network.domain', vni), domain),
(('network.nameservers', vni), name_servers),
(('network.ip6.network', vni), ip6_network),
(('network.ip6.gateway', vni), ip6_gateway),
(('network.ip6.dhcp', vni), dhcp6_flag),
(('network.ip4.network', vni), ip4_network),
(('network.ip4.gateway', vni), ip4_gateway),
(('network.ip4.dhcp', vni), dhcp4_flag),
(('network.ip4.dhcp_start', vni), dhcp4_start),
(('network.ip4.dhcp_end', vni), dhcp4_end),
(('network.lease', vni), ''),
(('network.reservation', vni), ''),
(('network.rule', vni), ''),
(('network.rule.in', vni), ''),
(('network.rule.out', vni), '')
])
return True, 'Network "{}" added successfully!'.format(description)
if result:
return True, 'Network "{}" added successfully!'.format(description)
else:
return False, 'ERROR: Failed to add network.'
def modify_network(zkhandler, vni, description=None, domain=None, name_servers=None,
@ -288,36 +296,36 @@ def modify_network(zkhandler, vni, description=None, domain=None, name_servers=N
# Add the modified parameters to Zookeeper
update_data = list()
if description is not None:
update_data.append(('/networks/{}'.format(vni), description))
update_data.append((('network', vni), description))
if domain is not None:
update_data.append(('/networks/{}/domain'.format(vni), domain))
update_data.append((('network.domain', vni), domain))
if name_servers is not None:
update_data.append(('/networks/{}/name_servers'.format(vni), name_servers))
update_data.append((('network.nameservers', vni), name_servers))
if ip4_network is not None:
update_data.append(('/networks/{}/ip4_network'.format(vni), ip4_network))
update_data.append((('network.ip4.network', vni), ip4_network))
if ip4_gateway is not None:
update_data.append(('/networks/{}/ip4_gateway'.format(vni), ip4_gateway))
update_data.append((('network.ip4.gateway', vni), ip4_gateway))
if ip6_network is not None:
update_data.append(('/networks/{}/ip6_network'.format(vni), ip6_network))
update_data.append((('network.ip6.network', vni), ip6_network))
if ip6_network:
update_data.append(('/networks/{}/dhcp6_flag'.format(vni), 'True'))
update_data.append((('network.ip6.dhcp', vni), 'True'))
else:
update_data.append(('/networks/{}/dhcp6_flag'.format(vni), 'False'))
update_data.append((('network.ip6.dhcp', vni), 'False'))
if ip6_gateway is not None:
update_data.append(('/networks/{}/ip6_gateway'.format(vni), ip6_gateway))
update_data.append((('network.ip6.gateway', vni), ip6_gateway))
else:
# If we're changing the network, but don't also specify the gateway,
# generate a new one automatically
if ip6_network:
ip6_netpart, ip6_maskpart = ip6_network.split('/')
ip6_gateway = '{}1'.format(ip6_netpart)
update_data.append(('/networks/{}/ip6_gateway'.format(vni), ip6_gateway))
update_data.append((('network.ip6.gateway', vni), ip6_gateway))
if dhcp4_flag is not None:
update_data.append(('/networks/{}/dhcp4_flag'.format(vni), dhcp4_flag))
update_data.append((('network.ip4.dhcp', vni), dhcp4_flag))
if dhcp4_start is not None:
update_data.append(('/networks/{}/dhcp4_start'.format(vni), dhcp4_start))
update_data.append((('network.ip4.dhcp_start', vni), dhcp4_start))
if dhcp4_end is not None:
update_data.append(('/networks/{}/dhcp4_end'.format(vni), dhcp4_end))
update_data.append((('network.ip4.dhcp_end', vni), dhcp4_end))
zkhandler.write(update_data)
@ -332,7 +340,9 @@ def remove_network(zkhandler, network):
return False, 'ERROR: Could not find network "{}" in the cluster!'.format(network)
# Delete the configuration
zkhandler.delete('/networks/{}'.format(vni))
zkhandler.delete([
('network', vni)
])
return True, 'Network "{}" removed successfully!'.format(description)
@ -352,18 +362,15 @@ def add_dhcp_reservation(zkhandler, network, ipaddress, macaddress, hostname):
if not isValidIP(ipaddress):
return False, 'ERROR: IP address "{}" is not valid!'.format(ipaddress)
if zkhandler.exists('/networks/{}/dhcp4_reservations/{}'.format(net_vni, macaddress)):
if zkhandler.exists(('network.reservation', net_vni, 'reservation', macaddress)):
return False, 'ERROR: A reservation with MAC "{}" already exists!'.format(macaddress)
# Add the new static lease to ZK
try:
zkhandler.write([
('/networks/{}/dhcp4_reservations/{}'.format(net_vni, macaddress), 'static'),
('/networks/{}/dhcp4_reservations/{}/hostname'.format(net_vni, macaddress), hostname),
('/networks/{}/dhcp4_reservations/{}/ipaddr'.format(net_vni, macaddress), ipaddress)
])
except Exception as e:
return False, 'ERROR: Failed to write to Zookeeper! Exception: "{}".'.format(e)
zkhandler.write([
(('network.reservation', net_vni, 'reservation', macaddress), 'static'),
(('network.reservation', net_vni, 'reservation.hostname', macaddress), hostname),
(('network.reservation', net_vni, 'reservation.ip', macaddress), ipaddress),
])
return True, 'DHCP reservation "{}" added successfully!'.format(macaddress)
@ -379,28 +386,30 @@ def remove_dhcp_reservation(zkhandler, network, reservation):
# Check if the reservation matches a static reservation description, a mac, or an IP address currently in the database
dhcp4_reservations_list = getNetworkDHCPReservations(zkhandler, net_vni)
for macaddr in dhcp4_reservations_list:
hostname = zkhandler.read('/networks/{}/dhcp4_reservations/{}/hostname'.format(net_vni, macaddr))
ipaddress = zkhandler.read('/networks/{}/dhcp4_reservations/{}/ipaddr'.format(net_vni, macaddr))
hostname = zkhandler.read(('network.reservation', net_vni, 'reservation.hostname', macaddr))
ipaddress = zkhandler.read(('network.reservation', net_vni, 'reservation.ip', macaddr))
if reservation == macaddr or reservation == hostname or reservation == ipaddress:
match_description = macaddr
lease_type_zk = 'reservations'
lease_type_zk = 'reservation'
lease_type_human = 'static reservation'
# Check if the reservation matches a dynamic reservation description, a mac, or an IP address currently in the database
dhcp4_leases_list = getNetworkDHCPLeases(zkhandler, net_vni)
for macaddr in dhcp4_leases_list:
hostname = zkhandler.read('/networks/{}/dhcp4_leases/{}/hostname'.format(net_vni, macaddr))
ipaddress = zkhandler.read('/networks/{}/dhcp4_leases/{}/ipaddr'.format(net_vni, macaddr))
hostname = zkhandler.read(('network.lease', net_vni, 'lease.hostname', macaddr))
ipaddress = zkhandler.read(('network.lease', net_vni, 'lease.ip', macaddr))
if reservation == macaddr or reservation == hostname or reservation == ipaddress:
match_description = macaddr
lease_type_zk = 'leases'
lease_type_zk = 'lease'
lease_type_human = 'dynamic lease'
if not match_description:
return False, 'ERROR: No DHCP reservation or lease exists matching "{}"!'.format(reservation)
# Remove the entry from zookeeper
zkhandler.delete('/networks/{}/dhcp4_{}/{}'.format(net_vni, lease_type_zk, match_description))
zkhandler.delete([
(f'network.{lease_type_zk}', net_vni, f'{lease_type_zk}', match_description),
])
return True, 'DHCP {} "{}" removed successfully!'.format(lease_type_human, match_description)
@ -449,14 +458,14 @@ def add_acl(zkhandler, network, direction, description, rule, order):
continue
else:
zkhandler.write([
('/networks/{}/firewall_rules/{}/{}/order'.format(net_vni, direction, acl['description']), idx)
((f'network.rule.{direction}', net_vni, 'rule.order', acl['description']), idx)
])
# Add the new rule
zkhandler.write([
('/networks/{}/firewall_rules/{}/{}'.format(net_vni, direction, description), ''),
('/networks/{}/firewall_rules/{}/{}/order'.format(net_vni, direction, description), order),
('/networks/{}/firewall_rules/{}/{}/rule'.format(net_vni, direction, description), rule)
((f'network.rule.{direction}', net_vni, 'rule', description), ''),
((f'network.rule.{direction}', net_vni, 'rule.order', description), order),
((f'network.rule.{direction}', net_vni, 'rule.rule', description), rule),
])
return True, 'Firewall rule "{}" added successfully!'.format(description)
@ -481,10 +490,9 @@ def remove_acl(zkhandler, network, description):
return False, 'ERROR: No firewall rule exists matching description "{}"!'.format(description)
# Remove the entry from zookeeper
try:
zkhandler.delete('/networks/{}/firewall_rules/{}/{}'.format(net_vni, match_direction, match_description))
except Exception as e:
return False, 'ERROR: Failed to write to Zookeeper! Exception: "{}".'.format(e)
zkhandler.delete([
(f'network.rule.{match_direction}', net_vni, 'rule', match_description)
])
# Update the existing ordering
updated_acl_list = getNetworkACLs(zkhandler, net_vni, match_direction)
@ -496,7 +504,7 @@ def remove_acl(zkhandler, network, description):
continue
else:
zkhandler.write([
('/networks/{}/firewall_rules/{}/{}/order'.format(net_vni, match_direction, acl['description']), idx)
((f'network.rule.{match_direction}', net_vni, 'rule.order', acl['description']), idx),
])
return True, 'Firewall rule "{}" removed successfully!'.format(match_description)
@ -517,10 +525,10 @@ def get_info(zkhandler, network):
def get_list(zkhandler, limit, is_fuzzy=True):
net_list = []
full_net_list = zkhandler.children('/networks')
full_net_list = zkhandler.children('base.network')
for net in full_net_list:
description = zkhandler.read('/networks/{}'.format(net))
description = zkhandler.read(('network', net))
if limit:
try:
if not is_fuzzy:
@ -623,3 +631,226 @@ def get_list_acl(zkhandler, network, limit, direction, is_fuzzy=True):
acl_list.append(acl)
return True, acl_list
#
# SR-IOV functions
#
# These are separate since they don't work like other network types
#
def getSRIOVPFInformation(zkhandler, node, pf):
mtu = zkhandler.read(('node.sriov.pf', node, 'sriov_pf.mtu', pf))
retcode, vf_list = get_list_sriov_vf(zkhandler, node, pf)
if retcode:
vfs = common.sortInterfaceNames([vf['phy'] for vf in vf_list if vf['pf'] == pf])
else:
vfs = []
# Construct a data structure to represent the data
pf_information = {
'phy': pf,
'mtu': mtu,
'vfs': vfs,
}
return pf_information
def get_info_sriov_pf(zkhandler, node, pf):
pf_information = getSRIOVPFInformation(zkhandler, node, pf)
if not pf_information:
return False, 'ERROR: Could not get information about SR-IOV PF "{}" on node "{}"'.format(pf, node)
return True, pf_information
def get_list_sriov_pf(zkhandler, node):
pf_list = list()
pf_phy_list = zkhandler.children(('node.sriov.pf', node))
for phy in pf_phy_list:
retcode, pf_information = get_info_sriov_pf(zkhandler, node, phy)
if retcode:
pf_list.append(pf_information)
return True, pf_list
def getSRIOVVFInformation(zkhandler, node, vf):
if not zkhandler.exists(('node.sriov.vf', node, 'sriov_vf', vf)):
return []
pf = zkhandler.read(('node.sriov.vf', node, 'sriov_vf.pf', vf))
mtu = zkhandler.read(('node.sriov.vf', node, 'sriov_vf.mtu', vf))
mac = zkhandler.read(('node.sriov.vf', node, 'sriov_vf.mac', vf))
vlan_id = zkhandler.read(('node.sriov.vf', node, 'sriov_vf.config.vlan_id', vf))
vlan_qos = zkhandler.read(('node.sriov.vf', node, 'sriov_vf.config.vlan_qos', vf))
tx_rate_min = zkhandler.read(('node.sriov.vf', node, 'sriov_vf.config.tx_rate_min', vf))
tx_rate_max = zkhandler.read(('node.sriov.vf', node, 'sriov_vf.config.tx_rate_max', vf))
link_state = zkhandler.read(('node.sriov.vf', node, 'sriov_vf.config.link_state', vf))
spoof_check = zkhandler.read(('node.sriov.vf', node, 'sriov_vf.config.spoof_check', vf))
trust = zkhandler.read(('node.sriov.vf', node, 'sriov_vf.config.trust', vf))
query_rss = zkhandler.read(('node.sriov.vf', node, 'sriov_vf.config.query_rss', vf))
pci_domain = zkhandler.read(('node.sriov.vf', node, 'sriov_vf.pci.domain', vf))
pci_bus = zkhandler.read(('node.sriov.vf', node, 'sriov_vf.pci.bus', vf))
pci_slot = zkhandler.read(('node.sriov.vf', node, 'sriov_vf.pci.slot', vf))
pci_function = zkhandler.read(('node.sriov.vf', node, 'sriov_vf.pci.function', vf))
used = zkhandler.read(('node.sriov.vf', node, 'sriov_vf.used', vf))
used_by_domain = zkhandler.read(('node.sriov.vf', node, 'sriov_vf.used_by', vf))
vf_information = {
'phy': vf,
'pf': pf,
'mtu': mtu,
'mac': mac,
'config': {
'vlan_id': vlan_id,
'vlan_qos': vlan_qos,
'tx_rate_min': tx_rate_min,
'tx_rate_max': tx_rate_max,
'link_state': link_state,
'spoof_check': spoof_check,
'trust': trust,
'query_rss': query_rss,
},
'pci': {
'domain': pci_domain,
'bus': pci_bus,
'slot': pci_slot,
'function': pci_function,
},
'usage': {
'used': used,
'domain': used_by_domain,
}
}
return vf_information
def get_info_sriov_vf(zkhandler, node, vf):
# Verify node is valid
valid_node = common.verifyNode(zkhandler, node)
if not valid_node:
return False, 'ERROR: Specified node "{}" is invalid.'.format(node)
vf_information = getSRIOVVFInformation(zkhandler, node, vf)
if not vf_information:
return False, 'ERROR: Could not find SR-IOV VF "{}" on node "{}"'.format(vf, node)
return True, vf_information
def get_list_sriov_vf(zkhandler, node, pf=None):
# Verify node is valid
valid_node = common.verifyNode(zkhandler, node)
if not valid_node:
return False, 'ERROR: Specified node "{}" is invalid.'.format(node)
vf_list = list()
vf_phy_list = common.sortInterfaceNames(zkhandler.children(('node.sriov.vf', node)))
for phy in vf_phy_list:
retcode, vf_information = get_info_sriov_vf(zkhandler, node, phy)
if retcode:
if pf is not None:
if vf_information['pf'] == pf:
vf_list.append(vf_information)
else:
vf_list.append(vf_information)
return True, vf_list
def set_sriov_vf_config(zkhandler, node, vf, vlan_id=None, vlan_qos=None, tx_rate_min=None, tx_rate_max=None, link_state=None, spoof_check=None, trust=None, query_rss=None):
# Verify node is valid
valid_node = common.verifyNode(zkhandler, node)
if not valid_node:
return False, 'ERROR: Specified node "{}" is invalid.'.format(node)
# Verify VF is valid
vf_information = getSRIOVVFInformation(zkhandler, node, vf)
if not vf_information:
return False, 'ERROR: Could not find SR-IOV VF "{}" on node "{}".'.format(vf, node)
update_list = list()
if vlan_id is not None:
update_list.append((('node.sriov.vf', node, 'sriov_vf.config.vlan_id', vf), vlan_id))
if vlan_qos is not None:
update_list.append((('node.sriov.vf', node, 'sriov_vf.config.vlan_qos', vf), vlan_qos))
if tx_rate_min is not None:
update_list.append((('node.sriov.vf', node, 'sriov_vf.config.tx_rate_min', vf), tx_rate_min))
if tx_rate_max is not None:
update_list.append((('node.sriov.vf', node, 'sriov_vf.config.tx_rate_max', vf), tx_rate_max))
if link_state is not None:
update_list.append((('node.sriov.vf', node, 'sriov_vf.config.link_state', vf), link_state))
if spoof_check is not None:
update_list.append((('node.sriov.vf', node, 'sriov_vf.config.spoof_check', vf), spoof_check))
if trust is not None:
update_list.append((('node.sriov.vf', node, 'sriov_vf.config.trust', vf), trust))
if query_rss is not None:
update_list.append((('node.sriov.vf', node, 'sriov_vf.config.query_rss', vf), query_rss))
if len(update_list) < 1:
return False, 'ERROR: No changes to apply.'
result = zkhandler.write(update_list)
if result:
return True, 'Successfully modified configuration of SR-IOV VF "{}" on node "{}".'.format(vf, node)
else:
return False, 'Failed to modify configuration of SR-IOV VF "{}" on node "{}".'.format(vf, node)
def set_sriov_vf_vm(zkhandler, vm_uuid, node, vf, vf_macaddr, vf_type):
# Verify node is valid
valid_node = common.verifyNode(zkhandler, node)
if not valid_node:
return False
# Verify VF is valid
vf_information = getSRIOVVFInformation(zkhandler, node, vf)
if not vf_information:
return False
update_list = [
(('node.sriov.vf', node, 'sriov_vf.used', vf), 'True'),
(('node.sriov.vf', node, 'sriov_vf.used_by', vf), vm_uuid),
(('node.sriov.vf', node, 'sriov_vf.mac', vf), vf_macaddr),
]
# Hostdev type SR-IOV prevents the guest from live migrating
if vf_type == 'hostdev':
update_list.append(
(('domain.meta.migrate_method', vm_uuid), 'shutdown')
)
zkhandler.write(update_list)
return True
def unset_sriov_vf_vm(zkhandler, node, vf):
# Verify node is valid
valid_node = common.verifyNode(zkhandler, node)
if not valid_node:
return False
# Verify VF is valid
vf_information = getSRIOVVFInformation(zkhandler, node, vf)
if not vf_information:
return False
update_list = [
(('node.sriov.vf', node, 'sriov_vf.used', vf), 'False'),
(('node.sriov.vf', node, 'sriov_vf.used_by', vf), ''),
(('node.sriov.vf', node, 'sriov_vf.mac', vf), zkhandler.read(('node.sriov.vf', node, 'sriov_vf.phy_mac', vf)))
]
zkhandler.write(update_list)
return True
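# Example (illustrative): set_sriov_vf_config() only writes the keys whose arguments are
# not None, so a single VF attribute can be changed without touching the others. The node
# name "hv1" and VF name "ens1f0v3" below are hypothetical placeholders.
#
#   retcode, message = set_sriov_vf_config(
#       zkhandler, 'hv1', 'ens1f0v3', vlan_id='100', spoof_check='False'
#   )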

View File

@ -29,23 +29,24 @@ def getNodeInformation(zkhandler, node_name):
"""
Gather information about a node from the Zookeeper database and return a dict() containing it.
"""
node_daemon_state = zkhandler.read('/nodes/{}/daemonstate'.format(node_name))
node_coordinator_state = zkhandler.read('/nodes/{}/routerstate'.format(node_name))
node_domain_state = zkhandler.read('/nodes/{}/domainstate'.format(node_name))
node_static_data = zkhandler.read('/nodes/{}/staticdata'.format(node_name)).split()
node_daemon_state = zkhandler.read(('node.state.daemon', node_name))
node_coordinator_state = zkhandler.read(('node.state.router', node_name))
node_domain_state = zkhandler.read(('node.state.domain', node_name))
node_static_data = zkhandler.read(('node.data.static', node_name)).split()
node_pvc_version = zkhandler.read(('node.data.pvc_version', node_name))
node_cpu_count = int(node_static_data[0])
node_kernel = node_static_data[1]
node_os = node_static_data[2]
node_arch = node_static_data[3]
node_vcpu_allocated = int(zkhandler.read('nodes/{}/vcpualloc'.format(node_name)))
node_mem_total = int(zkhandler.read('/nodes/{}/memtotal'.format(node_name)))
node_mem_allocated = int(zkhandler.read('/nodes/{}/memalloc'.format(node_name)))
node_mem_provisioned = int(zkhandler.read('/nodes/{}/memprov'.format(node_name)))
node_mem_used = int(zkhandler.read('/nodes/{}/memused'.format(node_name)))
node_mem_free = int(zkhandler.read('/nodes/{}/memfree'.format(node_name)))
node_load = float(zkhandler.read('/nodes/{}/cpuload'.format(node_name)))
node_domains_count = int(zkhandler.read('/nodes/{}/domainscount'.format(node_name)))
node_running_domains = zkhandler.read('/nodes/{}/runningdomains'.format(node_name)).split()
node_vcpu_allocated = int(zkhandler.read(('node.vcpu.allocated', node_name)))
node_mem_total = int(zkhandler.read(('node.memory.total', node_name)))
node_mem_allocated = int(zkhandler.read(('node.memory.allocated', node_name)))
node_mem_provisioned = int(zkhandler.read(('node.memory.provisioned', node_name)))
node_mem_used = int(zkhandler.read(('node.memory.used', node_name)))
node_mem_free = int(zkhandler.read(('node.memory.free', node_name)))
node_load = float(zkhandler.read(('node.cpu.load', node_name)))
node_domains_count = int(zkhandler.read(('node.count.provisioned_domains', node_name)))
node_running_domains = zkhandler.read(('node.running_domains', node_name)).split()
# Construct a data structure to represent the data
node_information = {
@ -53,6 +54,7 @@ def getNodeInformation(zkhandler, node_name):
'daemon_state': node_daemon_state,
'coordinator_state': node_coordinator_state,
'domain_state': node_domain_state,
'pvc_version': node_pvc_version,
'cpu_count': node_cpu_count,
'kernel': node_kernel,
'os': node_os,
@ -84,23 +86,23 @@ def secondary_node(zkhandler, node):
return False, 'ERROR: No node named "{}" is present in the cluster.'.format(node)
# Ensure node is a coordinator
daemon_mode = zkhandler.read('/nodes/{}/daemonmode'.format(node))
daemon_mode = zkhandler.read(('node.mode', node))
if daemon_mode == 'hypervisor':
return False, 'ERROR: Cannot change router mode on non-coordinator node "{}"'.format(node)
# Ensure node is in run daemonstate
daemon_state = zkhandler.read('/nodes/{}/daemonstate'.format(node))
daemon_state = zkhandler.read(('node.state.daemon', node))
if daemon_state != 'run':
return False, 'ERROR: Node "{}" is not active'.format(node)
# Get current state
current_state = zkhandler.read('/nodes/{}/routerstate'.format(node))
current_state = zkhandler.read(('node.state.router', node))
if current_state == 'secondary':
return True, 'Node "{}" is already in secondary router mode.'.format(node)
retmsg = 'Setting node {} in secondary router mode.'.format(node)
zkhandler.write([
('/config/primary_node', 'none')
('base.config.primary_node', 'none')
])
return True, retmsg
@ -112,23 +114,23 @@ def primary_node(zkhandler, node):
return False, 'ERROR: No node named "{}" is present in the cluster.'.format(node)
# Ensure node is a coordinator
daemon_mode = zkhandler.read('/nodes/{}/daemonmode'.format(node))
daemon_mode = zkhandler.read(('node.mode', node))
if daemon_mode == 'hypervisor':
return False, 'ERROR: Cannot change router mode on non-coordinator node "{}"'.format(node)
# Ensure node is in run daemonstate
daemon_state = zkhandler.read('/nodes/{}/daemonstate'.format(node))
daemon_state = zkhandler.read(('node.state.daemon', node))
if daemon_state != 'run':
return False, 'ERROR: Node "{}" is not active'.format(node)
# Get current state
current_state = zkhandler.read('/nodes/{}/routerstate'.format(node))
current_state = zkhandler.read(('node.state.router', node))
if current_state == 'primary':
return True, 'Node "{}" is already in primary router mode.'.format(node)
retmsg = 'Setting node {} in primary router mode.'.format(node)
zkhandler.write([
('/config/primary_node', node)
('base.config.primary_node', node)
])
return True, retmsg
@ -139,18 +141,18 @@ def flush_node(zkhandler, node, wait=False):
if not common.verifyNode(zkhandler, node):
return False, 'ERROR: No node named "{}" is present in the cluster.'.format(node)
if zkhandler.read('/nodes/{}/domainstate'.format(node)) == 'flushed':
if zkhandler.read(('node.state.domain', node)) == 'flushed':
return True, 'Hypervisor {} is already flushed.'.format(node)
retmsg = 'Flushing hypervisor {} of running VMs.'.format(node)
# Set the node to flush state in Zookeeper
zkhandler.write([
('/nodes/{}/domainstate'.format(node), 'flush')
(('node.state.domain', node), 'flush')
])
if wait:
while zkhandler.read('/nodes/{}/domainstate'.format(node)) == 'flush':
while zkhandler.read(('node.state.domain', node)) == 'flush':
time.sleep(1)
retmsg = 'Flushed hypervisor {} of running VMs.'.format(node)
@ -162,24 +164,42 @@ def ready_node(zkhandler, node, wait=False):
if not common.verifyNode(zkhandler, node):
return False, 'ERROR: No node named "{}" is present in the cluster.'.format(node)
if zkhandler.read('/nodes/{}/domainstate'.format(node)) == 'ready':
if zkhandler.read(('node.state.domain', node)) == 'ready':
return True, 'Hypervisor {} is already ready.'.format(node)
retmsg = 'Restoring hypervisor {} to active service.'.format(node)
# Set the node to unflush state in Zookeeper
zkhandler.write([
('/nodes/{}/domainstate'.format(node), 'unflush')
(('node.state.domain', node), 'unflush')
])
if wait:
while zkhandler.read('/nodes/{}/domainstate'.format(node)) == 'unflush':
while zkhandler.read(('node.state.domain', node)) == 'unflush':
time.sleep(1)
retmsg = 'Restored hypervisor {} to active service.'.format(node)
return True, retmsg
def get_node_log(zkhandler, node, lines=2000):
# Verify node is valid
if not common.verifyNode(zkhandler, node):
return False, 'ERROR: No node named "{}" is present in the cluster.'.format(node)
# Get the data from ZK
node_log = zkhandler.read(('logs.messages', node))
if node_log is None:
return True, ''
# Shrink the log buffer to length lines
shrunk_log = node_log.split('\n')[-lines:]
loglines = '\n'.join(shrunk_log)
return True, loglines
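# Example (illustrative): get_node_log() returns the last {lines} lines stored for the
# node under the "logs.messages" key; a shorter tail can be requested explicitly. The
# node name "hv1" is a hypothetical placeholder.
#
#   retcode, loglines = get_node_log(zkhandler, 'hv1', lines=200)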
def get_info(zkhandler, node):
# Verify node is valid
if not common.verifyNode(zkhandler, node):
@ -195,7 +215,7 @@ def get_info(zkhandler, node):
def get_list(zkhandler, limit, daemon_state=None, coordinator_state=None, domain_state=None, is_fuzzy=True):
node_list = []
full_node_list = zkhandler.children('/nodes')
full_node_list = zkhandler.children('base.node')
for node in full_node_list:
if limit:

View File

@ -24,9 +24,14 @@ import re
import lxml.objectify
import lxml.etree
from distutils.util import strtobool
from uuid import UUID
from concurrent.futures import ThreadPoolExecutor
import daemon_lib.common as common
import daemon_lib.ceph as ceph
from daemon_lib.network import set_sriov_vf_vm, unset_sriov_vf_vm
#
@ -34,11 +39,11 @@ import daemon_lib.ceph as ceph
#
def getClusterDomainList(zkhandler):
# Get a list of UUIDs by listing the children of /domains
uuid_list = zkhandler.children('/domains')
uuid_list = zkhandler.children('base.domain')
name_list = []
# For each UUID, get the corresponding name from the data
for uuid in uuid_list:
name_list.append(zkhandler.read('/domains/{}'.format(uuid)))
name_list.append(zkhandler.read(('domain', uuid)))
return uuid_list, name_list
@ -100,10 +105,10 @@ def getDomainName(zkhandler, domain):
# Helper functions
#
def change_state(zkhandler, dom_uuid, new_state):
lock = zkhandler.exclusivelock('/domains/{}/state'.format(dom_uuid))
lock = zkhandler.exclusivelock(('domain.state', dom_uuid))
with lock:
zkhandler.write([
('/domains/{}/state'.format(dom_uuid), new_state)
(('domain.state', dom_uuid), new_state)
])
# Wait for 1/2 second to allow state to flow to all nodes
@ -119,7 +124,7 @@ def is_migrated(zkhandler, domain):
if not dom_uuid:
return False, 'ERROR: Could not find VM "{}" in the cluster!'.format(domain)
last_node = zkhandler.read('/domains/{}/lastnode'.format(dom_uuid))
last_node = zkhandler.read(('domain.last_node', dom_uuid))
if last_node:
return True
else:
@ -133,22 +138,22 @@ def flush_locks(zkhandler, domain):
return False, 'ERROR: Could not find VM "{}" in the cluster!'.format(domain)
# Verify that the VM is in a stopped state; freeing locks is not safe otherwise
state = zkhandler.read('/domains/{}/state'.format(dom_uuid))
state = zkhandler.read(('domain.state', dom_uuid))
if state != 'stop':
return False, 'ERROR: VM "{}" is not in stopped state; flushing RBD locks on a running VM is dangerous.'.format(domain)
# Tell the cluster to flush RBD locks for the VM
flush_locks_string = 'flush_locks {}'.format(dom_uuid)
zkhandler.write([
('/cmd/domains', flush_locks_string)
('base.cmd.domain', flush_locks_string)
])
# Wait 1/2 second for the cluster to get the message and start working
time.sleep(0.5)
# Acquire a read lock, so we get the return exclusively
lock = zkhandler.readlock('/cmd/domains')
lock = zkhandler.readlock('base.cmd.domain')
with lock:
try:
result = zkhandler.read('/cmd/domains').split()[0]
result = zkhandler.read('base.cmd.domain').split()[0]
if result == 'success-flush_locks':
message = 'Flushed locks on VM "{}"'.format(domain)
success = True
@ -160,17 +165,17 @@ def flush_locks(zkhandler, domain):
success = False
# Acquire a write lock to ensure things go smoothly
lock = zkhandler.writelock('/cmd/domains')
lock = zkhandler.writelock('base.cmd.domain')
with lock:
time.sleep(0.5)
zkhandler.write([
('/cmd/domains', '')
('base.cmd.domain', '')
])
return success, message
def define_vm(zkhandler, config_data, target_node, node_limit, node_selector, node_autostart, migration_method=None, profile=None, initial_state='stop'):
def define_vm(zkhandler, config_data, target_node, node_limit, node_selector, node_autostart, migration_method=None, profile=None, tags=[], initial_state='stop'):
# Parse the XML data
try:
parsed_xml = lxml.objectify.fromstring(config_data)
@ -191,6 +196,21 @@ def define_vm(zkhandler, config_data, target_node, node_limit, node_selector, no
if not valid_node:
return False, 'ERROR: Specified node "{}" is invalid.'.format(target_node)
# If a SR-IOV network device is being added, set its used state
dnetworks = common.getDomainNetworks(parsed_xml, {})
for network in dnetworks:
if network['type'] in ['direct', 'hostdev']:
dom_node = zkhandler.read(('domain.node', dom_uuid))
# Check if the network is already in use
is_used = zkhandler.read(('node.sriov.vf', dom_node, 'sriov_vf.used', network['source']))
if is_used == 'True':
used_by_name = searchClusterByUUID(zkhandler, zkhandler.read(('node.sriov.vf', dom_node, 'sriov_vf.used_by', network['source'])))
return False, 'ERROR: Attempted to use SR-IOV network "{}" which is already used by VM "{}" on node "{}".'.format(network['source'], used_by_name, dom_node)
# We must update the "used" section
set_sriov_vf_vm(zkhandler, dom_uuid, dom_node, network['source'], network['mac'], network['type'])
# Obtain the RBD disk list using the common functions
ddisks = common.getDomainDisks(parsed_xml, {})
rbd_list = []
@ -212,22 +232,33 @@ def define_vm(zkhandler, config_data, target_node, node_limit, node_selector, no
# Add the new domain to Zookeeper
zkhandler.write([
('/domains/{}'.format(dom_uuid), dom_name),
('/domains/{}/state'.format(dom_uuid), initial_state),
('/domains/{}/node'.format(dom_uuid), target_node),
('/domains/{}/lastnode'.format(dom_uuid), ''),
('/domains/{}/node_limit'.format(dom_uuid), formatted_node_limit),
('/domains/{}/node_selector'.format(dom_uuid), node_selector),
('/domains/{}/node_autostart'.format(dom_uuid), node_autostart),
('/domains/{}/migration_method'.format(dom_uuid), migration_method),
('/domains/{}/failedreason'.format(dom_uuid), ''),
('/domains/{}/consolelog'.format(dom_uuid), ''),
('/domains/{}/rbdlist'.format(dom_uuid), formatted_rbd_list),
('/domains/{}/profile'.format(dom_uuid), profile),
('/domains/{}/vnc'.format(dom_uuid), ''),
('/domains/{}/xml'.format(dom_uuid), config_data)
(('domain', dom_uuid), dom_name),
(('domain.xml', dom_uuid), config_data),
(('domain.state', dom_uuid), initial_state),
(('domain.profile', dom_uuid), profile),
(('domain.stats', dom_uuid), ''),
(('domain.node', dom_uuid), target_node),
(('domain.last_node', dom_uuid), ''),
(('domain.failed_reason', dom_uuid), ''),
(('domain.storage.volumes', dom_uuid), formatted_rbd_list),
(('domain.console.log', dom_uuid), ''),
(('domain.console.vnc', dom_uuid), ''),
(('domain.meta.autostart', dom_uuid), node_autostart),
(('domain.meta.migrate_method', dom_uuid), migration_method),
(('domain.meta.node_limit', dom_uuid), formatted_node_limit),
(('domain.meta.node_selector', dom_uuid), node_selector),
(('domain.meta.tags', dom_uuid), ''),
(('domain.migrate.sync_lock', dom_uuid), ''),
])
for tag in tags:
tag_name = tag['name']
zkhandler.write([
(('domain.meta.tags', dom_uuid, 'tag.name', tag_name), tag['name']),
(('domain.meta.tags', dom_uuid, 'tag.type', tag_name), tag['type']),
(('domain.meta.tags', dom_uuid, 'tag.protected', tag_name), tag['protected']),
])
return True, 'Added new VM with Name "{}" and UUID "{}" to database.'.format(dom_name, dom_uuid)
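# Example (illustrative): the tags argument to define_vm() is expected to be a list of
# dicts carrying the "name", "type" and "protected" keys read in the loop above. The
# values shown here ("web", "prod", node "hv1", selector "mem") are hypothetical
# placeholders.
#
#   tags = [
#       {'name': 'web', 'type': 'user', 'protected': False},
#       {'name': 'prod', 'type': 'user', 'protected': True},
#   ]
#   retcode, message = define_vm(
#       zkhandler, config_data, 'hv1', None, 'mem', False, tags=tags
#   )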
@ -236,34 +267,63 @@ def modify_vm_metadata(zkhandler, domain, node_limit, node_selector, node_autost
if not dom_uuid:
return False, 'ERROR: Could not find VM "{}" in the cluster!'.format(domain)
update_list = list()
if node_limit is not None:
zkhandler.write([
('/domains/{}/node_limit'.format(dom_uuid), node_limit)
])
update_list.append((('domain.meta.node_limit', dom_uuid), node_limit))
if node_selector is not None:
zkhandler.write([
('/domains/{}/node_selector'.format(dom_uuid), node_selector)
])
update_list.append((('domain.meta.node_selector', dom_uuid), node_selector))
if node_autostart is not None:
zkhandler.write([
('/domains/{}/node_autostart'.format(dom_uuid), node_autostart)
])
update_list.append((('domain.meta.autostart', dom_uuid), node_autostart))
if provisioner_profile is not None:
zkhandler.write([
('/domains/{}/profile'.format(dom_uuid), provisioner_profile)
])
update_list.append((('domain.profile', dom_uuid), provisioner_profile))
if migration_method is not None:
zkhandler.write([
('/domains/{}/migration_method'.format(dom_uuid), migration_method)
])
update_list.append((('domain.meta.migrate_method', dom_uuid), migration_method))
if len(update_list) < 1:
return False, 'ERROR: No updates to apply.'
zkhandler.write(update_list)
return True, 'Successfully modified PVC metadata of VM "{}".'.format(domain)
def modify_vm_tag(zkhandler, domain, action, tag, protected=False):
dom_uuid = getDomainUUID(zkhandler, domain)
if not dom_uuid:
return False, 'ERROR: Could not find VM "{}" in the cluster!'.format(domain)
if action == 'add':
zkhandler.write([
(('domain.meta.tags', dom_uuid, 'tag.name', tag), tag),
(('domain.meta.tags', dom_uuid, 'tag.type', tag), 'user'),
(('domain.meta.tags', dom_uuid, 'tag.protected', tag), protected),
])
return True, 'Successfully added tag "{}" to VM "{}".'.format(tag, domain)
elif action == 'remove':
if not zkhandler.exists(('domain.meta.tags', dom_uuid, 'tag', tag)):
return False, 'The tag "{}" does not exist.'.format(tag)
if zkhandler.read(('domain.meta.tags', dom_uuid, 'tag.type', tag)) != 'user':
return False, 'The tag "{}" is not a user tag and cannot be removed.'.format(tag)
if bool(strtobool(zkhandler.read(('domain.meta.tags', dom_uuid, 'tag.protected', tag)))):
return False, 'The tag "{}" is protected and cannot be removed.'.format(tag)
zkhandler.delete([
(('domain.meta.tags', dom_uuid, 'tag', tag))
])
return True, 'Successfully removed tag "{}" from VM "{}".'.format(tag, domain)
else:
return False, 'Specified tag action is not available.'
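# Example (illustrative): adding a tag as protected prevents its later removal, since the
# "remove" action checks the stored "protected" flag before deleting. The VM name "web1"
# and tag "prod" are hypothetical placeholders.
#
#   retcode, message = modify_vm_tag(zkhandler, 'web1', 'add', 'prod', protected=True)
#   retcode, message = modify_vm_tag(zkhandler, 'web1', 'remove', 'prod')  # refused: protected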
def modify_vm(zkhandler, domain, restart, new_vm_config):
dom_uuid = getDomainUUID(zkhandler, domain)
if not dom_uuid:
@ -274,7 +334,39 @@ def modify_vm(zkhandler, domain, restart, new_vm_config):
try:
parsed_xml = lxml.objectify.fromstring(new_vm_config)
except Exception:
return False, 'ERROR: Failed to parse XML data.'
return False, 'ERROR: Failed to parse new XML data.'
# Get our old network list for comparison purposes
old_vm_config = zkhandler.read(('domain.xml', dom_uuid))
old_parsed_xml = lxml.objectify.fromstring(old_vm_config)
old_dnetworks = common.getDomainNetworks(old_parsed_xml, {})
# If a SR-IOV network device is being added, set its used state
dnetworks = common.getDomainNetworks(parsed_xml, {})
for network in dnetworks:
# Ignore networks that are already there
if network['source'] in [net['source'] for net in old_dnetworks]:
continue
if network['type'] in ['direct', 'hostdev']:
dom_node = zkhandler.read(('domain.node', dom_uuid))
# Check if the network is already in use
is_used = zkhandler.read(('node.sriov.vf', dom_node, 'sriov_vf.used', network['source']))
if is_used == 'True':
used_by_name = searchClusterByUUID(zkhandler, zkhandler.read(('node.sriov.vf', dom_node, 'sriov_vf.used_by', network['source'])))
return False, 'ERROR: Attempted to use SR-IOV network "{}" which is already used by VM "{}" on node "{}".'.format(network['source'], used_by_name, dom_node)
# We must update the "used" section
set_sriov_vf_vm(zkhandler, dom_uuid, dom_node, network['source'], network['mac'], network['type'])
# If a SR-IOV network device is being removed, unset its used state
for network in old_dnetworks:
if network['type'] in ['direct', 'hostdev']:
if network['mac'] not in [n['mac'] for n in dnetworks]:
dom_node = zkhandler.read(('domain.node', dom_uuid))
# We must update the "used" section
unset_sriov_vf_vm(zkhandler, dom_node, network['source'])
# Obtain the RBD disk list using the common functions
ddisks = common.getDomainDisks(parsed_xml, {})
@ -291,9 +383,9 @@ def modify_vm(zkhandler, domain, restart, new_vm_config):
# Add the modified config to Zookeeper
zkhandler.write([
('/domains/{}'.format(dom_uuid), dom_name),
('/domains/{}/rbdlist'.format(dom_uuid), formatted_rbd_list),
('/domains/{}/xml'.format(dom_uuid), new_vm_config)
(('domain', dom_uuid), dom_name),
(('domain.storage.volumes', dom_uuid), formatted_rbd_list),
(('domain.xml', dom_uuid), new_vm_config),
])
if restart:
@ -308,7 +400,7 @@ def dump_vm(zkhandler, domain):
return False, 'ERROR: Could not find VM "{}" in the cluster!'.format(domain)
# Grab the domain XML and dump it to stdout
vm_xml = zkhandler.read('/domains/{}/xml'.format(dom_uuid))
vm_xml = zkhandler.read(('domain.xml', dom_uuid))
return True, vm_xml
@ -319,7 +411,7 @@ def rename_vm(zkhandler, domain, new_domain):
return False, 'ERROR: Could not find VM "{}" in the cluster!'.format(domain)
# Verify that the VM is in a stopped state; renaming is not supported otherwise
state = zkhandler.read('/domains/{}/state'.format(dom_uuid))
state = zkhandler.read(('domain.state', dom_uuid))
if state != 'stop':
return False, 'ERROR: VM "{}" is not in stopped state; VMs cannot be renamed while running.'.format(domain)
@ -353,12 +445,12 @@ def rename_vm(zkhandler, domain, new_domain):
undefine_vm(zkhandler, dom_uuid)
# Define the new VM
define_vm(zkhandler, vm_config_new, dom_info['node'], dom_info['node_limit'], dom_info['node_selector'], dom_info['node_autostart'], migration_method=dom_info['migration_method'], profile=dom_info['profile'], initial_state='stop')
define_vm(zkhandler, vm_config_new, dom_info['node'], dom_info['node_limit'], dom_info['node_selector'], dom_info['node_autostart'], migration_method=dom_info['migration_method'], profile=dom_info['profile'], tags=dom_info['tags'], initial_state='stop')
# If the VM is migrated, store that
if dom_info['migrated'] != 'no':
zkhandler.write([
('/domains/{}/lastnode'.format(dom_uuid), dom_info['last_node'])
(('domain.last_node', dom_uuid), dom_info['last_node'])
])
return True, 'Successfully renamed VM "{}" to "{}".'.format(domain, new_domain)
@ -371,7 +463,7 @@ def undefine_vm(zkhandler, domain):
return False, 'ERROR: Could not find VM "{}" in the cluster!'.format(domain)
# Shut down the VM
current_vm_state = zkhandler.read('/domains/{}/state'.format(dom_uuid))
current_vm_state = zkhandler.read(('domain.state', dom_uuid))
if current_vm_state != 'stop':
change_state(zkhandler, dom_uuid, 'stop')
@ -379,7 +471,9 @@ def undefine_vm(zkhandler, domain):
change_state(zkhandler, dom_uuid, 'delete')
# Delete the configurations
zkhandler.delete('/domains/{}'.format(dom_uuid))
zkhandler.delete([
('domain', dom_uuid)
])
return True, 'Undefined VM "{}" from the cluster.'.format(domain)
@ -393,16 +487,10 @@ def remove_vm(zkhandler, domain):
disk_list = common.getDomainDiskList(zkhandler, dom_uuid)
# Shut down the VM
current_vm_state = zkhandler.read('/domains/{}/state'.format(dom_uuid))
current_vm_state = zkhandler.read(('domain.state', dom_uuid))
if current_vm_state != 'stop':
change_state(zkhandler, dom_uuid, 'stop')
# Gracefully terminate the class instances
change_state(zkhandler, dom_uuid, 'delete')
# Delete the configurations
zkhandler.delete('/domains/{}'.format(dom_uuid))
# Wait for 1 second to allow state to flow to all nodes
time.sleep(1)
@ -411,11 +499,28 @@ def remove_vm(zkhandler, domain):
# vmpool/vmname_volume
try:
disk_pool, disk_name = disk.split('/')
retcode, message = ceph.remove_volume(zkhandler, disk_pool, disk_name)
except ValueError:
continue
return True, 'Removed VM "{}" and disks from the cluster.'.format(domain)
retcode, message = ceph.remove_volume(zkhandler, disk_pool, disk_name)
if not retcode:
if re.match('^ERROR: No volume with name', message):
continue
else:
return False, message
# Gracefully terminate the class instances
change_state(zkhandler, dom_uuid, 'delete')
# Wait for 1/2 second to allow state to flow to all nodes
time.sleep(0.5)
# Delete the VM configuration from Zookeeper
zkhandler.delete([
('domain', dom_uuid)
])
return True, 'Removed VM "{}" and its disks from the cluster.'.format(domain)
def start_vm(zkhandler, domain):
@ -437,7 +542,7 @@ def restart_vm(zkhandler, domain, wait=False):
return False, 'ERROR: Could not find VM "{}" in the cluster!'.format(domain)
# Get state and verify we're OK to proceed
current_state = zkhandler.read('/domains/{}/state'.format(dom_uuid))
current_state = zkhandler.read(('domain.state', dom_uuid))
if current_state != 'start':
return False, 'ERROR: VM "{}" is not in "start" state!'.format(domain)
@ -447,8 +552,8 @@ def restart_vm(zkhandler, domain, wait=False):
change_state(zkhandler, dom_uuid, 'restart')
if wait:
while zkhandler.read('/domains/{}/state'.format(dom_uuid)) == 'restart':
time.sleep(1)
while zkhandler.read(('domain.state', dom_uuid)) == 'restart':
time.sleep(0.5)
retmsg = 'Restarted VM "{}"'.format(domain)
return True, retmsg
@ -461,7 +566,7 @@ def shutdown_vm(zkhandler, domain, wait=False):
return False, 'ERROR: Could not find VM "{}" in the cluster!'.format(domain)
# Get state and verify we're OK to proceed
current_state = zkhandler.read('/domains/{}/state'.format(dom_uuid))
current_state = zkhandler.read(('domain.state', dom_uuid))
if current_state != 'start':
return False, 'ERROR: VM "{}" is not in "start" state!'.format(domain)
@ -471,8 +576,8 @@ def shutdown_vm(zkhandler, domain, wait=False):
change_state(zkhandler, dom_uuid, 'shutdown')
if wait:
while zkhandler.read('/domains/{}/state'.format(dom_uuid)) == 'shutdown':
time.sleep(1)
while zkhandler.read(('domain.state', dom_uuid)) == 'shutdown':
time.sleep(0.5)
retmsg = 'Shut down VM "{}"'.format(domain)
return True, retmsg
@ -497,7 +602,7 @@ def disable_vm(zkhandler, domain):
return False, 'ERROR: Could not find VM "{}" in the cluster!'.format(domain)
# Get state and verify we're OK to proceed
current_state = zkhandler.read('/domains/{}/state'.format(dom_uuid))
current_state = zkhandler.read(('domain.state', dom_uuid))
if current_state != 'stop':
return False, 'ERROR: VM "{}" must be stopped before disabling!'.format(domain)
@ -507,6 +612,38 @@ def disable_vm(zkhandler, domain):
return True, 'Marked VM "{}" as disabled.'.format(domain)
def update_vm_sriov_nics(zkhandler, dom_uuid, source_node, target_node):
# Update all the SR-IOV device states on both nodes, used during migrations but called by the node-side
vm_config = zkhandler.read(('domain.xml', dom_uuid))
parsed_xml = lxml.objectify.fromstring(vm_config)
dnetworks = common.getDomainNetworks(parsed_xml, {})
retcode = True
retmsg = ''
for network in dnetworks:
if network['type'] in ['direct', 'hostdev']:
# Check if the network is already in use
is_used = zkhandler.read(('node.sriov.vf', target_node, 'sriov_vf.used', network['source']))
if is_used == 'True':
used_by_name = searchClusterByUUID(zkhandler, zkhandler.read(('node.sriov.vf', target_node, 'sriov_vf.used_by', network['source'])))
if retcode:
retcode_this = False
retmsg = 'Attempting to use SR-IOV network "{}" which is already used by VM "{}"'.format(network['source'], used_by_name)
else:
retcode_this = True
# We must update the "used" section
if retcode_this:
# This conditional ensures that if we failed the is_used check, we don't try to overwrite the information of a VF that belongs to another VM
set_sriov_vf_vm(zkhandler, dom_uuid, target_node, network['source'], network['mac'], network['type'])
# ... but we still want to free the old node in any case
unset_sriov_vf_vm(zkhandler, source_node, network['source'])
if not retcode_this:
retcode = retcode_this
return retcode, retmsg
def move_vm(zkhandler, domain, target_node, wait=False, force_live=False):
# Validate that VM exists in cluster
dom_uuid = getDomainUUID(zkhandler, domain)
@ -514,7 +651,7 @@ def move_vm(zkhandler, domain, target_node, wait=False, force_live=False):
return False, 'ERROR: Could not find VM "{}" in the cluster!'.format(domain)
# Get state and verify we're OK to proceed
current_state = zkhandler.read('/domains/{}/state'.format(dom_uuid))
current_state = zkhandler.read(('domain.state', dom_uuid))
if current_state != 'start':
# If the current state isn't start, preserve it; we're not doing live migration
target_state = current_state
@ -524,7 +661,7 @@ def move_vm(zkhandler, domain, target_node, wait=False, force_live=False):
else:
target_state = 'migrate'
current_node = zkhandler.read('/domains/{}/node'.format(dom_uuid))
current_node = zkhandler.read(('domain.node', dom_uuid))
if not target_node:
target_node = common.findTargetNode(zkhandler, dom_uuid)
@ -535,16 +672,16 @@ def move_vm(zkhandler, domain, target_node, wait=False, force_live=False):
return False, 'ERROR: Specified node "{}" is invalid.'.format(target_node)
# Check if node is within the limit
node_limit = zkhandler.read('/domains/{}/node_limit'.format(dom_uuid))
node_limit = zkhandler.read(('domain.meta.node_limit', dom_uuid))
if node_limit and target_node not in node_limit.split(','):
return False, 'ERROR: Specified node "{}" is not in the allowed list of nodes for VM "{}".'.format(target_node, domain)
# Verify if node is current node
if target_node == current_node:
last_node = zkhandler.read('/domains/{}/lastnode'.format(dom_uuid))
last_node = zkhandler.read(('domain.last_node', dom_uuid))
if last_node:
zkhandler.write([
('/domains/{}/lastnode'.format(dom_uuid), '')
(('domain.last_node', dom_uuid), '')
])
return True, 'Making temporary migration permanent for VM "{}".'.format(domain)
@ -555,20 +692,23 @@ def move_vm(zkhandler, domain, target_node, wait=False, force_live=False):
retmsg = 'Permanently migrating VM "{}" to node "{}".'.format(domain, target_node)
lock = zkhandler.exclusivelock('/domains/{}/state'.format(dom_uuid))
lock = zkhandler.exclusivelock(('domain.state', dom_uuid))
with lock:
zkhandler.write([
('/domains/{}/state'.format(dom_uuid), target_state),
('/domains/{}/node'.format(dom_uuid), target_node),
('/domains/{}/lastnode'.format(dom_uuid), '')
(('domain.state', dom_uuid), target_state),
(('domain.node', dom_uuid), target_node),
(('domain.last_node', dom_uuid), '')
])
# Wait for 1/2 second for migration to start
time.sleep(0.5)
# Update any SR-IOV NICs
update_vm_sriov_nics(zkhandler, dom_uuid, current_node, target_node)
if wait:
while zkhandler.read('/domains/{}/state'.format(dom_uuid)) == target_state:
time.sleep(1)
while zkhandler.read(('domain.state', dom_uuid)) == target_state:
time.sleep(0.5)
retmsg = 'Permanently migrated VM "{}" to node "{}"'.format(domain, target_node)
return True, retmsg
@ -581,7 +721,7 @@ def migrate_vm(zkhandler, domain, target_node, force_migrate, wait=False, force_
return False, 'ERROR: Could not find VM "{}" in the cluster!'.format(domain)
# Get state and verify we're OK to proceed
current_state = zkhandler.read('/domains/{}/state'.format(dom_uuid))
current_state = zkhandler.read(('domain.state', dom_uuid))
if current_state != 'start':
# If the current state isn't start, preserve it; we're not doing live migration
target_state = current_state
@ -591,8 +731,8 @@ def migrate_vm(zkhandler, domain, target_node, force_migrate, wait=False, force_
else:
target_state = 'migrate'
current_node = zkhandler.read('/domains/{}/node'.format(dom_uuid))
last_node = zkhandler.read('/domains/{}/lastnode'.format(dom_uuid))
current_node = zkhandler.read(('domain.node', dom_uuid))
last_node = zkhandler.read(('domain.last_node', dom_uuid))
if last_node and not force_migrate:
return False, 'ERROR: VM "{}" has been previously migrated.'.format(domain)
@ -606,7 +746,7 @@ def migrate_vm(zkhandler, domain, target_node, force_migrate, wait=False, force_
return False, 'ERROR: Specified node "{}" is invalid.'.format(target_node)
# Check if node is within the limit
node_limit = zkhandler.read('/domains/{}/node_limit'.format(dom_uuid))
node_limit = zkhandler.read(('domain.meta.node_limit', dom_uuid))
if node_limit and target_node not in node_limit.split(','):
return False, 'ERROR: Specified node "{}" is not in the allowed list of nodes for VM "{}".'.format(target_node, domain)
@ -618,25 +758,29 @@ def migrate_vm(zkhandler, domain, target_node, force_migrate, wait=False, force_
return False, 'ERROR: Could not find a valid migration target for VM "{}".'.format(domain)
# Don't overwrite an existing last_node when using force_migrate
real_current_node = current_node # Used for the SR-IOV update
if last_node and force_migrate:
current_node = last_node
retmsg = 'Migrating VM "{}" to node "{}".'.format(domain, target_node)
lock = zkhandler.exclusivelock('/domains/{}/state'.format(dom_uuid))
lock = zkhandler.exclusivelock(('domain.state', dom_uuid))
with lock:
zkhandler.write([
('/domains/{}/state'.format(dom_uuid), target_state),
('/domains/{}/node'.format(dom_uuid), target_node),
('/domains/{}/lastnode'.format(dom_uuid), current_node)
(('domain.state', dom_uuid), target_state),
(('domain.node', dom_uuid), target_node),
(('domain.last_node', dom_uuid), current_node)
])
# Wait for 1/2 second for migration to start
time.sleep(0.5)
# Update any SR-IOV NICs
update_vm_sriov_nics(zkhandler, dom_uuid, real_current_node, target_node)
if wait:
while zkhandler.read('/domains/{}/state'.format(dom_uuid)) == target_state:
time.sleep(1)
while zkhandler.read(('domain.state', dom_uuid)) == target_state:
time.sleep(0.5)
retmsg = 'Migrated VM "{}" to node "{}"'.format(domain, target_node)
return True, retmsg
@ -649,7 +793,7 @@ def unmigrate_vm(zkhandler, domain, wait=False, force_live=False):
return False, 'ERROR: Could not find VM "{}" in the cluster!'.format(domain)
# Get state and verify we're OK to proceed
current_state = zkhandler.read('/domains/{}/state'.format(dom_uuid))
current_state = zkhandler.read(('domain.state', dom_uuid))
if current_state != 'start':
# If the current state isn't start, preserve it; we're not doing live migration
target_state = current_state
@ -659,27 +803,31 @@ def unmigrate_vm(zkhandler, domain, wait=False, force_live=False):
else:
target_state = 'migrate'
target_node = zkhandler.read('/domains/{}/lastnode'.format(dom_uuid))
current_node = zkhandler.read(('domain.node', dom_uuid))
target_node = zkhandler.read(('domain.last_node', dom_uuid))
if target_node == '':
return False, 'ERROR: VM "{}" has not been previously migrated.'.format(domain)
retmsg = 'Unmigrating VM "{}" back to node "{}".'.format(domain, target_node)
lock = zkhandler.exclusivelock('/domains/{}/state'.format(dom_uuid))
lock = zkhandler.exclusivelock(('domain.state', dom_uuid))
with lock:
zkhandler.write([
('/domains/{}/state'.format(dom_uuid), target_state),
('/domains/{}/node'.format(dom_uuid), target_node),
('/domains/{}/lastnode'.format(dom_uuid), '')
(('domain.state', dom_uuid), target_state),
(('domain.node', dom_uuid), target_node),
(('domain.last_node', dom_uuid), '')
])
# Wait for 1/2 second for migration to start
time.sleep(0.5)
# Update any SR-IOV NICs
update_vm_sriov_nics(zkhandler, dom_uuid, current_node, target_node)
if wait:
while zkhandler.read('/domains/{}/state'.format(dom_uuid)) == target_state:
time.sleep(1)
while zkhandler.read(('domain.state', dom_uuid)) == target_state:
time.sleep(0.5)
retmsg = 'Unmigrated VM "{}" back to node "{}"'.format(domain, target_node)
return True, retmsg
@ -692,7 +840,10 @@ def get_console_log(zkhandler, domain, lines=1000):
return False, 'ERROR: Could not find VM "{}" in the cluster!'.format(domain)
# Get the data from ZK
console_log = zkhandler.read('/domains/{}/consolelog'.format(dom_uuid))
console_log = zkhandler.read(('domain.console.log', dom_uuid))
if console_log is None:
return True, ''
# Shrink the log buffer to length lines
shrunk_log = console_log.split('\n')[-lines:]
@ -715,7 +866,7 @@ def get_info(zkhandler, domain):
return True, domain_information
def get_list(zkhandler, node, state, limit, is_fuzzy=True):
def get_list(zkhandler, node, state, tag, limit, is_fuzzy=True):
if node:
# Verify node is valid
if not common.verifyNode(zkhandler, node):
@ -726,52 +877,91 @@ def get_list(zkhandler, node, state, limit, is_fuzzy=True):
if state not in valid_states:
return False, 'VM state "{}" is not valid.'.format(state)
full_vm_list = zkhandler.children('/domains')
vm_list = []
full_vm_list = zkhandler.children('base.domain')
# Set our limit to a sensible regex
if limit and is_fuzzy:
if limit:
# Check if the limit is a UUID
is_limit_uuid = False
try:
# Implicitly assume fuzzy limits
if not re.match(r'\^.*', limit):
limit = '.*' + limit
if not re.match(r'.*\$', limit):
limit = limit + '.*'
except Exception as e:
return False, 'Regex Error: {}'.format(e)
uuid_obj = UUID(limit, version=4)
limit = str(uuid_obj)
is_limit_uuid = True
except ValueError:
pass
# If we're limited, remove other nodes' VMs
vm_node = {}
vm_state = {}
for vm in full_vm_list:
# Check we don't match the limit
name = zkhandler.read('/domains/{}'.format(vm))
vm_node[vm] = zkhandler.read('/domains/{}/node'.format(vm))
vm_state[vm] = zkhandler.read('/domains/{}/state'.format(vm))
# Handle limiting
if limit:
if is_fuzzy and not is_limit_uuid:
try:
if re.match(limit, vm):
if not node and not state:
vm_list.append(common.getInformationFromXML(zkhandler, vm))
else:
if vm_node[vm] == node or vm_state[vm] == state:
vm_list.append(common.getInformationFromXML(zkhandler, vm))
# Implicitly assume fuzzy limits
if not re.match(r'\^.*', limit):
limit = '.*' + limit
if not re.match(r'.*\$', limit):
limit = limit + '.*'
except Exception as e:
return False, 'Regex Error: {}'.format(e)
get_vm_info = dict()
for vm in full_vm_list:
name = zkhandler.read(('domain', vm))
is_limit_match = False
is_tag_match = False
is_node_match = False
is_state_match = False
# Check on limit
if limit:
# Try to match the limit against the UUID (if applicable) and name
try:
if is_limit_uuid and re.match(limit, vm):
is_limit_match = True
if re.match(limit, name):
if not node and not state:
vm_list.append(common.getInformationFromXML(zkhandler, vm))
else:
if vm_node[vm] == node or vm_state[vm] == state:
vm_list.append(common.getInformationFromXML(zkhandler, vm))
is_limit_match = True
except Exception as e:
return False, 'Regex Error: {}'.format(e)
else:
# Check node to avoid unneeded ZK calls
if not node and not state:
vm_list.append(common.getInformationFromXML(zkhandler, vm))
else:
if vm_node[vm] == node or vm_state[vm] == state:
vm_list.append(common.getInformationFromXML(zkhandler, vm))
is_limit_match = True
return True, vm_list
if tag:
vm_tags = zkhandler.children(('domain.meta.tags', vm))
if tag in vm_tags:
is_tag_match = True
else:
is_tag_match = True
# Check on node
if node:
vm_node = zkhandler.read(('domain.node', vm))
if vm_node == node:
is_node_match = True
else:
is_node_match = True
# Check on state
if state:
vm_state = zkhandler.read(('domain.state', vm))
if vm_state == state:
is_state_match = True
else:
is_state_match = True
get_vm_info[vm] = True if is_limit_match and is_tag_match and is_node_match and is_state_match else False
# Obtain our VM data in a thread pool
# This helps parallelize the numerous Zookeeper calls a bit, within the bounds of the GIL, and
# should help prevent this task from becoming absurdly slow with very large numbers of VMs.
# The max_workers is capped at 32 to avoid creating an absurd number of threads especially if
# the list gets called multiple times simultaneously by the API, but still provides a noticeable
# speedup.
vm_execute_list = [vm for vm in full_vm_list if get_vm_info[vm]]
vm_data_list = list()
with ThreadPoolExecutor(max_workers=32, thread_name_prefix='vm_list') as executor:
futures = []
for vm_uuid in vm_execute_list:
futures.append(executor.submit(common.getInformationFromXML, zkhandler, vm_uuid))
for future in futures:
try:
vm_data_list.append(future.result())
except Exception:
pass
return True, vm_data_list
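# Example (illustrative): with the handling above, a plain fuzzy limit such as "web" is
# implicitly expanded to the regex ".*web.*" and matched against VM names, while a limit
# that parses as a valid version-4 UUID is additionally matched against the VM UUID. The
# node, state, tag and limit values below are hypothetical placeholders; all given
# criteria must match for a VM to be returned.
#
#   retcode, vms = get_list(zkhandler, None, None, None, 'web')       # names containing "web"
#   retcode, vms = get_list(zkhandler, 'hv1', 'start', 'prod', None)  # on hv1, started, tagged "prod"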

View File

@ -19,10 +19,14 @@
#
###############################################################################
import os
import time
import uuid
import json
import re
from functools import wraps
from kazoo.client import KazooClient, KazooState
from kazoo.exceptions import NoNodeError
#
@ -46,6 +50,10 @@ class ZKConnection(object):
def connection(*args, **kwargs):
zkhandler = ZKHandler(self.config)
zkhandler.connect()
schema_version = zkhandler.read('base.schema.version')
if schema_version is None:
schema_version = 0
zkhandler.schema.load(schema_version, quiet=True)
ret = function(zkhandler, *args, **kwargs)
@ -84,11 +92,14 @@ class ZKHandler(object):
Initialize an instance of the ZKHandler class with config
A zk_conn object will be created but not started
A ZKSchema instance will be created
"""
self.encoding = 'utf8'
self.coordinators = config['coordinators']
self.logger = logger
self.zk_conn = KazooClient(hosts=self.coordinators)
self._schema = ZKSchema()
#
# Class meta-functions
@ -102,28 +113,26 @@ class ZKHandler(object):
else:
print(message)
#
# Properties
#
@property
def schema(self):
return self._schema
#
# State/connection management
#
def listener(self, state):
"""
Listen for KazooState changes and log accordingly.
This function does nothing except log the state; Kazoo handles the rest.
"""
if state == KazooState.CONNECTED:
self.log('Connection to Zookeeper started', state='o')
self.log('Connection to Zookeeper resumed', state='o')
else:
self.log('Connection to Zookeeper lost', state='w')
while True:
time.sleep(0.5)
_zk_conn = KazooClient(hosts=self.coordinators)
try:
_zk_conn.start()
except Exception:
del _zk_conn
continue
self.zk_conn = _zk_conn
self.zk_conn.add_listener(self.listener)
break
self.log('Connection to Zookeeper lost with state {}'.format(state), state='w')
def connect(self, persistent=False):
"""
@ -132,11 +141,12 @@ class ZKHandler(object):
try:
self.zk_conn.start()
if persistent:
self.log('Connection to Zookeeper started', state='o')
self.zk_conn.add_listener(self.listener)
except Exception as e:
raise ZKConnectionException(self, e)
def disconnect(self):
def disconnect(self, persistent=False):
"""
Stop and close the zk_conn object and disconnect from the cluster
@ -144,6 +154,52 @@ class ZKHandler(object):
"""
self.zk_conn.stop()
self.zk_conn.close()
if persistent:
self.log('Connection to Zookeeper terminated', state='o')
#
# Schema helper actions
#
def get_schema_path(self, key):
"""
Get the Zookeeper path for {key} from the current schema based on its format.
If {key} is a tuple of length 2, it's treated as a path plus an item instance of that path (e.g. a node, a VM, etc.).
If {key} is a tuple of length 4, it is treated as a path plus an item instance, as well as another item instance of the subpath.
If {key} is just a string, it's treated as a lone path (mostly used for the 'base' schema group).
Otherwise, returns None since this is not a valid key.
This function also handles the special case where a string that looks like an existing path (i.e. starts with '/') is passed;
in that case it will silently return the same path back. This was mostly added for migration functionality and is deprecated.
"""
if isinstance(key, tuple):
# This is a key tuple with both an ipath and an item
if len(key) == 2:
# 2-length normal tuple
ipath, item = key
elif len(key) == 4:
# 4-length sub-level tuple
ipath, item, sub_ipath, sub_item = key
return self.schema.path(ipath, item=item) + self.schema.path(sub_ipath, item=sub_item)
else:
# This is an invalid key
return None
elif isinstance(key, str):
# This is a key string with just an ipath
ipath = key
item = None
# This is a raw key path, used by backup/restore functionality
if re.match(r'^/', ipath):
return ipath
else:
# This is an invalid key
return None
return self.schema.path(ipath, item=item)
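# Example (illustrative): the three accepted key forms resolve to paths such as the
# following under the version 4 schema defined further below. The node name "hv1" and
# the dom_uuid/"web" tag values are hypothetical placeholders.
#
#   get_schema_path('base.config.primary_node')
#       # -> '/config/primary_node'
#   get_schema_path(('node.state.daemon', 'hv1'))
#       # -> '/nodes/hv1/daemonstate'
#   get_schema_path(('domain.meta.tags', dom_uuid, 'tag.name', 'web'))
#       # -> '/domains/<dom_uuid>/tags/web'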
#
# Key Actions
@ -152,7 +208,12 @@ class ZKHandler(object):
"""
Check if a key exists
"""
stat = self.zk_conn.exists(key)
path = self.get_schema_path(key)
if path is None:
# This path is invalid, this is likely due to missing schema entries, so return False
return False
stat = self.zk_conn.exists(path)
if stat:
return True
else:
@ -162,7 +223,15 @@ class ZKHandler(object):
"""
Read data from a key
"""
return self.zk_conn.get(key)[0].decode(self.encoding)
try:
path = self.get_schema_path(key)
if path is None:
# This path is invalid; this is likely due to missing schema entries, so return None
return None
return self.zk_conn.get(path)[0].decode(self.encoding)
except NoNodeError:
return None
def write(self, kvpairs):
"""
@ -182,26 +251,31 @@ class ZKHandler(object):
key = kvpair[0]
value = kvpair[1]
path = self.get_schema_path(key)
if path is None:
# This path is invalid; this is likely due to missing schema entries, so continue
continue
if not self.exists(key):
# Creating a new key
transaction.create(key, str(value).encode(self.encoding))
transaction.create(path, str(value).encode(self.encoding))
else:
# Updating an existing key
data = self.zk_conn.get(key)
data = self.zk_conn.get(path)
version = data[1].version
# Validate the expected version after the execution
new_version = version + 1
# Update the data
transaction.set_data(key, str(value).encode(self.encoding))
transaction.set_data(path, str(value).encode(self.encoding))
# Check the data
try:
transaction.check(key, new_version)
transaction.check(path, new_version)
except TypeError:
self.log("ZKHandler error: Key '{}' does not match expected version".format(key), state='e')
self.log("ZKHandler error: Key '{}' does not match expected version".format(path), state='e')
return False
try:
@ -221,9 +295,10 @@ class ZKHandler(object):
for key in keys:
if self.exists(key):
try:
self.zk_conn.delete(key, recursive=recursive)
path = self.get_schema_path(key)
self.zk_conn.delete(path, recursive=recursive)
except Exception as e:
self.log("ZKHandler error: Failed to delete key {}: {}".format(key, e), state='e')
self.log("ZKHandler error: Failed to delete key {}: {}".format(path, e), state='e')
return False
return True
@ -232,7 +307,15 @@ class ZKHandler(object):
"""
Lists all children of a key
"""
return self.zk_conn.get_children(key)
try:
path = self.get_schema_path(key)
if path is None:
# This path is invalid; this is likely due to missing schema entries, so return None
return None
return self.zk_conn.get_children(path)
except NoNodeError:
return None
def rename(self, kkpairs):
"""
@ -244,17 +327,17 @@ class ZKHandler(object):
transaction = self.zk_conn.transaction()
def rename_element(transaction, source_key, destnation_key):
data = self.zk_conn.get(source_key)[0]
transaction.create(destination_key, data)
def rename_element(transaction, source_path, destination_path):
data = self.zk_conn.get(source_path)[0]
transaction.create(destination_path, data)
if self.children(source_key):
for child_key in self.children(source_key):
child_source_key = "{}/{}".format(source_key, child_key)
child_destination_key = "{}/{}".format(destination_key, child_key)
rename_element(transaction, child_source_key, child_destination_key)
if self.children(source_path):
for child_path in self.children(source_path):
child_source_path = "{}/{}".format(source_path, child_path)
child_destination_path = "{}/{}".format(destination_path, child_path)
rename_element(transaction, child_source_path, child_destination_path)
transaction.delete(source_key, recursive=True)
transaction.delete(source_path)
for kkpair in (kkpairs):
if type(kkpair) is not tuple:
@ -262,16 +345,26 @@ class ZKHandler(object):
return False
source_key = kkpair[0]
source_path = self.get_schema_path(source_key)
if source_path is None:
# This path is invalid; this is likely due to missing schema entries, so continue
continue
destination_key = kkpair[1]
destination_path = self.get_schema_path(destination_key)
if destination_path is None:
# This path is invalid; this is likely due to missing schema entries, so continue
continue
if not self.exists(source_key):
self.log("ZKHander error: Source key '{}' does not exist".format(source_key), state='e')
return False
if self.exists(destination_key):
self.log("ZKHander error: Destination key '{}' already exists".format(destination_key), state='e')
self.log("ZKHander error: Source key '{}' does not exist".format(source_path), state='e')
return False
rename_element(transaction, source_key, destination_key)
if self.exists(destination_key):
self.log("ZKHander error: Destination key '{}' already exists".format(destination_path), state='e')
return False
rename_element(transaction, source_path, destination_path)
try:
transaction.commit()
@ -290,11 +383,16 @@ class ZKHandler(object):
count = 1
lock = None
path = self.get_schema_path(key)
while True:
try:
lock_id = str(uuid.uuid1())
lock = self.zk_conn.ReadLock(key, lock_id)
lock = self.zk_conn.ReadLock(path, lock_id)
break
except NoNodeError:
self.log("ZKHandler warning: Failed to acquire read lock on nonexistent path {}".format(path), state='e')
return None
except Exception as e:
if count > 5:
self.log("ZKHandler warning: Failed to acquire read lock after 5 tries: {}".format(e), state='e')
@ -313,11 +411,16 @@ class ZKHandler(object):
count = 1
lock = None
path = self.get_schema_path(key)
while True:
try:
lock_id = str(uuid.uuid1())
lock = self.zk_conn.WriteLock(key, lock_id)
lock = self.zk_conn.WriteLock(path, lock_id)
break
except NoNodeError:
self.log("ZKHandler warning: Failed to acquire write lock on nonexistent path {}".format(path), state='e')
return None
except Exception as e:
if count > 5:
self.log("ZKHandler warning: Failed to acquire write lock after 5 tries: {}".format(e), state='e')
@ -336,11 +439,16 @@ class ZKHandler(object):
count = 1
lock = None
path = self.get_schema_path(key)
while True:
try:
lock_id = str(uuid.uuid1())
lock = self.zk_conn.Lock(key, lock_id)
lock = self.zk_conn.Lock(path, lock_id)
break
except NoNodeError:
self.log("ZKHandler warning: Failed to acquire exclusive lock on nonexistent path {}".format(path), state='e')
return None
except Exception as e:
if count > 5:
self.log("ZKHandler warning: Failed to acquire exclusive lock after 5 tries: {}".format(e), state='e')
@ -351,3 +459,581 @@ class ZKHandler(object):
continue
return lock
#
# Schema classes
#
class ZKSchema(object):
# Current version
_version = 4
# Root for doing nested keys
_schema_root = ''
# Primary schema definition for the current version
_schema = {
'version': f'{_version}',
'root': f'{_schema_root}',
# Base schema defining core keys; this is all that is initialized on cluster init()
'base': {
'root': f'{_schema_root}',
'schema': f'{_schema_root}/schema',
'schema.version': f'{_schema_root}/schema/version',
'config': f'{_schema_root}/config',
'config.maintenance': f'{_schema_root}/config/maintenance',
'config.primary_node': f'{_schema_root}/config/primary_node',
'config.primary_node.sync_lock': f'{_schema_root}/config/primary_node/sync_lock',
'config.upstream_ip': f'{_schema_root}/config/upstream_ip',
'config.migration_target_selector': f'{_schema_root}/config/migration_target_selector',
'cmd': f'{_schema_root}/cmd',
'cmd.node': f'{_schema_root}/cmd/nodes',
'cmd.domain': f'{_schema_root}/cmd/domains',
'cmd.ceph': f'{_schema_root}/cmd/ceph',
'logs': '/logs',
'node': f'{_schema_root}/nodes',
'domain': f'{_schema_root}/domains',
'network': f'{_schema_root}/networks',
'storage': f'{_schema_root}/ceph',
'storage.util': f'{_schema_root}/ceph/util',
'osd': f'{_schema_root}/ceph/osds',
'pool': f'{_schema_root}/ceph/pools',
'volume': f'{_schema_root}/ceph/volumes',
'snapshot': f'{_schema_root}/ceph/snapshots',
},
# The schema of an individual logs entry (/logs/{node_name})
'logs': {
'node': '', # The root key
'messages': '/messages',
},
# The schema of an individual node entry (/nodes/{node_name})
'node': {
'name': '', # The root key
'keepalive': '/keepalive',
'mode': '/daemonmode',
'data.active_schema': '/activeschema',
'data.latest_schema': '/latestschema',
'data.static': '/staticdata',
'data.pvc_version': '/pvcversion',
'running_domains': '/runningdomains',
'count.provisioned_domains': '/domainscount',
'count.networks': '/networkscount',
'state.daemon': '/daemonstate',
'state.router': '/routerstate',
'state.domain': '/domainstate',
'cpu.load': '/cpuload',
'vcpu.allocated': '/vcpualloc',
'memory.total': '/memtotal',
'memory.used': '/memused',
'memory.free': '/memfree',
'memory.allocated': '/memalloc',
'memory.provisioned': '/memprov',
'ipmi.hostname': '/ipmihostname',
'ipmi.username': '/ipmiusername',
'ipmi.password': '/ipmipassword',
'sriov': '/sriov',
'sriov.pf': '/sriov/pf',
'sriov.vf': '/sriov/vf',
},
# The schema of an individual SR-IOV PF entry (/nodes/{node_name}/sriov/pf/{pf})
'sriov_pf': {
'phy': '', # The root key
'mtu': '/mtu',
'vfcount': '/vfcount'
},
# The schema of an individual SR-IOV VF entry (/nodes/{node_name}/sriov/vf/{vf})
'sriov_vf': {
'phy': '', # The root key
'pf': '/pf',
'mtu': '/mtu',
'mac': '/mac',
'phy_mac': '/phy_mac',
'config': '/config',
'config.vlan_id': '/config/vlan_id',
'config.vlan_qos': '/config/vlan_qos',
'config.tx_rate_min': '/config/tx_rate_min',
'config.tx_rate_max': '/config/tx_rate_max',
'config.spoof_check': '/config/spoof_check',
'config.link_state': '/config/link_state',
'config.trust': '/config/trust',
'config.query_rss': '/config/query_rss',
'pci': '/pci',
'pci.domain': '/pci/domain',
'pci.bus': '/pci/bus',
'pci.slot': '/pci/slot',
'pci.function': '/pci/function',
'used': '/used',
'used_by': '/used_by'
},
# The schema of an individual domain entry (/domains/{domain_uuid})
'domain': {
'name': '', # The root key
'xml': '/xml',
'state': '/state',
'profile': '/profile',
'stats': '/stats',
'node': '/node',
'last_node': '/lastnode',
'failed_reason': '/failedreason',
'storage.volumes': '/rbdlist',
'console.log': '/consolelog',
'console.vnc': '/vnc',
'meta.autostart': '/node_autostart',
'meta.migrate_method': '/migration_method',
'meta.node_selector': '/node_selector',
'meta.node_limit': '/node_limit',
'meta.tags': '/tags',
'migrate.sync_lock': '/migrate_sync_lock'
},
# The schema of an individual domain tag entry (/domains/{domain}/tags/{tag})
'tag': {
'name': '', # The root key
'type': '/type',
'protected': '/protected'
},
# The schema of an individual network entry (/networks/{vni})
'network': {
'vni': '', # The root key
'type': '/nettype',
'rule': '/firewall_rules',
'rule.in': '/firewall_rules/in',
'rule.out': '/firewall_rules/out',
'nameservers': '/name_servers',
'domain': '/domain',
'reservation': '/dhcp4_reservations',
'lease': '/dhcp4_leases',
'ip4.gateway': '/ip4_gateway',
'ip4.network': '/ip4_network',
'ip4.dhcp': '/dhcp4_flag',
'ip4.dhcp_start': '/dhcp4_start',
'ip4.dhcp_end': '/dhcp4_end',
'ip6.gateway': '/ip6_gateway',
'ip6.network': '/ip6_network',
'ip6.dhcp': '/dhcp6_flag'
},
# The schema of an individual network DHCP(v4) reservation entry (/networks/{vni}/dhcp4_reservations/{mac})
'reservation': {
'mac': '', # The root key
'ip': '/ipaddr',
'hostname': '/hostname'
},
# The schema of an individual network DHCP(v4) lease entry (/networks/{vni}/dhcp4_leases/{mac})
'lease': {
'mac': '', # The root key
'ip': '/ipaddr',
'hostname': '/hostname',
'expiry': '/expiry',
'client_id': '/clientid'
},
# The schema for an individual network ACL entry (/networks/{vni}/firewall_rules/(in|out)/{acl})
'rule': {
'description': '', # The root key
'rule': '/rule',
'order': '/order'
},
# The schema of an individual OSD entry (/ceph/osds/{osd_id})
'osd': {
'id': '', # The root key
'node': '/node',
'device': '/device',
'stats': '/stats'
},
# The schema of an individual pool entry (/ceph/pools/{pool_name})
'pool': {
'name': '', # The root key
'pgs': '/pgs',
'stats': '/stats'
},
# The schema of an individual volume entry (/ceph/volumes/{pool_name}/{volume_name})
'volume': {
'name': '', # The root key
'stats': '/stats'
},
# The schema of an individual snapshot entry (/ceph/volumes/{pool_name}/{volume_name}/{snapshot_name})
'snapshot': {
'name': '', # The root key
'stats': '/stats'
}
}
# Properties
@property
def schema_root(self):
return self._schema_root
@schema_root.setter
def schema_root(self, schema_root):
self._schema_root = schema_root
@property
def version(self):
return int(self._version)
@version.setter
def version(self, version):
self._version = int(version)
@property
def schema(self):
return self._schema
@schema.setter
def schema(self, schema):
self._schema = schema
def __init__(self):
pass
def __repr__(self):
return f'ZKSchema({self.version})'
def __lt__(self, other):
return self.version < other.version
def __le__(self, other):
return self.version <= other.version
def __gt__(self, other):
return self.version > other.version
def __ge__(self, other):
return self.version >= other.version
def __eq__(self, other):
return self.version == other.version
# Load the schema of a given version from a file
def load(self, version, quiet=False):
if not quiet:
print(f'Loading schema version {version}')
with open(f'daemon_lib/migrations/versions/{version}.json', 'r') as sfh:
self.schema = json.load(sfh)
self.version = self.schema.get('version')
# Get key paths
def path(self, ipath, item=None):
itype, *ipath = ipath.split('.')
if item is None:
return self.schema.get(itype).get('.'.join(ipath))
else:
base_path = self.schema.get('base').get(itype, None)
if base_path is None:
# This should only really happen for second-layer key types where the helper functions join them together
base_path = ''
if not ipath:
# This is a root path
return f'{base_path}/{item}'
sub_path = self.schema.get(itype).get('.'.join(ipath))
if sub_path is None:
# We didn't find the path we're looking for, so we don't want to do anything
return None
return f'{base_path}/{item}{sub_path}'
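# Illustrative examples of the resolution above, assuming the 'base' section maps
# 'domain' to '/domains' (as the schema comment for the domain entry suggests):
#   path('base.domain')                    -> '/domains'
#   path('domain.state', domain_uuid)      -> '/domains/<domain_uuid>/state'
#   path('domain.meta.tags', domain_uuid)  -> '/domains/<domain_uuid>/tags'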
# Get keys of a schema location
def keys(self, itype=None):
if itype is None:
return list(self.schema.get('base').keys())
else:
return list(self.schema.get(itype).keys())
# Get the active version of a cluster's schema
def get_version(self, zkhandler):
try:
current_version = zkhandler.read(self.path('base.schema.version'))
except NoNodeError:
current_version = 0
return current_version
# Validate an active schema against a Zookeeper cluster
def validate(self, zkhandler, logger=None):
result = True
# Walk the entire tree checking our schema
for elem in ['base']:
for key in self.keys(elem):
kpath = f'{elem}.{key}'
if not zkhandler.zk_conn.exists(self.path(kpath)):
if logger is not None:
logger.out(f'Key not found: {self.path(kpath)}', state='w')
result = False
for elem in ['logs', 'node', 'domain', 'network', 'osd', 'pool']:
# First read all the subelements of the key class
for child in zkhandler.zk_conn.get_children(self.path(f'base.{elem}')):
# For each key in the schema for that particular elem
for ikey in self.keys(elem):
kpath = f'{elem}.{ikey}'
# Validate that the key exists for that child
if not zkhandler.zk_conn.exists(self.path(kpath, child)):
if logger is not None:
logger.out(f'Key not found: {self.path(kpath, child)}', state='w')
result = False
# Continue for child keys under network (reservation, acl)
if elem in ['network'] and ikey in ['reservation', 'rule.in', 'rule.out']:
if ikey in ['rule.in', 'rule.out']:
sikey = 'rule'
else:
sikey = ikey
npath = self.path(f'{elem}.{ikey}', child)
for nchild in zkhandler.zk_conn.get_children(npath):
nkpath = f'{npath}/{nchild}'
for esikey in self.keys(sikey):
nkipath = f'{nkpath}/{esikey}'
if not zkhandler.zk_conn.exists(nkipath):
result = False
# One might expect child keys under node (specifically, sriov.pf and sriov.vf) to be
# managed here as well, but those are created automatically every time pvcnoded starts
# and thus never need to be validated or applied.
# The following two key classes (volume and snapshot) have several child layers that must be parsed through
for elem in ['volume']:
# First read all the subelements of the key class (pool layer)
for pchild in zkhandler.zk_conn.get_children(self.path(f'base.{elem}')):
# Finally read all the subelements of the key class (volume layer)
for vchild in zkhandler.zk_conn.get_children(self.path(f'base.{elem}') + f'/{pchild}'):
child = f'{pchild}/{vchild}'
# For each key in the schema for that particular elem
for ikey in self.keys(elem):
kpath = f'{elem}.{ikey}'
# Validate that the key exists for that child
if not zkhandler.zk_conn.exists(self.path(kpath, child)):
if logger is not None:
logger.out(f'Key not found: {self.path(kpath, child)}', state='w')
result = False
for elem in ['snapshot']:
# First read all the subelements of the key class (pool layer)
for pchild in zkhandler.zk_conn.get_children(self.path(f'base.{elem}')):
# Next read all the subelements of the key class (volume layer)
for vchild in zkhandler.zk_conn.get_children(self.path(f'base.{elem}') + f'/{pchild}'):
# Finally read all the subelements of the key class (snapshot layer)
for schild in zkhandler.zk_conn.get_children(self.path(f'base.{elem}') + f'/{pchild}/{vchild}'):
child = f'{pchild}/{vchild}/{schild}'
# For each key in the schema for that particular elem
for ikey in self.keys(elem):
kpath = f'{elem}.{ikey}'
# Validate that the key exists for that child
if not zkhandler.zk_conn.exists(self.path(kpath, child)):
if logger is not None:
logger.out(f'Key not found: {self.path(kpath, child)}', state='w')
result = False
return result
# Apply the current schema to the cluster
def apply(self, zkhandler):
# Walk the entire tree checking our schema
for elem in ['base']:
for key in self.keys(elem):
kpath = f'{elem}.{key}'
if not zkhandler.zk_conn.exists(self.path(kpath)):
# Ensure that we create base.schema.version with the current valid version value
if kpath == 'base.schema.version':
data = str(self.version)
else:
data = ''
zkhandler.zk_conn.create(self.path(kpath), data.encode(zkhandler.encoding))
for elem in ['logs', 'node', 'domain', 'network', 'osd', 'pool']:
# First read all the subelements of the key class
for child in zkhandler.zk_conn.get_children(self.path(f'base.{elem}')):
# For each key in the schema for that particular elem
for ikey in self.keys(elem):
kpath = f'{elem}.{ikey}'
# Validate that the key exists for that child
if not zkhandler.zk_conn.exists(self.path(kpath, child)):
zkhandler.zk_conn.create(self.path(kpath, child), ''.encode(zkhandler.encoding))
# Continue for child keys under network (reservation, acl)
if elem in ['network'] and ikey in ['reservation', 'rule.in', 'rule.out']:
if ikey in ['rule.in', 'rule.out']:
sikey = 'rule'
else:
sikey = ikey
npath = self.path(f'{elem}.{ikey}', child)
for nchild in zkhandler.zk_conn.get_children(npath):
nkpath = f'{npath}/{nchild}'
for esikey in self.keys(sikey):
nkipath = f'{nkpath}/{esikey}'
if not zkhandler.zk_conn.exists(nkipath):
zkhandler.zk_conn.create(nkipath, ''.encode(zkhandler.encoding))
# One might expect child keys under node (specifically, sriov.pf and sriov.vf) to be
# managed here as well, but those are created automatically every time pvcnoded starts
# and thus never need to be validated or applied.
# The following two key classes (volume and snapshot) have several child layers that must be parsed through
for elem in ['volume']:
# First read all the subelements of the key class (pool layer)
for pchild in zkhandler.zk_conn.get_children(self.path(f'base.{elem}')):
# Finally read all the subelements of the key class (volume layer)
for vchild in zkhandler.zk_conn.get_children(self.path(f'base.{elem}') + f'/{pchild}'):
child = f'{pchild}/{vchild}'
# For each key in the schema for that particular elem
for ikey in self.keys(elem):
kpath = f'{elem}.{ikey}'
# Validate that the key exists for that child
if not zkhandler.zk_conn.exists(self.path(kpath, child)):
zkhandler.zk_conn.create(self.path(kpath, child), ''.encode(zkhandler.encoding))
for elem in ['snapshot']:
# First read all the subelements of the key class (pool layer)
for pchild in zkhandler.zk_conn.get_children(self.path(f'base.{elem}')):
# Next read all the subelements of the key class (volume layer)
for vchild in zkhandler.zk_conn.get_children(self.path(f'base.{elem}') + f'/{pchild}'):
# Finally read all the subelements of the key class (snapshot layer)
for schild in zkhandler.zk_conn.get_children(self.path(f'base.{elem}') + f'/{pchild}/{vchild}'):
child = f'{pchild}/{vchild}/{schild}'
# For each key in the schema for that particular elem
for ikey in self.keys(elem):
kpath = f'{elem}.{ikey}'
# Validate that the key exists for that child
if not zkhandler.zk_conn.exists(self.path(kpath, child)):
zkhandler.zk_conn.create(self.path(kpath, child), ''.encode(zkhandler.encoding))
# Migrate key diffs
def run_migrate(self, zkhandler, changes):
diff_add = changes['add']
diff_remove = changes['remove']
diff_rename = changes['rename']
add_tasks = list()
for key in diff_add.keys():
add_tasks.append((diff_add[key], ''))
remove_tasks = list()
for key in diff_remove.keys():
remove_tasks.append(diff_remove[key])
rename_tasks = list()
for key in diff_rename.keys():
rename_tasks.append((diff_rename[key]['from'], diff_rename[key]['to']))
zkhandler.write(add_tasks)
zkhandler.delete(remove_tasks)
zkhandler.rename(rename_tasks)
# Migrate from older to newer schema
def migrate(self, zkhandler, new_version):
# Determine the versions in between
versions = ZKSchema.find_all(start=self.version, end=new_version)
if versions is None:
return
for version in versions:
# Create a new schema at that version
zkschema_new = ZKSchema()
zkschema_new.load(version)
# Get a list of changes
changes = ZKSchema.key_diff(self, zkschema_new)
# Apply those changes
self.run_migrate(zkhandler, changes)
# Rollback from newer to older schema
def rollback(self, zkhandler, old_version):
# Determine the versions in between
versions = ZKSchema.find_all(start=old_version - 1, end=self.version - 1)
if versions is None:
return
versions.reverse()
for version in versions:
# Create a new schema at that version
zkschema_old = ZKSchema()
zkschema_old.load(version)
# Get a list of changes
changes = ZKSchema.key_diff(self, zkschema_old)
# Apply those changes
self.run_migrate(zkhandler, changes)
@classmethod
def key_diff(cls, schema_a, schema_b):
# schema_a = current
# schema_b = new
diff_add = dict()
diff_remove = dict()
diff_rename = dict()
# Parse through each core element
for elem in ['base', 'node', 'domain', 'network', 'osd', 'pool', 'volume', 'snapshot']:
set_a = set(schema_a.keys(elem))
set_b = set(schema_b.keys(elem))
diff_keys = set_a ^ set_b
for item in diff_keys:
elem_item = f'{elem}.{item}'
if item not in schema_a.keys(elem) and item in schema_b.keys(elem):
diff_add[elem_item] = schema_b.path(elem_item)
if item in schema_a.keys(elem) and item not in schema_b.keys(elem):
diff_remove[elem_item] = schema_a.path(elem_item)
for item in set_b:
elem_item = f'{elem}.{item}'
if schema_a.path(elem_item) is not None and \
schema_b.path(elem_item) is not None and \
schema_a.path(elem_item) != schema_b.path(elem_item):
diff_rename[elem_item] = {'from': schema_a.path(elem_item), 'to': schema_b.path(elem_item)}
return {'add': diff_add, 'remove': diff_remove, 'rename': diff_rename}
# Load in the schema of the current cluster
@classmethod
def load_current(cls, zkhandler):
new_instance = cls()
version = new_instance.get_version(zkhandler)
new_instance.load(version)
return new_instance
# Write the latest schema to a file
@classmethod
def write(cls):
schema_file = 'daemon_lib/migrations/versions/{}.json'.format(cls._version)
with open(schema_file, 'w') as sfh:
json.dump(cls._schema, sfh)
# Static methods for reading information from the files
@staticmethod
def find_all(start=0, end=None):
versions = list()
for version in os.listdir('daemon_lib/migrations/versions'):
sequence_id = int(version.split('.')[0])
if end is None:
if sequence_id > start:
versions.append(sequence_id)
else:
if sequence_id > start and sequence_id <= end:
versions.append(sequence_id)
if len(versions) > 0:
return versions
else:
return None
@staticmethod
def find_latest():
latest_version = 0
for version in os.listdir('daemon_lib/migrations/versions'):
sequence_id = int(version.split('.')[0])
if sequence_id > latest_version:
latest_version = sequence_id
return latest_version
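
For illustration only, a minimal sketch of how this class can be driven: load the cluster's active schema, migrate it forward if it is behind the latest on-disk version, then validate and apply any missing keys. The flow uses only the methods defined above, but the exact call sites in the daemons may differ:
# Illustrative usage sketch of ZKSchema (not the daemons' actual startup code)
running_schema = ZKSchema.load_current(zkhandler)   # schema currently active on the cluster
latest_version = ZKSchema.find_latest()             # newest schema JSON shipped on disk
if running_schema.version < latest_version:
    running_schema.migrate(zkhandler, latest_version)  # apply key adds/removes/renames
if not running_schema.validate(zkhandler, logger=None):
    running_schema.apply(zkhandler)                 # create any keys still missing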

debian/changelog

@ -1,3 +1,95 @@
pvc (0.9.28-0) unstable; urgency=high
* [CLI Client] Revamp confirmation options for "vm modify" command
-- Joshua M. Boniface <joshua@boniface.me> Mon, 19 Jul 2021 09:29:34 -0400
pvc (0.9.27-0) unstable; urgency=high
* [CLI Client] Fixes a bug with vm modify command when passed a file
-- Joshua M. Boniface <joshua@boniface.me> Mon, 19 Jul 2021 00:03:40 -0400
pvc (0.9.26-0) unstable; urgency=high
* [Node Daemon] Corrects some bad assumptions about fencing results during hardware failures
* [All] Implements VM tagging functionality
* [All] Implements Node log access via PVC functionality
-- Joshua M. Boniface <joshua@boniface.me> Sun, 18 Jul 2021 20:49:52 -0400
pvc (0.9.25-0) unstable; urgency=high
* [Node Daemon] Returns to Rados library calls for Ceph due to performance problems
* [Node Daemon] Adds a date output to keepalive messages
* [Daemons] Configures ZK connection logging only for persistent connections
* [API Provisioner] Add context manager-based chroot to Debootstrap example script
* [Node Daemon] Fixes a bug where shutdown daemon state was overwritten
-- Joshua M. Boniface <joshua@boniface.me> Sun, 11 Jul 2021 23:19:09 -0400
pvc (0.9.24-0) unstable; urgency=high
* [Node Daemon] Removes Rados module polling of Ceph cluster and returns to command-based polling for timeout purposes, and removes some flaky return statements
* [Node Daemon] Removes flaky Zookeeper connection renewals that caused problems
* [CLI Client] Allow raw lists of clusters from `pvc cluster list`
* [API Daemon] Fixes several issues when getting VM data without stats
* [API Daemon] Fixes issues with removing VMs while disks are still in use (failed provisioning, etc.)
-- Joshua M. Boniface <joshua@boniface.me> Fri, 09 Jul 2021 15:58:36 -0400
pvc (0.9.23-0) unstable; urgency=high
* [Daemons] Fixes a critical overwriting bug in zkhandler when schema paths are not yet valid
* [Node Daemon] Ensures the daemon mode is updated on every startup (fixes the side effect of the above bug in 0.9.22)
-- Joshua M. Boniface <joshua@boniface.me> Mon, 05 Jul 2021 23:40:32 -0400
pvc (0.9.22-0) unstable; urgency=high
* [API Daemon] Drastically improves performance when getting large lists (e.g. VMs)
* [Daemons] Adds profiler functions for use in debug mode
* [Daemons] Improves reliability of ZK locking
* [Daemons] Adds the new logo in ASCII form to the Daemon startup message
* [Node Daemon] Fixes bug where VMs would sometimes not stop
* [Node Daemon] Code cleanups in various classes
* [Node Daemon] Fixes a bug when reading node schema data
* [All] Adds node PVC version information to the list output
* [CLI Client] Improves the style and formatting of list output including a new header line
* [API Worker] Fixes a bug that prevented the storage benchmark job from running
-- Joshua M. Boniface <joshua@boniface.me> Mon, 05 Jul 2021 14:18:51 -0400
pvc (0.9.21-0) unstable; urgency=high
* [API Daemon] Ensures VMs stop before removing them
* [Node Daemon] Fixes a bug with VM shutdowns not timing out
* [Documentation] Adds information about georedundancy caveats
* [All] Adds support for SR-IOV NICs (hostdev and macvtap) and surrounding documentation
* [Node Daemon] Fixes a bug where shutdown aborted migrations unexpectedly
* [Node Daemon] Fixes a bug where the migration method was not updated realtime
* [Node Daemon] Adjusts the Patroni commands to remove reference to Zookeeper path
* [CLI Client] Adjusts several help messages and fixes some typos
* [CLI Client] Converts the CLI client to a proper Python module
* [API Daemon] Improves VM list performance
* [API Daemon] Adjusts VM list matching criteria (only matches against the UUID if it's a full UUID)
* [API Worker] Fixes incompatibility between Deb 10 and 11 in launching Celery worker
* [API Daemon] Corrects several bugs with initialization command
* [Documentation] Adds a shiny new logo and revamps introduction text
-- Joshua M. Boniface <joshua@boniface.me> Tue, 29 Jun 2021 19:21:31 -0400
pvc (0.9.20-0) unstable; urgency=high
* [Daemons] Implemented a Zookeeper schema handler and version 0 schema
* [Daemons] Completes major refactoring of codebase to make use of the schema handler
* [Daemons] Adds support for dynamic schema changes and "hot reloading" of pvcnoded processes
* [Daemons] Adds a functional testing script for verifying operation against a test cluster
* [Daemons, CLI] Fixes several minor bugs found by the above script
* [Daemons, CLI] Add support for Debian 11 "Bullseye"
-- Joshua M. Boniface <joshua@boniface.me> Mon, 14 Jun 2021 18:06:27 -0400
pvc (0.9.19-0) unstable; urgency=high
* [CLI] Corrects some flawed conditionals


@ -1,3 +0,0 @@
client-cli/pvc.py usr/share/pvc
client-cli/cli_lib usr/share/pvc
client-cli/scripts usr/share/pvc


@ -1,4 +1,8 @@
#!/bin/sh
# Install client binary to /usr/bin via symlink
ln -s /usr/share/pvc/pvc.py /usr/bin/pvc
# Generate the bash completion configuration
if [ -d /etc/bash_completion.d ]; then
_PVC_COMPLETE=source_bash pvc > /etc/bash_completion.d/pvc
fi
exit 0


@ -1,4 +1,8 @@
#!/bin/sh
# Remove client binary symlink
rm -f /usr/bin/pvc
# Remove the bash completion
if [ -f /etc/bash_completion.d/pvc ]; then
rm -f /etc/bash_completion.d/pvc
fi
exit 0


@ -5,5 +5,6 @@ api-daemon/pvcapid.sample.yaml etc/pvc
api-daemon/pvcapid usr/share/pvc
api-daemon/pvcapid.service lib/systemd/system
api-daemon/pvcapid-worker.service lib/systemd/system
api-daemon/pvcapid-worker.sh usr/share/pvc
api-daemon/provisioner usr/share/pvc
api-daemon/migrations usr/share/pvc

debian/rules

@ -1,13 +1,19 @@
#!/usr/bin/make -f
# See debhelper(7) (uncomment to enable)
# output every command that modifies files on the build system.
#export DH_VERBOSE = 1
export DH_VERBOSE = 1
%:
dh $@
dh $@ --with python3
override_dh_python3:
cd $(CURDIR)/client-cli; pybuild --system=distutils --dest-dir=../debian/pvc-client-cli/
mkdir -p debian/pvc-client-cli/usr/lib/python3
mv debian/pvc-client-cli/usr/lib/python3*/* debian/pvc-client-cli/usr/lib/python3/
rm -r $(CURDIR)/client-cli/.pybuild $(CURDIR)/client-cli/pvc.egg-info
override_dh_auto_clean:
find . -name "__pycache__" -exec rm -r {} \; || true
find . -name "__pycache__" -o -name ".pybuild" -exec rm -r {} \; || true
# If you need to rebuild the Sphinx documentation
# Add spinxdoc to the dh --with line


@ -49,7 +49,7 @@ Node network routing for managed networks providing EBGP VXLAN and route-learnin
The storage subsystem is provided by Ceph, a distributed object-based storage subsystem with extensive scalability, self-managing, and self-healing functionality. The Ceph RBD (RADOS Block Device) subsystem is used to provide VM block devices similar to traditional LVM or ZFS zvols, but in a distributed, shared-storage manner.
All the components are designed to be run on top of Debian GNU/Linux, specifically Debian 10.X "Buster", with the SystemD system service manager. This OS provides a stable base to run the various other subsystems while remaining truly Free Software, while SystemD provides functionality such as automatic daemon restarting and complex startup/shutdown ordering.
All the components are designed to be run on top of Debian GNU/Linux, specifically Debian 10.x "Buster" or 11.x "Bullseye", with the SystemD system service manager. This OS provides a stable base to run the various other subsystems while remaining truly Free Software, while SystemD provides functionality such as automatic daemon restarting and complex startup/shutdown ordering.
## Cluster Architecture
@ -156,7 +156,7 @@ For optimal performance, nodes should use at least 10-Gigabit Ethernet network i
#### What Ceph version does PVC use?
PVC requires Ceph 14.x (Nautilus). The official PVC repository at https://repo.bonifacelabs.ca includes Ceph 14.2.x (updated regularly), since Debian Buster by default includes only 12.x (Luminous).
PVC requires Ceph 14.x (Nautilus). The official PVC repository at https://repo.bonifacelabs.ca includes Ceph 14.2.x for Debian Buster (updated regularly), since by default it only includes 12.x (Luminous).
## About The Author


@ -12,6 +12,7 @@
+ [PVC client networks](#pvc-client-networks)
- [Bridged (unmanaged) Client Networks](#bridged--unmanaged--client-networks)
- [VXLAN (managed) Client Networks](#vxlan--managed--client-networks)
- [SR-IOV Client Networks](#sriov-client-networks)
- [Other Client Networks](#other-client-networks)
* [Node Layout: Considering how nodes are laid out](#node-layout--considering-how-nodes-are-laid-out)
+ [Node Functions: Coordinators versus Hypervisors](#node-functions--coordinators-versus-hypervisors)
@ -74,7 +75,7 @@ By default, the Java heap and stack sizes are set to 256MB and 512MB respectivel
### Operating System and Architecture
As an underlying OS, only Debian GNU/Linux 10.x "Buster" is supported by PVC. This is the operating system installed by the PVC [node installer](https://github.com/parallelvirtualcluster/pvc-installer) and expected by the PVC [Ansible configuration system](https://github.com/parallelvirtualcluster/pvc-ansible). Ubuntu or other Debian-derived distributions may work, but are not officially supported. PVC also makes use of a custom repository to provide the PVC software and an updated version of Ceph beyond what is available in the base operating system, and this is only compatible officially with Debian 10 "Buster". PVC will, in the future, upgrade to future versions of Debian based on their release schedule and testing; releases may be skipped for official support if required. As a general rule, using the current versions of the official node installer and Ansible repository is the preferred and only supported method for deploying PVC.
As an underlying OS, only Debian GNU/Linux 10.x "Buster" or 11.x "Bullseye" is supported by PVC. This is the operating system installed by the PVC [node installer](https://github.com/parallelvirtualcluster/pvc-installer) and expected by the PVC [Ansible configuration system](https://github.com/parallelvirtualcluster/pvc-ansible). Ubuntu or other Debian-derived distributions may work, but are not officially supported. PVC also makes use of a custom repository to provide the PVC software and (for Debian Buster) an updated version of Ceph beyond what is available in the base operating system, and this is only compatible officially with Debian 10 or 11. PVC will generally be upgraded regularly to support new Debian versions. As a rule, using the current versions of the official node installer and Ansible repository is the preferred and only supported method for deploying PVC.
Currently, only the `amd64` (Intel 64 or AMD64) architecture is officially supported by PVC. Given the cross-platform nature of Python and the various software components in Debian, it may work on `armhf` or `arm64` systems as well, however this has not been tested by the author and is not officially supported at this time.
@ -184,6 +185,26 @@ With this client network type, PVC is in full control of the network. No vLAN co
NOTE: These networks may introduce a bottleneck and tromboning if there is a large amount of external and/or inter-network traffic on the cluster. The administrator should consider this carefully when deciding whether to use managed or bridged networks and properly evaluate the inter-network traffic requirements.
#### SR-IOV Client Networks
The third type of client network is the SR-IOV network. SR-IOV (Single-Root I/O Virtualization) is a technique and feature enabled on modern high-performance NICs (for instance, those from Intel or nVidia) which allows a single physical Ethernet port (a "PF" in SR-IOV terminology) to be split, at a hardware level, into multiple virtual Ethernet ports ("VF"s), which can then be managed separately. Starting with version 0.9.21, PVC supports SR-IOV PF and VF configuration at the node level, and these VFs can be passed into VMs in two ways.
SR-IOV's main benefit is to offload bridging and network functions from the hypervisor layer, and direct them onto the hardware itself. This can increase network throughput in some situations, as well as provide near-complete isolation of guest networks from the hypervisors (in contrast with bridges which *can* expose client traffic to the hypervisors, and VXLANs which *do* expose client traffic to the hypervisors). For instance, a VF can have a vLAN specified, and the tagging/untagging of packets is then carried out at the hardware layer.
There are, however, caveats to working with SR-IOV. The biggest difference compared to the other two network types is that SR-IOV must be configured on a per-node basis. That is, each node must have SR-IOV explicitly enabled, its specific PF devices defined, and a set of VFs created at PVC startup. Generally, with identical PVC nodes, this will not be a problem, but it is something to consider, especially if the servers are mismatched in any way. It is thus also possible to set some nodes with SR-IOV functionality and others without, though care must be taken in this situation to set node limits in the VM metadata of any VMs which use SR-IOV VFs, to prevent failed migrations.
PFs are defined in the `pvcnoded.yml` configuration of each node, via the `sriov_device` list. Each PF can have an arbitrary number of VFs (`vfcount`) allocated, though each NIC vendor and model has specific limits. Once configured, specifically with Intel NICs, PFs (and specifically, the `vfcount` attribute in the driver) are immutable and cannot be changed easily without completely flushing the node and rebooting it, so care should be taken to select the desired settings as early in the cluster configuration as possible.
Once created, VFs are also managed on a per-node basis. That is, each VF, on each host, even if it has the exact same device name, is managed separately. For instance, the VF `ens1f0v0` created from the PF `ens1f0` on "`hv1`" can have a different configuration from the identically-named VF `ens1f0v0` on "`hv2`". The administrator is responsible for ensuring consistency here, and for ensuring that devices do not overlap (e.g. assigning the same VF name to VMs on two separate nodes which might migrate to each other). PVC will, however, explicitly prevent two VMs from being assigned to the same VF on the same node, even if this may be technically possible in some cases.
When attaching VFs to VMs, there are two supported modes: `macvtap`, and `hostdev`.
`macvtap`, as the name suggests, uses the Linux `macvtap` driver to connect the VF to the VM. Once attached, the vNIC behaves just like a "bridged" network connection above, and like "bridged" connections, the "mode" of the NIC can be specified, defaulting to "virtio" but supporting various emulated devices instead. Note that in this mode, vLANs cannot be configured on the guest side; they must be specified in the VF configuration (`pvc network sriov vf set`) with one vLAN per VF. VMs with `macvtap` interfaces can be live migrated between nodes without issue, assuming there is a corresponding free VF on the destination node, and the SR-IOV functionality is transparent to the VM.
`hostdev` is a direct PCIe passthrough method. With a VF attached to a VM in `hostdev` mode, the virtual PCIe NIC device itself becomes hidden from the node, and is visible only to the guest, where it appears as a discrete PCIe device. In this mode, vLANs and other attributes can be set on the guest side at will, though setting vLANs and other properties in the VF configuration is still supported. The main caveat to this mode is that VMs with connected `hostdev` SR-IOV VFs *cannot be live migrated between nodes*. Only a `shutdown` migration is supported, and, like `macvtap`, an identical PCIe device at the same bus address must be present on the target node. To prevent unexpected failures, PVC will explicitly set the VM metadata for the "migration method" to "shutdown" the first time that a `hostdev` VF is attached to it; if this changes later, the administrator must change this back explicitly.
Generally speaking, SR-IOV connections are not recommended unless there is a good use case for them. On modern hardware, software bridges are extremely performant and much simpler to manage. The functionality is provided for those rare use cases where SR-IOV is absolutely required by the administrator, but care must be taken to understand all the requirements and caveats of SR-IOV before using it in production.
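As an illustrative sketch only (not part of the official workflow documented here), the same per-VF settings can also be driven through the REST API endpoint added in this changeset; the API address, node name, and VF name below are assumptions:
# Hypothetical example: tag vLAN 100 onto VF ens1f0v0 of node hv1 via the PVC API
import requests
api = 'http://pvc.local:7370/api/v1'  # assumed API address; authentication omitted
resp = requests.put(f'{api}/sriov/vf/hv1/ens1f0v0',
                    params={'vlan_id': 100, 'link_state': 'auto'})
print(resp.status_code, resp.json())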
#### Other Client Networks
Future PVC versions may support other client network types, such as direct-routing between VMs.
@ -235,10 +256,17 @@ When using geographic redundancy, there are several caveats to keep in mind:
* The number of sites and positioning of coordinators at those sites is important. A majority (at least 2 in a 3-coordinator cluster, or 3 in a 5-coordinator) of coordinators must be able to reach each other in a failure scenario for the cluster as a whole to remain functional. Thus, configurations such as 2 + 1 or 3 + 2 splits across 2 sites do *not* provide full redundancy, and the whole cluster will be down if the majority site is down. It is thus recommended to always have an odd number of sites to match the odd number of coordinators, for instance a 1 + 1 + 1 or 2 + 2 + 1 configuration. Also note that all hypervisors must be able to reach the majority coordinator group or their storage will be impacted as well.
This diagram outlines the supported and unsupported/unreliable georedundant configurations for 3 nodes. Care must always be taken to ensure that the cluster can operate with the loss of any given georedundant site.
![georedundancy-caveats](/images/georedundancy-caveats.png)
*Above: Supported and unsupported/unreliable georedundant configurations*
* Even if the PVC software itself is in an unmanageable state, VMs will continue to run if at all possible. However, since the storage subsystem makes use of the same quorum, losing more than half of the nodes will very likely result in storage interruption as well, which will affect running VMs.
If these requirements cannot be fulfilled, it may be best to have separate PVC clusters at each site and handle service redundancy at a higher layer to avoid a major disruption.
## Example Configurations
This section provides diagrams of 3 possible node configurations. These diagrams can be extrapolated out to almost any possible configuration and number of nodes.

Several binary image files changed in this diff (not shown), including the newly added docs/images/pvc_icon.png.


@ -1,23 +1,121 @@
# PVC - The Parallel Virtual Cluster system
<p align="center">
<img alt="Logo banner" src="https://git.bonifacelabs.ca/uploads/-/system/project/avatar/135/pvc_logo.png"/>
<img alt="Logo banner" src="images/pvc_logo_black.png"/>
<br/><br/>
<a href="https://github.com/parallelvirtualcluster/pvc"><img alt="License" src="https://img.shields.io/github/license/parallelvirtualcluster/pvc"/></a>
<a href="https://github.com/parallelvirtualcluster/pvc/releases"><img alt="Release" src="https://img.shields.io/github/release-pre/parallelvirtualcluster/pvc"/></a>
<a href="https://parallelvirtualcluster.readthedocs.io/en/latest/?badge=latest"><img alt="Documentation Status" src="https://readthedocs.org/projects/parallelvirtualcluster/badge/?version=latest"/></a>
</p>
PVC is a KVM+Ceph+Zookeeper-based, Free Software, scalable, redundant, self-healing, and self-managing private cloud solution designed with administrator simplicity in mind. It is built from the ground-up to be redundant at the host layer, allowing the cluster to gracefully handle the loss of nodes or their components, both due to hardware failure or due to maintenance. It is able to scale from a minimum of 3 nodes up to 12 or more nodes, while retaining performance and flexibility, allowing the administrator to build a small cluster today and grow it as needed.
## What is PVC?
PVC is a virtual machine-based hyperconverged infrastructure (HCI) virtualization cluster solution that is fully Free Software, scalable, redundant, self-healing, self-managing, and designed for administrator simplicity. It is an alternative to other HCI solutions such as Harvester, Nutanix, and VMWare, as well as to other common virtualization stacks such as ProxMox and OpenStack.
PVC is a complete HCI solution, built from well-known and well-trusted Free Software tools, to assist an administrator in creating and managing a cluster of servers to run virtual machines, as well as self-managing several important aspects including storage failover, node failure and recovery, virtual machine failure and recovery, and network plumbing. It is designed to act consistently, reliably, and unobtrusively, letting the administrator concentrate on more important things.
PVC is highly scalable. From a minimum (production) node count of 3, up to 12 or more, and supporting many dozens of VMs, PVC scales along with your workload and requirements. Deploy a cluster once and grow it as your needs expand.
As a consequence of its features, PVC makes administrating very high-uptime VMs extremely easy, featuring VM live migration, built-in always-enabled shared storage with transparent multi-node replication, and consistent network plumbing throughout the cluster. Nodes can also be seamlessly removed from or added to service, with zero VM downtime, to facilitate maintenance, upgrades, or other work.
PVC also features an optional, fully customizable VM provisioning framework, designed to automate and simplify VM deployments using custom provisioning profiles, scripts, and CloudInit userdata API support.
Installation of PVC is accomplished by two main components: a [Node installer ISO](https://github.com/parallelvirtualcluster/pvc-installer) which creates on-demand installer ISOs, and an [Ansible role framework](https://github.com/parallelvirtualcluster/pvc-ansible) to configure, bootstrap, and administrate the nodes. Once up, the cluster is managed via an HTTP REST API, accessible via a Python Click CLI client or WebUI.
Just give it physical servers, and it will run your VMs without you having to think about it, all in just an hour or two of setup time.
## What is it based on?
The core node and API daemons, as well as the CLI API client, are written in Python 3 and are fully Free Software (GNU GPL v3). In addition to these, PVC makes use of the following software tools to provide a holistic hyperconverged infrastructure solution:
* Debian GNU/Linux as the base OS.
* Linux KVM, QEMU, and Libvirt for VM management.
* Linux `ip`, FRRouting, NFTables, DNSMasq, and PowerDNS for network management.
* Ceph for storage management.
* Apache Zookeeper for the primary cluster state database.
* Patroni PostgreSQL manager for the secondary relation databases (DNS aggregation, Provisioner configuration).
The major goal of PVC is to be administrator-friendly, providing the power of Enterprise-grade private clouds like OpenStack, Nutanix, and VMWare to homelabbers, SMBs, and small ISPs, without the cost or complexity. It believes in picking the best tool for a job and abstracting it behind the cluster as a whole, freeing the administrator from the boring and time-consuming task of selecting the best component, and letting them get on with the things that really matter. Administration can be done from a simple CLI or via a RESTful API capable of building full-featured web frontends or additional applications, taking a self-documenting approach to keep the administrator learning curve as low as possible. Setup is easy and straightforward with an [ISO-based node installer](https://github.com/parallelvirtualcluster/pvc-installer) and [Ansible role framework](https://github.com/parallelvirtualcluster/pvc-ansible) designed to get a cluster up and running as quickly as possible. Build your cloud in an hour, grow it as you need, and never worry about it: just add physical servers.
## Getting Started
To get started with PVC, please see the [About](https://parallelvirtualcluster.readthedocs.io/en/latest/about/) page for general information about the project, and the [Getting Started](https://parallelvirtualcluster.readthedocs.io/en/latest/getting-started/) page for details on configuring your cluster.
To get started with PVC, please see the [About](https://parallelvirtualcluster.readthedocs.io/en/latest/about/) page for general information about the project, and the [Getting Started](https://parallelvirtualcluster.readthedocs.io/en/latest/getting-started/) page for details on configuring your first cluster.
## Changelog
#### v0.9.28
* [CLI Client] Revamp confirmation options for "vm modify" command
#### v0.9.27
* [CLI Client] Fixes a bug with vm modify command when passed a file
#### v0.9.26
* [Node Daemon] Corrects some bad assumptions about fencing results during hardware failures
* [All] Implements VM tagging functionality
* [All] Implements Node log access via PVC functionality
#### v0.9.25
* [Node Daemon] Returns to Rados library calls for Ceph due to performance problems
* [Node Daemon] Adds a date output to keepalive messages
* [Daemons] Configures ZK connection logging only for persistent connections
* [API Provisioner] Add context manager-based chroot to Debootstrap example script
* [Node Daemon] Fixes a bug where shutdown daemon state was overwritten
#### v0.9.24
* [Node Daemon] Removes Rados module polling of Ceph cluster and returns to command-based polling for timeout purposes, and removes some flaky return statements
* [Node Daemon] Removes flaky Zookeeper connection renewals that caused problems
* [CLI Client] Allow raw lists of clusters from `pvc cluster list`
* [API Daemon] Fixes several issues when getting VM data without stats
* [API Daemon] Fixes issues with removing VMs while disks are still in use (failed provisioning, etc.)
#### v0.9.23
* [Daemons] Fixes a critical overwriting bug in zkhandler when schema paths are not yet valid
* [Node Daemon] Ensures the daemon mode is updated on every startup (fixes the side effect of the above bug in 0.9.22)
#### v0.9.22
* [API Daemon] Drastically improves performance when getting large lists (e.g. VMs)
* [Daemons] Adds profiler functions for use in debug mode
* [Daemons] Improves reliability of ZK locking
* [Daemons] Adds the new logo in ASCII form to the Daemon startup message
* [Node Daemon] Fixes bug where VMs would sometimes not stop
* [Node Daemon] Code cleanups in various classes
* [Node Daemon] Fixes a bug when reading node schema data
* [All] Adds node PVC version information to the list output
* [CLI Client] Improves the style and formatting of list output including a new header line
* [API Worker] Fixes a bug that prevented the storage benchmark job from running
#### v0.9.21
* [API Daemon] Ensures VMs stop before removing them
* [Node Daemon] Fixes a bug with VM shutdowns not timing out
* [Documentation] Adds information about georedundancy caveats
* [All] Adds support for SR-IOV NICs (hostdev and macvtap) and surrounding documentation
* [Node Daemon] Fixes a bug where shutdown aborted migrations unexpectedly
* [Node Daemon] Fixes a bug where the migration method was not updated realtime
* [Node Daemon] Adjusts the Patroni commands to remove reference to Zookeeper path
* [CLI Client] Adjusts several help messages and fixes some typos
* [CLI Client] Converts the CLI client to a proper Python module
* [API Daemon] Improves VM list performance
* [API Daemon] Adjusts VM list matching criteria (only matches against the UUID if it's a full UUID)
* [API Worker] Fixes incompatibility between Deb 10 and 11 in launching Celery worker
* [API Daemon] Corrects several bugs with initialization command
* [Documentation] Adds a shiny new logo and revamps introduction text
#### v0.9.20
* [Daemons] Implemented a Zookeeper schema handler and version 0 schema
* [Daemons] Completes major refactoring of codebase to make use of the schema handler
* [Daemons] Adds support for dynamic schema changes and "hot reloading" of pvcnoded processes
* [Daemons] Adds a functional testing script for verifying operation against a test cluster
* [Daemons, CLI] Fixes several minor bugs found by the above script
* [Daemons, CLI] Add support for Debian 11 "Bullseye"
#### v0.9.19
* [CLI] Corrects some flawed conditionals


@ -451,6 +451,12 @@ pvc_nodes:
pvc_bridge_device: bondU
pvc_sriov_enable: True
pvc_sriov_device:
- phy: ens1f0
mtu: 9000
vfcount: 6
pvc_upstream_device: "{{ networks['upstream']['device'] }}"
pvc_upstream_mtu: "{{ networks['upstream']['mtu'] }}"
pvc_upstream_domain: "{{ networks['upstream']['domain'] }}"
@ -901,6 +907,18 @@ The IPMI password for the node management controller. Unless a per-host override
The device name of the underlying network interface to be used for "bridged"-type client networks. For each "bridged"-type network, an IEEE 802.3q vLAN and bridge will be created on top of this device to pass these networks. In most cases, using the reflexive `networks['cluster']['raw_device']` or `networks['upstream']['raw_device']` from the Base role is sufficient.
#### `pvc_sriov_enable`
* *optional*
Whether to enable or disable SR-IOV functionality.
#### `pvc_sriov_device`
* *optional*
A list of SR-IOV devices. See the Daemon manual for details.
#### `pvc_<network>_*`
The next set of entries is hard-coded to use the values from the global `networks` list. It should not need to be changed under most circumstances. Refer to the previous sections for specific notes about each entry.


@ -146,6 +146,11 @@ pvc:
console_log_lines: 1000
networking:
bridge_device: ens4
sriov_enable: True
sriov_device:
- phy: ens1f0
mtu: 9000
vfcount: 7
upstream:
device: ens4
mtu: 1500
@ -422,6 +427,34 @@ How many lines of VM console logs to keep in the Zookeeper database for each VM.
The network interface device used to create Bridged client network vLANs on. For most clusters, should match the underlying device of the various static networks (e.g. `ens4` or `bond0`), though may also use a separate network interface.
#### `system` → `configuration` → `networking` → `sriov_enable`
* *optional*, defaults to `False`
* *requires* `functions` → `enable_networking`
Enables (or disables) SR-IOV functionality in PVC. If enabled, at least one `sriov_device` entry should be specified.
#### `system` → `configuration` → `networking` → `sriov_device`
* *optional*
* *requires* `functions` → `enable_networking`
Contains a list of SR-IOV PF (physical function) devices and their basic configuration. Each element contains the following entries:
##### `phy`
* *required*
The raw Linux network device with SR-IOV PF functionality.
##### `mtu`
The MTU of the PF device, set on daemon startup.
##### `vfcount`
The number of VF devices to create on this PF. VF devices are then managed via PVC on a per-node basis.
#### `system` → `configuration` → `networking`
* *optional*


@ -144,6 +144,19 @@
},
"type": "object"
},
"NodeLog": {
"properties": {
"data": {
"description": "The recent log text",
"type": "string"
},
"name": {
"description": "The name of the Node",
"type": "string"
}
},
"type": "object"
},
"VMLog": {
"properties": {
"data": {
@ -215,6 +228,23 @@
},
"type": "object"
},
"VMTags": {
"properties": {
"name": {
"description": "The name of the VM",
"type": "string"
},
"tags": {
"description": "The tag(s) of the VM",
"items": {
"id": "VMTag",
"type": "object"
},
"type": "array"
}
},
"type": "object"
},
"acl": {
"properties": {
"description": {
@ -464,6 +494,10 @@
"description": "The current operating system type",
"type": "string"
},
"pvc_version": {
"description": "The current running PVC node daemon version",
"type": "string"
},
"running_domains": {
"description": "The list of running domains (VMs) by UUID",
"type": "string"
@ -764,6 +798,99 @@
},
"type": "object"
},
"sriov_pf": {
"properties": {
"mtu": {
"description": "The MTU of the SR-IOV PF device",
"type": "string"
},
"phy": {
"description": "The name of the SR-IOV PF device",
"type": "string"
},
"vfs": {
"items": {
"description": "The PHY name of a VF of this PF",
"type": "string"
},
"type": "list"
}
},
"type": "object"
},
"sriov_vf": {
"properties": {
"config": {
"id": "sriov_vf_config",
"properties": {
"link_state": {
"description": "The current SR-IOV VF link state (either enabled, disabled, or auto)",
"type": "string"
},
"query_rss": {
"description": "Whether VF RSS querying is enabled or disabled",
"type": "boolean"
},
"spoof_check": {
"description": "Whether device spoof checking is enabled or disabled",
"type": "boolean"
},
"trust": {
"description": "Whether guest device trust is enabled or disabled",
"type": "boolean"
},
"tx_rate_max": {
"description": "The maximum TX rate of the SR-IOV VF device",
"type": "string"
},
"tx_rate_min": {
"description": "The minimum TX rate of the SR-IOV VF device",
"type": "string"
},
"vlan_id": {
"description": "The tagged vLAN ID of the SR-IOV VF device",
"type": "string"
},
"vlan_qos": {
"description": "The QOS group of the tagged vLAN",
"type": "string"
}
},
"type": "object"
},
"mac": {
"description": "The current MAC address of the VF device",
"type": "string"
},
"mtu": {
"description": "The current MTU of the VF device",
"type": "integer"
},
"pf": {
"description": "The name of the SR-IOV PF parent of this VF device",
"type": "string"
},
"phy": {
"description": "The name of the SR-IOV VF device",
"type": "string"
},
"usage": {
"id": "sriov_vf_usage",
"properties": {
"domain": {
"description": "The UUID of the domain the SR-IOV VF is currently used by",
"type": "boolean"
},
"used": {
"description": "Whether the SR-IOV VF is currently used by a VM or not",
"type": "boolean"
}
},
"type": "object"
}
},
"type": "object"
},
"storage-template": {
"properties": {
"disks": {
@ -1273,6 +1400,28 @@
"description": "The current state of the VM",
"type": "string"
},
"tags": {
"description": "The tag(s) of the VM",
"items": {
"id": "VMTag",
"properties": {
"name": {
"description": "The name of the tag",
"type": "string"
},
"protected": {
"description": "Whether the tag is protected or not",
"type": "boolean"
},
"type": {
"description": "The type of the tag (user, system)",
"type": "string"
}
},
"type": "object"
},
"type": "array"
},
"type": {
"description": "The type of the VM",
"type": "string"
@ -1459,8 +1608,15 @@
},
"/api/v1/initialize": {
"post": {
"description": "Note: Normally used only once during cluster bootstrap; checks for the existence of the \"/primary_node\" key before proceeding and returns 400 if found",
"description": "<br/>If the 'overwrite' option is not True, the cluster will return 400 if the `/config/primary_node` key is found. If 'overwrite' is True, the existing cluster<br/>data will be erased and new, empty data written in its place.<br/><br/>All node daemons should be stopped before running this command, and the API daemon started manually to avoid undefined behavior.",
"parameters": [
{
"description": "A flag to enable or disable (default) overwriting existing data",
"in": "query",
"name": "overwrite",
"required": false,
"type": "bool"
},
{
"description": "A confirmation string to ensure that the API consumer really means it",
"in": "query",
@ -2311,7 +2467,7 @@
"description": "",
"parameters": [
{
"description": "A search limit; fuzzy by default, use ^/$ to force exact matches",
"description": "A search limit in the name, tags, or an exact UUID; fuzzy by default, use ^/$ to force exact matches",
"in": "query",
"name": "limit",
"required": false,
@ -2522,6 +2678,38 @@
]
}
},
"/api/v1/node/{node}/log": {
"get": {
"description": "",
"parameters": [
{
"description": "The number of lines to retrieve",
"in": "query",
"name": "lines",
"required": false,
"type": "integer"
}
],
"responses": {
"200": {
"description": "OK",
"schema": {
"$ref": "#/definitions/NodeLog"
}
},
"404": {
"description": "Node not found",
"schema": {
"$ref": "#/definitions/Message"
}
}
},
"summary": "Return the recent logs of {node}",
"tags": [
"node"
]
}
},
"/api/v1/provisioner/create": {
"post": {
"description": "Note: Starts a background job in the pvc-provisioner-worker Celery worker while returning a task ID; the task ID can be used to query the \"GET /provisioner/status/<task_id>\" endpoint for the job status",
@ -4453,6 +4641,181 @@
]
}
},
"/api/v1/sriov/pf": {
"get": {
"description": "",
"responses": {
"200": {
"description": "OK",
"schema": {
"$ref": "#/definitions/sriov_pf"
}
}
},
"summary": "Return a list of SR-IOV PFs on a given node",
"tags": [
"network / sriov"
]
}
},
"/api/v1/sriov/pf/{node}": {
"get": {
"description": "",
"responses": {
"200": {
"description": "OK",
"schema": {
"$ref": "#/definitions/sriov_pf"
}
}
},
"summary": "Return a list of SR-IOV PFs on node {node}",
"tags": [
"network / sriov"
]
}
},
"/api/v1/sriov/vf": {
"get": {
"description": "",
"responses": {
"200": {
"description": "OK",
"schema": {
"$ref": "#/definitions/sriov_vf"
}
}
},
"summary": "Return a list of SR-IOV VFs on a given node, optionally limited to those in the specified PF",
"tags": [
"network / sriov"
]
}
},
"/api/v1/sriov/vf/{node}": {
"get": {
"description": "",
"responses": {
"200": {
"description": "OK",
"schema": {
"$ref": "#/definitions/sriov_vf"
}
}
},
"summary": "Return a list of SR-IOV VFs on node {node}, optionally limited to those in the specified PF",
"tags": [
"network / sriov"
]
}
},
"/api/v1/sriov/vf/{node}/{vf}": {
"get": {
"description": "",
"responses": {
"200": {
"description": "OK",
"schema": {
"$ref": "#/definitions/sriov_vf"
}
},
"404": {
"description": "Not found",
"schema": {
"$ref": "#/definitions/Message"
}
}
},
"summary": "Return information about {vf} on {node}",
"tags": [
"network / sriov"
]
},
"put": {
"description": "",
"parameters": [
{
"description": "The vLAN ID for vLAN tagging (0 is disabled)",
"in": "query",
"name": "vlan_id",
"required": false,
"type": "integer"
},
{
"description": "The vLAN QOS priority (0 is disabled)",
"in": "query",
"name": "vlan_qos",
"required": false,
"type": "integer"
},
{
"description": "The minimum TX rate (0 is disabled)",
"in": "query",
"name": "tx_rate_min",
"required": false,
"type": "integer"
},
{
"description": "The maximum TX rate (0 is disabled)",
"in": "query",
"name": "tx_rate_max",
"required": false,
"type": "integer"
},
{
"description": "The administrative link state",
"enum": [
"auto",
"enable",
"disable"
],
"in": "query",
"name": "link_state",
"required": false,
"type": "string"
},
{
"description": "Enable or disable spoof checking",
"in": "query",
"name": "spoof_check",
"required": false,
"type": "boolean"
},
{
"description": "Enable or disable VF user trust",
"in": "query",
"name": "trust",
"required": false,
"type": "boolean"
},
{
"description": "Enable or disable query RSS support",
"in": "query",
"name": "query_rss",
"required": false,
"type": "boolean"
}
],
"responses": {
"200": {
"description": "OK",
"schema": {
"$ref": "#/definitions/Message"
}
},
"400": {
"description": "Bad request",
"schema": {
"$ref": "#/definitions/Message"
}
}
},
"summary": "Set the configuration of {vf} on {node}",
"tags": [
"network / sriov"
]
}
},
"/api/v1/status": {
"get": {
"description": "",
@ -5516,7 +5879,7 @@
"description": "",
"parameters": [
{
"description": "A name search limit; fuzzy by default, use ^/$ to force exact matches",
"description": "A search limit in the name, tags, or an exact UUID; fuzzy by default, use ^/$ to force exact matches",
"in": "query",
"name": "limit",
"required": false,
@ -5535,6 +5898,13 @@
"name": "state",
"required": false,
"type": "string"
},
{
"description": "Limit list to VMs with this tag",
"in": "query",
"name": "tag",
"required": false,
"type": "string"
}
],
"responses": {
@ -5610,6 +5980,26 @@
"name": "migration_method",
"required": false,
"type": "string"
},
{
"description": "The user tag(s) of the VM",
"in": "query",
"items": {
"type": "string"
},
"name": "user_tags",
"required": false,
"type": "array"
},
{
"description": "The protected user tag(s) of the VM",
"in": "query",
"items": {
"type": "string"
},
"name": "protected_tags",
"required": false,
"type": "array"
}
],
"responses": {
@ -5721,7 +6111,8 @@
"mem",
"vcpus",
"load",
"vms"
"vms",
"none (cluster default)"
],
"in": "query",
"name": "selector",
@ -5747,6 +6138,26 @@
"name": "migration_method",
"required": false,
"type": "string"
},
{
"description": "The user tag(s) of the VM",
"in": "query",
"items": {
"type": "string"
},
"name": "user_tags",
"required": false,
"type": "array"
},
{
"description": "The protected user tag(s) of the VM",
"in": "query",
"items": {
"type": "string"
},
"name": "protected_tags",
"required": false,
"type": "array"
}
],
"responses": {
@ -5871,7 +6282,7 @@
}
},
"404": {
"description": "Not found",
"description": "VM not found",
"schema": {
"$ref": "#/definitions/Message"
}
@ -5945,6 +6356,12 @@
"schema": {
"$ref": "#/definitions/Message"
}
},
"404": {
"description": "VM not found",
"schema": {
"$ref": "#/definitions/Message"
}
}
},
"summary": "Set the metadata of {vm}",
@ -6132,6 +6549,84 @@
"vm"
]
}
},
"/api/v1/vm/{vm}/tags": {
"get": {
"description": "",
"responses": {
"200": {
"description": "OK",
"schema": {
"$ref": "#/definitions/VMTags"
}
},
"404": {
"description": "VM not found",
"schema": {
"$ref": "#/definitions/Message"
}
}
},
"summary": "Return the tags of {vm}",
"tags": [
"vm"
]
},
"post": {
"description": "",
"parameters": [
{
"description": "The action to perform with the tag",
"enum": [
"add",
"remove"
],
"in": "query",
"name": "action",
"required": true,
"type": "string"
},
{
"description": "The text value of the tag",
"in": "query",
"name": "tag",
"required": true,
"type": "string"
},
{
"default": false,
"description": "Set the protected state of the tag",
"in": "query",
"name": "protected",
"required": false,
"type": "boolean"
}
],
"responses": {
"200": {
"description": "OK",
"schema": {
"$ref": "#/definitions/Message"
}
},
"400": {
"description": "Bad request",
"schema": {
"$ref": "#/definitions/Message"
}
},
"404": {
"description": "VM not found",
"schema": {
"$ref": "#/definitions/Message"
}
}
},
"summary": "Set the tags of {vm}",
"tags": [
"vm"
]
}
}
},
"swagger": "2.0"

gen-zk-migrations (new executable file)

@ -0,0 +1,7 @@
#!/bin/bash
# Generate the Zookeeper migration files
pushd api-daemon
./pvcapid-manage-zk.py
popd

lint

@ -7,7 +7,7 @@ fi
flake8 \
--ignore=E501 \
--exclude=api-daemon/migrations/versions,api-daemon/provisioner/examples
--exclude=debian,api-daemon/migrations/versions,api-daemon/provisioner/examples
ret=$?
if [[ $ret -eq 0 ]]; then
echo "No linting issues found!"


@ -140,6 +140,8 @@ pvc:
file_logging: True
# stdout_logging: Enable or disable logging to stdout (i.e. journald)
stdout_logging: True
# zookeeper_logging: Enable or disable logging to Zookeeper (for `pvc node log` functionality)
zookeeper_logging: True
# log_colours: Enable or disable ANSI colours in log output
log_colours: True
# log_dates: Enable or disable date strings in log output
@ -152,11 +154,28 @@ pvc:
log_keepalive_storage_details: True
# console_log_lines: Number of console log lines to store in Zookeeper per VM
console_log_lines: 1000
# node_log_lines: Number of node log lines to store in Zookeeper per node
node_log_lines: 2000
# networking: PVC networking configuration
# OPTIONAL if enable_networking: False
networking:
# bridge_device: Underlying device to use for bridged vLAN networks; usually the device underlying <cluster>
# bridge_device: Underlying device to use for bridged vLAN networks; usually the device of <cluster>
bridge_device: ens4
# sriov_enable: Enable or disable (default if absent) SR-IOV network support
sriov_enable: False
# sriov_device: Underlying device(s) to use for SR-IOV networks; can be bridge_device or other NIC(s)
sriov_device:
# The physical device name
- phy: ens1f1
# The preferred MTU of the physical device; OPTIONAL - defaults to the interface default if unset
mtu: 9000
# The number of VFs to enable on this device
# NOTE: This defines the maximum number of VMs which can be provisioned on this physical device; VMs
# are allocated to these VFs manually by the administrator and thus all nodes should have the
# same number
# NOTE: This value cannot be changed at runtime on Intel(R) NICs; the node will need to be restarted
# if this value changes
vfcount: 8
# upstream: Upstream physical interface device
upstream:
# device: Upstream interface device name


@ -35,7 +35,7 @@ class CephOSDInstance(object):
self.size = None
self.stats = dict()
@self.zkhandler.zk_conn.DataWatch('/ceph/osds/{}/node'.format(self.osd_id))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('osd.node', self.osd_id))
def watch_osd_node(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -50,7 +50,7 @@ class CephOSDInstance(object):
if data and data != self.node:
self.node = data
@self.zkhandler.zk_conn.DataWatch('/ceph/osds/{}/stats'.format(self.osd_id))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('osd.stats', self.osd_id))
def watch_osd_stats(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -174,10 +174,10 @@ def add_osd(zkhandler, logger, node, device, weight):
# 7. Add the new OSD to the list
logger.out('Adding new OSD disk with ID {} to Zookeeper'.format(osd_id), state='i')
zkhandler.write([
('/ceph/osds/{}'.format(osd_id), ''),
('/ceph/osds/{}/node'.format(osd_id), node),
('/ceph/osds/{}/device'.format(osd_id), device),
('/ceph/osds/{}/stats'.format(osd_id), '{}')
(('osd', osd_id), ''),
(('osd.node', osd_id), node),
(('osd.device', osd_id), device),
(('osd.stats', osd_id), '{}'),
])
# Log it
@ -272,7 +272,7 @@ def remove_osd(zkhandler, logger, osd_id, osd_obj):
# 7. Delete OSD from ZK
logger.out('Deleting OSD disk with ID {} from Zookeeper'.format(osd_id), state='i')
zkhandler.delete('/ceph/osds/{}'.format(osd_id), recursive=True)
zkhandler.delete(('osd', osd_id), recursive=True)
# Log it
logger.out('Removed OSD disk with ID {}'.format(osd_id), state='o')
@ -291,7 +291,7 @@ class CephPoolInstance(object):
self.pgs = ''
self.stats = dict()
@self.zkhandler.zk_conn.DataWatch('/ceph/pools/{}/pgs'.format(self.name))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('pool.pgs', self.name))
def watch_pool_node(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -306,7 +306,7 @@ class CephPoolInstance(object):
if data and data != self.pgs:
self.pgs = data
@self.zkhandler.zk_conn.DataWatch('/ceph/pools/{}/stats'.format(self.name))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('pool.stats', self.name))
def watch_pool_stats(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -330,7 +330,7 @@ class CephVolumeInstance(object):
self.name = name
self.stats = dict()
@self.zkhandler.zk_conn.DataWatch('/ceph/volumes/{}/{}/stats'.format(self.pool, self.name))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('volume.stats', f'{self.pool}/{self.name}'))
def watch_volume_stats(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -355,7 +355,7 @@ class CephSnapshotInstance(object):
self.name = name
self.stats = dict()
@self.zkhandler.zk_conn.DataWatch('/ceph/snapshots/{}/{}/{}/stats'.format(self.pool, self.volume, self.name))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('snapshot.stats', f'{self.pool}/{self.volume}/{self.name}'))
def watch_snapshot_stats(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -382,7 +382,7 @@ def run_command(zkhandler, logger, this_node, data, d_osd):
node, device, weight = args.split(',')
if node == this_node.name:
# Lock the command queue
zk_lock = zkhandler.writelock('/cmd/ceph')
zk_lock = zkhandler.writelock('base.cmd.ceph')
with zk_lock:
# Add the OSD
result = add_osd(zkhandler, logger, node, device, weight)
@ -390,13 +390,13 @@ def run_command(zkhandler, logger, this_node, data, d_osd):
if result:
# Update the command queue
zkhandler.write([
('/cmd/ceph', 'success-{}'.format(data))
('base.cmd.ceph', 'success-{}'.format(data))
])
# Command failed
else:
# Update the command queue
zkhandler.write([
('/cmd/ceph', 'failure-{}'.format(data))
('base.cmd.ceph', 'failure-{}'.format(data))
])
# Wait 1 second before we free the lock, to ensure the client hits the lock
time.sleep(1)
@ -408,7 +408,7 @@ def run_command(zkhandler, logger, this_node, data, d_osd):
# Verify osd_id is in the list
if d_osd[osd_id] and d_osd[osd_id].node == this_node.name:
# Lock the command queue
zk_lock = zkhandler.writelock('/cmd/ceph')
zk_lock = zkhandler.writelock('base.cmd.ceph')
with zk_lock:
# Remove the OSD
result = remove_osd(zkhandler, logger, osd_id, d_osd[osd_id])
@ -416,13 +416,13 @@ def run_command(zkhandler, logger, this_node, data, d_osd):
if result:
# Update the command queue
zkhandler.write([
('/cmd/ceph', 'success-{}'.format(data))
('base.cmd.ceph', 'success-{}'.format(data))
])
# Command failed
else:
# Update the command queue
zkhandler.write([
('/cmd/ceph', 'failure-{}'.format(data))
('base.cmd.ceph', 'failure-{}'.format(data))
])
# Wait 1 second before we free the lock, to ensure the client hits the lock
time.sleep(1)
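The recurring change in this file (and throughout the diff) is the move from hard-coded Zookeeper paths such as '/ceph/osds/{}/node' to named schema keys resolved through zkhandler.schema.path() or passed as (key, argument) tuples to zkhandler.write(). A minimal sketch of how such a lookup could work is below; the _SCHEMA table and path() signature are illustrative assumptions, not the actual PVC schema implementation.

# Hypothetical sketch of a schema-key-to-path resolver (not the real PVC code)
_SCHEMA = {
    'base.cmd.ceph': '/cmd/ceph',
    'osd': '/ceph/osds/{}',
    'osd.node': '/ceph/osds/{}/node',
    'osd.device': '/ceph/osds/{}/device',
    'osd.stats': '/ceph/osds/{}/stats',
}

def path(key, argument=None):
    # Resolve a named key, substituting the item identifier if the template needs one
    template = _SCHEMA[key]
    return template.format(argument) if argument is not None else template

# A write list of (key, argument) tuples can then be expanded before hitting Zookeeper
print(path('osd.node', '3'))  # /ceph/osds/3/node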


@ -480,6 +480,9 @@ class AXFRDaemonInstance(object):
except psycopg2.IntegrityError as e:
if self.config['debug']:
self.logger.out('Failed to add record due to {}: {}'.format(e, name), state='d', prefix='dns-aggregator')
except psycopg2.errors.InFailedSqlTransaction as e:
if self.config['debug']:
self.logger.out('Failed to add record due to {}: {}'.format(e, name), state='d', prefix='dns-aggregator')
if changed:
# Increase SOA serial
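For context on the new exception handler: once a statement fails inside a psycopg2 transaction, every later statement on that connection raises InFailedSqlTransaction until the transaction is rolled back, which is why the aggregator now catches it alongside IntegrityError. A generic, illustrative pattern (connection string and table name are placeholders, not the PVC schema):

import psycopg2

conn = psycopg2.connect("dbname=powerdns")  # placeholder connection details
cur = conn.cursor()
try:
    cur.execute("INSERT INTO records (name) VALUES (%s)", ("example.org",))
except psycopg2.IntegrityError:
    # The transaction is now aborted; roll back so later statements can run
    conn.rollback()
except psycopg2.errors.InFailedSqlTransaction:
    # A previous statement already aborted the transaction and it was never rolled back
    conn.rollback()
else:
    conn.commit()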


@ -32,6 +32,7 @@ import yaml
import json
from socket import gethostname
from datetime import datetime
from threading import Thread
from ipaddress import ip_address, ip_network
from apscheduler.schedulers.background import BackgroundScheduler
@ -49,12 +50,13 @@ import daemon_lib.common as common
import pvcnoded.VMInstance as VMInstance
import pvcnoded.NodeInstance as NodeInstance
import pvcnoded.VXNetworkInstance as VXNetworkInstance
import pvcnoded.SRIOVVFInstance as SRIOVVFInstance
import pvcnoded.DNSAggregatorInstance as DNSAggregatorInstance
import pvcnoded.CephInstance as CephInstance
import pvcnoded.MetadataAPIInstance as MetadataAPIInstance
# Version string for startup output
version = '0.9.19'
version = '0.9.28'
###############################################################################
# PVCD - node daemon startup program
@ -74,6 +76,9 @@ version = '0.9.19'
# Daemon functions
###############################################################################
# Ensure the update_timer is None until it's set for real
update_timer = None
# Create timer to update this node in Zookeeper
def startKeepaliveTimer():
@ -142,6 +147,7 @@ def readConfig(pvcnoded_config_file, myhostname):
# Handle the basic config (hypervisor-only)
try:
config_general = {
'node': o_config['pvc']['node'],
'coordinators': o_config['pvc']['cluster']['coordinators'],
'enable_hypervisor': o_config['pvc']['functions']['enable_hypervisor'],
'enable_networking': o_config['pvc']['functions']['enable_networking'],
@ -152,12 +158,14 @@ def readConfig(pvcnoded_config_file, myhostname):
'console_log_directory': o_config['pvc']['system']['configuration']['directories']['console_log_directory'],
'file_logging': o_config['pvc']['system']['configuration']['logging']['file_logging'],
'stdout_logging': o_config['pvc']['system']['configuration']['logging']['stdout_logging'],
'zookeeper_logging': o_config['pvc']['system']['configuration']['logging'].get('zookeeper_logging', False),
'log_colours': o_config['pvc']['system']['configuration']['logging']['log_colours'],
'log_dates': o_config['pvc']['system']['configuration']['logging']['log_dates'],
'log_keepalives': o_config['pvc']['system']['configuration']['logging']['log_keepalives'],
'log_keepalive_cluster_details': o_config['pvc']['system']['configuration']['logging']['log_keepalive_cluster_details'],
'log_keepalive_storage_details': o_config['pvc']['system']['configuration']['logging']['log_keepalive_storage_details'],
'console_log_lines': o_config['pvc']['system']['configuration']['logging']['console_log_lines'],
'node_log_lines': o_config['pvc']['system']['configuration']['logging'].get('node_log_lines', 0),
'vm_shutdown_timeout': int(o_config['pvc']['system']['intervals']['vm_shutdown_timeout']),
'keepalive_interval': int(o_config['pvc']['system']['intervals']['keepalive_interval']),
'fence_intervals': int(o_config['pvc']['system']['intervals']['fence_intervals']),
@ -220,6 +228,12 @@ def readConfig(pvcnoded_config_file, myhostname):
'upstream_mtu': o_config['pvc']['system']['configuration']['networking']['upstream']['mtu'],
'upstream_dev_ip': o_config['pvc']['system']['configuration']['networking']['upstream']['address'],
}
# Check if SR-IOV is enabled and activate
config_networking['enable_sriov'] = o_config['pvc']['system']['configuration']['networking'].get('sriov_enable', False)
if config_networking['enable_sriov']:
config_networking['sriov_device'] = list(o_config['pvc']['system']['configuration']['networking']['sriov_device'])
except Exception as e:
print('ERROR: Failed to load configuration: {}'.format(e))
exit(1)
@ -286,6 +300,7 @@ if debug:
# Handle the enable values
enable_hypervisor = config['enable_hypervisor']
enable_networking = config['enable_networking']
enable_sriov = config['enable_sriov']
enable_storage = config['enable_storage']
###############################################################################
@ -332,26 +347,26 @@ logger = log.Logger(config)
# Print our startup messages
logger.out('')
logger.out('|--------------------------------------------------|')
logger.out('| ######## ## ## ###### |')
logger.out('| ## ## ## ## ## ## |')
logger.out('| ## ## ## ## ## |')
logger.out('| ######## ## ## ## |')
logger.out('| ## ## ## ## |')
logger.out('| ## ## ## ## ## |')
logger.out('| ## ### ###### |')
logger.out('|--------------------------------------------------|')
logger.out('| Parallel Virtual Cluster node daemon v{0: <10} |'.format(version))
logger.out('| FQDN: {0: <42} |'.format(myfqdn))
logger.out('| Host: {0: <42} |'.format(myhostname))
logger.out('| ID: {0: <44} |'.format(mynodeid))
logger.out('| IPMI hostname: {0: <33} |'.format(config['ipmi_hostname']))
logger.out('| Machine details: |')
logger.out('| CPUs: {0: <40} |'.format(staticdata[0]))
logger.out('| Arch: {0: <40} |'.format(staticdata[3]))
logger.out('| OS: {0: <42} |'.format(staticdata[2]))
logger.out('| Kernel: {0: <38} |'.format(staticdata[1]))
logger.out('|--------------------------------------------------|')
logger.out('|----------------------------------------------------------|')
logger.out('| |')
logger.out('| ███████████ ▜█▙ ▟█▛ █████ █ █ █ |')
logger.out('| ██ ▜█▙ ▟█▛ ██ |')
logger.out('| ███████████ ▜█▙ ▟█▛ ██ |')
logger.out('| ██ ▜█▙▟█▛ ███████████ |')
logger.out('| |')
logger.out('|----------------------------------------------------------|')
logger.out('| Parallel Virtual Cluster node daemon v{0: <18} |'.format(version))
logger.out('| Debug: {0: <49} |'.format(str(config['debug'])))
logger.out('| FQDN: {0: <50} |'.format(myfqdn))
logger.out('| Host: {0: <50} |'.format(myhostname))
logger.out('| ID: {0: <52} |'.format(mynodeid))
logger.out('| IPMI hostname: {0: <41} |'.format(config['ipmi_hostname']))
logger.out('| Machine details: |')
logger.out('| CPUs: {0: <48} |'.format(staticdata[0]))
logger.out('| Arch: {0: <48} |'.format(staticdata[3]))
logger.out('| OS: {0: <50} |'.format(staticdata[2]))
logger.out('| Kernel: {0: <46} |'.format(staticdata[1]))
logger.out('|----------------------------------------------------------|')
logger.out('')
logger.out('Starting pvcnoded on host {}'.format(myfqdn), state='s')
@ -377,7 +392,40 @@ else:
fmt_purple = ''
###############################################################################
# PHASE 2a - Create local IP addresses for static networks
# PHASE 2a - Activate SR-IOV support
###############################################################################
# This happens before other networking steps to enable using VFs for cluster functions.
if enable_networking and enable_sriov:
logger.out('Setting up SR-IOV device support', state='i')
# Enable unsafe interrupts for the vfio_iommu_type1 kernel module
try:
common.run_os_command('modprobe vfio_iommu_type1 allow_unsafe_interrupts=1')
with open('/sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts', 'w') as mfh:
mfh.write('Y')
except Exception:
logger.out('Failed to enable kernel modules; SR-IOV may fail.', state='w')
# Loop through our SR-IOV NICs and enable the numvfs for each
for device in config['sriov_device']:
logger.out('Preparing SR-IOV PF {} with {} VFs'.format(device['phy'], device['vfcount']), state='i')
try:
with open('/sys/class/net/{}/device/sriov_numvfs'.format(device['phy']), 'r') as vfh:
current_sriov_count = vfh.read().strip()
with open('/sys/class/net/{}/device/sriov_numvfs'.format(device['phy']), 'w') as vfh:
vfh.write(str(device['vfcount']))
except FileNotFoundError:
logger.out('Failed to open SR-IOV configuration for PF {}; device may not support SR-IOV.'.format(device['phy']), state='w')
except OSError:
logger.out('Failed to set SR-IOV VF count for PF {} to {}; already set to {}.'.format(device['phy'], device['vfcount'], current_sriov_count), state='w')
if device.get('mtu', None) is not None:
logger.out('Setting SR-IOV PF {} to MTU {}'.format(device['phy'], device['mtu']), state='i')
common.run_os_command('ip link set {} mtu {} up'.format(device['phy'], device['mtu']))
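A note on the OSError branch above: most drivers reject writing a new non-zero value to sriov_numvfs while VFs are already allocated, so changing the count generally means resetting it to 0 first (or, on some Intel NICs, restarting the node, as the sample configuration warns). A minimal, illustrative reset-then-set sequence outside the daemon (interface name is an example; requires root):

pf = 'ens1f1'  # example PF name
numvfs_path = f'/sys/class/net/{pf}/device/sriov_numvfs'

with open(numvfs_path, 'w') as fh:
    fh.write('0')   # release any existing VFs first
with open(numvfs_path, 'w') as fh:
    fh.write('8')   # then allocate the desired VF count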
###############################################################################
# PHASE 2b - Create local IP addresses for static networks
###############################################################################
if enable_networking:
@ -441,7 +489,7 @@ if enable_networking:
common.run_os_command('ip route add default via {} dev {}'.format(upstream_gateway, 'brupstream'))
###############################################################################
# PHASE 2b - Prepare sysctl for pvcnoded
# PHASE 2c - Prepare sysctl for pvcnoded
###############################################################################
if enable_networking:
@ -529,39 +577,118 @@ except Exception as e:
logger.out('ERROR: Failed to connect to Zookeeper cluster: {}'.format(e), state='e')
exit(1)
# Create the /config key if it does not exist
logger.out('Validating Zookeeper schema', state='i')
try:
zkhandler.read('/config')
node_schema_version = int(zkhandler.read(('node.data.active_schema', myhostname)))
except Exception:
node_schema_version = int(zkhandler.read('base.schema.version'))
if node_schema_version is None:
node_schema_version = 0
zkhandler.write([
('/config', ''),
('/config/primary_node', 'none'),
('/config/upstream_ip', 'none'),
('/config/maintenance', 'False'),
(('node.data.active_schema', myhostname), node_schema_version)
])
# MIGRATION - populate the keys from their old values
try:
primary_node = zkhandler.read('/primary_node')
# Load in the current node schema version
zkhandler.schema.load(node_schema_version)
# Record the latest installed schema version
latest_schema_version = zkhandler.schema.find_latest()
logger.out('Latest installed schema is {}'.format(latest_schema_version), state='i')
zkhandler.write([
(('node.data.latest_schema', myhostname), latest_schema_version)
])
# Watch for a global schema update and fire
# This will only change by the API when triggered after seeing all nodes can update
@zkhandler.zk_conn.DataWatch(zkhandler.schema.path('base.schema.version'))
def update_schema(new_schema_version, stat, event=''):
global zkhandler, update_timer, node_schema_version
try:
new_schema_version = int(new_schema_version.decode('ascii'))
except Exception:
new_schema_version = 0
if new_schema_version == node_schema_version:
return True
logger.out('Hot update of schema version started', state='s')
logger.out('Current version: {} New version: {}'.format(node_schema_version, new_schema_version), state='s')
# Prevent any keepalive updates while this happens
if update_timer is not None:
stopKeepaliveTimer()
time.sleep(1)
# Perform the migration (primary only)
if zkhandler.read('base.config.primary_node') == myhostname:
logger.out('Primary node acquiring exclusive lock', state='s')
# Wait for things to settle
time.sleep(0.5)
# Acquire a write lock on the root key
with zkhandler.exclusivelock('base.schema.version'):
# Perform the schema migration tasks
logger.out('Performing schema update', state='s')
if new_schema_version > node_schema_version:
zkhandler.schema.migrate(zkhandler, new_schema_version)
if new_schema_version < node_schema_version:
zkhandler.schema.rollback(zkhandler, new_schema_version)
# Wait for the exclusive lock to be lifted
else:
logger.out('Non-primary node acquiring read lock', state='s')
# Wait for things to settle
time.sleep(1)
# Wait for a read lock
lock = zkhandler.readlock('base.schema.version')
lock.acquire()
# Wait a bit more for the primary to return to normal
time.sleep(1)
# Update the local schema version
logger.out('Updating node target schema version', state='s')
zkhandler.write([
('/config/primary_node', primary_node)
(('node.data.active_schema', myhostname), new_schema_version)
])
except Exception:
pass
try:
upstream_ip = zkhandler.read('/upstream_ip')
zkhandler.write([
('/config/upstream_ip', upstream_ip)
])
except Exception:
pass
try:
maintenance = zkhandler.read('/maintenance')
zkhandler.write([
('/config/maintenance', maintenance)
])
except Exception:
pass
node_schema_version = new_schema_version
# Restart the API daemons if applicable
logger.out('Restarting services', state='s')
common.run_os_command('systemctl restart pvcapid-worker.service')
if zkhandler.read('base.config.primary_node') == myhostname:
common.run_os_command('systemctl restart pvcapid.service')
# Restart ourselves with the new schema
logger.out('Reloading node daemon', state='s')
try:
zkhandler.disconnect(persistent=True)
del zkhandler
except Exception:
pass
os.execv(sys.argv[0], sys.argv)
# If we are the last node to get a schema update, fire the master update
if latest_schema_version > node_schema_version:
node_latest_schema_version = list()
for node in zkhandler.children('base.node'):
node_latest_schema_version.append(int(zkhandler.read(('node.data.latest_schema', node))))
# This is true if every node's reported latest schema version matches the latest installed version,
# i.e. all nodes have the latest schema installed and are ready to load it.
if node_latest_schema_version.count(latest_schema_version) == len(node_latest_schema_version):
zkhandler.write([
('base.schema.version', latest_schema_version)
])
# Validate our schema against the active version
if not zkhandler.schema.validate(zkhandler, logger):
logger.out('Found schema violations, applying', state='i')
zkhandler.schema.apply(zkhandler)
else:
logger.out('Schema successfully validated', state='o')
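The hot-update handler above relies on Zookeeper reader/writer lock semantics: the primary migrates the schema while holding a write lock on the version key, and every other node blocks on a read lock until the migration finishes. The zkhandler.readlock()/writelock() calls wrap kazoo's lock recipes; the snippet below is a stripped-down sketch of that coordination using kazoo directly (hosts and key path are placeholders, and the migration body is stubbed out).

from kazoo.client import KazooClient

zk = KazooClient(hosts='127.0.0.1:2181')  # placeholder ensemble
zk.start()

VERSION_KEY = '/schema/version'  # placeholder key path

def migrate_as_primary():
    # Writer: the primary holds an exclusive lock while migrating the schema
    with zk.WriteLock(VERSION_KEY):
        pass  # perform the schema migration here

def wait_as_secondary():
    # Reader: every other node blocks here until the writer releases its lock
    lock = zk.ReadLock(VERSION_KEY)
    lock.acquire()
    lock.release()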
###############################################################################
# PHASE 5 - Gracefully handle termination
@ -570,13 +697,13 @@ except Exception:
# Cleanup function
def cleanup():
global zkhandler, update_timer, d_domain
global logger, zkhandler, update_timer, d_domain
logger.out('Terminating pvcnoded and cleaning up', state='s')
# Set shutdown state in Zookeeper
zkhandler.write([
('/nodes/{}/daemonstate'.format(myhostname), 'shutdown')
(('node.state.daemon', myhostname), 'shutdown')
])
# Waiting for any flushes to complete
@ -599,7 +726,7 @@ def cleanup():
try:
if this_node.router_state == 'primary':
zkhandler.write([
('/config/primary_node', 'none')
('base.config.primary_node', 'none')
])
logger.out('Waiting for primary migration', state='s')
while this_node.router_state != 'secondary':
@ -620,7 +747,7 @@ def cleanup():
# Set stop state in Zookeeper
zkhandler.write([
('/nodes/{}/daemonstate'.format(myhostname), 'stop')
(('node.state.daemon', myhostname), 'stop')
])
# Forcibly terminate dnsmasq because it gets stuck sometimes
@ -628,13 +755,15 @@ def cleanup():
# Close the Zookeeper connection
try:
zkhandler.disconnect()
zkhandler.disconnect(persistent=True)
del zkhandler
except Exception:
pass
logger.out('Terminated pvc daemon', state='s')
sys.exit(0)
logger.terminate()
os._exit(0)
# Termination function
@ -663,50 +792,51 @@ if config['daemon_mode'] == 'coordinator':
init_routerstate = 'secondary'
else:
init_routerstate = 'client'
if zkhandler.exists('/nodes/{}'.format(myhostname)):
if zkhandler.exists(('node', myhostname)):
logger.out("Node is " + fmt_green + "present" + fmt_end + " in Zookeeper", state='i')
# Update static data just in case it's changed
zkhandler.write([
('/nodes/{}/daemonmode'.format(myhostname), config['daemon_mode']),
('/nodes/{}/daemonstate'.format(myhostname), 'init'),
('/nodes/{}/routerstate'.format(myhostname), init_routerstate),
('/nodes/{}/staticdata'.format(myhostname), ' '.join(staticdata)),
# Keepalives and fencing information (always load and set from config on boot)
('/nodes/{}/ipmihostname'.format(myhostname), config['ipmi_hostname']),
('/nodes/{}/ipmiusername'.format(myhostname), config['ipmi_username']),
('/nodes/{}/ipmipassword'.format(myhostname), config['ipmi_password'])
(('node', myhostname), config['daemon_mode']),
(('node.mode', myhostname), config['daemon_mode']),
(('node.state.daemon', myhostname), 'init'),
(('node.state.router', myhostname), init_routerstate),
(('node.data.static', myhostname), ' '.join(staticdata)),
(('node.data.pvc_version', myhostname), version),
(('node.ipmi.hostname', myhostname), config['ipmi_hostname']),
(('node.ipmi.username', myhostname), config['ipmi_username']),
(('node.ipmi.password', myhostname), config['ipmi_password']),
])
else:
logger.out("Node is " + fmt_red + "absent" + fmt_end + " in Zookeeper; adding new node", state='i')
keepalive_time = int(time.time())
zkhandler.write([
('/nodes/{}'.format(myhostname), config['daemon_mode']),
# Basic state information
('/nodes/{}/daemonmode'.format(myhostname), config['daemon_mode']),
('/nodes/{}/daemonstate'.format(myhostname), 'init'),
('/nodes/{}/routerstate'.format(myhostname), init_routerstate),
('/nodes/{}/domainstate'.format(myhostname), 'flushed'),
('/nodes/{}/staticdata'.format(myhostname), ' '.join(staticdata)),
('/nodes/{}/memtotal'.format(myhostname), '0'),
('/nodes/{}/memfree'.format(myhostname), '0'),
('/nodes/{}/memused'.format(myhostname), '0'),
('/nodes/{}/memalloc'.format(myhostname), '0'),
('/nodes/{}/memprov'.format(myhostname), '0'),
('/nodes/{}/vcpualloc'.format(myhostname), '0'),
('/nodes/{}/cpuload'.format(myhostname), '0.0'),
('/nodes/{}/networkscount'.format(myhostname), '0'),
('/nodes/{}/domainscount'.format(myhostname), '0'),
('/nodes/{}/runningdomains'.format(myhostname), ''),
# Keepalives and fencing information
('/nodes/{}/keepalive'.format(myhostname), str(keepalive_time)),
('/nodes/{}/ipmihostname'.format(myhostname), config['ipmi_hostname']),
('/nodes/{}/ipmiusername'.format(myhostname), config['ipmi_username']),
('/nodes/{}/ipmipassword'.format(myhostname), config['ipmi_password'])
(('node', myhostname), config['daemon_mode']),
(('node.keepalive', myhostname), str(keepalive_time)),
(('node.mode', myhostname), config['daemon_mode']),
(('node.state.daemon', myhostname), 'init'),
(('node.state.domain', myhostname), 'flushed'),
(('node.state.router', myhostname), init_routerstate),
(('node.data.static', myhostname), ' '.join(staticdata)),
(('node.data.pvc_version', myhostname), version),
(('node.ipmi.hostname', myhostname), config['ipmi_hostname']),
(('node.ipmi.username', myhostname), config['ipmi_username']),
(('node.ipmi.password', myhostname), config['ipmi_password']),
(('node.memory.total', myhostname), '0'),
(('node.memory.used', myhostname), '0'),
(('node.memory.free', myhostname), '0'),
(('node.memory.allocated', myhostname), '0'),
(('node.memory.provisioned', myhostname), '0'),
(('node.vcpu.allocated', myhostname), '0'),
(('node.cpu.load', myhostname), '0.0'),
(('node.running_domains', myhostname), '0'),
(('node.count.provisioned_domains', myhostname), '0'),
(('node.count.networks', myhostname), '0'),
])
# Check that the primary key exists, and create it with us as master if not
try:
current_primary = zkhandler.read('/config/primary_node')
current_primary = zkhandler.read('base.config.primary_node')
except kazoo.exceptions.NoNodeError:
current_primary = 'none'
@ -714,9 +844,9 @@ if current_primary and current_primary != 'none':
logger.out('Current primary node is {}{}{}.'.format(fmt_blue, current_primary, fmt_end), state='i')
else:
if config['daemon_mode'] == 'coordinator':
logger.out('No primary node found; creating with us as primary.', state='i')
logger.out('No primary node found; setting us as primary.', state='i')
zkhandler.write([
('/config/primary_node', myhostname)
('base.config.primary_node', myhostname)
])
###############################################################################
@ -798,12 +928,15 @@ logger.out('Setting up objects', state='i')
d_node = dict()
d_network = dict()
d_sriov_vf = dict()
d_domain = dict()
d_osd = dict()
d_pool = dict()
d_volume = dict() # Dict of Dicts
node_list = []
network_list = []
sriov_pf_list = []
sriov_vf_list = []
domain_list = []
osd_list = []
pool_list = []
@ -823,7 +956,7 @@ else:
# Node objects
@zkhandler.zk_conn.ChildrenWatch('/nodes')
@zkhandler.zk_conn.ChildrenWatch(zkhandler.schema.path('base.node'))
def update_nodes(new_node_list):
global node_list, d_node
@ -852,7 +985,7 @@ this_node = d_node[myhostname]
# Maintenance mode
@zkhandler.zk_conn.DataWatch('/config/maintenance')
@zkhandler.zk_conn.DataWatch(zkhandler.schema.path('base.config.maintenance'))
def set_maintenance(_maintenance, stat, event=''):
global maintenance
try:
@ -862,7 +995,7 @@ def set_maintenance(_maintenance, stat, event=''):
# Primary node
@zkhandler.zk_conn.DataWatch('/config/primary_node')
@zkhandler.zk_conn.DataWatch(zkhandler.schema.path('base.config.primary_node'))
def update_primary(new_primary, stat, event=''):
try:
new_primary = new_primary.decode('ascii')
@ -877,7 +1010,7 @@ def update_primary(new_primary, stat, event=''):
if this_node.daemon_state == 'run' and this_node.router_state not in ['primary', 'takeover', 'relinquish']:
logger.out('Contending for primary coordinator state', state='i')
# Acquire an exclusive lock on the primary_node key
primary_lock = zkhandler.exclusivelock('/config/primary_node')
primary_lock = zkhandler.exclusivelock('base.config.primary_node')
try:
# This lock times out after 0.4s, which is 0.1s less than the pre-takeover
# timeout below, thus ensuring that a primary takeover will not deadlock
@ -885,9 +1018,9 @@ def update_primary(new_primary, stat, event=''):
primary_lock.acquire(timeout=0.4)
# Ensure when we get the lock that the versions are still consistent and that
# another node hasn't already acquired primary state
if key_version == zkhandler.zk_conn.get('/config/primary_node')[1].version:
if key_version == zkhandler.zk_conn.get(zkhandler.schema.path('base.config.primary_node'))[1].version:
zkhandler.write([
('/config/primary_node', myhostname)
('base.config.primary_node', myhostname)
])
# Cleanly release the lock
primary_lock.release()
@ -898,17 +1031,17 @@ def update_primary(new_primary, stat, event=''):
if this_node.router_state == 'secondary':
time.sleep(0.5)
zkhandler.write([
('/nodes/{}/routerstate'.format(myhostname), 'takeover')
(('node.state.router', myhostname), 'takeover')
])
else:
if this_node.router_state == 'primary':
time.sleep(0.5)
zkhandler.write([
('/nodes/{}/routerstate'.format(myhostname), 'relinquish')
(('node.state.router', myhostname), 'relinquish')
])
else:
zkhandler.write([
('/nodes/{}/routerstate'.format(myhostname), 'client')
(('node.state.router', myhostname), 'client')
])
for node in d_node:
@ -917,7 +1050,7 @@ def update_primary(new_primary, stat, event=''):
if enable_networking:
# Network objects
@zkhandler.zk_conn.ChildrenWatch('/networks')
@zkhandler.zk_conn.ChildrenWatch(zkhandler.schema.path('base.network'))
def update_networks(new_network_list):
global network_list, d_network
@ -958,15 +1091,133 @@ if enable_networking:
for node in d_node:
d_node[node].update_network_list(d_network)
# Add the SR-IOV PFs and VFs to Zookeeper
# These do not behave like the other PVC objects; they are not dynamic (the API cannot change them), and they
# exist for the lifetime of this Node instance. The objects are set here in Zookeeper on a per-node
# basis, under the Node configuration tree.
# MIGRATION: The schema.schema.get ensures that the current active Schema contains the required keys
if enable_sriov and zkhandler.schema.schema.get('sriov_pf', None) is not None:
vf_list = list()
for device in config['sriov_device']:
pf = device['phy']
vfcount = device['vfcount']
if device.get('mtu', None) is None:
mtu = 1500
else:
mtu = device['mtu']
# Create the PF device in Zookeeper
zkhandler.write([
(('node.sriov.pf', myhostname, 'sriov_pf', pf), ''),
(('node.sriov.pf', myhostname, 'sriov_pf.mtu', pf), mtu),
(('node.sriov.pf', myhostname, 'sriov_pf.vfcount', pf), vfcount),
])
# Append the device to the list of PFs
sriov_pf_list.append(pf)
# Get the list of VFs from `ip link show`
vf_list = json.loads(common.run_os_command('ip --json link show {}'.format(pf))[1])[0].get('vfinfo_list', [])
for vf in vf_list:
# {
# 'vf': 3,
# 'link_type': 'ether',
# 'address': '00:00:00:00:00:00',
# 'broadcast': 'ff:ff:ff:ff:ff:ff',
# 'vlan_list': [{'vlan': 101, 'qos': 2}],
# 'rate': {'max_tx': 0, 'min_tx': 0},
# 'spoofchk': True,
# 'link_state': 'auto',
# 'trust': False,
# 'query_rss_en': False
# }
vfphy = '{}v{}'.format(pf, vf['vf'])
# Get the PCIe bus information
dev_pcie_path = None
try:
with open('/sys/class/net/{}/device/uevent'.format(vfphy)) as vfh:
dev_uevent = vfh.readlines()
for line in dev_uevent:
if re.match(r'^PCI_SLOT_NAME=.*', line):
dev_pcie_path = line.rstrip().split('=')[-1]
except FileNotFoundError:
# Something must already be using the PCIe device
pass
# Add the VF to Zookeeper if it does not yet exist
if not zkhandler.exists(('node.sriov.vf', myhostname, 'sriov_vf', vfphy)):
if dev_pcie_path is not None:
pcie_domain, pcie_bus, pcie_slot, pcie_function = re.split(r':|\.', dev_pcie_path)
else:
# We can't add the device - for some reason we can't get any information on its PCIe bus path,
# so just ignore this one, and continue.
# This shouldn't happen under any real circumstances, unless the admin tries to attach a non-existent
# VF to a VM manually, then goes ahead and adds that VF to the system with the VM running.
continue
zkhandler.write([
(('node.sriov.vf', myhostname, 'sriov_vf', vfphy), ''),
(('node.sriov.vf', myhostname, 'sriov_vf.pf', vfphy), pf),
(('node.sriov.vf', myhostname, 'sriov_vf.mtu', vfphy), mtu),
(('node.sriov.vf', myhostname, 'sriov_vf.mac', vfphy), vf['address']),
(('node.sriov.vf', myhostname, 'sriov_vf.phy_mac', vfphy), vf['address']),
(('node.sriov.vf', myhostname, 'sriov_vf.config', vfphy), ''),
(('node.sriov.vf', myhostname, 'sriov_vf.config.vlan_id', vfphy), vf['vlan_list'][0].get('vlan', '0')),
(('node.sriov.vf', myhostname, 'sriov_vf.config.vlan_qos', vfphy), vf['vlan_list'][0].get('qos', '0')),
(('node.sriov.vf', myhostname, 'sriov_vf.config.tx_rate_min', vfphy), vf['rate']['min_tx']),
(('node.sriov.vf', myhostname, 'sriov_vf.config.tx_rate_max', vfphy), vf['rate']['max_tx']),
(('node.sriov.vf', myhostname, 'sriov_vf.config.spoof_check', vfphy), vf['spoofchk']),
(('node.sriov.vf', myhostname, 'sriov_vf.config.link_state', vfphy), vf['link_state']),
(('node.sriov.vf', myhostname, 'sriov_vf.config.trust', vfphy), vf['trust']),
(('node.sriov.vf', myhostname, 'sriov_vf.config.query_rss', vfphy), vf['query_rss_en']),
(('node.sriov.vf', myhostname, 'sriov_vf.pci', vfphy), ''),
(('node.sriov.vf', myhostname, 'sriov_vf.pci.domain', vfphy), pcie_domain),
(('node.sriov.vf', myhostname, 'sriov_vf.pci.bus', vfphy), pcie_bus),
(('node.sriov.vf', myhostname, 'sriov_vf.pci.slot', vfphy), pcie_slot),
(('node.sriov.vf', myhostname, 'sriov_vf.pci.function', vfphy), pcie_function),
(('node.sriov.vf', myhostname, 'sriov_vf.used', vfphy), False),
(('node.sriov.vf', myhostname, 'sriov_vf.used_by', vfphy), ''),
])
# Append the device to the list of VFs
sriov_vf_list.append(vfphy)
# Remove any obsolete PFs from Zookeeper if they go away
for pf in zkhandler.children(('node.sriov.pf', myhostname)):
if pf not in sriov_pf_list:
zkhandler.delete([
('node.sriov.pf', myhostname, 'sriov_pf', pf)
])
# Remove any obsolete VFs from Zookeeper if their PF goes away
for vf in zkhandler.children(('node.sriov.vf', myhostname)):
vf_pf = zkhandler.read(('node.sriov.vf', myhostname, 'sriov_vf.pf', vf))
if vf_pf not in sriov_pf_list:
zkhandler.delete([
('node.sriov.vf', myhostname, 'sriov_vf', vf)
])
# SR-IOV VF objects
# This is a ChildrenWatch just for consistency; the list never changes at runtime
@zkhandler.zk_conn.ChildrenWatch(zkhandler.schema.path('node.sriov.vf', myhostname))
def update_sriov_vfs(new_sriov_vf_list):
global sriov_vf_list, d_sriov_vf
# Add VFs to the list
for vf in common.sortInterfaceNames(new_sriov_vf_list):
d_sriov_vf[vf] = SRIOVVFInstance.SRIOVVFInstance(vf, zkhandler, config, logger, this_node)
sriov_vf_list = sorted(new_sriov_vf_list)
logger.out('{}SR-IOV VF list:{} {}'.format(fmt_blue, fmt_end, ' '.join(sriov_vf_list)), state='i')
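common.sortInterfaceNames() is presumably a natural sort, so that VF names like ens1f1v10 order after ens1f1v2 rather than lexically before it. The helper below is an illustrative stand-in, not the PVC implementation:

import re

def sort_interface_names(names):
    # Split digit runs out of each name so numeric parts compare as integers
    def natural_key(name):
        return [int(part) if part.isdigit() else part for part in re.split(r'(\d+)', name)]
    return sorted(names, key=natural_key)

print(sort_interface_names(['ens1f1v10', 'ens1f1v2', 'ens1f1v0']))
# ['ens1f1v0', 'ens1f1v2', 'ens1f1v10']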
if enable_hypervisor:
# VM command pipeline key
@zkhandler.zk_conn.DataWatch('/cmd/domains')
@zkhandler.zk_conn.DataWatch(zkhandler.schema.path('base.cmd.domain'))
def cmd_domains(data, stat, event=''):
if data:
VMInstance.run_command(zkhandler, logger, this_node, data.decode('ascii'))
# VM domain objects
@zkhandler.zk_conn.ChildrenWatch('/domains')
@zkhandler.zk_conn.ChildrenWatch(zkhandler.schema.path('base.domain'))
def update_domains(new_domain_list):
global domain_list, d_domain
@ -991,13 +1242,13 @@ if enable_hypervisor:
if enable_storage:
# Ceph command pipeline key
@zkhandler.zk_conn.DataWatch('/cmd/ceph')
@zkhandler.zk_conn.DataWatch(zkhandler.schema.path('base.cmd.ceph'))
def cmd_ceph(data, stat, event=''):
if data:
CephInstance.run_command(zkhandler, logger, this_node, data.decode('ascii'), d_osd)
# OSD objects
@zkhandler.zk_conn.ChildrenWatch('/ceph/osds')
@zkhandler.zk_conn.ChildrenWatch(zkhandler.schema.path('base.osd'))
def update_osds(new_osd_list):
global osd_list, d_osd
@ -1017,7 +1268,7 @@ if enable_storage:
logger.out('{}OSD list:{} {}'.format(fmt_blue, fmt_end, ' '.join(osd_list)), state='i')
# Pool objects
@zkhandler.zk_conn.ChildrenWatch('/ceph/pools')
@zkhandler.zk_conn.ChildrenWatch(zkhandler.schema.path('base.pool'))
def update_pools(new_pool_list):
global pool_list, d_pool
@ -1040,7 +1291,7 @@ if enable_storage:
# Volume objects in each pool
for pool in pool_list:
@zkhandler.zk_conn.ChildrenWatch('/ceph/volumes/{}'.format(pool))
@zkhandler.zk_conn.ChildrenWatch(zkhandler.schema.path('volume', pool))
def update_volumes(new_volume_list):
global volume_list, d_volume
@ -1089,11 +1340,13 @@ def collect_ceph_stats(queue):
ceph_health = health_status['status']
except Exception as e:
logger.out('Failed to obtain Ceph health data: {}'.format(e), state='e')
return
ceph_health = 'HEALTH_UNKN'
if ceph_health == 'HEALTH_OK':
if ceph_health in ['HEALTH_OK']:
ceph_health_colour = fmt_green
elif ceph_health == 'HEALTH_WARN':
elif ceph_health in ['HEALTH_UNKN']:
ceph_health_colour = fmt_cyan
elif ceph_health in ['HEALTH_WARN']:
ceph_health_colour = fmt_yellow
else:
ceph_health_colour = fmt_red
@ -1107,11 +1360,10 @@ def collect_ceph_stats(queue):
ceph_status = ceph_conn.mon_command(json.dumps(command), b'', timeout=1)[1].decode('ascii')
try:
zkhandler.write([
('/ceph', str(ceph_status))
('base.storage', str(ceph_status))
])
except Exception as e:
logger.out('Failed to set Ceph status data: {}'.format(e), state='e')
return
if debug:
logger.out("Set ceph rados df information in zookeeper (primary only)", state='d', prefix='ceph-thread')
@ -1121,19 +1373,19 @@ def collect_ceph_stats(queue):
ceph_df = ceph_conn.mon_command(json.dumps(command), b'', timeout=1)[1].decode('ascii')
try:
zkhandler.write([
('/ceph/util', str(ceph_df))
('base.storage.util', str(ceph_df))
])
except Exception as e:
logger.out('Failed to set Ceph utilization data: {}'.format(e), state='e')
return
if debug:
logger.out("Set pool information in zookeeper (primary only)", state='d', prefix='ceph-thread')
# Get pool info
retcode, stdout, stderr = common.run_os_command('ceph df --format json', timeout=1)
command = {"prefix": "df", "format": "json"}
ceph_df_output = ceph_conn.mon_command(json.dumps(command), b'', timeout=1)[1].decode('ascii')
try:
ceph_pool_df_raw = json.loads(stdout)['pools']
ceph_pool_df_raw = json.loads(ceph_df_output)['pools']
except Exception as e:
logger.out('Failed to obtain Pool data (ceph df): {}'.format(e), state='w')
ceph_pool_df_raw = []
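The `ceph df` shell-out is replaced above with a call through the existing librados connection. For reference, rados.Rados.mon_command() takes a JSON-encoded command and an input buffer and returns a (return_code, output_buffer, status_string) tuple, which is why the daemon indexes [1] and decodes it. A standalone sketch (the cluster configuration paths are illustrative, not PVC's own):

import json
import rados

# Illustrative connection settings; PVC supplies its own ceph.conf/keyring paths
ceph_conn = rados.Rados(conffile='/etc/ceph/ceph.conf', conf=dict(keyring='/etc/ceph/ceph.client.admin.keyring'))
ceph_conn.connect()

command = {"prefix": "df", "format": "json"}
retcode, outbuf, outstr = ceph_conn.mon_command(json.dumps(command), b'', timeout=1)
pools = json.loads(outbuf.decode('ascii')).get('pools', [])

ceph_conn.shutdown()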
@ -1186,7 +1438,7 @@ def collect_ceph_stats(queue):
# Write the pool data to Zookeeper
zkhandler.write([
('/ceph/pools/{}/stats'.format(pool['name']), str(json.dumps(pool_df)))
(('pool.stats', pool['name']), str(json.dumps(pool_df)))
])
except Exception as e:
# One or more of the status commands timed out, just continue
@ -1204,9 +1456,9 @@ def collect_ceph_stats(queue):
osd_dump = dict()
command = {"prefix": "osd dump", "format": "json"}
osd_dump_output = ceph_conn.mon_command(json.dumps(command), b'', timeout=1)[1].decode('ascii')
try:
retcode, stdout, stderr = common.run_os_command('ceph osd dump --format json --connect-timeout 2', timeout=2)
osd_dump_raw = json.loads(stdout)['osds']
osd_dump_raw = json.loads(osd_dump_output)['osds']
except Exception as e:
logger.out('Failed to obtain OSD data: {}'.format(e), state='w')
osd_dump_raw = []
@ -1322,7 +1574,7 @@ def collect_ceph_stats(queue):
try:
stats = json.dumps(osd_stats[osd])
zkhandler.write([
('/ceph/osds/{}/stats'.format(osd), str(stats))
(('osd.stats', osd), str(stats))
])
except KeyError as e:
# One or more of the status commands timed out, just continue
@ -1363,7 +1615,6 @@ def collect_vm_stats(queue):
lv_conn = libvirt.open(libvirt_name)
if lv_conn is None:
logger.out('Failed to open connection to "{}"'.format(libvirt_name), state='e')
return
memalloc = 0
memprov = 0
@ -1389,7 +1640,7 @@ def collect_vm_stats(queue):
# Toggle a state "change"
logger.out("Resetting state to {} for VM {}".format(instance.getstate(), instance.domname), state='i', prefix='vm-thread')
zkhandler.write([
('/domains/{}/state'.format(domain), instance.getstate())
(('domain.state', domain), instance.getstate())
])
elif instance.getnode() == this_node.name:
memprov += instance.getmemory()
@ -1447,6 +1698,9 @@ def collect_vm_stats(queue):
logger.out("Getting network statistics for VM {}".format(domain_name), state='d', prefix='vm-thread')
domain_network_stats = []
for interface in tree.findall('devices/interface'):
interface_type = interface.get('type')
if interface_type not in ['bridge']:
continue
interface_name = interface.find('target').get('dev')
interface_bridge = interface.find('source').get('bridge')
interface_stats = domain.interfaceStats(interface_name)
@ -1481,7 +1735,7 @@ def collect_vm_stats(queue):
try:
zkhandler.write([
("/domains/{}/stats".format(domain_uuid), str(json.dumps(domain_stats)))
(('domain.stats', domain_uuid), str(json.dumps(domain_stats)))
])
except Exception as e:
if debug:
@ -1500,6 +1754,7 @@ def collect_vm_stats(queue):
# Keepalive update function
@common.Profiler(config)
def node_keepalive():
if debug:
logger.out("Keepalive starting", state='d', prefix='main-thread')
@ -1508,32 +1763,33 @@ def node_keepalive():
if config['enable_hypervisor']:
if this_node.router_state == 'primary':
try:
if zkhandler.read('/config/migration_target_selector') != config['migration_target_selector']:
if zkhandler.read('base.config.migration_target_selector') != config['migration_target_selector']:
raise
except Exception:
zkhandler.write([
('/config/migration_target_selector', config['migration_target_selector'])
('base.config.migration_target_selector', config['migration_target_selector'])
])
# Set the upstream IP in Zookeeper for clients to read
if config['enable_networking']:
if this_node.router_state == 'primary':
try:
if zkhandler.read('/config/upstream_ip') != config['upstream_floating_ip']:
if zkhandler.read('base.config.upstream_ip') != config['upstream_floating_ip']:
raise
except Exception:
zkhandler.write([
('/config/upstream_ip', config['upstream_floating_ip'])
('base.config.upstream_ip', config['upstream_floating_ip'])
])
# Get past state and update if needed
if debug:
logger.out("Get past state and update if needed", state='d', prefix='main-thread')
past_state = zkhandler.read('/nodes/{}/daemonstate'.format(this_node.name))
if past_state != 'run':
past_state = zkhandler.read(('node.state.daemon', this_node.name))
if past_state != 'run' and past_state != 'shutdown':
this_node.daemon_state = 'run'
zkhandler.write([
('/nodes/{}/daemonstate'.format(this_node.name), 'run')
(('node.state.daemon', this_node.name), 'run')
])
else:
this_node.daemon_state = 'run'
@ -1542,9 +1798,9 @@ def node_keepalive():
if debug:
logger.out("Ensure the primary key is properly set", state='d', prefix='main-thread')
if this_node.router_state == 'primary':
if zkhandler.read('/config/primary_node') != this_node.name:
if zkhandler.read('base.config.primary_node') != this_node.name:
zkhandler.write([
('/config/primary_node', this_node.name)
('base.config.primary_node', this_node.name)
])
# Run VM statistics collection in separate thread for parallelization
@ -1606,20 +1862,19 @@ def node_keepalive():
logger.out("Set our information in zookeeper", state='d', prefix='main-thread')
try:
zkhandler.write([
('/nodes/{}/memtotal'.format(this_node.name), str(this_node.memtotal)),
('/nodes/{}/memused'.format(this_node.name), str(this_node.memused)),
('/nodes/{}/memfree'.format(this_node.name), str(this_node.memfree)),
('/nodes/{}/memalloc'.format(this_node.name), str(this_node.memalloc)),
('/nodes/{}/memprov'.format(this_node.name), str(this_node.memprov)),
('/nodes/{}/vcpualloc'.format(this_node.name), str(this_node.vcpualloc)),
('/nodes/{}/cpuload'.format(this_node.name), str(this_node.cpuload)),
('/nodes/{}/domainscount'.format(this_node.name), str(this_node.domains_count)),
('/nodes/{}/runningdomains'.format(this_node.name), ' '.join(this_node.domain_list)),
('/nodes/{}/keepalive'.format(this_node.name), str(keepalive_time))
(('node.memory.total', this_node.name), str(this_node.memtotal)),
(('node.memory.used', this_node.name), str(this_node.memused)),
(('node.memory.free', this_node.name), str(this_node.memfree)),
(('node.memory.allocated', this_node.name), str(this_node.memalloc)),
(('node.memory.provisioned', this_node.name), str(this_node.memprov)),
(('node.vcpu.allocated', this_node.name), str(this_node.vcpualloc)),
(('node.cpu.load', this_node.name), str(this_node.cpuload)),
(('node.count.provisioned_domains', this_node.name), str(this_node.domains_count)),
(('node.running_domains', this_node.name), ' '.join(this_node.domain_list)),
(('node.keepalive', this_node.name), str(keepalive_time)),
])
except Exception:
logger.out('Failed to set keepalive data', state='e')
return
# Display node information to the terminal
if config['log_keepalives']:
@ -1630,9 +1885,10 @@ def node_keepalive():
else:
cst_colour = fmt_cyan
logger.out(
'{}{} keepalive{} [{}{}{}]'.format(
'{}{} keepalive @ {}{} [{}{}{}]'.format(
fmt_purple,
myhostname,
datetime.now(),
fmt_end,
fmt_bold + cst_colour,
this_node.router_state,
@ -1685,8 +1941,8 @@ def node_keepalive():
if config['daemon_mode'] == 'coordinator':
for node_name in d_node:
try:
node_daemon_state = zkhandler.read('/nodes/{}/daemonstate'.format(node_name))
node_keepalive = int(zkhandler.read('/nodes/{}/keepalive'.format(node_name)))
node_daemon_state = zkhandler.read(('node.state.daemon', node_name))
node_keepalive = int(zkhandler.read(('node.keepalive', node_name)))
except Exception:
node_daemon_state = 'unknown'
node_keepalive = 0
@ -1697,16 +1953,16 @@ def node_keepalive():
node_deadtime = int(time.time()) - (int(config['keepalive_interval']) * int(config['fence_intervals']))
if node_keepalive < node_deadtime and node_daemon_state == 'run':
logger.out('Node {} seems dead - starting monitor for fencing'.format(node_name), state='w')
zk_lock = zkhandler.writelock('/nodes/{}/daemonstate'.format(node_name))
zk_lock = zkhandler.writelock(('node.state.daemon', node_name))
with zk_lock:
# Ensures that, if we lost the lock race and come out of waiting,
# we won't try to trigger our own fence thread.
if zkhandler.read('/nodes/{}/daemonstate'.format(node_name)) != 'dead':
if zkhandler.read(('node.state.daemon', node_name)) != 'dead':
fence_thread = Thread(target=fencing.fenceNode, args=(node_name, zkhandler, config, logger), kwargs={})
fence_thread.start()
# Write the updated data after we start the fence thread
zkhandler.write([
('/nodes/{}/daemonstate'.format(node_name), 'dead')
(('node.state.daemon', node_name), 'dead')
])
if debug:
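The fencing trigger above is a simple timestamp comparison: a node is considered dead once its last keepalive is older than keepalive_interval * fence_intervals seconds while its daemon state still claims 'run'. A worked example with sample configuration values (5-second keepalives, 6 intervals, both assumptions):

import time

keepalive_interval = 5   # seconds between keepalives (example value)
fence_intervals = 6      # missed intervals tolerated before fencing (example value)

node_keepalive = int(time.time()) - 45          # pretend the last keepalive was 45s ago
node_deadtime = int(time.time()) - (keepalive_interval * fence_intervals)

# 45s > 30s without a keepalive, so a fence monitor thread would be started
print(node_keepalive < node_deadtime)  # True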


@ -152,8 +152,11 @@ class MetadataAPIInstance(object):
cur.execute(query, args)
data_raw = cur.fetchone()
self.close_database(conn, cur)
data = data_raw.get('userdata', None)
return data
if data_raw is not None:
data = data_raw.get('userdata', None)
return data
else:
return None
# VM details function
def get_vm_details(self, source_address):
@ -177,7 +180,7 @@ class MetadataAPIInstance(object):
client_macaddr = host_information.get('mac_address', None)
# Find the VM with that MAC address - we can't assume that the hostname is actually right
_discard, vm_list = pvc_vm.get_list(self.zkhandler, None, None, None)
_discard, vm_list = pvc_vm.get_list(self.zkhandler, None, None, None, None)
vm_details = dict()
for vm in vm_list:
try:


@ -38,7 +38,7 @@ class NodeInstance(object):
# Which node is primary
self.primary_node = None
# States
self.daemon_mode = self.zkhandler.read('/nodes/{}/daemonmode'.format(self.name))
self.daemon_mode = self.zkhandler.read(('node.mode', self.name))
self.daemon_state = 'stop'
self.router_state = 'client'
self.domain_state = 'ready'
@ -90,7 +90,7 @@ class NodeInstance(object):
self.flush_stopper = False
# Zookeeper handlers for changed states
@self.zkhandler.zk_conn.DataWatch('/nodes/{}/daemonstate'.format(self.name))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('node.state.daemon', self.name))
def watch_node_daemonstate(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -105,7 +105,7 @@ class NodeInstance(object):
if data != self.daemon_state:
self.daemon_state = data
@self.zkhandler.zk_conn.DataWatch('/nodes/{}/routerstate'.format(self.name))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('node.state.router', self.name))
def watch_node_routerstate(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -135,10 +135,10 @@ class NodeInstance(object):
else:
# We did nothing, so just become secondary state
self.zkhandler.write([
('/nodes/{}/routerstate'.format(self.name), 'secondary')
(('node.state.router', self.name), 'secondary')
])
@self.zkhandler.zk_conn.DataWatch('/nodes/{}/domainstate'.format(self.name))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('node.state.domain', self.name))
def watch_node_domainstate(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -171,7 +171,7 @@ class NodeInstance(object):
self.flush_thread = Thread(target=self.unflush, args=(), kwargs={})
self.flush_thread.start()
@self.zkhandler.zk_conn.DataWatch('/nodes/{}/memfree'.format(self.name))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('node.memory.free', self.name))
def watch_node_memfree(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -186,7 +186,7 @@ class NodeInstance(object):
if data != self.memfree:
self.memfree = data
@self.zkhandler.zk_conn.DataWatch('/nodes/{}/memused'.format(self.name))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('node.memory.used', self.name))
def watch_node_memused(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -201,7 +201,7 @@ class NodeInstance(object):
if data != self.memused:
self.memused = data
@self.zkhandler.zk_conn.DataWatch('/nodes/{}/memalloc'.format(self.name))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('node.memory.allocated', self.name))
def watch_node_memalloc(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -216,7 +216,7 @@ class NodeInstance(object):
if data != self.memalloc:
self.memalloc = data
@self.zkhandler.zk_conn.DataWatch('/nodes/{}/vcpualloc'.format(self.name))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('node.vcpu.allocated', self.name))
def watch_node_vcpualloc(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -231,7 +231,7 @@ class NodeInstance(object):
if data != self.vcpualloc:
self.vcpualloc = data
@self.zkhandler.zk_conn.DataWatch('/nodes/{}/runningdomains'.format(self.name))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('node.running_domains', self.name))
def watch_node_runningdomains(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -246,7 +246,7 @@ class NodeInstance(object):
if data != self.domain_list:
self.domain_list = data
@self.zkhandler.zk_conn.DataWatch('/nodes/{}/domainscount'.format(self.name))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('node.count.provisioned_domains', self.name))
def watch_node_domainscount(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -324,30 +324,30 @@ class NodeInstance(object):
Acquire primary coordinator status from a peer node
"""
# Lock the primary node until transition is complete
primary_lock = self.zkhandler.exclusivelock('/config/primary_node')
primary_lock = self.zkhandler.exclusivelock('base.config.primary_node')
primary_lock.acquire()
# Ensure our lock key is populated
self.zkhandler.write([
('/locks/primary_node', '')
('base.config.primary_node.sync_lock', '')
])
# Synchronize nodes A (I am writer)
lock = self.zkhandler.writelock('/locks/primary_node')
lock = self.zkhandler.writelock('base.config.primary_node.sync_lock')
self.logger.out('Acquiring write lock for synchronization phase A', state='i')
lock.acquire()
self.logger.out('Acquired write lock for synchronization phase A', state='o')
time.sleep(1) # Time for reader to acquire the lock
self.logger.out('Releasing write lock for synchronization phase A', state='i')
self.zkhandler.write([
('/locks/primary_node', '')
('base.config.primary_node.sync_lock', '')
])
lock.release()
self.logger.out('Released write lock for synchronization phase A', state='o')
time.sleep(0.1) # Time for new writer to acquire the lock
# Synchronize nodes B (I am reader)
lock = self.zkhandler.readlock('/locks/primary_node')
lock = self.zkhandler.readlock('base.config.primary_node.sync_lock')
self.logger.out('Acquiring read lock for synchronization phase B', state='i')
lock.acquire()
self.logger.out('Acquired read lock for synchronization phase B', state='o')
@ -356,7 +356,7 @@ class NodeInstance(object):
self.logger.out('Released read lock for synchronization phase B', state='o')
# Synchronize nodes C (I am writer)
lock = self.zkhandler.writelock('/locks/primary_node')
lock = self.zkhandler.writelock('base.config.primary_node.sync_lock')
self.logger.out('Acquiring write lock for synchronization phase C', state='i')
lock.acquire()
self.logger.out('Acquired write lock for synchronization phase C', state='o')
@ -373,13 +373,13 @@ class NodeInstance(object):
common.createIPAddress(self.upstream_floatingipaddr, self.upstream_cidrnetmask, 'brupstream')
self.logger.out('Releasing write lock for synchronization phase C', state='i')
self.zkhandler.write([
('/locks/primary_node', '')
('base.config.primary_node.sync_lock', '')
])
lock.release()
self.logger.out('Released write lock for synchronization phase C', state='o')
# Synchronize nodes D (I am writer)
lock = self.zkhandler.writelock('/locks/primary_node')
lock = self.zkhandler.writelock('base.config.primary_node.sync_lock')
self.logger.out('Acquiring write lock for synchronization phase D', state='i')
lock.acquire()
self.logger.out('Acquired write lock for synchronization phase D', state='o')
@ -405,13 +405,13 @@ class NodeInstance(object):
common.createIPAddress(self.storage_floatingipaddr, self.storage_cidrnetmask, 'brstorage')
self.logger.out('Releasing write lock for synchronization phase D', state='i')
self.zkhandler.write([
('/locks/primary_node', '')
('base.config.primary_node.sync_lock', '')
])
lock.release()
self.logger.out('Released write lock for synchronization phase D', state='o')
# Synchronize nodes E (I am writer)
lock = self.zkhandler.writelock('/locks/primary_node')
lock = self.zkhandler.writelock('base.config.primary_node.sync_lock')
self.logger.out('Acquiring write lock for synchronization phase E', state='i')
lock.acquire()
self.logger.out('Acquired write lock for synchronization phase E', state='o')
@ -428,13 +428,13 @@ class NodeInstance(object):
common.createIPAddress('169.254.169.254', '32', 'lo')
self.logger.out('Releasing write lock for synchronization phase E', state='i')
self.zkhandler.write([
('/locks/primary_node', '')
('base.config.primary_node.sync_lock', '')
])
lock.release()
self.logger.out('Released write lock for synchronization phase E', state='o')
# Synchronize nodes F (I am writer)
lock = self.zkhandler.writelock('/locks/primary_node')
lock = self.zkhandler.writelock('base.config.primary_node.sync_lock')
self.logger.out('Acquiring write lock for synchronization phase F', state='i')
lock.acquire()
self.logger.out('Acquired write lock for synchronization phase F', state='o')
@ -444,13 +444,13 @@ class NodeInstance(object):
self.d_network[network].createGateways()
self.logger.out('Releasing write lock for synchronization phase F', state='i')
self.zkhandler.write([
('/locks/primary_node', '')
('base.config.primary_node.sync_lock', '')
])
lock.release()
self.logger.out('Released write lock for synchronization phase F', state='o')
# Synchronize nodes G (I am writer)
lock = self.zkhandler.writelock('/locks/primary_node')
lock = self.zkhandler.writelock('base.config.primary_node.sync_lock')
self.logger.out('Acquiring write lock for synchronization phase G', state='i')
lock.acquire()
self.logger.out('Acquired write lock for synchronization phase G', state='o')
@ -466,7 +466,6 @@ class NodeInstance(object):
"""
patronictl
-c /etc/patroni/config.yml
-d zookeeper://localhost:2181
switchover
--candidate {}
--force
@ -503,6 +502,7 @@ class NodeInstance(object):
# 6. Start client API (and provisioner worker)
if self.config['enable_api']:
self.logger.out('Starting PVC API client service', state='i')
common.run_os_command("systemctl enable pvcapid.service")
common.run_os_command("systemctl start pvcapid.service")
self.logger.out('Starting PVC Provisioner Worker service', state='i')
common.run_os_command("systemctl start pvcapid-worker.service")
@ -518,7 +518,7 @@ class NodeInstance(object):
self.logger.out('Not starting DNS aggregator due to Patroni failures', state='e')
self.logger.out('Releasing write lock for synchronization phase G', state='i')
self.zkhandler.write([
('/locks/primary_node', '')
('base.config.primary_node.sync_lock', '')
])
lock.release()
self.logger.out('Released write lock for synchronization phase G', state='o')
@ -527,7 +527,7 @@ class NodeInstance(object):
time.sleep(2)
primary_lock.release()
self.zkhandler.write([
('/nodes/{}/routerstate'.format(self.name), 'primary')
(('node.state.router', self.name), 'primary')
])
self.logger.out('Node {} transitioned to primary state'.format(self.name), state='o')
@ -538,7 +538,7 @@ class NodeInstance(object):
time.sleep(0.2) # Initial delay for the first writer to grab the lock
# Synchronize nodes A (I am reader)
lock = self.zkhandler.readlock('/locks/primary_node')
lock = self.zkhandler.readlock('base.config.primary_node.sync_lock')
self.logger.out('Acquiring read lock for synchronization phase A', state='i')
lock.acquire()
self.logger.out('Acquired read lock for synchronization phase A', state='o')
@ -547,7 +547,7 @@ class NodeInstance(object):
self.logger.out('Released read lock for synchronization phase A', state='o')
# Synchronize nodes B (I am writer)
lock = self.zkhandler.writelock('/locks/primary_node')
lock = self.zkhandler.writelock('base.config.primary_node.sync_lock')
self.logger.out('Acquiring write lock for synchronization phase B', state='i')
lock.acquire()
self.logger.out('Acquired write lock for synchronization phase B', state='o')
@ -559,7 +559,7 @@ class NodeInstance(object):
self.d_network[network].stopDHCPServer()
self.logger.out('Releasing write lock for synchronization phase B', state='i')
self.zkhandler.write([
('/locks/primary_node', '')
('base.config.primary_node.sync_lock', '')
])
lock.release()
self.logger.out('Released write lock for synchronization phase B', state='o')
@ -567,12 +567,13 @@ class NodeInstance(object):
if self.config['enable_api']:
self.logger.out('Stopping PVC API client service', state='i')
common.run_os_command("systemctl stop pvcapid.service")
common.run_os_command("systemctl disable pvcapid.service")
# 4. Stop metadata API
self.metadata_api.stop()
time.sleep(0.1) # Time for new writer to acquire the lock
# Synchronize nodes C (I am reader)
lock = self.zkhandler.readlock('/locks/primary_node')
lock = self.zkhandler.readlock('base.config.primary_node.sync_lock')
self.logger.out('Acquiring read lock for synchronization phase C', state='i')
lock.acquire()
self.logger.out('Acquired read lock for synchronization phase C', state='o')
@ -591,7 +592,7 @@ class NodeInstance(object):
self.logger.out('Released read lock for synchronization phase C', state='o')
# Synchronize nodes D (I am reader)
lock = self.zkhandler.readlock('/locks/primary_node')
lock = self.zkhandler.readlock('base.config.primary_node.sync_lock')
self.logger.out('Acquiring read lock for synchronization phase D', state='i')
lock.acquire()
self.logger.out('Acquired read lock for synchronization phase D', state='o')
@ -619,7 +620,7 @@ class NodeInstance(object):
self.logger.out('Released read lock for synchronization phase D', state='o')
# Synchronize nodes E (I am reader)
lock = self.zkhandler.readlock('/locks/primary_node')
lock = self.zkhandler.readlock('base.config.primary_node.sync_lock')
self.logger.out('Acquiring read lock for synchronization phase E', state='i')
lock.acquire()
self.logger.out('Acquired read lock for synchronization phase E', state='o')
@ -638,7 +639,7 @@ class NodeInstance(object):
self.logger.out('Released read lock for synchronization phase E', state='o')
# Synchronize nodes F (I am reader)
lock = self.zkhandler.readlock('/locks/primary_node')
lock = self.zkhandler.readlock('base.config.primary_node.sync_lock')
self.logger.out('Acquiring read lock for synchronization phase F', state='i')
lock.acquire()
self.logger.out('Acquired read lock for synchronization phase F', state='o')
@ -650,7 +651,7 @@ class NodeInstance(object):
self.logger.out('Released read lock for synchronization phase F', state='o')
# Synchronize nodes G (I am reader)
lock = self.zkhandler.readlock('/locks/primary_node')
lock = self.zkhandler.readlock('base.config.primary_node.sync_lock')
self.logger.out('Acquiring read lock for synchronization phase G', state='i')
try:
lock.acquire(timeout=60) # Don't wait forever and completely block us
@ -664,7 +665,7 @@ class NodeInstance(object):
# Wait 2 seconds for everything to stabilize before we declare all-done
time.sleep(2)
self.zkhandler.write([
('/nodes/{}/routerstate'.format(self.name), 'secondary')
(('node.state.router', self.name), 'secondary')
])
self.logger.out('Node {} transitioned to secondary state'.format(self.name), state='o')
@ -685,10 +686,10 @@ class NodeInstance(object):
self.logger.out('Selecting target to migrate VM "{}"'.format(dom_uuid), state='i')
# Don't replace the previous node if the VM is already migrated
if self.zkhandler.read('/domains/{}/lastnode'.format(dom_uuid)):
current_node = self.zkhandler.read('/domains/{}/lastnode'.format(dom_uuid))
if self.zkhandler.read(('domain.last_node', dom_uuid)):
current_node = self.zkhandler.read(('domain.last_node', dom_uuid))
else:
current_node = self.zkhandler.read('/domains/{}/node'.format(dom_uuid))
current_node = self.zkhandler.read(('domain.node', dom_uuid))
target_node = common.findTargetNode(self.zkhandler, dom_uuid)
if target_node == current_node:
@ -697,20 +698,20 @@ class NodeInstance(object):
if target_node is None:
self.logger.out('Failed to find migration target for VM "{}"; shutting down and setting autostart flag'.format(dom_uuid), state='e')
self.zkhandler.write([
('/domains/{}/state'.format(dom_uuid), 'shutdown'),
('/domains/{}/node_autostart'.format(dom_uuid), 'True')
(('domain.state', dom_uuid), 'shutdown'),
(('domain.meta.autostart', dom_uuid), 'True'),
])
else:
self.logger.out('Migrating VM "{}" to node "{}"'.format(dom_uuid, target_node), state='i')
self.zkhandler.write([
('/domains/{}/state'.format(dom_uuid), 'migrate'),
('/domains/{}/node'.format(dom_uuid), target_node),
('/domains/{}/lastnode'.format(dom_uuid), current_node)
(('domain.state', dom_uuid), 'migrate'),
(('domain.node', dom_uuid), target_node),
(('domain.last_node', dom_uuid), current_node),
])
# Wait for the VM to migrate so the next VM's free RAM count is accurate (they migrate serially anyway)
ticks = 0
while self.zkhandler.read('/domains/{}/state'.format(dom_uuid)) in ['migrate', 'unmigrate', 'shutdown']:
while self.zkhandler.read(('domain.state', dom_uuid)) in ['migrate', 'unmigrate', 'shutdown']:
ticks += 1
if ticks > 600:
# Abort if we've waited 120 seconds; the VM is in a bad state, so just continue
@ -718,8 +719,8 @@ class NodeInstance(object):
time.sleep(0.2)
self.zkhandler.write([
('/nodes/{}/runningdomains'.format(self.name), ''),
('/nodes/{}/domainstate'.format(self.name), 'flushed')
(('node.running_domains', self.name), ''),
(('node.state.domain', self.name), 'flushed'),
])
self.flush_thread = None
self.flush_stopper = False
@ -737,20 +738,20 @@ class NodeInstance(object):
return
# Handle autostarts
autostart = self.zkhandler.read('/domains/{}/node_autostart'.format(dom_uuid))
node = self.zkhandler.read('/domains/{}/node'.format(dom_uuid))
autostart = self.zkhandler.read(('domain.meta.autostart', dom_uuid))
node = self.zkhandler.read(('domain.node', dom_uuid))
if autostart == 'True' and node == self.name:
self.logger.out('Starting autostart VM "{}"'.format(dom_uuid), state='i')
self.zkhandler.write([
('/domains/{}/state'.format(dom_uuid), 'start'),
('/domains/{}/node'.format(dom_uuid), self.name),
('/domains/{}/lastnode'.format(dom_uuid), ''),
('/domains/{}/node_autostart'.format(dom_uuid), 'False')
(('domain.state', dom_uuid), 'start'),
(('domain.node', dom_uuid), self.name),
(('domain.last_node', dom_uuid), ''),
(('domain.meta.autostart', dom_uuid), 'False'),
])
continue
try:
last_node = self.zkhandler.read('/domains/{}/lastnode'.format(dom_uuid))
last_node = self.zkhandler.read(('domain.last_node', dom_uuid))
except Exception:
continue
@ -759,17 +760,17 @@ class NodeInstance(object):
self.logger.out('Setting unmigration for VM "{}"'.format(dom_uuid), state='i')
self.zkhandler.write([
('/domains/{}/state'.format(dom_uuid), 'migrate'),
('/domains/{}/node'.format(dom_uuid), self.name),
('/domains/{}/lastnode'.format(dom_uuid), '')
(('domain.state', dom_uuid), 'migrate'),
(('domain.node', dom_uuid), self.name),
(('domain.last_node', dom_uuid), ''),
])
# Wait for the VM to migrate back
while self.zkhandler.read('/domains/{}/state'.format(dom_uuid)) in ['migrate', 'unmigrate', 'shutdown']:
while self.zkhandler.read(('domain.state', dom_uuid)) in ['migrate', 'unmigrate', 'shutdown']:
time.sleep(0.1)
self.zkhandler.write([
('/nodes/{}/domainstate'.format(self.name), 'ready')
(('node.state.domain', self.name), 'ready')
])
self.flush_thread = None
self.flush_stopper = False
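The primary transition above is structured as a series of lock-gated phases: for each phase the writer takes a write lock on the shared base.config.primary_node.sync_lock key, does its work, clears the key to signal completion, and releases, while the other node gates each of its phases behind a read lock on the same key. A minimal sketch of the writer side of one phase, assuming the zkhandler/logger interface used in this diff:

# Minimal sketch of one writer-side synchronization phase, assuming the
# zkhandler writelock()/write() and logger.out() interfaces shown above.
def run_phase_as_writer(zkhandler, logger, phase, work):
    lock = zkhandler.writelock('base.config.primary_node.sync_lock')
    logger.out('Acquiring write lock for synchronization phase {}'.format(phase), state='i')
    lock.acquire()
    logger.out('Acquired write lock for synchronization phase {}'.format(phase), state='o')
    work()  # phase-specific actions, e.g. creating gateways or starting services
    logger.out('Releasing write lock for synchronization phase {}'.format(phase), state='i')
    # Clearing the key tells waiting readers that this phase is complete
    zkhandler.write([
        ('base.config.primary_node.sync_lock', '')
    ])
    lock.release()
    logger.out('Released write lock for synchronization phase {}'.format(phase), state='o')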

View File

@ -0,0 +1,210 @@
#!/usr/bin/env python3
# SRIOVVFInstance.py - Class implementing a PVC SR-IOV VF and run by pvcnoded
# Part of the Parallel Virtual Cluster (PVC) system
#
# Copyright (C) 2018-2021 Joshua M. Boniface <joshua@boniface.me>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, version 3.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
#
###############################################################################
import daemon_lib.common as common
def boolToOnOff(state):
if state and str(state) == 'True':
return 'on'
else:
return 'off'
class SRIOVVFInstance(object):
# Initialization function
def __init__(self, vf, zkhandler, config, logger, this_node):
self.vf = vf
self.zkhandler = zkhandler
self.config = config
self.logger = logger
self.this_node = this_node
self.myhostname = self.this_node.name
self.pf = self.zkhandler.read(('node.sriov.vf', self.myhostname, 'sriov_vf.pf', self.vf))
self.mtu = self.zkhandler.read(('node.sriov.vf', self.myhostname, 'sriov_vf.mtu', self.vf))
self.vfid = self.vf.replace('{}v'.format(self.pf), '')
self.logger.out('Setting MTU to {}'.format(self.mtu), state='i', prefix='SR-IOV VF {}'.format(self.vf))
common.run_os_command('ip link set {} mtu {}'.format(self.vf, self.mtu))
# These properties are set via the DataWatch functions, to ensure they are configured on the system
self.mac = None
self.vlan_id = None
self.vlan_qos = None
self.tx_rate_min = None
self.tx_rate_max = None
self.spoof_check = None
self.link_state = None
self.trust = None
self.query_rss = None
# Zookeeper handlers for changed configs
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('node.sriov.vf', self.myhostname) + self.zkhandler.schema.path('sriov_vf.mac', self.vf))
def watch_vf_mac(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
# because this class instance is about to be reaped in Daemon.py
return False
try:
data = data.decode('ascii')
except AttributeError:
data = '00:00:00:00:00:00'
if data != self.mac:
self.mac = data
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('node.sriov.vf', self.myhostname) + self.zkhandler.schema.path('sriov_vf.config.vlan_id', self.vf))
def watch_vf_vlan_id(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
# because this class instance is about to be reaped in Daemon.py
return False
try:
data = data.decode('ascii')
except AttributeError:
data = '0'
if data != self.vlan_id:
self.vlan_id = data
self.logger.out('Setting vLAN ID to {}'.format(self.vlan_id), state='i', prefix='SR-IOV VF {}'.format(self.vf))
common.run_os_command('ip link set {} vf {} vlan {} qos {}'.format(self.pf, self.vfid, self.vlan_id, self.vlan_qos))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('node.sriov.vf', self.myhostname) + self.zkhandler.schema.path('sriov_vf.config.vlan_qos', self.vf))
def watch_vf_vlan_qos(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
# because this class instance is about to be reaped in Daemon.py
return False
try:
data = data.decode('ascii')
except AttributeError:
data = '0'
if data != self.vlan_qos:
self.vlan_qos = data
self.logger.out('Setting vLAN QOS to {}'.format(self.vlan_qos), state='i', prefix='SR-IOV VF {}'.format(self.vf))
common.run_os_command('ip link set {} vf {} vlan {} qos {}'.format(self.pf, self.vfid, self.vlan_id, self.vlan_qos))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('node.sriov.vf', self.myhostname) + self.zkhandler.schema.path('sriov_vf.config.tx_rate_min', self.vf))
def watch_vf_tx_rate_min(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
# because this class instance is about to be reaped in Daemon.py
return False
try:
data = data.decode('ascii')
except AttributeError:
data = '0'
if data != self.tx_rate_min:
self.tx_rate_min = data
self.logger.out('Setting minimum TX rate to {}'.format(self.tx_rate_min), state='i', prefix='SR-IOV VF {}'.format(self.vf))
common.run_os_command('ip link set {} vf {} min_tx_rate {}'.format(self.pf, self.vfid, self.tx_rate_min))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('node.sriov.vf', self.myhostname) + self.zkhandler.schema.path('sriov_vf.config.tx_rate_max', self.vf))
def watch_vf_tx_rate_max(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
# because this class instance is about to be reaped in Daemon.py
return False
try:
data = data.decode('ascii')
except AttributeError:
data = '0'
if data != self.tx_rate_max:
self.tx_rate_max = data
self.logger.out('Setting maximum TX rate to {}'.format(self.tx_rate_max), state='i', prefix='SR-IOV VF {}'.format(self.vf))
common.run_os_command('ip link set {} vf {} max_tx_rate {}'.format(self.pf, self.vfid, self.tx_rate_max))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('node.sriov.vf', self.myhostname) + self.zkhandler.schema.path('sriov_vf.config.spoof_check', self.vf))
def watch_vf_spoof_check(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
# because this class instance is about to be reaped in Daemon.py
return False
try:
data = data.decode('ascii')
except AttributeError:
data = '0'
if data != self.spoof_check:
self.spoof_check = data
self.logger.out('Setting spoof checking {}'.format(boolToOnOff(self.spoof_check)), state='i', prefix='SR-IOV VF {}'.format(self.vf))
common.run_os_command('ip link set {} vf {} spoofchk {}'.format(self.pf, self.vfid, boolToOnOff(self.spoof_check)))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('node.sriov.vf', self.myhostname) + self.zkhandler.schema.path('sriov_vf.config.link_state', self.vf))
def watch_vf_link_state(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
# because this class instance is about to be reaped in Daemon.py
return False
try:
data = data.decode('ascii')
except AttributeError:
data = 'on'
if data != self.link_state:
self.link_state = data
self.logger.out('Setting link state to {}'.format(self.link_state), state='i', prefix='SR-IOV VF {}'.format(self.vf))
common.run_os_command('ip link set {} vf {} state {}'.format(self.pf, self.vfid, self.link_state))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('node.sriov.vf', self.myhostname) + self.zkhandler.schema.path('sriov_vf.config.trust', self.vf))
def watch_vf_trust(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
# because this class instance is about to be reaped in Daemon.py
return False
try:
data = data.decode('ascii')
except AttributeError:
data = 'off'
if data != self.trust:
self.trust = data
self.logger.out('Setting trust mode {}'.format(boolToOnOff(self.trust)), state='i', prefix='SR-IOV VF {}'.format(self.vf))
common.run_os_command('ip link set {} vf {} trust {}'.format(self.pf, self.vfid, boolToOnOff(self.trust)))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('node.sriov.vf', self.myhostname) + self.zkhandler.schema.path('sriov_vf.config.query_rss', self.vf))
def watch_vf_query_rss(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
# because this class instance is about to be reaped in Daemon.py
return False
try:
data = data.decode('ascii')
except AttributeError:
data = 'off'
if data != self.query_rss:
self.query_rss = data
self.logger.out('Setting RSS query ability {}'.format(boolToOnOff(self.query_rss)), state='i', prefix='SR-IOV VF {}'.format(self.vf))
common.run_os_command('ip link set {} vf {} query_rss {}'.format(self.pf, self.vfid, boolToOnOff(self.query_rss)))
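Every VF property handled above follows the same watcher pattern: a Kazoo DataWatch on the key's schema path fires on each change, decodes the new value (falling back to a default when the key is empty), and re-applies it with an ip link command only when it differs from the cached attribute. A minimal sketch of one such watcher as it would sit inside __init__; the MTU watcher itself is hypothetical (the class above applies MTU once at init) and the fallback value is an assumption:

# Hypothetical MTU watcher illustrating the shared DataWatch pattern above;
# schema.path() and common.run_os_command() are the helpers used in this class.
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('node.sriov.vf', self.myhostname) + self.zkhandler.schema.path('sriov_vf.mtu', self.vf))
def watch_vf_mtu(data, stat, event=''):
    if event and event.type == 'DELETED':
        # VF removed; stop watching, the instance is about to be reaped in Daemon.py
        return False
    try:
        data = data.decode('ascii')
    except AttributeError:
        data = '1500'  # assumed fallback when the key is empty
    if data != self.mtu:
        self.mtu = data
        common.run_os_command('ip link set {} mtu {}'.format(self.vf, self.mtu))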

View File

@ -71,7 +71,7 @@ class VMConsoleWatcherInstance(object):
# Stop execution thread
def stop(self):
if self.thread and self.thread.isAlive():
if self.thread and self.thread.is_alive():
self.logger.out('Stopping VM log parser', state='i', prefix='Domain {}'.format(self.domuuid))
self.thread_stopper.set()
# Do one final flush
@ -92,7 +92,7 @@ class VMConsoleWatcherInstance(object):
# Update Zookeeper with the new loglines if they changed
if self.loglines != self.last_loglines:
self.zkhandler.write([
('/domains/{}/consolelog'.format(self.domuuid), self.loglines)
(('domain.console.log', self.domuuid), self.loglines)
])
self.last_loglines = self.loglines
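The isAlive() to is_alive() change keeps the log watcher working on current Python: threading.Thread.isAlive() was deprecated in Python 3.8 and removed in Python 3.9, so only the is_alive() spelling remains available on newer interpreters.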

View File

@ -38,7 +38,8 @@ import daemon_lib.common as daemon_common
def flush_locks(zkhandler, logger, dom_uuid, this_node=None):
logger.out('Flushing RBD locks for VM "{}"'.format(dom_uuid), state='i')
# Get the list of RBD images
rbd_list = zkhandler.read('/domains/{}/rbdlist'.format(dom_uuid)).split(',')
rbd_list = zkhandler.read(('domain.storage.volumes', dom_uuid)).split(',')
for rbd in rbd_list:
# Check if a lock exists
lock_list_retcode, lock_list_stdout, lock_list_stderr = common.run_os_command('rbd lock list --format json {}'.format(rbd))
@ -56,11 +57,11 @@ def flush_locks(zkhandler, logger, dom_uuid, this_node=None):
if lock_list:
# Loop through the locks
for lock in lock_list:
if this_node is not None and zkhandler.read('/domains/{}/state'.format(dom_uuid)) != 'stop' and lock['address'].split(':')[0] != this_node.storage_ipaddr:
if this_node is not None and zkhandler.read(('domain.state', dom_uuid)) != 'stop' and lock['address'].split(':')[0] != this_node.storage_ipaddr:
logger.out('RBD lock does not belong to this host (lock owner: {}): freeing this lock would be unsafe, aborting'.format(lock['address'].split(':')[0]), state='e')
zkhandler.write([
('/domains/{}/state'.format(dom_uuid), 'fail'),
('/domains/{}/failedreason'.format(dom_uuid), 'Could not safely free RBD lock {} ({}) on volume {}; stop VM and flush locks manually'.format(lock['id'], lock['address'], rbd))
(('domain.state', dom_uuid), 'fail'),
(('domain.failed_reason', dom_uuid), 'Could not safely free RBD lock {} ({}) on volume {}; stop VM and flush locks manually'.format(lock['id'], lock['address'], rbd)),
])
break
# Free the lock
@ -68,8 +69,8 @@ def flush_locks(zkhandler, logger, dom_uuid, this_node=None):
if lock_remove_retcode != 0:
logger.out('Failed to free RBD lock "{}" on volume "{}": {}'.format(lock['id'], rbd, lock_remove_stderr), state='e')
zkhandler.write([
('/domains/{}/state'.format(dom_uuid), 'fail'),
('/domains/{}/failedreason'.format(dom_uuid), 'Could not free RBD lock {} ({}) on volume {}: {}'.format(lock['id'], lock['address'], rbd, lock_remove_stderr))
(('domain.state', dom_uuid), 'fail'),
(('domain.failed_reason', dom_uuid), 'Could not free RBD lock {} ({}) on volume {}: {}'.format(lock['id'], lock['address'], rbd, lock_remove_stderr)),
])
break
logger.out('Freed RBD lock "{}" on volume "{}"'.format(lock['id'], rbd), state='o')
@ -89,7 +90,7 @@ def run_command(zkhandler, logger, this_node, data):
# Verify that the VM is set to run on this node
if this_node.d_domain[dom_uuid].getnode() == this_node.name:
# Lock the command queue
zk_lock = zkhandler.writelock('/cmd/domains')
zk_lock = zkhandler.writelock('base.cmd.domain')
with zk_lock:
# Flush the lock
result = flush_locks(zkhandler, logger, dom_uuid, this_node)
@ -97,13 +98,13 @@ def run_command(zkhandler, logger, this_node, data):
if result:
# Update the command queue
zkhandler.write([
('/cmd/domains', 'success-{}'.format(data))
('base.cmd.domain', 'success-{}'.format(data))
])
# Command failed
else:
# Update the command queue
zkhandler.write([
('/cmd/domains', 'failure-{}'.format(data))
('base.cmd.domain', 'failure-{}'.format(data))
])
# Wait 1 second before we free the lock, to ensure the client hits the lock
time.sleep(1)
@ -120,18 +121,14 @@ class VMInstance(object):
self.this_node = this_node
# Get data from zookeeper
self.domname = self.zkhandler.read('/domains/{}'.format(domuuid))
self.state = self.zkhandler.read('/domains/{}/state'.format(self.domuuid))
self.node = self.zkhandler.read('/domains/{}/node'.format(self.domuuid))
self.lastnode = self.zkhandler.read('/domains/{}/lastnode'.format(self.domuuid))
self.last_currentnode = self.zkhandler.read('/domains/{}/node'.format(self.domuuid))
self.last_lastnode = self.zkhandler.read('/domains/{}/lastnode'.format(self.domuuid))
self.domname = self.zkhandler.read(('domain', domuuid))
self.state = self.zkhandler.read(('domain.state', domuuid))
self.node = self.zkhandler.read(('domain.node', domuuid))
self.lastnode = self.zkhandler.read(('domain.last_node', domuuid))
self.last_currentnode = self.zkhandler.read(('domain.node', domuuid))
self.last_lastnode = self.zkhandler.read(('domain.last_node', domuuid))
try:
self.pinpolicy = self.zkhandler.read('/domains/{}/pinpolicy'.format(self.domuuid))
except Exception:
self.pinpolicy = "none"
try:
self.migration_method = self.zkhandler.read('/domains/{}/migration_method'.format(self.domuuid))
self.migration_method = self.zkhandler.read(('domain.meta.migrate_method', self.domuuid))
except Exception:
self.migration_method = 'none'
@ -150,7 +147,7 @@ class VMInstance(object):
self.console_log_instance = VMConsoleWatcherInstance.VMConsoleWatcherInstance(self.domuuid, self.domname, self.zkhandler, self.config, self.logger, self.this_node)
# Watch for changes to the state field in Zookeeper
@self.zkhandler.zk_conn.DataWatch('/domains/{}/state'.format(self.domuuid))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('domain.state', self.domuuid))
def watch_state(data, stat, event=""):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -203,7 +200,7 @@ class VMInstance(object):
self.this_node.domain_list.append(self.domuuid)
# Push the change up to Zookeeper
self.zkhandler.write([
('/nodes/{}/runningdomains'.format(self.this_node.name), ' '.join(self.this_node.domain_list))
(('node.running_domains', self.this_node.name), ' '.join(self.this_node.domain_list))
])
except Exception as e:
self.logger.out('Error adding domain to list: {}'.format(e), state='e')
@ -215,7 +212,7 @@ class VMInstance(object):
self.this_node.domain_list.remove(self.domuuid)
# Push the change up to Zookeeper
self.zkhandler.write([
('/nodes/{}/runningdomains'.format(self.this_node.name), ' '.join(self.this_node.domain_list))
(('node.running_domains', self.this_node.name), ' '.join(self.this_node.domain_list))
])
except Exception as e:
self.logger.out('Error removing domain from list: {}'.format(e), state='e')
@ -230,15 +227,15 @@ class VMInstance(object):
port = graphics.get('port', '')
listen = graphics.get('listen', '')
self.zkhandler.write([
('/domains/{}/vnc'.format(self.domuuid), '{}:{}'.format(listen, port))
(('domain.console.vnc', self.domuuid), '{}:{}'.format(listen, port))
])
else:
self.zkhandler.write([
('/domains/{}/vnc'.format(self.domuuid), '')
(('domain.console.vnc', self.domuuid), '')
])
else:
self.zkhandler.write([
('/domains/{}/vnc'.format(self.domuuid), '')
(('domain.console.vnc', self.domuuid), '')
])
# Start up the VM
@ -269,7 +266,7 @@ class VMInstance(object):
# Flush locks
self.logger.out('Flushing RBD locks', state='i', prefix='Domain {}'.format(self.domuuid))
flush_locks(self.zkhandler, self.logger, self.domuuid, self.this_node)
if self.zkhandler.read('/domains/{}/state'.format(self.domuuid)) == 'fail':
if self.zkhandler.read(('domain.state', self.domuuid)) == 'fail':
lv_conn.close()
self.dom = None
self.instart = False
@ -279,25 +276,25 @@ class VMInstance(object):
# If it is running just update the model
self.addDomainToList()
self.zkhandler.write([
('/domains/{}/failedreason'.format(self.domuuid), '')
(('domain.failed_reason', self.domuuid), '')
])
else:
# Or try to create it
try:
# Grab the domain information from Zookeeper
xmlconfig = self.zkhandler.read('/domains/{}/xml'.format(self.domuuid))
xmlconfig = self.zkhandler.read(('domain.xml', self.domuuid))
dom = lv_conn.createXML(xmlconfig, 0)
self.addDomainToList()
self.logger.out('Successfully started VM', state='o', prefix='Domain {}'.format(self.domuuid))
self.dom = dom
self.zkhandler.write([
('/domains/{}/failedreason'.format(self.domuuid), '')
(('domain.failed_reason', self.domuuid), '')
])
except libvirt.libvirtError as e:
self.logger.out('Failed to create VM', state='e', prefix='Domain {}'.format(self.domuuid))
self.zkhandler.write([
('/domains/{}/state'.format(self.domuuid), 'fail'),
('/domains/{}/failedreason'.format(self.domuuid), str(e))
(('domain.state', self.domuuid), 'fail'),
(('domain.failed_reason', self.domuuid), str(e))
])
lv_conn.close()
self.dom = None
@ -327,7 +324,7 @@ class VMInstance(object):
self.addDomainToList()
self.zkhandler.write([
('/domains/{}/state'.format(self.domuuid), 'start')
(('domain.state', self.domuuid), 'start')
])
lv_conn.close()
self.inrestart = False
@ -338,6 +335,13 @@ class VMInstance(object):
self.instop = True
try:
self.dom.destroy()
time.sleep(0.2)
try:
if self.getdom().state()[0] == libvirt.VIR_DOMAIN_RUNNING:
# It didn't terminate, try again
self.dom.destroy()
except libvirt.libvirtError:
pass
except AttributeError:
self.logger.out('Failed to terminate VM', state='e', prefix='Domain {}'.format(self.domuuid))
self.removeDomainFromList()
@ -354,13 +358,20 @@ class VMInstance(object):
self.instop = True
try:
self.dom.destroy()
time.sleep(0.2)
try:
if self.getdom().state()[0] == libvirt.VIR_DOMAIN_RUNNING:
# It didn't terminate, try again
self.dom.destroy()
except libvirt.libvirtError:
pass
except AttributeError:
self.logger.out('Failed to stop VM', state='e', prefix='Domain {}'.format(self.domuuid))
self.removeDomainFromList()
if self.inrestart is False:
self.zkhandler.write([
('/domains/{}/state'.format(self.domuuid), 'stop')
(('domain.state', self.domuuid), 'stop')
])
self.logger.out('Successfully stopped VM', state='o', prefix='Domain {}'.format(self.domuuid))
@ -382,8 +393,8 @@ class VMInstance(object):
time.sleep(1)
# Abort shutdown if the state changes to start
current_state = self.zkhandler.read('/domains/{}/state'.format(self.domuuid))
if current_state not in ['shutdown', 'restart']:
current_state = self.zkhandler.read(('domain.state', self.domuuid))
if current_state not in ['shutdown', 'restart', 'migrate']:
self.logger.out('Aborting VM shutdown due to state change', state='i', prefix='Domain {}'.format(self.domuuid))
is_aborted = True
break
@ -396,7 +407,7 @@ class VMInstance(object):
if lvdomstate != libvirt.VIR_DOMAIN_RUNNING:
self.removeDomainFromList()
self.zkhandler.write([
('/domains/{}/state'.format(self.domuuid), 'stop')
(('domain.state', self.domuuid), 'stop')
])
self.logger.out('Successfully shutdown VM', state='o', prefix='Domain {}'.format(self.domuuid))
self.dom = None
@ -407,7 +418,7 @@ class VMInstance(object):
if tick >= self.config['vm_shutdown_timeout']:
self.logger.out('Shutdown timeout ({}s) expired, forcing off'.format(self.config['vm_shutdown_timeout']), state='e', prefix='Domain {}'.format(self.domuuid))
self.zkhandler.write([
('/domains/{}/state'.format(self.domuuid), 'stop')
(('domain.state', self.domuuid), 'stop')
])
break
@ -420,7 +431,7 @@ class VMInstance(object):
# Wait to prevent race conditions
time.sleep(1)
self.zkhandler.write([
('/domains/{}/state'.format(self.domuuid), 'start')
(('domain.state', self.domuuid), 'start')
])
# Migrate the VM to a target host
@ -438,15 +449,15 @@ class VMInstance(object):
self.logger.out('Migrating VM to node "{}"'.format(self.node), state='i', prefix='Domain {}'.format(self.domuuid))
# Used for sanity checking later
target_node = self.zkhandler.read('/domains/{}/node'.format(self.domuuid))
target_node = self.zkhandler.read(('domain.node', self.domuuid))
aborted = False
def abort_migrate(reason):
self.zkhandler.write([
('/domains/{}/state'.format(self.domuuid), 'start'),
('/domains/{}/node'.format(self.domuuid), self.this_node.name),
('/domains/{}/lastnode'.format(self.domuuid), self.last_lastnode)
(('domain.state', self.domuuid), 'start'),
(('domain.node', self.domuuid), self.this_node.name),
(('domain.last_node', self.domuuid), self.last_lastnode)
])
migrate_lock_node.release()
migrate_lock_state.release()
@ -454,8 +465,8 @@ class VMInstance(object):
self.logger.out('Aborted migration: {}'.format(reason), state='i', prefix='Domain {}'.format(self.domuuid))
# Acquire exclusive lock on the domain node key
migrate_lock_node = self.zkhandler.exclusivelock('/domains/{}/node'.format(self.domuuid))
migrate_lock_state = self.zkhandler.exclusivelock('/domains/{}/state'.format(self.domuuid))
migrate_lock_node = self.zkhandler.exclusivelock(('domain.node', self.domuuid))
migrate_lock_state = self.zkhandler.exclusivelock(('domain.state', self.domuuid))
migrate_lock_node.acquire()
migrate_lock_state.acquire()
@ -467,14 +478,14 @@ class VMInstance(object):
return
# Synchronize nodes A (I am reader)
lock = self.zkhandler.readlock('/locks/domain_migrate/{}'.format(self.domuuid))
lock = self.zkhandler.readlock(('domain.migrate.sync_lock', self.domuuid))
self.logger.out('Acquiring read lock for synchronization phase A', state='i', prefix='Domain {}'.format(self.domuuid))
lock.acquire()
self.logger.out('Acquired read lock for synchronization phase A', state='o', prefix='Domain {}'.format(self.domuuid))
if self.zkhandler.read('/locks/domain_migrate/{}'.format(self.domuuid)) == '':
if self.zkhandler.read(('domain.migrate.sync_lock', self.domuuid)) == '':
self.logger.out('Waiting for peer', state='i', prefix='Domain {}'.format(self.domuuid))
ticks = 0
while self.zkhandler.read('/locks/domain_migrate/{}'.format(self.domuuid)) == '':
while self.zkhandler.read(('domain.migrate.sync_lock', self.domuuid)) == '':
time.sleep(0.1)
ticks += 1
if ticks > 300:
@ -490,7 +501,7 @@ class VMInstance(object):
return
# Synchronize nodes B (I am writer)
lock = self.zkhandler.writelock('/locks/domain_migrate/{}'.format(self.domuuid))
lock = self.zkhandler.writelock(('domain.migrate.sync_lock', self.domuuid))
self.logger.out('Acquiring write lock for synchronization phase B', state='i', prefix='Domain {}'.format(self.domuuid))
lock.acquire()
self.logger.out('Acquired write lock for synchronization phase B', state='o', prefix='Domain {}'.format(self.domuuid))
@ -531,11 +542,7 @@ class VMInstance(object):
def migrate_shutdown():
self.logger.out('Shutting down VM for offline migration', state='i', prefix='Domain {}'.format(self.domuuid))
self.zkhandler.write([
('/domains/{}/state'.format(self.domuuid), 'shutdown')
])
while self.zkhandler.read('/domains/{}/state'.format(self.domuuid)) != 'stop':
time.sleep(0.5)
self.shutdown_vm()
return True
do_migrate_shutdown = False
@ -580,7 +587,7 @@ class VMInstance(object):
return
# Synchronize nodes C (I am writer)
lock = self.zkhandler.writelock('/locks/domain_migrate/{}'.format(self.domuuid))
lock = self.zkhandler.writelock(('domain.migrate.sync_lock', self.domuuid))
self.logger.out('Acquiring write lock for synchronization phase C', state='i', prefix='Domain {}'.format(self.domuuid))
lock.acquire()
self.logger.out('Acquired write lock for synchronization phase C', state='o', prefix='Domain {}'.format(self.domuuid))
@ -594,21 +601,26 @@ class VMInstance(object):
self.logger.out('Released write lock for synchronization phase C', state='o', prefix='Domain {}'.format(self.domuuid))
# Synchronize nodes D (I am reader)
lock = self.zkhandler.readlock('/locks/domain_migrate/{}'.format(self.domuuid))
lock = self.zkhandler.readlock(('domain.migrate.sync_lock', self.domuuid))
self.logger.out('Acquiring read lock for synchronization phase D', state='i', prefix='Domain {}'.format(self.domuuid))
lock.acquire()
self.logger.out('Acquired read lock for synchronization phase D', state='o', prefix='Domain {}'.format(self.domuuid))
self.last_currentnode = self.zkhandler.read('/domains/{}/node'.format(self.domuuid))
self.last_lastnode = self.zkhandler.read('/domains/{}/lastnode'.format(self.domuuid))
self.last_currentnode = self.zkhandler.read(('domain.node', self.domuuid))
self.last_lastnode = self.zkhandler.read(('domain.last_node', self.domuuid))
self.logger.out('Releasing read lock for synchronization phase D', state='i', prefix='Domain {}'.format(self.domuuid))
lock.release()
self.logger.out('Released read lock for synchronization phase D', state='o', prefix='Domain {}'.format(self.domuuid))
# Wait for the receive side to complete before we declare all-done and release locks
while self.zkhandler.read('/locks/domain_migrate/{}'.format(self.domuuid)) != '':
time.sleep(0.5)
ticks = 0
while self.zkhandler.read(('domain.migrate.sync_lock', self.domuuid)) != '':
time.sleep(0.1)
ticks += 1
if ticks > 100:
self.logger.out('Sync lock clear exceeded 10s timeout, continuing', state='w', prefix='Domain {}'.format(self.domuuid))
break
migrate_lock_node.release()
migrate_lock_state.release()
@ -625,13 +637,16 @@ class VMInstance(object):
self.logger.out('Receiving VM migration from node "{}"'.format(self.node), state='i', prefix='Domain {}'.format(self.domuuid))
# Short delay to ensure sender is in sync
time.sleep(0.5)
# Ensure our lock key is populated
self.zkhandler.write([
('/locks/domain_migrate/{}'.format(self.domuuid), self.domuuid)
(('domain.migrate.sync_lock', self.domuuid), self.domuuid)
])
# Synchronize nodes A (I am writer)
lock = self.zkhandler.writelock('/locks/domain_migrate/{}'.format(self.domuuid))
lock = self.zkhandler.writelock(('domain.migrate.sync_lock', self.domuuid))
self.logger.out('Acquiring write lock for synchronization phase A', state='i', prefix='Domain {}'.format(self.domuuid))
lock.acquire()
self.logger.out('Acquired write lock for synchronization phase A', state='o', prefix='Domain {}'.format(self.domuuid))
@ -642,7 +657,7 @@ class VMInstance(object):
time.sleep(0.1) # Time for new writer to acquire the lock
# Synchronize nodes B (I am reader)
lock = self.zkhandler.readlock('/locks/domain_migrate/{}'.format(self.domuuid))
lock = self.zkhandler.readlock(('domain.migrate.sync_lock', self.domuuid))
self.logger.out('Acquiring read lock for synchronization phase B', state='i', prefix='Domain {}'.format(self.domuuid))
lock.acquire()
self.logger.out('Acquired read lock for synchronization phase B', state='o', prefix='Domain {}'.format(self.domuuid))
@ -651,27 +666,27 @@ class VMInstance(object):
self.logger.out('Released read lock for synchronization phase B', state='o', prefix='Domain {}'.format(self.domuuid))
# Synchronize nodes C (I am reader)
lock = self.zkhandler.readlock('/locks/domain_migrate/{}'.format(self.domuuid))
lock = self.zkhandler.readlock(('domain.migrate.sync_lock', self.domuuid))
self.logger.out('Acquiring read lock for synchronization phase C', state='i', prefix='Domain {}'.format(self.domuuid))
lock.acquire()
self.logger.out('Acquired read lock for synchronization phase C', state='o', prefix='Domain {}'.format(self.domuuid))
# Set the updated data
self.last_currentnode = self.zkhandler.read('/domains/{}/node'.format(self.domuuid))
self.last_lastnode = self.zkhandler.read('/domains/{}/lastnode'.format(self.domuuid))
self.last_currentnode = self.zkhandler.read(('domain.node', self.domuuid))
self.last_lastnode = self.zkhandler.read(('domain.last_node', self.domuuid))
self.logger.out('Releasing read lock for synchronization phase C', state='i', prefix='Domain {}'.format(self.domuuid))
lock.release()
self.logger.out('Released read lock for synchronization phase C', state='o', prefix='Domain {}'.format(self.domuuid))
# Synchronize nodes D (I am writer)
lock = self.zkhandler.writelock('/locks/domain_migrate/{}'.format(self.domuuid))
lock = self.zkhandler.writelock(('domain.migrate.sync_lock', self.domuuid))
self.logger.out('Acquiring write lock for synchronization phase D', state='i', prefix='Domain {}'.format(self.domuuid))
lock.acquire()
self.logger.out('Acquired write lock for synchronization phase D', state='o', prefix='Domain {}'.format(self.domuuid))
time.sleep(0.5) # Time for reader to acquire the lock
self.state = self.zkhandler.read('/domains/{}/state'.format(self.domuuid))
self.state = self.zkhandler.read(('domain.state', self.domuuid))
self.dom = self.lookupByUUID(self.domuuid)
if self.dom:
lvdomstate = self.dom.state()[0]
@ -679,14 +694,14 @@ class VMInstance(object):
# VM has been received and started
self.addDomainToList()
self.zkhandler.write([
('/domains/{}/state'.format(self.domuuid), 'start')
(('domain.state', self.domuuid), 'start')
])
self.logger.out('Successfully received migrated VM', state='o', prefix='Domain {}'.format(self.domuuid))
else:
# The receive somehow failed
self.zkhandler.write([
('/domains/{}/state'.format(self.domuuid), 'fail'),
('/domains/{}/failed_reason'.format(self.domuuid), 'Failed to receive migration')
(('domain.state', self.domuuid), 'fail'),
(('domain.failed_reason', self.domuuid), 'Failed to receive migration')
])
self.logger.out('Failed to receive migrated VM', state='e', prefix='Domain {}'.format(self.domuuid))
else:
@ -697,7 +712,7 @@ class VMInstance(object):
elif self.state in ['stop']:
# The send was shutdown-based
self.zkhandler.write([
('/domains/{}/state'.format(self.domuuid), 'start')
(('domain.state', self.domuuid), 'start')
])
else:
# The send failed or was aborted
@ -708,7 +723,7 @@ class VMInstance(object):
self.logger.out('Released write lock for synchronization phase D', state='o', prefix='Domain {}'.format(self.domuuid))
self.zkhandler.write([
('/locks/domain_migrate/{}'.format(self.domuuid), '')
(('domain.migrate.sync_lock', self.domuuid), '')
])
self.inreceive = False
return
@ -718,9 +733,10 @@ class VMInstance(object):
#
def manage_vm_state(self):
# Update the current values from zookeeper
self.state = self.zkhandler.read('/domains/{}/state'.format(self.domuuid))
self.node = self.zkhandler.read('/domains/{}/node'.format(self.domuuid))
self.lastnode = self.zkhandler.read('/domains/{}/lastnode'.format(self.domuuid))
self.state = self.zkhandler.read(('domain.state', self.domuuid))
self.node = self.zkhandler.read(('domain.node', self.domuuid))
self.lastnode = self.zkhandler.read(('domain.last_node', self.domuuid))
self.migration_method = self.zkhandler.read(('domain.meta.migrate_method', self.domuuid))
# Check the current state of the VM
try:
@ -769,7 +785,7 @@ class VMInstance(object):
# Start the log watcher
self.console_log_instance.start()
self.zkhandler.write([
('/domains/{}/state'.format(self.domuuid), 'start')
(('domain.state', self.domuuid), 'start')
])
# Add domain to running list
self.addDomainToList()
@ -794,7 +810,7 @@ class VMInstance(object):
# VM should be restarted (i.e. started since it isn't running)
if self.state == "restart":
self.zkhandler.write([
('/domains/{}/state'.format(self.domuuid), 'start')
(('domain.state', self.domuuid), 'start')
])
# VM should be shut down; ensure it's gone from this node's domain_list
elif self.state == "shutdown":
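The migration changes above keep the sending and receiving nodes in step through the per-domain domain.migrate.sync_lock key: the receiver populates the key and takes the first write lock, the sender waits on read locks for each phase, and the sender now gives up after 10 seconds if the receiver never clears the key. A rough sketch of the receiver's entry into that handshake, assuming the zkhandler interface used throughout this diff:

# Rough sketch of the receiver's entry into the migration handshake,
# assuming the zkhandler write()/writelock() interface shown above.
def begin_receive(self):
    # Populate the lock key so the sender's "Waiting for peer" loop unblocks
    self.zkhandler.write([
        (('domain.migrate.sync_lock', self.domuuid), self.domuuid)
    ])
    lock = self.zkhandler.writelock(('domain.migrate.sync_lock', self.domuuid))
    lock.acquire()   # phase A: hold the write lock while preparing to receive
    # ... receiver-side preparation happens here ...
    lock.release()   # the sender's phase A read lock can now proceed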

View File

@ -40,7 +40,7 @@ class VXNetworkInstance(object):
self.vni_mtu = config['vni_mtu']
self.bridge_dev = config['bridge_dev']
self.nettype = self.zkhandler.read('/networks/{}/nettype'.format(self.vni))
self.nettype = self.zkhandler.read(('network.type', self.vni))
if self.nettype == 'bridged':
self.logger.out(
'Creating new bridged network',
@ -72,7 +72,7 @@ class VXNetworkInstance(object):
self.bridge_nic = 'vmbr{}'.format(self.vni)
# Zookeeper handlers for changed states
@self.zkhandler.zk_conn.DataWatch('/networks/{}'.format(self.vni))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('network', self.vni))
def watch_network_description(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -91,16 +91,16 @@ class VXNetworkInstance(object):
self.description = None
self.domain = None
self.name_servers = None
self.ip6_gateway = self.zkhandler.read('/networks/{}/ip6_gateway'.format(self.vni))
self.ip6_network = self.zkhandler.read('/networks/{}/ip6_network'.format(self.vni))
self.ip6_cidrnetmask = self.zkhandler.read('/networks/{}/ip6_network'.format(self.vni)).split('/')[-1]
self.dhcp6_flag = (self.zkhandler.read('/networks/{}/dhcp6_flag'.format(self.vni)) == 'True')
self.ip4_gateway = self.zkhandler.read('/networks/{}/ip4_gateway'.format(self.vni))
self.ip4_network = self.zkhandler.read('/networks/{}/ip4_network'.format(self.vni))
self.ip4_cidrnetmask = self.zkhandler.read('/networks/{}/ip4_network'.format(self.vni)).split('/')[-1]
self.dhcp4_flag = (self.zkhandler.read('/networks/{}/dhcp4_flag'.format(self.vni)) == 'True')
self.dhcp4_start = (self.zkhandler.read('/networks/{}/dhcp4_start'.format(self.vni)) == 'True')
self.dhcp4_end = (self.zkhandler.read('/networks/{}/dhcp4_end'.format(self.vni)) == 'True')
self.ip6_gateway = self.zkhandler.read(('network.ip6.gateway', self.vni))
self.ip6_network = self.zkhandler.read(('network.ip6.network', self.vni))
self.ip6_cidrnetmask = self.zkhandler.read(('network.ip6.network', self.vni)).split('/')[-1]
self.dhcp6_flag = self.zkhandler.read(('network.ip6.dhcp', self.vni))
self.ip4_gateway = self.zkhandler.read(('network.ip4.gateway', self.vni))
self.ip4_network = self.zkhandler.read(('network.ip4.network', self.vni))
self.ip4_cidrnetmask = self.zkhandler.read(('network.ip4.network', self.vni)).split('/')[-1]
self.dhcp4_flag = self.zkhandler.read(('network.ip4.dhcp', self.vni))
self.dhcp4_start = self.zkhandler.read(('network.ip4.dhcp_start', self.vni))
self.dhcp4_end = self.zkhandler.read(('network.ip4.dhcp_end', self.vni))
self.vxlan_nic = 'vxlan{}'.format(self.vni)
self.bridge_nic = 'vmbr{}'.format(self.vni)
@ -157,11 +157,11 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
vxlannic=self.vxlan_nic,
)
self.firewall_rules_in = self.zkhandler.children('/networks/{}/firewall_rules/in'.format(self.vni))
self.firewall_rules_out = self.zkhandler.children('/networks/{}/firewall_rules/out'.format(self.vni))
self.firewall_rules_in = self.zkhandler.children(('network.rule.in', self.vni))
self.firewall_rules_out = self.zkhandler.children(('network.rule.out', self.vni))
# Zookeeper handlers for changed states
@self.zkhandler.zk_conn.DataWatch('/networks/{}'.format(self.vni))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('network', self.vni))
def watch_network_description(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -175,7 +175,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
self.stopDHCPServer()
self.startDHCPServer()
@self.zkhandler.zk_conn.DataWatch('/networks/{}/domain'.format(self.vni))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('network.domain', self.vni))
def watch_network_domain(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -192,7 +192,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
self.stopDHCPServer()
self.startDHCPServer()
@self.zkhandler.zk_conn.DataWatch('/networks/{}/name_servers'.format(self.vni))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('network.nameservers', self.vni))
def watch_network_name_servers(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -209,7 +209,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
self.stopDHCPServer()
self.startDHCPServer()
@self.zkhandler.zk_conn.DataWatch('/networks/{}/ip6_network'.format(self.vni))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('network.ip6.network', self.vni))
def watch_network_ip6_network(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -224,7 +224,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
self.stopDHCPServer()
self.startDHCPServer()
@self.zkhandler.zk_conn.DataWatch('/networks/{}/ip6_gateway'.format(self.vni))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('network.ip6.gateway', self.vni))
def watch_network_gateway6(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -246,7 +246,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
self.stopDHCPServer()
self.startDHCPServer()
@self.zkhandler.zk_conn.DataWatch('/networks/{}/dhcp6_flag'.format(self.vni))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('network.ip6.dhcp', self.vni))
def watch_network_dhcp6_status(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -260,7 +260,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
elif self.dhcp_server_daemon and not self.dhcp4_flag and self.this_node.router_state in ['primary', 'takeover']:
self.stopDHCPServer()
@self.zkhandler.zk_conn.DataWatch('/networks/{}/ip4_network'.format(self.vni))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('network.ip4.network', self.vni))
def watch_network_ip4_network(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -275,7 +275,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
self.stopDHCPServer()
self.startDHCPServer()
@self.zkhandler.zk_conn.DataWatch('/networks/{}/ip4_gateway'.format(self.vni))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('network.ip4.gateway', self.vni))
def watch_network_gateway4(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -297,7 +297,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
self.stopDHCPServer()
self.startDHCPServer()
@self.zkhandler.zk_conn.DataWatch('/networks/{}/dhcp4_flag'.format(self.vni))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('network.ip4.dhcp', self.vni))
def watch_network_dhcp4_status(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -311,7 +311,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
elif self.dhcp_server_daemon and not self.dhcp6_flag and self.this_node.router_state in ['primary', 'takeover']:
self.stopDHCPServer()
@self.zkhandler.zk_conn.DataWatch('/networks/{}/dhcp4_start'.format(self.vni))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('network.ip4.dhcp_start', self.vni))
def watch_network_dhcp4_start(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -324,7 +324,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
self.stopDHCPServer()
self.startDHCPServer()
@self.zkhandler.zk_conn.DataWatch('/networks/{}/dhcp4_end'.format(self.vni))
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('network.ip4.dhcp_end', self.vni))
def watch_network_dhcp4_end(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -337,7 +337,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
self.stopDHCPServer()
self.startDHCPServer()
@self.zkhandler.zk_conn.ChildrenWatch('/networks/{}/dhcp4_reservations'.format(self.vni))
@self.zkhandler.zk_conn.ChildrenWatch(self.zkhandler.schema.path('network.reservation', self.vni))
def watch_network_dhcp_reservations(new_reservations, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -353,7 +353,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
self.stopDHCPServer()
self.startDHCPServer()
@self.zkhandler.zk_conn.ChildrenWatch('/networks/{}/firewall_rules/in'.format(self.vni))
@self.zkhandler.zk_conn.ChildrenWatch(self.zkhandler.schema.path('network.rule.in', self.vni))
def watch_network_firewall_rules_in(new_rules, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -365,7 +365,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
self.firewall_rules_in = new_rules
self.updateFirewallRules()
@self.zkhandler.zk_conn.ChildrenWatch('/networks/{}/firewall_rules/out'.format(self.vni))
@self.zkhandler.zk_conn.ChildrenWatch(self.zkhandler.schema.path('network.rule.out', self.vni))
def watch_network_firewall_rules_out(new_rules, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
@ -388,7 +388,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
if reservation not in old_reservations_list:
# Add new reservation file
filename = '{}/{}'.format(self.dnsmasq_hostsdir, reservation)
ipaddr = self.zkhandler.read('/networks/{}/dhcp4_reservations/{}/ipaddr'.format(self.vni, reservation))
ipaddr = self.zkhandler.read(('network.reservation', self.vni, 'reservation.ip', reservation))
entry = '{},{}'.format(reservation, ipaddr)
# Write the entry
with open(filename, 'w') as outfile:
@ -419,10 +419,10 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
full_ordered_rules = []
for acl in self.firewall_rules_in:
order = self.zkhandler.read('/networks/{}/firewall_rules/in/{}/order'.format(self.vni, acl))
order = self.zkhandler.read(('network.rule.in', self.vni, 'rule.order', acl))
ordered_acls_in[order] = acl
for acl in self.firewall_rules_out:
order = self.zkhandler.read('/networks/{}/firewall_rules/out/{}/order'.format(self.vni, acl))
order = self.zkhandler.read(('network.rule.out', self.vni, 'rule.order', acl))
ordered_acls_out[order] = acl
for order in sorted(ordered_acls_in.keys()):
@ -433,7 +433,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
for direction in 'in', 'out':
for acl in sorted_acl_list[direction]:
rule_prefix = "add rule inet filter vxlan{}-{} counter".format(self.vni, direction)
rule_data = self.zkhandler.read('/networks/{}/firewall_rules/{}/{}/rule'.format(self.vni, direction, acl))
rule_data = self.zkhandler.read((f'network.rule.{direction}', self.vni, 'rule.rule', acl))
rule = '{} {}'.format(rule_prefix, rule_data)
full_ordered_rules.append(rule)
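The ACL handling above now resolves each rule's order and body through schema keys rather than literal paths; the effective ruleset is rebuilt by sorting the ACLs of each direction on their rule.order value and prefixing each with the per-network chain. A compact sketch of that assembly, assuming the same tuple-key zkhandler.read() interface:

# Compact sketch of ACL assembly, assuming the tuple-key zkhandler reads above.
def build_firewall_rules(zkhandler, vni, acls_in, acls_out):
    full_ordered_rules = []
    for direction, acls in (('in', acls_in), ('out', acls_out)):
        ordered = {
            zkhandler.read((f'network.rule.{direction}', vni, 'rule.order', acl)): acl
            for acl in acls
        }
        rule_prefix = 'add rule inet filter vxlan{}-{} counter'.format(vni, direction)
        for order in sorted(ordered.keys()):
            acl = ordered[order]
            rule_data = zkhandler.read((f'network.rule.{direction}', vni, 'rule.rule', acl))
            full_ordered_rules.append('{} {}'.format(rule_prefix, rule_data))
    return full_ordered_rules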

View File

@ -36,7 +36,7 @@ def fenceNode(node_name, zkhandler, config, logger):
# Wait 5 seconds
time.sleep(config['keepalive_interval'])
# Get the state
node_daemon_state = zkhandler.read('/nodes/{}/daemonstate'.format(node_name))
node_daemon_state = zkhandler.read(('node.state.daemon', node_name))
# Is it still 'dead'
if node_daemon_state == 'dead':
failcount += 1
@ -49,9 +49,9 @@ def fenceNode(node_name, zkhandler, config, logger):
logger.out('Fencing node "{}" via IPMI reboot signal'.format(node_name), state='w')
# Get IPMI information
ipmi_hostname = zkhandler.read('/nodes/{}/ipmihostname'.format(node_name))
ipmi_username = zkhandler.read('/nodes/{}/ipmiusername'.format(node_name))
ipmi_password = zkhandler.read('/nodes/{}/ipmipassword'.format(node_name))
ipmi_hostname = zkhandler.read(('node.ipmi.hostname', node_name))
ipmi_username = zkhandler.read(('node.ipmi.username', node_name))
ipmi_password = zkhandler.read(('node.ipmi.password', node_name))
# Shoot it in the head
fence_status = rebootViaIPMI(ipmi_hostname, ipmi_username, ipmi_password, logger)
@ -62,11 +62,11 @@ def fenceNode(node_name, zkhandler, config, logger):
if node_name in config['coordinators']:
logger.out('Forcing secondary status for node "{}"'.format(node_name), state='i')
zkhandler.write([
('/nodes/{}/routerstate'.format(node_name), 'secondary')
(('node.state.router', node_name), 'secondary')
])
if zkhandler.read('/config/primary_node') == node_name:
if zkhandler.read('base.config.primary_node') == node_name:
zkhandler.write([
('/config/primary_node', 'none')
('base.config.primary_node', 'none')
])
# If the fence succeeded and successful_fence is migrate
@ -83,31 +83,31 @@ def migrateFromFencedNode(zkhandler, node_name, config, logger):
logger.out('Migrating VMs from dead node "{}" to new hosts'.format(node_name), state='i')
# Get the list of VMs
dead_node_running_domains = zkhandler.read('/nodes/{}/runningdomains'.format(node_name)).split()
dead_node_running_domains = zkhandler.read(('node.running_domains', node_name)).split()
# Set the node to a custom domainstate so we know what's happening
zkhandler.write([
('/nodes/{}/domainstate'.format(node_name), 'fence-flush')
(('node.state.domain', node_name), 'fence-flush')
])
# Migrate a VM after a flush
def fence_migrate_vm(dom_uuid):
VMInstance.flush_locks(zkhandler, logger, dom_uuid)
target_node = common.findTargetNode(zkhandler, config, logger, dom_uuid)
target_node = common.findTargetNode(zkhandler, dom_uuid)
if target_node is not None:
logger.out('Migrating VM "{}" to node "{}"'.format(dom_uuid, target_node), state='i')
zkhandler.write([
('/domains/{}/state'.format(dom_uuid), 'start'),
('/domains/{}/node'.format(dom_uuid), target_node),
('/domains/{}/lastnode'.format(dom_uuid), node_name)
(('domain.state', dom_uuid), 'start'),
(('domain.node', dom_uuid), target_node),
(('domain.last_node', dom_uuid), node_name),
])
else:
logger.out('No target node found for VM "{}"; VM will autostart on next unflush/ready of current node'.format(dom_uuid), state='i')
zkhandler.write({
('/domains/{}/state'.format(dom_uuid), 'stopped'),
('/domains/{}/node_autostart'.format(dom_uuid), 'True')
(('domain.state', dom_uuid), 'stopped'),
(('domain.meta.autostart', dom_uuid), 'True'),
})
# Loop through the VMs
@ -116,7 +116,7 @@ def migrateFromFencedNode(zkhandler, node_name, config, logger):
# Set node in flushed state for easy remigrating when it comes back
zkhandler.write([
('/nodes/{}/domainstate'.format(node_name), 'flushed')
(('node.state.domain', node_name), 'flushed')
])
@ -133,31 +133,46 @@ def rebootViaIPMI(ipmi_hostname, ipmi_user, ipmi_password, logger):
if ipmi_reset_retcode != 0:
logger.out('Failed to reboot dead node', state='e')
print(ipmi_reset_stderr)
return False
time.sleep(1)
# Power on the node (just in case it is offline)
ipmi_command_start = '/usr/bin/ipmitool -I lanplus -H {} -U {} -P {} chassis power on'.format(
ipmi_hostname, ipmi_user, ipmi_password
)
ipmi_start_retcode, ipmi_start_stdout, ipmi_start_stderr = common.run_os_command(ipmi_command_start)
time.sleep(2)
# Ensure the node is powered on
# Check the chassis power state
logger.out('Checking power state of dead node', state='i')
ipmi_command_status = '/usr/bin/ipmitool -I lanplus -H {} -U {} -P {} chassis power status'.format(
ipmi_hostname, ipmi_user, ipmi_password
)
ipmi_status_retcode, ipmi_status_stdout, ipmi_status_stderr = common.run_os_command(ipmi_command_status)
# Trigger a power start if needed
if ipmi_status_stdout != "Chassis Power is on":
ipmi_command_start = '/usr/bin/ipmitool -I lanplus -H {} -U {} -P {} chassis power on'.format(
ipmi_hostname, ipmi_user, ipmi_password
)
ipmi_start_retcode, ipmi_start_stdout, ipmi_start_stderr = common.run_os_command(ipmi_command_start)
if ipmi_start_retcode != 0:
logger.out('Failed to start powered-off dead node', state='e')
print(ipmi_start_stderr)
if ipmi_reset_retcode == 0:
if ipmi_status_stdout == "Chassis Power is on":
# We successfully rebooted the node and it is powered on; this is a successful fence
logger.out('Successfully rebooted dead node', state='o')
return True
elif ipmi_status_stdout == "Chassis Power is off":
# We successfully rebooted the node but it is powered off; this might be expected or not, but the node is confirmed off so we can call it a successful fence
logger.out('Chassis power is in confirmed off state after successful IPMI reboot; proceeding with fence-flush', state='o')
return True
else:
# We successfully rebooted the node but it is in some unknown power state; since this might indicate a silent failure, we must call it a failed fence
logger.out('Chassis power is in an unknown state after successful IPMI reboot; not performing fence-flush', state='e')
return False
else:
if ipmi_status_stdout == "Chassis Power is off":
# We failed to reboot the node but it is powered off; it has probably suffered a serious hardware failure, but the node is confirmed off so we can call it a successful fence
logger.out('Chassis power is in confirmed off state after failed IPMI reboot; proceeding with fence-flush', state='o')
return True
else:
# We failed to reboot the node but it is in some unknown power state (including "on"); since this might indicate a silent failure, we must call it a failed fence
logger.out('Chassis power is not in confirmed off state after failed IPMI reboot; not performing fence-flush', state='e')
return False
# Declare success
logger.out('Successfully rebooted dead node', state='o')
return True
#
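The reworked rebootViaIPMI above no longer gives up on a failed chassis reset; instead it cross-checks the reset result against a follow-up chassis power status query and only reports a successful fence when the node is verifiably on (after a good reset) or verifiably off. The decision reduces to a small predicate, sketched here against the two ipmitool results used above:

# Sketch of the fence decision implied above: a fence only succeeds when the
# node's power state is unambiguous after the IPMI reset attempt.
def fence_succeeded(reset_ok, power_status):
    if reset_ok:
        # Reset worked: a node confirmed on (rebooted) or confirmed off is safe
        return power_status in ('Chassis Power is on', 'Chassis Power is off')
    # Reset failed: only a confirmed-off node is safe to fence-flush
    return power_status == 'Chassis Power is off'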

File diff suppressed because it is too large

test-cluster.sh Executable file
View File

@ -0,0 +1,148 @@
#!/usr/bin/env bash
set -o errexit
if [[ -z ${1} ]]; then
echo "Please specify a cluster to run tests against."
exit 1
fi
test_cluster="${1}"
_pvc() {
echo "> pvc --cluster ${test_cluster} $@"
pvc --quiet --cluster ${test_cluster} "$@"
sleep 1
}
time_start=$(date +%s)
# Cluster tests
_pvc maintenance on
_pvc maintenance off
backup_tmp=$(mktemp)
_pvc task backup --file ${backup_tmp}
_pvc task restore --yes --file ${backup_tmp}
rm ${backup_tmp} || true
# Provisioner tests
_pvc provisioner profile list test
_pvc provisioner create --wait testX test
sleep 30
# VM tests
vm_tmp=$(mktemp)
_pvc vm dump testX --file ${vm_tmp}
_pvc vm shutdown --yes --wait testX
_pvc vm start testX
sleep 30
_pvc vm stop --yes testX
_pvc vm disable testX
_pvc vm undefine --yes testX
_pvc vm define --target hv3 --tag pvc-test ${vm_tmp}
_pvc vm start testX
sleep 30
_pvc vm restart --yes --wait testX
sleep 30
_pvc vm migrate --wait testX
sleep 5
_pvc vm unmigrate --wait testX
sleep 5
_pvc vm move --wait --target hv1 testX
sleep 5
_pvc vm meta testX --limit hv1 --selector vms --method live --profile test --no-autostart
_pvc vm tag add testX mytag
_pvc vm tag get testX
_pvc vm list --tag mytag
_pvc vm tag remove testX mytag
_pvc vm network get testX
_pvc vm vcpu set testX 4
_pvc vm vcpu get testX
_pvc vm memory set testX 4096
_pvc vm memory get testX
_pvc vm vcpu set testX 2
_pvc vm memory set testX 2048 --restart --yes
sleep 5
_pvc vm list testX
_pvc vm info --long testX
rm ${vm_tmp} || true
# Node tests
_pvc node primary --wait hv1
sleep 10
_pvc node secondary --wait hv1
sleep 10
_pvc node primary --wait hv1
sleep 10
_pvc node flush --wait hv1
_pvc node ready --wait hv1
_pvc node list hv1
_pvc node info hv1
# Network tests
_pvc network add 10001 --description testing --type managed --domain testing.local --ipnet 10.100.100.0/24 --gateway 10.100.100.1 --dhcp --dhcp-start 10.100.100.100 --dhcp-end 10.100.100.199
sleep 5
_pvc vm network add --restart --yes testX 10001
sleep 30
_pvc vm network remove --restart --yes testX 10001
sleep 5
_pvc network acl add 10001 --in --description test-acl --order 0 --rule "'ip daddr 10.0.0.0/8 counter'"
_pvc network acl list 10001
_pvc network acl remove --yes 10001 test-acl
_pvc network dhcp add 10001 10.100.100.200 test99 12:34:56:78:90:ab
_pvc network dhcp list 10001
_pvc network dhcp remove --yes 10001 12:34:56:78:90:ab
_pvc network modify --domain test10001.local 10001
_pvc network list
_pvc network info --long 10001
# Network-VM interaction tests
_pvc vm network add testX 10001 --model virtio --restart --yes
sleep 30
_pvc vm network get testX
_pvc vm network remove testX 10001 --restart --yes
sleep 5
_pvc network remove --yes 10001
# Storage tests
_pvc storage status
_pvc storage util
_pvc storage osd set noout
_pvc storage osd out 0
_pvc storage osd in 0
_pvc storage osd unset noout
_pvc storage osd list
_pvc storage pool add testing 64 --replcfg "copies=3,mincopies=2"
sleep 5
_pvc storage pool list
_pvc storage volume add testing testX 1G
_pvc storage volume resize testing testX 2G
_pvc storage volume rename testing testX testerX
_pvc storage volume clone testing testerX testerY
_pvc storage volume list --pool testing
_pvc storage volume snapshot add testing testerX asnapshotX
_pvc storage volume snapshot rename testing testerX asnapshotX asnapshotY
_pvc storage volume snapshot list
_pvc storage volume snapshot remove --yes testing testerX asnapshotY
# Storage-VM interaction tests
_pvc vm volume add testX --type rbd --disk-id sdh --bus scsi testing/testerY --restart --yes
sleep 30
_pvc vm volume get testX
_pvc vm volume remove testX testing/testerY --restart --yes
sleep 5
_pvc storage volume remove --yes testing testerY
_pvc storage volume remove --yes testing testerX
_pvc storage pool remove --yes testing
# Remove the VM
_pvc vm stop --yes testX
_pvc vm remove --yes testX
time_end=$(date +%s)
echo
echo "Completed PVC functionality tests against cluster ${test_cluster} in $(( ${time_end} - ${time_start} )) seconds."