Compare commits

392 Commits

Author SHA1 Message Date
4aa6a65e6c Work around strange Python anomaly
Apparently, `True` is both an instance of `int` and `bool`, which is a
change and is very strange. Instead flip the conditional here.
2023-08-17 09:55:19 -04:00
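For commit 4aa6a65e6c above: in Python, `bool` is a subclass of `int`, so an `isinstance(value, int)` test also accepts `True` and `False`. The commit does not show the conditional it flipped; the sketch below is only a hypothetical illustration (the `describe` helper is not from the repository) of testing for `bool` before `int`.

```python
def describe(value):
    """Label a value, checking bool before int (hypothetical helper)."""
    # bool is a subclass of int, so isinstance(True, int) is True.
    # Checking for bool first keeps True and False out of the int branch.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    return "other"


print(describe(True), describe(1), describe("1"))
```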
81e16e99f6 Correct entrypoint for CLI package 2023-08-17 00:27:45 -04:00
bcabfa9d70 Update linting options for new CLI client 2023-08-16 23:55:44 -04:00
2dc2055cfa Move new CLI client into place 2023-08-16 23:55:27 -04:00
5bd2bd468a Move old CLI client out of the way 2023-08-16 23:54:51 -04:00
3ed60ac1c1 Add provisioner formatters 2023-08-16 23:48:56 -04:00
362d65c011 Add storage formatters 2023-08-16 22:46:13 -04:00
561cb8e465 Add network formatters 2023-08-10 00:58:36 -04:00
7c64f153a1 Add formatters for Node and VM, fix handling 2023-08-09 13:13:03 -04:00
865742c906 Add provisioner management commands
TODO: Add proper new formatters as required
2023-08-09 11:44:43 -04:00
8d479b4068 Add storage management commands
TODO: Add proper new formatters as required
2023-08-09 10:51:44 -04:00
96fffa42c4 Add network management commands
TODO: Add proper new formatters as required
2023-07-03 00:18:07 -04:00
eca726f7a5 Add VM management commands
TODO: Add proper new formatters as required
2023-07-02 01:03:09 -04:00
070a57df99 Fix key display and add stubs 2023-07-01 21:51:46 -04:00
e3777ff00c Add node management commands 2023-05-05 02:10:02 -04:00
d92ada623a Normalize return messages for node commands 2023-05-04 17:02:46 -04:00
a2df4e5662 Port cluster management functions 2023-05-04 03:04:10 -04:00
25f7d4c807 Initial work on new CLI client rewrite
1. lib copied verbatim from existing client
2. initial reworking of Click to split logic from Click definitions
2023-05-02 17:28:52 -04:00
458603bcde Move cli_lib to lib directory 2023-05-01 13:43:54 -04:00
71a4c28d6e Another slight wording tweak 2023-05-01 11:03:58 -04:00
1cee68e03d Reword the sections to add clarity 2023-05-01 10:59:23 -04:00
5bec9363d4 Add a bit of shade 2023-05-01 10:56:42 -04:00
13f1291970 Add another reference to Ganeti and Harvester 2023-05-01 10:54:42 -04:00
3fa111aba5 Bump version to 0.9.63 2023-04-28 14:47:04 -04:00
5298cd19f0 Improve size handling during volume add/resize 2023-04-28 12:16:16 -04:00
17bdb82670 Add full/nearfull OSD health detection 2023-04-28 11:33:39 -04:00
aeaf388933 Add *.update-* obsolete configs to dpkg plugin 2023-04-10 15:39:40 -04:00
6118278427 Mention Ganeti in the docs 2023-03-19 21:23:21 -04:00
2ae303f8bb Increase timeout for connections to API 2023-03-14 09:19:13 -04:00
2af217ced1 Use try when watching health value in NodeInstance 2023-03-07 09:53:01 -05:00
50385deb2a Bump IPMI timeout to 2 seconds 2023-03-07 09:25:27 -05:00
6ac4b7a54e Adjust keepalive health printing and ordering 2023-02-24 11:08:30 -05:00
faa96ff6c4 Correct error handling if monitoring plugins fail 2023-02-24 10:19:41 -05:00
d66e33041e Add documentation details about plugin logging 2023-02-23 22:24:07 -05:00
8dfd6c4d50 Fix bug with SMART info 2023-02-23 13:21:23 -05:00
ad0273f5ae Set timeout on IPMI command 2023-02-23 11:10:09 -05:00
ef0b325ba0 Fix ZK check location 2023-02-23 11:04:02 -05:00
91ee397ed8 Adjust the main location too 2023-02-23 10:32:31 -05:00
adfb2da7d2 Show possible version minimum 2023-02-23 10:30:45 -05:00
1624af7c3f Handle old clusters in cluster detail list 2023-02-23 10:28:55 -05:00
93c24faf9b Better handle N/A health from old versions 2023-02-23 10:22:00 -05:00
5b853feb8e Correct bad health text call for old clusters 2023-02-23 10:19:18 -05:00
b90d0729c4 Fix status when connecting to old clusters 2023-02-23 10:16:29 -05:00
38ff55556f Set maintenance colour in cluster detail 2023-02-22 18:20:18 -05:00
646785b7f8 Bump version to 0.9.62 2023-02-22 18:13:45 -05:00
f9f3bbbb3d Merge branch 'revamp-health'
Add detailed health checking, status reporting, and enhancements to the
PVC system.

Closes #161 #154 #159
2023-02-22 18:12:35 -05:00
6561ca6f75 Add cluster detail list
Adds a command to show a list of details including health and item
counts for all configured clusters in the client.
2023-02-22 18:09:11 -05:00
0614e133fe Lower default connect timeout to 1s 2023-02-22 18:09:01 -05:00
879a844f28 Add PVC version to cluster status output 2023-02-22 16:09:24 -05:00
74f894913d Add additional plugins to manual 2023-02-22 15:02:08 -05:00
8a403e6a20 Add IPMI monitoring check 2023-02-22 15:02:08 -05:00
a9e7713abf Add health delta change to message output 2023-02-22 15:02:08 -05:00
0f3cd13da1 Fix bad string value for message 2023-02-22 15:02:08 -05:00
1451c480dc Use consistent connection with other checks 2023-02-22 15:02:08 -05:00
137b3010f2 Add Libvirtd monitoring check 2023-02-22 15:02:08 -05:00
e15b4f14ec Add Zookeeper monitoring check 2023-02-22 15:02:08 -05:00
e9e9d50ff6 Add PostgreSQL monitoring check 2023-02-22 15:02:08 -05:00
6fd341501b Adjust comment message 2023-02-22 15:02:08 -05:00
dcd7ac066c Correct lint error E741 2023-02-22 12:21:29 -05:00
da7394a8de Adjust Munin threshold values 2023-02-22 10:42:43 -05:00
8699c291ac Add documentation about new health and plugins 2023-02-22 01:40:48 -05:00
109654ba77 Remove obsolete LINKSPEED variable 2023-02-22 01:04:25 -05:00
ba6cb1371e Adjust health delta of load to 50
This is a very bad situation and should be critical.
2023-02-22 01:03:12 -05:00
8896c6914c Adjust health delta of EDAC Uncorrected to 50
This is a very bad situation and should be critical.
2023-02-22 01:01:54 -05:00
73e04ad2aa Add last item to swagger doc 2023-02-22 00:25:27 -05:00
6f5aecfa22 Add plugin directory and plugin details log fields 2023-02-22 00:19:05 -05:00
c834a3e9c8 Update API specification 2023-02-22 00:06:52 -05:00
a40de4b7f8 Update readme for Munin plugin 2023-02-18 00:00:04 -05:00
55f0aae2a7 Fix typo in var and flip conditional 2023-02-17 16:18:42 -05:00
f04f816e1b Fix various issues with PVC Munin plugin 2023-02-17 15:41:16 -05:00
3f9c1c735b Flip VM state condition to remove shutdown
Don't cause health degradation for shutdown state, and flip the list
around to make it clearer.
2023-02-16 20:32:33 -05:00
396f424f80 Update Munin plugin example 2023-02-16 16:06:00 -05:00
529e6d6878 Add CheckMK monitoring example plugins 2023-02-16 16:05:47 -05:00
75639c17d9 Format cluster health like node healths
Make a cleaner construct here.
2023-02-16 12:33:36 -05:00
3c6c33a326 Exclude monitoring examples from flake8 2023-02-16 12:33:18 -05:00
25d0fde5e4 Add JSON output format for node info 2023-02-15 21:35:44 -05:00
4ab0bdd9e8 Disallow health less than 0 2023-02-15 16:50:24 -05:00
21965d280c Fix comparison in maintenance check 2023-02-15 16:47:31 -05:00
3408e27355 Add per-node health entries for 3rd party checks 2023-02-15 16:44:49 -05:00
fa900f6212 Fix bugs and formatting of health messages 2023-02-15 16:28:56 -05:00
b236127dba Remove extra text from packages plugin 2023-02-15 16:28:41 -05:00
0ae77d7e77 Fix linting of cluster.py file 2023-02-15 15:48:31 -05:00
8b5011c266 Move Ceph health to global cluster health 2023-02-15 15:46:13 -05:00
6ac5b0d02f Modify cluster health to use new values 2023-02-15 15:45:43 -05:00
3a1b8f0e7a Add JSON health to cluster data 2023-02-15 15:26:57 -05:00
f6bea50a0a Add disk monitoring plugin 2023-02-15 11:30:49 -05:00
fc16e26f23 Run setup during plugin loads 2023-02-15 10:11:38 -05:00
8aa74aae62 Use percentage in keepalive output 2023-02-15 01:56:02 -05:00
265e1e29d7 Improve ethtool parsing speeds 2023-02-14 15:49:58 -05:00
c6a8c6d39b Add NIC monitoring plugin 2023-02-14 15:43:52 -05:00
8e6632bf10 Adjust text on log message 2023-02-13 22:21:23 -05:00
96d3aff7ad Add logging flag for monitoring plugin output 2023-02-13 22:04:39 -05:00
134f59f9ee Flip condition in EDAC check 2023-02-13 21:58:56 -05:00
54373c5bec Fix bugs if plugins fail to load 2023-02-13 21:51:48 -05:00
7378affcb5 Add EDAC check plugin 2023-02-13 21:43:13 -05:00
8df189aa22 Fix several bugs and optimize output 2023-02-13 16:36:15 -05:00
af436a93cc Set node health to None when restarting 2023-02-13 15:54:46 -05:00
edb3aea990 Add node health value and send out API 2023-02-13 15:53:39 -05:00
4d786c11e3 Move Ceph cluster health reporting to plugin
Also removes several outputs from the normal keepalive that were
superfluous/static so that the main output fits on one line.
2023-02-13 13:29:40 -05:00
25f3faa08f Move Ceph cluster health reporting to plugin
Also removes several outputs from the normal keepalive that were
superfluous/static so that the main output fits on one line.
2023-02-13 12:13:56 -05:00
3ad6ff2d9c Initial implementation of monitoring plugin system 2023-02-13 12:06:26 -05:00
c7c47d9f86 Bump version to 0.9.61 2023-02-08 10:08:05 -05:00
3c5a5f08bc Allow rename in disable state 2023-01-30 11:48:43 -05:00
59b2dbeb5e Remove bad casting to int in string compare 2023-01-01 13:55:10 -05:00
0b8d26081b Bump version to 0.9.60 2022-12-06 15:42:55 -05:00
f076554b15 Disable RBD caching by default
Results in a massive (~2x) performance boost for random block I/O inside
VMs, and thus a worthwhile default change.
2022-12-05 17:56:59 -05:00
35f5219916 Fix bad ref in example scripts 2022-11-18 12:54:28 -05:00
f7eaa11a5f Update description 2022-11-16 22:48:40 -05:00
924a0b22ec Fix up remaining bugs in Rinse test script 2022-11-16 13:32:24 -05:00
6a5f54d169 Ensure transient dirs are cleaned up 2022-11-16 13:01:15 -05:00
7741400370 Ensure swap is skipped during cleanup too 2022-11-16 12:52:24 -05:00
5eafa475b9 Skip swap volumes during mounting 2022-11-16 12:42:28 -05:00
f3ba4b6294 Bump version to 0.9.59 2022-11-15 15:50:15 -05:00
faf9cc537f Flip behaviour of memory selectors
It didn't make any sense to me for mem(prov) to be the default selector,
since this has too many caveats versus mem(free). Switch to using
mem(free) as the default (i.e. "mem") and make memprov the alternative.
2022-11-15 15:45:59 -05:00
a28df75a5d Bump version to 0.9.58 2022-11-07 12:27:48 -05:00
13dab7a285 Remove extra lower() call where not needed 2022-11-07 12:26:50 -05:00
f89dbe802e Ensure equality of none and None for selector 2022-11-07 11:59:53 -05:00
d63e80675a Bump version to 0.9.57 2022-11-06 01:39:50 -04:00
263f3570ab Add module tag for daemon lib 2022-11-04 03:47:18 -04:00
90f9336041 Make benchmarker function as a module
1. Move the test_matrix, volume name, and size to module-level variables
so they can be accessed externally if this is imported.
2. Separate the volume creation and volume cleanup into functions.
3. Separate the individual benchmark runs into a function.

This should enable easier calling of the various subcomponents
externally, e.g. for external benchmark scripts.
2022-11-03 21:33:32 -04:00
5415985ed2 Better handle invalid nets in VMs
1. Error out when trying to add a new network to a VM if the network
doesn't exist on the cluster.
2. When showing the VM list, only show invalid networks in red, not the
whole list.
2022-11-01 10:24:24 -04:00
3384f24ef5 Remove VXLAN ref where it isn't correct 2022-11-01 09:40:13 -04:00
ef3c22d793 Bump version to 0.9.56 2022-10-27 14:21:04 -04:00
078f85b431 Add node autoready oneshot unit
This replicates some of the more important functionality of the defunct
pvc-flush.service unit. On presence of a trigger file (i.e.
/etc/pvc/autoready), it will trigger a "node ready" on boot. It does
nothing on shutdown as this must be handled by other mechanisms, though
a similar autoflush could be added as well.
2022-10-27 14:09:14 -04:00
bfb363c459 Ensure None filesystem is valid 2022-10-21 15:13:52 -04:00
13e6a0f0bd Move /dev umount to cleanup step 2022-10-21 14:47:48 -04:00
c1302cf8b6 Adjust help message text 2022-10-21 14:22:15 -04:00
9358949991 Add ova as valid name in addition to default_ova 2022-10-21 14:13:40 -04:00
cd0b8c23e6 Fix console config and domain argument 2022-10-21 14:04:17 -04:00
fb30263a41 Add cloud-init configuration to debootstrap script
Prevents errors trying to find the cloud-init metadata source.
2022-10-21 14:03:34 -04:00
172e3627d4 Add pfsense example provisioner script 2022-10-21 13:35:48 -04:00
53ffe6cd55 Include /proc in chroot mounts 2022-10-20 15:00:10 -04:00
df6e11ae7a Properly handle missing source_volume from OVAs 2022-10-19 13:18:12 -04:00
de2135db42 Add missing ceph import 2022-10-19 13:10:40 -04:00
72e093c2c4 Move conversion to install() step
Seems more clear to me than doing it in prepare()
2022-10-19 13:09:29 -04:00
60e32f7795 Add missing imports 2022-10-19 13:07:34 -04:00
23e7d84f53 Add output messages during OVA prepare 2022-10-19 12:58:11 -04:00
dd81594f26 Fix bad comparison 2022-10-19 12:46:15 -04:00
0d09f5d089 Remove reference to automatic upload of OVA script 2022-10-19 03:37:12 -04:00
365c70e873 Add missing flag 2022-10-19 03:34:37 -04:00
4f7e2fe146 Fix wording of initial script paragraphs 2022-10-19 03:27:14 -04:00
77f49654b9 Fix missing f-string marker 2022-10-15 16:26:47 -04:00
c158e4e0f5 Use own domain for docs links 2022-10-08 21:12:59 -04:00
31a5c8801f Add rinse example configuration
Provisions Rocky Linux 8 and 9 systems, and potentially older
CentOS/Fedora/Scientific Linux/SuSE systems. Depends on a custom build
of rinse (3.7.1) with Rocky 9 support.
2022-10-07 19:55:56 -04:00
0a4e4c7048 Add host-model to CPU config in VMs 2022-10-07 09:36:22 -04:00
de97f2f476 Add output message to debootstrap install 2022-10-07 02:27:20 -04:00
165ce15dfe Fix braces in fstring example 2022-10-06 15:57:31 -04:00
a81d419a2e Update copyright header year 2022-10-06 11:55:27 -04:00
85a7088e5a Fix titles 2022-10-06 11:54:36 -04:00
b58fa06f67 Add OVA script support
1. Ensure that system_template and script are not nullable in the DB.
2. Ensure that the CLI and API enforce the above and clean up CLI
arguments for profile add.
3. Ensure that, before uploading OVAs, a 'default_ova' provisioning
script is present.
4. Use the 'default_ova' script for new OVA uploads.
5. Ensure that OVA details are properly added to the vm_data dict in the
provisioner vmbuilder.
2022-10-06 10:48:12 -04:00
3b3d2e7f7e Reverse numbering of example scripts 2022-10-06 10:14:37 -04:00
72a5de800c Complete OVA provisioning script 2022-10-06 10:14:04 -04:00
f450d1d313 Remove lingering OVA references 2022-10-06 00:13:36 -04:00
2db58488a2 Update documentation to reflect script changes 2022-10-06 00:06:02 -04:00
1bbf8f6bf6 Reorganize and add more comments to examples 2022-10-05 23:35:53 -04:00
191f8780c9 Fix remaining bugs in example scripts 2022-10-05 22:37:11 -04:00
80c1f78864 Ensure inner cleanup and end message response 2022-10-05 22:36:42 -04:00
c8c0987fe7 Fix bad variable reference 2022-10-05 17:43:23 -04:00
67560c6457 Add additional import for config 2022-10-05 17:41:37 -04:00
79c9eba28c Add better exception handling with ctxtmgrs 2022-10-05 17:35:05 -04:00
36e924d339 Add additional missing import in examples 2022-10-05 17:29:34 -04:00
aeb1443410 Improve error messages 2022-10-05 17:26:09 -04:00
eccd2a98b2 Fix bad ref in examples 2022-10-05 17:25:56 -04:00
6e2c1fb45e Add proper imports to examples 2022-10-05 17:22:04 -04:00
b14ba9172c Better handle cleanups and fix chroot bug 2022-10-05 17:21:30 -04:00
e9235a627c Implement new provisioner setup 2022-10-05 16:03:05 -04:00
c84ee0f4f1 Bump version to 0.9.55 2022-10-04 13:21:40 -04:00
76c51460b0 Avoid raise/handle deadlocks
Can cause log flooding in some edge cases and isn't really needed any
longer. Use a proper conditional followed by an actual error handler.
2022-10-03 14:04:12 -04:00
6ed37f5b4a Try a literal eval first
This is a breakage between the older version of Celery (Deb10) and
newer. The hard removal broke Deb10 instances.

So try that first, and on failure, assume newer Celery format.
2022-09-06 10:34:50 -04:00
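For commit 6ed37f5b4a above, a minimal sketch of the "try a literal eval first" pattern. It assumes the older format delivers a stringified Python literal while the newer one delivers the object directly; the commit does not state this, and the `parse_task_args` name is hypothetical.

```python
import ast


def parse_task_args(raw):
    """Try ast.literal_eval first; on failure, return the value unchanged."""
    try:
        # Assumed older format: a string such as "['vm1', 2]".
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError, TypeError):
        # Assumed newer format: already a Python object, or not a literal.
        return raw


print(parse_task_args("['vm1', 2]"))
print(parse_task_args(["vm1", 2]))
```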
4b41ee2817 Bump version to 0.9.54 2022-08-23 11:01:05 -04:00
dc36c40690 Use proper SSLContext and enable TLSv1
It's bad, but sometimes you need to access the API from a very old
software version. So just enable it for now and clean it up later.
2022-08-23 10:58:47 -04:00
459b16386b Fix bad variable name 2022-08-18 11:37:57 -04:00
6146b062d6 Bump version to 0.9.53 2022-08-12 17:47:11 -04:00
74193c7e2a Actually fix VM sorting
Due to the executor the previous attempt did not work.
2022-08-12 17:46:29 -04:00
73c1ac732e Bump version to 0.9.52 2022-08-12 11:09:25 -04:00
58dd5830eb Add additional kb_ values to OSD stats
Allows for easier parsing later to get e.g. % values and more details on
the used amounts.
2022-08-11 11:06:36 -04:00
90e515c46f Always sort VM list
Same justification as previous commit.
2022-08-09 12:05:40 -04:00
a6a5f71226 Ensure the node list is sorted
Otherwise the node entries could come back in an arbitrary order. Since
this is an ordered list of dictionaries, that might not be expected by
the API consumers, so ensure it's always sorted.
2022-08-09 12:03:49 -04:00
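For commit a6a5f71226 above, a minimal sketch of returning a list of dictionaries in a stable order. The `name` key and the `sort_nodes` helper are assumptions for illustration; the commit does not say which field the list is sorted on.

```python
def sort_nodes(nodes):
    """Return the node dictionaries ordered by their (assumed) 'name' key."""
    return sorted(nodes, key=lambda node: node["name"])


nodes = [{"name": "hv3"}, {"name": "hv1"}, {"name": "hv2"}]
print([node["name"] for node in sort_nodes(nodes)])
```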
60a3ef1604 Add reference to bootstrap in index 2022-08-03 20:22:16 -04:00
95807b23eb Add missing cluster_req for vm modify 2022-08-02 10:02:26 -04:00
5ae430e1c5 Bump version to 0.9.51 2022-07-25 23:25:41 -04:00
4731faa2f0 Remove pvc-flush service
This service caused more headaches than it was worth, so remove it.

The original goal was to cleanly flush nodes on shutdown and unflush
them on startup, but this is tightly controlled by Ansible playbooks at
this point, and this is something best left to the Administrator and
their particular situation anyways.
2022-07-25 23:21:34 -04:00
42f4907dec Add confirmation to disable command 2022-07-21 16:43:37 -04:00
02168a5ecf Remove faulty literal_eval 2022-07-18 13:35:15 -04:00
8cfcd02ac2 Fix bad changelog entries 2022-07-06 16:57:55 -04:00
e464dcb483 Bump version to 0.9.50 2022-07-06 16:01:14 -04:00
27214c8190 Fix bug with space-containing detect strings 2022-07-06 15:58:57 -04:00
f78669a175 Add selector help and adjust flag name
1. Add documentation on the node selector flags. In the API, reference
the daemon configuration manual which now includes details in this
section; in the CLI, provide the help in "pvc vm define" in detail and
then reference that command's help in the other commands that use this
field.

2. Ensure the naming is consistent in the CLI, using the flag name
"--node-selector" everywhere (was "--selector" for "pvc vm" commands and
"--node-selector" for "pvc provisioner" commands).
2022-06-10 02:42:06 -04:00
00a4a01517 Add memfree to selector and use proper defaults 2022-06-10 02:03:12 -04:00
a40a69816d Add migration selector via free memory
Closes #152
2022-05-18 03:47:16 -04:00
baf5a132ff Bump version to 0.9.49 2022-05-06 15:49:39 -04:00
584cb95b8d Use consistent language for primary mode
I didn't call it "router" anywhere else, but the state in the list is
called "coordinator" so, call it "coordinator mode".
2022-05-06 15:40:52 -04:00
21bbb0393f Add support for replacing/refreshing OSDs
Adds commands to both replace an OSD disk, and refresh (reimport) an
existing OSD disk on a new node. This handles the cases where an OSD
disk should be replaced (either due to upgrades or failures) or where a
node is rebuilt in-place and an existing OSD must be re-imported to it.

This should avoid the need to do a full remove/add sequence for either
case.

Also cleans up some aspects of OSD removal that are identical between
methods (e.g. using safe-to-destroy and sleeping after stopping) and
fixes a bug if an OSD does not truly exist when the daemon starts up.
2022-05-06 15:32:06 -04:00
d18e009b00 Improve handling of rounded values 2022-05-02 15:29:30 -04:00
1f8f3252a6 Fix bug with initial JSON for stats 2022-05-02 13:28:19 -04:00
b47c9832b7 Refactor OSD removal to use new ZK data
With the OSD LVM information stored in Zookeeper, we can use this to
determine the actual block device to zap rather than relying on runtime
determination and guesstimation.
2022-05-02 12:52:22 -04:00
d2757004db Store additional OSD information in ZK
Ensures that information like the FSIDs and the OSD LVM volume are
stored in Zookeeper at creation time and updated at daemon start time
(to ensure the data is populated at least once, or if the /dev/sdX
path changes).

This will allow safer operation of OSD removals and the potential
implementation of re-activation after node replacements.
2022-05-02 12:11:39 -04:00
7323269775 Ensure initial OSD stats is populated
Values are all invalid but this ensures the client won't error out when
trying to show an OSD that has never checked in yet.
2022-04-29 16:50:30 -04:00
85463f9aec Bump version to 0.9.48 2022-04-29 15:03:52 -04:00
19c37c3ed5 Fix bugs with forced removal 2022-04-29 14:03:07 -04:00
7d2ea494e7 Ensure unresponsive OSDs still display in list
It is still useful to see such dead OSDs even if they've never checked
in or have not checked in for quite some time.
2022-04-29 12:11:52 -04:00
cb50eee2a9 Add OSD removal force option
Ensures a removal can continue even in situations where some step(s)
might fail, for instance removing an obsolete OSD from a replaced node.
2022-04-29 11:16:33 -04:00
f3f4eaadf1 Use a singular configured cluster by default
If there is...
  1. No '--cluster' passed, and
  2. No 'local' cluster, and
  3. There is exactly one cluster configured
...then use that cluster by default in the CLI.
2022-01-13 18:36:20 -05:00
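The three conditions in commit f3f4eaadf1 above can be sketched as a small selection function. The function name, its arguments, and the `None` fallback when no rule applies are assumptions; the commit only describes the single-cluster default.

```python
def pick_cluster(cli_cluster, configured):
    """Choose a cluster name from the '--cluster' value and configured names."""
    if cli_cluster is not None:
        # An explicit '--cluster' always wins.
        return cli_cluster
    if "local" in configured:
        # Without '--cluster', a 'local' cluster is preferred when present.
        return "local"
    if len(configured) == 1:
        # Exactly one configured cluster: use it by default.
        return next(iter(configured))
    # Assumption: nothing can be chosen automatically in any other case.
    return None


print(pick_cluster(None, ["prod"]))
```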
313a5d1c7d Bump version to 0.9.47 2021-12-28 22:03:08 -05:00
b6d689b769 Add pool PGs count modification
Allows an administrator to adjust the PG count of a given pool. This can
be used to increase the PGs (for example after adding more OSDs) or
decrease it (to remove OSDs, reduce CPU load, etc.).
2021-12-28 21:53:29 -05:00
a0fccf83f7 Add PGs count to pool list 2021-12-28 21:12:02 -05:00
46896c593e Fix issue if pool stats have not updated yet 2021-12-28 21:03:10 -05:00
02138974fa Add device class tiers to Ceph pools
Allows specifying a particular device class ("tier") for a given pool,
for instance SSD-only or NVMe-only. This is implemented with Crush
rules on the Ceph side, and via an additional new key in the pool
Zookeeper schema which is defaulted to "default".
2021-12-28 20:58:15 -05:00
c3d255be65 Bump version to 0.9.46 2021-12-28 15:02:14 -05:00
45fc8a47a3 Allow single-node clusters to restart and timeout
Prevents a daemon from waiting forever to terminate if it is primary,
and avoids this entirely if there is only a single node in the cluster.
2021-12-28 03:06:03 -05:00
07f2006f68 Fix bug when removing OSDs
Ensure the OSD is down as well as out or purge might fail.
2021-12-28 03:05:34 -05:00
f4c7fdffb8 Handle detect strings as arguments for blockdevs
Allows specifying blockdevs in the OSD and OSD-DB addition commands as
detect strings rather than actual block device paths. This provides
greater flexibility for automation with pvcbootstrapd (which originates
the concept of detect strings) and in general usage as well.
2021-12-28 02:53:02 -05:00
be1b67b8f0 Allow bypassing confirm message for benchmarks 2021-12-23 21:00:42 -05:00
d68f6a945e Add auditing to local syslog from PVC client
This ensures that any client command is logged by the local system.
Helps ensure Accounting for users of the CLI. Currently logs the full
command executed along with the $USER environment variable contents.
2021-12-10 16:17:33 -05:00
c776aba8b3 Standardize fuzzy matching and use fullmatch
Solves two problems:

1. How match fuzziness was used was very inconsistent; make them all the
same, i.e. "if is_fuzzy and limit, apply .* to both sides".

2. Use re.fullmatch instead of re.match to ensure exact matching of the
regex to the value. Without fuzziness, this would sometimes cause
inconsistent behavior, for instance if a limit was non-fuzzy "vm",
expecting to match the actual "vm", but also matching "vm1" too.
2021-12-06 16:35:29 -05:00
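The two points in commit c776aba8b3 above can be illustrated with the standard `re` module. This is a sketch under assumptions: the `limit_matches` helper and its treatment of an empty limit are hypothetical, and only the `.*` wrapping and the use of `re.fullmatch` come from the commit message.

```python
import re


def limit_matches(limit, value, is_fuzzy=True):
    """Return True if value passes the limit (hypothetical helper)."""
    if not limit:
        # Assumption: no limit means nothing is filtered out.
        return True
    if is_fuzzy:
        # Fuzzy limits get .* on both sides so they match anywhere in value.
        limit = ".*" + limit + ".*"
    # fullmatch requires the whole value to match; re.match only anchors
    # at the start, so a non-fuzzy "vm" would also accept "vm1".
    return re.fullmatch(limit, value) is not None


print(limit_matches("vm", "vm1", is_fuzzy=False))
print(limit_matches("vm", "vm1", is_fuzzy=True))
```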
2461941421 Remove "and started" from message text
This is not necessarily the case.
2021-11-29 16:42:26 -05:00
68954a79ec Fix bug with cloned image sizes 2021-11-29 14:56:50 -05:00
a2fa6ed450 Fix bugs with legacy benchmark format 2021-11-26 11:42:35 -05:00
02a2f6a27a Bump version to 0.9.45 2021-11-25 09:34:20 -05:00
a75b951605 Ensure echo always has an argument 2021-11-25 09:33:26 -05:00
658e80350f Fix ordering of pvcnoded unit
We want to be after network.target and want network-online.target
2021-11-18 16:56:49 -05:00
3aa20fbaa3 Bump version to 0.9.44 2021-11-11 16:20:38 -05:00
6d101df1ff Add Munin plugin for Ceph utilization 2021-11-08 15:21:09 -05:00
be6a3992c1 Add 0.05s to connection timeout
This is recommended by the Python Requests documentation:

> It’s a good practice to set connect timeouts to slightly larger than a
  multiple of 3, which is the default TCP packet retransmission window.
2021-11-08 03:11:41 -05:00
d76da0f25a Use separate connect and data timeouts
This allows us to keep a very low connect timeout of 3 seconds, but also
ensure that long commands (e.g. --wait or VM disable) can take as long
as the API requires to complete.

Avoids having to explicitly set very long single-instance timeouts for
other functions which would block forever on an unreachable API.
2021-11-08 03:10:09 -05:00
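Commits be6a3992c1 and d76da0f25a above describe separate connect and data timeouts. Requests accepts `timeout` as a `(connect, read)` tuple, where a read value of `None` waits indefinitely. The sketch below builds such a tuple; the helper name, the `long_running` flag, and the default read timeout are assumptions, not the client's actual code.

```python
CONNECT_TIMEOUT = 3.05  # seconds; slightly above a multiple of 3


def request_timeout(long_running=False, read_timeout=3.05):
    """Build a (connect, read) timeout tuple for Requests (hypothetical helper)."""
    # A read timeout of None lets long commands take as long as the API needs,
    # while the connect timeout still fails fast on an unreachable API.
    return (CONNECT_TIMEOUT, None if long_running else read_timeout)


# Usage, assuming the requests library is installed:
#   requests.get(url, timeout=request_timeout(long_running=True))
print(request_timeout(long_running=True))
```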
bc722ce9b8 Fix quote in sed for unstable deb build 2021-11-08 02:54:27 -05:00
7890c32c59 Add sudo to deploy-package task 2021-11-08 02:41:10 -05:00
6febcfdd97 Bump version to 0.9.43 2021-11-08 02:29:17 -05:00
11d8ce70cd Fix sed commands after Black formatting change 2021-11-08 02:29:05 -05:00
a17d9439c0 Remove references to Ansible manual 2021-11-08 00:29:47 -05:00
9cd02eb148 Remove Ansible and Testing manuals
The Ansible manual can't keep up with the other repo, so it should live
there instead (eventually, after significant rewrites).

The Testing page is obsoleted by the "test-cluster" script.
2021-11-08 00:25:27 -05:00
459485c202 Allow American spelling for compatibility 2021-11-08 00:09:59 -05:00
9f92d5d822 Shorten help messages slightly to fit 2021-11-08 00:07:21 -05:00
947ac561c8 Add forced colour support
Allows preserving colour within e.g. watch, where Click would normally
determine that it is "not a terminal". This is done via the wrapper echo
which filters via the local config.
2021-11-08 00:04:20 -05:00
ca143c1968 Add funding configuration 2021-11-06 18:05:17 -04:00
6e110b178c Add start delineators to command output 2021-11-06 13:35:30 -04:00
d07d37d08e Revamp formatting and linting on commit
Remove the prepare script, and run the two stages manually. Better
handle Black reformatting by doing a check (for the errcode) then
reformat and abort commit to review.
2021-11-06 13:34:33 -04:00
0639b16c86 Apply more granular timeout formatting
We don't need to wait forever if a state change is neither wait-flagged
nor a disable (which does a shutdown before returning).
2021-11-06 13:34:03 -04:00
1cf8706a52 Up timeout when setting VM state
Ensures the API won't time out immediately especially during a
wait-flagged or disable action.
2021-11-06 04:15:10 -04:00
dd8f07526f Use positive check rather than negative
Ensure the VM is started before doing shutdown/stop, rather than being
stopped. Prevents overwrite of existing disable state and other
weirdness.
2021-11-06 04:08:33 -04:00
5a5e5da663 Add disable forcing to CLI
References #148
2021-11-06 04:02:50 -04:00
739b60b91e Perform automatic shutdown/stop on VM disable
Instead of requiring the VM to already be stopped, instead allow disable
state changes to perform a shutdown first. Also add a force option which
will do a hard stop instead of a shutdown.

References #148
2021-11-06 03:57:24 -04:00
16544227eb Reformat recent changes with Black 2021-11-06 03:27:07 -04:00
73e3746885 Fix linting error F541 f-string placeholders 2021-11-06 03:26:03 -04:00
66230ce971 Fix linting errors F522/F523 unused args 2021-11-06 03:24:50 -04:00
fbfbd70461 Rename build-deb.sh to build-stable-deb.sh
Unifies the naming with the other build-unstable-deb.sh script.
2021-11-06 03:18:58 -04:00
2506098223 Remove obsolete gitlab-ci config 2021-11-06 03:18:22 -04:00
83e887c4ee Ensure all helper scripts pushd/popd
Make sure all of these move to the root of the repository first, then
return to where they were afterwards, using pushd/popd. This allows them
to be executed from anywhere in the repo.
2021-11-06 03:17:47 -04:00
4eb0f3bb8a Unify formatting and linting
Ensures optimal formatting in addition to linting during manual deploys
and during pre-commit actions.
2021-11-06 03:10:17 -04:00
adc767e32f Add newline to start of lint 2021-11-06 03:04:14 -04:00
2083fd824a Reformat code with Black code formatter
Unify the code style along PEP and Black principles using the tool.
2021-11-06 03:02:43 -04:00
3aa74a3940 Add safe mode to Black 2021-11-06 02:59:54 -04:00
71d94bbeab Move Flake configuration into dedicated file
Avoid passing arguments in the script.
2021-11-06 02:55:37 -04:00
718f689df9 Clean up linter after Black add (pass two) 2021-11-06 02:51:14 -04:00
268b5c0b86 Exclude Alembic migrations from Black
These files are autogenerated with their own formats, so we don't want
to override that.
2021-11-06 02:46:06 -04:00
b016b9bf3d Clean up linter after Black add (pass one) 2021-11-06 02:44:24 -04:00
7604b9611f Add black formatter to project root 2021-11-06 02:44:05 -04:00
b21278fd80 Add Basic Builder configuration
Configuration for my new CI system under Gitea.
2021-10-31 00:09:55 -04:00
3b02034b70 Add some delay and additional tries to fencing 2021-10-27 16:24:17 -04:00
c7a5b41b1e Fix ordering to show correct message 2021-10-27 13:37:52 -04:00
48b0091d3e Support adding the same network to a VM again
This is a supported configuration for some edge cases and should be
allowed.
2021-10-27 13:33:27 -04:00
2e94516ee2 Reorder linting on build-and-deploy 2021-10-27 13:25:14 -04:00
d7f26b27ea More gracefully handle restart + live
Instead of erroring, just use the implication that restarting a VM does
not want a live modification, and proceed from there. Update the help
text to match.
2021-10-27 13:23:39 -04:00
872f35a7ee Support removing VM interfaces by MAC
Provides a way to handle multiple interfaces in the same network
gracefully, while making the previous behaviour explicit.
2021-10-27 13:20:05 -04:00
52c3e8ced3 Fix bad test in postinst 2021-10-19 00:27:12 -04:00
1d7acf62bf Fix bad location of config sets 2021-10-12 17:23:04 -04:00
c790c331a7 Also validate on failures 2021-10-12 17:11:03 -04:00
23165482df Bump version to 0.9.42 2021-10-12 15:25:42 -04:00
057071a7b7 Go back to passing if exception
Validation already happened and the set happens again later.
2021-10-12 14:21:52 -04:00
554fa9f412 Use current live value for bridge_mtu
This will ensure that upgrading without the bridge_mtu config key set
will keep things as they are.
2021-10-12 12:24:03 -04:00
5a5f924268 Use power off in fence instead of reset
Use a power off (and then make the power on a requirement) during a node
fence. Removes some potential ambiguity in the power state, since we
will know for certain if it is off.
2021-10-12 11:04:27 -04:00
cc309fc021 Validate network MTU after initial read 2021-10-12 10:53:17 -04:00
5f783f1663 Make cluster example images clickable 2021-10-12 03:15:04 -04:00
bc89bb5b68 Mention fencing only in run state 2021-10-12 03:05:01 -04:00
eb233ef588 Adjust more wording and fix typos 2021-10-12 03:00:21 -04:00
d3efb54cb4 Adjust some wording 2021-10-12 02:54:16 -04:00
da15357c8a Remove codeql setup
I don't use this for anything useful, so disable it since a run takes
ages.
2021-10-12 02:51:19 -04:00
b6939a28c0 Fix formatting of subsection 2021-10-12 02:49:40 -04:00
a1da479a4c Add reference to Ansible manual 2021-10-12 02:48:47 -04:00
ace4082820 Fix spelling errors 2021-10-12 02:47:31 -04:00
4036af6045 Fix link to cluster architecture docs 2021-10-12 02:41:22 -04:00
f96de97861 Adjust getting started docs
Update the docs with the current information on setting up a cluster,
including simplifying the Ansible configuration to use the new
create-local-repo.sh script, and simplifying some other sections.
2021-10-12 02:39:25 -04:00
04cad46305 Default to removing build artifacts in b-a-d.sh 2021-10-11 16:41:00 -04:00
e9dea4d2d1 Add explicit 3 second timeout to requests 2021-10-11 16:31:18 -04:00
39fd85fcc3 Add version function support to CLI 2021-10-11 15:34:41 -04:00
cbbab46b55 Add new configs for Ansible 2021-10-11 14:44:18 -04:00
d1f2ce0b0a Bump version to 0.9.41 2021-10-09 19:39:21 -04:00
2f01edca14 Add bridge_mtu config to docs 2021-10-09 19:28:50 -04:00
12a3a3a6a6 Adjust log type of object setup message 2021-10-09 19:23:12 -04:00
c44732be83 Avoid duplicate runs of MTU set
It wasn't the validator duplicating, but the update duplicating, so
avoid that happening properly this time.
2021-10-09 19:21:47 -04:00
a8b68e0968 Revert "Avoid duplicate runs of MTU validator"
This reverts commit 56021c443a.
2021-10-09 19:11:42 -04:00
e59152afee Set all log messages to information state
None of these were "success" messages and thus shouldn't have been ok
state.
2021-10-09 19:09:38 -04:00
56021c443a Avoid duplicate runs of MTU validator 2021-10-09 19:07:41 -04:00
ebdea165f1 Use correct isinstance instead of type 2021-10-09 19:03:31 -04:00
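The difference between the two checks can be shown with a short sketch. The `MTU` subclass and both helper names are hypothetical and exist only for illustration; they are not taken from the PVC code.

```python
class MTU(int):
    """Hypothetical int subclass, used only to illustrate the difference."""


def is_exact_int(value):
    # type() comparison: true only for int itself, never for subclasses
    return type(value) is int


def is_int_like(value):
    # isinstance(): also accepts subclasses of int
    return isinstance(value, int)


print(is_exact_int(MTU(1500)))  # False
print(is_int_like(MTU(1500)))   # True
print(is_int_like(True))        # True, because bool is a subclass of int
```

The last line is worth keeping in mind: `isinstance(value, int)` accepts booleans, so code that must reject `True`/`False` needs a separate `bool` check first.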
fb0651fb05 Move MTU validation to function
Prevents code duplication and ensures validation runs when an MTU is
updated, not just on network creation.
2021-10-09 19:01:45 -04:00
35e7e11403 Add logger message when setting MTU 2021-10-09 18:56:18 -04:00
b7555468eb Ensure vx_mtu is always an int() 2021-10-09 18:52:50 -04:00
f1b4ee02ba Fix bad header length in network list 2021-10-09 18:50:32 -04:00
4698edc98e Add MTU value checking and log messages
Ensures that if a specified MTU is more than the maximum it is set to
the maximum instead, and adds warning messages for both situations.
2021-10-09 18:48:56 -04:00
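A minimal sketch of the clamping behaviour described in this commit, assuming the maximum is already known to the caller. The function name, logger name, and message text are hypothetical; only the "lower to the maximum and warn" rule comes from the commit message.

```python
import logging

logger = logging.getLogger("mtu-example")  # hypothetical logger name


def validate_mtu(requested_mtu, max_mtu):
    """Return an MTU no larger than max_mtu, warning if it had to be lowered."""
    requested_mtu = int(requested_mtu)
    if requested_mtu > max_mtu:
        logger.warning(
            "Requested MTU %d exceeds maximum %d; using %d instead",
            requested_mtu,
            max_mtu,
            max_mtu,
        )
        return max_mtu
    return requested_mtu
```

For example, `validate_mtu(9000, 1500)` returns `1500` and logs a warning, while `validate_mtu(1400, 1500)` returns `1400` unchanged.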
40e7e04aad Fix invalid schema key
Addresses #144
2021-10-09 18:42:33 -04:00
7f074847c4 Add MTU support to network add/modify commands
Addresses #144
2021-10-09 18:06:21 -04:00
b0b0b75605 Have VXNetworkInstance set MTU if unset
Makes this explicit in Zookeeper if a network is unset, post-migration
(schema version 6).

Addresses #144
2021-10-09 17:52:57 -04:00
89f62318bd Add MTU to network creation/modification
Addresses #144
2021-10-09 17:51:32 -04:00
925141ed65 Fix migration bugs and invalid vx_mtu
Addresses #144
2021-10-09 17:35:10 -04:00
f7a826bf52 Add handlers for client network MTUs
Refactors some of the code in VXNetworkInterface to handle MTUs in a
more streamlined fashion. Also fixes a bug whereby bridge client
networks were being explicitly given the cluster dev MTU which might not
be correct. Now adds support for this option explicitly in the configs,
and defaults to 1500 for safety (the standard Ethernet MTU).

Addresses #144
2021-10-09 17:02:27 -04:00
e176f3b2f6 Make n-1 values clearer 2021-10-07 18:11:15 -04:00
b339d5e641 Correct levels in TOC 2021-10-07 18:08:28 -04:00
d476b13cc0 Correct spelling errors 2021-10-07 18:07:06 -04:00
ce8b2c22cc Add documentation sections on IPMI and fencing 2021-10-07 18:05:47 -04:00
feab5d3479 Correct flawed conditional in verify_ipmi 2021-10-07 15:11:19 -04:00
ee348593c9 Bump version to 0.9.40 2021-10-07 14:42:04 -04:00
e403146bcf Correct bad stop_keepalive_timer call 2021-10-07 14:41:12 -04:00
bde684dd3a Remove redundant wording from header 2021-10-07 12:20:04 -04:00
992e003500 Replace headers with links in CHANGELOG.md 2021-10-07 12:17:44 -04:00
eaeb860a83 Add missing period to changelog sentence 2021-10-07 12:10:35 -04:00
1198ca9f5c Move changelog into dedicated file
The changelog was getting far too long for the README/docs index to
support, so move it into CHANGELOG.md and link to it instead.
2021-10-07 12:09:26 -04:00
e79d200244 Bump version to 0.9.39 2021-10-07 11:52:38 -04:00
5b3bb9f306 Add linting to build-and-deploy
Ensures that bad code isn't deployed during testing.
2021-10-07 11:51:05 -04:00
5501586a47 Add limit negation to VM list
When using the "state", "node", or "tag" arguments to a VM list, add
support for a "negate" flag to look for all VMs *not in* the specified
state, node, or tag.
2021-10-07 11:50:52 -04:00
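The negation idea can be sketched as a single filter function. This is an illustration only: `filter_vms`, the dictionary layout, and the sample data are assumptions, and the real list code may match tags differently.

```python
def filter_vms(vms, field, value, negate=False):
    """Return VMs whose `field` equals `value`, or all others if negate is set."""
    # (match) != negate keeps matches normally and non-matches when negated
    return [vm for vm in vms if (vm.get(field) == value) != negate]


# Hypothetical sample data
vms = [
    {"name": "web1", "state": "start", "node": "hv1"},
    {"name": "db1", "state": "stop", "node": "hv2"},
]

print([vm["name"] for vm in filter_vms(vms, "state", "start")])               # ['web1']
print([vm["name"] for vm in filter_vms(vms, "state", "start", negate=True)])  # ['db1']
```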
c160648c5c Add note about fencing at remote sites 2021-10-04 19:58:08 -04:00
fa37227127 Correct TOC in architecture page 2021-10-04 01:54:22 -04:00
2cac98963c Correct spelling errors 2021-10-04 01:51:58 -04:00
8e50428707 Double image sizes for example clusters 2021-10-04 01:47:35 -04:00
a4953bc6ef Adjust toc_depth for RTD theme 2021-10-04 01:45:05 -04:00
3c10d57148 Revamp about and architecture docs
Makes these a little simpler to follow and provides some more up-to-date
information based on recent tests and developments.
2021-10-04 01:42:08 -04:00
26d8551388 Adjust bump-version changelog heading level 2021-10-04 01:41:48 -04:00
57342541dd Move changelog headers down one more level 2021-10-04 01:41:22 -04:00
50f8afd749 Adjust indent of index/README versions 2021-10-04 00:33:24 -04:00
3449069e3d Bump version to 0.9.38 2021-10-03 22:32:41 -04:00
cb66b16045 Correct latency units and format name 2021-10-03 17:06:34 -04:00
8edce74b85 Revamp test result display
Instead of showing CLAT percentiles, which are very hard to interpret
and understand, use the main latency buckets.
2021-10-03 15:49:01 -04:00
e9b69c4124 Revamp postinst for the API daemon
Ensures that the worker is always restarted and makes the NOTE
conditional more specific.
2021-10-03 15:15:26 -04:00
3948206225 Tweak fio tests for benchmarks
1. Remove ramp_time as this was giving very strange results.

2. Up the runtime to 75 seconds to compensate.

3. Print the fio command to the console to validate.
2021-10-03 15:06:18 -04:00
a09578fcf5 Add benchmark format to list 2021-10-03 15:05:58 -04:00
73be807b84 Adjust ETA for benchmarks 2021-10-02 04:51:01 -04:00
4a9805578e Add format parsing for format 1 storage benchmarks 2021-10-02 04:46:44 -04:00
f70f052df1 Add version 2 benchmark list formatting 2021-10-02 02:47:17 -04:00
1e8841ce69 Handle benchmark running state properly 2021-10-02 01:54:51 -04:00
9c7d39d523 Fix missing argument in database insert 2021-10-02 01:49:47 -04:00
011490bcca Update to storage benchmark format 1
1. Runs `fio` with the `--format=json` option and removes all terse
format parsing from the results.

2. Adds a 15-second ramp time to minimize wonky ramp-up results.

3. Sets group_reporting, which isn't necessary with only a single job,
but is here for consistency.
2021-10-02 01:41:08 -04:00
8de63b2785 Fix handling of array of information
With a benchmark info we only ever want test one, so pass only that to
the formatter. Simplifies the format function.
2021-10-02 01:28:39 -04:00
8f8f00b2e9 Avoid versioning benchmark lists
This wouldn't work since each individual test is versioned. Instead add
a placeholder for later once additional format(s) are defined.
2021-10-02 01:25:18 -04:00
1daab49b50 Add format option to benchmark info
Allows specifying of raw json or json-pretty formats in addition to the
"pretty" formatted option.
2021-10-02 01:13:50 -04:00
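One way such a format switch could look, as a hedged sketch: the function name and the plain-text fallback are hypothetical stand-ins, and only the three format names ("json", "json-pretty", "pretty") come from the commit message.

```python
import json


def format_benchmark_info(data, output_format="pretty"):
    """Render benchmark data in one of the three formats named in the commit."""
    if output_format == "json":
        # Compact, machine-readable output
        return json.dumps(data)
    if output_format == "json-pretty":
        # Indented JSON for human inspection
        return json.dumps(data, indent=4)
    # Stand-in for the human-readable "pretty" formatter
    return "\n".join(f"{key}: {value}" for key, value in data.items())
```

Both JSON variants parse back to the same structure with `json.loads`; only the whitespace differs.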
9f6041b9cf Add benchmark format function support
Allows choosing different list and info functions based on the benchmark
version found. Currently only implements "legacy" version 0 with more to
be added.
2021-10-02 01:07:25 -04:00
5b27e438a9 Add test format versioning to storage benchmarks
Adds a test_format database column and a value in the API return for the
test format version, starting at 0 for the existing format as of 0.9.37.

References #143
2021-10-02 00:55:27 -04:00
3e8a85b029 Load benchmark results as JSON
Load the JSON at the API side instead of client side, because that's
what the API doc says it is and it just makes more sense.
2021-09-30 23:40:24 -04:00
19ac1e17c3 Bump version to 0.9.37 2021-09-30 02:08:14 -04:00
252175fb6f Revamp benchmark tests
1. Move to a time-based (60s) benchmark to avoid these taking an absurd
amount of time to show the same information.

2. Eliminate the 256k random benchmarks, since they don't really add
anything.

3. Add in a 4k single-queue benchmark as this might provide valuable
insight into latency.

4. Adjust the output to reflect the above changes.

While this does change the benchmarking, this should not invalidate any
existing benchmarks since most of the test suite is unchanged
(especially the most important 4M sequential and 4K random tests). It
simply removes an unused entry and adds a more helpful one. The
time-based change should not significantly affect the results either;
it just reduces the total runtime for long tests and increases the
runtime for quick tests to provide a better picture.
2021-09-29 20:51:30 -04:00
f39b041471 Add primary node to benchmark job name
Ensures tracking of the current primary node the job was run on, since
this may be relevant for performance reasons.
2021-09-28 09:58:22 -04:00
3b41759262 Add timeouts to queue gets and adjust
Ensure that all keepalive timeouts are set (prevent the queue.get()
actions from blocking forever) and set the thread timeouts to line up as
well. Everything here is thus limited to keepalive_interval seconds
(default 5s) to keep it uniform.
2021-09-27 16:10:27 -04:00
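The blocking behaviour this commit guards against can be sketched with the standard library. The helper name is hypothetical; the 5-second default mirrors the keepalive_interval default stated in the commit message.

```python
import queue

KEEPALIVE_INTERVAL = 5  # seconds; the default given in the commit message


def get_result(result_queue, timeout=KEEPALIVE_INTERVAL):
    """Return the next queued item, or None if nothing arrives in time."""
    try:
        # Without a timeout, get() would block forever on an empty queue
        return result_queue.get(timeout=timeout)
    except queue.Empty:
        return None
```

A caller can then treat `None` as "no result during this keepalive" and carry on, rather than hanging the whole keepalive cycle.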
e514eed414 Re-add success log output during migration 2021-09-27 11:50:55 -04:00
b81e70ec18 Fix missing character in log message 2021-09-27 00:49:43 -04:00
c2a473ed8b Simplify VM migration down to 3 steps
Remove two superfluous synchronization steps which are not needed here,
since the exclusive lock handles that situation anyways.

Still does not fix the weird flush->unflush lock timeout bug, but is
better worked-around now due to the cancelling of the other wait freeing
this up and continuing.
2021-09-27 00:03:20 -04:00
5355f6ff48 Work around synchronization lock issues
Make the block on stage C only wait for 900 seconds (15 minutes) to
prevent indefinite blocking.

The issue comes if a VM is being received, and the current unflush is
cancelled for a flush. When this happens, this lock acquisition seems to
block for no obvious reason, and no other changes seem to affect it.
This is certainly some sort of locking bug within Kazoo but I can't
diagnose it as-is. Leave a TODO to look into this again in the future.
2021-09-26 23:26:21 -04:00
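The bounded wait can be illustrated with a stand-in. This sketch uses `threading.Lock` rather than a Kazoo lock so that it runs without a Zookeeper cluster; the helper name is hypothetical, and the real code would need to handle whatever the Kazoo lock does on timeout.

```python
import threading

STAGE_C_TIMEOUT = 900  # seconds (15 minutes), as stated in the commit message


def acquire_with_timeout(lock, timeout=STAGE_C_TIMEOUT):
    """Try to take the lock, giving up after `timeout` seconds.

    Returns True if the lock was acquired, False if the wait timed out.
    """
    return lock.acquire(timeout=timeout)
```

The caller decides what a timeout means; here it simply gets `False` back and can log the problem and continue instead of blocking indefinitely.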
bf7823deb5 Improve log messages during VM migration 2021-09-26 23:15:38 -04:00
8ba371723e Use event to non-block wait and fix inf wait 2021-09-26 22:55:39 -04:00
e10ac52116 Track status of VM state thread 2021-09-26 22:55:21 -04:00
341073521b Simplify locking process for VM migration
Rather than using a cumbersome and overly complex ping-pong of read and
write locks, move to a much simpler process using exclusive locks.

Describing the process in ASCII or narrative is cumbersome, but the
process ping-pongs via a set of exclusive locks and wait timers, so that
the two sides are able to synchronize via blocking the exclusive lock.
The end result is a much more streamlined migration (takes about half
the time all things considered) which should be less error-prone.
2021-09-26 22:08:07 -04:00
16c38da5ef Fix failure to connect to libvirt in keepalive
This should be caught and abort the thread rather than failing and
holding up keepalives.
2021-09-26 20:42:01 -04:00
c8134d3a1c Fix several bugs in fence handling
1. Output from ipmitool was not being stripped, and stray newlines were
throwing off the comparisons. Fixes this.

2. Several stages were lacking meaningful messages. Adds these in so the
output is more clear about what is going on.

3. Reduce the sleep time after a fence to just 1x the
keepalive_interval, rather than 2x, because this seemed excessively
long even for slow IPMI interfaces, especially since we're checking the
power state now anyways.

4. Set the node daemon state to an explicit 'fenced' state after a
successful fence to indicate to users that the node was indeed fenced
successfully and not still 'dead'.
2021-09-26 20:07:30 -04:00
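Item 1 above is easy to reproduce in isolation. In this sketch the helper name and the exact status string are assumptions made for illustration; the point is only that an unstripped trailing newline defeats an equality check.

```python
EXPECTED_OFF = "Chassis Power is off"  # assumed status text, for illustration


def power_is_off(ipmitool_stdout):
    """Compare the command output to the expected text, ignoring stray whitespace."""
    return ipmitool_stdout.strip() == EXPECTED_OFF


raw_output = "Chassis Power is off\n"  # command output usually ends with a newline

print(raw_output == EXPECTED_OFF)  # False: the trailing newline breaks the comparison
print(power_is_off(raw_output))    # True
```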
9f41373324 Ensure pvc-flush is after network-online 2021-09-26 17:40:42 -04:00
8e62d5b30b Fix typo in log message 2021-09-26 03:35:30 -04:00
7a8eee244a Tweak CLI helptext around OSD actions
Adds some more detail about OSD commands and their values.
2021-09-26 01:29:23 -04:00
7df5b8e52e Fix typo in sgdisk command options 2021-09-26 00:59:05 -04:00
6f96219023 Use re.search instead of re.match
Required since we're not matching the start of the string.
2021-09-26 00:55:29 -04:00
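The difference between the two calls, shown on a hypothetical input string (the text and pattern are illustrative, not taken from the PVC code):

```python
import re

text = "partition /dev/nvme0n1p1 created"  # hypothetical input

# re.match only succeeds if the pattern matches at the start of the string
print(re.match(r"/dev/nvme", text))           # None

# re.search scans the whole string for the first match
print(re.search(r"/dev/nvme", text).group())  # /dev/nvme
```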
51967e164b Raise basic exceptions in CephInstance
Avoids "No active exception to reraise" errors on failures.
2021-09-26 00:50:10 -04:00
7a3a44d47c Fix OSD creation for partition paths and fix gdisk
The previous implementation did not work with /dev/nvme devices or any
/dev/disk/by-* devices due to some logical failures in the partition
naming scheme, so fix these, and be explicit about what is supported in
the PVC CLI command output.

The 'echo | gdisk' implementation of partition creation also did not
work due to limitations of subprocess.run; instead, use sgdisk which
allows these commands to be written out explicitly and is included in
the same package as gdisk.
2021-09-26 00:12:28 -04:00
44491dd988 Add support for configurable OSD DB ratios
The default of 0.05 (5%) is likely ideal in the initial implementation,
but allow this to be set explicitly for maximum flexibility in
space-constrained or performance-critical use-cases.
2021-09-24 01:06:39 -04:00
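The sizing rule can be sketched as follows, assuming the DB volume is simply the OSD size multiplied by the ratio. The function name, the byte units, and the validation of the ratio are assumptions; only the 0.05 default comes from the commit message.

```python
def osd_db_size_bytes(osd_size_bytes, ratio=0.05):
    """Return the DB volume size for an OSD, as a fraction of the OSD size."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be greater than 0 and at most 1")
    return int(osd_size_bytes * ratio)
```

With the default ratio, a 1 TB OSD gets a DB volume of roughly 50 GB; a space-constrained cluster could pass a smaller ratio, and a performance-critical one a larger ratio.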
eba142f470 Bump version to 0.9.36 2021-09-23 14:01:38 -04:00
6cef68d157 Add separate OSD DB device support
Adds in three parts:

1. Create an API endpoint to create OSD DB volume groups on a device.
Passed through to the node via the same command pipeline as
creating/removing OSDs, and creates a volume group with a fixed name
(osd-db).

2. Adds API support for specifying whether or not to use this DB volume
group when creating a new OSD via the "ext_db" flag. Naming and sizing
is fixed for simplicity and based on Ceph recommendations (5% of OSD
size). The Zookeeper schema tracks the block device to use during
removal.

3. Adds CLI support for the new and modified API endpoints, as well as
displaying the block device and DB block device in the OSD list.

While I debated supporting adding a DB device to an existing OSD, in
practice this ended up being a very complex operation involving stopping
the OSD and setting some options, so this is not supported; this can be
specified during OSD creation only.

Closes #142
2021-09-23 13:59:49 -04:00
e8caf3369e Move console watcher stop try up
Could cause an exception if d_domain is not defined yet.
2021-09-22 16:02:04 -04:00
3e3776a25b Bump version to 0.9.35 2021-09-13 02:20:46 -04:00
6e0d0e264e Add memory and vCPU checks to VM define/modify
Ensures that a VM won't:

(a) Have provisioned more RAM than there is available on a given node.
Due to memory overprovisioning, this is simply a "is the VM memory count
more than the node count", and doesn't factor in free or used memory on
a node, total cluster usage, etc. So if a node has 64GB total RAM, the
VM limit is 64GB. It is up to an administrator to ensure sanity *below*
that value.

(b) Have provisioned more vCPUs than there are CPU cores on the node,
minus 2 to account for hypervisor/storage processes. Will ensure there
is no severe CPU contention caused by a single VM having more vCPUs than
there are actual execution threads available.

Closes #139
2021-09-13 01:51:21 -04:00
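Both limits described above can be sketched as one check. The function name, parameter names, units, and error strings are hypothetical; the rules themselves (memory no greater than the node total, vCPUs no greater than cores minus 2) follow the commit message.

```python
HYPERVISOR_RESERVED_CORES = 2  # cores left for hypervisor/storage processes


def check_vm_fits_node(vm_memory_mb, vm_vcpus, node_total_memory_mb, node_cpu_cores):
    """Return a list of error messages; an empty list means the VM is acceptable."""
    errors = []
    # (a) Compare against total node memory only, ignoring free or used memory
    if vm_memory_mb > node_total_memory_mb:
        errors.append("VM memory exceeds total node memory")
    # (b) Leave two cores for the hypervisor and storage daemons
    if vm_vcpus > node_cpu_cores - HYPERVISOR_RESERVED_CORES:
        errors.append("VM vCPUs exceed node cores minus reserved cores")
    return errors
```

On a 16-core node with 64 GB of RAM, this accepts a VM with up to 14 vCPUs and up to 64 GB of memory, and rejects anything larger.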
1855d03a36 Add pool size check when resizing volumes
Closes #140
2021-09-12 19:54:51 -04:00
1a286dc8dd Increase build-and-deploy sleep 2021-09-12 19:50:58 -04:00
1b6d10e03a Handle VM disk/network stats gathering exceptions 2021-09-12 19:41:07 -04:00
73c96d1e93 Add VM device hot attach/detach support
Adds a new API endpoint to support hot attach/detach of devices, and the
corresponding client-side logic to use this endpoint when doing VM
network/storage add/remove actions.

The live attach is now the default behaviour for these types of
additions and removals, and can be disabled if needed.

Closes #141
2021-09-12 19:33:00 -04:00
5841c98a59 Adjust lint script for newer linter 2021-09-12 15:40:38 -04:00
bc6395c959 Don't crash cleanup if no this_node 2021-08-29 03:52:18 -04:00
d582f87472 Change default node object state to flushed 2021-08-29 03:34:08 -04:00
e9735113af Bump version to 0.9.34 2021-08-24 16:15:25 -04:00
722fd0a65d Properly handle =-separated fsargs 2021-08-24 11:40:22 -04:00
3b41beb0f3 Convert argument elements of task status to types 2021-08-23 14:28:12 -04:00
d3392c0282 Fix typo in output message 2021-08-23 00:39:19 -04:00
560c013e95 Bump version to 0.9.33 2021-08-21 03:28:48 -04:00
384c6320ef Avoid failing if no provisioner tasks 2021-08-21 03:25:16 -04:00
445dec1c38 Ensure pycache files are removed on deb creation 2021-08-21 03:19:18 -04:00
534c7cd7f0 Refactor pvcnoded to reduce Daemon.py size
This branch commit refactors the pvcnoded component to better adhere to
good programming practices. The previous Daemon.py was a massive file
which contained almost 2000 lines of direct, root-level code which was
directly imported. Not only was this poor practice, but this resulted
in a nigh-unmaintainable file which was hard even for me to understand.

This refactoring splits a large section of the code from Daemon.py into
separate small modules and functions in the `util/` directory. This will
hopefully make most of the functionality easy to find and modify without
having to dig through a single large file.

Further, the existing subcomponents have been moved to the `objects/`
directory which clearly separates them.

Finally, the Daemon.py code has mostly been moved into a function,
`entrypoint()`, which is then called from the `pvcnoded.py` stub.

An additional item is that most format strings have been replaced by
f-strings to make use of the Python 3.6 features in Daemon.py and the
utility files.
2021-08-21 03:14:22 -04:00
4014ef7714 Bump version to 0.9.32 2021-08-19 12:37:58 -04:00
180f0445ac Properly handle exceptions getting VM stats 2021-08-19 12:36:31 -04:00
Joshua Boniface
074664d4c1 Fix image dimensions and size 2021-08-18 19:51:55 -04:00
Joshua Boniface
418ac23d40 Add screenshots to docs 2021-08-18 19:49:53 -04:00
14 changed files with 178 additions and 348 deletions

@@ -1 +1 @@
0.9.68 0.9.63

@@ -1,44 +1,5 @@
## PVC Changelog ## PVC Changelog
###### [v0.9.68](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.68)
* [CLI] Fixes another bug with network info view
###### [v0.9.67](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.67)
* [CLI] Fixes several more bugs in the refactored CLI
###### [v0.9.66](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.66)
* [CLI] Fixes a missing YAML import in CLI
###### [v0.9.65](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.65)
* [CLI] Fixes a bug in the node list filtering command
* [CLI] Fixes a bug/default when no connection is specified
###### [v0.9.64](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.64)
**Breaking Change [CLI]**: The CLI client root commands have been reorganized. The following commands have changed:
* `pvc cluster` -> `pvc connection` (all subcommands)
* `pvc task` -> `pvc cluster` (all subcommands)
* `pvc maintenance` -> `pvc cluster maintenance`
* `pvc status` -> `pvc cluster status`
Ensure you have updated to the latest version of the PVC Ansible repository before deploying this version or using PVC Ansible oneshot playbooks for management.
**Breaking Change [CLI]**: The `--restart` option for VM configuration changes now has an explicit `--no-restart` to disable restarting, or a prompt if neither is specified; `--unsafe` no longer bypasses this prompt which was a bug. Applies to most `vm <cmd> set` commands like `vm vcpu set`, `vm memory set`, etc. All instances also feature restart confirmation afterwards, which, if `--restart` is provided, will prompt for confirmation unless `--yes` or `--unsafe` is specified.
**Breaking Change [CLI]**: The `--long` option previously on some `info` commands no longer exists; use `-f long`/`--format long` instead.
* [CLI] Significantly refactors the CLI client code for consistency and cleanliness
* [CLI] Implements `-f`/`--format` options for all `list` and `info` commands in a consistent way
* [CLI] Changes the behaviour of VM modification options with "--restart" to provide a "--no-restart"; defaults to a prompt if neither is specified and ignores the "--unsafe" global entirely
* [API] Fixes several bugs in the 3-debootstrap.py provisioner example script
* [Node] Fixes some bugs around VM shutdown on node flush
* [Documentation] Adds mentions of Ganeti and Harvester
###### [v0.9.63](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.63) ###### [v0.9.63](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.63)
* Mentions Ganeti in the docs * Mentions Ganeti in the docs

@@ -441,7 +441,7 @@ class VMBuilderScript(VMBuilder):
# The directory we mounted things on earlier during prepare(); this could very well # The directory we mounted things on earlier during prepare(); this could very well
# be exposed as a module-level variable if you so choose # be exposed as a module-level variable if you so choose
temp_dir = "/tmp/target" temporary_directory = "/tmp/target"
# Use these convenient aliases for later (avoiding lots of "self.vm_data" everywhere) # Use these convenient aliases for later (avoiding lots of "self.vm_data" everywhere)
vm_name = self.vm_name vm_name = self.vm_name
@@ -469,8 +469,6 @@ class VMBuilderScript(VMBuilder):
"grub-pc", "grub-pc",
"cloud-init", "cloud-init",
"python3-cffi-backend", "python3-cffi-backend",
"acpid",
"acpi-support-base",
"wget", "wget",
] ]
@@ -484,17 +482,17 @@ class VMBuilderScript(VMBuilder):
# Perform a debootstrap installation # Perform a debootstrap installation
print( print(
f"Installing system with debootstrap: debootstrap --include={','.join(deb_packages)} {deb_release} {temp_dir} {deb_mirror}" f"Installing system with debootstrap: debootstrap --include={','.join(deb_packages)} {deb_release} {temporary_directory} {deb_mirror}"
) )
os.system( os.system(
f"debootstrap --include={','.join(deb_packages)} {deb_release} {temp_dir} {deb_mirror}" f"debootstrap --include={','.join(deb_packages)} {deb_release} {temporary_directory} {deb_mirror}"
) )
# Bind mount the devfs so we can grub-install later # Bind mount the devfs so we can grub-install later
os.system("mount --bind /dev {}/dev".format(temp_dir)) os.system("mount --bind /dev {}/dev".format(temporary_directory))
# Create an fstab entry for each volume # Create an fstab entry for each volume
fstab_file = "{}/etc/fstab".format(temp_dir) fstab_file = "{}/etc/fstab".format(temporary_directory)
# The volume ID starts at zero and increments by one for each volume in the fixed-order # The volume ID starts at zero and increments by one for each volume in the fixed-order
# volume list. This lets us work around the insanity of Libvirt IDs not matching guest IDs, # volume list. This lets us work around the insanity of Libvirt IDs not matching guest IDs,
# while still letting us have some semblance of control here without enforcing things # while still letting us have some semblance of control here without enforcing things
@@ -539,13 +537,13 @@ class VMBuilderScript(VMBuilder):
volume_id += 1 volume_id += 1
# Write the hostname; you could also take an FQDN argument for this as an example # Write the hostname; you could also take an FQDN argument for this as an example
hostname_file = "{}/etc/hostname".format(temp_dir) hostname_file = "{}/etc/hostname".format(temporary_directory)
with open(hostname_file, "w") as fh: with open(hostname_file, "w") as fh:
fh.write("{}".format(vm_name)) fh.write("{}".format(vm_name))
# Fix the cloud-init.target since it's broken by default in Debian 11 # Fix the cloud-init.target since it's broken by default in Debian 11
cloudinit_target_file = "{}/etc/systemd/system/cloud-init.target".format( cloudinit_target_file = "{}/etc/systemd/system/cloud-init.target".format(
temp_dir temporary_directory
) )
with open(cloudinit_target_file, "w") as fh: with open(cloudinit_target_file, "w") as fh:
# We lose our indent on these raw blocks to preserve the apperance of the files # We lose our indent on these raw blocks to preserve the apperance of the files
@@ -559,7 +557,7 @@ After=multi-user.target
fh.write(data) fh.write(data)
# Write the cloud-init configuration # Write the cloud-init configuration
ci_cfg_file = "{}/etc/cloud/cloud.cfg".format(temp_dir) ci_cfg_file = "{}/etc/cloud/cloud.cfg".format(temporary_directory)
with open(ci_cfg_file, "w") as fh: with open(ci_cfg_file, "w") as fh:
fh.write( fh.write(
""" """
@@ -620,15 +618,15 @@ After=multi-user.target
- arches: [default] - arches: [default]
failsafe: failsafe:
primary: {deb_mirror} primary: {deb_mirror}
""".format( """
deb_mirror=deb_mirror ).format(deb_mirror=deb_mirror)
)
)
# Due to device ordering within the Libvirt XML configuration, the first Ethernet interface # Due to device ordering within the Libvirt XML configuration, the first Ethernet interface
# will always be on PCI bus ID 2, hence the name "ens2". # will always be on PCI bus ID 2, hence the name "ens2".
# Write a DHCP stanza for ens2 # Write a DHCP stanza for ens2
ens2_network_file = "{}/etc/network/interfaces.d/ens2".format(temp_dir) ens2_network_file = "{}/etc/network/interfaces.d/ens2".format(
temporary_directory
)
with open(ens2_network_file, "w") as fh: with open(ens2_network_file, "w") as fh:
data = """auto ens2 data = """auto ens2
iface ens2 inet dhcp iface ens2 inet dhcp
@@ -636,7 +634,7 @@ iface ens2 inet dhcp
fh.write(data) fh.write(data)
# Write the DHCP config for ens2 # Write the DHCP config for ens2
dhclient_file = "{}/etc/dhcp/dhclient.conf".format(temp_dir) dhclient_file = "{}/etc/dhcp/dhclient.conf".format(temporary_directory)
with open(dhclient_file, "w") as fh: with open(dhclient_file, "w") as fh:
# We can use fstrings too, since PVC will always have Python 3.6+, though # We can use fstrings too, since PVC will always have Python 3.6+, though
# using format() might be preferable for clarity in some situations # using format() might be preferable for clarity in some situations
@@ -656,7 +654,7 @@ interface "ens2" {{
fh.write(data) fh.write(data)
# Write the GRUB configuration # Write the GRUB configuration
grubcfg_file = "{}/etc/default/grub".format(temp_dir) grubcfg_file = "{}/etc/default/grub".format(temporary_directory)
with open(grubcfg_file, "w") as fh: with open(grubcfg_file, "w") as fh:
data = """# Written by the PVC provisioner data = """# Written by the PVC provisioner
GRUB_DEFAULT=0 GRUB_DEFAULT=0
@@ -673,7 +671,7 @@ GRUB_DISABLE_LINUX_UUID=false
fh.write(data) fh.write(data)
# Do some tasks inside the chroot using the provided context manager # Do some tasks inside the chroot using the provided context manager
with chroot(temp_dir): with chroot(temporary_directory):
# Install and update GRUB # Install and update GRUB
os.system( os.system(
"grub-install --force /dev/rbd/{}/{}_{}".format( "grub-install --force /dev/rbd/{}/{}_{}".format(
@@ -706,17 +704,16 @@ GRUB_DISABLE_LINUX_UUID=false
""" """
# Run any imports first # Run any imports first
import os
from pvcapid.vmbuilder import open_zk from pvcapid.vmbuilder import open_zk
from pvcapid.Daemon import config from pvcapid.Daemon import config
import daemon_lib.common as pvc_common import daemon_lib.common as pvc_common
import daemon_lib.ceph as pvc_ceph import daemon_lib.ceph as pvc_ceph
# Set the temp_dir we used in the prepare() and install() steps # Set the tempdir we used in the prepare() and install() steps
temp_dir = "/tmp/target" temp_dir = "/tmp/target"
# Unmount the bound devfs # Unmount the bound devfs
os.system("umount {}/dev".format(temp_dir)) os.system("umount {}/dev".format(temporary_directory))
# Use this construct for reversing the list, as the normal reverse() messes with the list # Use this construct for reversing the list, as the normal reverse() messes with the list
for volume in list(reversed(self.vm_data["volumes"])): for volume in list(reversed(self.vm_data["volumes"])):

@@ -27,7 +27,7 @@ from ssl import SSLContext, TLSVersion
from distutils.util import strtobool as dustrtobool from distutils.util import strtobool as dustrtobool
# Daemon version # Daemon version
version = "0.9.68" version = "0.9.63"
# API version # API version
API_VERSION = 1.0 API_VERSION = 1.0

@@ -19,8 +19,6 @@
# #
############################################################################### ###############################################################################
from colorama import Fore
from difflib import unified_diff
from functools import wraps from functools import wraps
from json import dump as jdump from json import dump as jdump
from json import dumps as jdumps from json import dumps as jdumps
@@ -28,9 +26,7 @@ from json import loads as jloads
from os import environ, makedirs, path from os import environ, makedirs, path
from pkg_resources import get_distribution from pkg_resources import get_distribution
from lxml.etree import fromstring, tostring from lxml.etree import fromstring, tostring
from re import sub, match from re import sub
from yaml import load as yload
from yaml import SafeLoader as SafeYAMLLoader
from pvc.cli.helpers import * from pvc.cli.helpers import *
from pvc.cli.waiters import * from pvc.cli.waiters import *
@@ -170,24 +166,32 @@ def connection_req(function):
def restart_opt(function): def restart_opt(function):
""" """
Click Option Decorator: Click Option Decorator:
Wraps a Click command which requires a VM domain restart, to provide options for/against restart or prompt Wraps a Click command which requires confirm_flag or unsafe option or asks for VM restart confirmation
""" """
@click.option( @click.option(
"-r/-R", "-r",
"--restart/--no-restart", "--restart",
"restart_flag", "restart_flag",
is_flag=True, is_flag=True,
default=None, default=False,
show_default=False, help="Immediately restart VM to apply changes.",
help="Immediately restart VM to apply changes or do not restart VM, or prompt if unspecified.",
) )
@wraps(function) @wraps(function)
def confirm_action(*args, **kwargs): def confirm_action(*args, **kwargs):
restart_state = kwargs.get("restart_flag", None) confirm_action = True
if "restart_flag" in kwargs:
if not kwargs.get("restart_flag", False):
if not CLI_CONFIG.get("unsafe", False):
confirm_action = True
else:
confirm_action = False
else:
confirm_action = False
else:
confirm_action = False
if restart_state is None: if confirm_action:
# Neither "--restart" or "--no-restart" was passed: prompt for restart or restart if "--unsafe"
try: try:
click.confirm( click.confirm(
f"Restart VM {kwargs.get('domain')} to apply changes", f"Restart VM {kwargs.get('domain')} to apply changes",
@@ -198,12 +202,6 @@ def restart_opt(function):
except Exception: except Exception:
echo(CLI_CONFIG, "Changes will be applied on next VM start/restart.") echo(CLI_CONFIG, "Changes will be applied on next VM start/restart.")
kwargs["restart_flag"] = False kwargs["restart_flag"] = False
elif restart_state is True:
# "--restart" was passed: allow restart without confirming
kwargs["restart_flag"] = True
elif restart_state is False:
# "--no-restart" was passed: skip confirming and skip restart
kwargs["restart_flag"] = False
return function(*args, **kwargs) return function(*args, **kwargs)
@@ -854,14 +852,14 @@ def cli_node_info(
help="Limit list to nodes in the specified daemon state.", help="Limit list to nodes in the specified daemon state.",
) )
@click.option( @click.option(
"-cs", "-ds",
"--coordinator-state", "--coordinator-state",
"coordinator_state_filter", "coordinator_state_filter",
default=None, default=None,
help="Limit list to nodes in the specified coordinator state.", help="Limit list to nodes in the specified coordinator state.",
) )
@click.option( @click.option(
"-vs", "-ds",
"--domain-state", "--domain-state",
"domain_state_filter", "domain_state_filter",
default=None, default=None,
@@ -1198,7 +1196,7 @@ def cli_vm_modify(
text=current_vm_cfgfile, require_save=True, extension=".xml" text=current_vm_cfgfile, require_save=True, extension=".xml"
) )
if new_vm_cfgfile is None: if new_vm_cfgfile is None:
echo(CLI_CONFIG, "Aborting with no modifications.") echo("Aborting with no modifications.")
exit(0) exit(0)
else: else:
new_vm_cfgfile = new_vm_cfgfile.strip() new_vm_cfgfile = new_vm_cfgfile.strip()
@@ -1210,14 +1208,16 @@ def cli_vm_modify(
cfgfile.close() cfgfile.close()
echo( echo(
CLI_CONFIG,
'Replacing configuration of VM "{}" with file "{}".'.format( 'Replacing configuration of VM "{}" with file "{}".'.format(
dom_name, cfgfile.name dom_name, cfgfile.name
), )
) )
# Show a diff and confirm
echo("Pending modifications:")
echo("")
diff = list( diff = list(
unified_diff( difflib.unified_diff(
current_vm_cfgfile.split("\n"), current_vm_cfgfile.split("\n"),
new_vm_cfgfile.split("\n"), new_vm_cfgfile.split("\n"),
fromfile="current", fromfile="current",
@@ -1228,23 +1228,16 @@ def cli_vm_modify(
lineterm="", lineterm="",
) )
) )
if len(diff) < 1:
echo(CLI_CONFIG, "Aborting with no modifications.")
exit(0)
# Show a diff and confirm
echo(CLI_CONFIG, "Pending modifications:")
echo(CLI_CONFIG, "")
for line in diff: for line in diff:
if match(r"^\+", line) is not None: if re.match(r"^\+", line) is not None:
echo(CLI_CONFIG, Fore.GREEN + line + Fore.RESET) echo(colorama.Fore.GREEN + line + colorama.Fore.RESET)
elif match(r"^\-", line) is not None: elif re.match(r"^\-", line) is not None:
echo(CLI_CONFIG, Fore.RED + line + Fore.RESET) echo(colorama.Fore.RED + line + colorama.Fore.RESET)
elif match(r"^\^", line) is not None: elif re.match(r"^\^", line) is not None:
echo(CLI_CONFIG, Fore.BLUE + line + Fore.RESET) echo(colorama.Fore.BLUE + line + colorama.Fore.RESET)
else: else:
echo(CLI_CONFIG, line) echo(line)
echo(CLI_CONFIG, "") echo("")
# Verify our XML is sensible # Verify our XML is sensible
try: try:
@@ -3601,7 +3594,7 @@ def cli_storage_volume_upload(pool, name, image_format, image_file):
""" """
if not os.path.exists(image_file): if not os.path.exists(image_file):
echo(CLI_CONFIG, "ERROR: File '{}' does not exist!".format(image_file)) echo("ERROR: File '{}' does not exist!".format(image_file))
exit(1) exit(1)
retcode, retmsg = pvc.lib.storage.ceph_volume_upload( retcode, retmsg = pvc.lib.storage.ceph_volume_upload(
@@ -4439,8 +4432,7 @@ def cli_provisioner_template_storage_disk_add(
if source_volume and (size or filesystem or mountpoint): if source_volume and (size or filesystem or mountpoint):
echo( echo(
CLI_CONFIG, 'The "--source-volume" option is not compatible with the "--size", "--filesystem", or "--mountpoint" options.'
'The "--source-volume" option is not compatible with the "--size", "--filesystem", or "--mountpoint" options.',
) )
exit(1) exit(1)
@@ -4520,9 +4512,9 @@ def cli_provisioner_userdata_add(name, filename):
userdata = filename.read() userdata = filename.read()
filename.close() filename.close()
try: try:
yload(userdata, Loader=SafeYAMLLoader) yaml.load(userdata, Loader=yaml.SafeLoader)
except Exception as e: except Exception as e:
echo(CLI_CONFIG, "Error: Userdata document is malformed") echo("Error: Userdata document is malformed")
cleanup(False, e) cleanup(False, e)
params = dict() params = dict()
@@ -4559,7 +4551,7 @@ def cli_provisioner_userdata_modify(name, filename, editor):
# Grab the current config # Grab the current config
retcode, retdata = pvc.lib.provisioner.userdata_info(CLI_CONFIG, name) retcode, retdata = pvc.lib.provisioner.userdata_info(CLI_CONFIG, name)
if not retcode: if not retcode:
echo(CLI_CONFIG, retdata) echo(retdata)
exit(1) exit(1)
current_userdata = retdata["userdata"].strip() current_userdata = retdata["userdata"].strip()
@@ -4567,14 +4559,16 @@ def cli_provisioner_userdata_modify(name, filename, editor):
text=current_userdata, require_save=True, extension=".yaml" text=current_userdata, require_save=True, extension=".yaml"
) )
if new_userdata is None: if new_userdata is None:
echo(CLI_CONFIG, "Aborting with no modifications.") echo("Aborting with no modifications.")
exit(0) exit(0)
else: else:
new_userdata = new_userdata.strip() new_userdata = new_userdata.strip()
# Show a diff and confirm # Show a diff and confirm
echo("Pending modifications:")
echo("")
diff = list( diff = list(
unified_diff( difflib.unified_diff(
current_userdata.split("\n"), current_userdata.split("\n"),
new_userdata.split("\n"), new_userdata.split("\n"),
fromfile="current", fromfile="current",
@@ -4585,22 +4579,16 @@ def cli_provisioner_userdata_modify(name, filename, editor):
lineterm="", lineterm="",
) )
) )
if len(diff) < 1:
echo(CLI_CONFIG, "Aborting with no modifications.")
exit(0)
echo(CLI_CONFIG, "Pending modifications:")
echo(CLI_CONFIG, "")
for line in diff: for line in diff:
if match(r"^\+", line) is not None: if re.match(r"^\+", line) is not None:
echo(CLI_CONFIG, Fore.GREEN + line + Fore.RESET) echo(colorama.Fore.GREEN + line + colorama.Fore.RESET)
elif match(r"^\-", line) is not None: elif re.match(r"^\-", line) is not None:
echo(CLI_CONFIG, Fore.RED + line + Fore.RESET) echo(colorama.Fore.RED + line + colorama.Fore.RESET)
elif match(r"^\^", line) is not None: elif re.match(r"^\^", line) is not None:
echo(CLI_CONFIG, Fore.BLUE + line + Fore.RESET) echo(colorama.Fore.BLUE + line + colorama.Fore.RESET)
else: else:
echo(CLI_CONFIG, line) echo(line)
echo(CLI_CONFIG, "") echo("")
click.confirm("Write modifications to cluster?", abort=True) click.confirm("Write modifications to cluster?", abort=True)
@@ -4613,9 +4601,9 @@ def cli_provisioner_userdata_modify(name, filename, editor):
filename.close() filename.close()
try: try:
yload(userdata, Loader=SafeYAMLLoader) yaml.load(userdata, Loader=yaml.SafeLoader)
except Exception as e: except Exception as e:
echo(CLI_CONFIG, "Error: Userdata document is malformed") echo("Error: Userdata document is malformed")
cleanup(False, e) cleanup(False, e)
params = dict() params = dict()
@@ -4753,20 +4741,22 @@ def cli_provisioner_script_modify(name, filename, editor):
# Grab the current config # Grab the current config
retcode, retdata = pvc.lib.provisioner.script_info(CLI_CONFIG, name) retcode, retdata = pvc.lib.provisioner.script_info(CLI_CONFIG, name)
if not retcode: if not retcode:
echo(CLI_CONFIG, retdata) echo(retdata)
exit(1) exit(1)
current_script = retdata["script"].strip() current_script = retdata["script"].strip()
new_script = click.edit(text=current_script, require_save=True, extension=".py") new_script = click.edit(text=current_script, require_save=True, extension=".py")
if new_script is None: if new_script is None:
echo(CLI_CONFIG, "Aborting with no modifications.") echo("Aborting with no modifications.")
exit(0) exit(0)
else: else:
new_script = new_script.strip() new_script = new_script.strip()
# Show a diff and confirm # Show a diff and confirm
echo("Pending modifications:")
echo("")
diff = list( diff = list(
unified_diff( difflib.unified_diff(
current_script.split("\n"), current_script.split("\n"),
new_script.split("\n"), new_script.split("\n"),
fromfile="current", fromfile="current",
@@ -4777,22 +4767,16 @@ def cli_provisioner_script_modify(name, filename, editor):
lineterm="", lineterm="",
) )
) )
if len(diff) < 1:
echo(CLI_CONFIG, "Aborting with no modifications.")
exit(0)
echo(CLI_CONFIG, "Pending modifications:")
echo(CLI_CONFIG, "")
for line in diff: for line in diff:
if match(r"^\+", line) is not None: if re.match(r"^\+", line) is not None:
echo(CLI_CONFIG, Fore.GREEN + line + Fore.RESET) echo(colorama.Fore.GREEN + line + colorama.Fore.RESET)
elif match(r"^\-", line) is not None: elif re.match(r"^\-", line) is not None:
echo(CLI_CONFIG, Fore.RED + line + Fore.RESET) echo(colorama.Fore.RED + line + colorama.Fore.RESET)
elif match(r"^\^", line) is not None: elif re.match(r"^\^", line) is not None:
echo(CLI_CONFIG, Fore.BLUE + line + Fore.RESET) echo(colorama.Fore.BLUE + line + colorama.Fore.RESET)
else: else:
echo(CLI_CONFIG, line) echo(line)
echo(CLI_CONFIG, "") echo("")
click.confirm("Write modifications to cluster?", abort=True) click.confirm("Write modifications to cluster?", abort=True)
@@ -4913,7 +4897,7 @@ def cli_provisioner_ova_upload(name, filename, pool):
""" """
if not os.path.exists(filename): if not os.path.exists(filename):
echo(CLI_CONFIG, "ERROR: File '{}' does not exist!".format(filename)) echo("ERROR: File '{}' does not exist!".format(filename))
exit(1) exit(1)
params = dict() params = dict()
@@ -5277,7 +5261,54 @@ def cli_provisioner_create(
if retcode and wait_flag: if retcode and wait_flag:
task_id = retdata task_id = retdata
retdata = wait_for_provisioner(CLI_CONFIG, task_id)
echo("Task ID: {}".format(task_id))
echo("")
# Wait for the task to start
echo("Waiting for task to start...", nl=False)
while True:
time.sleep(1)
task_status = pvc.lib.provisioner.task_status(
CLI_CONFIG, task_id, is_watching=True
)
if task_status.get("state") != "PENDING":
break
echo(".", nl=False)
echo(" done.")
echo("")
# Start following the task state, updating progress as we go
total_task = task_status.get("total")
with click.progressbar(length=total_task, show_eta=False) as bar:
last_task = 0
maxlen = 0
while True:
time.sleep(1)
if task_status.get("state") != "RUNNING":
break
if task_status.get("current") > last_task:
current_task = int(task_status.get("current"))
bar.update(current_task - last_task)
last_task = current_task
# The extensive spaces at the end cause this to overwrite longer previous messages
curlen = len(str(task_status.get("status")))
if curlen > maxlen:
maxlen = curlen
lendiff = maxlen - curlen
overwrite_whitespace = " " * lendiff
echo(
" " + task_status.get("status") + overwrite_whitespace,
nl=False,
)
task_status = pvc.lib.provisioner.task_status(
CLI_CONFIG, task_id, is_watching=True
)
if task_status.get("state") == "SUCCESS":
bar.update(total_task - last_task)
echo("")
retdata = task_status.get("state") + ": " + task_status.get("status")
finish(retcode, retdata) finish(retcode, retdata)
@@ -5566,7 +5597,7 @@ def cli_connection_detail(
envvar="PVC_UNSAFE", envvar="PVC_UNSAFE",
is_flag=True, is_flag=True,
default=False, default=False,
help='Perform unsafe operations without confirmation/"--yes" argument.', help='Allow unsafe operations without confirmation/"--yes" argument.',
) )
@click.option( @click.option(
"--colour", "--colour",
@@ -5618,8 +5649,11 @@ def cli(
global CLI_CONFIG global CLI_CONFIG
store_data = get_store(store_path) store_data = get_store(store_path)
# If no connection is specified, use the first connection in the store
if _connection is None:
CLI_CONFIG = get_config(store_data, list(store_data.keys())[0])
# If the connection isn't in the store, mark it bad but pass the value # If the connection isn't in the store, mark it bad but pass the value
if _connection is not None and _connection not in store_data.keys(): elif _connection not in store_data.keys():
CLI_CONFIG = {"badcfg": True, "connection": _connection} CLI_CONFIG = {"badcfg": True, "connection": _connection}
else: else:
CLI_CONFIG = get_config(store_data, _connection) CLI_CONFIG = get_config(store_data, _connection)


@@ -159,8 +159,6 @@ def cli_cluster_status_format_pretty(CLI_CONFIG, data):
vms_strings = list() vms_strings = list()
for state in vm_states: for state in vm_states:
if data.get("vms", {}).get(state) is None:
continue
if state in ["start"]: if state in ["start"]:
state_colour = ansii["green"] state_colour = ansii["green"]
elif state in ["migrate", "disable"]: elif state in ["migrate", "disable"]:


@@ -20,7 +20,6 @@
############################################################################### ###############################################################################
from click import echo as click_echo from click import echo as click_echo
from click import progressbar
from distutils.util import strtobool from distutils.util import strtobool
from json import load as jload from json import load as jload
from json import dump as jdump from json import dump as jdump
@@ -28,12 +27,9 @@ from os import chmod, environ, getpid, path
from socket import gethostname from socket import gethostname
from sys import argv from sys import argv
from syslog import syslog, openlog, closelog, LOG_AUTH from syslog import syslog, openlog, closelog, LOG_AUTH
from time import sleep
from yaml import load as yload from yaml import load as yload
from yaml import BaseLoader from yaml import BaseLoader
import pvc.lib.provisioner
DEFAULT_STORE_DATA = {"cfgfile": "/etc/pvc/pvcapid.yaml"} DEFAULT_STORE_DATA = {"cfgfile": "/etc/pvc/pvcapid.yaml"}
DEFAULT_STORE_FILENAME = "pvc.json" DEFAULT_STORE_FILENAME = "pvc.json"
@@ -182,60 +178,3 @@ def update_store(store_path, store_data):
with open(store_file, "w") as fh: with open(store_file, "w") as fh:
jdump(store_data, fh, sort_keys=True, indent=4) jdump(store_data, fh, sort_keys=True, indent=4)
def wait_for_provisioner(CLI_CONFIG, task_id):
"""
Wait for a provisioner task to complete
"""
echo(CLI_CONFIG, f"Task ID: {task_id}")
echo(CLI_CONFIG, "")
# Wait for the task to start
echo(CLI_CONFIG, "Waiting for task to start...", newline=False)
while True:
sleep(1)
task_status = pvc.lib.provisioner.task_status(
CLI_CONFIG, task_id, is_watching=True
)
if task_status.get("state") != "PENDING":
break
echo(".", newline=False)
echo(CLI_CONFIG, " done.")
echo(CLI_CONFIG, "")
# Start following the task state, updating progress as we go
total_task = task_status.get("total")
with progressbar(length=total_task, show_eta=False) as bar:
last_task = 0
maxlen = 0
while True:
sleep(1)
if task_status.get("state") != "RUNNING":
break
if task_status.get("current") > last_task:
current_task = int(task_status.get("current"))
bar.update(current_task - last_task)
last_task = current_task
# The extensive spaces at the end cause this to overwrite longer previous messages
curlen = len(str(task_status.get("status")))
if curlen > maxlen:
maxlen = curlen
lendiff = maxlen - curlen
overwrite_whitespace = " " * lendiff
echo(
CLI_CONFIG,
" " + task_status.get("status") + overwrite_whitespace,
newline=False,
)
task_status = pvc.lib.provisioner.task_status(
CLI_CONFIG, task_id, is_watching=True
)
if task_status.get("state") == "SUCCESS":
bar.update(total_task - last_task)
echo(CLI_CONFIG, "")
retdata = task_status.get("state") + ": " + task_status.get("status")
return retdata


@@ -700,7 +700,7 @@ def format_info(config, network_information, long_output):
ainformation.append("") ainformation.append("")
if retcode: if retcode:
dhcp4_reservations_string = format_list_dhcp( dhcp4_reservations_string = format_list_dhcp(
config, dhcp4_reservations_list dhcp4_reservations_list
) )
for line in dhcp4_reservations_string.split("\n"): for line in dhcp4_reservations_string.split("\n"):
ainformation.append(line) ainformation.append(line)


@@ -1017,13 +1017,13 @@ def vm_volumes_add(config, vm, volume, disk_id, bus, disk_type, live, restart):
from lxml.objectify import fromstring from lxml.objectify import fromstring
from lxml.etree import tostring from lxml.etree import tostring
from copy import deepcopy from copy import deepcopy
import pvc.lib.storage as pvc_storage import pvc.lib.ceph as pvc_ceph
if disk_type == "rbd": if disk_type == "rbd":
# Verify that the provided volume is valid # Verify that the provided volume is valid
vpool = volume.split("/")[0] vpool = volume.split("/")[0]
vname = volume.split("/")[1] vname = volume.split("/")[1]
retcode, retdata = pvc_storage.ceph_volume_info(config, vpool, vname) retcode, retdata = pvc_ceph.ceph_volume_info(config, vpool, vname)
if not retcode: if not retcode:
return False, "Volume {} is not present in the cluster.".format(volume) return False, "Volume {} is not present in the cluster.".format(volume)


@@ -2,7 +2,7 @@ from setuptools import setup
setup( setup(
name="pvc", name="pvc",
version="0.9.68", version="0.9.63",
packages=["pvc.cli", "pvc.lib"], packages=["pvc.cli", "pvc.lib"],
install_requires=[ install_requires=[
"Click", "Click",

debian/changelog (49 lines changed)

@@ -1,52 +1,3 @@
pvc (0.9.68-0) unstable; urgency=high
* [CLI] Fixes another bug with network info view
-- Joshua M. Boniface <joshua@boniface.me> Sun, 27 Aug 2023 20:59:23 -0400
pvc (0.9.67-0) unstable; urgency=high
* [CLI] Fixes several more bugs in the refactored CLI
-- Joshua M. Boniface <joshua@boniface.me> Sun, 27 Aug 2023 14:47:20 -0400
pvc (0.9.66-0) unstable; urgency=high
* [CLI] Fixes a missing YAML import in CLI
-- Joshua M. Boniface <joshua@boniface.me> Sun, 27 Aug 2023 11:36:05 -0400
pvc (0.9.65-0) unstable; urgency=high
* [CLI] Fixes a bug in the node list filtering command
* [CLI] Fixes a bug/default when no connection is specified
-- Joshua M. Boniface <joshua@boniface.me> Wed, 23 Aug 2023 01:56:57 -0400
pvc (0.9.64-0) unstable; urgency=high
**Breaking Change [CLI]**: The CLI client root commands have been reorganized. The following commands have changed:
* `pvc cluster` -> `pvc connection` (all subcommands)
* `pvc task` -> `pvc cluster` (all subcommands)
* `pvc maintenance` -> `pvc cluster maintenance`
* `pvc status` -> `pvc cluster status`
Ensure you have updated to the latest version of the PVC Ansible repository before deploying this version or using PVC Ansible oneshot playbooks for management.
**Breaking Change [CLI]**: The `--restart` option for VM configuration changes now has an explicit `--no-restart` to disable restarting, or a prompt if neither is specified; `--unsafe` no longer bypasses this prompt which was a bug. Applies to most `vm <cmd> set` commands like `vm vcpu set`, `vm memory set`, etc. All instances also feature restart confirmation afterwards, which, if `--restart` is provided, will prompt for confirmation unless `--yes` or `--unsafe` is specified.
**Breaking Change [CLI]**: The `--long` option previously on some `info` commands no longer exists; use `-f long`/`--format long` instead.
* [CLI] Significantly refactors the CLI client code for consistency and cleanliness
* [CLI] Implements `-f`/`--format` options for all `list` and `info` commands in a consistent way
* [CLI] Changes the behaviour of VM modification options with "--restart" to provide a "--no-restart"; defaults to a prompt if neither is specified and ignores the "--unsafe" global entirely
* [API] Fixes several bugs in the 3-debootstrap.py provisioner example script
* [Node] Fixes some bugs around VM shutdown on node flush
* [Documentation] Adds mentions of Ganeti and Harvester
-- Joshua M. Boniface <joshua@boniface.me> Fri, 18 Aug 2023 12:20:43 -0400
pvc (0.9.63-0) unstable; urgency=high pvc (0.9.63-0) unstable; urgency=high
* Mentions Ganeti in the docs * Mentions Ganeti in the docs
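
The 0.9.64 changelog entry above describes a three-way choice for VM configuration changes: `--restart`, `--no-restart`, or a prompt when neither is given. A minimal sketch of that decision follows, assuming the option arrives as `True`, `False`, or `None`. The function name `resolve_restart` and the `confirm` callback are hypothetical and are not part of the PVC code; the callback stands in for an interactive prompt such as `click.confirm`. The second confirmation that the changelog mentions for `--restart` (skipped with `--yes` or `--unsafe`) is not modelled here.

```python
from typing import Callable, Optional


def resolve_restart(
    restart_state: Optional[bool], confirm: Callable[[], bool]
) -> bool:
    """Decide whether a VM should be restarted to apply changes.

    restart_state is True for "--restart", False for "--no-restart",
    and None when neither flag was passed (hypothetical convention).
    """
    if restart_state is None:
        # Neither flag was passed: fall back to asking the user.
        return confirm()
    # An explicit flag was passed: honour it without prompting.
    return restart_state


if __name__ == "__main__":
    # Simulate a user who answers "yes" at the prompt.
    print(resolve_restart(None, lambda: True))
    # An explicit "--no-restart" never consults the prompt.
    print(resolve_restart(False, lambda: True))
```

Comparing against `None` first keeps the two explicit flags from ever reaching the prompt, which matches the changelog's statement that the prompt appears only if neither option is specified.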


@@ -49,7 +49,7 @@ import re
import json import json
# Daemon version # Daemon version
version = "0.9.68" version = "0.9.63"
########################################################## ##########################################################


@@ -790,19 +790,6 @@ class NodeInstance(object):
self.flush_stopper = False self.flush_stopper = False
return return
# Wait for a VM in "restart" or "shutdown" state to complete transition
while self.zkhandler.read(("domain.state", dom_uuid)) in [
"restart",
"shutdown",
]:
self.logger.out(
'Waiting 2s for VM state change completion for VM "{}"'.format(
dom_uuid
),
state="i",
)
time.sleep(2)
self.logger.out( self.logger.out(
'Selecting target to migrate VM "{}"'.format(dom_uuid), state="i" 'Selecting target to migrate VM "{}"'.format(dom_uuid), state="i"
) )
@@ -819,13 +806,11 @@ class NodeInstance(object):
if target_node is None: if target_node is None:
self.logger.out( self.logger.out(
'Failed to find migration target for running VM "{}"; shutting down and setting autostart flag'.format( 'Failed to find migration target for VM "{}"; shutting down and setting autostart flag'.format(
dom_uuid dom_uuid
), ),
state="e", state="e",
) )
if self.zkhandler.read(("domain.state", dom_uuid)) in ["start"]:
self.zkhandler.write( self.zkhandler.write(
[ [
(("domain.state", dom_uuid), "shutdown"), (("domain.state", dom_uuid), "shutdown"),


@@ -1,54 +1,31 @@
#!/usr/bin/env bash #!/usr/bin/env bash
set -o errexit
if [[ -z ${1} ]]; then if [[ -z ${1} ]]; then
echo "Please specify a cluster to run tests against." echo "Please specify a cluster to run tests against."
exit 1 exit 1
fi fi
test_cluster="${1}" test_cluster="${1}"
shift
if [[ ${1} == "--test-dangerously" ]]; then
test_dangerously="y"
else
test_dangerously=""
fi
_pvc() { _pvc() {
echo "> pvc --connection ${test_cluster} $@" echo "> pvc --cluster ${test_cluster} $@"
pvc --quiet --connection ${test_cluster} "$@" pvc --quiet --cluster ${test_cluster} "$@"
sleep 1 sleep 1
} }
time_start=$(date +%s) time_start=$(date +%s)
set -o errexit
pushd $( git rev-parse --show-toplevel ) &>/dev/null
# Cluster tests # Cluster tests
_pvc connection list _pvc maintenance on
_pvc connection detail _pvc maintenance off
_pvc cluster maintenance on
_pvc cluster maintenance off
_pvc cluster status
backup_tmp=$(mktemp) backup_tmp=$(mktemp)
_pvc cluster backup --file ${backup_tmp} _pvc task backup --file ${backup_tmp}
if [[ -n ${test_dangerously} ]]; then _pvc task restore --yes --file ${backup_tmp}
# This is dangerous, so don't test it unless option given
_pvc cluster restore --yes --file ${backup_tmp}
fi
rm ${backup_tmp} || true rm ${backup_tmp} || true
# Provisioner tests # Provisioner tests
_pvc provisioner profile list test || true _pvc provisioner profile list test
_pvc provisioner template system add --vcpus 1 --vram 1024 --serial --vnc --vnc-bind 0.0.0.0 --node-limit hv1 --node-selector mem --node-autostart --migration-method live system-test || true
_pvc provisioner template network add network-test || true
_pvc provisioner template network vni add network-test 10000 || true
_pvc provisioner template storage add storage-test || true
_pvc provisioner template storage disk add --pool vms --size 8 --filesystem ext4 --mountpoint / storage-test sda || true
_pvc provisioner script add script-test $( find . -name "3-debootstrap.py" ) || true
_pvc provisioner profile add --profile-type provisioner --system-template system-test --network-template network-test --storage-template storage-test --userdata empty --script script-test --script-arg deb_release=bullseye test || true
_pvc provisioner create --wait testx test _pvc provisioner create --wait testx test
sleep 30 sleep 30
@@ -59,7 +36,7 @@ _pvc vm shutdown --yes --wait testx
_pvc vm start testx _pvc vm start testx
sleep 30 sleep 30
_pvc vm stop --yes testx _pvc vm stop --yes testx
_pvc vm disable --yes testx _pvc vm disable testx
_pvc vm undefine --yes testx _pvc vm undefine --yes testx
_pvc vm define --target hv3 --tag pvc-test ${vm_tmp} _pvc vm define --target hv3 --tag pvc-test ${vm_tmp}
_pvc vm start testx _pvc vm start testx
@@ -72,21 +49,21 @@ _pvc vm unmigrate --wait testx
sleep 5 sleep 5
_pvc vm move --wait --target hv1 testx _pvc vm move --wait --target hv1 testx
sleep 5 sleep 5
_pvc vm meta testx --limit hv1 --node-selector vms --method live --profile test --no-autostart _pvc vm meta testx --limit hv1 --selector vms --method live --profile test --no-autostart
_pvc vm tag add testx mytag _pvc vm tag add testx mytag
_pvc vm tag get testx _pvc vm tag get testx
_pvc vm list --tag mytag _pvc vm list --tag mytag
_pvc vm tag remove testx mytag _pvc vm tag remove testx mytag
_pvc vm network get testx _pvc vm network get testx
_pvc vm vcpu set --no-restart testx 4 _pvc vm vcpu set testx 4
_pvc vm vcpu get testx _pvc vm vcpu get testx
_pvc vm memory set --no-restart testx 4096 _pvc vm memory set testx 4096
_pvc vm memory get testx _pvc vm memory get testx
_pvc vm vcpu set --no-restart testx 2 _pvc vm vcpu set testx 2
_pvc vm memory set testx 2048 --restart --yes _pvc vm memory set testx 2048 --restart --yes
sleep 15 sleep 5
_pvc vm list testx _pvc vm list testx
_pvc vm info --format long testx _pvc vm info --long testx
rm ${vm_tmp} || true rm ${vm_tmp} || true
# Node tests # Node tests
@@ -100,7 +77,6 @@ _pvc node flush --wait hv1
_pvc node ready --wait hv1 _pvc node ready --wait hv1
_pvc node list hv1 _pvc node list hv1
_pvc node info hv1 _pvc node info hv1
sleep 15
# Network tests # Network tests
_pvc network add 10001 --description testing --type managed --domain testing.local --ipnet 10.100.100.0/24 --gateway 10.100.100.1 --dhcp --dhcp-start 10.100.100.100 --dhcp-end 10.100.100.199 _pvc network add 10001 --description testing --type managed --domain testing.local --ipnet 10.100.100.0/24 --gateway 10.100.100.1 --dhcp --dhcp-start 10.100.100.100 --dhcp-end 10.100.100.199
@@ -108,7 +84,7 @@ sleep 5
_pvc vm network add --restart --yes testx 10001 _pvc vm network add --restart --yes testx 10001
sleep 30 sleep 30
_pvc vm network remove --restart --yes testx 10001 _pvc vm network remove --restart --yes testx 10001
sleep 15 sleep 5
_pvc network acl add 10001 --in --description test-acl --order 0 --rule "'ip daddr 10.0.0.0/8 counter'" _pvc network acl add 10001 --in --description test-acl --order 0 --rule "'ip daddr 10.0.0.0/8 counter'"
_pvc network acl list 10001 _pvc network acl list 10001
@@ -119,34 +95,31 @@ _pvc network dhcp remove --yes 10001 12:34:56:78:90:ab
_pvc network modify --domain test10001.local 10001 _pvc network modify --domain test10001.local 10001
_pvc network list _pvc network list
_pvc network info --format long 10001 _pvc network info --long 10001
# Network-VM interaction tests # Network-VM interaction tests
_pvc vm network add testx 10001 --model virtio --restart --yes _pvc vm network add testx 10001 --model virtio --restart --yes
sleep 30 sleep 30
_pvc vm network get testx _pvc vm network get testx
_pvc vm network remove testx 10001 --restart --yes _pvc vm network remove testx 10001 --restart --yes
sleep 15 sleep 5
_pvc network remove --yes 10001 _pvc network remove --yes 10001
# Storage tests # Storage tests
_pvc storage status _pvc storage status
_pvc storage util _pvc storage util
if [[ -n ${test_dangerously} ]]; then _pvc storage osd set noout
# This is dangerous, so don't test it unless option given _pvc storage osd out 0
_pvc storage osd set noout _pvc storage osd in 0
_pvc storage osd out 0 _pvc storage osd unset noout
_pvc storage osd in 0
_pvc storage osd unset noout
fi
_pvc storage osd list _pvc storage osd list
_pvc storage pool add testing 64 --replcfg "copies=3,mincopies=2" _pvc storage pool add testing 64 --replcfg "copies=3,mincopies=2"
sleep 5 sleep 5
_pvc storage pool list _pvc storage pool list
_pvc storage volume add testing testx 1G _pvc storage volume add testing testx 1G
_pvc storage volume resize --yes testing testx 2G _pvc storage volume resize testing testx 2G
_pvc storage volume rename --yes testing testx testerX _pvc storage volume rename testing testx testerX
_pvc storage volume clone testing testerX testerY _pvc storage volume clone testing testerX testerY
_pvc storage volume list --pool testing _pvc storage volume list --pool testing
_pvc storage volume snapshot add testing testerX asnapshotX _pvc storage volume snapshot add testing testerX asnapshotX
@@ -159,7 +132,7 @@ _pvc vm volume add testx --type rbd --disk-id sdh --bus scsi testing/testerY --r
sleep 30 sleep 30
_pvc vm volume get testx _pvc vm volume get testx
_pvc vm volume remove testx testing/testerY --restart --yes _pvc vm volume remove testx testing/testerY --restart --yes
sleep 15 sleep 5
_pvc storage volume remove --yes testing testerY _pvc storage volume remove --yes testing testerY
_pvc storage volume remove --yes testing testerX _pvc storage volume remove --yes testing testerX
@@ -169,14 +142,6 @@ _pvc storage pool remove --yes testing
_pvc vm stop --yes testx _pvc vm stop --yes testx
_pvc vm remove --yes testx _pvc vm remove --yes testx
_pvc provisioner profile remove --yes test
_pvc provisioner script remove --yes script-test
_pvc provisioner template system remove --yes system-test
_pvc provisioner template network remove --yes network-test
_pvc provisioner template storage remove --yes storage-test
popd
time_end=$(date +%s) time_end=$(date +%s)
echo echo