Compare commits


196 Commits

Author SHA1 Message Date
d2757004db Store additional OSD information in ZK
Ensures that information like the FSIDs and the OSD LVM volume are
stored in Zookeeper at creation time and updated at daemon start time
(to ensure the data is populated at least once, or if the /dev/sdX
path changes).

This will allow safer operation of OSD removals and the potential
implementation of re-activation after node replacements.
2022-05-02 12:11:39 -04:00
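As a rough sketch, the batch write this commit describes might take the following shape, using the zkhandler.write([(key, value), ...]) pattern visible in the diffs later in this compare; the key paths and function name here are illustrative, not PVC's actual schema.

```python
# Hypothetical sketch of persisting OSD details to Zookeeper at creation
# time and on daemon start; key paths are illustrative, not PVC's schema.
def store_osd_information(zkhandler, osd_id, osd_fsid, lvm_volume, device):
    zkhandler.write([
        (("osd.ofsid", osd_id), osd_fsid),   # OSD FSID
        (("osd.lvm", osd_id), lvm_volume),   # backing LVM volume
        (("osd.device", osd_id), device),    # current /dev/sdX path
    ])
```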
7323269775 Ensure initial OSD stats is populated
Values are all invalid but this ensures the client won't error out when
trying to show an OSD that has never checked in yet.
2022-04-29 16:50:30 -04:00
85463f9aec Bump version to 0.9.48 2022-04-29 15:03:52 -04:00
19c37c3ed5 Fix bugs with forced removal 2022-04-29 14:03:07 -04:00
7d2ea494e7 Ensure unresponsive OSDs still display in list
It is still useful to see such dead OSDs even if they've never checked
in or have not checked in for quite some time.
2022-04-29 12:11:52 -04:00
cb50eee2a9 Add OSD removal force option
Ensures a removal can continue even in situations where some step(s)
might fail, for instance removing an obsolete OSD from a replaced node.
2022-04-29 11:16:33 -04:00
f3f4eaadf1 Use a singular configured cluster by default
If there is...
  1. No '--cluster' passed, and
  2. No 'local' cluster, and
  3. There is exactly one cluster configured
...then use that cluster by default in the CLI.
2022-01-13 18:36:20 -05:00
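A minimal sketch of the fallback logic described above, assuming `clusters` is the dict of configured clusters parsed from the CLI configuration file:

```python
def get_cluster(cluster_arg, clusters):
    if cluster_arg is not None:
        return cluster_arg               # 1. '--cluster' was passed
    if "local" in clusters:
        return "local"                   # 2. a 'local' cluster exists
    if len(clusters) == 1:
        return next(iter(clusters))      # 3. exactly one cluster configured
    return None                          # otherwise, no usable default
```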
313a5d1c7d Bump version to 0.9.47 2021-12-28 22:03:08 -05:00
b6d689b769 Add pool PGs count modification
Allows an administrator to adjust the PG count of a given pool. This can
be used to increase the PGs (for example after adding more OSDs) or
decrease it (to remove OSDs, reduce CPU load, etc.).
2021-12-28 21:53:29 -05:00
a0fccf83f7 Add PGs count to pool list 2021-12-28 21:12:02 -05:00
46896c593e Fix issue if pool stats have not updated yet 2021-12-28 21:03:10 -05:00
02138974fa Add device class tiers to Ceph pools
Allows specifying a particular device class ("tier") for a given pool,
for instance SSD-only or NVMe-only. This is implemented with Crush
rules on the Ceph side, and via an additional new key in the pool
Zookeeper schema which is defaulted to "default".
2021-12-28 20:58:15 -05:00
c3d255be65 Bump version to 0.9.46 2021-12-28 15:02:14 -05:00
45fc8a47a3 Allow single-node clusters to restart and timeout
Prevents a daemon from waiting forever to terminate if it is primary,
and avoids this entirely if there is only a single node in the cluster.
2021-12-28 03:06:03 -05:00
07f2006f68 Fix bug when removing OSDs
Ensure the OSD is down as well as out, or the purge might fail.
2021-12-28 03:05:34 -05:00
f4c7fdffb8 Handle detect strings as arguments for blockdevs
Allows specifying blockdevs in the OSD and OSD-DB addition commands as
detect strings rather than actual block device paths. This provides
greater flexibility for automation with pvcbootstrapd (which originates
the concept of detect strings) and in general usage as well.
2021-12-28 02:53:02 -05:00
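For illustration, a detect string has the form "detect:&lt;NAME&gt;:&lt;HUMAN-SIZE&gt;:&lt;ID&gt;" (per the CLI help text later in this compare); a parser might look like the following hypothetical sketch, with the actual resolution to a block device left to the node daemon:

```python
def parse_detect_string(blockdev):
    """Return parsed fields for a detect string, or None for a literal path."""
    if not blockdev.startswith("detect:"):
        return None                               # e.g. /dev/sdb, used as-is
    try:
        _, name, human_size, index = blockdev.split(":")
    except ValueError:
        raise ValueError(f"Invalid detect string: {blockdev}")
    return {"name": name, "size": human_size, "id": int(index)}
```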
be1b67b8f0 Allow bypassing confirm message for benchmarks 2021-12-23 21:00:42 -05:00
d68f6a945e Add auditing to local syslog from PVC client
This ensures that any client command is logged by the local system.
Helps provide accounting for users of the CLI. Currently logs the full
command executed along with the $USER environment variable contents.
2021-12-10 16:17:33 -05:00
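A minimal sketch of this audit logging using the standard syslog module; the ident and message format are assumptions about the real client, which wraps something like this around every command:

```python
import os
import sys
import syslog

def audit_command():
    # Log the full CLI invocation and the calling user to the local syslog
    syslog.openlog(ident="pvc", facility=syslog.LOG_AUTH)
    command = " ".join(sys.argv)
    user = os.environ.get("USER", "unknown")
    syslog.syslog(f'audit: command "{command}" executed by user {user}')
    syslog.closelog()
```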
c776aba8b3 Standardize fuzzy matching and use fullmatch
Solves two problems:

1. Match fuzziness was applied very inconsistently; make it uniform,
i.e. "if is_fuzzy and limit, apply .* to both sides".

2. Use re.fullmatch instead of re.match to ensure the regex matches the
value exactly. Without fuzziness, re.match would sometimes cause
inconsistent behavior: for instance, a non-fuzzy limit of "vm", expected
to match only the actual "vm", would also match "vm1".
2021-12-06 16:35:29 -05:00
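A sketch of the standardized rule ("if is_fuzzy and limit, apply .* to both sides") combined with re.fullmatch; the anchor handling is an assumption about the real implementation:

```python
import re

def matches_limit(value, limit, is_fuzzy=True):
    if limit is None:
        return True
    if is_fuzzy:
        # Apply .* to both sides unless the user anchored the regex themselves
        if not limit.startswith("^"):
            limit = ".*" + limit
        if not limit.endswith("$"):
            limit = limit + ".*"
    # fullmatch: a non-fuzzy "vm" matches "vm" but no longer matches "vm1"
    return re.fullmatch(limit, value) is not None
```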
2461941421 Remove "and started" from message text
This is not necessarily the case.
2021-11-29 16:42:26 -05:00
68954a79ec Fix bug with cloned image sizes 2021-11-29 14:56:50 -05:00
a2fa6ed450 Fix bugs with legacy benchmark format 2021-11-26 11:42:35 -05:00
02a2f6a27a Bump version to 0.9.45 2021-11-25 09:34:20 -05:00
a75b951605 Ensure echo always has an argument 2021-11-25 09:33:26 -05:00
658e80350f Fix ordering of pvcnoded unit
We want to be ordered after network.target and to want network-online.target.
2021-11-18 16:56:49 -05:00
3aa20fbaa3 Bump version to 0.9.44 2021-11-11 16:20:38 -05:00
6d101df1ff Add Munin plugin for Ceph utilization 2021-11-08 15:21:09 -05:00
be6a3992c1 Add 0.05s to connection timeout
This is recommended by the Python Requests documentation:

> It’s a good practice to set connect timeouts to slightly larger than a
  multiple of 3, which is the default TCP packet retransmission window.
2021-11-08 03:11:41 -05:00
d76da0f25a Use separate connect and data timeouts
This allows us to keep a very low connect timeout of 3 seconds, but also
ensure that long commands (e.g. --wait or VM disable) can take as long
as the API requires to complete.

Avoids having to explicitly set very long single-instance timeouts for
other functions which would block forever on an unreachable API.
2021-11-08 03:10:09 -05:00
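In requests, this is expressed as a (connect, read) timeout tuple; a sketch with an illustrative endpoint, using the 3.05s connect value from the two commits above and no read timeout so --wait style calls can run to completion:

```python
import requests

# (connect timeout, read timeout): fail fast on unreachable hosts, but
# let long-running API calls (e.g. --wait flagged actions) take as long
# as they need.
response = requests.get(
    "http://pvcapi.local:7370/api/v1/node",  # illustrative endpoint
    timeout=(3.05, None),
)
```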
bc722ce9b8 Fix quote in sed for unstable deb build 2021-11-08 02:54:27 -05:00
7890c32c59 Add sudo to deploy-package task 2021-11-08 02:41:10 -05:00
6febcfdd97 Bump version to 0.9.43 2021-11-08 02:29:17 -05:00
11d8ce70cd Fix sed commands after Black formatting change 2021-11-08 02:29:05 -05:00
a17d9439c0 Remove references to Ansible manual 2021-11-08 00:29:47 -05:00
9cd02eb148 Remove Ansible and Testing manuals
The Ansible manual can't keep up with the other repo, so it should live
there instead (eventually, after significant rewrites).

The Testing page is obsoleted by the "test-cluster" script.
2021-11-08 00:25:27 -05:00
459485c202 Allow American spelling for compatibility 2021-11-08 00:09:59 -05:00
9f92d5d822 Shorten help messages slightly to fit 2021-11-08 00:07:21 -05:00
947ac561c8 Add forced colour support
Allows preserving colour within e.g. watch, where Click would normally
determine that it is "not a terminal". This is done via the wrapper echo
which filters via the local config.
2021-11-08 00:04:20 -05:00
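A sketch of such a wrapper echo, assuming a local `config` dict holds the colour preference; click.echo() accepts a color override for exactly this case:

```python
import click

def echo(config, message):
    # Force colour through pipes (e.g. under `watch`) when the local
    # configuration asks for it; None keeps click's own terminal detection.
    click.echo(message, color=True if config.get("colour") else None)
```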
ca143c1968 Add funding configuration 2021-11-06 18:05:17 -04:00
6e110b178c Add start delineators to command output 2021-11-06 13:35:30 -04:00
d07d37d08e Revamp formatting and linting on commit
Remove the prepare script, and run the two stages manually. Better
handle Black reformatting by doing a check (for the error code), then
reformatting and aborting the commit for review.
2021-11-06 13:34:33 -04:00
0639b16c86 Apply more granular timeout formatting
We don't need to wait forever if the state change isn't wait-flagged or
a disable (which performs a shutdown before returning).
2021-11-06 13:34:03 -04:00
1cf8706a52 Up timeout when setting VM state
Ensures the API won't time out immediately, especially during a
wait-flagged or disable action.
2021-11-06 04:15:10 -04:00
dd8f07526f Use positive check rather than negative
Ensure the VM is started before doing a shutdown/stop, rather than
checking that it is stopped. Prevents overwriting an existing disable
state and other weirdness.
2021-11-06 04:08:33 -04:00
5a5e5da663 Add disable forcing to CLI
References #148
2021-11-06 04:02:50 -04:00
739b60b91e Perform automatic shutdown/stop on VM disable
Instead of requiring the VM to already be stopped, instead allow disable
state changes to perform a shutdown first. Also add a force option which
will do a hard stop instead of a shutdown.

References #148
2021-11-06 03:57:24 -04:00
16544227eb Reformat recent changes with Black 2021-11-06 03:27:07 -04:00
73e3746885 Fix linting error F541 f-string placeholders 2021-11-06 03:26:03 -04:00
66230ce971 Fix linting errors F522/F523 unused args 2021-11-06 03:24:50 -04:00
fbfbd70461 Rename build-deb.sh to build-stable-deb.sh
Unifies the naming with the other build-unstable-deb.sh script.
2021-11-06 03:18:58 -04:00
2506098223 Remove obsolete gitlab-ci config 2021-11-06 03:18:22 -04:00
83e887c4ee Ensure all helper scripts pushd/popd
Make sure all of these move to the root of the repository first, then
return to where they were afterwards, using pushd/popd. This allows them
to be executed from anywhere in the repo.
2021-11-06 03:17:47 -04:00
4eb0f3bb8a Unify formatting and linting
Ensures optimal formatting in addition to linting during manual deploys
and during pre-commit actions.
2021-11-06 03:10:17 -04:00
adc767e32f Add newline to start of lint 2021-11-06 03:04:14 -04:00
2083fd824a Reformat code with Black code formatter
Unify the code style along PEP and Black principles using the tool.
2021-11-06 03:02:43 -04:00
3aa74a3940 Add safe mode to Black 2021-11-06 02:59:54 -04:00
71d94bbeab Move Flake configuration into dedicated file
Avoid passing arguments in the script.
2021-11-06 02:55:37 -04:00
718f689df9 Clean up linter after Black add (pass two) 2021-11-06 02:51:14 -04:00
268b5c0b86 Exclude Alembic migrations from Black
These files are autogenerated with their own formats, so we don't want
to override that.
2021-11-06 02:46:06 -04:00
b016b9bf3d Clean up linter after Black add (pass one) 2021-11-06 02:44:24 -04:00
7604b9611f Add black formatter to project root 2021-11-06 02:44:05 -04:00
b21278fd80 Add Basic Builder configuration
Configuration for my new CI system under Gitea.
2021-10-31 00:09:55 -04:00
3b02034b70 Add some delay and additional tries to fencing 2021-10-27 16:24:17 -04:00
c7a5b41b1e Fix ordering to show correct message 2021-10-27 13:37:52 -04:00
48b0091d3e Support adding the same network to a VM again
This is a supported configuration for some edge cases and should be
allowed.
2021-10-27 13:33:27 -04:00
2e94516ee2 Reorder linting on build-and-deploy 2021-10-27 13:25:14 -04:00
d7f26b27ea More gracefully handle restart + live
Instead of erroring, just use the implication that restarting a VM does
not want a live modification, and proceed from there. Update the help
text to match.
2021-10-27 13:23:39 -04:00
872f35a7ee Support removing VM interfaces by MAC
Provides a way to handle multiple interfaces in the same network
gracefully, while making the previous behaviour explicit.
2021-10-27 13:20:05 -04:00
52c3e8ced3 Fix bad test in postinst 2021-10-19 00:27:12 -04:00
1d7acf62bf Fix bad location of config sets 2021-10-12 17:23:04 -04:00
c790c331a7 Also validate on failures 2021-10-12 17:11:03 -04:00
23165482df Bump version to 0.9.42 2021-10-12 15:25:42 -04:00
057071a7b7 Go back to passing if exception
Validation already happened and the set happens again later.
2021-10-12 14:21:52 -04:00
554fa9f412 Use current live value for bridge_mtu
This will ensure that upgrading without the bridge_mtu config key set
will keep things as they are.
2021-10-12 12:24:03 -04:00
5a5f924268 Use power off in fence instead of reset
Use a power off (and then make the power on a requirement) during a node
fence. Removes some potential ambiguity in the power state, since we
will know for certain if it is off.
2021-10-12 11:04:27 -04:00
cc309fc021 Validate network MTU after initial read 2021-10-12 10:53:17 -04:00
5f783f1663 Make cluster example images clickable 2021-10-12 03:15:04 -04:00
bc89bb5b68 Mention fencing only in run state 2021-10-12 03:05:01 -04:00
eb233ef588 Adjust more wording and fix typos 2021-10-12 03:00:21 -04:00
d3efb54cb4 Adjust some wording 2021-10-12 02:54:16 -04:00
da15357c8a Remove codeql setup
I don't use this for anything useful, so disable it since a run takes
ages.
2021-10-12 02:51:19 -04:00
b6939a28c0 Fix formatting of subsection 2021-10-12 02:49:40 -04:00
a1da479a4c Add reference to Ansible manual 2021-10-12 02:48:47 -04:00
ace4082820 Fix spelling errors 2021-10-12 02:47:31 -04:00
4036af6045 Fix link to cluster architecture docs 2021-10-12 02:41:22 -04:00
f96de97861 Adjust getting started docs
Update the docs with the current information on setting up a cluster,
including simplifying the Ansible configuration to use the new
create-local-repo.sh script, and simplifying some other sections.
2021-10-12 02:39:25 -04:00
04cad46305 Default to removing build artifacts in b-a-d.sh 2021-10-11 16:41:00 -04:00
e9dea4d2d1 Add explicit 3 second timeout to requests 2021-10-11 16:31:18 -04:00
39fd85fcc3 Add version function support to CLI 2021-10-11 15:34:41 -04:00
cbbab46b55 Add new configs for Ansible 2021-10-11 14:44:18 -04:00
d1f2ce0b0a Bump version to 0.9.41 2021-10-09 19:39:21 -04:00
2f01edca14 Add bridge_mtu config to docs 2021-10-09 19:28:50 -04:00
12a3a3a6a6 Adjust log type of object setup message 2021-10-09 19:23:12 -04:00
c44732be83 Avoid duplicate runs of MTU set
It wasn't the validator duplicating, but the update duplicating, so
properly avoid that happening this time.
2021-10-09 19:21:47 -04:00
a8b68e0968 Revert "Avoid duplicate runs of MTU validator"
This reverts commit 56021c443a.
2021-10-09 19:11:42 -04:00
e59152afee Set all log messages to information state
None of these were "success" messages and thus shouldn't have been ok
state.
2021-10-09 19:09:38 -04:00
56021c443a Avoid duplicate runs of MTU validator 2021-10-09 19:07:41 -04:00
ebdea165f1 Use correct isinstance instead of type 2021-10-09 19:03:31 -04:00
fb0651fb05 Move MTU validation to function
Prevents code duplication and ensures validation runs when an MTU is
updated, not just on network creation.
2021-10-09 19:01:45 -04:00
35e7e11403 Add logger message when setting MTU 2021-10-09 18:56:18 -04:00
b7555468eb Ensure vx_mtu is always an int() 2021-10-09 18:52:50 -04:00
f1b4ee02ba Fix bad header length in network list 2021-10-09 18:50:32 -04:00
4698edc98e Add MTU value checking and log messages
Ensures that if a specified MTU is more than the maximum it is set to
the maximum instead, and adds warning messages for both situations.
2021-10-09 18:48:56 -04:00
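As a sketch, the clamping described above might look like this; the logger interface and the computed maximum are assumptions:

```python
def clamp_mtu(vx_mtu, max_mtu, logger):
    vx_mtu = int(vx_mtu)
    if vx_mtu > max_mtu:
        # Warn and fall back to the maximum rather than configuring an
        # MTU the underlying device cannot carry
        logger.out(
            f"MTU {vx_mtu} exceeds maximum {max_mtu}; using maximum",
            state="w",
        )
        vx_mtu = max_mtu
    return vx_mtu
```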
40e7e04aad Fix invalid schema key
Addresses #144
2021-10-09 18:42:33 -04:00
7f074847c4 Add MTU support to network add/modify commands
Addresses #144
2021-10-09 18:06:21 -04:00
b0b0b75605 Have VXNetworkInstance set MTU if unset
Makes the MTU explicit in Zookeeper if it is unset for a network,
post-migration (schema version 6).

Addresses #144
2021-10-09 17:52:57 -04:00
89f62318bd Add MTU to network creation/modification
Addresses #144
2021-10-09 17:51:32 -04:00
925141ed65 Fix migration bugs and invalid vx_mtu
Addresses #144
2021-10-09 17:35:10 -04:00
f7a826bf52 Add handlers for client network MTUs
Refactors some of the code in VXNetworkInterface to handle MTUs in a
more streamlined fashion. Also fixes a bug whereby bridge client
networks were being explicitly given the cluster dev MTU which might not
be correct. Now adds support for this option explicitly in the configs,
and defaults to 1500 for safety (the standard Ethernet MTU).

Addresses #144
2021-10-09 17:02:27 -04:00
e176f3b2f6 Make n-1 values clearer 2021-10-07 18:11:15 -04:00
b339d5e641 Correct levels in TOC 2021-10-07 18:08:28 -04:00
d476b13cc0 Correct spelling errors 2021-10-07 18:07:06 -04:00
ce8b2c22cc Add documentation sections on IPMI and fencing 2021-10-07 18:05:47 -04:00
feab5d3479 Correct flawed conditional in verify_ipmi 2021-10-07 15:11:19 -04:00
ee348593c9 Bump version to 0.9.40 2021-10-07 14:42:04 -04:00
e403146bcf Correct bad stop_keepalive_timer call 2021-10-07 14:41:12 -04:00
bde684dd3a Remove redundant wording from header 2021-10-07 12:20:04 -04:00
992e003500 Replace headers with links in CHANGELOG.md 2021-10-07 12:17:44 -04:00
eaeb860a83 Add missing period to changelog sentence 2021-10-07 12:10:35 -04:00
1198ca9f5c Move changelog into dedicated file
The changelog was getting far too long for the README/docs index to
support, so move it into CHANGELOG.md and link to it instead.
2021-10-07 12:09:26 -04:00
e79d200244 Bump version to 0.9.39 2021-10-07 11:52:38 -04:00
5b3bb9f306 Add linting to build-and-deploy
Ensures that bad code isn't deployed during testing.
2021-10-07 11:51:05 -04:00
5501586a47 Add limit negation to VM list
When using the "state", "node", or "tag" arguments to a VM list, add
support for a "negate" flag to look for all VMs *not* matching that
state, node, or tag.
2021-10-07 11:50:52 -04:00
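The negation is conceptually just an inverted membership test; a hypothetical sketch over an API-style VM list, with the field names illustrative:

```python
def filter_vms(vm_list, field, value, negate=False):
    # field is one of "state", "node", or "tag"; negate inverts the match
    if negate:
        return [vm for vm in vm_list if vm[field] != value]
    return [vm for vm in vm_list if vm[field] == value]
```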
c160648c5c Add note about fencing at remote sites 2021-10-04 19:58:08 -04:00
fa37227127 Correct TOC in architecture page 2021-10-04 01:54:22 -04:00
2cac98963c Correct spelling errors 2021-10-04 01:51:58 -04:00
8e50428707 Double image sizes for example clusters 2021-10-04 01:47:35 -04:00
a4953bc6ef Adjust toc_depth for RTD theme 2021-10-04 01:45:05 -04:00
3c10d57148 Revamp about and architecture docs
Makes these a little simpler to follow and provides some more up-to-date
information based on recent tests and developments.
2021-10-04 01:42:08 -04:00
26d8551388 Adjust bump-version changelog heading level 2021-10-04 01:41:48 -04:00
57342541dd Move changelog headers down one more level 2021-10-04 01:41:22 -04:00
50f8afd749 Adjust indent of index/README versions 2021-10-04 00:33:24 -04:00
3449069e3d Bump version to 0.9.38 2021-10-03 22:32:41 -04:00
cb66b16045 Correct latency units and format name 2021-10-03 17:06:34 -04:00
8edce74b85 Revamp test result display
Instead of showing CLAT percentiles, which are very hard to interpret
and understand, use the main latency buckets.
2021-10-03 15:49:01 -04:00
e9b69c4124 Revamp postinst for the API daemon
Ensures that the worker is always restarted and makes the NOTE
conditional more specific.
2021-10-03 15:15:26 -04:00
3948206225 Tweak fio tests for benchmarks
1. Remove ramp_time as this was giving very strange results.

2. Up the runtime to 75 seconds to compensate.

3. Print the fio command to the console to validate.
2021-10-03 15:06:18 -04:00
a09578fcf5 Add benchmark format to list 2021-10-03 15:05:58 -04:00
73be807b84 Adjust ETA for benchmarks 2021-10-02 04:51:01 -04:00
4a9805578e Add format parsing for format 1 storage benchmarks 2021-10-02 04:46:44 -04:00
f70f052df1 Add version 2 benchmark list formatting 2021-10-02 02:47:17 -04:00
1e8841ce69 Handle benchmark running state properly 2021-10-02 01:54:51 -04:00
9c7d39d523 Fix missing argument in database insert 2021-10-02 01:49:47 -04:00
011490bcca Update to storage benchmark format 1
1. Runs `fio` with the `--output-format=json` option and removes all
terse format parsing from the results.

2. Adds a 15-second ramp time to minimize wonky ramp-up results.

3. Sets group_reporting, which isn't necessary with only a single job,
but is here for consistency.
2021-10-02 01:41:08 -04:00
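For illustration, a format-1 benchmark invocation along the lines this commit describes might be assembled like this; the rbd engine parameters (pool and volume names) are assumptions:

```python
# Illustrative fio command for a format-1 benchmark run: JSON output,
# a 15s ramp time, and group_reporting, against an RBD test volume.
fio_cmd = (
    "fio --name=benchmark --output-format=json"
    " --ioengine=rbd --pool=vms --rbdname=pvc_benchmark"  # assumed names
    " --bs=4M --rw=write --numjobs=1 --direct=1"
    " --ramp_time=15 --time_based --runtime=60 --group_reporting"
)
print(fio_cmd)  # the later "Tweak fio tests" commit prints this to validate
```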
8de63b2785 Fix handling of array of information
With a benchmark info we only ever want a single test, so pass only
that to the formatter. Simplifies the format function.
2021-10-02 01:28:39 -04:00
8f8f00b2e9 Avoid versioning benchmark lists
This wouldn't work since each individual test is versioned. Instead add
a placeholder for later once additional format(s) are defined.
2021-10-02 01:25:18 -04:00
1daab49b50 Add format option to benchmark info
Allows specifying of raw json or json-pretty formats in addition to the
"pretty" formatted option.
2021-10-02 01:13:50 -04:00
9f6041b9cf Add benchmark format function support
Allows choosing different list and info functions based on the benchmark
version found. Currently only implements "legacy" version 0 with more to
be added.
2021-10-02 01:07:25 -04:00
5b27e438a9 Add test format versioning to storage benchmarks
Adds a test_format database column and a value in the API return for the
test format version, starting at 0 for the existing format as of 0.9.37.

References #143
2021-10-02 00:55:27 -04:00
3e8a85b029 Load benchmark results as JSON
Load the JSON at the API side instead of client side, because that's
what the API doc says it is and it just makes more sense.
2021-09-30 23:40:24 -04:00
19ac1e17c3 Bump version to 0.9.37 2021-09-30 02:08:14 -04:00
252175fb6f Revamp benchmark tests
1. Move to a time-based (60s) benchmark to avoid these taking an absurd
amount of time to show the same information.

2. Eliminate the 256k random benchmarks, since they don't really add
anything.

3. Add in a 4k single-queue benchmark as this might provide valuable
insight into latency.

4. Adjust the output to reflect the above changes.

While this does change the benchmarking, it should not invalidate any
existing benchmarks, since most of the test suite is unchanged
(especially the most important 4M sequential and 4K random tests). It
simply removes an unused entry and adds a more helpful one. The
time-based change should not significantly affect the results either;
it just reduces the total runtime for long tests and increases the
runtime for quick tests to provide a better picture.
2021-09-29 20:51:30 -04:00
f39b041471 Add primary node to benchmark job name
Ensures tracking of the current primary node the job was run on, since
this may be relevant for performance reasons.
2021-09-28 09:58:22 -04:00
3b41759262 Add timeouts to queue gets and adjust
Ensure that all keepalive timeouts are set (prevent the queue.get()
actions from blocking forever) and set the thread timeouts to line up as
well. Everything here is thus limited to keepalive_interval seconds
(default 5s) to keep it uniform.
2021-09-27 16:10:27 -04:00
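A sketch of a bounded queue read in the keepalive path, with every timeout tied to keepalive_interval; the function and queue names are illustrative:

```python
import queue

def collect_stats(stats_queue, keepalive_interval=5):
    try:
        # Bound the get() so a stalled producer thread can no longer
        # block the keepalive forever
        return stats_queue.get(timeout=keepalive_interval)
    except queue.Empty:
        return None  # treat a missed deadline as "no data", not a hang
```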
e514eed414 Re-add success log output during migration 2021-09-27 11:50:55 -04:00
b81e70ec18 Fix missing character in log message 2021-09-27 00:49:43 -04:00
c2a473ed8b Simplify VM migration down to 3 steps
Remove two superfluous synchronization steps which are not needed here,
since the exclusive lock handles that situation anyways.

Still does not fix the weird flush->unflush lock timeout bug, but it is
now better worked around, since the cancelling of the other wait frees
this up to continue.
2021-09-27 00:03:20 -04:00
5355f6ff48 Work around synchronization lock issues
Make the block on stage C only wait for 900 seconds (15 minutes) to
prevent indefinite blocking.

The issue comes if a VM is being received, and the current unflush is
cancelled for a flush. When this happens, this lock acquisition seems to
block for no obvious reason, and no other changes seem to affect it.
This is certainly some sort of locking bug within Kazoo but I can't
diagnose it as-is. Leave a TODO to look into this again in the future.
2021-09-26 23:26:21 -04:00
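With Kazoo's lock recipe, the bounded acquisition might look like the following sketch; the lock path is illustrative, and PVC's real code goes through its Zookeeper handler rather than a raw client:

```python
from kazoo.client import KazooClient
from kazoo.exceptions import LockTimeout

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

lock = zk.Lock("/domains/some-vm/migrate_sync_lock")  # illustrative path
try:
    lock.acquire(timeout=900)  # block at most 15 minutes, not forever
except LockTimeout:
    pass  # log and carry on instead of hanging the flush/unflush
```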
bf7823deb5 Improve log messages during VM migration 2021-09-26 23:15:38 -04:00
8ba371723e Use event to non-block wait and fix inf wait 2021-09-26 22:55:39 -04:00
e10ac52116 Track status of VM state thread 2021-09-26 22:55:21 -04:00
341073521b Simplify locking process for VM migration
Rather than using a cumbersome and overly complex ping-pong of read and
write locks, instead move to a much simpler process using exclusive
locks.

Describing the process in ASCII or narrative is cumbersome, but the
process ping-pongs via a set of exclusive locks and wait timers, so that
the two sides are able to synchronize via blocking the exclusive lock.
The end result is a much more streamlined migration (takes about half
the time all things considered) which should be less error-prone.
2021-09-26 22:08:07 -04:00
16c38da5ef Fix failure to connect to libvirt in keepalive
This should be caught, aborting the thread, rather than failing and
holding up keepalives.
2021-09-26 20:42:01 -04:00
c8134d3a1c Fix several bugs in fence handling
1. Output from ipmitool was not being stripped, and stray newlines were
throwing off the comparisons. Fixes this.

2. Several stages were lacking meaningful messages. Adds these in so the
output is more clear about what is going on.

3. Reduce the sleep time after a fence to just 1x the
keepalive_interval, rather than 2x, because 2x seemed excessively long
even for slow IPMI interfaces, especially since we're checking the
power state now anyways.

4. Set the node daemon state to an explicit 'fenced' state after a
successful fence to indicate to users that the node was indeed fenced
successfully and not still 'dead'.
2021-09-26 20:07:30 -04:00
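A sketch of fix 1, stripping ipmitool output before comparison; the credentials are placeholders:

```python
import subprocess

def get_power_state(host, user, password):
    result = subprocess.run(
        ["ipmitool", "-I", "lanplus", "-H", host,
         "-U", user, "-P", password, "chassis", "power", "status"],
        capture_output=True,
        text=True,
    )
    # Without .strip(), a trailing newline defeats equality checks
    # against strings like "Chassis Power is off"
    return result.stdout.strip()
```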
9f41373324 Ensure pvc-flush is after network-online 2021-09-26 17:40:42 -04:00
8e62d5b30b Fix typo in log message 2021-09-26 03:35:30 -04:00
7a8eee244a Tweak CLI helptext around OSD actions
Adds some more detail about OSD commands and their values.
2021-09-26 01:29:23 -04:00
7df5b8e52e Fix typo in sgdisk command options 2021-09-26 00:59:05 -04:00
6f96219023 Use re.search instead of re.match
Required since we're not matching the start of the string.
2021-09-26 00:55:29 -04:00
51967e164b Raise basic exceptions in CephInstance
Avoids having no exception to re-raise on failures.
2021-09-26 00:50:10 -04:00
7a3a44d47c Fix OSD creation for partition paths and fix gdisk
The previous implementation did not work with /dev/nvme devices or any
/dev/disk/by-* devices due to some logical failures in the partition
naming scheme, so fix these, and be explicit about what is supported in
the PVC CLI command output.

The 'echo | gdisk' implementation of partition creation also did not
work due to limitations of subprocess.run; instead, use sgdisk which
allows these commands to be written out explicitly and is included in
the same package as gdisk.
2021-09-26 00:12:28 -04:00
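The sgdisk approach passes everything as explicit arguments, avoiding piped keystrokes entirely; a hypothetical sketch, with the partition number, type code, and device illustrative:

```python
import subprocess

# Create one partition spanning the whole device, as explicit arguments
# rather than 'echo | gdisk' keystrokes
subprocess.run(
    ["sgdisk", "--new", "1:0:0", "--typecode", "1:8300", "/dev/sdb"],
    check=True,
)
```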
44491dd988 Add support for configurable OSD DB ratios
The default of 0.05 (5%) is likely ideal in the initial implementation,
but allow this to be set explicitly for maximum flexibility in
space-constrained or performance-critical use-cases.
2021-09-24 01:06:39 -04:00
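The ratio translates directly into the DB volume size; as arithmetic, assuming a 4 TB OSD:

```python
osd_size_bytes = 4 * 1000**4   # a 4 TB OSD
ext_db_ratio = 0.05            # the default 5%
db_size_bytes = int(osd_size_bytes * ext_db_ratio)  # 200 GB DB volume
```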
eba142f470 Bump version to 0.9.36 2021-09-23 14:01:38 -04:00
6cef68d157 Add separate OSD DB device support
Adds in three parts:

1. Create an API endpoint to create OSD DB volume groups on a device.
Passed through to the node via the same command pipeline as
creating/removing OSDs, and creates a volume group with a fixed name
(osd-db).

2. Adds API support for specifying whether or not to use this DB volume
group when creating a new OSD via the "ext_db" flag. Naming and sizing
is fixed for simplicity and based on Ceph recommendations (5% of OSD
size). The Zookeeper schema tracks the block device to use during
removal.

3. Adds CLI support for the new and modified API endpoints, as well as
displaying the block device and DB block device in the OSD list.

While I debated supporting adding a DB device to an existing OSD, in
practice this ended up being a very complex operation involving stopping
the OSD and setting some options, so this is not supported; this can be
specified during OSD creation only.

Closes #142
2021-09-23 13:59:49 -04:00
e8caf3369e Move console watcher stop try up
Could cause an exception if d_domain is not defined yet.
2021-09-22 16:02:04 -04:00
3e3776a25b Bump version to 0.9.35 2021-09-13 02:20:46 -04:00
6e0d0e264e Add memory and vCPU checks to VM define/modify
Ensures that a VM won't:

(a) Have provisioned more RAM than there is available on a given node.
Due to memory overprovisioning, this is simply an "is the VM memory
amount more than the node total" check, and doesn't factor in free or used memory on
a node, total cluster usage, etc. So if a node has 64GB total RAM, the
VM limit is 64GB. It is up to an administrator to ensure sanity *below*
that value.

(b) Have provisioned more vCPUs than there are CPU cores on the node,
minus 2 to account for hypervisor/storage processes. Will ensure there
is no severe CPU contention caused by a single VM having more vCPUs than
there are actual execution threads available.

Closes #139
2021-09-13 01:51:21 -04:00
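A sketch of the two checks as described; the node values are assumed to come from Zookeeper, and the function name is illustrative:

```python
def validate_vm_resources(vm_memory_mb, vm_vcpus, node_memory_mb, node_cores):
    # (a) memory: compare against the node *total*; overprovisioning below
    # that value is deliberately left to the administrator
    if vm_memory_mb > node_memory_mb:
        return False, "VM memory exceeds total node memory"
    # (b) vCPUs: reserve 2 cores for hypervisor/storage processes
    if vm_vcpus > node_cores - 2:
        return False, "VM vCPUs exceed node cores minus 2"
    return True, "OK"
```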
1855d03a36 Add pool size check when resizing volumes
Closes #140
2021-09-12 19:54:51 -04:00
1a286dc8dd Increase build-and-deploy sleep 2021-09-12 19:50:58 -04:00
1b6d10e03a Handle VM disk/network stats gathering exceptions 2021-09-12 19:41:07 -04:00
73c96d1e93 Add VM device hot attach/detach support
Adds a new API endpoint to support hot attach/detach of devices, and the
corresponding client-side logic to use this endpoint when doing VM
network/storage add/remove actions.

The live attach is now the default behaviour for these types of
additions and removals, and can be disabled if needed.

Closes #141
2021-09-12 19:33:00 -04:00
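On the node side, a live attach of this kind maps onto libvirt's attachDeviceFlags; a sketch with an illustrative VM name and device XML:

```python
import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("test-vm")  # illustrative VM name

device_xml = "<interface type='bridge'><source bridge='vmbr100'/></interface>"
# Apply to the running domain and its persistent config in one call
dom.attachDeviceFlags(
    device_xml,
    libvirt.VIR_DOMAIN_AFFECT_LIVE | libvirt.VIR_DOMAIN_AFFECT_CONFIG,
)
```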
5841c98a59 Adjust lint script for newer linter 2021-09-12 15:40:38 -04:00
bc6395c959 Don't crash cleanup if no this_node 2021-08-29 03:52:18 -04:00
d582f87472 Change default node object state to flushed 2021-08-29 03:34:08 -04:00
e9735113af Bump version to 0.9.34 2021-08-24 16:15:25 -04:00
722fd0a65d Properly handle =-separated fsargs 2021-08-24 11:40:22 -04:00
3b41beb0f3 Convert argument elements of task status to types 2021-08-23 14:28:12 -04:00
d3392c0282 Fix typo in output message 2021-08-23 00:39:19 -04:00
560c013e95 Bump version to 0.9.33 2021-08-21 03:28:48 -04:00
384c6320ef Avoid failing if no provisioner tasks 2021-08-21 03:25:16 -04:00
445dec1c38 Ensure pycache files are removed on deb creation 2021-08-21 03:19:18 -04:00
534c7cd7f0 Refactor pvcnoded to reduce Daemon.py size
This branch commit refactors the pvcnoded component to better adhere to
good programming practices. The previous Daemon.py was a massive file
which contained almost 2000 lines of direct, root-level code which was
directly imported. Not only was this poor practice, but this resulted
in a nigh-unmaintainable file which was hard even for me to understand.

This refactoring splits a large section of the code from Daemon.py into
separate small modules and functions in the `util/` directory. This will
hopefully make most of the functionality easy to find and modify without
having to dig through a single large file.

Further the existing subcomponents have been moved to the `objects/`
directory which clearly separates them.

Finally, the Daemon.py code has mostly been moved into a function,
`entrypoint()`, which is then called from the `pvcnoded.py` stub.

An additional item is that most format strings have been replaced by
f-strings to make use of the Python 3.6 features in Daemon.py and the
utility files.
2021-08-21 03:14:22 -04:00
4014ef7714 Bump version to 0.9.32 2021-08-19 12:37:58 -04:00
180f0445ac Properly handle exceptions getting VM stats 2021-08-19 12:36:31 -04:00
Joshua Boniface
074664d4c1 Fix image dimensions and size 2021-08-18 19:51:55 -04:00
Joshua Boniface
418ac23d40 Add screenshots to docs 2021-08-18 19:49:53 -04:00
17 changed files with 129 additions and 1046 deletions

View File

@@ -1 +1 @@
-0.9.50
+0.9.48

View File

@@ -1,18 +1,5 @@
## PVC Changelog
###### [v0.9.50](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.50)
* [Node Daemon/API/CLI] Adds free memory node selector
* [Node Daemon] Fixes bug sending space-containing detect disk strings
###### [v0.9.49](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.49)
* [Node Daemon] Fixes bugs with OSD stat population on creation
* [Node Daemon/API] Adds additional information to Zookeeper about OSDs
* [Node Daemon] Refactors OSD removal for improved safety
* [Node Daemon/API/CLI] Adds explicit support for replacing and refreshing (reimporting) OSDs
* [API/CLI] Fixes a language inconsistency about "router mode"
###### [v0.9.48](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.48)
* [CLI] Fixes situation where only a single cluster is available

View File

@@ -25,7 +25,7 @@ import yaml
from distutils.util import strtobool as dustrtobool
# Daemon version
version = "0.9.50"
version = "0.9.48"
# API version
API_VERSION = 1.0

View File

@@ -1002,7 +1002,7 @@ class API_VM_Root(Resource):
type: string
node_selector:
type: string
description: The selector used to determine candidate nodes during migration; see 'target_selector' in the node daemon configuration reference
description: The selector used to determine candidate nodes during migration
node_autostart:
type: boolean
description: Whether to autostart the VM when its node returns to ready domain state
@@ -1252,7 +1252,7 @@ class API_VM_Root(Resource):
{"name": "node"},
{
"name": "selector",
"choices": ("mem", "memfree", "vcpus", "load", "vms", "none"),
"choices": ("mem", "vcpus", "load", "vms", "none"),
"helptext": "A valid selector must be specified",
},
{"name": "autostart"},
@@ -1297,15 +1297,13 @@ class API_VM_Root(Resource):
name: selector
type: string
required: false
description: The selector used to determine candidate nodes during migration; see 'target_selector' in the node daemon configuration reference
default: none
description: The selector used to determine candidate nodes during migration
default: mem
enum:
- mem
- memfree
- vcpus
- load
- vms
- none (cluster default)
- in: query
name: autostart
type: boolean
@@ -1399,7 +1397,7 @@ class API_VM_Element(Resource):
{"name": "node"},
{
"name": "selector",
"choices": ("mem", "memfree", "vcpus", "load", "vms", "none"),
"choices": ("mem", "vcpus", "load", "vms", "none"),
"helptext": "A valid selector must be specified",
},
{"name": "autostart"},
@@ -1446,11 +1444,10 @@ class API_VM_Element(Resource):
name: selector
type: string
required: false
description: The selector used to determine candidate nodes during migration; see 'target_selector' in the node daemon configuration reference
default: none
description: The selector used to determine candidate nodes during migration
default: mem
enum:
- mem
- memfree
- vcpus
- load
- vms
@@ -1629,7 +1626,7 @@ class API_VM_Metadata(Resource):
type: string
node_selector:
type: string
description: The selector used to determine candidate nodes during migration; see 'target_selector' in the node daemon configuration reference
description: The selector used to determine candidate nodes during migration
node_autostart:
type: string
description: Whether to autostart the VM when its node returns to ready domain state
@@ -1649,7 +1646,7 @@ class API_VM_Metadata(Resource):
{"name": "limit"},
{
"name": "selector",
"choices": ("mem", "memfree", "vcpus", "load", "vms", "none"),
"choices": ("mem", "vcpus", "load", "vms", "none"),
"helptext": "A valid selector must be specified",
},
{"name": "autostart"},
@@ -1678,14 +1675,12 @@ class API_VM_Metadata(Resource):
name: selector
type: string
required: false
description: The selector used to determine candidate nodes during migration; see 'target_selector' in the node daemon configuration reference
description: The selector used to determine candidate nodes during migration
enum:
- mem
- memfree
- vcpus
- load
- vms
- none (cluster default)
- in: query
name: autostart
type: boolean
@@ -4102,102 +4097,6 @@ class API_Storage_Ceph_OSD_Element(Resource):
"""
return api_helper.ceph_osd_list(osdid)
@RequestParser(
[
{
"name": "device",
"required": True,
"helptext": "A valid device or detect string must be specified.",
},
{
"name": "weight",
"required": True,
"helptext": "An OSD weight must be specified.",
},
{
"name": "yes-i-really-mean-it",
"required": True,
"helptext": "Please confirm that 'yes-i-really-mean-it'.",
},
]
)
@Authenticator
def post(self, osdid, reqargs):
"""
Replace a Ceph OSD in the cluster
Note: This task may take up to 30s to complete and return
---
tags:
- storage / ceph
parameters:
- in: query
name: device
type: string
required: true
description: The block device (e.g. "/dev/sdb", "/dev/disk/by-path/...", etc.) or detect string ("detect:NAME:SIZE:ID") to replace the OSD onto
- in: query
name: weight
type: number
required: true
description: The Ceph CRUSH weight for the replaced OSD
responses:
200:
description: OK
schema:
type: object
id: Message
400:
description: Bad request
schema:
type: object
id: Message
"""
return api_helper.ceph_osd_replace(
osdid,
reqargs.get("device", None),
reqargs.get("weight", None),
)
@RequestParser(
[
{
"name": "device",
"required": True,
"helptext": "A valid device or detect string must be specified.",
},
]
)
@Authenticator
def put(self, osdid, reqargs):
"""
Refresh (reimport) a Ceph OSD in the cluster
Note: This task may take up to 30s to complete and return
---
tags:
- storage / ceph
parameters:
- in: query
name: device
type: string
required: true
description: The block device (e.g. "/dev/sdb", "/dev/disk/by-path/...", etc.) or detect string ("detect:NAME:SIZE:ID") that the OSD should be using
responses:
200:
description: OK
schema:
type: object
id: Message
400:
description: Bad request
schema:
type: object
id: Message
"""
return api_helper.ceph_osd_refresh(
osdid,
reqargs.get("device", None),
)
@RequestParser(
[
{

View File

@@ -224,7 +224,7 @@ def node_domain_state(zkhandler, node):
@ZKConnection(config)
def node_secondary(zkhandler, node):
"""
Take NODE out of primary coordinator mode.
Take NODE out of primary router mode.
"""
retflag, retdata = pvc_node.secondary_node(zkhandler, node)
@@ -240,7 +240,7 @@ def node_secondary(zkhandler, node):
@ZKConnection(config)
def node_primary(zkhandler, node):
"""
Set NODE to primary coordinator mode.
Set NODE to primary router mode.
"""
retflag, retdata = pvc_node.primary_node(zkhandler, node)
@@ -1301,38 +1301,6 @@ def ceph_osd_add(zkhandler, node, device, weight, ext_db_flag=False, ext_db_rati
return output, retcode
@ZKConnection(config)
def ceph_osd_replace(zkhandler, osd_id, device, weight):
"""
Replace a Ceph OSD in the PVC Ceph storage cluster.
"""
retflag, retdata = pvc_ceph.replace_osd(zkhandler, osd_id, device, weight)
if retflag:
retcode = 200
else:
retcode = 400
output = {"message": retdata.replace('"', "'")}
return output, retcode
@ZKConnection(config)
def ceph_osd_refresh(zkhandler, osd_id, device):
"""
Refresh (reimport) a Ceph OSD in the PVC Ceph storage cluster.
"""
retflag, retdata = pvc_ceph.refresh_osd(zkhandler, osd_id, device)
if retflag:
retcode = 200
else:
retcode = 400
output = {"message": retdata.replace('"', "'")}
return output, retcode
@ZKConnection(config)
def ceph_osd_remove(zkhandler, osd_id, force_flag):
"""

View File

@@ -255,46 +255,6 @@ def ceph_osd_add(config, node, device, weight, ext_db_flag, ext_db_ratio):
return retstatus, response.json().get("message", "")
def ceph_osd_replace(config, osdid, device, weight):
"""
Replace an existing Ceph OSD with a new device
API endpoint: POST /api/v1/storage/ceph/osd/{osdid}
API arguments: device={device}, weight={weight}
API schema: {"message":"{data}"}
"""
params = {"device": device, "weight": weight, "yes-i-really-mean-it": "yes"}
response = call_api(config, "post", f"/storage/ceph/osd/{osdid}", params=params)
if response.status_code == 200:
retstatus = True
else:
retstatus = False
return retstatus, response.json().get("message", "")
def ceph_osd_refresh(config, osdid, device):
"""
Refresh (reimport) an existing Ceph OSD with device {device}
API endpoint: PUT /api/v1/storage/ceph/osd/{osdid}
API arguments: device={device}
API schema: {"message":"{data}"}
"""
params = {
"device": device,
}
response = call_api(config, "put", f"/storage/ceph/osd/{osdid}", params=params)
if response.status_code == 200:
retstatus = True
else:
retstatus = False
return retstatus, response.json().get("message", "")
def ceph_osd_remove(config, osdid, force_flag):
"""
Remove Ceph OSD
@@ -410,22 +370,13 @@ def format_list_osd(osd_list):
# If this happens, the node hasn't checked in fully yet, so use some dummy data
if osd_information["stats"]["node"] == "|":
for key in osd_information["stats"].keys():
if (
osd_information["stats"][key] == "|"
or osd_information["stats"][key] is None
):
if osd_information["stats"][key] == "|":
osd_information["stats"][key] = "N/A"
elif osd_information["stats"][key] is None:
osd_information["stats"][key] = "N/A"
for key in osd_information.keys():
if osd_information[key] is None:
osd_information[key] = "N/A"
else:
for key in osd_information["stats"].keys():
if key in ["utilization", "var"] and isinstance(
osd_information["stats"][key], float
):
osd_information["stats"][key] = round(
osd_information["stats"][key], 2
)
except KeyError:
print(
f"Details for OSD {osd_information['id']} missing required keys, skipping."
@@ -498,11 +449,13 @@ def format_list_osd(osd_list):
if _osd_free_length > osd_free_length:
osd_free_length = _osd_free_length
_osd_util_length = len(str(osd_information["stats"]["utilization"])) + 1
osd_util = round(osd_information["stats"]["utilization"], 2)
_osd_util_length = len(str(osd_util)) + 1
if _osd_util_length > osd_util_length:
osd_util_length = _osd_util_length
_osd_var_length = len(str(osd_information["stats"]["var"])) + 1
osd_var = round(osd_information["stats"]["var"], 2)
_osd_var_length = len(str(osd_var)) + 1
if _osd_var_length > osd_var_length:
osd_var_length = _osd_var_length
@@ -652,6 +605,8 @@ def format_list_osd(osd_list):
osd_up_flag, osd_up_colour, osd_in_flag, osd_in_colour = getOutputColoursOSD(
osd_information
)
osd_util = round(osd_information["stats"]["utilization"], 2)
osd_var = round(osd_information["stats"]["var"], 2)
osd_db_device = osd_information["db_device"]
if not osd_db_device:
@@ -714,8 +669,8 @@ def format_list_osd(osd_list):
osd_reweight=osd_information["stats"]["reweight"],
osd_used=osd_information["stats"]["used"],
osd_free=osd_information["stats"]["avail"],
osd_util=osd_information["stats"]["utilization"],
osd_var=osd_information["stats"]["var"],
osd_util=osd_util,
osd_var=osd_var,
osd_wrops=osd_information["stats"]["wr_ops"],
osd_wrdata=osd_information["stats"]["wr_data"],
osd_rdops=osd_information["stats"]["rd_ops"],

View File

@@ -486,7 +486,7 @@ def cli_node():
@cluster_req
def node_secondary(node, wait):
"""
Take NODE out of primary coordinator mode.
Take NODE out of primary router mode.
"""
task_retcode, task_retdata = pvc_provisioner.task_status(config, None)
@@ -539,7 +539,7 @@ def node_secondary(node, wait):
@cluster_req
def node_primary(node, wait):
"""
Put NODE into primary coordinator mode.
Put NODE into primary router mode.
"""
task_retcode, task_retdata = pvc_provisioner.task_status(config, None)
@@ -803,11 +803,11 @@ def cli_vm():
)
@click.option(
"-s",
"--node-selector",
"--selector",
"node_selector",
default="none",
default="mem",
show_default=True,
type=click.Choice(["mem", "memfree", "load", "vcpus", "vms", "none"]),
type=click.Choice(["mem", "load", "vcpus", "vms", "none"]),
help='Method to determine optimal target node during autoselect; "none" will use the default for the cluster.',
)
@click.option(
@@ -857,18 +857,6 @@ def vm_define(
):
"""
Define a new virtual machine from Libvirt XML configuration file VMCONFIG.
The target node selector ("--node-selector"/"-s") can be "none" to use the cluster default, or one of the following values:
* "mem": choose the node with the least provisioned VM memory
* "memfree": choose the node with the most (real) free memory
* "vcpus": choose the node with the least allocated VM vCPUs
* "load": choose the node with the lowest current load average
* "vms": choose the node with the least number of provisioned VMs
For most clusters, "mem" should be sufficient, but others may be used based on the cluster workload and available resources. The following caveats should be considered:
* "mem" looks at the provisioned memory, not the allocated memory; thus, stopped or disabled VMs are counted towards a node's memory for this selector, even though their memory is not actively in use.
* "memfree" looks at the free memory of the node in general, ignoring the amount provisioned to VMs; if any VM's internal memory usage changes, this value would be affected. This might be preferable to "mem" on clusters with very high memory utilization versus total capacity or if many VMs are stopped/disabled.
* "load" looks at the system load of the node in general, ignoring load in any particular VMs; if any VM's CPU usage changes, this value would be affected. This might be preferable on clusters with some very CPU intensive VMs.
"""
# Open the XML file
@@ -910,11 +898,11 @@ def vm_define(
)
@click.option(
"-s",
"--node-selector",
"--selector",
"node_selector",
default=None,
show_default=False,
type=click.Choice(["mem", "memfree", "load", "vcpus", "vms", "none"]),
type=click.Choice(["mem", "load", "vcpus", "vms", "none"]),
help='Method to determine optimal target node during autoselect; "none" will use the default for the cluster.',
)
@click.option(
@@ -954,8 +942,6 @@ def vm_meta(
):
"""
Modify the PVC metadata of existing virtual machine DOMAIN. At least one option to update must be specified. DOMAIN may be a UUID or name.
For details on the "--node-selector"/"-s" values, please see help for the command "pvc vm define".
"""
if (
@@ -3385,74 +3371,6 @@ def ceph_osd_add(node, device, weight, ext_db_flag, ext_db_ratio, confirm_flag):
cleanup(retcode, retmsg)
###############################################################################
# pvc storage osd replace
###############################################################################
@click.command(name="replace", short_help="Replace OSD block device.")
@click.argument("osdid")
@click.argument("device")
@click.option(
"-w",
"--weight",
"weight",
default=1.0,
show_default=True,
help="New weight of the OSD within the CRUSH map.",
)
@click.option(
"-y",
"--yes",
"confirm_flag",
is_flag=True,
default=False,
help="Confirm the removal",
)
@cluster_req
def ceph_osd_replace(osdid, device, weight, confirm_flag):
"""
Replace the block device of an existing OSD with ID OSDID with DEVICE. Use this command to replace a failed or smaller OSD block device with a new one.
DEVICE must be a valid raw block device (e.g. '/dev/sda', '/dev/nvme0n1', '/dev/disk/by-path/...', '/dev/disk/by-id/...') or a "detect" string. Using partitions is not supported. A "detect" string is a string in the form "detect:<NAME>:<HUMAN-SIZE>:<ID>". For details, see 'pvc storage osd add --help'.
The weight of an OSD should reflect the ratio of the OSD to other OSDs in the storage cluster. For details, see 'pvc storage osd add --help'. Note that the current weight must be explicitly specified if it differs from the default.
Existing IDs, external DB devices, etc. of the OSD will be preserved; data will be lost and rebuilt from the remaining healthy OSDs.
"""
if not confirm_flag and not config["unsafe"]:
try:
click.confirm(
"Replace OSD {} with block device {}".format(osdid, device),
prompt_suffix="? ",
abort=True,
)
except Exception:
exit(0)
retcode, retmsg = pvc_ceph.ceph_osd_replace(config, osdid, device, weight)
cleanup(retcode, retmsg)
###############################################################################
# pvc storage osd refresh
###############################################################################
@click.command(name="refresh", short_help="Refresh (reimport) OSD device.")
@click.argument("osdid")
@click.argument("device")
@cluster_req
def ceph_osd_refresh(osdid, device):
"""
Refresh (reimport) the block DEVICE of an existing OSD with ID OSDID. Use this command to reimport a working OSD into a rebuilt/replaced node.
DEVICE must be a valid raw block device (e.g. '/dev/sda', '/dev/nvme0n1', '/dev/disk/by-path/...', '/dev/disk/by-id/...') or a "detect" string. Using partitions is not supported. A "detect" string is a string in the form "detect:<NAME>:<HUMAN-SIZE>:<ID>". For details, see 'pvc storage osd add --help'.
Existing data, IDs, weights, etc. of the OSD will be preserved.
NOTE: If a device had an external DB device, this is not automatically handled at this time. It is best to remove and re-add the OSD instead.
"""
retcode, retmsg = pvc_ceph.ceph_osd_refresh(config, osdid, device)
cleanup(retcode, retmsg)
###############################################################################
# pvc storage osd remove
###############################################################################
@@ -4116,9 +4034,7 @@ def provisioner_template_system_list(limit):
@click.option(
"--node-selector",
"node_selector",
type=click.Choice(
["mem", "memfree", "vcpus", "vms", "load", "none"], case_sensitive=False
),
type=click.Choice(["mem", "vcpus", "vms", "load", "none"], case_sensitive=False),
default="none",
help='Method to determine optimal target node during autoselect; "none" will use the default for the cluster.',
)
@@ -4151,8 +4067,6 @@ def provisioner_template_system_add(
):
"""
Add a new system template NAME to the PVC cluster provisioner.
For details on the possible "--node-selector" values, please see help for the command "pvc vm define".
"""
params = dict()
params["name"] = name
@@ -4212,9 +4126,7 @@ def provisioner_template_system_add(
@click.option(
"--node-selector",
"node_selector",
type=click.Choice(
["mem", "memfree", "vcpus", "vms", "load", "none"], case_sensitive=False
),
type=click.Choice(["mem", "vcpus", "vms", "load", "none"], case_sensitive=False),
help='Method to determine optimal target node during autoselect; "none" will use the default for the cluster.',
)
@click.option(
@@ -4246,8 +4158,6 @@ def provisioner_template_system_modify(
):
"""
Add a new system template NAME to the PVC cluster provisioner.
For details on the possible "--node-selector" values, please see help for the command "pvc vm define".
"""
params = dict()
params["vcpus"] = vcpus
@@ -5962,8 +5872,6 @@ ceph_benchmark.add_command(ceph_benchmark_list)
ceph_osd.add_command(ceph_osd_create_db_vg)
ceph_osd.add_command(ceph_osd_add)
ceph_osd.add_command(ceph_osd_replace)
ceph_osd.add_command(ceph_osd_refresh)
ceph_osd.add_command(ceph_osd_remove)
ceph_osd.add_command(ceph_osd_in)
ceph_osd.add_command(ceph_osd_out)

View File

@@ -2,7 +2,7 @@ from setuptools import setup
setup(
name="pvc",
version="0.9.50",
version="0.9.48",
packages=["pvc", "pvc.cli_lib"],
install_requires=[
"Click",

View File

@@ -236,7 +236,7 @@ def add_osd_db_vg(zkhandler, node, device):
return success, message
# OSD actions use the /cmd/ceph pipe
# OSD addition and removal uses the /cmd/ceph pipe
# These actions must occur on the specific node they reference
def add_osd(zkhandler, node, device, weight, ext_db_flag=False, ext_db_ratio=0.05):
# Verify the target node exists
@@ -288,103 +288,6 @@ def add_osd(zkhandler, node, device, weight, ext_db_flag=False, ext_db_ratio=0.0
return success, message
def replace_osd(zkhandler, osd_id, new_device, weight):
# Get current OSD information
osd_information = getOSDInformation(zkhandler, osd_id)
node = osd_information["node"]
old_device = osd_information["device"]
ext_db_flag = True if osd_information["db_device"] else False
# Verify target block device isn't in use
block_osd = verifyOSDBlock(zkhandler, node, new_device)
if block_osd and block_osd != osd_id:
return (
False,
'ERROR: Block device "{}" on node "{}" is used by OSD "{}"'.format(
new_device, node, block_osd
),
)
# Tell the cluster to create a new OSD for the host
replace_osd_string = "osd_replace {},{},{},{},{},{}".format(
node, osd_id, old_device, new_device, weight, ext_db_flag
)
zkhandler.write([("base.cmd.ceph", replace_osd_string)])
# Wait 1/2 second for the cluster to get the message and start working
time.sleep(0.5)
# Acquire a read lock, so we get the return exclusively
with zkhandler.readlock("base.cmd.ceph"):
try:
result = zkhandler.read("base.cmd.ceph").split()[0]
if result == "success-osd_replace":
message = 'Replaced OSD {} with block device "{}" on node "{}".'.format(
osd_id, new_device, node
)
success = True
else:
message = "ERROR: Failed to replace OSD; check node logs for details."
success = False
except Exception:
message = "ERROR: Command ignored by node."
success = False
# Acquire a write lock to ensure things go smoothly
with zkhandler.writelock("base.cmd.ceph"):
time.sleep(0.5)
zkhandler.write([("base.cmd.ceph", "")])
return success, message
def refresh_osd(zkhandler, osd_id, device):
# Get current OSD information
osd_information = getOSDInformation(zkhandler, osd_id)
node = osd_information["node"]
ext_db_flag = True if osd_information["db_device"] else False
# Verify target block device isn't in use
block_osd = verifyOSDBlock(zkhandler, node, device)
if not block_osd or block_osd != osd_id:
return (
False,
'ERROR: Block device "{}" on node "{}" is not used by OSD "{}"; use replace instead'.format(
device, node, osd_id
),
)
# Tell the cluster to create a new OSD for the host
refresh_osd_string = "osd_refresh {},{},{},{}".format(
node, osd_id, device, ext_db_flag
)
zkhandler.write([("base.cmd.ceph", refresh_osd_string)])
# Wait 1/2 second for the cluster to get the message and start working
time.sleep(0.5)
# Acquire a read lock, so we get the return exclusively
with zkhandler.readlock("base.cmd.ceph"):
try:
result = zkhandler.read("base.cmd.ceph").split()[0]
if result == "success-osd_refresh":
message = (
'Refreshed OSD {} with block device "{}" on node "{}".'.format(
osd_id, device, node
)
)
success = True
else:
message = "ERROR: Failed to refresh OSD; check node logs for details."
success = False
except Exception:
message = "ERROR: Command ignored by node."
success = False
# Acquire a write lock to ensure things go smoothly
with zkhandler.writelock("base.cmd.ceph"):
time.sleep(0.5)
zkhandler.write([("base.cmd.ceph", "")])
return success, message
def remove_osd(zkhandler, osd_id, force_flag):
if not verifyOSD(zkhandler, osd_id):
return False, 'ERROR: No OSD with ID "{}" is present in the cluster.'.format(

View File

@@ -639,8 +639,6 @@ def findTargetNode(zkhandler, dom_uuid):
# Execute the search
if search_field == "mem":
return findTargetNodeMem(zkhandler, node_limit, dom_uuid)
if search_field == "memfree":
return findTargetNodeMemFree(zkhandler, node_limit, dom_uuid)
if search_field == "load":
return findTargetNodeLoad(zkhandler, node_limit, dom_uuid)
if search_field == "vcpus":
@@ -679,7 +677,7 @@ def getNodes(zkhandler, node_limit, dom_uuid):
#
# via provisioned memory
# via free memory (relative to allocated memory)
#
def findTargetNodeMem(zkhandler, node_limit, dom_uuid):
most_provfree = 0
@@ -700,24 +698,6 @@ def findTargetNodeMem(zkhandler, node_limit, dom_uuid):
return target_node
#
# via free memory
#
def findTargetNodeMemFree(zkhandler, node_limit, dom_uuid):
most_memfree = 0
target_node = None
node_list = getNodes(zkhandler, node_limit, dom_uuid)
for node in node_list:
memfree = int(zkhandler.read(("node.memory.free", node)))
if memfree > most_memfree:
most_memfree = memfree
target_node = node
return target_node
#
# via load average
#

View File

@@ -91,7 +91,7 @@ def secondary_node(zkhandler, node):
if daemon_mode == "hypervisor":
return (
False,
'ERROR: Cannot change coordinator mode on non-coordinator node "{}"'.format(
'ERROR: Cannot change router mode on non-coordinator node "{}"'.format(
node
),
)
@@ -104,9 +104,9 @@ def secondary_node(zkhandler, node):
# Get current state
current_state = zkhandler.read(("node.state.router", node))
if current_state == "secondary":
return True, 'Node "{}" is already in secondary coordinator mode.'.format(node)
return True, 'Node "{}" is already in secondary router mode.'.format(node)
retmsg = "Setting node {} in secondary coordinator mode.".format(node)
retmsg = "Setting node {} in secondary router mode.".format(node)
zkhandler.write([("base.config.primary_node", "none")])
return True, retmsg
@@ -124,7 +124,7 @@ def primary_node(zkhandler, node):
if daemon_mode == "hypervisor":
return (
False,
'ERROR: Cannot change coordinator mode on non-coordinator node "{}"'.format(
'ERROR: Cannot change router mode on non-coordinator node "{}"'.format(
node
),
)
@@ -137,9 +137,9 @@ def primary_node(zkhandler, node):
# Get current state
current_state = zkhandler.read(("node.state.router", node))
if current_state == "primary":
return True, 'Node "{}" is already in primary coordinator mode.'.format(node)
return True, 'Node "{}" is already in primary router mode.'.format(node)
retmsg = "Setting node {} in primary coordinator mode.".format(node)
retmsg = "Setting node {} in primary router mode.".format(node)
zkhandler.write([("base.config.primary_node", node)])
return True, retmsg

debian/changelog
View File

@@ -1,20 +1,3 @@
pvc (0.9.50-0) unstable; urgency=high
* [Node Daemon/API/CLI] Adds free memory node selector
* [Node Daemon] Fixes bug sending space-containing detect disk strings
-- Joshua M. Boniface <joshua@boniface.me> Wed, 06 Jul 2022 16:01:14 -0400
pvc (0.9.49-0) unstable; urgency=high
* [Node Daemon] Fixes bugs with OSD stat population on creation
* [Node Daemon/API] Adds additional information to Zookeeper about OSDs
* [Node Daemon] Refactors OSD removal for improved safety
* [Node Daemon/API/CLI] Adds explicit support for replacing and refreshing (reimporting) OSDs
* [API/CLI] Fixes a language inconsistency about "router mode"
-- Joshua M. Boniface <joshua@boniface.me> Fri, 06 May 2022 15:49:39 -0400
pvc (0.9.48-0) unstable; urgency=high
* [CLI] Fixes situation where only a single cluster is available

View File

@@ -353,19 +353,7 @@ The password for the PVC node daemon to log in to the IPMI interface.
 
 * *required*
 
-The default selector algorithm to use when migrating VMs away from a node; individual VMs can override this default.
-
-Valid `target_selector` values are:
-
-* `mem`: choose the node with the least provisioned VM memory
-* `memfree`: choose the node with the most (real) free memory
-* `vcpus`: choose the node with the least allocated VM vCPUs
-* `load`: choose the node with the lowest current load average
-* `vms`: choose the node with the least number of provisioned VMs
-
-For most clusters, `mem` should be sufficient, but others may be used based on the cluster workload and available resources. The following caveats should be considered:
-
-* `mem` looks at the provisioned memory, not the allocated memory; thus, stopped or disabled VMs are counted towards a node's memory for this selector, even though their memory is not actively in use.
-* `memfree` looks at the free memory of the node in general, ignoring the amount provisioned to VMs; if any VM's internal memory usage changes, this value would be affected. This might be preferable to `mem` on clusters with very high memory utilization versus total capacity or if many VMs are stopped/disabled.
-* `load` looks at the system load of the node in general, ignoring load in any particular VMs; if any VM's CPU usage changes, this value would be affected. This might be preferable on clusters with some very CPU intensive VMs.
+The selector algorithm to use when migrating hosts away from the node. Valid `selector` values are: `mem`: the node with the least allocated VM memory; `vcpus`: the node with the least allocated VM vCPUs; `load`: the node with the least current load average; `vms`: the node with the least number of provisioned VMs.
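The selector names map naturally onto per-criterion lookup functions; a minimal dispatch sketch (all function names other than findTargetNodeMemFree above are assumptions mirroring it):

def findTargetNode(zkhandler, node_limit, dom_uuid, selector):
    dispatch = {
        "mem": findTargetNodeMem,          # least provisioned VM memory
        "memfree": findTargetNodeMemFree,  # most real free memory
        "vcpus": findTargetNodeVCPUs,      # least allocated VM vCPUs
        "load": findTargetNodeLoad,        # lowest current load average
        "vms": findTargetNodeVMs,          # fewest provisioned VMs
    }
    return dispatch[selector](zkhandler, node_limit, dom_uuid)

A "none" selector, visible in the removed API enum below as "none (cluster default)", would simply defer to the configured target_selector.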
#### `system` → `configuration` → `directories` → `dynamic_directory`


@@ -192,7 +192,7 @@
           "type": "array"
         },
         "node_selector": {
-          "description": "The selector used to determine candidate nodes during migration; see 'target_selector' in the node daemon configuration reference",
+          "description": "The selector used to determine candidate nodes during migration",
           "type": "string"
         }
       },
@@ -1414,7 +1414,7 @@
           "type": "array"
         },
         "node_selector": {
-          "description": "The selector used to determine candidate nodes during migration; see 'target_selector' in the node daemon configuration reference",
+          "description": "The selector used to determine candidate nodes during migration",
           "type": "string"
         },
         "profile": {
@@ -5094,13 +5094,6 @@
       "delete": {
         "description": "Note: This task may take up to 30s to complete and return<br/>Warning: This operation may have unintended consequences for the storage cluster; ensure the cluster can support removing the OSD before proceeding",
         "parameters": [
-          {
-            "description": "Force removal even if some step(s) fail",
-            "in": "query",
-            "name": "force",
-            "required": false,
-            "type": "boolean"
-          },
          {
            "description": "A confirmation string to ensure that the API consumer really means it",
            "in": "query",
@@ -5148,73 +5141,6 @@
"tags": [
"storage / ceph"
]
},
"post": {
"description": "Note: This task may take up to 30s to complete and return",
"parameters": [
{
"description": "The block device (e.g. \"/dev/sdb\", \"/dev/disk/by-path/...\", etc.) or detect string (\"detect:NAME:SIZE:ID\") to replace the OSD onto",
"in": "query",
"name": "device",
"required": true,
"type": "string"
},
{
"description": "The Ceph CRUSH weight for the replaced OSD",
"in": "query",
"name": "weight",
"required": true,
"type": "number"
}
],
"responses": {
"200": {
"description": "OK",
"schema": {
"$ref": "#/definitions/Message"
}
},
"400": {
"description": "Bad request",
"schema": {
"$ref": "#/definitions/Message"
}
}
},
"summary": "Replace a Ceph OSD in the cluster",
"tags": [
"storage / ceph"
]
},
"put": {
"description": "Note: This task may take up to 30s to complete and return",
"parameters": [
{
"description": "The block device (e.g. \"/dev/sdb\", \"/dev/disk/by-path/...\", etc.) or detect string (\"detect:NAME:SIZE:ID\") that the OSD should be using",
"in": "query",
"name": "device",
"required": true,
"type": "string"
}
],
"responses": {
"200": {
"description": "OK",
"schema": {
"$ref": "#/definitions/Message"
}
},
"400": {
"description": "Bad request",
"schema": {
"$ref": "#/definitions/Message"
}
}
},
"summary": "Refresh (reimport) a Ceph OSD in the cluster",
"tags": [
"storage / ceph"
]
}
},
"/api/v1/storage/ceph/osd/{osdid}/state": {
@@ -6173,15 +6099,13 @@
           "type": "string"
         },
         {
-          "default": "none",
-          "description": "The selector used to determine candidate nodes during migration; see 'target_selector' in the node daemon configuration reference",
+          "default": "mem",
+          "description": "The selector used to determine candidate nodes during migration",
           "enum": [
             "mem",
-            "memfree",
             "vcpus",
             "load",
-            "vms",
-            "none (cluster default)"
+            "vms"
           ],
           "in": "query",
           "name": "selector",
@@ -6332,11 +6256,10 @@
           "type": "string"
         },
         {
-          "default": "none",
-          "description": "The selector used to determine candidate nodes during migration; see 'target_selector' in the node daemon configuration reference",
+          "default": "mem",
+          "description": "The selector used to determine candidate nodes during migration",
           "enum": [
             "mem",
-            "memfree",
             "vcpus",
             "load",
             "vms",
@@ -6594,14 +6517,12 @@
           "type": "string"
         },
         {
-          "description": "The selector used to determine candidate nodes during migration; see 'target_selector' in the node daemon configuration reference",
+          "description": "The selector used to determine candidate nodes during migration",
           "enum": [
             "mem",
-            "memfree",
             "vcpus",
             "load",
-            "vms",
-            "none (cluster default)"
+            "vms"
           ],
           "in": "query",
           "name": "selector",


@@ -122,7 +122,7 @@ pvc:
       pass: Passw0rd
   # migration: Migration option configuration
   migration:
-    # target_selector: Criteria to select the ideal migration target, options: mem, memfree, load, vcpus, vms
+    # target_selector: Criteria to select the ideal migration target, options: mem, load, vcpus, vms
     target_selector: mem
   # configuration: Local system configurations
   configuration:


@@ -48,7 +48,7 @@ import re
 import json
 
 # Daemon version
-version = "0.9.50"
+version = "0.9.48"
 
 ##########################################################


@@ -21,6 +21,7 @@
 import time
 import json
+import psutil
 
 import daemon_lib.common as common
@@ -216,10 +217,6 @@ class CephOSDInstance(object):
         retcode, stdout, stderr = common.run_os_command(
             f"ceph-volume lvm list {find_device}"
         )
-        osd_blockdev = None
-        osd_fsid = None
-        osd_clusterfsid = None
-        osd_device = None
         for line in stdout.split("\n"):
             if "block device" in line:
                 osd_blockdev = line.split()[-1]
@@ -230,7 +227,7 @@ class CephOSDInstance(object):
             if "devices" in line:
                 osd_device = line.split()[-1]
 
-        if not osd_blockdev or not osd_fsid or not osd_clusterfsid or not osd_device:
+        if not osd_fsid:
             self.logger.out(
                 f"Failed to find updated OSD information via ceph-volume for {find_device}",
                 state="e",
@@ -407,7 +404,6 @@ class CephOSDInstance(object):
                 print(stdout)
                 print(stderr)
                 raise Exception
-
             time.sleep(0.5)
 
             # 6. Verify it started
@@ -438,7 +434,7 @@ class CephOSDInstance(object):
                     (("osd.lv", osd_id), osd_lv),
                     (
                         ("osd.stats", osd_id),
-                        '{"uuid": "|", "up": 0, "in": 0, "primary_affinity": "|", "utilization": "|", "var": "|", "pgs": "|", "kb": "|", "weight": "|", "reweight": "|", "node": "|", "used": "|", "avail": "|", "wr_ops": "|", "wr_data": "|", "rd_ops": "|", "rd_data": "|", "state": "|"}',
+                        f'{"uuid": "|", "up": 0, "in": 0, "primary_affinity": "|", "utilization": "|", "var": "|", "pgs": "|", "kb": "|", "weight": "|", "reweight": "|", "node": "{node}", "used": "|", "avail": "|", "wr_ops": "|", "wr_data": "|", "rd_ops": "|", "rd_data": "|", state="|" }',
                     ),
                 ]
             )
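The "+" side of this pair (the older code, since this compare runs newest to oldest) is a broken f-string: inside an f-string every {...} is parsed as a replacement field, so the literal JSON braces and keys are misinterpreted, and the trailing state="|" is not valid JSON in any case; the "-" side replaces it with a plain string literal. A sketch of a less error-prone way to build the same placeholder payload (field list taken from the strings above):

import json

def initial_osd_stats(node):
    # "|" marks fields that have not been populated by a check-in yet.
    stats = {
        key: "|"
        for key in (
            "uuid", "primary_affinity", "utilization", "var", "pgs", "kb",
            "weight", "reweight", "used", "avail",
            "wr_ops", "wr_data", "rd_ops", "rd_data", "state",
        )
    }
    stats.update({"up": 0, "in": 0, "node": node})
    return json.dumps(stats)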
@@ -451,381 +447,6 @@ class CephOSDInstance(object):
logger.out("Failed to create new OSD disk: {}".format(e), state="e")
return False
@staticmethod
def replace_osd(
zkhandler,
logger,
node,
osd_id,
old_device,
new_device,
weight,
ext_db_flag=False,
):
# Handle a detect device if that is passed
if match(r"detect:", new_device):
ddevice = get_detect_device(new_device)
if ddevice is None:
logger.out(
f"Failed to determine block device from detect string {new_device}",
state="e",
)
return False
else:
logger.out(
f"Determined block device {ddevice} from detect string {new_device}",
state="i",
)
new_device = ddevice
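# For context: a detect string has the form "detect:NAME:SIZE:ID"
# (e.g. "detect:INTEL:800GB:0"; example values assumed), which
# get_detect_device resolves to a concrete path such as /dev/sdb by
# scanning the node's physical disks.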
# We are ready to create a new OSD on this node
logger.out(
"Replacing OSD {} disk with block device {}".format(osd_id, new_device),
state="i",
)
try:
# Verify the OSD is present
retcode, stdout, stderr = common.run_os_command("ceph osd ls")
osd_list = stdout.split("\n")
if osd_id not in osd_list:
logger.out(
"Could not find OSD {} in the cluster".format(osd_id), state="e"
)
return True
# 1. Set the OSD down and out so it will flush
logger.out("Setting down OSD disk with ID {}".format(osd_id), state="i")
retcode, stdout, stderr = common.run_os_command(
"ceph osd down {}".format(osd_id)
)
if retcode:
print("ceph osd down")
print(stdout)
print(stderr)
raise Exception
logger.out("Setting out OSD disk with ID {}".format(osd_id), state="i")
retcode, stdout, stderr = common.run_os_command(
"ceph osd out {}".format(osd_id)
)
if retcode:
print("ceph osd out")
print(stdout)
print(stderr)
raise Exception
# 2. Wait for the OSD to be safe to remove (but don't wait for rebalancing to complete)
logger.out("Waiting for OSD {osd_id} to be safe to remove", state="i")
while True:
retcode, stdout, stderr = common.run_os_command(
f"ceph osd safe-to-destroy osd.{osd_id}"
)
if retcode in [0, 11]:
# Code 0 = success
# Code 11 = "Error EAGAIN: OSD(s) 5 have no reported stats, and not all PGs are active+clean; we cannot draw any conclusions." which means all PGs have been remapped but backfill is still occurring
break
else:
time.sleep(5)
# 3. Stop the OSD process
logger.out("Stopping OSD disk with ID {}".format(osd_id), state="i")
retcode, stdout, stderr = common.run_os_command(
"systemctl stop ceph-osd@{}".format(osd_id)
)
if retcode:
print("systemctl stop")
print(stdout)
print(stderr)
raise Exception
time.sleep(2)
# 4. Destroy the OSD
logger.out("Destroying OSD with ID {osd_id}", state="i")
retcode, stdout, stderr = common.run_os_command(
f"ceph osd destroy {osd_id} --yes-i-really-mean-it"
)
if retcode:
print("ceph osd destroy")
print(stdout)
print(stderr)
raise Exception
# 5. Adjust the weight
logger.out(
"Adjusting weight of OSD disk with ID {} in CRUSH map".format(osd_id),
state="i",
)
retcode, stdout, stderr = common.run_os_command(
"ceph osd crush reweight osd.{osdid} {weight}".format(
osdid=osd_id, weight=weight
)
)
if retcode:
print("ceph osd crush reweight")
print(stdout)
print(stderr)
raise Exception
# 6a. Zap the new disk to ensure it is ready to go
logger.out("Zapping disk {}".format(new_device), state="i")
retcode, stdout, stderr = common.run_os_command(
"ceph-volume lvm zap --destroy {}".format(new_device)
)
if retcode:
print("ceph-volume lvm zap")
print(stdout)
print(stderr)
raise Exception
dev_flags = "--data {}".format(new_device)
# 6b. Prepare the logical volume if ext_db_flag
if ext_db_flag:
db_device = "osd-db/osd-{}".format(osd_id)
dev_flags += " --block.db {}".format(db_device)
else:
db_device = ""
# 6c. Replace the OSD
logger.out(
"Preparing LVM for replaced OSD {} disk on {}".format(
osd_id, new_device
),
state="i",
)
retcode, stdout, stderr = common.run_os_command(
"ceph-volume lvm prepare --osd-id {osdid} --bluestore {devices}".format(
osdid=osd_id, devices=dev_flags
)
)
if retcode:
print("ceph-volume lvm prepare")
print(stdout)
print(stderr)
raise Exception
# 7a. Get OSD information
logger.out(
"Getting OSD information for ID {} on {}".format(osd_id, new_device),
state="i",
)
retcode, stdout, stderr = common.run_os_command(
"ceph-volume lvm list {device}".format(device=new_device)
)
for line in stdout.split("\n"):
if "block device" in line:
osd_blockdev = line.split()[-1]
if "osd fsid" in line:
osd_fsid = line.split()[-1]
if "cluster fsid" in line:
osd_clusterfsid = line.split()[-1]
if "devices" in line:
osd_device = line.split()[-1]
if not osd_fsid:
print("ceph-volume lvm list")
print("Could not find OSD information in data:")
print(stdout)
print(stderr)
raise Exception
# Split OSD blockdev into VG and LV components
# osd_blockdev = /dev/ceph-<uuid>/osd-block-<uuid>
_, _, osd_vg, osd_lv = osd_blockdev.split("/")
# Reset whatever we were given to Ceph's /dev/xdX naming
if new_device != osd_device:
new_device = osd_device
# 7b. Activate the OSD
logger.out("Activating new OSD disk with ID {}".format(osd_id), state="i")
retcode, stdout, stderr = common.run_os_command(
"ceph-volume lvm activate --bluestore {osdid} {osdfsid}".format(
osdid=osd_id, osdfsid=osd_fsid
)
)
if retcode:
print("ceph-volume lvm activate")
print(stdout)
print(stderr)
raise Exception
time.sleep(0.5)
# 8. Verify it started
retcode, stdout, stderr = common.run_os_command(
"systemctl status ceph-osd@{osdid}".format(osdid=osd_id)
)
if retcode:
print("systemctl status")
print(stdout)
print(stderr)
raise Exception
# 9. Update Zookeeper information
logger.out(
"Adding new OSD disk with ID {} to Zookeeper".format(osd_id), state="i"
)
zkhandler.write(
[
(("osd", osd_id), ""),
(("osd.node", osd_id), node),
(("osd.device", osd_id), new_device),
(("osd.db_device", osd_id), db_device),
(("osd.fsid", osd_id), ""),
(("osd.ofsid", osd_id), osd_fsid),
(("osd.cfsid", osd_id), osd_clusterfsid),
(("osd.lvm", osd_id), ""),
(("osd.vg", osd_id), osd_vg),
(("osd.lv", osd_id), osd_lv),
(
("osd.stats", osd_id),
'{"uuid": "|", "up": 0, "in": 0, "primary_affinity": "|", "utilization": "|", "var": "|", "pgs": "|", "kb": "|", "weight": "|", "reweight": "|", "node": "|", "used": "|", "avail": "|", "wr_ops": "|", "wr_data": "|", "rd_ops": "|", "rd_data": "|", "state": "|"}',
),
]
)
# Log it
logger.out(
"Replaced OSD {} disk with device {}".format(osd_id, new_device),
state="o",
)
return True
except Exception as e:
# Log it
logger.out("Failed to replace OSD {} disk: {}".format(osd_id, e), state="e")
return False
@staticmethod
def refresh_osd(zkhandler, logger, node, osd_id, device, ext_db_flag):
# Handle a detect device if that is passed
if match(r"detect:", device):
ddevice = get_detect_device(device)
if ddevice is None:
logger.out(
f"Failed to determine block device from detect string {device}",
state="e",
)
return False
else:
logger.out(
f"Determined block device {ddevice} from detect string {device}",
state="i",
)
device = ddevice
# We are ready to create a new OSD on this node
logger.out(
"Refreshing OSD {} disk on block device {}".format(osd_id, device),
state="i",
)
try:
# 1. Verify the OSD is present
retcode, stdout, stderr = common.run_os_command("ceph osd ls")
osd_list = stdout.split("\n")
if osd_id not in osd_list:
logger.out(
"Could not find OSD {} in the cluster".format(osd_id), state="e"
)
return True
dev_flags = "--data {}".format(device)
if ext_db_flag:
db_device = "osd-db/osd-{}".format(osd_id)
dev_flags += " --block.db {}".format(db_device)
else:
db_device = ""
# 2. Get OSD information
logger.out(
"Getting OSD information for ID {} on {}".format(osd_id, device),
state="i",
)
retcode, stdout, stderr = common.run_os_command(
"ceph-volume lvm list {device}".format(device=device)
)
for line in stdout.split("\n"):
if "block device" in line:
osd_blockdev = line.split()[-1]
if "osd fsid" in line:
osd_fsid = line.split()[-1]
if "cluster fsid" in line:
osd_clusterfsid = line.split()[-1]
if "devices" in line:
osd_device = line.split()[-1]
if not osd_fsid:
print("ceph-volume lvm list")
print("Could not find OSD information in data:")
print(stdout)
print(stderr)
raise Exception
# Split OSD blockdev into VG and LV components
# osd_blockdev = /dev/ceph-<uuid>/osd-block-<uuid>
_, _, osd_vg, osd_lv = osd_blockdev.split("/")
# Reset whatever we were given to Ceph's /dev/xdX naming
if device != osd_device:
device = osd_device
# 3. Activate the OSD
logger.out("Activating new OSD disk with ID {}".format(osd_id), state="i")
retcode, stdout, stderr = common.run_os_command(
"ceph-volume lvm activate --bluestore {osdid} {osdfsid}".format(
osdid=osd_id, osdfsid=osd_fsid
)
)
if retcode:
print("ceph-volume lvm activate")
print(stdout)
print(stderr)
raise Exception
time.sleep(0.5)
# 4. Verify it started
retcode, stdout, stderr = common.run_os_command(
"systemctl status ceph-osd@{osdid}".format(osdid=osd_id)
)
if retcode:
print("systemctl status")
print(stdout)
print(stderr)
raise Exception
# 5. Update Zookeeper information
logger.out(
"Adding new OSD disk with ID {} to Zookeeper".format(osd_id), state="i"
)
zkhandler.write(
[
(("osd", osd_id), ""),
(("osd.node", osd_id), node),
(("osd.device", osd_id), device),
(("osd.db_device", osd_id), db_device),
(("osd.fsid", osd_id), ""),
(("osd.ofsid", osd_id), osd_fsid),
(("osd.cfsid", osd_id), osd_clusterfsid),
(("osd.lvm", osd_id), ""),
(("osd.vg", osd_id), osd_vg),
(("osd.lv", osd_id), osd_lv),
(
("osd.stats", osd_id),
'{"uuid": "|", "up": 0, "in": 0, "primary_affinity": "|", "utilization": "|", "var": "|", "pgs": "|", "kb": "|", "weight": "|", "reweight": "|", "node": "|", "used": "|", "avail": "|", "wr_ops": "|", "wr_data": "|", "rd_ops": "|", "rd_data": "|", "state": "|"}',
),
]
)
# Log it
logger.out("Refreshed OSD {} disk on {}".format(osd_id, device), state="o")
return True
except Exception as e:
# Log it
logger.out("Failed to refresh OSD {} disk: {}".format(osd_id, e), state="e")
return False
@staticmethod
def remove_osd(zkhandler, logger, osd_id, osd_obj, force_flag):
logger.out("Removing OSD disk {}".format(osd_id), state="i")
@@ -869,16 +490,29 @@ class CephOSDInstance(object):
                 else:
                     raise Exception
 
-            # 2. Wait for the OSD to be safe to remove (but don't wait for rebalancing to complete)
-            logger.out(f"Waiting for OSD {osd_id} to be safe to remove", state="i")
+            # 2. Wait for the OSD to flush
+            logger.out("Flushing OSD disk with ID {}".format(osd_id), state="i")
+            osd_string = str()
             while True:
-                retcode, stdout, stderr = common.run_os_command(
-                    f"ceph osd safe-to-destroy osd.{osd_id}"
-                )
-                if int(retcode) in [0, 11]:
+                try:
+                    retcode, stdout, stderr = common.run_os_command(
+                        "ceph pg dump osds --format json"
+                    )
+                    dump_string = json.loads(stdout)
+                    for osd in dump_string:
+                        if str(osd["osd"]) == osd_id:
+                            osd_string = osd
+                    num_pgs = osd_string["num_pgs"]
+                    if num_pgs > 0:
+                        time.sleep(5)
+                    else:
+                        if force_flag:
+                            logger.out("Ignoring error due to force flag", state="i")
+                        else:
+                            raise Exception
+                except Exception:
                     break
-                else:
-                    time.sleep(5)
 
             # 3. Stop the OSD process and wait for it to be terminated
             logger.out("Stopping OSD disk with ID {}".format(osd_id), state="i")
@@ -893,51 +527,62 @@ class CephOSDInstance(object):
                     logger.out("Ignoring error due to force flag", state="i")
                 else:
                     raise Exception
 
             time.sleep(2)
 
+            # FIXME: There has to be a better way to do this /shrug
+            while True:
+                is_osd_up = False
+                # Find if there is a process named ceph-osd with arg '--id {id}'
+                for p in psutil.process_iter(attrs=["name", "cmdline"]):
+                    if "ceph-osd" == p.info["name"] and "--id {}".format(
+                        osd_id
+                    ) in " ".join(p.info["cmdline"]):
+                        is_osd_up = True
+                # If there isn't, continue
+                if not is_osd_up:
+                    break
 
             # 4. Determine the block devices
-            osd_vg = zkhandler.read(("osd.vg", osd_id))
-            osd_lv = zkhandler.read(("osd.lv", osd_id))
-            osd_lvm = f"/dev/{osd_vg}/{osd_lv}"
-            osd_device = None
+            device_zk = zkhandler.read(("osd.device", osd_id))
+            try:
+                retcode, stdout, stderr = common.run_os_command(
+                    "readlink /var/lib/ceph/osd/ceph-{}/block".format(osd_id)
+                )
+                vg_name = stdout.split("/")[-2]  # e.g. /dev/ceph-<uuid>/osd-block-<uuid>
+                retcode, stdout, stderr = common.run_os_command(
+                    "vgs --separator , --noheadings -o pv_name {}".format(vg_name)
+                )
+                pv_block = stdout.strip()
+            except Exception as e:
+                print(e)
+                pv_block = device_zk
 
             # 5a. Verify that the blockdev actually has a ceph volume that matches the ID, otherwise don't zap it
             logger.out(
-                f"Getting disk info for OSD {osd_id} LV {osd_lvm}",
+                f"Check OSD disk {pv_block} for OSD signature with ID osd.{osd_id}",
                 state="i",
             )
             retcode, stdout, stderr = common.run_os_command(
-                f"ceph-volume lvm list {osd_lvm}"
+                f"ceph-volume lvm list {pv_block}"
             )
-            for line in stdout.split("\n"):
-                if "devices" in line:
-                    osd_device = line.split()[-1]
-
-            if not osd_device:
-                print("ceph-volume lvm list")
-                print("Could not find OSD information in data:")
-                print(stdout)
-                print(stderr)
-                if force_flag:
-                    logger.out("Ignoring error due to force flag", state="i")
-                else:
-                    raise Exception
-
-            # 5. Zap the volumes
-            logger.out(
-                "Zapping OSD {} disk on {}".format(osd_id, osd_device),
-                state="i",
-            )
-            retcode, stdout, stderr = common.run_os_command(
-                "ceph-volume lvm zap --destroy {}".format(osd_device)
-            )
-            if retcode:
-                print("ceph-volume lvm zap")
-                print(stdout)
-                print(stderr)
-                if force_flag:
-                    logger.out("Ignoring error due to force flag", state="i")
-                else:
-                    raise Exception
+            if f"====== osd.{osd_id} =======" in stdout:
+                # 5b. Zap the volumes
+                logger.out(
+                    "Zapping OSD disk with ID {} on {}".format(osd_id, pv_block),
+                    state="i",
+                )
+                retcode, stdout, stderr = common.run_os_command(
+                    "ceph-volume lvm zap --destroy {}".format(pv_block)
+                )
+                if retcode:
+                    print("ceph-volume lvm zap")
+                    print(stdout)
+                    print(stderr)
+                    if force_flag:
+                        logger.out("Ignoring error due to force flag", state="i")
+                    else:
+                        raise Exception
 
             # 6. Purge the OSD from Ceph
             logger.out("Purging OSD disk with ID {}".format(osd_id), state="i")
@@ -1239,9 +884,8 @@ class CephSnapshotInstance(object):
 # Primary command function
 # This command pipe is only used for OSD adds and removes
 def ceph_command(zkhandler, logger, this_node, data, d_osd):
-    # Get the command and args; the * + join ensures arguments with spaces (e.g. detect strings) are recombined correctly
-    command, *args = data.split()
-    args = " ".join(args)
+    # Get the command and args
+    command, args = data.split()
 
     # Adding a new OSD
     if command == "osd_add":
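The newer three-line form (the "-" side above) matters because detect strings may contain spaces; a quick demonstration (the payload below is an illustrative assumption, not a captured command):

data = "osd_add node1,detect:INTEL SSDSC2KB480G8:480GB:0,1.0,False"

# Plain two-way unpacking fails once an argument itself contains a space:
try:
    command, args = data.split()
except ValueError as e:
    print("plain unpacking fails:", e)  # too many values to unpack (expected 2)

# The starred form takes the first token as the command and rejoins the rest:
command, *rest = data.split()
args = " ".join(rest)
print(command)  # osd_add
print(args)     # node1,detect:INTEL SSDSC2KB480G8:480GB:0,1.0,False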
@@ -1267,59 +911,6 @@ def ceph_command(zkhandler, logger, this_node, data, d_osd):
# Wait 1 second before we free the lock, to ensure the client hits the lock
time.sleep(1)
# Replacing an OSD
if command == "osd_replace":
node, osd_id, old_device, new_device, weight, ext_db_flag = args.split(",")
ext_db_flag = bool(strtobool(ext_db_flag))
if node == this_node.name:
# Lock the command queue
zk_lock = zkhandler.writelock("base.cmd.ceph")
with zk_lock:
# Replace the OSD
result = CephOSDInstance.replace_osd(
zkhandler,
logger,
node,
osd_id,
old_device,
new_device,
weight,
ext_db_flag,
)
# Command succeeded
if result:
# Update the command queue
zkhandler.write([("base.cmd.ceph", "success-{}".format(data))])
# Command failed
else:
# Update the command queue
zkhandler.write([("base.cmd.ceph", "failure-{}".format(data))])
# Wait 1 second before we free the lock, to ensure the client hits the lock
time.sleep(1)
# Refreshing an OSD
if command == "osd_refresh":
node, osd_id, device, ext_db_flag = args.split(",")
ext_db_flag = bool(strtobool(ext_db_flag))
if node == this_node.name:
# Lock the command queue
zk_lock = zkhandler.writelock("base.cmd.ceph")
with zk_lock:
# Refresh the OSD
result = CephOSDInstance.refresh_osd(
zkhandler, logger, node, osd_id, device, ext_db_flag
)
# Command succeeded
if result:
# Update the command queue
zkhandler.write([("base.cmd.ceph", "success-{}".format(data))])
# Command failed
else:
# Update the command queue
zkhandler.write([("base.cmd.ceph", "failure-{}".format(data))])
# Wait 1 second before we free the lock, to ensure the client hits the lock
time.sleep(1)
# Removing an OSD
elif command == "osd_remove":
osd_id, force = args.split(",")