Compare commits

45 Commits

Author SHA1 Message Date
55f397a347 Fix bad location of config sets 2021-10-12 17:23:04 -04:00
dfebb2d3e5 Also validate on failures 2021-10-12 17:11:03 -04:00
e88147db4a Bump version to 0.9.42 2021-10-12 15:25:42 -04:00
b8204d89ac Go back to passing if exception
Validation already happened and the set happens again later.
2021-10-12 14:21:52 -04:00
fe73dfbdc9 Use current live value for bridge_mtu
This will ensure that upgrading without the bridge_mtu config key set
will keep things as they are.
2021-10-12 12:24:03 -04:00
8f906c1f81 Use power off in fence instead of reset
Use a power off (and then make the power on a requirement) during a node
fence. Removes some potential ambiguity in the power state, since we
will know for certain if it is off.
2021-10-12 11:04:27 -04:00
2d9fb9688d Validate network MTU after initial read 2021-10-12 10:53:17 -04:00
fb84685c2a Make cluster example images clickable 2021-10-12 03:15:04 -04:00
032ba44d9c Mention fencing only in run state 2021-10-12 03:05:01 -04:00
b7761877e7 Adjust more wording and fix typos 2021-10-12 03:00:21 -04:00
1fe07640b3 Adjust some wording 2021-10-12 02:54:16 -04:00
b8d843ebe4 Remove codeql setup
I don't use this for anything useful, so disable it since a run takes
ages.
2021-10-12 02:51:19 -04:00
95d983ddff Fix formatting of subsection 2021-10-12 02:49:40 -04:00
4c5da1b6a8 Add reference to Ansible manual 2021-10-12 02:48:47 -04:00
be6b1e02e3 Fix spelling errors 2021-10-12 02:47:31 -04:00
ec2a72ed4b Fix link to cluster architecture docs 2021-10-12 02:41:22 -04:00
b06e327add Adjust getting started docs
Update the docs with the current information on setting up a cluster,
including simplifying the Ansible configuration to use the new
create-local-repo.sh script, and simplifying some other sections.
2021-10-12 02:39:25 -04:00
d1f32d2b9c Default to removing build artifacts in b-a-d.sh 2021-10-11 16:41:00 -04:00
3f78ca1cc9 Add explicit 3 second timeout to requests 2021-10-11 16:31:18 -04:00
e866335918 Add version function support to CLI 2021-10-11 15:34:41 -04:00
221494ed1b Add new configs for Ansible 2021-10-11 14:44:18 -04:00
f13cc04b89 Bump version to 0.9.41 2021-10-09 19:39:21 -04:00
4ed537ee3b Add bridge_mtu config to docs 2021-10-09 19:28:50 -04:00
95e01f38d5 Adjust log type of object setup message 2021-10-09 19:23:12 -04:00
3122d73bf5 Avoid duplicate runs of MTU set
It wasn't the validator duplicating, but the update duplicating, so
avoid that happening properly this time.
2021-10-09 19:21:47 -04:00
7ed8ef179c Revert "Avoid duplicate runs of MTU validator"
This reverts commit 56021c443a.
2021-10-09 19:11:42 -04:00
caead02b2a Set all log messages to information state
None of these were "success" messages and thus shouldn't have been ok
state.
2021-10-09 19:09:38 -04:00
87bc5f93e6 Avoid duplicate runs of MTU validator 2021-10-09 19:07:41 -04:00
203893559e Use correct isinstance instead of type 2021-10-09 19:03:31 -04:00
2c51bb0705 Move MTU validation to function
Prevents code duplication and ensures validation runs when an MTU is
updated, not just on network creation.
2021-10-09 19:01:45 -04:00
46d3daf686 Add logger message when setting MTU 2021-10-09 18:56:18 -04:00
e9d05aa24e Ensure vx_mtu is always an int() 2021-10-09 18:52:50 -04:00
d2c18d7b46 Fix bad header length in network list 2021-10-09 18:50:32 -04:00
6ce28c43af Add MTU value checking and log messages
Ensures that if a specified MTU is more than the maximum it is set to
the maximum instead, and adds warning messages for both situations.
2021-10-09 18:48:56 -04:00
87cda72ca9 Fix invalid schema key
Addresses #144
2021-10-09 18:42:33 -04:00
8f71a6d2f6 Add MTU support to network add/modify commands
Addresses #144
2021-10-09 18:06:21 -04:00
c45f8f5bd5 Have VXNetworkInstance set MTU if unset
Makes this explicit in Zookeeper if a network is unset, post-migration
(schema version 6).

Addresses #144
2021-10-09 17:52:57 -04:00
24de0f4189 Add MTU to network creation/modification
Addresses #144
2021-10-09 17:51:32 -04:00
3690a2c1e0 Fix migration bugs and invalid vx_mtu
Addresses #144
2021-10-09 17:35:10 -04:00
50d8aa0586 Add handlers for client network MTUs
Refactors some of the code in VXNetworkInterface to handle MTUs in a
more streamlined fashion. Also fixes a bug whereby bridge client
networks were being explicitly given the cluster dev MTU which might not
be correct. Now adds support for this option explicitly in the configs,
and defaults to 1500 for safety (the standard Ethernet MTU).

Addresses #144
2021-10-09 17:02:27 -04:00
db6e65712d Make n-1 values clearer 2021-10-07 18:11:15 -04:00
cf8e16543c Correct levels in TOC 2021-10-07 18:08:28 -04:00
1a4fcdcc2d Correct spelling errors 2021-10-07 18:07:06 -04:00
9a71db0800 Add documentation sections on IPMI and fencing 2021-10-07 18:05:47 -04:00
6ee4c55071 Correct flawed conditional in verify_ipmi 2021-10-07 15:11:19 -04:00
25 changed files with 489 additions and 220 deletions
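
Commits 6ce28c43af and 2c51bb0705 describe clamping an over-large MTU to the maximum and logging a warning, and the CLI help text further down in this diff states the limits: the node `bridge_mtu` for bridged networks, and the node `cluster_mtu` minus 50 for managed networks. The node daemon code that performs this check is not among the hunks shown here, so the following is only a minimal sketch of the rule as described. The function names, the `(mtu, warning)` return shape, and the warning wording are hypothetical.

```python
# Sketch only: names and return shape are assumptions, not PVC's actual code.
VXLAN_OVERHEAD = 50  # bytes; from the "cluster_mtu minus 50" note in the CLI help


def max_mtu(nettype, bridge_mtu, cluster_mtu):
    """Return the largest MTU a network of this type may use."""
    if nettype == 'bridged':
        return bridge_mtu
    if nettype == 'managed':
        return cluster_mtu - VXLAN_OVERHEAD
    raise ValueError('unknown nettype: {}'.format(nettype))


def validate_mtu(requested, nettype, bridge_mtu=1500, cluster_mtu=1500):
    """Return (effective_mtu, warning); warning is None when nothing was changed."""
    limit = max_mtu(nettype, bridge_mtu, cluster_mtu)
    # An unset MTU (None or empty string) falls back to the underlying device limit
    if requested is None or requested == '':
        return limit, None
    mtu = int(requested)  # values may arrive as strings
    if mtu > limit:
        warning = 'MTU {} exceeds maximum {}; using {}'.format(mtu, limit, limit)
        return limit, warning
    return mtu, None
```

For example, `validate_mtu('9000', 'bridged', bridge_mtu=1500)` falls back to 1500 with a warning, while an empty value on a managed network with a 9000-byte cluster MTU resolves to 8950.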

View File

@ -1,68 +0,0 @@
# For most projects, this workflow file will not need changing; you simply need
# to commit it to your repository.
#
# You may wish to alter this file to override the set of languages analyzed,
# or to provide custom queries or build logic.
#
# ******** NOTE ********
# We have attempted to detect the languages in your repository. Please check
# the `language` matrix defined below to confirm you have the correct set of
# supported CodeQL languages.
# ******** NOTE ********
name: "CodeQL"
on:
push:
branches: [ master ]
pull_request:
# The branches below must be a subset of the branches above
branches: [ master ]
schedule:
- cron: '17 22 * * 2'
jobs:
analyze:
name: Analyze
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
language: [ 'python' ]
# CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python' ]
# Learn more...
# https://docs.github.com/en/github/finding-security-vulnerabilities-and-errors-in-your-code/configuring-code-scanning#overriding-automatic-language-detection
steps:
- name: Checkout repository
uses: actions/checkout@v2
# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@v1
with:
languages: ${{ matrix.language }}
# If you wish to specify custom queries, you can do so here or in a config file.
# By default, queries listed here will override any specified in a config file.
# Prefix the list here with "+" to use these queries and those in the config file.
# queries: ./path/to/local/query, your-org/your-repo/queries@main
# Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
# If this step fails, then you should remove it and run the build manually (see below)
- name: Autobuild
uses: github/codeql-action/autobuild@v1
# Command-line programs to run using the OS shell.
# 📚 https://git.io/JvXDl
# ✏️ If the Autobuild fails above, remove it and uncomment the following three lines
# and modify them (or add more) to build your code if your project
# uses a compiled language
#- run: |
# make bootstrap
# make release
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v1

View File

@ -1 +1 @@
0.9.40
0.9.42

View File

@ -1,5 +1,17 @@
## PVC Changelog
###### [v0.9.42](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.42)
* [Documentation] Reworks and updates various documentation sections
* [Node Daemon] Adjusts the fencing process to use a power off rather than a power reset for maximum certainty
* [Node Daemon] Ensures that MTU values are validated during the first read too
* [Node Daemon] Corrects the loading of the bridge_mtu value to use the current active setting rather than a fixed default to prevent unintended surprises
###### [v0.9.41](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.41)
* Fixes a bad conditional check in IPMI verification
* Implements per-network MTU configuration; NOTE: Requires new keys in pvcnoded.yaml (`bridge_mtu`) and Ansible group_vars (`pvc_bridge_mtu`)
###### [v0.9.40](https://github.com/parallelvirtualcluster/pvc/releases/tag/v0.9.40)
* [Docs] Documentation updates for new Changelog file

View File

@ -25,7 +25,7 @@ import yaml
from distutils.util import strtobool as dustrtobool
# Daemon version
version = '0.9.40'
version = '0.9.42'
# API version
API_VERSION = 1.0

View File

@ -2109,6 +2109,9 @@ class API_Network_Root(Resource):
enum:
- managed
- bridged
mtu:
type: integer
description: The MTU of the network, if set; empty otherwise
domain:
type: string
description: The DNS domain of the network ("managed" networks only)
@ -2169,6 +2172,7 @@ class API_Network_Root(Resource):
{'name': 'vni', 'required': True},
{'name': 'description', 'required': True},
{'name': 'nettype', 'choices': ('managed', 'bridged'), 'helptext': 'A valid nettype must be specified', 'required': True},
{'name': 'mtu'},
{'name': 'domain'},
{'name': 'name_servers'},
{'name': 'ip4_network'},
@ -2205,6 +2209,10 @@ class API_Network_Root(Resource):
enum:
- managed
- bridged
- in: query
name: mtu
type: integer
description: The MTU of the network; defaults to the underlying interface MTU if not set
- in: query
name: domain
type: string
@ -2261,6 +2269,7 @@ class API_Network_Root(Resource):
reqargs.get('vni', None),
reqargs.get('description', None),
reqargs.get('nettype', None),
reqargs.get('mtu', ''),
reqargs.get('domain', None),
name_servers,
reqargs.get('ip4_network', None),
@ -2301,6 +2310,7 @@ class API_Network_Element(Resource):
@RequestParser([
{'name': 'description', 'required': True},
{'name': 'nettype', 'choices': ('managed', 'bridged'), 'helptext': 'A valid nettype must be specified', 'required': True},
{'name': 'mtu'},
{'name': 'domain'},
{'name': 'name_servers'},
{'name': 'ip4_network'},
@ -2332,6 +2342,10 @@ class API_Network_Element(Resource):
enum:
- managed
- bridged
- in: query
name: mtu
type: integer
description: The MTU of the network; defaults to the underlying interface MTU if not set
- in: query
name: domain
type: string
@ -2388,6 +2402,7 @@ class API_Network_Element(Resource):
reqargs.get('vni', None),
reqargs.get('description', None),
reqargs.get('nettype', None),
reqargs.get('mtu', ''),
reqargs.get('domain', None),
name_servers,
reqargs.get('ip4_network', None),
@ -2401,6 +2416,7 @@ class API_Network_Element(Resource):
@RequestParser([
{'name': 'description'},
{'name': 'mtu'},
{'name': 'domain'},
{'name': 'name_servers'},
{'name': 'ip4_network'},
@ -2424,6 +2440,10 @@ class API_Network_Element(Resource):
name: description
type: string
description: The description of the network
- in: query
name: mtu
type: integer
description: The MTU of the network
- in: query
name: domain
type: string
@ -2484,6 +2504,7 @@ class API_Network_Element(Resource):
return api_helper.net_modify(
vni,
reqargs.get('description', None),
reqargs.get('mtu', None),
reqargs.get('domain', None),
name_servers,
reqargs.get('ip4_network', None),

View File

@ -927,7 +927,7 @@ def net_list(zkhandler, limit=None, is_fuzzy=True):
@ZKConnection(config)
def net_add(zkhandler, vni, description, nettype, domain, name_servers,
def net_add(zkhandler, vni, description, nettype, mtu, domain, name_servers,
ip4_network, ip4_gateway, ip6_network, ip6_gateway,
dhcp4_flag, dhcp4_start, dhcp4_end):
"""
@ -935,7 +935,7 @@ def net_add(zkhandler, vni, description, nettype, domain, name_servers,
"""
if dhcp4_flag:
dhcp4_flag = bool(strtobool(dhcp4_flag))
retflag, retdata = pvc_network.add_network(zkhandler, vni, description, nettype, domain, name_servers,
retflag, retdata = pvc_network.add_network(zkhandler, vni, description, nettype, mtu, domain, name_servers,
ip4_network, ip4_gateway, ip6_network, ip6_gateway,
dhcp4_flag, dhcp4_start, dhcp4_end)
@ -951,7 +951,7 @@ def net_add(zkhandler, vni, description, nettype, domain, name_servers,
@ZKConnection(config)
def net_modify(zkhandler, vni, description, domain, name_servers,
def net_modify(zkhandler, vni, description, mtu, domain, name_servers,
ip4_network, ip4_gateway,
ip6_network, ip6_gateway,
dhcp4_flag, dhcp4_start, dhcp4_end):
@ -960,7 +960,7 @@ def net_modify(zkhandler, vni, description, domain, name_servers,
"""
if dhcp4_flag is not None:
dhcp4_flag = bool(strtobool(dhcp4_flag))
retflag, retdata = pvc_network.modify_network(zkhandler, vni, description, domain, name_servers,
retflag, retdata = pvc_network.modify_network(zkhandler, vni, description, mtu, domain, name_servers,
ip4_network, ip4_gateway, ip6_network, ip6_gateway,
dhcp4_flag, dhcp4_start, dhcp4_end)

View File

@ -12,6 +12,18 @@ else
SUDO="sudo"
fi
KEEP_ARTIFACTS=""
if [[ -n ${1} ]]; then
for arg in ${@}; do
case ${arg} in
-k|--keep)
KEEP_ARTIFACTS="y"
shift
;;
esac
done
fi
echo -n "> Linting code for errors... "
./lint || exit
@ -48,3 +60,6 @@ for HOST in ${HOSTS[@]}; do
sleep 30
echo "done."
done
if [[ -z ${KEEP_ARTIFACTS} ]]; then
rm ../pvc*_${version}*
fi

View File

@ -124,6 +124,9 @@ def call_api(config, operation, request_uri, headers={}, params=None, data=None,
request_uri
)
# Default timeout is 3 seconds
timeout = 3
# Craft the authentication header if required
if config['api_key']:
headers['X-Api-Key'] = config['api_key']
@ -134,6 +137,7 @@ def call_api(config, operation, request_uri, headers={}, params=None, data=None,
if operation == 'get':
response = requests.get(
uri,
timeout=timeout,
headers=headers,
params=params,
data=data,
@ -142,6 +146,7 @@ def call_api(config, operation, request_uri, headers={}, params=None, data=None,
if operation == 'post':
response = requests.post(
uri,
timeout=timeout,
headers=headers,
params=params,
data=data,
@ -151,6 +156,7 @@ def call_api(config, operation, request_uri, headers={}, params=None, data=None,
if operation == 'put':
response = requests.put(
uri,
timeout=timeout,
headers=headers,
params=params,
data=data,
@ -160,6 +166,7 @@ def call_api(config, operation, request_uri, headers={}, params=None, data=None,
if operation == 'patch':
response = requests.patch(
uri,
timeout=timeout,
headers=headers,
params=params,
data=data,
@ -168,6 +175,7 @@ def call_api(config, operation, request_uri, headers={}, params=None, data=None,
if operation == 'delete':
response = requests.delete(
uri,
timeout=timeout,
headers=headers,
params=params,
data=data,
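
The five branches in the hunks above differ only in the HTTP verb, and each now passes the same `timeout=timeout`. A possible way to express that in one place is sketched below using `requests.request`, which takes the verb as its first argument. This is not what the commit does; `call_with_timeout`, `DEFAULT_TIMEOUT`, and the injectable `send` parameter are hypothetical names introduced for illustration. Note that in `requests` the timeout bounds the connection attempt and each wait for data from the server, not the total duration of the call.

```python
# Sketch only: helper and parameter names are assumptions, not part of the PVC CLI.
DEFAULT_TIMEOUT = 3  # seconds, matching the value added in this diff

ALLOWED_OPERATIONS = ('get', 'post', 'put', 'patch', 'delete')


def call_with_timeout(operation, uri, send=None, timeout=DEFAULT_TIMEOUT, **kwargs):
    """Dispatch one HTTP call, always passing an explicit timeout."""
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError('unsupported operation: {}'.format(operation))
    if send is None:
        # Imported lazily so the helper can be exercised with a stand-in sender
        import requests
        send = requests.request
    return send(operation, uri, timeout=timeout, **kwargs)
```

Passing a stand-in for `send` makes it possible to check that the timeout is forwarded without making a real network call.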

View File

@ -100,7 +100,7 @@ def net_list(config, limit):
return False, response.json().get('message', '')
def net_add(config, vni, description, nettype, domain, name_servers, ip4_network, ip4_gateway, ip6_network, ip6_gateway, dhcp4_flag, dhcp4_start, dhcp4_end):
def net_add(config, vni, description, nettype, mtu, domain, name_servers, ip4_network, ip4_gateway, ip6_network, ip6_gateway, dhcp4_flag, dhcp4_start, dhcp4_end):
"""
Add new network
@ -112,6 +112,7 @@ def net_add(config, vni, description, nettype, domain, name_servers, ip4_network
'vni': vni,
'description': description,
'nettype': nettype,
'mtu': mtu,
'domain': domain,
'name_servers': name_servers,
'ip4_network': ip4_network,
@ -132,7 +133,7 @@ def net_add(config, vni, description, nettype, domain, name_servers, ip4_network
return retstatus, response.json().get('message', '')
def net_modify(config, net, description, domain, name_servers, ip4_network, ip4_gateway, ip6_network, ip6_gateway, dhcp4_flag, dhcp4_start, dhcp4_end):
def net_modify(config, net, description, mtu, domain, name_servers, ip4_network, ip4_gateway, ip6_network, ip6_gateway, dhcp4_flag, dhcp4_start, dhcp4_end):
"""
Modify a network
@ -143,6 +144,8 @@ def net_modify(config, net, description, domain, name_servers, ip4_network, ip4_
params = dict()
if description is not None:
params['description'] = description
if mtu is not None:
params['mtu'] = mtu
if domain is not None:
params['domain'] = domain
if name_servers is not None:
@ -519,6 +522,7 @@ def format_info(config, network_information, long_output):
# Basic information
ainformation.append('{}VNI:{} {}'.format(ansiprint.purple(), ansiprint.end(), network_information['vni']))
ainformation.append('{}Type:{} {}'.format(ansiprint.purple(), ansiprint.end(), network_information['type']))
ainformation.append('{}MTU:{} {}'.format(ansiprint.purple(), ansiprint.end(), network_information['mtu']))
ainformation.append('{}Description:{} {}'.format(ansiprint.purple(), ansiprint.end(), network_information['description']))
if network_information['type'] == 'managed':
ainformation.append('{}Domain:{} {}'.format(ansiprint.purple(), ansiprint.end(), network_information['domain']))
@ -575,6 +579,7 @@ def format_list(config, network_list):
net_vni_length = 5
net_description_length = 12
net_nettype_length = 8
net_mtu_length = 4
net_domain_length = 6
net_v6_flag_length = 6
net_dhcp6_flag_length = 7
@ -589,6 +594,10 @@ def format_list(config, network_list):
_net_description_length = len(network_information['description']) + 1
if _net_description_length > net_description_length:
net_description_length = _net_description_length
# mtu column
_net_mtu_length = len(str(network_information['mtu'])) + 1
if _net_mtu_length > net_mtu_length:
net_mtu_length = _net_mtu_length
# domain column
_net_domain_length = len(network_information['domain']) + 1
if _net_domain_length > net_domain_length:
@ -599,14 +608,15 @@ def format_list(config, network_list):
bold=ansiprint.bold(),
end_bold=ansiprint.end(),
networks_header_length=net_vni_length + net_description_length + 1,
config_header_length=net_nettype_length + net_domain_length + net_v6_flag_length + net_dhcp6_flag_length + net_v4_flag_length + net_dhcp4_flag_length + 6,
config_header_length=net_nettype_length + net_mtu_length + net_domain_length + net_v6_flag_length + net_dhcp6_flag_length + net_v4_flag_length + net_dhcp4_flag_length + 7,
networks_header='Networks ' + ''.join(['-' for _ in range(9, net_vni_length + net_description_length)]),
config_header='Config ' + ''.join(['-' for _ in range(7, net_nettype_length + net_domain_length + net_v6_flag_length + net_dhcp6_flag_length + net_v4_flag_length + net_dhcp4_flag_length + 5)]))
config_header='Config ' + ''.join(['-' for _ in range(7, net_nettype_length + net_mtu_length + net_domain_length + net_v6_flag_length + net_dhcp6_flag_length + net_v4_flag_length + net_dhcp4_flag_length + 6)]))
)
network_list_output.append('{bold}\
{net_vni: <{net_vni_length}} \
{net_description: <{net_description_length}} \
{net_nettype: <{net_nettype_length}} \
{net_mtu: <{net_mtu_length}} \
{net_domain: <{net_domain_length}} \
{net_v6_flag: <{net_v6_flag_length}} \
{net_dhcp6_flag: <{net_dhcp6_flag_length}} \
@ -618,6 +628,7 @@ def format_list(config, network_list):
net_vni_length=net_vni_length,
net_description_length=net_description_length,
net_nettype_length=net_nettype_length,
net_mtu_length=net_mtu_length,
net_domain_length=net_domain_length,
net_v6_flag_length=net_v6_flag_length,
net_dhcp6_flag_length=net_dhcp6_flag_length,
@ -626,6 +637,7 @@ def format_list(config, network_list):
net_vni='VNI',
net_description='Description',
net_nettype='Type',
net_mtu='MTU',
net_domain='Domain',
net_v6_flag='IPv6',
net_dhcp6_flag='DHCPv6',
@ -649,6 +661,7 @@ def format_list(config, network_list):
{net_vni: <{net_vni_length}} \
{net_description: <{net_description_length}} \
{net_nettype: <{net_nettype_length}} \
{net_mtu: <{net_mtu_length}} \
{net_domain: <{net_domain_length}} \
{v6_flag_colour}{net_v6_flag: <{net_v6_flag_length}}{colour_off} \
{dhcp6_flag_colour}{net_dhcp6_flag: <{net_dhcp6_flag_length}}{colour_off} \
@ -660,6 +673,7 @@ def format_list(config, network_list):
net_vni_length=net_vni_length,
net_description_length=net_description_length,
net_nettype_length=net_nettype_length,
net_mtu_length=net_mtu_length,
net_domain_length=net_domain_length,
net_v6_flag_length=net_v6_flag_length,
net_dhcp6_flag_length=net_dhcp6_flag_length,
@ -668,6 +682,7 @@ def format_list(config, network_list):
net_vni=network_information['vni'],
net_description=network_information['description'],
net_nettype=network_information['type'],
net_mtu=network_information['mtu'],
net_domain=network_information['domain'],
net_v6_flag=v6_flag,
v6_flag_colour=v6_flag_colour,

View File

@ -51,6 +51,18 @@ default_store_data = {
}
#
# Version function
#
def print_version(ctx, param, value):
if not value or ctx.resilient_parsing:
return
from pkg_resources import get_distribution
version = get_distribution('pvc').version
click.echo(f'Parallel Virtual Cluster version {version}')
ctx.exit()
#
# Data store handling functions
#
@ -1874,6 +1886,11 @@ def cli_network():
type=click.Choice(['managed', 'bridged']),
help='Network type; managed networks control IP addressing; bridged networks are simple vLAN bridges. All subsequent options are unused for bridged networks.'
)
@click.option(
'-m', '--mtu', 'mtu',
default='',
help='MTU of the network interfaces.'
)
@click.option(
'-n', '--domain', 'domain',
default=None,
@ -1924,10 +1941,12 @@ def cli_network():
'vni'
)
@cluster_req
def net_add(vni, description, nettype, domain, ip_network, ip_gateway, ip6_network, ip6_gateway, dhcp_flag, dhcp_start, dhcp_end, name_servers):
def net_add(vni, description, nettype, mtu, domain, ip_network, ip_gateway, ip6_network, ip6_gateway, dhcp_flag, dhcp_start, dhcp_end, name_servers):
"""
Add a new virtual network with VXLAN identifier VNI.
NOTE: The MTU must be equal to, or less than, the underlying device MTU (either the node 'bridge_mtu' for bridged networks, or the node 'cluster_mtu' minus 50 for managed networks). Is only required if the device MTU should be lower than the underlying physical device MTU for compatibility. If unset, defaults to the underlying device MTU which will be set explicitly when the network is added to the nodes.
Examples:
pvc network add 101 --description my-bridged-net --type bridged
@ -1941,7 +1960,7 @@ def net_add(vni, description, nettype, domain, ip_network, ip_gateway, ip6_netwo
IPv6 is fully supported with --ipnet6 and --gateway6 in addition to or instead of IPv4. PVC will configure DHCPv6 in a semi-managed configuration for the network if set.
"""
retcode, retmsg = pvc_network.net_add(config, vni, description, nettype, domain, name_servers, ip_network, ip_gateway, ip6_network, ip6_gateway, dhcp_flag, dhcp_start, dhcp_end)
retcode, retmsg = pvc_network.net_add(config, vni, description, nettype, mtu, domain, name_servers, ip_network, ip_gateway, ip6_network, ip6_gateway, dhcp_flag, dhcp_start, dhcp_end)
cleanup(retcode, retmsg)
@ -1954,6 +1973,11 @@ def net_add(vni, description, nettype, domain, ip_network, ip_gateway, ip6_netwo
default=None,
help='Description of the network; must be unique and not contain whitespace.'
)
@click.option(
'-m', '--mtu', 'mtu',
default=None,
help='MTU of the network interfaces.'
)
@click.option(
'-n', '--domain', 'domain',
default=None,
@ -2004,16 +2028,18 @@ def net_add(vni, description, nettype, domain, ip_network, ip_gateway, ip6_netwo
'vni'
)
@cluster_req
def net_modify(vni, description, domain, name_servers, ip6_network, ip6_gateway, ip4_network, ip4_gateway, dhcp_flag, dhcp_start, dhcp_end):
def net_modify(vni, description, mtu, domain, name_servers, ip6_network, ip6_gateway, ip4_network, ip4_gateway, dhcp_flag, dhcp_start, dhcp_end):
"""
Modify details of virtual network VNI. All fields optional; only specified fields will be updated.
NOTE: The MTU must be equal to, or less than, the underlying device MTU (either the node 'bridge_mtu' for bridged networks, or the node 'cluster_mtu' minus 50 for managed networks). Is only required if the device MTU should be lower than the underlying physical device MTU for compatibility. To reset an explicit MTU to the default underlying device MTU, specify '--mtu' with a quoted empty string argument.
Example:
pvc network modify 1001 --gateway 10.1.1.1 --dhcp
"""
retcode, retmsg = pvc_network.net_modify(config, vni, description, domain, name_servers, ip4_network, ip4_gateway, ip6_network, ip6_gateway, dhcp_flag, dhcp_start, dhcp_end)
retcode, retmsg = pvc_network.net_modify(config, vni, description, mtu, domain, name_servers, ip4_network, ip4_gateway, ip6_network, ip6_gateway, dhcp_flag, dhcp_start, dhcp_end)
cleanup(retcode, retmsg)
@ -4779,6 +4805,10 @@ def task_init(confirm_flag, overwrite_flag):
'-u', '--unsafe', '_unsafe', envvar='PVC_UNSAFE', is_flag=True, default=False,
help='Allow unsafe operations without confirmation/"--yes" argument.'
)
@click.option(
'--version', is_flag=True, callback=print_version,
expose_value=False, is_eager=True
)
def cli(_cluster, _debug, _quiet, _unsafe):
"""
Parallel Virtual Cluster CLI management tool

View File

@ -2,7 +2,7 @@ from setuptools import setup
setup(
name='pvc',
version='0.9.40',
version='0.9.42',
packages=['pvc', 'pvc.cli_lib'],
install_requires=[
'Click',

View File

@ -0,0 +1 @@
{"version": "6", "root": "", "base": {"root": "", "schema": "/schema", "schema.version": "/schema/version", "config": "/config", "config.maintenance": "/config/maintenance", "config.primary_node": "/config/primary_node", "config.primary_node.sync_lock": "/config/primary_node/sync_lock", "config.upstream_ip": "/config/upstream_ip", "config.migration_target_selector": "/config/migration_target_selector", "cmd": "/cmd", "cmd.node": "/cmd/nodes", "cmd.domain": "/cmd/domains", "cmd.ceph": "/cmd/ceph", "logs": "/logs", "node": "/nodes", "domain": "/domains", "network": "/networks", "storage": "/ceph", "storage.util": "/ceph/util", "osd": "/ceph/osds", "pool": "/ceph/pools", "volume": "/ceph/volumes", "snapshot": "/ceph/snapshots"}, "logs": {"node": "", "messages": "/messages"}, "node": {"name": "", "keepalive": "/keepalive", "mode": "/daemonmode", "data.active_schema": "/activeschema", "data.latest_schema": "/latestschema", "data.static": "/staticdata", "data.pvc_version": "/pvcversion", "running_domains": "/runningdomains", "count.provisioned_domains": "/domainscount", "count.networks": "/networkscount", "state.daemon": "/daemonstate", "state.router": "/routerstate", "state.domain": "/domainstate", "cpu.load": "/cpuload", "vcpu.allocated": "/vcpualloc", "memory.total": "/memtotal", "memory.used": "/memused", "memory.free": "/memfree", "memory.allocated": "/memalloc", "memory.provisioned": "/memprov", "ipmi.hostname": "/ipmihostname", "ipmi.username": "/ipmiusername", "ipmi.password": "/ipmipassword", "sriov": "/sriov", "sriov.pf": "/sriov/pf", "sriov.vf": "/sriov/vf"}, "sriov_pf": {"phy": "", "mtu": "/mtu", "vfcount": "/vfcount"}, "sriov_vf": {"phy": "", "pf": "/pf", "mtu": "/mtu", "mac": "/mac", "phy_mac": "/phy_mac", "config": "/config", "config.vlan_id": "/config/vlan_id", "config.vlan_qos": "/config/vlan_qos", "config.tx_rate_min": "/config/tx_rate_min", "config.tx_rate_max": "/config/tx_rate_max", "config.spoof_check": "/config/spoof_check", "config.link_state": 
"/config/link_state", "config.trust": "/config/trust", "config.query_rss": "/config/query_rss", "pci": "/pci", "pci.domain": "/pci/domain", "pci.bus": "/pci/bus", "pci.slot": "/pci/slot", "pci.function": "/pci/function", "used": "/used", "used_by": "/used_by"}, "domain": {"name": "", "xml": "/xml", "state": "/state", "profile": "/profile", "stats": "/stats", "node": "/node", "last_node": "/lastnode", "failed_reason": "/failedreason", "storage.volumes": "/rbdlist", "console.log": "/consolelog", "console.vnc": "/vnc", "meta.autostart": "/node_autostart", "meta.migrate_method": "/migration_method", "meta.node_selector": "/node_selector", "meta.node_limit": "/node_limit", "meta.tags": "/tags", "migrate.sync_lock": "/migrate_sync_lock"}, "tag": {"name": "", "type": "/type", "protected": "/protected"}, "network": {"vni": "", "type": "/nettype", "mtu": "/mtu", "rule": "/firewall_rules", "rule.in": "/firewall_rules/in", "rule.out": "/firewall_rules/out", "nameservers": "/name_servers", "domain": "/domain", "reservation": "/dhcp4_reservations", "lease": "/dhcp4_leases", "ip4.gateway": "/ip4_gateway", "ip4.network": "/ip4_network", "ip4.dhcp": "/dhcp4_flag", "ip4.dhcp_start": "/dhcp4_start", "ip4.dhcp_end": "/dhcp4_end", "ip6.gateway": "/ip6_gateway", "ip6.network": "/ip6_network", "ip6.dhcp": "/dhcp6_flag"}, "reservation": {"mac": "", "ip": "/ipaddr", "hostname": "/hostname"}, "lease": {"mac": "", "ip": "/ipaddr", "hostname": "/hostname", "expiry": "/expiry", "client_id": "/clientid"}, "rule": {"description": "", "rule": "/rule", "order": "/order"}, "osd": {"id": "", "node": "/node", "device": "/device", "db_device": "/db_device", "stats": "/stats"}, "pool": {"name": "", "pgs": "/pgs", "stats": "/stats"}, "volume": {"name": "", "stats": "/stats"}, "snapshot": {"name": "", "stats": "/stats"}}

View File

@ -135,6 +135,7 @@ def getNetworkACLs(zkhandler, vni, _direction):
def getNetworkInformation(zkhandler, vni):
description = zkhandler.read(('network', vni))
nettype = zkhandler.read(('network.type', vni))
mtu = zkhandler.read(('network.mtu', vni))
domain = zkhandler.read(('network.domain', vni))
name_servers = zkhandler.read(('network.nameservers', vni))
ip6_network = zkhandler.read(('network.ip6.network', vni))
@ -151,6 +152,7 @@ def getNetworkInformation(zkhandler, vni):
'vni': int(vni),
'description': description,
'type': nettype,
'mtu': mtu,
'domain': domain,
'name_servers': name_servers.split(','),
'ip6': {
@ -235,7 +237,7 @@ def isValidIP(ipaddr):
#
# Direct functions
#
def add_network(zkhandler, vni, description, nettype,
def add_network(zkhandler, vni, description, nettype, mtu,
domain, name_servers, ip4_network, ip4_gateway, ip6_network, ip6_gateway,
dhcp4_flag, dhcp4_start, dhcp4_end):
# Ensure start and end DHCP ranges are set if the flag is set
@ -267,6 +269,7 @@ def add_network(zkhandler, vni, description, nettype,
result = zkhandler.write([
(('network', vni), description),
(('network.type', vni), nettype),
(('network.mtu', vni), mtu),
(('network.domain', vni), domain),
(('network.nameservers', vni), name_servers),
(('network.ip6.network', vni), ip6_network),
@ -290,13 +293,15 @@ def add_network(zkhandler, vni, description, nettype,
return False, 'ERROR: Failed to add network.'
def modify_network(zkhandler, vni, description=None, domain=None, name_servers=None,
def modify_network(zkhandler, vni, description=None, mtu=None, domain=None, name_servers=None,
ip4_network=None, ip4_gateway=None, ip6_network=None, ip6_gateway=None,
dhcp4_flag=None, dhcp4_start=None, dhcp4_end=None):
# Add the modified parameters to Zookeeper
update_data = list()
if description is not None:
update_data.append((('network', vni), description))
if mtu is not None:
update_data.append((('network.mtu', vni), mtu))
if domain is not None:
update_data.append((('network.domain', vni), domain))
if name_servers is not None:
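
The `modify_network` hunk above distinguishes two "empty" cases for `mtu`: `None` means the caller did not mention the field, so nothing is written, while an empty string is a real value that gets written and, per the `pvc network modify` help text, resets the network to the default underlying device MTU. A minimal sketch of that pattern follows, reduced to two fields. `build_updates` is a hypothetical stand-in for the part of `modify_network` that assembles `update_data`; the key tuples are the ones shown in the diff.

```python
# Sketch only: build_updates is a hypothetical reduction of modify_network.
def build_updates(vni, description=None, mtu=None):
    """Collect (key, value) pairs for only the fields the caller supplied."""
    update_data = list()
    if description is not None:
        update_data.append((('network', vni), description))
    # '' is not None, so an empty string is still written (the "reset" case)
    if mtu is not None:
        update_data.append((('network.mtu', vni), mtu))
    return update_data
```

With this shape, `build_updates(100)` produces no writes, whereas `build_updates(100, mtu='')` produces a single write of the empty string to the network's MTU key.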

View File

@ -466,7 +466,7 @@ class ZKHandler(object):
#
class ZKSchema(object):
# Current version
_version = 5
_version = 6
# Root for doing nested keys
_schema_root = ''
@ -595,6 +595,7 @@ class ZKSchema(object):
'network': {
'vni': '', # The root key
'type': '/nettype',
'mtu': '/mtu',
'rule': '/firewall_rules',
'rule.in': '/firewall_rules/in',
'rule.out': '/firewall_rules/out',

debian/changelog vendored
View File

@ -1,3 +1,19 @@
pvc (0.9.42-0) unstable; urgency=high
* [Documentation] Reworks and updates various documentation sections
* [Node Daemon] Adjusts the fencing process to use a power off rather than a power reset for maximum certainty
* [Node Daemon] Ensures that MTU values are validated during the first read too
* [Node Daemon] Corrects the loading of the bridge_mtu value to use the current active setting rather than a fixed default to prevent unintended surprises
-- Joshua M. Boniface <joshua@boniface.me> Tue, 12 Oct 2021 13:48:19 -0400
pvc (0.9.41-0) unstable; urgency=high
* Fixes a bad conditional check in IPMI verification
* Implements per-network MTU configuration; NOTE: Requires new keys in pvcnoded.yaml (`bridge_mtu`) and Ansible group_vars (`pvc_bridge_mtu`)
-- Joshua M. Boniface <joshua@boniface.me> Sat, 09 Oct 2021 19:39:21 -0400
pvc (0.9.40-0) unstable; urgency=high
* [Docs] Documentation updates for new Changelog file

View File

@ -2,23 +2,24 @@
- [PVC Cluster Architecture considerations](#pvc-cluster-architecture-considerations)
* [Node Specification](#node-specification)
- [n-1 Redundancy](#n-1-redundancy)
- [CPU](#cpu)
- [Memory](#memory)
- [Disk](#disk)
- [Network](#network)
* [PVC architecture](#pvc-architecture)
- [Operating System](#operating-system)
- [Ceph Storage Layout](#ceph-storage-layout)
- [Networks](#networks)
+ [System Networks](#system-networks)
+ [Client Networks](#client-networks)
* [Advanced Layouts](#advanced-layouts)
- [Coordinators versus Hypervisors](#coordinators-versus-hypervisors)
- [Georedundancy](#georedundancy)
* [Example System Diagrams](#example-system-diagrams)
- [Small 3-node cluster](#small-3-node-cluster)
- [Large 8-node cluster](#large-8-node-cluster)
+ [n-1 Redundancy](#n-1-redundancy)
+ [CPU](#cpu)
+ [Memory](#memory)
+ [Disk](#disk)
+ [Network](#network)
* [PVC architecture](#pvc-architecture)
+ [Operating System](#operating-system)
+ [Ceph Storage Layout](#ceph-storage-layout)
+ [Networks](#networks)
- [System Networks](#system-networks)
- [Client Networks](#client-networks)
+ [Fencing and Recovery](#fencing-and-recovery)
* [Advanced Layouts](#advanced-layouts)
+ [Coordinators versus Hypervisors](#coordinators-versus-hypervisors)
+ [Georedundancy](#georedundancy)
* [Example System Diagrams](#example-system-diagrams)
+ [Small 3-node cluster](#small-3-node-cluster)
+ [Large 8-node cluster](#large-8-node-cluster)
This document contains considerations the administrator should make when preparing for and building a PVC cluster. It is important that prospective PVC administrators read this document *thoroughly* before deploying a cluster to ensure they understand the requirements, caveats, and important details about how PVC operates.
@ -36,11 +37,12 @@ The following table provides recommended minimum specifications for each compone
| System disk (SSD/HDD/USB/SD/eMMC) | 2x 100GB RAID-1 |
| Data disk (SSD only) | 1x 400GB |
| Network interfaces | 2x 10Gbps (LACP LAG) |
| Total CPU cores (healthy) | 24 |
| Total CPU cores (n-1) | 16 |
| Total RAM (healthy) | 96GB |
| Total RAM (n-1) | 64GB |
| Total disk space | 400GB |
| Remote IPMI-over-IP | Available and connected |
| Total CPU cores (3 nodes healthy) | 24 |
| Total CPU cores (3 nodes n-1) | 16 |
| Total RAM (3 nodes healthy) | 96GB |
| Total RAM (3 nodes n-1) | 64GB |
| Total disk space (3 nodes) | 400GB |
For testing, or low-budget homelab applications, some aspects can be further tuned down, however consider the following sections carefully.
@ -156,6 +158,12 @@ Because PVC makes extensive use of cross-node communications, high-throughput an
A minimum of 2 network interfaces is recommended. These should then be combined into a logical aggregate (LAG) using 802.3ad (LACP) to provide redundant links and a boost in available bandwidth. Additional NICs can also be used to separate discrete parts of the networking stack, which will be discussed below.
#### Remote IPMI-over-IP
IPMI provides a method to manage the physical chassis of nodes from outside of their operating system. Common implementations include Dell iDRAC, HP iLO, Cisco CIMC, and others.
PVC nodes in production deployments should always feature an IPMI-over-IP interface of some kind, which is then reachable either in, or via, the Upstream system network (see [System Networks](#system-networks)). This requirement is discussed in more detail in the [Fencing and Recovery](#fencing-and-recovery) section below.
## PVC Architecture
### Operating System
@ -182,7 +190,7 @@ Non-default values can also be set at pool creation time. For instance, one coul
Replication levels cannot be changed within PVC once a pool is created, however they can be changed via manual Ceph commands on a coordinator should the administrator require this, though discussion of this process is outside of the scope of this documentation. The administrator should carefully consider sizing, failure domains, and performance when first selecting storage devices and creating pools, to ensure the right level of resiliency versus data usage for their use-case and planned cluster size.
## Networks
### Networks
At a minimum, a production PVC cluster should use two 10Gbps Ethernet interfaces, connected in an LACP or active-backup bond on one or more switches. On top of this bond, the various cluster networks should be configured as 802.1q vLANs. PVC is able to support configurations without bonding or 802.1q vLAN support, using multiple physical interfaces and no bridged client networks, but this is strongly discouraged due to the added complexity this introduces; the switches chosen for the cluster should include these requirements as a minimum.
@ -292,6 +300,40 @@ Generally speaking, SR-IOV connections are not recommended unless there is a goo
Future PVC versions may support other client network types, such as direct-routing between VMs.
### Fencing and Recovery
Self-management and self-healing are important components of PVC's design, and to accomplish this, PVC contains automated fencing and recovery functions to handle situations where nodes crash or become unreachable. PVC is then able, if properly configured, to directly power-cycle the failed node, and bring up any VMs that were running on it on the remaining hypervisors. This ensures that, while there might be a few minutes of downtime for VMs, they are recovered as quickly as possible without human intervention.
To operate correctly, these functions require each node in the cluster to have a functional IPMI-over-IP setup with a configured user who is able to perform chassis power commands. This differs depending on the chassis manufacturer and model, and should be tested prior to deploying any production cluster. If IPMI is not configured correctly at node startup, the daemon will warn and disable automatic recovery of the node. The IPMI should be present in the Upstream system network (see [System Networks](#system-networks) above), or in another secured network which is reachable from the Upstream system network, whichever is more convenient for the layout of the networks.
The general process is divided into 3 sections: detecting node failures, fencing nodes, and recovering from fenced nodes. Note that this process only applies to nodes in the `run` "daemon state"; if a node daemon cleanly shuts down (for instance due to a service restart or administrative action), it will not be fenced.
#### Detecting Failed Nodes
Within the PVC configuration, each node has 3 settings which determine the failure detection time. The first is the `keepalive_interval` setting. This is normally set to 5 seconds, and is the interval at which the node daemon of each node sends its keepalives (as well as gathers statistics about running VMs, Ceph components, etc.). This interval should never need to be changed, but is configurable for maximum flexibility in corner cases. During each keepalive, the node updates a specific key in the Zookeeper cluster with the current UNIX timestamp, which determines when the node was last alive. During their own keepalives, the other nodes check their peers' timestamps to confirm if they are updating normally. Note that, due to this happening during the peer keepalives, if all nodes lose contact with the Zookeeper database, they will *not* immediately begin fencing each other, since the keepalives will not complete; they will, however, upon recovery, jump immediately to the next section when they all realize that their last keepalives were over the threshold, and this situation is discussed there.
The second option is the `fence_intervals` setting. This option determines how many keepalive intervals a node can miss before it is marked `dead` and a fencing sequence is started. This is normally set to 6 intervals, which, combined with the 5 second `keepalive_interval`, gives a total of 30 seconds (+/- up to another 5 second `keepalive_interval` for peers should they not line up) for the node to be without updates before fencing begins.
The third setting is optional, and is best used in situations where the IPMI connectivity of a node is excessively flaky or can be impaired (e.g. georedundant clusters), or where VM uptime is more important than the burden of recovering from a split-brain situation, and is not as extensively tested. This option is `suicide_intervals`, and if set to a non-0 value, is the number of keepalive intervals before a node *itself* determines that it should forcibly power itself off, which should always be equal to or less than the normal `fence_intervals` setting. Naturally, the node must be somewhat functional to do this, and this can go very wrong, so using this option is not normally recommended.
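The peer check described above can be sketched as follows. This is a minimal illustration, not PVC's actual implementation: the function name `find_dead_peers`, the dictionary of timestamps, and the strict "older than the threshold" comparison are all assumptions made for the example.

```python
def find_dead_peers(last_keepalives, now, keepalive_interval=5, fence_intervals=6):
    """Return the peers whose last keepalive timestamp is older than the threshold.

    last_keepalives maps a node name to the UNIX timestamp it last wrote;
    in PVC this value lives in Zookeeper, here it is a plain dict (assumption).
    """
    threshold = keepalive_interval * fence_intervals  # 30 seconds with the defaults
    return sorted(
        name for name, timestamp in last_keepalives.items()
        if now - timestamp > threshold
    )


# Hypothetical timestamps relative to an arbitrary "now"
now = 1_000_000
peers = {"pvchv1": now - 3, "pvchv2": now - 31, "pvchv3": now - 30}
print(find_dead_peers(peers, now))
```

Because the check only runs as part of a node's own keepalive, a node that cannot reach Zookeeper never evaluates it, which matches the behaviour noted above.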
#### Fencing Nodes
Once the cluster, and specifically one node in the cluster, has determined that a given node is `dead` due to a lack of keepalives, the fencing process starts. This spawns a dedicated child thread within the node daemon of the detecting node, which continually monitors the state of the `dead` node and then performs the fence.
During the `dead` process, the failed node has 6 chances, called "saving throws", at `keepalive_interval` second windows, to send another keepalive before it is fenced. This additional, fixed, delay helps ensure that the cluster will gracefully recover from intermittent network failures or loss of Zookeeper contact, by providing nodes up to another 6 keepalive intervals to save themselves once the fence timer actually begins. This brings the total time, with default options, from a node stopping contact to that node being fenced, to between 60 and 65 seconds. This duration is considered by the author an acceptable compromise between speedy recovery and avoiding false positives (and hence larger outages).
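The 60 to 65 second figure follows directly from the settings described so far. A small sketch of the arithmetic, assuming the default values and a hypothetical helper name `fence_delay_bounds`:

```python
def fence_delay_bounds(keepalive_interval=5, fence_intervals=6, saving_throws=6):
    """Return (minimum, maximum) seconds between the last keepalive and the fence."""
    detection = keepalive_interval * fence_intervals  # time until marked dead
    grace = keepalive_interval * saving_throws        # the fixed saving throws
    minimum = detection + grace
    # Peers check during their own keepalives, which may lag by up to one interval
    maximum = minimum + keepalive_interval
    return minimum, maximum


print(fence_delay_bounds())
```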
Once a node has been marked `dead` and has failed its 6 "saving throws", the fence process triggers an IPMI chassis reset sequence. First, the node is issued an IPMI `chassis power off` command to trigger a cold system shutdown. Next, it waits a fixed 1 second and then checks and logs the current `chassis power status`, and then issues a `chassis power on` signal to start up the node. It then finally waits a fixed 2 seconds, and then checks the current `chassis power status` again. Using the results of these commands, PVC is then able to determine with near certainty whether the node has truly been forced offline or not, and it can proceed to the next step.
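The sequence above could be sketched roughly as follows. This is not the node daemon's code: the function names, the injectable `run` and `sleep` parameters, and in particular the rule that the fence counts as confirmed when the status check after the power off reports "off" are assumptions for illustration. Only the `ipmitool` invocation style is taken from the getting started guide.

```python
import subprocess
import time


def ipmi_command(host, user, password, *args):
    """Build an ipmitool command line in the style used elsewhere in these docs."""
    return ["/usr/bin/ipmitool", "-I", "lanplus", "-H", host,
            "-U", user, "-P", password, *args]


def run_ipmi(cmd):
    """Run a command and return its trimmed standard output."""
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    return result.stdout.strip()


def fence_node(host, user, password, run=run_ipmi, sleep=time.sleep):
    """Power off, check, power on, check; return (confirmed_off, status_1, status_2)."""
    run(ipmi_command(host, user, password, "chassis", "power", "off"))
    sleep(1)  # fixed 1 second wait
    status_after_off = run(ipmi_command(host, user, password, "chassis", "power", "status"))
    run(ipmi_command(host, user, password, "chassis", "power", "on"))
    sleep(2)  # fixed 2 second wait
    status_after_on = run(ipmi_command(host, user, password, "chassis", "power", "status"))
    # Assumption: the fence is confirmed only if the node reported "off" after power off
    confirmed_off = status_after_off.lower().endswith("off")
    return confirmed_off, status_after_off, status_after_on
```

Passing `run` and `sleep` as parameters keeps the sketch testable without real IPMI hardware; a caller would normally leave them at their defaults.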
#### Recovery from Node Fences
Once a node has been fenced, successfully or not, the system waits for one keepalive interval before proceeding.
The cluster then determines what to do based both on the result of the fence (whether the node was determined to have been successfully cold-reset or not) and on two additional configuration values. The first, `successful_fence`, specifies what action to take when the fence was successful, and is either `migrate` (VMs to other nodes), the default, or `None` (no action). The second, `failed_fence`, is an identical choice for when the fence was unsuccessful, and defaults to `None`.
If the fence was successful and `successful_fence` is set to `None`, then no migration takes place and the VMs on the fenced node will remain offline until the node recovers. If instead `successful_fence` is set to the default of `migrate`, the system will then begin migrating (and hence, starting) VMs that were active on the failed node to other nodes in the cluster. During this special `fence-flush` action, any stale RBD locks on the storage volumes are forcibly cleared, and this is considered safe since the fenced node is determined to have successfully been powered off and the VMs thus terminated. Once all VMs are migrated, the fenced node will then be set to a normal `flushed` state, as if it had been cleanly flushed before powering off. If and when the node returns to active, healthy service, either automatically (if the reset cleared the fault condition) or after human intervention, VMs can then migrate back and the cluster can resume normal operation; otherwise the cluster will remain in the degraded state until corrected.
If the fence was unsuccessful and `failed_fence` is set to the default of `None`, no automatic recovery takes place, since the cluster cannot determine that it is safe to do so. This would most commonly occur during network partitions where the `dead` node potentially remains up with VMs running on it, and the cluster is now in a split-brain situation. The `suicide_intervals` option mentioned above is provided for this specific situation, and would allow the administrator to set the `failed_fence` action to `migrate` as well, as they could be somewhat confident that the node will have forcibly terminated itself. However, due to the inherent potential for danger in this scenario, it is recommended to leave these options at their defaults, and handle such situations manually instead, as well as ensuring proper network design to avoid the potential for such split-brain situations to occur.
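The decision described in this section reduces to a small lookup on the fence result and the two configuration values. A minimal sketch, assuming a hypothetical function `recovery_action` and plain string return values rather than PVC's internal representation:

```python
def recovery_action(fence_succeeded, successful_fence="migrate", failed_fence=None):
    """Return "migrate" or "none" for a fenced node, using the documented defaults."""
    configured = successful_fence if fence_succeeded else failed_fence
    # Anything other than "migrate" (including None) means no automatic recovery
    return "migrate" if configured == "migrate" else "none"


print(recovery_action(True))   # successful fence with defaults
print(recovery_action(False))  # failed fence with defaults
```

With the defaults, only a successful fence leads to VMs being migrated; setting `failed_fence="migrate"` changes the second case and carries the split-brain risk discussed above.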
## Advanced Layouts
### Coordinators versus Hypervisors
@ -361,12 +403,12 @@ This section provides diagrams of 2 best-practice cluster configurations. These
#### Small 3-node cluster
![Small 3-node cluster](/images/pvc-3-node-cluster.png)
[![Small 3-node cluster](/images/pvc-3-node-cluster.png)](/images/pvc-3-node-cluster.png)
*Above: A diagram of a simple 3-node cluster with all nodes as coordinators. Dual 10Gbps network interfaces per node, unified physical networking with collapsed cluster and storage networks.*
#### Large 8-node cluster
![Large 8-node cluster](/images/pvc-8-node-cluster.png)
[![Large 8-node cluster](/images/pvc-8-node-cluster.png)](/images/pvc-8-node-cluster.png)
*Above: A diagram of a large 8-node cluster with 3 coordinators and 5 hypervisors. Quad 10Gbps network interfaces per node, split physical networking into guest/cluster and storage networks.*


@ -6,41 +6,39 @@ This guide will walk you through setting up a simple 3-node PVC cluster from scr
### Part One - Preparing for bootstrap
0. Read through the [Cluster Architecture documentation](/architecture/cluster). This documentation details the requirements and conventions of a PVC cluster, and is important to understand before proceeding.
0. Read through the [Cluster Architecture documentation](/cluster-architecture). This documentation details the requirements and conventions of a PVC cluster, and is important to understand before proceeding.
0. Download the latest copy of the [`pvc-installer`](https://github.com/parallelvirtualcluster/pvc-installer) and [`pvc-ansible`](https://github.com/parallelvirtualcluster/pvc-ansible) repositories to your local machine.
0. Download the latest copy of the [`pvc-ansible`](https://github.com/parallelvirtualcluster/pvc-ansible) repository to your local machine.
0. In `pvc-ansible`, create an initial `hosts` inventory, using `hosts.default` as a template. You can manage multiple PVC clusters ("sites") from the Ansible repository easily, however for simplicity you can use the simple name `cluster` for your initial site. Define the 3 hostnames you will use under the site group; usually the provided names of `pvchv1`, `pvchv2`, and `pvchv3` are sufficient, though you may use any hostname pattern you wish. It is *very important* that the names all contain a sequential number, however, as this is used by various components.
0. Leverage the `create-local-repo.sh` script in the `pvc-ansible` directory to set up a local cluster configuration directory; follow the instructions the script provides, as all future steps will be done inside your new local configuration directory.
0. In `pvc-ansible`, create an initial set of `group_vars`, using the `group_vars/default` as a template. Inside these group vars are two main files: `base.yml` and `pvc.yml`. These example files are well-documented; read them carefully and specify all required options before proceeding.
0. Create an initial `hosts` inventory, using `hosts.default` in the `pvc-ansible` repo as a template. You can manage multiple PVC clusters ("sites") from the Ansible repository easily, however for simplicity you can use the simple name `cluster` for your initial site. Define the 3 hostnames you will use under the site group; usually the provided names of `pvchv1`, `pvchv2`, and `pvchv3` are sufficient, though you may use any hostname pattern you wish. It is *very important* that the names all contain a sequential number, however, as this is used by various components.
`base.yml` configures the `base` role and some common per-cluster configurations such as an upstream domain, a root password, and a set of administrative users, as well as and most importantly, the basic network configuration of the nodes. Make special note of the various items that must be generated such as passwords; these should all be cluster-unique.
0. Create an initial set of `group_vars` for your cluster at `group_vars/<cluster>`, using the `group_vars/default` in the `pvc-ansible` repo as a template. Inside these group vars are two main files: `base.yml` and `pvc.yml`. These example files are well-documented; read them carefully and specify all required options before proceeding, and reference the [Ansible manual](/manuals/ansible) for more detailed descriptions of the options.
`pvc.yml` configures the `pvc` role, including all the dependent software and PVC itself. Important to note is the `pvc_nodes` list, which contains a list of all the nodes as well as per-node configurations for each. All nodes, both coordinator and not, must be a part of this list.
* `base.yml` configures the `base` role and some common per-cluster configurations such as an upstream domain, a root password, a set of administrative users, various hardware configuration items, and, most importantly, the basic network configuration of the nodes. Make special note of the various items that must be generated such as passwords; these should all be cluster-unique.
0. Optionally though strongly recommended, move your new configurations out of the `pvc-ansible` repository. The `.gitignore` file will ignore all non-default data, so it is advisable to move these configurations to a separate, secure, repository or filestore, and symlink to them inside the `pvc-ansible` repository directories. The three important locations to symlink are:
* `hosts`: The main Ansible inventory for the various clusters.
* `group_vars/<cluster_name>`: The `group_vars` for the various clusters.
* `files/<cluster_name>`: Static files, created during the bootstrap Ansible run, for the various clusters.
* `pvc.yml` configures the `pvc` role, including all the dependent software and PVC itself. Important to note is the `pvc_nodes` list, which contains a list of all the nodes as well as per-node configurations for each. All nodes must be a part of this list.
0. In `pvc-installer`, run the `buildiso.sh` script to generate an installer ISO. This script requires `debootstrap`, `isolinux`, and `xorriso` to function. The resulting file will, by default, be named `pvc-installer_<date>.iso` in the current directory. For additional options, use the `-h` flag to show help information for the script.
0. In the `pvc-installer` directory, run the `buildiso.sh` script to generate an installer ISO. This script requires `debootstrap`, `isolinux`, and `xorriso` to function. The resulting file will, by default, be named `pvc-installer_<date>.iso` in the current directory. For additional options, use the `-h` flag to show help information for the script.
### Part Two - Preparing and installing the physical hosts
0. Prepare 3 physical servers with IPMI. These physical servers should have at the least a system disk (a single disk, hardware RAID1, or similar), one or more data (Ceph OSD) disks, and networking/CPU/RAM suitable for the cluster needs. Connect their networking based on the configuration set in the `pvc-ansible` `group_vars/base.yml` file.
0. Prepare 3 physical servers with IPMI. The servers should match the specifications and requirements outlined in the [Cluster Architecture documentation](/cluster-architecture). Connect their networking based on the configuration set in the `base.yml` group vars file for your cluster.
0. Configure the IPMI user specified in the `pvc-ansible` `group_vars/base.yml` file with the required permissions; this user will be used to reboot the host if it fails, so it must be able to control system power state.
0. Load the installer ISO generated in step 6 of the previous section onto a USB stick, or using IPMI virtual media, on the physical servers.
0. Configure IPMI to enable IPMI over LAN. Use the default (all-zero) encryption key; this is needed for fencing to work. Verify that IPMI over LAN is operating by using the following command from a machine able to reach the IPMI interface:
`/usr/bin/ipmitool -I lanplus -H <IPMI_host> -U <user> -P <password> chassis power status`
0. Boot the physical servers off of the installer ISO. Use UEFI mode - if available - for maximum flexibility and longevity.
0. Load the installer ISO generated in step 5 of the previous section onto a USB stick, or using IPMI virtual media, on the physical servers.
0. Follow the prompts from the installer ISO. It will ask for a hostname, the system disk device to use, the initial network interface to configure as well as vLANs and either DHCP or static IP information, and finally either an HTTP URL containing an SSH `authorized_keys` to use for the `deploy` user, or a password for this user if key auth is unavailable.
0. Boot the physical servers off of the installer ISO, in UEFI mode if available for maximum flexibility.
0. Wait for the installer to complete. This may take several minutes.
0. Follow the prompts from the installer ISO. It will ask for a hostname, the system disk device to use, the initial network interface to configure as well as either DHCP or static IP information, and finally either an HTTP URL containing an SSH `authorized_keys` to use for the `deploy` user, or a password for this user if key auth is unavailable.
0. At the end of the install process, follow the prompts carefully; it is usually prudent to pre-seed the `/etc/network/interfaces` configuration based on your expected final physical network config (e.g. set up bonding, etc.) before proceeding, especially if you use DHCP, as the bonding configuration applied later could affect the address. The `chroot` is likely unneeded unless you have good reason to edit the system in this way.
0. Wait for the installer to complete. It will provide some next steps at the end, and wait for the administrator to acknowledge via an "Enter" key-press. The node will now reboot into the base PVC system.
0. Make note of the (temporary and insecure!) root password set by the installer; you may need it to troubleshoot the system if it does not come up properly. This will be overwritten later in the setup process.
0. Press "Enter" to reboot the system and confirm it is reachable.
0. Repeat the above steps for all 3 initial nodes. On boot, they will display their configured IP address to be used in the next steps.
@ -48,42 +46,44 @@ This guide will walk you through setting up a simple 3-node PVC cluster from scr
0. Make note of the IP addresses of all 3 initial nodes, and configure DNS, `/etc/hosts`, or Ansible `ansible_host=` hostvars to map these IP addresses to the hostnames set in the Ansible `hosts` and `group_vars` files.
0. Verify connectivity from your administrative host to the 3 initial nodes, including SSH access. Accept their host keys as required before proceeding as Ansible does not like those prompts.
0. Verify connectivity from your administrative host to the 3 initial nodes, including SSH access as the `deploy` user. Accept their host keys as required before proceeding as Ansible does not like those prompts. If you did not configure SSH key auth during the PVC installer process, configure it now, as it greatly simplifies Ansible configuration.
0. Verify your `group_vars` setup from part one, as errors here may require a re-installation and restart of the bootstrap process.
0. Verify your `group_vars` setup from part 1, as errors here may require a re-installation and restart of the bootstrap process.
0. Perform the initial bootstrap. From the `pvc-ansible` repository directory, execute the following `ansible-playbook` command, replacing `<cluster_name>` with the Ansible group name from the `hosts` file. Make special note of the additional `bootstrap=yes` variable, which tells the playbook that this is an initial bootstrap run.
0. Perform the initial bootstrap. From your local configuration repository directory, execute the following `ansible-playbook` command, replacing `<cluster_name>` with the Ansible group name from the `hosts` file. Make special note of the additional `bootstrap=yes` variable, which tells the playbook that this is an initial bootstrap run.
`$ ansible-playbook -v -i hosts pvc.yml -l <cluster_name> -e bootstrap=yes`
**WARNING:** Never rerun this playbook with the `-e bootstrap=yes` option against an active cluster. This will have unintended, disastrous consequences.
**WARNING:** Never run this playbook with the `-e bootstrap=yes` option against an active, already-bootstrapped cluster. This will have **disastrous consequences** including the **loss of all data** in the Ceph system as well as any configured networks, VMs, etc.
0. Wait for the Ansible playbook run to finish. Once completed, the cluster bootstrap will be finished, and all 3 nodes will have rebooted into a working PVC cluster.
0. Wait for the Ansible playbook run to finish. Once completed, the cluster bootstrap will be finished, and all 3 nodes will have rebooted into a working PVC cluster. If any errors occur, carefully evaluate them and re-run the playbook (with `-e bootstrap=yes` - your cluster is not active yet!) as required.
0. Install the CLI client on your administrative host, and add and verify connectivity to the cluster; this will also verify that the API is working. You will need to know the cluster upstream floating IP address here, and if you configured SSL or authentication for the API in your `group_vars`, adjust the first command as needed (see `pvc cluster add -h` for details).
`$ pvc cluster add -a <upstream_floating_ip> mycluster`
0. Download and install the CLI client package (`pvc-client-cli.deb`) on your administrative host, and add and verify connectivity to the cluster; this will also verify that the API is working. You will need to know the cluster upstream floating IP address you configured in the `networks` section of the `base.yml` playbook, and if you configured SSL or authentication for the API in your `group_vars`, adjust the first command as needed (see `pvc cluster add -h` for details). A human-readable description can also be specified, which is useful if you manage multiple clusters and their names become unwieldy.
`$ pvc cluster add -a <upstream_floating_ip> -d "My first PVC cluster" mycluster`
`$ pvc -c mycluster node list`
We can also set a default cluster by exporting the `PVC_CLUSTER` environment variable to avoid requiring `-c cluster` with every subsequent command:
You can also set a default cluster by exporting the `PVC_CLUSTER` environment variable to avoid requiring `-c cluster` with every subsequent command:
`$ export PVC_CLUSTER="mycluster"`
**Note:** It is fully possible to administer the cluster from the nodes themselves via SSH should you so choose, to avoid requiring the PVC client on your local machine.
### Part Four - Configuring the Ceph storage cluster
0. Determine the Ceph OSD block devices on each host, via an `ssh` shell. For instance, use `lsblk` or check `/dev/disk/by-path` to show the block devices by their physical SAS/SATA bus location, and obtain the relevant `/dev/sdX` name for each disk you wish to be a Ceph OSD on each host.
0. Determine the Ceph OSD block devices on each host via an `ssh` shell. For instance, use `lsblk` or check `/dev/disk/by-path` to show the block devices by their physical SAS/SATA bus location, and obtain the relevant `/dev/sdX` name for each disk you wish to be a Ceph OSD on each host.
0. Add each OSD device to each host. The general command is:
0. Configure an OSD device for each data disk in each host. The general command is:
`$ pvc storage osd add --weight <weight> <node> <device>`
For example, if each node has two data disks, as `/dev/sdb` and `/dev/sdc`, run the commands as follows:
For example, if each node has two data disks, as `/dev/sdb` and `/dev/sdc`, run the commands as follows to add the first disk to each node, then the second disk to each node:
`$ pvc storage osd add --weight 1.0 pvchv1 /dev/sdb`
`$ pvc storage osd add --weight 1.0 pvchv1 /dev/sdc`
`$ pvc storage osd add --weight 1.0 pvchv2 /dev/sdb`
`$ pvc storage osd add --weight 1.0 pvchv2 /dev/sdc`
`$ pvc storage osd add --weight 1.0 pvchv3 /dev/sdb`
`$ pvc storage osd add --weight 1.0 pvchv1 /dev/sdc`
`$ pvc storage osd add --weight 1.0 pvchv2 /dev/sdc`
`$ pvc storage osd add --weight 1.0 pvchv3 /dev/sdc`
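For clusters with more nodes or disks, the same list of commands can be generated rather than typed out. The sketch below only builds the command strings in the "first disk on each node, then second disk on each node" order; the helper name `osd_add_commands` is an assumption, and nothing here calls the PVC CLI itself.

```python
def osd_add_commands(nodes, devices, weight=1.0):
    """Build `pvc storage osd add` command strings, iterating devices first."""
    return [
        f"pvc storage osd add --weight {weight} {node} {device}"
        for device in devices
        for node in nodes
    ]


for command in osd_add_commands(["pvchv1", "pvchv2", "pvchv3"], ["/dev/sdb", "/dev/sdc"]):
    print(command)
```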
**NOTE:** On the CLI, the `--weight` argument is optional, and defaults to `1.0`. In the API, it must be specified explicitly, but the CLI sets a default value. OSD weights determine the relative amount of data which can fit onto each OSD. Under normal circumstances, you would want all OSDs to be of identical size, and hence all should have the same weight. If your OSDs are instead different sizes, the weight should be proportional to the size, e.g. `1.0` for a 100GB disk, `2.0` for a 200GB disk, etc. For more details, see the Ceph documentation.
**NOTE:** On the CLI, the `--weight` argument is optional, and defaults to `1.0`. In the API, it must be specified explicitly, but the CLI sets a default value. OSD weights determine the relative amount of data which can fit onto each OSD. Under normal circumstances, you would want all OSDs to be of identical size, and hence all should have the same weight. If your OSDs are instead different sizes, the weight should be proportional to the size, e.g. `1.0` for a 100GB disk, `2.0` for a 200GB disk, etc. For more details, see the [Cluster Architecture](/cluster-architecture) and Ceph documentation.
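As an illustration of size-proportional weights, the sketch below derives a weight for each disk by dividing its size by the smallest disk in the set. Normalizing against the smallest disk is a choice made for this example, as is the helper name `osd_weights`; any consistent baseline gives the same proportions.

```python
def osd_weights(sizes_gb):
    """Map each disk to a weight proportional to its size, smallest disk = 1.0."""
    smallest = min(sizes_gb.values())
    return {disk: size / smallest for disk, size in sizes_gb.items()}


print(osd_weights({"/dev/sdb": 100, "/dev/sdc": 200}))
```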
**NOTE:** OSD commands wait for the action to complete on the node, and can take some time.
**NOTE:** OSD commands wait for the action to complete on the node, and can take some time (up to 30 seconds).
**NOTE:** You can add OSDs in any order you wish, for instance you can add the first OSD to each node and then add the second to each node, or you can add all nodes' OSDs together at once like the example. This ordering does not affect the cluster in any way.
@ -93,10 +93,14 @@ This guide will walk you through setting up a simple 3-node PVC cluster from scr
0. Create an RBD pool to store VM images on. The general command is:
`$ pvc storage pool add <name> <placement_groups>`
For example, to create a pool named `vms` with 256 placement groups (a good default with 6 OSD disks), run the command as follows:
`$ pvc storage pool add vms 256`
**NOTE:** Ceph placement groups are a complex topic; as a general rule it's easier to grow than shrink, so start small and grow as your cluster grows. The following are some good starting numbers for 3-node clusters, though the Ceph documentation and the [Ceph placement group calculator](https://ceph.com/pgcalc/) are advisable for anything more complex. There is a trade-off between CPU usage and the number of total PGs for all pools in the cluster, with more PGs meaning more CPU usage.
**NOTE:** Ceph placement groups are a complex topic; as a general rule it's easier to grow than shrink, so start small and grow as your cluster grows. The general formula to calculate the ideal number of PGs is `pgs * maxcopies / osds = ~250`, then round `pgs` down to the closest power of 2; generally, you want as close to 250 PGs per OSD as possible, but no more than 250. With 3-6 OSDs, 256 is a good number, and with 9+ OSDs, 512 is a good number. Ceph will error if the total number exceeds the limit. For more details see the Ceph documentation and the [placement group calculator](https://ceph.com/pgcalc/).
* 3 OSDs total: 128 PGs (1 pool) or 64 PGs (2 or more pools, each)
* 6 OSDs total: 256 PGs (1 pool) or 128 PGs (2 or more pools, each)
* 9+ OSDs total: 256 PGs
For example, to create a pool named `vms` with 256 placement groups, run the command as follows:
`$ pvc storage pool add vms 256`
**NOTE:** As detailed in the [cluster architecture documentation](/cluster-architecture), you can also set a custom replica configuration for each pool if the default of 3 replica copies with 2 minimum copies is not acceptable. See `pvc storage pool add -h` or that document for full details.
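As a rough illustration of the placement group formula in the note above (`pgs * maxcopies / osds = ~250`, rounded down to a power of 2), the following sketch computes a starting value for a single pool. The function name and parameters are hypothetical, and the default of 3 copies is taken from the replica note above. Read literally, the formula yields 128 for 3 OSDs, which is more conservative than the "3-6 OSDs, 256" rule of thumb.

```python
def suggested_pgs(osds: int, max_copies: int = 3, target_per_osd: int = 250) -> int:
    """Largest power of 2 such that pgs * max_copies / osds <= target_per_osd.

    This is a literal reading of the formula for one pool only; with
    several pools the per-OSD total must be shared between them.
    """
    if osds < 1 or max_copies < 1:
        raise ValueError("osds and max_copies must be at least 1")
    ideal = target_per_osd * osds // max_copies
    if ideal < 1:
        return 1
    # Round down to the closest power of 2
    return 1 << (ideal.bit_length() - 1)


if __name__ == "__main__":
    for n in (3, 6, 9):
        print(n, suggested_pgs(n))
```

The result would be passed as `<placement_groups>` to `pvc storage pool add`; the Ceph documentation remains the authority for anything beyond a simple cluster.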
@ -105,7 +109,7 @@ This guide will walk you through setting up a simple 3-node PVC cluster from scr
### Part Five - Creating virtual networks
0. Determine a domain name and IPv4, and/or IPv6 network for your first client network, and any other client networks you may wish to create. These networks should never overlap with the cluster networks. For full details on the client network types, see the [cluster architecture documentation](/cluster-architecture).
0. Determine a domain name and IPv4, and/or IPv6 network for your first client network, and any other client networks you may wish to create. These networks must not overlap with the cluster networks. For full details on the client network types, see the [cluster architecture documentation](/cluster-architecture).
0. Create the virtual network. There are many options here, so see `pvc network add -h` for details.


@ -105,6 +105,11 @@ Example configuration:
cluster_group: mycluster
timezone_location: Canada/Eastern
local_domain: upstream.local
recursive_dns_servers:
- 8.8.8.8
- 8.8.4.4
recursive_dns_search_domains:
- "{{ local_domain }}"
username_ipmi_host: "pvc"
passwd_ipmi_host: "MyPassword2019"
@ -184,6 +189,18 @@ The TZ database format name of the local timezone, e.g. `America/Toronto` or `Ca
The domain name of the PVC cluster nodes. This is the domain portion of the FQDN of each node, and should usually be the domain of the `upstream` network.
#### `recursive_dns_servers`
* *optional*
A list of recursive DNS servers to be used by cluster nodes. Defaults to Google Public DNS if unspecified.
#### `recursive_dns_search_domains`
* *optional*
A list of domain names (must explicitly include `local_domain` if desired) to be used for shortname DNS lookups.
#### `username_ipmi_host`
* *optional*
@ -450,6 +467,7 @@ pvc_nodes:
ipmi_password: "{{ passwd_ipmi_host }}"
pvc_bridge_device: bondU
pvc_bridge_mtu: 1500
pvc_sriov_enable: True
pvc_sriov_device:
@ -907,6 +925,12 @@ The IPMI password for the node management controller. Unless a per-host override
The device name of the underlying network interface to be used for "bridged"-type client networks. For each "bridged"-type network, an IEEE 802.1Q vLAN and bridge will be created on top of this device to pass these networks. In most cases, using the reflexive `networks['cluster']['raw_device']` or `networks['upstream']['raw_device']` from the Base role is sufficient.
#### `pvc_bridge_mtu`
* *required*
The MTU of the underlying network interface to be used for "bridged"-type client networks. This is the maximum MTU such networks can use.
#### `pvc_sriov_enable`
* *optional*


@ -146,6 +146,7 @@ pvc:
console_log_lines: 1000
networking:
bridge_device: ens4
bridge_mtu: 1500
sriov_enable: True
sriov_device:
- phy: ens1f0
@ -427,6 +428,13 @@ How many lines of VM console logs to keep in the Zookeeper database for each VM.
The network interface device used to create Bridged client network vLANs on. For most clusters, should match the underlying device of the various static networks (e.g. `ens4` or `bond0`), though may also use a separate network interface.
#### `system` → `configuration` → `networking` → `bridge_mtu`
* *optional*
* *requires* `functions` → `enable_networking`
The network interface MTU for the Bridged client network device. This is the maximum MTU a bridged client network can use.
#### `system` → `configuration` → `networking` → `sriov_enable`
* *optional*, defaults to `False`


@ -364,6 +364,10 @@
},
"type": "object"
},
"mtu": {
"description": "The MTU of the network, if set; empty otherwise",
"type": "integer"
},
"name_servers": {
"description": "The configured DNS nameservers of the network for NS records (\"managed\" networks only)",
"items": {
@ -1765,6 +1769,12 @@
"required": true,
"type": "string"
},
{
"description": "The MTU of the network; defaults to the underlying interface MTU if not set",
"in": "query",
"name": "mtu",
"type": "integer"
},
{
"description": "The DNS domain of the network (\"managed\" networks only)",
"in": "query",
@ -1910,6 +1920,12 @@
"required": true,
"type": "string"
},
{
"description": "The MTU of the network; defaults to the underlying interface MTU if not set",
"in": "query",
"name": "mtu",
"type": "integer"
},
{
"description": "The DNS domain of the network (\"managed\" networks only)",
"in": "query",
@ -1993,6 +2009,12 @@
"name": "description",
"type": "string"
},
{
"description": "The MTU of the network",
"in": "query",
"name": "mtu",
"type": "integer"
},
{
"description": "The DNS domain of the network (\"managed\" networks only)",
"in": "query",


@ -161,6 +161,9 @@ pvc:
networking:
# bridge_device: Underlying device to use for bridged vLAN networks; usually the device of <cluster>
bridge_device: ens4
# bridge_mtu: The MTU of the underlying device used for bridged vLAN networks, and thus the maximum
# MTU of the overlying bridge devices.
bridge_mtu: 1500
# sriov_enable: Enable or disable (default if absent) SR-IOV network support
sriov_enable: False
# sriov_device: Underlying device(s) to use for SR-IOV networks; can be bridge_device or other NIC(s)


@ -48,7 +48,7 @@ import re
import json
# Daemon version
version = '0.9.40'
version = '0.9.42'
##########################################################
@ -316,7 +316,7 @@ def entrypoint():
pvcnoded.util.networking.create_nft_configuration(logger, config)
# Create our object dictionaries
logger.out('Setting up objects', state='i')
logger.out('Setting up objects', state='s')
d_node = dict()
node_list = list()


@ -39,27 +39,37 @@ class VXNetworkInstance(object):
self.cluster_dev = config['cluster_dev']
self.cluster_mtu = config['cluster_mtu']
self.bridge_dev = config['bridge_dev']
self.bridge_mtu = config['bridge_mtu']
self.nettype = self.zkhandler.read(('network.type', self.vni))
if self.nettype == 'bridged':
self.base_nic = 'vlan{}'.format(self.vni)
self.bridge_nic = 'vmbr{}'.format(self.vni)
self.max_mtu = self.bridge_mtu
self.logger.out(
'Creating new bridged network',
prefix='VNI {}'.format(self.vni),
state='o'
state='i'
)
self.init_bridged()
elif self.nettype == 'managed':
self.base_nic = 'vxlan{}'.format(self.vni)
self.bridge_nic = 'vmbr{}'.format(self.vni)
self.max_mtu = self.cluster_mtu - 50
self.logger.out(
'Creating new managed network',
prefix='VNI {}'.format(self.vni),
state='o'
state='i'
)
self.init_managed()
else:
self.base_nic = None
self.bridge_nic = None
self.max_mtu = 0
self.logger.out(
'Invalid network type {}'.format(self.nettype),
prefix='VNI {}'.format(self.vni),
state='o'
state='i'
)
pass
@ -68,8 +78,11 @@ class VXNetworkInstance(object):
self.old_description = None
self.description = None
self.vlan_nic = 'vlan{}'.format(self.vni)
self.bridge_nic = 'vmbr{}'.format(self.vni)
try:
self.vx_mtu = self.zkhandler.read(('network.mtu', self.vni))
self.validateNetworkMTU()
except Exception:
self.vx_mtu = None
# Zookeeper handlers for changed states
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('network', self.vni))
@ -83,6 +96,22 @@ class VXNetworkInstance(object):
self.old_description = self.description
self.description = data.decode('ascii')
# Try block for migration purposes
try:
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('network.mtu', self.vni))
def watch_network_mtu(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
# because this class instance is about to be reaped in Daemon.py
return False
if data and str(self.vx_mtu) != data.decode('ascii'):
self.vx_mtu = data.decode('ascii')
self.validateNetworkMTU()
self.updateNetworkMTU()
except Exception:
self.validateNetworkMTU()
self.createNetworkBridged()
# Initialize a managed network
@ -102,8 +131,11 @@ class VXNetworkInstance(object):
self.dhcp4_start = self.zkhandler.read(('network.ip4.dhcp_start', self.vni))
self.dhcp4_end = self.zkhandler.read(('network.ip4.dhcp_end', self.vni))
self.vxlan_nic = 'vxlan{}'.format(self.vni)
self.bridge_nic = 'vmbr{}'.format(self.vni)
try:
self.vx_mtu = self.zkhandler.read(('network.mtu', self.vni))
self.validateNetworkMTU()
except Exception:
self.vx_mtu = None
self.nftables_netconf_filename = '{}/networks/{}.nft'.format(self.config['nft_dynamic_directory'], self.vni)
self.firewall_rules = []
@ -138,7 +170,7 @@ add rule inet filter input tcp dport 80 meta iifname {bridgenic} counter accept
# Block traffic into the router from network
add rule inet filter input meta iifname {bridgenic} counter drop
""".format(
vxlannic=self.vxlan_nic,
vxlannic=self.base_nic,
bridgenic=self.bridge_nic
)
@ -147,14 +179,14 @@ add rule inet filter forward ip daddr {netaddr4} counter jump {vxlannic}-in
add rule inet filter forward ip saddr {netaddr4} counter jump {vxlannic}-out
""".format(
netaddr4=self.ip4_network,
vxlannic=self.vxlan_nic,
vxlannic=self.base_nic,
)
self.firewall_rules_v6 = """# Jump from forward chain to this chain when matching net (IPv6)
add rule inet filter forward ip6 daddr {netaddr6} counter jump {vxlannic}-in
add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
""".format(
netaddr6=self.ip6_network,
vxlannic=self.vxlan_nic,
vxlannic=self.base_nic,
)
self.firewall_rules_in = self.zkhandler.children(('network.rule.in', self.vni))
@ -209,6 +241,22 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
self.stopDHCPServer()
self.startDHCPServer()
# Try block for migration purposes
try:
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('network.mtu', self.vni))
def watch_network_mtu(data, stat, event=''):
if event and event.type == 'DELETED':
# The key has been deleted after existing before; terminate this watcher
# because this class instance is about to be reaped in Daemon.py
return False
if data and str(self.vx_mtu) != data.decode('ascii'):
self.vx_mtu = data.decode('ascii')
self.validateNetworkMTU()
self.updateNetworkMTU()
except Exception:
self.validateNetworkMTU()
@self.zkhandler.zk_conn.DataWatch(self.zkhandler.schema.path('network.ip6.network', self.vni))
def watch_network_ip6_network(data, stat, event=''):
if event and event.type == 'DELETED':
@ -383,6 +431,66 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
def getvni(self):
return self.vni
def validateNetworkMTU(self):
update_mtu = False
# Explicitly set the MTU to max_mtu if unset (in Zookeeper too assuming the key exists)
if self.vx_mtu == '' or self.vx_mtu is None:
self.logger.out(
'MTU not specified; setting to maximum MTU {} instead'.format(self.max_mtu),
prefix='VNI {}'.format(self.vni),
state='w'
)
self.vx_mtu = self.max_mtu
update_mtu = True
# Set MTU to an integer (if it's not)
if not isinstance(self.vx_mtu, int):
self.vx_mtu = int(self.vx_mtu)
# Ensure the MTU is valid
if self.vx_mtu > self.max_mtu:
self.logger.out(
'MTU {} is larger than maximum MTU {}; setting to maximum MTU instead'.format(self.vx_mtu, self.max_mtu),
prefix='VNI {}'.format(self.vni),
state='w'
)
self.vx_mtu = self.max_mtu
update_mtu = True
if update_mtu:
# Try block for migration purposes
try:
self.zkhandler.write([
(('network.mtu', self.vni), self.vx_mtu)
])
except Exception as e:
self.logger.out(
'Could not update MTU in Zookeeper: {}'.format(e),
prefix='VNI {}'.format(self.vni),
state='w'
)
def updateNetworkMTU(self):
self.logger.out(
'Setting network MTU to {}'.format(self.vx_mtu),
prefix='VNI {}'.format(self.vni),
state='i'
)
# Set MTU of base and bridge NICs
common.run_os_command(
'ip link set {} mtu {} up'.format(
self.base_nic,
self.vx_mtu
)
)
common.run_os_command(
'ip link set {} mtu {} up'.format(
self.bridge_nic,
self.vx_mtu
)
)
def updateDHCPReservations(self, old_reservations_list, new_reservations_list):
for reservation in new_reservations_list:
if reservation not in old_reservations_list:
@ -411,7 +519,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
self.logger.out(
'Updating firewall rules',
prefix='VNI {}'.format(self.vni),
state='o'
state='i'
)
ordered_acls_in = {}
ordered_acls_out = {}
@ -458,18 +566,18 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
def createNetworkBridged(self):
self.logger.out(
'Creating bridged vLAN device {} on interface {}'.format(
self.vlan_nic,
self.base_nic,
self.bridge_dev
),
prefix='VNI {}'.format(self.vni),
state='o'
state='i'
)
# Create vLAN interface
common.run_os_command(
'ip link add link {} name {} type vlan id {}'.format(
self.bridge_dev,
self.vlan_nic,
self.base_nic,
self.vni
)
)
@ -480,20 +588,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
)
)
# Set MTU of vLAN and bridge NICs
vx_mtu = self.cluster_mtu
common.run_os_command(
'ip link set {} mtu {} up'.format(
self.vlan_nic,
vx_mtu
)
)
common.run_os_command(
'ip link set {} mtu {} up'.format(
self.bridge_nic,
vx_mtu
)
)
self.updateNetworkMTU()
# Disable tx checksum offload on bridge interface (breaks DHCP on Debian < 9)
common.run_os_command(
@ -513,7 +608,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
common.run_os_command(
'brctl addif {} {}'.format(
self.bridge_nic,
self.vlan_nic
self.base_nic
)
)
@ -524,13 +619,13 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
self.cluster_dev
),
prefix='VNI {}'.format(self.vni),
state='o'
state='i'
)
# Create VXLAN interface
common.run_os_command(
'ip link add {} type vxlan id {} dstport 4789 dev {}'.format(
self.vxlan_nic,
self.base_nic,
self.vni,
self.cluster_dev
)
@ -542,20 +637,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
)
)
# Set MTU of VXLAN and bridge NICs
vx_mtu = self.cluster_mtu - 50
common.run_os_command(
'ip link set {} mtu {} up'.format(
self.vxlan_nic,
vx_mtu
)
)
common.run_os_command(
'ip link set {} mtu {} up'.format(
self.bridge_nic,
vx_mtu
)
)
self.updateNetworkMTU()
# Disable tx checksum offload on bridge interface (breaks DHCP on Debian < 9)
common.run_os_command(
@ -575,7 +657,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
common.run_os_command(
'brctl addif {} {}'.format(
self.bridge_nic,
self.vxlan_nic
self.base_nic
)
)
@ -600,7 +682,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
self.bridge_nic
),
prefix='VNI {}'.format(self.vni),
state='o'
state='i'
)
common.createIPAddress(self.ip6_gateway, self.ip6_cidrnetmask, self.bridge_nic)
@ -613,7 +695,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
self.bridge_nic
),
prefix='VNI {}'.format(self.vni),
state='o'
state='i'
)
common.createIPAddress(self.ip4_gateway, self.ip4_cidrnetmask, self.bridge_nic)
@ -624,7 +706,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
self.bridge_nic
),
prefix='VNI {}'.format(self.vni),
state='o'
state='i'
)
# Recreate the environment we need for dnsmasq
@ -719,7 +801,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
self.cluster_dev
),
prefix='VNI {}'.format(self.vni),
state='o'
state='i'
)
common.run_os_command(
'ip link set {} down'.format(
@ -728,13 +810,13 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
)
common.run_os_command(
'ip link set {} down'.format(
self.vlan_nic
self.base_nic
)
)
common.run_os_command(
'brctl delif {} {}'.format(
self.bridge_nic,
self.vlan_nic
self.base_nic
)
)
common.run_os_command(
@ -744,7 +826,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
)
common.run_os_command(
'ip link delete {}'.format(
self.vlan_nic
self.base_nic
)
)
@ -755,7 +837,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
self.cluster_dev
),
prefix='VNI {}'.format(self.vni),
state='o'
state='i'
)
common.run_os_command(
'ip link set {} down'.format(
@ -764,13 +846,13 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
)
common.run_os_command(
'ip link set {} down'.format(
self.vxlan_nic
self.base_nic
)
)
common.run_os_command(
'brctl delif {} {}'.format(
self.bridge_nic,
self.vxlan_nic
self.base_nic
)
)
common.run_os_command(
@ -780,7 +862,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
)
common.run_os_command(
'ip link delete {}'.format(
self.vxlan_nic
self.base_nic
)
)
@ -788,7 +870,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
self.logger.out(
'Removing firewall rules',
prefix='VNI {}'.format(self.vni),
state='o'
state='i'
)
try:
@ -815,7 +897,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
self.bridge_nic
),
prefix='VNI {}'.format(self.vni),
state='o'
state='i'
)
common.removeIPAddress(self.ip6_gateway, self.ip6_cidrnetmask, self.bridge_nic)
@ -827,7 +909,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
self.bridge_nic
),
prefix='VNI {}'.format(self.vni),
state='o'
state='i'
)
common.removeIPAddress(self.ip4_gateway, self.ip4_cidrnetmask, self.bridge_nic)
@ -838,7 +920,7 @@ add rule inet filter forward ip6 saddr {netaddr6} counter jump {vxlannic}-out
self.bridge_nic
),
prefix='VNI {}'.format(self.vni),
state='o'
state='i'
)
# Terminate, then kill
self.dhcp_server_daemon.signal('term')
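
The MTU handling added to `VXNetworkInstance` in this file (a per-type maximum, then `validateNetworkMTU` clamping the stored value) can be summarized as two pure functions. This is a simplified sketch for illustration, not the daemon code: it leaves out the Zookeeper write and logging, and the names `max_mtu_for`, `validate_mtu` and `VXLAN_OVERHEAD` are hypothetical.

```python
from typing import Optional, Tuple, Union

# The diff subtracts 50 bytes from cluster_mtu for managed (VXLAN) networks
VXLAN_OVERHEAD = 50


def max_mtu_for(nettype: str, cluster_mtu: int, bridge_mtu: int) -> int:
    """Mirror the max_mtu selection: bridge MTU, cluster MTU minus overhead, or 0."""
    if nettype == 'bridged':
        return bridge_mtu
    if nettype == 'managed':
        return cluster_mtu - VXLAN_OVERHEAD
    return 0


def validate_mtu(requested: Optional[Union[int, str]], max_mtu: int) -> Tuple[int, bool]:
    """Return (effective_mtu, changed).

    `changed` is True when the stored value would need to be rewritten,
    i.e. it was unset or exceeded the maximum.
    """
    if requested is None or requested == '':
        return max_mtu, True
    mtu = int(requested)
    if mtu > max_mtu:
        return max_mtu, True
    return mtu, False


if __name__ == "__main__":
    limit = max_mtu_for('managed', cluster_mtu=1500, bridge_mtu=9000)
    print(validate_mtu('9000', limit))
```

Returning a `changed` flag instead of writing directly keeps the check testable without a Zookeeper connection; the caller decides whether to persist the corrected value.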


@ -19,13 +19,17 @@
#
###############################################################################
import daemon_lib.common as common
import os
import subprocess
import yaml
from socket import gethostname
from re import findall
from psutil import cpu_count
from ipaddress import ip_address, ip_network
from json import loads
class MalformedConfigurationError(Exception):
@ -287,10 +291,18 @@ def get_configuration():
'upstream_mtu': o_sysnetwork_upstream.get('mtu', None),
'upstream_dev_ip': o_sysnetwork_upstream.get('address', None),
'bridge_dev': o_sysnetworks.get('bridge_device', None),
'bridge_mtu': o_sysnetworks.get('bridge_mtu', None),
'enable_sriov': o_sysnetworks.get('sriov_enable', False),
'sriov_device': o_sysnetworks.get('sriov_device', list())
}
if config_networks['bridge_mtu'] is None:
# Read the current MTU of bridge_dev and set bridge_mtu to it; avoids weird resets
retcode, stdout, stderr = common.run_os_command(f"ip -json link show dev {config_networks['bridge_dev']}")
current_bridge_mtu = loads(stdout)[0]['mtu']
print(f"Config key bridge_mtu not explicitly set; using live MTU {current_bridge_mtu} from {config_networks['bridge_dev']}")
config_networks['bridge_mtu'] = current_bridge_mtu
config = {**config, **config_networks}
for network_type in ['cluster', 'storage', 'upstream']:
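
The fallback above reads the live MTU from the output of `ip -json link show dev <device>`, which is a JSON list with one object per link. A minimal sketch of just the parsing step follows, with an explicit error when the command produced nothing usable (the diff itself does not check for that case). `parse_link_mtu` and the sample output are hypothetical and shortened for illustration.

```python
import json


def parse_link_mtu(ip_json_stdout: str) -> int:
    """Extract the MTU of the first link from `ip -json link show` output."""
    links = json.loads(ip_json_stdout)  # raises a ValueError subclass on bad JSON
    if not links:
        raise ValueError("no link information returned")
    return int(links[0]['mtu'])


if __name__ == "__main__":
    # Shortened sample; real output carries many more fields per link
    sample = '[{"ifindex": 2, "ifname": "ens4", "mtu": 1500, "operstate": "UP"}]'
    print(parse_link_mtu(sample))
```

Raising on empty output makes a missing or misnamed `bridge_dev` fail at configuration load rather than later, when the MTU is applied.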


@ -133,23 +133,39 @@ def migrateFromFencedNode(zkhandler, node_name, config, logger):
# Perform an IPMI fence
#
def reboot_via_ipmi(ipmi_hostname, ipmi_user, ipmi_password, logger):
# Forcibly reboot the node
ipmi_command_reset = '/usr/bin/ipmitool -I lanplus -H {} -U {} -P {} chassis power reset'.format(
# Power off the node
logger.out('Sending power off to dead node', state='i')
ipmi_command_stop = '/usr/bin/ipmitool -I lanplus -H {} -U {} -P {} chassis power off'.format(
ipmi_hostname, ipmi_user, ipmi_password
)
ipmi_reset_retcode, ipmi_reset_stdout, ipmi_reset_stderr = common.run_os_command(ipmi_command_reset)
ipmi_stop_retcode, ipmi_stop_stdout, ipmi_stop_stderr = common.run_os_command(ipmi_command_stop)
if ipmi_reset_retcode != 0:
logger.out(f'Failed to reboot dead node: {ipmi_reset_stderr}', state='e')
if ipmi_stop_retcode != 0:
logger.out(f'Failed to power off dead node: {ipmi_stop_stderr}', state='e')
time.sleep(1)
# Power on the node (just in case it is offline)
# Check the chassis power state
logger.out('Checking power state of dead node', state='i')
ipmi_command_status = '/usr/bin/ipmitool -I lanplus -H {} -U {} -P {} chassis power status'.format(
ipmi_hostname, ipmi_user, ipmi_password
)
ipmi_status_retcode, ipmi_status_stdout, ipmi_status_stderr = common.run_os_command(ipmi_command_status)
if ipmi_status_retcode == 0:
logger.out(f'Current chassis power state is: {ipmi_status_stdout.strip()}', state='i')
else:
logger.out('Current chassis power state is: Unknown', state='w')
# Power on the node
logger.out('Sending power on to dead node', state='i')
ipmi_command_start = '/usr/bin/ipmitool -I lanplus -H {} -U {} -P {} chassis power on'.format(
ipmi_hostname, ipmi_user, ipmi_password
)
ipmi_start_retcode, ipmi_start_stdout, ipmi_start_stderr = common.run_os_command(ipmi_command_start)
if ipmi_start_retcode != 0:
logger.out(f'Failed to power on dead node: {ipmi_start_stderr}', state='w')
time.sleep(2)
# Check the chassis power state
@ -159,7 +175,7 @@ def reboot_via_ipmi(ipmi_hostname, ipmi_user, ipmi_password, logger):
)
ipmi_status_retcode, ipmi_status_stdout, ipmi_status_stderr = common.run_os_command(ipmi_command_status)
if ipmi_reset_retcode == 0:
if ipmi_stop_retcode == 0:
if ipmi_status_stdout.strip() == "Chassis Power is on":
# We successfully rebooted the node and it is powered on; this is a successful fence
logger.out('Successfully rebooted dead node', state='o')
@ -189,7 +205,7 @@ def reboot_via_ipmi(ipmi_hostname, ipmi_user, ipmi_password, logger):
def verify_ipmi(ipmi_hostname, ipmi_user, ipmi_password):
ipmi_command = f'/usr/bin/ipmitool -I lanplus -H {ipmi_hostname} -U {ipmi_user} -P {ipmi_password} chassis power status'
retcode, stdout, stderr = common.run_os_command(ipmi_command, timeout=2)
if retcode == 0 and stdout.strip() != "Chassis Power is on":
if retcode == 0 and stdout.strip() == "Chassis Power is on":
return True
else:
return False
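
The reworked fence in `reboot_via_ipmi` (power off, then power on, then confirm the chassis state) can be sketched with an injectable command runner so the sequence is testable without IPMI hardware. This is a simplified illustration under stated assumptions: `fence_sequence`, `run_ipmi` and `settle` are hypothetical names, and only the success branch visible in the diff is modelled.

```python
import time
from typing import Callable, Tuple

# Hypothetical runner: takes an ipmitool subcommand, returns (retcode, stdout, stderr)
Runner = Callable[[str], Tuple[int, str, str]]


def fence_sequence(run_ipmi: Runner, settle: float = 1.0) -> bool:
    """Power off, power on, then confirm the chassis is on.

    Returns True only if the power off succeeded and the final status
    reads "Chassis Power is on", mirroring the success check in the diff.
    """
    stop_rc, _, _ = run_ipmi('chassis power off')
    time.sleep(settle)
    # Power on is always attempted so the node is not left switched off
    run_ipmi('chassis power on')
    time.sleep(settle)
    status_rc, status_out, _ = run_ipmi('chassis power status')
    return (
        stop_rc == 0
        and status_rc == 0
        and status_out.strip() == 'Chassis Power is on'
    )
```

Using an explicit power off rather than a reset means a zero return code from the first step establishes that the node was down at some point, which is the ambiguity the commit message describes removing.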