Overview
With v2.22.0 the s390-tools (aka s390utils on RHEL) have an important addition that helps to pass through crypto domains to KVM guests: they allow for persistence and help the user to avoid invalid configurations. (Thanks to Matthew Rosato for explaining the details to me.)
As you might remember from my previous post, crypto device passthrough (with libvirt) consists of three main steps:
- Remove the host driver from the device and assign vfio_ap
- Start a mediated device of type vfio_ap-passthrough with assigned adapter, usage domain and optionally control domain
- Attach the mediated device via its UUID and the <hostdev> element to the KVM domain
However, if you reboot your LPAR, several configurations need to be persisted in your environment:
1. Of course, the mediated device configuration and the KVM definition need to be persisted.
2. The passthrough driver needs to be loaded; whether this happens automatically might depend on the kernel you are using. On RHEL you can configure the kernel module vfio_ap to be loaded automatically at boot as described here. Otherwise, trying to define a device via the nodedev API might just tell you: unsupported configuration: invalid parent device 'ap_matrix'
3. Finally, the crypto devices' driver assignments need to be persisted.
While 1. is taken care of by libvirt (with mdevctl's help), 2. and 3. are what the mentioned s390-tools provide with release v2.22.0. (Mind that this leverages the kernel uevents BINDINGS=complete and COMPLETECOUNT=X, which might not be available in older kernel versions, e.g. v4.x.)
As a result, we can now set up workloads in KVM guests that leverage the crypto devices (e.g. hardware-accelerated crypto operations) once and configure them to start automatically. After we reboot the LPAR (e.g. when we apply important security updates to the kernel), our KVM guests are restarted automatically.
How to persist crypto device driver assignments
Suppose the vfio_ap kernel module is loaded (v2.22.0's new ap udev rule takes care of this, too) and the mediated device is configured as well.
In order to configure the device driver assignment, instead of using sysfs directly, we'll make use of the new type ap on the lszdev and chzdev commands.
Check if your s390-tools version supports ap handling
Simply run
# lszdev --list-types
and confirm that the ap type is listed:
...
ap Cryptographic Adjunct Processor (AP) device
...
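If you want to script this check, a minimal sketch could look like the following. The helper name has_ap_type is hypothetical, and the pattern assumes that each type is printed at the start of a line followed by whitespace, as in the excerpt above.

```shell
# Succeed if the type list read from stdin contains the "ap" device type.
# Assumption: the type name starts the line (optionally indented).
has_ap_type() {
    grep -q '^[[:space:]]*ap[[:space:]]'
}

# Only query the real tool if it is installed on this machine.
if command -v lszdev >/dev/null 2>&1; then
    if lszdev --list-types | has_ap_type; then
        echo "ap type supported"
    else
        echo "ap type not supported, check your s390-tools version"
    fi
fi
```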
Configure apmask and aqmask with chzdev
chzdev is not a new tool. Important for the persistence of device configurations are the flags:
-a, --active
    Apply changes to the active configuration only
...
-p, --persistent
    Apply changes to persistent configuration only.
where omitting both flags has the same effect as passing both: any change is applied immediately and will be restored on reboot.
The interesting part is how the apmask and aqmask are configured: with this tool we use decimal IDs, which can be more convenient. You might remember that each mask represents the adapters or domains, respectively, through an array of 256 bits, where a 1 says 'used by the host' and a 0 'free for use by another driver (vfio_ap)'; for details see the kernel doc.
We've been used to the hexadecimal representation when using the lszcrypt tool or the sysfs directly, e.g.
# lszcrypt
CARD.DOM TYPE  MODE        STATUS  REQUESTS
----------------------------------------------
02       CEX7A Accelerator online         0
02.002b  CEX7A Accelerator online         0
02.0032  CEX7A Accelerator online         0
02.0033  CEX7A Accelerator online         0
So, if we want to pass the queue 02.002b through, we need the decimal values of its adapter and its domain. If you are not comfortable with base conversion and have a bash console at hand, you can convert a value - e.g. 2b - like this:
# echo "obase=10; ibase=16; 2B" | bc
43
(It's important that you use upper-case letters: bc only accepts upper-case hexadecimal digits.)
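As an alternative that does not depend on bc or on the letter case, the shell's printf can do the conversion. This is only a small sketch; hex_to_dec is a hypothetical helper name.

```shell
# Convert a hexadecimal ID as printed by lszcrypt (e.g. 2b or 002b) to decimal.
# printf parses the "0x" prefix as a C-style hexadecimal constant.
hex_to_dec() {
    printf '%d\n' "0x$1"
}

hex_to_dec 2b     # prints 43
hex_to_dec 002B   # leading zeros and upper case work as well, prints 43
```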
Now instead of echoing into the sysfs path, you can simply issue
# chzdev -t ap apmask=-2 aqmask=-43
(remember the "-" means to "take away" from the host)
and confirm via
# lszdev -t ap
DEVICE TYPE ap
Description : Cryptographic Adjunct Processor (AP) device
Modules : ap
Active : yes
Persistent : yes
ATTRIBUTE ACTIVE PERSISTENT
apmask "0-1,3-255" "0-1,3-255"
aqmask "0-42,44-255" "0-42,44-255"
The tool will write out the persistent setting in a udev rule making sure the assignment is restored after reboot.
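To relate the decimal ranges shown by lszdev to the hexadecimal 256-bit masks you may know from sysfs, the following sketch expands a range list into such a mask. It rests on two assumptions that are not spelled out above: the IDs listed by lszdev are the set bits of the mask, and ID 0 is the leftmost (most significant) bit. The helper name ranges_to_hexmask is hypothetical.

```shell
# Print the 256-bit mask for a list of set IDs such as "0-1,3-255".
# Assumption: ID 0 is the leftmost (most significant) bit of the mask.
ranges_to_hexmask() {
    spec=$1
    out=0x
    nibble=0
    id=0
    while [ "$id" -le 255 ]; do
        bit=0
        old_ifs=$IFS; IFS=,
        for r in $spec; do
            lo=${r%-*}; hi=${r#*-}      # "5" yields lo=hi=5, "3-255" yields 3 and 255
            if [ "$id" -ge "$lo" ] && [ "$id" -le "$hi" ]; then bit=1; fi
        done
        IFS=$old_ifs
        nibble=$((nibble * 2 + bit))
        if [ $((id % 4)) -eq 3 ]; then  # four bits collected: emit one hex digit
            out=$out$(printf '%x' "$nibble")
            nibble=0
        fi
        id=$((id + 1))
    done
    printf '%s\n' "$out"
}

ranges_to_hexmask "0-1,3-255"   # prints 0xdf followed by 62 'f' digits
```

The first hex digit d is binary 1101: adapters 0, 1 and 3 stay with the host, adapter 2 does not.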
chzdev allows for more sophisticated operations, e.g.
# chzdev -t ap apmask=+2,-4-8 aqmask=+43,-0-42
will return 02.002b to the host and give the adapter range 4-8 and the domain range 0-42 to the vfio_ap driver:
# lszdev -t ap
DEVICE TYPE ap
Description : Cryptographic Adjunct Processor (AP) device
Modules : ap
Active : yes
Persistent : yes
ATTRIBUTE ACTIVE PERSISTENT
apmask "0-3,9-255" "0-3,9-255"
aqmask "43-255" "43-255"
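If you want to predict the outcome of such an edit before running it, the arithmetic can be simulated in plain shell. This is only a sketch of the "+" and "-" semantics as described in this post, not chzdev's actual implementation; it assumes the items are applied from left to right, and the helper names in_ranges and apply_mask_edit are hypothetical.

```shell
# in_ranges ID SPEC: succeed if ID lies inside a comma-separated range list.
in_ranges() {
    _id=$1
    _old_ifs=$IFS; IFS=,
    set -- $2
    IFS=$_old_ifs
    for _r in "$@"; do
        _lo=${_r%-*}; _hi=${_r#*-}
        if [ "$_id" -ge "$_lo" ] && [ "$_id" -le "$_hi" ]; then return 0; fi
    done
    return 1
}

# apply_mask_edit CURRENT EDIT: print the resulting range list.
# CURRENT is a list like "0-1,3-255", EDIT a list like "+2,-4-8".
apply_mask_edit() {
    cur=$1; edit=$2
    result=""; start=""
    id=0
    while [ "$id" -le 256 ]; do          # 256 is a sentinel that closes the last range
        set_bit=0
        if [ "$id" -le 255 ]; then
            if in_ranges "$id" "$cur"; then set_bit=1; fi
            old_ifs=$IFS; IFS=,
            for item in $edit; do
                if in_ranges "$id" "${item#?}"; then
                    case $item in
                        +*) set_bit=1 ;;   # give the ID back to the host
                        -*) set_bit=0 ;;   # take the ID away from the host
                    esac
                fi
            done
            IFS=$old_ifs
        fi
        if [ "$set_bit" -eq 1 ]; then
            if [ -z "$start" ]; then start=$id; fi
        elif [ -n "$start" ]; then
            end=$((id - 1))
            if [ "$start" -eq "$end" ]; then seg=$start; else seg=$start-$end; fi
            result=${result:+$result,}$seg
            start=""
        fi
        id=$((id + 1))
    done
    printf '%s\n' "$result"
}

apply_mask_edit "0-1,3-255" "+2,-4-8"       # prints 0-3,9-255
apply_mask_edit "0-42,44-255" "+43,-0-42"   # prints 43-255
```

Both results match the lszdev output shown above.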
This also allows us to easily reset all of our configurations during testing via chzdev -t ap apmask=0-255 aqmask=0-255.
Protection against bad configurations
As a further improvement for users, the tools check the current environment and integrate with mdevctl to help avoid conflicting configurations. For example:
Don't allow for mediated device definitions if another device already uses the same device
With the following nodedev successfully defined, we can't return the adapter to the host:
# virsh nodedev-dumpxml
<device>
<name>mdev_d36d7d0f_cf3d_4fef_bb9c_ed393954996b_matrix</name>
<path>/sys/devices/vfio_ap/matrix/d36d7d0f-cf3d-4fef-bb9c-ed393954996b</path>
<parent>ap_matrix</parent>
<driver>
<name>vfio_ap_mdev</name>
</driver>
<capability type='mdev'>
<type id='vfio_ap-passthrough'/>
<uuid>d36d7d0f-cf3d-4fef-bb9c-ed393954996b</uuid>
<parent_addr>matrix</parent_addr>
<iommuGroup number='1'/>
<attr name='assign_adapter' value='0x02'/>
<attr name='assign_domain' value='0x002b'/>
</capability>
</device>
# chzdev -t ap apmask=+2
chzdev: apmask conflicts with mdev d36d7d0f-cf3d-4fef-bb9c-ed393954996b APQN 2.43
chzdev: persistent apmask conflicts with defined autostart mdev d36d7d0f-cf3d-4fef-bb9c-ed393954996b APQN 2.43
ap device type configure failed
Error: Invalid configuration
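The APQN 2.43 in the error message is simply the decimal form of the assign_adapter and assign_domain attributes of the mediated device. A small sketch to extract it from the nodedev XML follows; the helper name mdev_apqn is hypothetical, and the sketch assumes exactly one adapter and one domain attribute in the single-quoted form shown above.

```shell
# Read nodedev XML on stdin and print the assigned APQN as decimal adapter.domain.
# Simplification: expects exactly one assign_adapter and one assign_domain attribute.
mdev_apqn() {
    xml=$(cat)
    adapter=$(printf '%s\n' "$xml" |
        sed -n "s/.*name='assign_adapter' value='\([^']*\)'.*/\1/p")
    domain=$(printf '%s\n' "$xml" |
        sed -n "s/.*name='assign_domain' value='\([^']*\)'.*/\1/p")
    # printf converts the 0x-prefixed hexadecimal values to decimal.
    printf '%d.%d\n' "$adapter" "$domain"
}

# Example with the two attribute lines from the XML above:
mdev_apqn <<'EOF'
<attr name='assign_adapter' value='0x02'/>
<attr name='assign_domain' value='0x002b'/>
EOF
```

On a real system you would pipe the output of virsh nodedev-dumpxml for your mediated device into mdev_apqn instead of the here-document.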
If you are interested, you can check out further validations in ap_check's source code here.