Tuesday, July 19, 2022

Persistent crypto device passthrough on s390x KVM

Overview

With v2.22.0 the s390-tools (aka s390utils on RHEL) have an important addition that helps to pass through crypto domains to KVM guests: they allow for persistence and help the user to avoid invalid configurations. (Thanks to Matthew Rosato for explaining the details to me.)

As you might remember from my previous post, crypto device passthrough (with libvirt) consists of three main steps:

  1. Remove the host driver from the device and assign vfio_ap
  2. Start a mediated device of type vfio_ap-passthrough with assigned adapter, usage domain and optionally control domain
  3. Attach the mediated device via its UUID and the <hostdev> element to the KVM domain

For a while now, we have had the nice tool mdevctl to help with step 2 above; see, for example, the official RHEL 8 user documentation. libvirt also integrates with mdevctl, so you can manage your mediated devices via libvirt's nodedev API.

However, if you reboot your LPAR, several configurations need to be persisted in your environment:

  1. Of course, you need the mediated device configuration and the KVM definition to be persisted.
  2. The passthrough driver needs to be loaded; this might depend on the kernel you are using. On RHEL you can configure the kernel module vfio_ap to be automatically loaded at boot as described here. Otherwise, trying to define a device via the nodedev API might just tell you: unsupported configuration: invalid parent device 'ap_matrix'
  3. Finally, the crypto devices' driver assignments need to be persisted.

While 1. is taken care of by libvirt (with mdevctl's help), 2. and 3. are what the mentioned s390-tools provide as of release v2.22.0. (Mind that they rely on the kernel uevents BINDINGS=complete and COMPLETECOUNT=X, which might not be available in older kernel versions, e.g. v4.x.)
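If your s390-tools do not yet ship the ap udev rule, a modules-load.d drop-in is one common way to have vfio_ap loaded at boot. The following is a minimal sketch, assuming a systemd-based distribution; the function name persist_module and its optional directory argument are made up for this example and are not part of any tool mentioned here.

```shell
#!/bin/sh
# Write a modules-load.d drop-in so that the named kernel module is loaded
# at boot. The optional second argument exists only so the sketch can be
# tried without root; on a real system the default directory is used.
persist_module() {
    module=$1
    dir=${2:-/etc/modules-load.d}
    printf '%s\n' "$module" > "$dir/$module.conf"
}

# Example (as root): persist_module vfio_ap
```

The module is then loaded on the next boot; for the running system you would still load it once by hand.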

As a result, we can now set up workloads in KVM guests that leverage the crypto devices (e.g. hardware-accelerated crypto operations) once and configure the guests to start automatically. After we reboot the LPAR (e.g. to apply important security updates to the kernel), our KVM guests are restarted automatically.

How to persist crypto device driver assignments

Suppose the vfio_ap kernel module is loaded (v2.22.0's new ap udev rule takes care of this, too) and the mediated device is configured as well.
In order to configure the device driver assignment, instead of using sysfs directly, we'll make use of the new type ap on the lszdev and chzdev commands.

Check if your s390-tools version supports ap handling

Simply run

# lszdev --list-types

and confirm that the ap type is listed:

...
ap           Cryptographic Adjunct Processor (AP) device
...

Configure apmask and aqmask with chzdev

chzdev is not a new tool. The flags that matter for persisting device configurations are:

       -a
       --active
           Apply changes to the active configuration only
...

       -p
       --persistent
           Apply changes to persistent configuration only.

where omitting both flags has the same effect as passing them both: any change is applied immediately and will be restored on reboot.

The interesting part is how the apmask and aqmask are configured: chzdev takes decimal numbers, which can be more convenient. You might remember that each mask represents the adapters or domains, respectively, as an array of 256 bits, where a 1 means 'used by the host's default drivers' and a 0 means 'free for use by another driver (vfio_ap)'; for details, see the kernel doc.

We've been used to the hexadecimal representation when using the lszcrypt tool or the sysfs directly, e.g.

# lszcrypt

CARD.DOM TYPE  MODE        STATUS     REQUESTS
----------------------------------------------
02       CEX7A Accelerator online            0
02.002b  CEX7A Accelerator online            0
02.0032  CEX7A Accelerator online            0
02.0033  CEX7A Accelerator online            0

So, if we want to pass 02.002b through (adapter 02, domain 002b), we need the decimal values of both parts. If you are not comfortable converting between bases and have a shell with bc at hand, you can do the conversion, e.g. for 2b, like this:

# echo "obase=10; ibase=16; 2B" | bc
43

(It's important that you use upper-case letters: bc only accepts upper-case hexadecimal digits.)
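If bc is not installed, the shell's printf can do the same conversion and also accepts lower-case digits. The sketch below converts a whole CARD.DOM value as printed by lszcrypt; the function name apqn_to_dec is made up for this example.

```shell
#!/bin/sh
# Convert a hexadecimal CARD.DOM value as printed by lszcrypt (e.g. 02.002b)
# into the decimal adapter and domain numbers.
apqn_to_dec() {
    card=${1%%.*}    # text before the dot: adapter (card), hexadecimal
    dom=${1#*.}      # text after the dot: domain, hexadecimal
    printf '%d %d\n' "0x$card" "0x$dom"
}

apqn_to_dec 02.002b    # prints "2 43"
```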

Now instead of echoing into the sysfs path, you can simply issue

# chzdev -t ap apmask=-2 aqmask=-43

(remember the "-" means to "take away" from the host)

and confirm via

# lszdev -t ap
DEVICE TYPE ap
  Description        : Cryptographic Adjunct Processor (AP) device
  Modules            : ap
  Active             : yes
  Persistent         : yes

  ATTRIBUTE  ACTIVE         PERSISTENT
  apmask     "0-1,3-255"    "0-1,3-255"
  aqmask     "0-42,44-255"  "0-42,44-255"


The tool will write out the persistent setting in a udev rule making sure the assignment is restored after reboot.
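To read such a range list at a glance, remember that the listed numbers are the ones still owned by the host. A small sketch that answers "who owns bit N?" for a list in the format lszdev prints; the function name mask_owner and its output words are made up for this example.

```shell
#!/bin/sh
# mask_owner LIST BIT: print "host" if BIT is contained in the range list
# LIST (e.g. "0-42,44-255"), otherwise print "free" (usable by vfio_ap).
mask_owner() {
    list=$1
    bit=$2
    old_ifs=$IFS
    IFS=,
    for range in $list; do
        lo=${range%-*}    # for a single number like "7", lo and hi are both 7
        hi=${range#*-}
        if [ "$bit" -ge "$lo" ] && [ "$bit" -le "$hi" ]; then
            IFS=$old_ifs
            echo host
            return 0
        fi
    done
    IFS=$old_ifs
    echo free
}

mask_owner "0-42,44-255" 43    # prints "free"
mask_owner "0-1,3-255" 3       # prints "host"
```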

chzdev allows for more sophisticated operations, e.g.

# chzdev -t ap apmask=+2,-4-8 aqmask=+43,-0-42

will return 02.002b to the host and give the adapter range 4-8 and the domain range 0-42 to the vfio_ap driver:

# lszdev -t ap
DEVICE TYPE ap
  Description        : Cryptographic Adjunct Processor (AP) device
  Modules            : ap
  Active             : yes
  Persistent         : yes

  ATTRIBUTE  ACTIVE       PERSISTENT
  apmask     "0-3,9-255"  "0-3,9-255"
  aqmask     "43-255"     "43-255"
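If you want to predict the outcome of such a relative change before running it, the arithmetic can be modelled in plain shell. This is only a sketch of how I read the "+"/"-" syntax (operations applied left to right on a 256-bit mask), not chzdev's actual implementation; the function names covers and apply_mask are made up for this example.

```shell
#!/bin/sh
# covers RANGE BIT: succeed if BIT lies within RANGE ("7" or "4-8").
covers() {
    [ "$2" -ge "${1%-*}" ] && [ "$2" -le "${1#*-}" ]
}

# apply_mask OLD OPS: print the range list that results from applying the
# relative operations OPS (e.g. "+2,-4-8") to the range list OLD.
apply_mask() {
    old=$1
    ops=$2
    out=""
    start=""
    bit=0
    old_ifs=$IFS
    IFS=,
    # Iterate one past the last bit (256) so that an open range gets closed.
    while [ "$bit" -le 256 ]; do
        on=0
        if [ "$bit" -lt 256 ]; then
            for rng in $old; do covers "$rng" "$bit" && on=1; done
            for op in $ops; do
                case $op in
                    +*) covers "${op#+}" "$bit" && on=1 ;;
                    -*) covers "${op#-}" "$bit" && on=0 ;;
                esac
            done
        fi
        if [ "$on" -eq 1 ] && [ -z "$start" ]; then
            start=$bit
        elif [ "$on" -eq 0 ] && [ -n "$start" ]; then
            end=$((bit - 1))
            if [ "$start" -eq "$end" ]; then rng=$start; else rng=$start-$end; fi
            out=${out:+$out,}$rng
            start=""
        fi
        bit=$((bit + 1))
    done
    IFS=$old_ifs
    printf '%s\n' "$out"
}

apply_mask "0-1,3-255" "+2,-4-8"       # prints "0-3,9-255"
apply_mask "0-42,44-255" "+43,-0-42"   # prints "43-255"
```

Both results match the lszdev output shown above.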

This also allows us to easily reset all of our configurations during testing via chzdev -t ap apmask=0-255 aqmask=0-255.

Protection against bad configurations

As a further improvement for users, the tools check the current environment and integrate with mdevctl to help avoid conflicting configurations. For example:

Don't allow for mediated device definitions if another device already uses the same device

With the following nodedev successfully defined, we can't return the adapter to the host:

# virsh nodedev-dumpxml mdev_d36d7d0f_cf3d_4fef_bb9c_ed393954996b_matrix
<device>
  <name>mdev_d36d7d0f_cf3d_4fef_bb9c_ed393954996b_matrix</name>
  <path>/sys/devices/vfio_ap/matrix/d36d7d0f-cf3d-4fef-bb9c-ed393954996b</path>
  <parent>ap_matrix</parent>
  <driver>
    <name>vfio_ap_mdev</name>
  </driver>
  <capability type='mdev'>
    <type id='vfio_ap-passthrough'/>
    <uuid>d36d7d0f-cf3d-4fef-bb9c-ed393954996b</uuid>
    <parent_addr>matrix</parent_addr>
    <iommuGroup number='1'/>
    <attr name='assign_adapter' value='0x02'/>
    <attr name='assign_domain' value='0x002b'/>
  </capability>
</device>

# chzdev -t ap apmask=+2
chzdev: apmask conflicts with mdev d36d7d0f-cf3d-4fef-bb9c-ed393954996b APQN 2.43
chzdev: persistent apmask conflicts with defined autostart mdev d36d7d0f-cf3d-4fef-bb9c-ed393954996b APQN 2.43
ap device type configure failed
    Error: Invalid configuration
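Note that chzdev reports the conflicting APQN in decimal (2.43), while the nodedev XML carries the hexadecimal attribute values. To see which decimal APQNs a mediated device occupies before touching the masks, something like the following can help. It is a sketch that assumes the attribute layout shown in the XML above (single-quoted values with a 0x prefix, one attribute per line); the function name mdev_apqns is made up for this example.

```shell
#!/bin/sh
# Read a nodedev XML on stdin and print every adapter.domain combination
# it assigns, in decimal, in the same notation chzdev uses in its messages.
mdev_apqns() {
    xml=$(cat)
    adapters=$(printf '%s\n' "$xml" |
        sed -n "s/.*name='assign_adapter' value='\(0x[0-9a-fA-F]*\)'.*/\1/p")
    domains=$(printf '%s\n' "$xml" |
        sed -n "s/.*name='assign_domain' value='\(0x[0-9a-fA-F]*\)'.*/\1/p")
    for a in $adapters; do
        for d in $domains; do
            printf '%d.%d\n' "$a" "$d"
        done
    done
}

# Example:
#   virsh nodedev-dumpxml mdev_d36d7d0f_cf3d_4fef_bb9c_ed393954996b_matrix | mdev_apqns
```

For the device above this prints 2.43, the APQN named in the error message.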


If you are interested, you can check out further validations in ap_check's source code here.