Diffstat (limited to 'docs/system')
123 files changed, 13885 insertions, 0 deletions
diff --git a/docs/system/arm/aspeed.rst b/docs/system/arm/aspeed.rst new file mode 100644 index 00000000..6c5b0512 --- /dev/null +++ b/docs/system/arm/aspeed.rst @@ -0,0 +1,238 @@ +Aspeed family boards (``*-bmc``, ``ast2500-evb``, ``ast2600-evb``) +================================================================== + +The QEMU Aspeed machines model BMCs of various OpenPOWER systems and +Aspeed evaluation boards. They are based on different releases of the +Aspeed SoC : the AST2400 integrating an ARM926EJ-S CPU (400MHz), the +AST2500 with an ARM1176JZS CPU (800MHz) and more recently the AST2600 +with dual cores ARM Cortex-A7 CPUs (1.2GHz). + +The SoC comes with RAM, Gigabit ethernet, USB, SD/MMC, USB, SPI, I2C, +etc. + +AST2400 SoC based machines : + +- ``palmetto-bmc`` OpenPOWER Palmetto POWER8 BMC +- ``quanta-q71l-bmc`` OpenBMC Quanta BMC +- ``supermicrox11-bmc`` Supermicro X11 BMC + +AST2500 SoC based machines : + +- ``ast2500-evb`` Aspeed AST2500 Evaluation board +- ``romulus-bmc`` OpenPOWER Romulus POWER9 BMC +- ``witherspoon-bmc`` OpenPOWER Witherspoon POWER9 BMC +- ``sonorapass-bmc`` OCP SonoraPass BMC +- ``fp5280g2-bmc`` Inspur FP5280G2 BMC +- ``g220a-bmc`` Bytedance G220A BMC + +AST2600 SoC based machines : + +- ``ast2600-evb`` Aspeed AST2600 Evaluation board (Cortex-A7) +- ``tacoma-bmc`` OpenPOWER Witherspoon POWER9 AST2600 BMC +- ``rainier-bmc`` IBM Rainier POWER10 BMC +- ``fuji-bmc`` Facebook Fuji BMC +- ``bletchley-bmc`` Facebook Bletchley BMC +- ``fby35-bmc`` Facebook fby35 BMC +- ``qcom-dc-scm-v1-bmc`` Qualcomm DC-SCM V1 BMC +- ``qcom-firework-bmc`` Qualcomm Firework BMC + +Supported devices +----------------- + + * SMP (for the AST2600 Cortex-A7) + * Interrupt Controller (VIC) + * Timer Controller + * RTC Controller + * I2C Controller, including the new register interface of the AST2600 + * System Control Unit (SCU) + * SRAM mapping + * X-DMA Controller (basic interface) + * Static Memory Controller (SMC or FMC) - Only SPI Flash support + * SPI Memory Controller + * USB 2.0 Controller + * SD/MMC storage controllers + * SDRAM controller (dummy interface for basic settings and training) + * Watchdog Controller + * GPIO Controller (Master only) + * UART + * Ethernet controllers + * Front LEDs (PCA9552 on I2C bus) + * LPC Peripheral Controller (a subset of subdevices are supported) + * Hash/Crypto Engine (HACE) - Hash support only. TODO: HMAC and RSA + * ADC + * Secure Boot Controller (AST2600) + * eMMC Boot Controller (dummy) + * PECI Controller (minimal) + * I3C Controller + + +Missing devices +--------------- + + * Coprocessor support + * PWM and Fan Controller + * Slave GPIO Controller + * Super I/O Controller + * PCI-Express 1 Controller + * Graphic Display Controller + * MCTP Controller + * Mailbox Controller + * Virtual UART + * eSPI Controller + +Boot options +------------ + +The Aspeed machines can be started using the ``-kernel`` and ``-dtb`` options +to load a Linux kernel or from a firmware. Images can be downloaded from the +OpenBMC jenkins : + + https://jenkins.openbmc.org/job/ci-openbmc/lastSuccessfulBuild/ + +or directly from the OpenBMC GitHub release repository : + + https://github.com/openbmc/openbmc/releases + +To boot a kernel directly from a Linux build tree: + +.. code-block:: bash + + $ qemu-system-arm -M ast2600-evb -nographic \ + -kernel arch/arm/boot/zImage \ + -dtb arch/arm/boot/dts/aspeed-ast2600-evb.dtb \ + -initrd rootfs.cpio + +The image should be attached as an MTD drive. Run : + +.. 
code-block:: bash + + $ qemu-system-arm -M romulus-bmc -nic user \ + -drive file=obmc-phosphor-image-romulus.static.mtd,format=raw,if=mtd -nographic + +Options specific to Aspeed machines are : + + * ``execute-in-place`` which emulates the boot from the CE0 flash + device by using the FMC controller to load the instructions, and + not simply from RAM. This takes a little longer. + + * ``fmc-model`` to change the FMC Flash model. FW needs support for + the chip model to boot. + + * ``spi-model`` to change the SPI Flash model. + +For instance, to start the ``ast2500-evb`` machine with a different +FMC chip and a bigger (64M) SPI chip, use : + +.. code-block:: bash + + -M ast2500-evb,fmc-model=mx25l25635e,spi-model=mx66u51235f + + +Aspeed minibmc family boards (``ast1030-evb``) +================================================================== + +The QEMU Aspeed machines model mini BMCs of various Aspeed evaluation +boards. They are based on different releases of the +Aspeed SoC : the AST1030 integrating an ARM Cortex M4F CPU (200MHz). + +The SoC comes with SRAM, SPI, I2C, etc. + +AST1030 SoC based machines : + +- ``ast1030-evb`` Aspeed AST1030 Evaluation board (Cortex-M4F) + +Supported devices +----------------- + + * SMP (for the AST1030 Cortex-M4F) + * Interrupt Controller (VIC) + * Timer Controller + * I2C Controller + * System Control Unit (SCU) + * SRAM mapping + * Static Memory Controller (SMC or FMC) - Only SPI Flash support + * SPI Memory Controller + * USB 2.0 Controller + * Watchdog Controller + * GPIO Controller (Master only) + * UART + * LPC Peripheral Controller (a subset of subdevices are supported) + * Hash/Crypto Engine (HACE) - Hash support only. TODO: HMAC and RSA + * ADC + * Secure Boot Controller + * PECI Controller (minimal) + + +Missing devices +--------------- + + * PWM and Fan Controller + * Slave GPIO Controller + * Mailbox Controller + * Virtual UART + * eSPI Controller + * I3C Controller + +Boot options +------------ + +The Aspeed machines can be started using the ``-kernel`` to load a +Zephyr OS or from a firmware. Images can be downloaded from the +ASPEED GitHub release repository : + + https://github.com/AspeedTech-BMC/zephyr/releases + +To boot a kernel directly from a Zephyr build tree: + +.. code-block:: bash + + $ qemu-system-arm -M ast1030-evb -nographic \ + -kernel zephyr.elf + +Facebook Yosemite v3.5 Platform and CraterLake Server (``fby35``) +================================================================== + +Facebook has a series of multi-node compute server designs named +Yosemite. The most recent version released was +`Yosemite v3 <https://www.opencompute.org/documents/ocp-yosemite-v3-platform-design-specification-1v16-pdf>`__. + +Yosemite v3.5 is an iteration on this design, and is very similar: there's a +baseboard with a BMC, and 4 server slots. The new server board design termed +"CraterLake" includes a Bridge IC (BIC), with room for expansion boards to +include various compute accelerators (video, inferencing, etc). At the moment, +only the first server slot's BIC is included. + +Yosemite v3.5 is itself a sled which fits into a 40U chassis, and 3 sleds +can be fit into a chassis. See `here <https://www.opencompute.org/products/423/wiwynn-yosemite-v3-server>`__ +for an example. + +In this generation, the BMC is an AST2600 and each BIC is an AST1030. The BMC +runs `OpenBMC <https://github.com/facebook/openbmc>`__, and the BIC runs +`OpenBIC <https://github.com/facebook/openbic>`__. 
+ +Firmware images can be retrieved from the Github releases or built from the +source code, see the README's for instructions on that. This image uses the +"fby35" machine recipe from OpenBMC, and the "yv35-cl" target from OpenBIC. +Some reference images can also be found here: + +.. code-block:: bash + + $ wget https://github.com/facebook/openbmc/releases/download/openbmc-e2294ff5d31d/fby35.mtd + $ wget https://github.com/peterdelevoryas/OpenBIC/releases/download/oby35-cl-2022.13.01/Y35BCL.elf + +Since this machine has multiple SoC's, each with their own serial console, the +recommended way to run it is to allocate a pseudoterminal for each serial +console and let the monitor use stdio. Also, starting in a paused state is +useful because it allows you to attach to the pseudoterminals before the boot +process starts. + +.. code-block:: bash + + $ qemu-system-arm -machine fby35 \ + -drive file=fby35.mtd,format=raw,if=mtd \ + -device loader,file=Y35BCL.elf,addr=0,cpu-num=2 \ + -serial pty -serial pty -serial mon:stdio \ + -display none -S + $ screen /dev/tty0 # In a separate TMUX pane, terminal window, etc. + $ screen /dev/tty1 + $ (qemu) c # Start the boot process once screen is setup. diff --git a/docs/system/arm/collie.rst b/docs/system/arm/collie.rst new file mode 100644 index 00000000..5cc67b6d --- /dev/null +++ b/docs/system/arm/collie.rst @@ -0,0 +1,16 @@ +Sharp Zaurus SL-5500 (``collie``) +================================= + +This machine is a model of the Sharp Zaurus SL-5500, which was +a 1990s PDA based on the StrongARM SA1110. + +Implemented devices: + + * NOR flash + * Interrupt controller + * Timer + * RTC + * GPIO + * Peripheral Pin Controller (PPC) + * UARTs + * Synchronous Serial Ports (SSP) diff --git a/docs/system/arm/cpu-features.rst b/docs/system/arm/cpu-features.rst new file mode 100644 index 00000000..00c44404 --- /dev/null +++ b/docs/system/arm/cpu-features.rst @@ -0,0 +1,445 @@ +Arm CPU Features +================ + +CPU features are optional features that a CPU of supporting type may +choose to implement or not. In QEMU, optional CPU features have +corresponding boolean CPU proprieties that, when enabled, indicate +that the feature is implemented, and, conversely, when disabled, +indicate that it is not implemented. An example of an Arm CPU feature +is the Performance Monitoring Unit (PMU). CPU types such as the +Cortex-A15 and the Cortex-A57, which respectively implement Arm +architecture reference manuals ARMv7-A and ARMv8-A, may both optionally +implement PMUs. For example, if a user wants to use a Cortex-A15 without +a PMU, then the ``-cpu`` parameter should contain ``pmu=off`` on the QEMU +command line, i.e. ``-cpu cortex-a15,pmu=off``. + +As not all CPU types support all optional CPU features, then whether or +not a CPU property exists depends on the CPU type. For example, CPUs +that implement the ARMv8-A architecture reference manual may optionally +support the AArch32 CPU feature, which may be enabled by disabling the +``aarch64`` CPU property. A CPU type such as the Cortex-A15, which does +not implement ARMv8-A, will not have the ``aarch64`` CPU property. + +QEMU's support may be limited for some CPU features, only partially +supporting the feature or only supporting the feature under certain +configurations. For example, the ``aarch64`` CPU feature, which, when +disabled, enables the optional AArch32 CPU feature, is only supported +when using the KVM accelerator and when running on a host CPU type that +supports the feature. 
While ``aarch64`` currently only works with KVM, +it could work with TCG. CPU features that are specific to KVM are +prefixed with "kvm-" and are described in "KVM VCPU Features". + +CPU Feature Probing +=================== + +Determining which CPU features are available and functional for a given +CPU type is possible with the ``query-cpu-model-expansion`` QMP command. +Below are some examples where ``scripts/qmp/qmp-shell`` (see the top comment +block in the script for usage) is used to issue the QMP commands. + +1. Determine which CPU features are available for the ``max`` CPU type + (Note, we started QEMU with qemu-system-aarch64, so ``max`` is + implementing the ARMv8-A reference manual in this case):: + + (QEMU) query-cpu-model-expansion type=full model={"name":"max"} + { "return": { + "model": { "name": "max", "props": { + "sve1664": true, "pmu": true, "sve1792": true, "sve1920": true, + "sve128": true, "aarch64": true, "sve1024": true, "sve": true, + "sve640": true, "sve768": true, "sve1408": true, "sve256": true, + "sve1152": true, "sve512": true, "sve384": true, "sve1536": true, + "sve896": true, "sve1280": true, "sve2048": true + }}}} + +We see that the ``max`` CPU type has the ``pmu``, ``aarch64``, ``sve``, and many +``sve<N>`` CPU features. We also see that all the CPU features are +enabled, as they are all ``true``. (The ``sve<N>`` CPU features are all +optional SVE vector lengths (see "SVE CPU Properties"). While with TCG +all SVE vector lengths can be supported, when KVM is in use it's more +likely that only a few lengths will be supported, if SVE is supported at +all.) + +(2) Let's try to disable the PMU:: + + (QEMU) query-cpu-model-expansion type=full model={"name":"max","props":{"pmu":false}} + { "return": { + "model": { "name": "max", "props": { + "sve1664": true, "pmu": false, "sve1792": true, "sve1920": true, + "sve128": true, "aarch64": true, "sve1024": true, "sve": true, + "sve640": true, "sve768": true, "sve1408": true, "sve256": true, + "sve1152": true, "sve512": true, "sve384": true, "sve1536": true, + "sve896": true, "sve1280": true, "sve2048": true + }}}} + +We see it worked, as ``pmu`` is now ``false``. + +(3) Let's try to disable ``aarch64``, which enables the AArch32 CPU feature:: + + (QEMU) query-cpu-model-expansion type=full model={"name":"max","props":{"aarch64":false}} + {"error": { + "class": "GenericError", "desc": + "'aarch64' feature cannot be disabled unless KVM is enabled and 32-bit EL1 is supported" + }} + +It looks like this feature is limited to a configuration we do not +currently have. + +(4) Let's disable ``sve`` and see what happens to all the optional SVE + vector lengths:: + + (QEMU) query-cpu-model-expansion type=full model={"name":"max","props":{"sve":false}} + { "return": { + "model": { "name": "max", "props": { + "sve1664": false, "pmu": true, "sve1792": false, "sve1920": false, + "sve128": false, "aarch64": true, "sve1024": false, "sve": false, + "sve640": false, "sve768": false, "sve1408": false, "sve256": false, + "sve1152": false, "sve512": false, "sve384": false, "sve1536": false, + "sve896": false, "sve1280": false, "sve2048": false + }}}} + +As expected they are now all ``false``. + +(5) Let's try probing CPU features for the Cortex-A15 CPU type:: + + (QEMU) query-cpu-model-expansion type=full model={"name":"cortex-a15"} + {"return": {"model": {"name": "cortex-a15", "props": {"pmu": true}}}} + +Only the ``pmu`` CPU feature is available. 
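+
+The examples above use ``scripts/qmp/qmp-shell``, but nothing about the
+probe depends on it. As a minimal sketch (assuming only that a QMP monitor
+is attached to stdio with ``-qmp stdio``; the ``qmp_capabilities`` command
+must be issued before any other command), the same expansion can be
+requested with raw QMP JSON::
+
+  $ qemu-system-aarch64 -M virt -cpu max -display none -S -qmp stdio
+  {"execute": "qmp_capabilities"}
+  {"execute": "query-cpu-model-expansion",
+   "arguments": {"type": "full", "model": {"name": "max"}}}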
+ +A note about CPU feature dependencies +------------------------------------- + +It's possible for features to have dependencies on other features. I.e. +it may be possible to change one feature at a time without error, but +when attempting to change all features at once an error could occur +depending on the order they are processed. It's also possible changing +all at once doesn't generate an error, because a feature's dependencies +are satisfied with other features, but the same feature cannot be changed +independently without error. For these reasons callers should always +attempt to make their desired changes all at once in order to ensure the +collection is valid. + +A note about CPU models and KVM +------------------------------- + +Named CPU models generally do not work with KVM. There are a few cases +that do work, e.g. using the named CPU model ``cortex-a57`` with KVM on a +seattle host, but mostly if KVM is enabled the ``host`` CPU type must be +used. This means the guest is provided all the same CPU features as the +host CPU type has. And, for this reason, the ``host`` CPU type should +enable all CPU features that the host has by default. Indeed it's even +a bit strange to allow disabling CPU features that the host has when using +the ``host`` CPU type, but in the absence of CPU models it's the best we can +do if we want to launch guests without all the host's CPU features enabled. + +Enabling KVM also affects the ``query-cpu-model-expansion`` QMP command. The +affect is not only limited to specific features, as pointed out in example +(3) of "CPU Feature Probing", but also to which CPU types may be expanded. +When KVM is enabled, only the ``max``, ``host``, and current CPU type may be +expanded. This restriction is necessary as it's not possible to know all +CPU types that may work with KVM, but it does impose a small risk of users +experiencing unexpected errors. For example on a seattle, as mentioned +above, the ``cortex-a57`` CPU type is also valid when KVM is enabled. +Therefore a user could use the ``host`` CPU type for the current type, but +then attempt to query ``cortex-a57``, however that query will fail with our +restrictions. This shouldn't be an issue though as management layers and +users have been preferring the ``host`` CPU type for use with KVM for quite +some time. Additionally, if the KVM-enabled QEMU instance running on a +seattle host is using the ``cortex-a57`` CPU type, then querying ``cortex-a57`` +will work. + +Using CPU Features +================== + +After determining which CPU features are available and supported for a +given CPU type, then they may be selectively enabled or disabled on the +QEMU command line with that CPU type:: + + $ qemu-system-aarch64 -M virt -cpu max,pmu=off,sve=on,sve128=on,sve256=on + +The example above disables the PMU and enables the first two SVE vector +lengths for the ``max`` CPU type. Note, the ``sve=on`` isn't actually +necessary, because, as we observed above with our probe of the ``max`` CPU +type, ``sve`` is already on by default. Also, based on our probe of +defaults, it would seem we need to disable many SVE vector lengths, rather +than only enabling the two we want. This isn't the case, because, as +disabling many SVE vector lengths would be quite verbose, the ``sve<N>`` CPU +properties have special semantics (see "SVE CPU Property Parsing +Semantics"). 
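+
+A command line of this form can also be checked before launching a guest by
+feeding the same properties back into ``query-cpu-model-expansion``. As a
+sketch, reusing the ``qmp-shell`` session from "CPU Feature Probing" (per the
+parsing semantics described later, the returned ``props`` should show ``pmu``
+disabled and only ``sve128`` and ``sve256`` enabled among the vector
+lengths)::
+
+  (QEMU) query-cpu-model-expansion type=full model={"name":"max","props":{"pmu":false,"sve128":true,"sve256":true}}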
+ +KVM VCPU Features +================= + +KVM VCPU features are CPU features that are specific to KVM, such as +paravirt features or features that enable CPU virtualization extensions. +The features' CPU properties are only available when KVM is enabled and +are named with the prefix "kvm-". KVM VCPU features may be probed, +enabled, and disabled in the same way as other CPU features. Below is +the list of KVM VCPU features and their descriptions. + + kvm-no-adjvtime By default kvm-no-adjvtime is disabled. This + means that by default the virtual time + adjustment is enabled (vtime is not *not* + adjusted). + + When virtual time adjustment is enabled each + time the VM transitions back to running state + the VCPU's virtual counter is updated to ensure + stopped time is not counted. This avoids time + jumps surprising guest OSes and applications, + as long as they use the virtual counter for + timekeeping. However it has the side effect of + the virtual and physical counters diverging. + All timekeeping based on the virtual counter + will appear to lag behind any timekeeping that + does not subtract VM stopped time. The guest + may resynchronize its virtual counter with + other time sources as needed. + + Enable kvm-no-adjvtime to disable virtual time + adjustment, also restoring the legacy (pre-5.0) + behavior. + + kvm-steal-time Since v5.2, kvm-steal-time is enabled by + default when KVM is enabled, the feature is + supported, and the guest is 64-bit. + + When kvm-steal-time is enabled a 64-bit guest + can account for time its CPUs were not running + due to the host not scheduling the corresponding + VCPU threads. The accounting statistics may + influence the guest scheduler behavior and/or be + exposed to the guest userspace. + +TCG VCPU Features +================= + +TCG VCPU features are CPU features that are specific to TCG. +Below is the list of TCG VCPU features and their descriptions. + + pauth-impdef When ``FEAT_Pauth`` is enabled, either the + *impdef* (Implementation Defined) algorithm + is enabled or the *architected* QARMA algorithm + is enabled. By default the impdef algorithm + is disabled, and QARMA is enabled. + + The architected QARMA algorithm has good + cryptographic properties, but can be quite slow + to emulate. The impdef algorithm used by QEMU + is non-cryptographic but significantly faster. + +SVE CPU Properties +================== + +There are two types of SVE CPU properties: ``sve`` and ``sve<N>``. The first +is used to enable or disable the entire SVE feature, just as the ``pmu`` +CPU property completely enables or disables the PMU. The second type +is used to enable or disable specific vector lengths, where ``N`` is the +number of bits of the length. The ``sve<N>`` CPU properties have special +dependencies and constraints, see "SVE CPU Property Dependencies and +Constraints" below. Additionally, as we want all supported vector lengths +to be enabled by default, then, in order to avoid overly verbose command +lines (command lines full of ``sve<N>=off``, for all ``N`` not wanted), we +provide the parsing semantics listed in "SVE CPU Property Parsing +Semantics". + +SVE CPU Property Dependencies and Constraints +--------------------------------------------- + + 1) At least one vector length must be enabled when ``sve`` is enabled. + + 2) If a vector length ``N`` is enabled, then, when KVM is enabled, all + smaller, host supported vector lengths must also be enabled. If + KVM is not enabled, then only all the smaller, power-of-two vector + lengths must be enabled. 
E.g. with KVM if the host supports all + vector lengths up to 512-bits (128, 256, 384, 512), then if ``sve512`` + is enabled, the 128-bit vector length, 256-bit vector length, and + 384-bit vector length must also be enabled. Without KVM, the 384-bit + vector length would not be required. + + 3) If KVM is enabled then only vector lengths that the host CPU type + support may be enabled. If SVE is not supported by the host, then + no ``sve*`` properties may be enabled. + +SVE CPU Property Parsing Semantics +---------------------------------- + + 1) If SVE is disabled (``sve=off``), then which SVE vector lengths + are enabled or disabled is irrelevant to the guest, as the entire + SVE feature is disabled and that disables all vector lengths for + the guest. However QEMU will still track any ``sve<N>`` CPU + properties provided by the user. If later an ``sve=on`` is provided, + then the guest will get only the enabled lengths. If no ``sve=on`` + is provided and there are explicitly enabled vector lengths, then + an error is generated. + + 2) If SVE is enabled (``sve=on``), but no ``sve<N>`` CPU properties are + provided, then all supported vector lengths are enabled, which when + KVM is not in use means including the non-power-of-two lengths, and, + when KVM is in use, it means all vector lengths supported by the host + processor. + + 3) If SVE is enabled, then an error is generated when attempting to + disable the last enabled vector length (see constraint (1) of "SVE + CPU Property Dependencies and Constraints"). + + 4) If one or more vector lengths have been explicitly enabled and at + least one of the dependency lengths of the maximum enabled length + has been explicitly disabled, then an error is generated (see + constraint (2) of "SVE CPU Property Dependencies and Constraints"). + + 5) When KVM is enabled, if the host does not support SVE, then an error + is generated when attempting to enable any ``sve*`` properties (see + constraint (3) of "SVE CPU Property Dependencies and Constraints"). + + 6) When KVM is enabled, if the host does support SVE, then an error is + generated when attempting to enable any vector lengths not supported + by the host (see constraint (3) of "SVE CPU Property Dependencies and + Constraints"). + + 7) If one or more ``sve<N>`` CPU properties are set ``off``, but no ``sve<N>``, + CPU properties are set ``on``, then the specified vector lengths are + disabled but the default for any unspecified lengths remains enabled. + When KVM is not enabled, disabling a power-of-two vector length also + disables all vector lengths larger than the power-of-two length. + When KVM is enabled, then disabling any supported vector length also + disables all larger vector lengths (see constraint (2) of "SVE CPU + Property Dependencies and Constraints"). + + 8) If one or more ``sve<N>`` CPU properties are set to ``on``, then they + are enabled and all unspecified lengths default to disabled, except + for the required lengths per constraint (2) of "SVE CPU Property + Dependencies and Constraints", which will even be auto-enabled if + they were not explicitly enabled. + + 9) If SVE was disabled (``sve=off``), allowing all vector lengths to be + explicitly disabled (i.e. avoiding the error specified in (3) of + "SVE CPU Property Parsing Semantics"), then if later an ``sve=on`` is + provided an error will be generated. To avoid this error, one must + enable at least one vector length prior to enabling SVE. 
+
+SVE CPU Property Examples
+-------------------------
+
+  1) Disable SVE::
+
+     $ qemu-system-aarch64 -M virt -cpu max,sve=off
+
+  2) Implicitly enable all vector lengths for the ``max`` CPU type::
+
+     $ qemu-system-aarch64 -M virt -cpu max
+
+  3) When KVM is enabled, implicitly enable all host CPU supported vector
+     lengths with the ``host`` CPU type::
+
+     $ qemu-system-aarch64 -M virt,accel=kvm -cpu host
+
+  4) Only enable the 128-bit vector length::
+
+     $ qemu-system-aarch64 -M virt -cpu max,sve128=on
+
+  5) Disable the 512-bit vector length and all larger vector lengths,
+     since 512 is a power-of-two. This results in all the smaller,
+     uninitialized lengths (128, 256, and 384) defaulting to enabled::
+
+     $ qemu-system-aarch64 -M virt -cpu max,sve512=off
+
+  6) Enable the 128-bit, 256-bit, and 512-bit vector lengths::
+
+     $ qemu-system-aarch64 -M virt -cpu max,sve128=on,sve256=on,sve512=on
+
+  7) The same as (6), but since the 128-bit and 256-bit vector
+     lengths are required for the 512-bit vector length to be enabled,
+     then allow them to be auto-enabled::
+
+     $ qemu-system-aarch64 -M virt -cpu max,sve512=on
+
+  8) Do the same as (7), but by first disabling SVE and then re-enabling it::
+
+     $ qemu-system-aarch64 -M virt -cpu max,sve=off,sve512=on,sve=on
+
+  9) Force errors regarding the last vector length::
+
+     $ qemu-system-aarch64 -M virt -cpu max,sve128=off
+     $ qemu-system-aarch64 -M virt -cpu max,sve=off,sve128=off,sve=on
+
+SVE CPU Property Recommendations
+--------------------------------
+
+The examples in "SVE CPU Property Examples" exhibit many ways to select
+vector lengths which developers may find useful in order to avoid overly
+verbose command lines. However, the recommended way to select vector
+lengths is to explicitly enable each desired length. Therefore only
+examples (1), (4), and (6) exhibit recommended uses of the properties.
+
+SME CPU Property Examples
+-------------------------
+
+  1) Disable SME::
+
+     $ qemu-system-aarch64 -M virt -cpu max,sme=off
+
+  2) Implicitly enable all vector lengths for the ``max`` CPU type::
+
+     $ qemu-system-aarch64 -M virt -cpu max
+
+  3) Only enable the 256-bit vector length::
+
+     $ qemu-system-aarch64 -M virt -cpu max,sme256=on
+
+  4) Enable the 256-bit and 1024-bit vector lengths::
+
+     $ qemu-system-aarch64 -M virt -cpu max,sme256=on,sme1024=on
+
+  5) Disable the 512-bit vector length. This results in all the other
+     lengths supported by ``max`` defaulting to enabled
+     (128, 256, 1024 and 2048)::
+
+     $ qemu-system-aarch64 -M virt -cpu max,sme512=off
+
+SVE User-mode Default Vector Length Property
+--------------------------------------------
+
+For qemu-aarch64, the cpu property ``sve-default-vector-length=N`` is
+defined to mirror the Linux kernel parameter file
+``/proc/sys/abi/sve_default_vector_length``. The default length, ``N``,
+is in units of bytes and must be between 16 and 8192.
+If not specified, the default vector length is 64.
+
+If the default length is larger than the maximum vector length enabled,
+the actual vector length will be reduced. Note that the maximum vector
+length supported by QEMU is 256.
+
+If this property is set to ``-1`` then the default vector length
+is set to the maximum possible length.
+
+SME CPU Properties
+==================
+
+The SME CPU properties are much like the SVE properties: ``sme`` is
+used to enable or disable the entire SME feature, and ``sme<N>`` is
+used to enable or disable specific vector lengths.
Finally, +``sme_fa64`` is used to enable or disable ``FEAT_SME_FA64``, which +allows execution of the "full a64" instruction set while Streaming +SVE mode is enabled. + +SME is not supported by KVM at this time. + +At least one vector length must be enabled when ``sme`` is enabled, +and all vector lengths must be powers of 2. The maximum vector +length supported by qemu is 2048 bits. Otherwise, there are no +additional constraints on the set of vector lengths supported by SME. + +SME User-mode Default Vector Length Property +-------------------------------------------- + +For qemu-aarch64, the cpu property ``sme-default-vector-length=N`` is +defined to mirror the Linux kernel parameter file +``/proc/sys/abi/sme_default_vector_length``. The default length, ``N``, +is in units of bytes and must be between 16 and 8192. +If not specified, the default vector length is 32. + +As with ``sve-default-vector-length``, if the default length is larger +than the maximum vector length enabled, the actual vector length will +be reduced. If this property is set to ``-1`` then the default vector +length is set to the maximum possible length. diff --git a/docs/system/arm/cubieboard.rst b/docs/system/arm/cubieboard.rst new file mode 100644 index 00000000..344ff8ce --- /dev/null +++ b/docs/system/arm/cubieboard.rst @@ -0,0 +1,16 @@ +Cubietech Cubieboard (``cubieboard``) +===================================== + +The ``cubieboard`` model emulates the Cubietech Cubieboard, +which is a Cortex-A8 based single-board computer using +the AllWinner A10 SoC. + +Emulated devices: + +- Timer +- UART +- RTC +- EMAC +- SDHCI +- USB controller +- SATA controller diff --git a/docs/system/arm/digic.rst b/docs/system/arm/digic.rst new file mode 100644 index 00000000..2b3520ff --- /dev/null +++ b/docs/system/arm/digic.rst @@ -0,0 +1,11 @@ +Canon A1100 (``canon-a1100``) +============================= + +This machine is a model of the Canon PowerShot A1100 camera, which +uses the DIGIC SoC. This model is based on reverse engineering efforts +by the contributors to the `CHDK <http://chdk.wikia.com/>`_ and +`Magic Lantern <http://www.magiclantern.fm/>`_ projects. + +The emulation is incomplete. In particular it can't be used +to run the original camera firmware, but it can successfully run +an experimental version of the `barebox bootloader <http://www.barebox.org/>`_. diff --git a/docs/system/arm/emcraft-sf2.rst b/docs/system/arm/emcraft-sf2.rst new file mode 100644 index 00000000..377e2487 --- /dev/null +++ b/docs/system/arm/emcraft-sf2.rst @@ -0,0 +1,15 @@ +Emcraft SmartFusion2 SOM kit (``emcraft-sf2``) +============================================== + +The ``emcraft-sf2`` board emulates the SmartFusion2 SOM kit from +Emcraft (M2S010). This is a System-on-Module from EmCraft systems, +based on the SmartFusion2 SoC FPGA from Microsemi Corporation. +The SoC is based on a Cortex-M4 processor. + +Emulated devices: + +- System timer +- System registers +- SPI controller +- UART +- EMAC diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst new file mode 100644 index 00000000..e3af79bb --- /dev/null +++ b/docs/system/arm/emulation.rst @@ -0,0 +1,132 @@ +A-profile CPU architecture support +================================== + +QEMU's TCG emulation includes support for the Armv5, Armv6, Armv7 and +Armv8 versions of the A-profile architecture. 
It also has support for +the following architecture extensions: + +- FEAT_AA32BF16 (AArch32 BFloat16 instructions) +- FEAT_AA32HPD (AArch32 hierarchical permission disables) +- FEAT_AA32I8MM (AArch32 Int8 matrix multiplication instructions) +- FEAT_AES (AESD and AESE instructions) +- FEAT_BBM at level 2 (Translation table break-before-make levels) +- FEAT_BF16 (AArch64 BFloat16 instructions) +- FEAT_BTI (Branch Target Identification) +- FEAT_CSV2 (Cache speculation variant 2) +- FEAT_CSV2_1p1 (Cache speculation variant 2, version 1.1) +- FEAT_CSV2_1p2 (Cache speculation variant 2, version 1.2) +- FEAT_CSV2_2 (Cache speculation variant 2, version 2) +- FEAT_CSV3 (Cache speculation variant 3) +- FEAT_DGH (Data gathering hint) +- FEAT_DIT (Data Independent Timing instructions) +- FEAT_DPB (DC CVAP instruction) +- FEAT_Debugv8p2 (Debug changes for v8.2) +- FEAT_Debugv8p4 (Debug changes for v8.4) +- FEAT_DotProd (Advanced SIMD dot product instructions) +- FEAT_DoubleFault (Double Fault Extension) +- FEAT_E0PD (Preventing EL0 access to halves of address maps) +- FEAT_ETS (Enhanced Translation Synchronization) +- FEAT_FCMA (Floating-point complex number instructions) +- FEAT_FHM (Floating-point half-precision multiplication instructions) +- FEAT_FP16 (Half-precision floating-point data processing) +- FEAT_FRINTTS (Floating-point to integer instructions) +- FEAT_FlagM (Flag manipulation instructions v2) +- FEAT_FlagM2 (Enhancements to flag manipulation instructions) +- FEAT_GTG (Guest translation granule size) +- FEAT_HAFDBS (Hardware management of the access flag and dirty bit state) +- FEAT_HCX (Support for the HCRX_EL2 register) +- FEAT_HPDS (Hierarchical permission disables) +- FEAT_I8MM (AArch64 Int8 matrix multiplication instructions) +- FEAT_IDST (ID space trap handling) +- FEAT_IESB (Implicit error synchronization event) +- FEAT_JSCVT (JavaScript conversion instructions) +- FEAT_LOR (Limited ordering regions) +- FEAT_LPA (Large Physical Address space) +- FEAT_LPA2 (Large Physical and virtual Address space v2) +- FEAT_LRCPC (Load-acquire RCpc instructions) +- FEAT_LRCPC2 (Load-acquire RCpc instructions v2) +- FEAT_LSE (Large System Extensions) +- FEAT_LVA (Large Virtual Address space) +- FEAT_MTE (Memory Tagging Extension) +- FEAT_MTE2 (Memory Tagging Extension) +- FEAT_MTE3 (MTE Asymmetric Fault Handling) +- FEAT_PAN (Privileged access never) +- FEAT_PAN2 (AT S1E1R and AT S1E1W instruction variants affected by PSTATE.PAN) +- FEAT_PAuth (Pointer authentication) +- FEAT_PMULL (PMULL, PMULL2 instructions) +- FEAT_PMUv3p1 (PMU Extensions v3.1) +- FEAT_PMUv3p4 (PMU Extensions v3.4) +- FEAT_PMUv3p5 (PMU Extensions v3.5) +- FEAT_RAS (Reliability, availability, and serviceability) +- FEAT_RASv1p1 (RAS Extension v1.1) +- FEAT_RDM (Advanced SIMD rounding double multiply accumulate instructions) +- FEAT_RNG (Random number generator) +- FEAT_S2FWB (Stage 2 forced Write-Back) +- FEAT_SB (Speculation Barrier) +- FEAT_SEL2 (Secure EL2) +- FEAT_SHA1 (SHA1 instructions) +- FEAT_SHA256 (SHA256 instructions) +- FEAT_SHA3 (Advanced SIMD SHA3 instructions) +- FEAT_SHA512 (Advanced SIMD SHA512 instructions) +- FEAT_SM3 (Advanced SIMD SM3 instructions) +- FEAT_SM4 (Advanced SIMD SM4 instructions) +- FEAT_SME (Scalable Matrix Extension) +- FEAT_SME_FA64 (Full A64 instruction set in Streaming SVE mode) +- FEAT_SME_F64F64 (Double-precision floating-point outer product instructions) +- FEAT_SME_I16I64 (16-bit to 64-bit integer widening outer product instructions) +- FEAT_SPECRES (Speculation restriction 
instructions) +- FEAT_SSBS (Speculative Store Bypass Safe) +- FEAT_TLBIOS (TLB invalidate instructions in Outer Shareable domain) +- FEAT_TLBIRANGE (TLB invalidate range instructions) +- FEAT_TTCNP (Translation table Common not private translations) +- FEAT_TTL (Translation Table Level) +- FEAT_TTST (Small translation tables) +- FEAT_UAO (Unprivileged Access Override control) +- FEAT_VHE (Virtualization Host Extensions) +- FEAT_VMID16 (16-bit VMID) +- FEAT_XNX (Translation table stage 2 Unprivileged Execute-never) +- SVE (The Scalable Vector Extension) +- SVE2 (The Scalable Vector Extension v2) + +For information on the specifics of these extensions, please refer +to the `Armv8-A Arm Architecture Reference Manual +<https://developer.arm.com/documentation/ddi0487/latest>`_. + +When a specific named CPU is being emulated, only those features which +are present in hardware for that CPU are emulated. (If a feature is +not in the list above then it is not supported, even if the real +hardware should have it.) The ``max`` CPU enables all features. + +R-profile CPU architecture support +================================== + +QEMU's TCG emulation support for R-profile CPUs is currently limited. +We emulate only the Cortex-R5 and Cortex-R5F CPUs. + +M-profile CPU architecture support +================================== + +QEMU's TCG emulation includes support for Armv6-M, Armv7-M, Armv8-M, and +Armv8.1-M versions of the M-profile architucture. It also has support +for the following architecture extensions: + +- FP (Floating-point Extension) +- FPCXT (FPCXT access instructions) +- HP (Half-precision floating-point instructions) +- LOB (Low Overhead loops and Branch future) +- M (Main Extension) +- MPU (Memory Protection Unit Extension) +- PXN (Privileged Execute Never) +- RAS (Reliability, Serviceability and Availability): "minimum RAS Extension" only +- S (Security Extension) +- ST (System Timer Extension) + +For information on the specifics of these extensions, please refer +to the `Armv8-M Arm Architecture Reference Manual +<https://developer.arm.com/documentation/ddi0553/latest>`_. + +When a specific named CPU is being emulated, only those features which +are present in hardware for that CPU are emulated. (If a feature is +not in the list above then it is not supported, even if the real +hardware should have it.) There is no equivalent of the ``max`` CPU for +M-profile. diff --git a/docs/system/arm/gumstix.rst b/docs/system/arm/gumstix.rst new file mode 100644 index 00000000..cb373139 --- /dev/null +++ b/docs/system/arm/gumstix.rst @@ -0,0 +1,21 @@ +Gumstix Connex and Verdex (``connex``, ``verdex``) +================================================== + +These machines model the Gumstix Connex and Verdex boards. +The Connex has a PXA255 CPU and the Verdex has a PXA270. + +Implemented devices: + + * NOR flash + * SMC91C111 ethernet + * Interrupt controller + * DMA + * Timer + * GPIO + * MMC/SD card + * Fast infra-red communications port (FIR) + * LCD controller + * Synchronous serial ports (SPI) + * PCMCIA interface + * I2C + * I2S diff --git a/docs/system/arm/highbank.rst b/docs/system/arm/highbank.rst new file mode 100644 index 00000000..bb4965b3 --- /dev/null +++ b/docs/system/arm/highbank.rst @@ -0,0 +1,19 @@ +Calxeda Highbank and Midway (``highbank``, ``midway``) +====================================================== + +``highbank`` is a model of the Calxeda Highbank (ECX-1000) system, +which has four Cortex-A9 cores. 
+ +``midway`` is a model of the Calxeda Midway (ECX-2000) system, +which has four Cortex-A15 cores. + +Emulated devices: + +- L2x0 cache controller +- SP804 dual timer +- PL011 UART +- PL061 GPIOs +- PL031 RTC +- PL022 synchronous serial port controller +- AHCI +- XGMAC ethernet controllers diff --git a/docs/system/arm/imx25-pdk.rst b/docs/system/arm/imx25-pdk.rst new file mode 100644 index 00000000..2a9711e8 --- /dev/null +++ b/docs/system/arm/imx25-pdk.rst @@ -0,0 +1,19 @@ +NXP i.MX25 PDK board (``imx25-pdk``) +==================================== + +The ``imx25-pdk`` board emulates the NXP i.MX25 Product Development Kit +board, which is based on an i.MX25 SoC which uses an ARM926 CPU. + +Emulated devices: + +- SD controller +- AVIC +- CCM +- GPT +- EPIT timers +- FEC +- RNGC +- I2C +- GPIO controllers +- Watchdog timer +- USB controllers diff --git a/docs/system/arm/integratorcp.rst b/docs/system/arm/integratorcp.rst new file mode 100644 index 00000000..59443800 --- /dev/null +++ b/docs/system/arm/integratorcp.rst @@ -0,0 +1,16 @@ +Arm Integrator/CP (``integratorcp``) +==================================== + +The Arm Integrator/CP board is emulated with the following devices: + +- ARM926E, ARM1026E, ARM946E, ARM1136 or Cortex-A8 CPU + +- Two PL011 UARTs + +- SMC 91c111 Ethernet adapter + +- PL110 LCD controller + +- PL050 KMI with PS/2 keyboard and mouse. + +- PL181 MultiMedia Card Interface with SD card. diff --git a/docs/system/arm/kzm.rst b/docs/system/arm/kzm.rst new file mode 100644 index 00000000..bb018fbd --- /dev/null +++ b/docs/system/arm/kzm.rst @@ -0,0 +1,18 @@ +Kyoto Microcomputer KZM-ARM11-01 (``kzm``) +========================================== + +The ``kzm`` board emulates the Kyoto Microcomputer KZM-ARM11-01 +evaluation board, which is based on an NXP i.MX32 SoC +which uses an ARM1136 CPU. + +Emulated devices: + +- UARTs +- LAN9118 ethernet +- AVIC +- CCM +- GPT +- EPIT timers +- I2C +- GPIO controllers +- Watchdog timer diff --git a/docs/system/arm/mainstone.rst b/docs/system/arm/mainstone.rst new file mode 100644 index 00000000..05310f42 --- /dev/null +++ b/docs/system/arm/mainstone.rst @@ -0,0 +1,25 @@ +Intel Mainstone II board (``mainstone``) +======================================== + +The ``mainstone`` board emulates the Intel Mainstone II development +board, which uses a PXA270 CPU. + +Emulated devices: + +- Flash memory +- Keypad +- MMC controller +- 91C111 ethernet +- PIC +- Timer +- DMA +- GPIO +- FIR +- Serial +- LCD controller +- SSP +- USB controller +- RTC +- PCMCIA +- I2C +- I2S diff --git a/docs/system/arm/mps2.rst b/docs/system/arm/mps2.rst new file mode 100644 index 00000000..8a75beb3 --- /dev/null +++ b/docs/system/arm/mps2.rst @@ -0,0 +1,57 @@ +Arm MPS2 and MPS3 boards (``mps2-an385``, ``mps2-an386``, ``mps2-an500``, ``mps2-an505``, ``mps2-an511``, ``mps2-an521``, ``mps3-an524``, ``mps3-an547``) +========================================================================================================================================================= + +These board models all use Arm M-profile CPUs. + +The Arm MPS2, MPS2+ and MPS3 dev boards are FPGA based (the 2+ has a +bigger FPGA but is otherwise the same as the 2; the 3 has a bigger +FPGA again, can handle 4GB of RAM and has a USB controller and QSPI flash). + +Since the CPU itself and most of the devices are in the FPGA, the +details of the board as seen by the guest depend significantly on the +FPGA image. 
+ +QEMU models the following FPGA images: + +``mps2-an385`` + Cortex-M3 as documented in Arm Application Note AN385 +``mps2-an386`` + Cortex-M4 as documented in Arm Application Note AN386 +``mps2-an500`` + Cortex-M7 as documented in Arm Application Note AN500 +``mps2-an505`` + Cortex-M33 as documented in Arm Application Note AN505 +``mps2-an511`` + Cortex-M3 'DesignStart' as documented in Arm Application Note AN511 +``mps2-an521`` + Dual Cortex-M33 as documented in Arm Application Note AN521 +``mps3-an524`` + Dual Cortex-M33 on an MPS3, as documented in Arm Application Note AN524 +``mps3-an547`` + Cortex-M55 on an MPS3, as documented in Arm Application Note AN547 + +Differences between QEMU and real hardware: + +- AN385/AN386 remapping of low 16K of memory to either ZBT SSRAM1 or to + block RAM is unimplemented (QEMU always maps this to ZBT SSRAM1, as + if zbt_boot_ctrl is always zero) +- AN524 remapping of low memory to either BRAM or to QSPI flash is + unimplemented (QEMU always maps this to BRAM, ignoring the + SCC CFG_REG0 memory-remap bit) +- QEMU provides a LAN9118 ethernet rather than LAN9220; the only guest + visible difference is that the LAN9118 doesn't support checksum + offloading +- QEMU does not model the QSPI flash in MPS3 boards as real QSPI + flash, but only as simple ROM, so attempting to rewrite the flash + from the guest will fail +- QEMU does not model the USB controller in MPS3 boards + +Machine-specific options +"""""""""""""""""""""""" + +The following machine-specific options are supported: + +remap + Supported for ``mps3-an524`` only. + Set ``BRAM``/``QSPI`` to select the initial memory mapping. The + default is ``BRAM``. diff --git a/docs/system/arm/musca.rst b/docs/system/arm/musca.rst new file mode 100644 index 00000000..81e3dc92 --- /dev/null +++ b/docs/system/arm/musca.rst @@ -0,0 +1,31 @@ +Arm Musca boards (``musca-a``, ``musca-b1``) +============================================ + +The Arm Musca development boards are a reference implementation +of a system using the SSE-200 Subsystem for Embedded. They are +dual Cortex-M33 systems. + +QEMU provides models of the A and B1 variants of this board. + +Unimplemented devices: + +- SPI +- |I2C| +- |I2S| +- PWM +- QSPI +- Timer +- SCC +- GPIO +- eFlash +- MHU +- PVT +- SDIO +- CryptoCell + +Note that (like the real hardware) the Musca-A machine is +asymmetric: CPU 0 does not have the FPU or DSP extensions, +but CPU 1 does. Also like the real hardware, the memory maps +for the A and B1 variants differ significantly, so guest +software must be built for the right variant. + diff --git a/docs/system/arm/musicpal.rst b/docs/system/arm/musicpal.rst new file mode 100644 index 00000000..9de380ed --- /dev/null +++ b/docs/system/arm/musicpal.rst @@ -0,0 +1,19 @@ +Freecom MusicPal (``musicpal``) +=============================== + +The Freecom MusicPal internet radio emulation includes the following +elements: + +- Marvell MV88W8618 Arm core. + +- 32 MB RAM, 256 KB SRAM, 8 MB flash. 
+ +- Up to 2 16550 UARTs + +- MV88W8xx8 Ethernet controller + +- MV88W8618 audio controller, WM8750 CODEC and mixer + +- 128x64 display with brightness control + +- 2 buttons, 2 navigation wheels with button function diff --git a/docs/system/arm/nrf.rst b/docs/system/arm/nrf.rst new file mode 100644 index 00000000..eda87bd7 --- /dev/null +++ b/docs/system/arm/nrf.rst @@ -0,0 +1,51 @@ +Nordic nRF boards (``microbit``) +================================ + +The `Nordic nRF`_ chips are a family of ARM-based System-on-Chip that +are designed to be used for low-power and short-range wireless solutions. + +.. _Nordic nRF: https://www.nordicsemi.com/Products + +The nRF51 series is the first series for short range wireless applications. +It is superseded by the nRF52 series. +The following machines are based on this chip : + +- ``microbit`` BBC micro:bit board with nRF51822 SoC + +There are other series such as nRF52, nRF53 and nRF91 which are currently not +supported by QEMU. + +Supported devices +----------------- + + * ARM Cortex-M0 (ARMv6-M) + * Serial ports (UART) + * Clock controller + * Timers + * Random Number Generator (RNG) + * GPIO controller + * NVMC + * SWI + +Missing devices +--------------- + + * Watchdog + * Real-Time Clock (RTC) controller + * TWI (i2c) + * SPI controller + * Analog to Digital Converter (ADC) + * Quadrature decoder + * Radio + +Boot options +------------ + +The Micro:bit machine can be started using the ``-device`` option to load a +firmware in `ihex format`_. Example: + +.. _ihex format: https://en.wikipedia.org/wiki/Intel_HEX + +.. code-block:: bash + + $ qemu-system-arm -M microbit -device loader,file=test.hex diff --git a/docs/system/arm/nseries.rst b/docs/system/arm/nseries.rst new file mode 100644 index 00000000..cd9edf5d --- /dev/null +++ b/docs/system/arm/nseries.rst @@ -0,0 +1,33 @@ +Nokia N800 and N810 tablets (``n800``, ``n810``) +================================================ + +Nokia N800 and N810 internet tablets (known also as RX-34 and RX-44 / +48) emulation supports the following elements: + +- Texas Instruments OMAP2420 System-on-chip (ARM1136 core) + +- RAM and non-volatile OneNAND Flash memories + +- Display connected to EPSON remote framebuffer chip and OMAP on-chip + display controller and a LS041y3 MIPI DBI-C controller + +- TI TSC2301 (in N800) and TI TSC2005 (in N810) touchscreen + controllers driven through SPI bus + +- National Semiconductor LM8323-controlled qwerty keyboard driven + through |I2C| bus + +- Secure Digital card connected to OMAP MMC/SD host + +- Three OMAP on-chip UARTs and on-chip STI debugging console + +- Mentor Graphics \"Inventra\" dual-role USB controller embedded in a + TI TUSB6010 chip - only USB host mode is supported + +- TI TMP105 temperature sensor driven through |I2C| bus + +- TI TWL92230C power management companion with an RTC on + |I2C| bus + +- Nokia RETU and TAHVO multi-purpose chips with an RTC, connected + through CBUS diff --git a/docs/system/arm/nuvoton.rst b/docs/system/arm/nuvoton.rst new file mode 100644 index 00000000..c38df32b --- /dev/null +++ b/docs/system/arm/nuvoton.rst @@ -0,0 +1,96 @@ +Nuvoton iBMC boards (``*-bmc``, ``npcm750-evb``, ``quanta-gsj``) +================================================================ + +The `Nuvoton iBMC`_ chips (NPCM7xx) are a family of ARM-based SoCs that are +designed to be used as Baseboard Management Controllers (BMCs) in various +servers. 
They all feature one or two ARM Cortex-A9 CPU cores, as well as an +assortment of peripherals targeted for either Enterprise or Data Center / +Hyperscale applications. The former is a superset of the latter, so NPCM750 has +all the peripherals of NPCM730 and more. + +.. _Nuvoton iBMC: https://www.nuvoton.com/products/cloud-computing/ibmc/ + +The NPCM750 SoC has two Cortex-A9 cores and is targeted for the Enterprise +segment. The following machines are based on this chip : + +- ``npcm750-evb`` Nuvoton NPCM750 Evaluation board + +The NPCM730 SoC has two Cortex-A9 cores and is targeted for Data Center and +Hyperscale applications. The following machines are based on this chip : + +- ``quanta-gbs-bmc`` Quanta GBS server BMC +- ``quanta-gsj`` Quanta GSJ server BMC +- ``kudo-bmc`` Fii USA Kudo server BMC +- ``mori-bmc`` Fii USA Mori server BMC + +There are also two more SoCs, NPCM710 and NPCM705, which are single-core +variants of NPCM750 and NPCM730, respectively. These are currently not +supported by QEMU. + +Supported devices +----------------- + + * SMP (Dual Core Cortex-A9) + * Cortex-A9MPCore built-in peripherals: SCU, GIC, Global Timer, Private Timer + and Watchdog. + * SRAM, ROM and DRAM mappings + * System Global Control Registers (GCR) + * Clock and reset controller (CLK) + * Timer controller (TIM) + * Serial ports (16550-based) + * DDR4 memory controller (dummy interface indicating memory training is done) + * OTP controllers (no protection features) + * Flash Interface Unit (FIU; no protection features) + * Random Number Generator (RNG) + * USB host (USBH) + * GPIO controller + * Analog to Digital Converter (ADC) + * Pulse Width Modulation (PWM) + * SMBus controller (SMBF) + * Ethernet controller (EMC) + * Tachometer + +Missing devices +--------------- + + * LPC/eSPI host-to-BMC interface, including + + * Keyboard and mouse controller interface (KBCI) + * Keyboard Controller Style (KCS) channels + * BIOS POST code FIFO + * System Wake-up Control (SWC) + * Shared memory (SHM) + * eSPI slave interface + + * Ethernet controller (GMAC) + * USB device (USBD) + * Peripheral SPI controller (PSPI) + * SD/MMC host + * PECI interface + * PCI and PCIe root complex and bridges + * VDM and MCTP support + * Serial I/O expansion + * LPC/eSPI host + * Coprocessor + * Graphics + * Video capture + * Encoding compression engine + * Security features + +Boot options +------------ + +The Nuvoton machines can boot from an OpenBMC firmware image, or directly into +a kernel using the ``-kernel`` option. OpenBMC images for ``quanta-gsj`` and +possibly others can be downloaded from the OpenBMC jenkins : + + https://jenkins.openbmc.org/ + +The firmware image should be attached as an MTD drive. Example : + +.. code-block:: bash + + $ qemu-system-arm -machine quanta-gsj -nographic \ + -drive file=image-bmc,if=mtd,bus=0,unit=0,format=raw + +The default root password for test images is usually ``0penBmc``. diff --git a/docs/system/arm/orangepi.rst b/docs/system/arm/orangepi.rst new file mode 100644 index 00000000..83c74451 --- /dev/null +++ b/docs/system/arm/orangepi.rst @@ -0,0 +1,263 @@ +Orange Pi PC (``orangepi-pc``) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The Xunlong Orange Pi PC is an Allwinner H3 System on Chip +based embedded computer with mainline support in both U-Boot +and Linux. The board comes with a Quad Core Cortex-A7 @ 1.3GHz, +1GiB RAM, 100Mbit ethernet, USB, SD/MMC, USB, HDMI and +various other I/O. 
+ +Supported devices +""""""""""""""""" + +The Orange Pi PC machine supports the following devices: + + * SMP (Quad Core Cortex-A7) + * Generic Interrupt Controller configuration + * SRAM mappings + * SDRAM controller + * Real Time Clock + * Timer device (re-used from Allwinner A10) + * UART + * SD/MMC storage controller + * EMAC ethernet + * USB 2.0 interfaces + * Clock Control Unit + * System Control module + * Security Identifier device + +Limitations +""""""""""" + +Currently, Orange Pi PC does *not* support the following features: + +- Graphical output via HDMI, GPU and/or the Display Engine +- Audio output +- Hardware Watchdog + +Also see the 'unimplemented' array in the Allwinner H3 SoC module +for a complete list of unimplemented I/O devices: ``./hw/arm/allwinner-h3.c`` + +Boot options +"""""""""""" + +The Orange Pi PC machine can start using the standard -kernel functionality +for loading a Linux kernel or ELF executable. Additionally, the Orange Pi PC +machine can also emulate the BootROM which is present on an actual Allwinner H3 +based SoC, which loads the bootloader from a SD card, specified via the -sd argument +to qemu-system-arm. + +Machine-specific options +"""""""""""""""""""""""" + +The following machine-specific options are supported: + +- allwinner-rtc.base-year=YYYY + + The Allwinner RTC device is automatically created by the Orange Pi PC machine + and uses a default base year value which can be overridden using the 'base-year' property. + The base year is the actual represented year when the RTC year value is zero. + This option can be used in case the target operating system driver uses a different + base year value. The minimum value for the base year is 1900. + +- allwinner-sid.identifier=abcd1122-a000-b000-c000-12345678ffff + + The Security Identifier value can be read by the guest. + For example, U-Boot uses it to determine a unique MAC address. + +The above machine-specific options can be specified in qemu-system-arm +via the '-global' argument, for example: + +.. code-block:: bash + + $ qemu-system-arm -M orangepi-pc -sd mycard.img \ + -global allwinner-rtc.base-year=2000 + +Running mainline Linux +"""""""""""""""""""""" + +Mainline Linux kernels from 4.19 up to latest master are known to work. +To build a Linux mainline kernel that can be booted by the Orange Pi PC machine, +simply configure the kernel using the sunxi_defconfig configuration: + +.. code-block:: bash + + $ ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- make mrproper + $ ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- make sunxi_defconfig + +To be able to use USB storage, you need to manually enable the corresponding +configuration item. Start the kconfig configuration tool: + +.. code-block:: bash + + $ ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- make menuconfig + +Navigate to the following item, enable it and save your configuration: + + Device Drivers > USB support > USB Mass Storage support + +Build the Linux kernel with: + +.. code-block:: bash + + $ ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- make + +To boot the newly build linux kernel in QEMU with the Orange Pi PC machine, use: + +.. code-block:: bash + + $ qemu-system-arm -M orangepi-pc -nic user -nographic \ + -kernel /path/to/linux/arch/arm/boot/zImage \ + -append 'console=ttyS0,115200' \ + -dtb /path/to/linux/arch/arm/boot/dts/sun8i-h3-orangepi-pc.dtb + +Orange Pi PC images +""""""""""""""""""" + +Note that the mainline kernel does not have a root filesystem. 
You may provide it +with an official Orange Pi PC image from the official website: + + http://www.orangepi.org/downloadresources/ + +Another possibility is to run an Armbian image for Orange Pi PC which +can be downloaded from: + + https://www.armbian.com/orange-pi-pc/ + +Alternatively, you can also choose to build you own image with buildroot +using the orangepi_pc_defconfig. Also see https://buildroot.org for more information. + +When using an image as an SD card, it must be resized to a power of two. This can be +done with the ``qemu-img`` command. It is recommended to only increase the image size +instead of shrinking it to a power of two, to avoid loss of data. For example, +to prepare a downloaded Armbian image, first extract it and then increase +its size to one gigabyte as follows: + +.. code-block:: bash + + $ qemu-img resize Armbian_19.11.3_Orangepipc_bionic_current_5.3.9.img 1G + +You can choose to attach the selected image either as an SD card or as USB mass storage. +For example, to boot using the Orange Pi PC Debian image on SD card, simply add the -sd +argument and provide the proper root= kernel parameter: + +.. code-block:: bash + + $ qemu-system-arm -M orangepi-pc -nic user -nographic \ + -kernel /path/to/linux/arch/arm/boot/zImage \ + -append 'console=ttyS0,115200 root=/dev/mmcblk0p2' \ + -dtb /path/to/linux/arch/arm/boot/dts/sun8i-h3-orangepi-pc.dtb \ + -sd OrangePi_pc_debian_stretch_server_linux5.3.5_v1.0.img + +To attach the image as an USB mass storage device to the machine, +simply append to the command: + +.. code-block:: bash + + -drive if=none,id=stick,file=myimage.img \ + -device usb-storage,bus=usb-bus.0,drive=stick + +Instead of providing a custom Linux kernel via the -kernel command you may also +choose to let the Orange Pi PC machine load the bootloader from SD card, just like +a real board would do using the BootROM. Simply pass the selected image via the -sd +argument and remove the -kernel, -append, -dbt and -initrd arguments: + +.. code-block:: bash + + $ qemu-system-arm -M orangepi-pc -nic user -nographic \ + -sd Armbian_19.11.3_Orangepipc_buster_current_5.3.9.img + +Note that both the official Orange Pi PC images and Armbian images start +a lot of userland programs via systemd. Depending on the host hardware and OS, +they may be slow to emulate, especially due to emulating the 4 cores. +To help reduce the performance slow down due to emulating the 4 cores, you can +give the following kernel parameters via U-Boot (or via -append): + +.. code-block:: bash + + => setenv extraargs 'systemd.default_timeout_start_sec=9000 loglevel=7 nosmp console=ttyS0,115200' + +Running U-Boot +"""""""""""""" + +U-Boot mainline can be build and configured using the orangepi_pc_defconfig +using similar commands as describe above for Linux. Note that it is recommended +for development/testing to select the following configuration setting in U-Boot: + + Device Tree Control > Provider for DTB for DT Control > Embedded DTB + +To start U-Boot using the Orange Pi PC machine, provide the +u-boot binary to the -kernel argument: + +.. code-block:: bash + + $ qemu-system-arm -M orangepi-pc -nic user -nographic \ + -kernel /path/to/uboot/u-boot -sd disk.img + +Use the following U-boot commands to load and boot a Linux kernel from SD card: + +.. 
code-block:: bash + + => setenv bootargs console=ttyS0,115200 + => ext2load mmc 0 0x42000000 zImage + => ext2load mmc 0 0x43000000 sun8i-h3-orangepi-pc.dtb + => bootz 0x42000000 - 0x43000000 + +Running NetBSD +"""""""""""""" + +The NetBSD operating system also includes support for Allwinner H3 based boards, +including the Orange Pi PC. NetBSD 9.0 is known to work best for the Orange Pi PC +board and provides a fully working system with serial console, networking and storage. +For the Orange Pi PC machine, get the 'evbarm-earmv7hf' based image from: + + https://cdn.netbsd.org/pub/NetBSD/NetBSD-9.0/evbarm-earmv7hf/binary/gzimg/armv7.img.gz + +The image requires manually installing U-Boot in the image. Build U-Boot with +the orangepi_pc_defconfig configuration as described in the previous section. +Next, unzip the NetBSD image and write the U-Boot binary including SPL using: + +.. code-block:: bash + + $ gunzip armv7.img.gz + $ dd if=/path/to/u-boot-sunxi-with-spl.bin of=armv7.img bs=1024 seek=8 conv=notrunc + +Finally, before starting the machine the SD image must be extended such +that the size of the SD image is a power of two and that the NetBSD kernel +will not conclude the NetBSD partition is larger than the emulated SD card: + +.. code-block:: bash + + $ qemu-img resize armv7.img 2G + +Start the machine using the following command: + +.. code-block:: bash + + $ qemu-system-arm -M orangepi-pc -nic user -nographic \ + -sd armv7.img -global allwinner-rtc.base-year=2000 + +At the U-Boot stage, interrupt the automatic boot process by pressing a key +and set the following environment variables before booting: + +.. code-block:: bash + + => setenv bootargs root=ld0a + => setenv kernel netbsd-GENERIC.ub + => setenv fdtfile dtb/sun8i-h3-orangepi-pc.dtb + => setenv bootcmd 'fatload mmc 0:1 ${kernel_addr_r} ${kernel}; fatload mmc 0:1 ${fdt_addr_r} ${fdtfile}; fdt addr ${fdt_addr_r}; bootm ${kernel_addr_r} - ${fdt_addr_r}' + +Optionally you may save the environment variables to SD card with 'saveenv'. +To continue booting simply give the 'boot' command and NetBSD boots. + +Orange Pi PC integration tests +"""""""""""""""""""""""""""""" + +The Orange Pi PC machine has several integration tests included. +To run the whole set of tests, build QEMU from source and simply +provide the following command: + +.. 
code-block:: bash + + $ AVOCADO_ALLOW_LARGE_STORAGE=yes avocado --show=app,console run \ + -t machine:orangepi-pc tests/avocado/boot_linux_console.py diff --git a/docs/system/arm/palm.rst b/docs/system/arm/palm.rst new file mode 100644 index 00000000..47ff9b36 --- /dev/null +++ b/docs/system/arm/palm.rst @@ -0,0 +1,23 @@ +Palm Tungsten|E PDA (``cheetah``) +================================= + +The Palm Tungsten|E PDA (codename \"Cheetah\") emulation includes the +following elements: + +- Texas Instruments OMAP310 System-on-chip (ARM925T core) + +- ROM and RAM memories (ROM firmware image can be loaded with + -option-rom) + +- On-chip LCD controller + +- On-chip Real Time Clock + +- TI TSC2102i touchscreen controller / analog-digital converter / + Audio CODEC, connected through MicroWire and |I2S| busses + +- GPIO-connected matrix keypad + +- Secure Digital card connected to OMAP MMC/SD host + +- Three on-chip UARTs diff --git a/docs/system/arm/raspi.rst b/docs/system/arm/raspi.rst new file mode 100644 index 00000000..922fe375 --- /dev/null +++ b/docs/system/arm/raspi.rst @@ -0,0 +1,43 @@ +Raspberry Pi boards (``raspi0``, ``raspi1ap``, ``raspi2b``, ``raspi3ap``, ``raspi3b``) +====================================================================================== + + +QEMU provides models of the following Raspberry Pi boards: + +``raspi0`` and ``raspi1ap`` + ARM1176JZF-S core, 512 MiB of RAM +``raspi2b`` + Cortex-A7 (4 cores), 1 GiB of RAM +``raspi3ap`` + Cortex-A53 (4 cores), 512 MiB of RAM +``raspi3b`` + Cortex-A53 (4 cores), 1 GiB of RAM + + +Implemented devices +------------------- + + * ARM1176JZF-S, Cortex-A7 or Cortex-A53 CPU + * Interrupt controller + * DMA controller + * Clock and reset controller (CPRMAN) + * System Timer + * GPIO controller + * Serial ports (BCM2835 AUX - 16550 based - and PL011) + * Random Number Generator (RNG) + * Frame Buffer + * USB host (USBH) + * GPIO controller + * SD/MMC host controller + * SoC thermal sensor + * USB2 host controller (DWC2 and MPHI) + * MailBox controller (MBOX) + * VideoCore firmware (property) + + +Missing devices +--------------- + + * Peripheral SPI controller (SPI) + * Analog to Digital Converter (ADC) + * Pulse Width Modulation (PWM) diff --git a/docs/system/arm/realview.rst b/docs/system/arm/realview.rst new file mode 100644 index 00000000..65f5be34 --- /dev/null +++ b/docs/system/arm/realview.rst @@ -0,0 +1,34 @@ +Arm Realview boards (``realview-eb``, ``realview-eb-mpcore``, ``realview-pb-a8``, ``realview-pbx-a9``) +====================================================================================================== + +Several variants of the Arm RealView baseboard are emulated, including +the EB, PB-A8 and PBX-A9. Due to interactions with the bootloader, only +certain Linux kernel configurations work out of the box on these boards. + +Kernels for the PB-A8 board should have CONFIG_REALVIEW_HIGH_PHYS_OFFSET +enabled in the kernel, and expect 512M RAM. Kernels for The PBX-A9 board +should have CONFIG_SPARSEMEM enabled, CONFIG_REALVIEW_HIGH_PHYS_OFFSET +disabled and expect 1024M RAM. 
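+
+As a concrete illustration (a sketch only; the zImage and dtb paths are
+placeholders for files produced by your own kernel build, not something
+shipped with QEMU), a PBX-A9 kernel built with the options above could be
+started with the matching 1024M of RAM like this:
+
+.. code-block:: bash
+
+  $ qemu-system-arm -M realview-pbx-a9 -m 1024 -nographic \
+      -kernel zImage \
+      -dtb arm-realview-pbx-a9.dtb \
+      -append "console=ttyAMA0"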
+ +The following devices are emulated: + +- ARM926E, ARM1136, ARM11MPCore, Cortex-A8 or Cortex-A9 MPCore CPU + +- Arm AMBA Generic/Distributed Interrupt Controller + +- Four PL011 UARTs + +- SMC 91c111 or SMSC LAN9118 Ethernet adapter + +- PL110 LCD controller + +- PL050 KMI with PS/2 keyboard and mouse + +- PCI host bridge + +- PCI OHCI USB controller + +- LSI53C895A PCI SCSI Host Bus Adapter with hard disk and CD-ROM + devices + +- PL181 MultiMedia Card Interface with SD card. diff --git a/docs/system/arm/sabrelite.rst b/docs/system/arm/sabrelite.rst new file mode 100644 index 00000000..4ccb0560 --- /dev/null +++ b/docs/system/arm/sabrelite.rst @@ -0,0 +1,119 @@ +Boundary Devices SABRE Lite (``sabrelite``) +=========================================== + +Boundary Devices SABRE Lite i.MX6 Development Board is a low-cost development +platform featuring the powerful Freescale / NXP Semiconductor's i.MX 6 Quad +Applications Processor. + +Supported devices +----------------- + +The SABRE Lite machine supports the following devices: + + * Up to 4 Cortex-A9 cores + * Generic Interrupt Controller + * 1 Clock Controller Module + * 1 System Reset Controller + * 5 UARTs + * 2 EPIC timers + * 1 GPT timer + * 2 Watchdog timers + * 1 FEC Ethernet controller + * 3 I2C controllers + * 7 GPIO controllers + * 4 SDHC storage controllers + * 4 USB 2.0 host controllers + * 5 ECSPI controllers + * 1 SST 25VF016B flash + +Please note above list is a complete superset the QEMU SABRE Lite machine can +support. For a normal use case, a device tree blob that represents a real world +SABRE Lite board, only exposes a subset of devices to the guest software. + +Boot options +------------ + +The SABRE Lite machine can start using the standard -kernel functionality +for loading a Linux kernel, U-Boot bootloader or ELF executable. + +Running Linux kernel +-------------------- + +Linux mainline v5.10 release is tested at the time of writing. To build a Linux +mainline kernel that can be booted by the SABRE Lite machine, simply configure +the kernel using the imx_v6_v7_defconfig configuration: + +.. code-block:: bash + + $ export ARCH=arm + $ export CROSS_COMPILE=arm-linux-gnueabihf- + $ make imx_v6_v7_defconfig + $ make + +To boot the newly built Linux kernel in QEMU with the SABRE Lite machine, use: + +.. code-block:: bash + + $ qemu-system-arm -M sabrelite -smp 4 -m 1G \ + -display none -serial null -serial stdio \ + -kernel arch/arm/boot/zImage \ + -dtb arch/arm/boot/dts/imx6q-sabrelite.dtb \ + -initrd /path/to/rootfs.ext4 \ + -append "root=/dev/ram" + +Running U-Boot +-------------- + +U-Boot mainline v2020.10 release is tested at the time of writing. To build a +U-Boot mainline bootloader that can be booted by the SABRE Lite machine, use +the mx6qsabrelite_defconfig with similar commands as described above for Linux: + +.. code-block:: bash + + $ export CROSS_COMPILE=arm-linux-gnueabihf- + $ make mx6qsabrelite_defconfig + +Note we need to adjust settings by: + +.. code-block:: bash + + $ make menuconfig + +then manually select the following configuration in U-Boot: + + Device Tree Control > Provider of DTB for DT Control > Embedded DTB + +To start U-Boot using the SABRE Lite machine, provide the u-boot binary to +the -kernel argument, along with an SD card image with rootfs: + +.. code-block:: bash + + $ qemu-system-arm -M sabrelite -smp 4 -m 1G \ + -display none -serial null -serial stdio \ + -kernel u-boot + +The following example shows booting Linux kernel from dhcp, and uses the +rootfs on an SD card. 
This requires some additional command line parameters +for QEMU: + +.. code-block:: none + + -nic user,tftp=/path/to/kernel/zImage \ + -drive file=sdcard.img,id=rootfs -device sd-card,drive=rootfs + +The directory for the built-in TFTP server should also contain the device tree +blob of the SABRE Lite board. The sample SD card image was populated with the +root file system with one single partition. You may adjust the kernel "root=" +boot parameter accordingly. + +After U-Boot boots, type the following commands in the U-Boot command shell to +boot the Linux kernel: + +.. code-block:: none + + => setenv ethaddr 00:11:22:33:44:55 + => setenv bootfile zImage + => dhcp + => tftpboot 14000000 imx6q-sabrelite.dtb + => setenv bootargs root=/dev/mmcblk3p1 + => bootz 12000000 - 14000000 diff --git a/docs/system/arm/sbsa.rst b/docs/system/arm/sbsa.rst new file mode 100644 index 00000000..b499d7e9 --- /dev/null +++ b/docs/system/arm/sbsa.rst @@ -0,0 +1,32 @@ +Arm Server Base System Architecture Reference board (``sbsa-ref``) +================================================================== + +While the ``virt`` board is a generic board platform that doesn't match +any real hardware the ``sbsa-ref`` board intends to look like real +hardware. The `Server Base System Architecture +<https://developer.arm.com/documentation/den0029/latest>`_ defines a +minimum base line of hardware support and importantly how the firmware +reports that to any operating system. It is a static system that +reports a very minimal DT to the firmware for non-discoverable +information about components affected by the qemu command line (i.e. +cpus and memory). As a result it must have a firmware specifically +built to expect a certain hardware layout (as you would in a real +machine). + +It is intended to be a machine for developing firmware and testing +standards compliance with operating systems. + +Supported devices +""""""""""""""""" + +The sbsa-ref board supports: + + - A configurable number of AArch64 CPUs + - GIC version 3 + - System bus AHCI controller + - System bus EHCI controller + - CDROM and hard disc on AHCI bus + - E1000E ethernet card on PCIe bus + - VGA display adaptor on PCIe bus + - A generic SBSA watchdog device + diff --git a/docs/system/arm/stellaris.rst b/docs/system/arm/stellaris.rst new file mode 100644 index 00000000..8af4ad79 --- /dev/null +++ b/docs/system/arm/stellaris.rst @@ -0,0 +1,26 @@ +Stellaris boards (``lm3s6965evb``, ``lm3s811evb``) +================================================== + +The Luminary Micro Stellaris LM3S811EVB emulation includes the following +devices: + +- Cortex-M3 CPU core. + +- 64k Flash and 8k SRAM. + +- Timers, UARTs, ADC and |I2C| interface. + +- OSRAM Pictiva 96x16 OLED with SSD0303 controller on + |I2C| bus. + +The Luminary Micro Stellaris LM3S6965EVB emulation includes the +following devices: + +- Cortex-M3 CPU core. + +- 256k Flash and 64k SRAM. + +- Timers, UARTs, ADC, |I2C| and SSI interfaces. + +- OSRAM Pictiva 128x64 OLED with SSD0323 controller connected via + SSI. diff --git a/docs/system/arm/stm32.rst b/docs/system/arm/stm32.rst new file mode 100644 index 00000000..508b92cf --- /dev/null +++ b/docs/system/arm/stm32.rst @@ -0,0 +1,66 @@ +STMicroelectronics STM32 boards (``netduino2``, ``netduinoplus2``, ``stm32vldiscovery``) +======================================================================================== + +The `STM32`_ chips are a family of 32-bit ARM-based microcontroller by +STMicroelectronics. + +.. 
_STM32: https://www.st.com/en/microcontrollers-microprocessors/stm32-32-bit-arm-cortex-mcus.html
+
+The STM32F1 series is based on the ARM Cortex-M3 core. The following machines are
+based on this chip:
+
+- ``stm32vldiscovery`` STM32VLDISCOVERY board with STM32F100RBT6 microcontroller
+
+The STM32F2 series is based on the ARM Cortex-M3 core. The following machines are
+based on this chip:
+
+- ``netduino2`` Netduino 2 board with STM32F205RFT6 microcontroller
+
+The STM32F4 series is based on the ARM Cortex-M4F core. This series is pin-to-pin
+compatible with the STM32F2 series. The following machines are based on this chip:
+
+- ``netduinoplus2`` Netduino Plus 2 board with STM32F405RGT6 microcontroller
+
+There are many other STM32 series that are currently not supported by QEMU.
+
+Supported devices
+-----------------
+
+ * ARM Cortex-M3, Cortex-M4F
+ * Analog to Digital Converter (ADC)
+ * EXTI interrupt
+ * Serial ports (USART)
+ * SPI controller
+ * System configuration (SYSCFG)
+ * Timer controller (TIMER)
+
+Missing devices
+---------------
+
+ * Camera interface (DCMI)
+ * Controller Area Network (CAN)
+ * Cyclic Redundancy Check (CRC) calculation unit
+ * Digital to Analog Converter (DAC)
+ * DMA controller
+ * Ethernet controller
+ * Flash Interface Unit
+ * GPIO controller
+ * I2C controller
+ * Inter-Integrated Sound (I2S) controller
+ * Power supply configuration (PWR)
+ * Random Number Generator (RNG)
+ * Real-Time Clock (RTC) controller
+ * Reset and Clock Controller (RCC)
+ * Secure Digital Input/Output (SDIO) interface
+ * USB OTG
+ * Watchdog controller (IWDG, WWDG)
+
+Boot options
+------------
+
+The STM32 machines can be started using the ``-kernel`` option to load a
+firmware. Example:
+
+.. code-block:: bash
+
+  $ qemu-system-arm -M stm32vldiscovery -kernel firmware.bin
diff --git a/docs/system/arm/sx1.rst b/docs/system/arm/sx1.rst new file mode 100644 index 00000000..8bce30d4 --- /dev/null +++ b/docs/system/arm/sx1.rst @@ -0,0 +1,18 @@ +Siemens SX1 (``sx1``, ``sx1-v1``)
+=================================
+
+Basic emulation of the Siemens SX1, models v1 and v2 (default). The
+emulation includes the following elements:
+
+- Texas Instruments OMAP310 System-on-chip (ARM925T core)
+
+- ROM and RAM memories (ROM firmware image can be loaded with
+  -pflash): v1 has one 16MB flash and one 8MB flash, v2 has one 32MB flash
+
+- On-chip LCD controller
+
+- On-chip Real Time Clock
+
+- Secure Digital card connected to OMAP MMC/SD host
+
+- Three on-chip UARTs
diff --git a/docs/system/arm/versatile.rst b/docs/system/arm/versatile.rst new file mode 100644 index 00000000..2ae792ba --- /dev/null +++ b/docs/system/arm/versatile.rst @@ -0,0 +1,63 @@ +Arm Versatile boards (``versatileab``, ``versatilepb``)
+=======================================================
+
+The Arm Versatile baseboard is emulated with the following devices:
+
+- ARM926E, ARM1136 or Cortex-A8 CPU
+
+- PL190 Vectored Interrupt Controller
+
+- Four PL011 UARTs
+
+- SMC 91c111 Ethernet adapter
+
+- PL110 LCD controller
+
+- PL050 KMI with PS/2 keyboard and mouse.
+
+- PCI host bridge. Note the emulated PCI bridge only provides access
+  to PCI memory space. It does not provide access to PCI IO space. This
+  means some devices (e.g. the ne2k_pci NIC) are not usable, and others
+  (e.g. the rtl8139 NIC) are only usable when the guest drivers use the
+  memory-mapped control registers.
+
+- PCI OHCI USB controller.
+
+- LSI53C895A PCI SCSI Host Bus Adapter with hard disk and CD-ROM
+  devices.
+
+- PL181 MultiMedia Card Interface with SD card. 
+ +Booting a Linux kernel +---------------------- + +Building a current Linux kernel with ``versatile_defconfig`` should be +enough to get something running. Nowadays an out-of-tree build is +recommended (and also useful if you build a lot of different targets). +In the following example $BLD points to the build directory and $SRC +points to the root of the Linux source tree. You can drop $SRC if you +are running from there. + +.. code-block:: bash + + $ make O=$BLD -C $SRC ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- versatile_defconfig + $ make O=$BLD -C $SRC ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- + +You may want to enable some additional modules if you want to boot +something from the SCSI interface:: + + CONFIG_PCI=y + CONFIG_PCI_VERSATILE=y + CONFIG_SCSI=y + CONFIG_SCSI_SYM53C8XX_2=y + +You can then boot with a command line like: + +.. code-block:: bash + + $ qemu-system-arm -machine type=versatilepb \ + -serial mon:stdio \ + -drive if=scsi,driver=file,filename=debian-buster-armel-rootfs.ext4 \ + -kernel zImage \ + -dtb versatile-pb.dtb \ + -append "console=ttyAMA0 ro root=/dev/sda" diff --git a/docs/system/arm/vexpress.rst b/docs/system/arm/vexpress.rst new file mode 100644 index 00000000..3e3839e9 --- /dev/null +++ b/docs/system/arm/vexpress.rst @@ -0,0 +1,88 @@ +Arm Versatile Express boards (``vexpress-a9``, ``vexpress-a15``) +================================================================ + +QEMU models two variants of the Arm Versatile Express development +board family: + +- ``vexpress-a9`` models the combination of the Versatile Express + motherboard and the CoreTile Express A9x4 daughterboard +- ``vexpress-a15`` models the combination of the Versatile Express + motherboard and the CoreTile Express A15x2 daughterboard + +Note that as this hardware does not have PCI, IDE or SCSI, +the only available storage option is emulated SD card. + +Implemented devices: + +- PL041 audio +- PL181 SD controller +- PL050 keyboard and mouse +- PL011 UARTs +- SP804 timers +- I2C controller +- PL031 RTC +- PL111 LCD display controller +- Flash memory +- LAN9118 ethernet + +Unimplemented devices: + +- SP810 system control block +- PCI-express +- USB controller (Philips ISP1761) +- Local DAP ROM +- CoreSight interfaces +- PL301 AXI interconnect +- SCC +- System counter +- HDLCD controller (``vexpress-a15``) +- SP805 watchdog +- PL341 dynamic memory controller +- DMA330 DMA controller +- PL354 static memory controller +- BP147 TrustZone Protection Controller +- TrustZone Address Space Controller + +Other differences between the hardware and the QEMU model: + +- QEMU will default to creating one CPU unless you pass a different + ``-smp`` argument +- QEMU allows the amount of RAM provided to be specified with the + ``-m`` argument +- QEMU defaults to providing a CPU which does not provide either + TrustZone or the Virtualization Extensions: if you want these you + must enable them with ``-machine secure=on`` and ``-machine + virtualization=on`` +- QEMU provides 4 virtio-mmio virtio transports; these start at + address ``0x10013000`` for ``vexpress-a9`` and at ``0x1c130000`` for + ``vexpress-a15``, and have IRQs from 40 upwards. If a dtb is + provided on the command line then QEMU will edit it to include + suitable entries describing these transports for the guest. + +Booting a Linux kernel +---------------------- + +Building a current Linux kernel with ``multi_v7_defconfig`` should be +enough to get something running. 
Nowadays an out-of-tree build is +recommended (and also useful if you build a lot of different targets). +In the following example $BLD points to the build directory and $SRC +points to the root of the Linux source tree. You can drop $SRC if you +are running from there. + +.. code-block:: bash + + $ make O=$BLD -C $SRC ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- multi_v7_defconfig + $ make O=$BLD -C $SRC ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- + +By default you will want to boot your rootfs off the sdcard interface. +Your rootfs will need to be padded to the right size. With a suitable +DTB you could also add devices to the virtio-mmio bus. + +.. code-block:: bash + + $ qemu-system-arm -cpu cortex-a15 -smp 4 -m 4096 \ + -machine type=vexpress-a15 -serial mon:stdio \ + -drive if=sd,driver=file,filename=armel-rootfs.ext4 \ + -kernel zImage \ + -dtb vexpress-v2p-ca15-tc1.dtb \ + -append "console=ttyAMA0 root=/dev/mmcblk0 ro" diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst new file mode 100644 index 00000000..20442ea2 --- /dev/null +++ b/docs/system/arm/virt.rst @@ -0,0 +1,187 @@ +'virt' generic virtual platform (``virt``) +========================================== + +The ``virt`` board is a platform which does not correspond to any +real hardware; it is designed for use in virtual machines. +It is the recommended board type if you simply want to run +a guest such as Linux and do not care about reproducing the +idiosyncrasies and limitations of a particular bit of real-world +hardware. + +This is a "versioned" board model, so as well as the ``virt`` machine +type itself (which may have improvements, bugfixes and other minor +changes between QEMU versions) a version is provided that guarantees +to have the same behaviour as that of previous QEMU releases, so +that VM migration will work between QEMU versions. For instance the +``virt-5.0`` machine type will behave like the ``virt`` machine from +the QEMU 5.0 release, and migration should work between ``virt-5.0`` +of the 5.0 release and ``virt-5.0`` of the 5.1 release. Migration +is not guaranteed to work between different QEMU releases for +the non-versioned ``virt`` machine type. + +Supported devices +""""""""""""""""" + +The virt board supports: + +- PCI/PCIe devices +- Flash memory +- One PL011 UART +- An RTC +- The fw_cfg device that allows a guest to obtain data from QEMU +- A PL061 GPIO controller +- An optional SMMUv3 IOMMU +- hotpluggable DIMMs +- hotpluggable NVDIMMs +- An MSI controller (GICv2M or ITS). GICv2M is selected by default along + with GICv2. ITS is selected by default with GICv3 (>= virt-2.7). Note + that ITS is not modeled in TCG mode. 
+- 32 virtio-mmio transport devices +- running guests using the KVM accelerator on aarch64 hardware +- large amounts of RAM (at least 255GB, and more if using highmem) +- many CPUs (up to 512 if using a GICv3 and highmem) +- Secure-World-only devices if the CPU has TrustZone: + + - A second PL011 UART + - A second PL061 GPIO controller, with GPIO lines for triggering + a system reset or system poweroff + - A secure flash memory + - 16MB of secure RAM + +Supported guest CPU types: + +- ``cortex-a7`` (32-bit) +- ``cortex-a15`` (32-bit; the default) +- ``cortex-a35`` (64-bit) +- ``cortex-a53`` (64-bit) +- ``cortex-a57`` (64-bit) +- ``cortex-a72`` (64-bit) +- ``cortex-a76`` (64-bit) +- ``a64fx`` (64-bit) +- ``host`` (with KVM only) +- ``neoverse-n1`` (64-bit) +- ``max`` (same as ``host`` for KVM; best possible emulation with TCG) + +Note that the default is ``cortex-a15``, so for an AArch64 guest you must +specify a CPU type. + +Graphics output is available, but unlike the x86 PC machine types +there is no default display device enabled: you should select one from +the Display devices section of "-device help". The recommended option +is ``virtio-gpu-pci``; this is the only one which will work correctly +with KVM. You may also need to ensure your guest kernel is configured +with support for this; see below. + +Machine-specific options +"""""""""""""""""""""""" + +The following machine-specific options are supported: + +secure + Set ``on``/``off`` to enable/disable emulating a guest CPU which implements the + Arm Security Extensions (TrustZone). The default is ``off``. + +virtualization + Set ``on``/``off`` to enable/disable emulating a guest CPU which implements the + Arm Virtualization Extensions. The default is ``off``. + +mte + Set ``on``/``off`` to enable/disable emulating a guest CPU which implements the + Arm Memory Tagging Extensions. The default is ``off``. + +highmem + Set ``on``/``off`` to enable/disable placing devices and RAM in physical + address space above 32 bits. The default is ``on`` for machine types + later than ``virt-2.12``. + +gic-version + Specify the version of the Generic Interrupt Controller (GIC) to provide. + Valid values are: + + ``2`` + GICv2. Note that this limits the number of CPUs to 8. + ``3`` + GICv3. This allows up to 512 CPUs. + ``4`` + GICv4. Requires ``virtualization`` to be ``on``; allows up to 317 CPUs. + ``host`` + Use the same GIC version the host provides, when using KVM + ``max`` + Use the best GIC version possible (same as host when using KVM; + with TCG this is currently ``3`` if ``virtualization`` is ``off`` and + ``4`` if ``virtualization`` is ``on``, but this may change in future) + +its + Set ``on``/``off`` to enable/disable ITS instantiation. The default is ``on`` + for machine types later than ``virt-2.7``. + +iommu + Set the IOMMU type to create for the guest. Valid values are: + + ``none`` + Don't create an IOMMU (the default) + ``smmuv3`` + Create an SMMUv3 + +ras + Set ``on``/``off`` to enable/disable reporting host memory errors to a guest + using ACPI and guest external abort exceptions. The default is off. + +dtb-randomness + Set ``on``/``off`` to pass random seeds via the guest DTB + rng-seed and kaslr-seed nodes (in both "/chosen" and + "/secure-chosen") to use for features like the random number + generator and address space randomisation. The default is + ``on``. You will want to disable it if your trusted boot chain + will verify the DTB it is passed, since this option causes the + DTB to be non-deterministic. 
It would be the responsibility of + the firmware to come up with a seed and pass it on if it wants to. + +dtb-kaslr-seed + A deprecated synonym for dtb-randomness. + +Linux guest kernel configuration +"""""""""""""""""""""""""""""""" + +The 'defconfig' for Linux arm and arm64 kernels should include the +right device drivers for virtio and the PCI controller; however some older +kernel versions, especially for 32-bit Arm, did not have everything +enabled by default. If you're not seeing PCI devices that you expect, +then check that your guest config has:: + + CONFIG_PCI=y + CONFIG_VIRTIO_PCI=y + CONFIG_PCI_HOST_GENERIC=y + +If you want to use the ``virtio-gpu-pci`` graphics device you will also +need:: + + CONFIG_DRM=y + CONFIG_DRM_VIRTIO_GPU=y + +Hardware configuration information for bare-metal programming +""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" + +The ``virt`` board automatically generates a device tree blob ("dtb") +which it passes to the guest. This provides information about the +addresses, interrupt lines and other configuration of the various devices +in the system. Guest code can rely on and hard-code the following +addresses: + +- Flash memory starts at address 0x0000_0000 + +- RAM starts at 0x4000_0000 + +All other information about device locations may change between +QEMU versions, so guest code must look in the DTB. + +QEMU supports two types of guest image boot for ``virt``, and +the way for the guest code to locate the dtb binary differs: + +- For guests using the Linux kernel boot protocol (this means any + non-ELF file passed to the QEMU ``-kernel`` option) the address + of the DTB is passed in a register (``r2`` for 32-bit guests, + or ``x0`` for 64-bit guests) + +- For guests booting as "bare-metal" (any other kind of boot), + the DTB is at the start of RAM (0x4000_0000) diff --git a/docs/system/arm/xlnx-versal-virt.rst b/docs/system/arm/xlnx-versal-virt.rst new file mode 100644 index 00000000..92ad10d2 --- /dev/null +++ b/docs/system/arm/xlnx-versal-virt.rst @@ -0,0 +1,226 @@ +Xilinx Versal Virt (``xlnx-versal-virt``) +========================================= + +Xilinx Versal is a family of heterogeneous multi-core SoCs +(System on Chip) that combine traditional hardened CPUs and I/O +peripherals in a Processing System (PS) with runtime programmable +FPGA logic (PL) and an Artificial Intelligence Engine (AIE). + +More details here: +https://www.xilinx.com/products/silicon-devices/acap/versal.html + +The family of Versal SoCs share a single architecture but come in +different parts with different speed grades, amounts of PL and +other differences. + +The Xilinx Versal Virt board in QEMU is a model of a virtual board +(does not exist in reality) with a virtual Versal SoC without I/O +limitations. Currently, we support the following cores and devices: + +Implemented CPU cores: + +- 2 ACPUs (ARM Cortex-A72) + +Implemented devices: + +- Interrupt controller (ARM GICv3) +- 2 UARTs (ARM PL011) +- An RTC (Versal built-in) +- 2 GEMs (Cadence MACB Ethernet MACs) +- 8 ADMA (Xilinx zDMA) channels +- 2 SD Controllers +- OCM (256KB of On Chip Memory) +- XRAM (4MB of on chip Accelerator RAM) +- DDR memory +- BBRAM (36 bytes of Battery-backed RAM) +- eFUSE (3072 bytes of one-time field-programmable bit array) + +QEMU does not yet model any other devices, including the PL and the AI Engine. + +Other differences between the hardware and the QEMU model: + +- QEMU allows the amount of DDR memory provided to be specified with the + ``-m`` argument. 
If a DTB is provided on the command line then QEMU will + edit it to include suitable entries describing the Versal DDR memory ranges. + +- QEMU provides 8 virtio-mmio virtio transports; these start at + address ``0xa0000000`` and have IRQs from 111 and upwards. + +Running +""""""" +If the user provides an Operating System to be loaded, we expect users +to use the ``-kernel`` command line option. + +Users can load firmware or boot-loaders with the ``-device loader`` options. + +When loading an OS, QEMU generates a DTB and selects an appropriate address +where it gets loaded. This DTB will be passed to the kernel in register x0. + +If there's no ``-kernel`` option, we generate a DTB and place it at 0x1000 +for boot-loaders or firmware to pick it up. + +If users want to provide their own DTB, they can use the ``-dtb`` option. +These DTBs will have their memory nodes modified to match QEMU's +selected ram_size option before they get passed to the kernel or FW. + +When loading an OS, we turn on QEMU's PSCI implementation with SMC +as the PSCI conduit. When there's no ``-kernel`` option, we assume the user +provides EL3 firmware to handle PSCI. + +A few examples: + +Direct Linux boot of a generic ARM64 upstream Linux kernel: + +.. code-block:: bash + + $ qemu-system-aarch64 -M xlnx-versal-virt -m 2G \ + -serial mon:stdio -display none \ + -kernel arch/arm64/boot/Image \ + -nic user -nic user \ + -device virtio-rng-device,bus=virtio-mmio-bus.0 \ + -drive if=none,index=0,file=hd0.qcow2,id=hd0,snapshot \ + -drive file=qemu_sd.qcow2,if=sd,index=0,snapshot \ + -device virtio-blk-device,drive=hd0 -append root=/dev/vda + +Direct Linux boot of PetaLinux 2019.2: + +.. code-block:: bash + + $ qemu-system-aarch64 -M xlnx-versal-virt -m 2G \ + -serial mon:stdio -display none \ + -kernel petalinux-v2019.2/Image \ + -append "rdinit=/sbin/init console=ttyAMA0,115200n8 earlycon=pl011,mmio,0xFF000000,115200n8" \ + -net nic,model=cadence_gem,netdev=net0 -netdev user,id=net0 \ + -device virtio-rng-device,bus=virtio-mmio-bus.0,rng=rng0 \ + -object rng-random,filename=/dev/urandom,id=rng0 + +Boot PetaLinux 2019.2 via ARM Trusted Firmware (2018.3 because the 2019.2 +version of ATF tries to configure the CCI which we don't model) and U-boot: + +.. code-block:: bash + + $ qemu-system-aarch64 -M xlnx-versal-virt -m 2G \ + -serial stdio -display none \ + -device loader,file=petalinux-v2018.3/bl31.elf,cpu-num=0 \ + -device loader,file=petalinux-v2019.2/u-boot.elf \ + -device loader,addr=0x20000000,file=petalinux-v2019.2/Image \ + -nic user -nic user \ + -device virtio-rng-device,bus=virtio-mmio-bus.0,rng=rng0 \ + -object rng-random,filename=/dev/urandom,id=rng0 + +Run the following at the U-Boot prompt: + +.. code-block:: bash + + Versal> + fdt addr $fdtcontroladdr + fdt move $fdtcontroladdr 0x40000000 + fdt set /timer clock-frequency <0x3dfd240> + setenv bootargs "rdinit=/sbin/init maxcpus=1 console=ttyAMA0,115200n8 earlycon=pl011,mmio,0xFF000000,115200n8" + booti 20000000 - 40000000 + fdt addr $fdtcontroladdr + +Boot Linux as DOM0 on Xen via U-Boot: + +.. 
code-block:: bash + + $ qemu-system-aarch64 -M xlnx-versal-virt -m 4G \ + -serial stdio -display none \ + -device loader,file=petalinux-v2019.2/u-boot.elf,cpu-num=0 \ + -device loader,addr=0x30000000,file=linux/2018-04-24/xen \ + -device loader,addr=0x40000000,file=petalinux-v2019.2/Image \ + -nic user -nic user \ + -device virtio-rng-device,bus=virtio-mmio-bus.0,rng=rng0 \ + -object rng-random,filename=/dev/urandom,id=rng0 + +Run the following at the U-Boot prompt: + +.. code-block:: bash + + Versal> + fdt addr $fdtcontroladdr + fdt move $fdtcontroladdr 0x20000000 + fdt set /timer clock-frequency <0x3dfd240> + fdt set /chosen xen,xen-bootargs "console=dtuart dtuart=/uart@ff000000 dom0_mem=640M bootscrub=0 maxcpus=1 timer_slop=0" + fdt set /chosen xen,dom0-bootargs "rdinit=/sbin/init clk_ignore_unused console=hvc0 maxcpus=1" + fdt mknode /chosen dom0 + fdt set /chosen/dom0 compatible "xen,multiboot-module" + fdt set /chosen/dom0 reg <0x00000000 0x40000000 0x0 0x03100000> + booti 30000000 - 20000000 + +Boot Linux as Dom0 on Xen via ARM Trusted Firmware and U-Boot: + +.. code-block:: bash + + $ qemu-system-aarch64 -M xlnx-versal-virt -m 4G \ + -serial stdio -display none \ + -device loader,file=petalinux-v2018.3/bl31.elf,cpu-num=0 \ + -device loader,file=petalinux-v2019.2/u-boot.elf \ + -device loader,addr=0x30000000,file=linux/2018-04-24/xen \ + -device loader,addr=0x40000000,file=petalinux-v2019.2/Image \ + -nic user -nic user \ + -device virtio-rng-device,bus=virtio-mmio-bus.0,rng=rng0 \ + -object rng-random,filename=/dev/urandom,id=rng0 + +Run the following at the U-Boot prompt: + +.. code-block:: bash + + Versal> + fdt addr $fdtcontroladdr + fdt move $fdtcontroladdr 0x20000000 + fdt set /timer clock-frequency <0x3dfd240> + fdt set /chosen xen,xen-bootargs "console=dtuart dtuart=/uart@ff000000 dom0_mem=640M bootscrub=0 maxcpus=1 timer_slop=0" + fdt set /chosen xen,dom0-bootargs "rdinit=/sbin/init clk_ignore_unused console=hvc0 maxcpus=1" + fdt mknode /chosen dom0 + fdt set /chosen/dom0 compatible "xen,multiboot-module" + fdt set /chosen/dom0 reg <0x00000000 0x40000000 0x0 0x03100000> + booti 30000000 - 20000000 + +BBRAM File Backend +"""""""""""""""""" +BBRAM can have an optional file backend, which must be a seekable +binary file with a size of 36 bytes or larger. A file with all +binary 0s is a 'blank'. + +To add a file-backend for the BBRAM: + +.. code-block:: bash + + -drive if=pflash,index=0,file=versal-bbram.bin,format=raw + +To use a different index value, N, from default of 0, add: + +.. code-block:: bash + + -global xlnx,bbram-ctrl.drive-index=N + +eFUSE File Backend +"""""""""""""""""" +eFUSE can have an optional file backend, which must be a seekable +binary file with a size of 3072 bytes or larger. A file with all +binary 0s is a 'blank'. + +To add a file-backend for the eFUSE: + +.. code-block:: bash + + -drive if=pflash,index=1,file=versal-efuse.bin,format=raw + +To use a different index value, N, from default of 1, add: + +.. code-block:: bash + + -global xlnx,efuse.drive-index=N + +.. warning:: + In actual physical Versal, BBRAM and eFUSE contain sensitive data. + The QEMU device models do **not** encrypt nor obfuscate any data + when holding them in models' memory or when writing them to their + file backends. + + Thus, a file backend should be used with caution, and 'format=luks' + is highly recommended (albeit with usage complexity). + + Better yet, do not use actual product data when running guest image + on this Xilinx Versal Virt board. 
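+
+As a minimal sketch (assuming the ``versal-bbram.bin`` and ``versal-efuse.bin``
+file names used above), blank backing files of the minimum sizes can be created
+and both backends attached as follows:
+
+.. code-block:: bash
+
+  # create all-zero ('blank') backing files of the minimum sizes
+  $ dd if=/dev/zero of=versal-bbram.bin bs=1 count=36
+  $ dd if=/dev/zero of=versal-efuse.bin bs=1 count=3072
+
+  $ qemu-system-aarch64 -M xlnx-versal-virt -m 2G \
+      -serial mon:stdio -display none \
+      -drive if=pflash,index=0,file=versal-bbram.bin,format=raw \
+      -drive if=pflash,index=1,file=versal-efuse.bin,format=raw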
diff --git a/docs/system/arm/xscale.rst b/docs/system/arm/xscale.rst new file mode 100644 index 00000000..d2d5949e --- /dev/null +++ b/docs/system/arm/xscale.rst @@ -0,0 +1,35 @@ +Sharp XScale-based PDA models (``akita``, ``borzoi``, ``spitz``, ``terrier``, ``tosa``) +======================================================================================= + +The Sharp Zaurus are PDAs based on XScale, able to run Linux ('SL series'). + +The SL-6000 (\"Tosa\"), released in 2005, uses a PXA255 System-on-chip. + +The SL-C3000 (\"Spitz\"), SL-C1000 (\"Akita\"), SL-C3100 (\"Borzoi\") and +SL-C3200 (\"Terrier\") use a PXA270. + +The clamshell PDA models emulation includes the following peripherals: + +- Intel PXA255/PXA270 System-on-chip (ARMv5TE core) + +- NAND Flash memory - not in \"Tosa\" + +- IBM/Hitachi DSCM microdrive in a PXA PCMCIA slot - not in \"Akita\" + +- On-chip OHCI USB controller - not in \"Tosa\" + +- On-chip LCD controller + +- On-chip Real Time Clock + +- TI ADS7846 touchscreen controller on SSP bus + +- Maxim MAX1111 analog-digital converter on |I2C| bus + +- GPIO-connected keyboard controller and LEDs + +- Secure Digital card connected to PXA MMC/SD host + +- Three on-chip UARTs + +- WM8750 audio CODEC on |I2C| and |I2S| busses diff --git a/docs/system/authz.rst b/docs/system/authz.rst new file mode 100644 index 00000000..55b7315e --- /dev/null +++ b/docs/system/authz.rst @@ -0,0 +1,257 @@ +.. _client authorization: + +Client authorization +-------------------- + +When configuring a QEMU network backend with either TLS certificates or SASL +authentication, access will be granted if the client successfully proves +their identity. If the authorization identity database is scoped to the QEMU +client this may be sufficient. It is common, however, for the identity database +to be much broader and thus authentication alone does not enable sufficient +access control. In this case QEMU provides a flexible system for enforcing +finer grained authorization on clients post-authentication. + +Identity providers +~~~~~~~~~~~~~~~~~~ + +At the time of writing there are two authentication frameworks used by QEMU +that emit an identity upon completion. + + * TLS x509 certificate distinguished name. + + When configuring the QEMU backend as a network server with TLS, there + are a choice of credentials to use. The most common scenario is to utilize + x509 certificates. The simplest configuration only involves issuing + certificates to the servers, allowing the client to avoid a MITM attack + against their intended server. + + It is possible, however, to enable mutual verification by requiring that + the client provide a certificate to the server to prove its own identity. + This is done by setting the property ``verify-peer=yes`` on the + ``tls-creds-x509`` object, which is in fact the default. + + When peer verification is enabled, client will need to be issued with a + certificate by the same certificate authority as the server. If this is + still not sufficiently strong access control the Distinguished Name of + the certificate can be used as an identity in the QEMU authorization + framework. + + * SASL username. + + When configuring the QEMU backend as a network server with SASL, upon + completion of the SASL authentication mechanism, a username will be + provided. The format of this username will vary depending on the choice + of mechanism configured for SASL. 
It might be a simple UNIX style user + ``joebloggs``, while if using Kerberos/GSSAPI it can have a realm + attached ``joebloggs@QEMU.ORG``. Whatever format the username is presented + in, it can be used with the QEMU authorization framework. + +Authorization drivers +~~~~~~~~~~~~~~~~~~~~~ + +The QEMU authorization framework is a general purpose design with choice of +user customizable drivers. These are provided as objects that can be +created at startup using the ``-object`` argument, or at runtime using the +``object_add`` monitor command. + +Simple +^^^^^^ + +This authorization driver provides a simple mechanism for granting access +based on an exact match against a single identity. This is useful when it is +known that only a single client is to be allowed access. + +A possible use case would be when configuring QEMU for an incoming live +migration. It is known exactly which source QEMU the migration is expected +to arrive from. The x509 certificate associated with this source QEMU would +thus be used as the identity to match against. Alternatively if the virtual +machine is dedicated to a specific tenant, then the VNC server would be +configured with SASL and the username of only that tenant listed. + +To create an instance of this driver via QMP: + +:: + + { + "execute": "object-add", + "arguments": { + "qom-type": "authz-simple", + "id": "authz0", + "identity": "fred" + } + } + + +Or via the command line + +:: + + -object authz-simple,id=authz0,identity=fred + + +List +^^^^ + +In some network backends it will be desirable to grant access to a range of +clients. This authorization driver provides a list mechanism for granting +access by matching identities against a list of permitted one. Each match +rule has an associated policy and a catch all policy applies if no rule +matches. The match can either be done as an exact string comparison, or can +use the shell-like glob syntax, which allows for use of wildcards. + +To create an instance of this class via QMP: + +:: + + { + "execute": "object-add", + "arguments": { + "qom-type": "authz-list", + "id": "authz0", + "rules": [ + { "match": "fred", "policy": "allow", "format": "exact" }, + { "match": "bob", "policy": "allow", "format": "exact" }, + { "match": "danb", "policy": "deny", "format": "exact" }, + { "match": "dan*", "policy": "allow", "format": "glob" } + ], + "policy": "deny" + } + } + + +Due to the way this driver requires setting nested properties, creating +it on the command line will require use of the JSON syntax for ``-object``. +In most cases, however, the next driver will be more suitable. + +List file +^^^^^^^^^ + +This is a variant on the previous driver that allows for a more dynamic +access control policy by storing the match rules in a standalone file +that can be reloaded automatically upon change. + +To create an instance of this class via QMP: + +:: + + { + "execute": "object-add", + "arguments": { + "qom-type": "authz-list-file", + "id": "authz0", + "filename": "/etc/qemu/myvm-vnc.acl", + "refresh": true + } + } + + +If ``refresh`` is ``yes``, inotify is used to monitor for changes +to the file and auto-reload the rules. 
+ +The ``myvm-vnc.acl`` file should contain the match rules in a format that +closely matches the previous driver: + +:: + + { + "rules": [ + { "match": "fred", "policy": "allow", "format": "exact" }, + { "match": "bob", "policy": "allow", "format": "exact" }, + { "match": "danb", "policy": "deny", "format": "exact" }, + { "match": "dan*", "policy": "allow", "format": "glob" } + ], + "policy": "deny" + } + + +The object can be created on the command line using + +:: + + -object authz-list-file,id=authz0,\ + filename=/etc/qemu/myvm-vnc.acl,refresh=on + + +PAM +^^^ + +In some scenarios it might be desirable to integrate with authorization +mechanisms that are implemented outside of QEMU. In order to allow maximum +flexibility, QEMU provides a driver that uses the ``PAM`` framework. + +To create an instance of this class via QMP: + +:: + + { + "execute": "object-add", + "arguments": { + "qom-type": "authz-pam", + "id": "authz0", + "parameters": { + "service": "qemu-vnc-tls" + } + } + } + + +The driver only uses the PAM "account" verification +subsystem. The above config would require a config +file /etc/pam.d/qemu-vnc-tls. For a simple file +lookup it would contain + +:: + + account requisite pam_listfile.so item=user sense=allow \ + file=/etc/qemu/vnc.allow + + +The external file would then contain a list of usernames. +If x509 cert was being used as the username, a suitable +entry would match the distinguished name: + +:: + + CN=laptop.berrange.com,O=Berrange Home,L=London,ST=London,C=GB + + +On the command line it can be created using + +:: + + -object authz-pam,id=authz0,service=qemu-vnc-tls + + +There are a variety of PAM plugins that can be used which are not illustrated +here, and it is possible to implement brand new plugins using the PAM API. + + +Connecting backends +~~~~~~~~~~~~~~~~~~~ + +The authorization driver is created using the ``-object`` argument and then +needs to be associated with a network service. The authorization driver object +will be given a unique ID that needs to be referenced. + +The property to set in the network service will vary depending on the type of +identity to verify. By convention, any network server backend that uses TLS +will provide ``tls-authz`` property, while any server using SASL will provide +a ``sasl-authz`` property. + +Thus an example using SASL and authorization for the VNC server would look +like: + +:: + + $QEMU --object authz-simple,id=authz0,identity=fred \ + --vnc 0.0.0.0:1,sasl,sasl-authz=authz0 + +While to validate both the x509 certificate and SASL username: + +:: + + echo "CN=laptop.qemu.org,O=QEMU Project,L=London,ST=London,C=GB" >> tls.acl + $QEMU --object authz-simple,id=authz0,identity=fred \ + --object authz-list-file,id=authz1,filename=tls.acl \ + --object tls-creds-x509,id=tls0,dir=/etc/qemu/tls,verify-peer=yes \ + --vnc 0.0.0.0:1,sasl,sasl-authz=auth0,tls-creds=tls0,tls-authz=authz1 diff --git a/docs/system/barrier.rst b/docs/system/barrier.rst new file mode 100644 index 00000000..155d7d29 --- /dev/null +++ b/docs/system/barrier.rst @@ -0,0 +1,44 @@ +QEMU Barrier Client +=================== + +Generally, mouse and keyboard are grabbed through the QEMU video +interface emulation. + +But when we want to use a video graphic adapter via a PCI passthrough +there is no way to provide the keyboard and mouse inputs to the VM +except by plugging a second set of mouse and keyboard to the host +or by installing a KVM software in the guest OS. 
+ +The QEMU Barrier client avoids this by implementing directly the Barrier +protocol into QEMU. + +`Barrier <https://github.com/debauchee/barrier>`__ +is a KVM (Keyboard-Video-Mouse) software forked from Symless's +synergy 1.9 codebase. + +This protocol is enabled by adding an input-barrier object to QEMU. + +Syntax:: + + input-barrier,id=<object-id>,name=<guest display name> + [,server=<barrier server address>][,port=<barrier server port>] + [,x-origin=<x-origin>][,y-origin=<y-origin>] + [,width=<width>][,height=<height>] + +The object can be added on the QEMU command line, for instance with:: + + -object input-barrier,id=barrier0,name=VM-1 + +where VM-1 is the name the display configured in the Barrier server +on the host providing the mouse and the keyboard events. + +by default ``<barrier server address>`` is ``localhost``, +``<port>`` is ``24800``, ``<x-origin>`` and ``<y-origin>`` are set to ``0``, +``<width>`` and ``<height>`` to ``1920`` and ``1080``. + +If the Barrier server is stopped QEMU needs to be reconnected manually, +by removing and re-adding the input-barrier object, for instance +with the help of the HMP monitor:: + + (qemu) object_del barrier0 + (qemu) object_add input-barrier,id=barrier0,name=VM-1 diff --git a/docs/system/bootindex.rst b/docs/system/bootindex.rst new file mode 100644 index 00000000..8b057f81 --- /dev/null +++ b/docs/system/bootindex.rst @@ -0,0 +1,76 @@ +Managing device boot order with bootindex properties +==================================================== + +QEMU can tell QEMU-aware guest firmware (like the x86 PC BIOS) +which order it should look for a bootable OS on which devices. +A simple way to set this order is to use the ``-boot order=`` option, +but you can also do this more flexibly, by setting a ``bootindex`` +property on the individual block or net devices you specify +on the QEMU command line. + +The ``bootindex`` properties are used to determine the order in which +firmware will consider devices for booting the guest OS. If the +``bootindex`` property is not set for a device, it gets the lowest +boot priority. There is no particular order in which devices with no +``bootindex`` property set will be considered for booting, but they +will still be bootable. + +Some guest machine types (for instance the s390x machines) do +not support ``-boot order=``; on those machines you must always +use ``bootindex`` properties. + +There is no way to set a ``bootindex`` property if you are using +a short-form option like ``-hda`` or ``-cdrom``, so to use +``bootindex`` properties you will need to expand out those options +into long-form ``-drive`` and ``-device`` option pairs. + +Example +------- + +Let's assume we have a QEMU machine with two NICs (virtio, e1000) and two +disks (IDE, virtio): + +.. parsed-literal:: + + |qemu_system| -drive file=disk1.img,if=none,id=disk1 \\ + -device ide-hd,drive=disk1,bootindex=4 \\ + -drive file=disk2.img,if=none,id=disk2 \\ + -device virtio-blk-pci,drive=disk2,bootindex=3 \\ + -netdev type=user,id=net0 \\ + -device virtio-net-pci,netdev=net0,bootindex=2 \\ + -netdev type=user,id=net1 \\ + -device e1000,netdev=net1,bootindex=1 + +Given the command above, firmware should try to boot from the e1000 NIC +first. If this fails, it should try the virtio NIC next; if this fails +too, it should try the virtio disk, and then the IDE disk. + +Limitations +----------- + +Some firmware has limitations on which devices can be considered for +booting. 
For instance, the PC BIOS boot specification allows only one +disk to be bootable. If boot from disk fails for some reason, the BIOS +won't retry booting from other disk. It can still try to boot from +floppy or net, though. + +Sometimes, firmware cannot map the device path QEMU wants firmware to +boot from to a boot method. It doesn't happen for devices the firmware +can natively boot from, but if firmware relies on an option ROM for +booting, and the same option ROM is used for booting from more then one +device, the firmware may not be able to ask the option ROM to boot from +a particular device reliably. For instance with the PC BIOS, if a SCSI HBA +has three bootable devices target1, target3, target5 connected to it, +the option ROM will have a boot method for each of them, but it is not +possible to map from boot method back to a specific target. This is a +shortcoming of the PC BIOS boot specification. + +Mixing bootindex and boot order parameters +------------------------------------------ + +Note that it does not make sense to use the bootindex property together +with the ``-boot order=...`` (or ``-boot once=...``) parameter. The guest +firmware implementations normally either support the one or the other, +but not both parameters at the same time. Mixing them will result in +undefined behavior, and thus the guest firmware will likely not boot +from the expected devices. diff --git a/docs/system/confidential-guest-support.rst b/docs/system/confidential-guest-support.rst new file mode 100644 index 00000000..0c490dbd --- /dev/null +++ b/docs/system/confidential-guest-support.rst @@ -0,0 +1,44 @@ +Confidential Guest Support +========================== + +Traditionally, hypervisors such as QEMU have complete access to a +guest's memory and other state, meaning that a compromised hypervisor +can compromise any of its guests. A number of platforms have added +mechanisms in hardware and/or firmware which give guests at least some +protection from a compromised hypervisor. This is obviously +especially desirable for public cloud environments. + +These mechanisms have different names and different modes of +operation, but are often referred to as Secure Guests or Confidential +Guests. We use the term "Confidential Guest Support" to distinguish +this from other aspects of guest security (such as security against +attacks from other guests, or from network sources). + +Running a Confidential Guest +---------------------------- + +To run a confidential guest you need to add two command line parameters: + +1. Use ``-object`` to create a "confidential guest support" object. The + type and parameters will vary with the specific mechanism to be + used +2. Set the ``confidential-guest-support`` machine parameter to the ID of + the object from (1). + +Example (for AMD SEV):: + + qemu-system-x86_64 \ + <other parameters> \ + -machine ...,confidential-guest-support=sev0 \ + -object sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=1 + +Supported mechanisms +-------------------- + +Currently supported confidential guest mechanisms are: + +* AMD Secure Encrypted Virtualization (SEV) (see :doc:`i386/amd-memory-encryption`) +* POWER Protected Execution Facility (PEF) (see :ref:`power-papr-protected-execution-facility-pef`) +* s390x Protected Virtualization (PV) (see :doc:`s390x/protvirt`) + +Other mechanisms may be supported in future. 
diff --git a/docs/system/cpu-hotplug.rst b/docs/system/cpu-hotplug.rst new file mode 100644 index 00000000..015ce2b6 --- /dev/null +++ b/docs/system/cpu-hotplug.rst @@ -0,0 +1,142 @@ +=================== +Virtual CPU hotplug +=================== + +A complete example of vCPU hotplug (and hot-unplug) using QMP +``device_add`` and ``device_del``. + +vCPU hotplug +------------ + +(1) Launch QEMU as follows (note that the "maxcpus" is mandatory to + allow vCPU hotplug):: + + $ qemu-system-x86_64 -display none -no-user-config -m 2048 \ + -nodefaults -monitor stdio -machine pc,accel=kvm,usb=off \ + -smp 1,maxcpus=2 -cpu IvyBridge-IBRS \ + -qmp unix:/tmp/qmp-sock,server=on,wait=off + +(2) Run 'qmp-shell' (located in the source tree, under: "scripts/qmp/) + to connect to the just-launched QEMU:: + + $> ./qmp-shell -p -v /tmp/qmp-sock + [...] + (QEMU) + +(3) Find out which CPU types could be plugged, and into which sockets:: + + (QEMU) query-hotpluggable-cpus + { + "execute": "query-hotpluggable-cpus", + "arguments": {} + } + { + "return": [ + { + "type": "IvyBridge-IBRS-x86_64-cpu", + "vcpus-count": 1, + "props": { + "socket-id": 1, + "core-id": 0, + "thread-id": 0 + } + }, + { + "qom-path": "/machine/unattached/device[0]", + "type": "IvyBridge-IBRS-x86_64-cpu", + "vcpus-count": 1, + "props": { + "socket-id": 0, + "core-id": 0, + "thread-id": 0 + } + } + ] + } + (QEMU) + +(4) The ``query-hotpluggable-cpus`` command returns an object for CPUs + that are present (containing a "qom-path" member) or which may be + hot-plugged (no "qom-path" member). From its output in step (3), we + can see that ``IvyBridge-IBRS-x86_64-cpu`` is present in socket 0, + while hot-plugging a CPU into socket 1 requires passing the listed + properties to QMP ``device_add``:: + + (QEMU) device_add id=cpu-2 driver=IvyBridge-IBRS-x86_64-cpu socket-id=1 core-id=0 thread-id=0 + { + "execute": "device_add", + "arguments": { + "socket-id": 1, + "driver": "IvyBridge-IBRS-x86_64-cpu", + "id": "cpu-2", + "core-id": 0, + "thread-id": 0 + } + } + { + "return": {} + } + (QEMU) + +(5) Optionally, run QMP ``query-cpus-fast`` for some details about the + vCPUs:: + + (QEMU) query-cpus-fast + { + "execute": "query-cpus-fast", + "arguments": {} + } + { + "return": [ + { + "qom-path": "/machine/unattached/device[0]", + "target": "x86_64", + "thread-id": 11534, + "cpu-index": 0, + "props": { + "socket-id": 0, + "core-id": 0, + "thread-id": 0 + }, + "arch": "x86" + }, + { + "qom-path": "/machine/peripheral/cpu-2", + "target": "x86_64", + "thread-id": 12106, + "cpu-index": 1, + "props": { + "socket-id": 1, + "core-id": 0, + "thread-id": 0 + }, + "arch": "x86" + } + ] + } + (QEMU) + +vCPU hot-unplug +--------------- + +From the 'qmp-shell', invoke the QMP ``device_del`` command:: + + (QEMU) device_del id=cpu-2 + { + "execute": "device_del", + "arguments": { + "id": "cpu-2" + } + } + { + "return": {} + } + (QEMU) + +.. note:: + vCPU hot-unplug requires guest cooperation; so the ``device_del`` + command above does not guarantee vCPU removal -- it's a "request to + unplug". At this point, the guest will get a System Control + Interrupt (SCI) and calls the ACPI handler for the affected vCPU + device. Then the guest kernel will bring the vCPU offline and tell + QEMU to unplug it. 
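+
+To verify that the hot-unplug completed (a sketch; removal only happens once
+the guest has actually released the vCPU), you can re-run the command from
+step (3) and check that the socket 1 entry is again reported without a
+"qom-path" member, i.e. it is available for hot-plugging once more::
+
+    (QEMU) query-hotpluggable-cpus
+    {
+        "execute": "query-hotpluggable-cpus",
+        "arguments": {}
+    }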
diff --git a/docs/system/cpu-models-mips.rst.inc b/docs/system/cpu-models-mips.rst.inc new file mode 100644 index 00000000..02cc4bb8 --- /dev/null +++ b/docs/system/cpu-models-mips.rst.inc @@ -0,0 +1,111 @@ +Supported CPU model configurations on MIPS hosts +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +QEMU supports variety of MIPS CPU models: + +Supported CPU models for MIPS32 hosts +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The following CPU models are supported for use on MIPS32 hosts. +Administrators / applications are recommended to use the CPU model that +matches the generation of the host CPUs in use. In a deployment with a +mixture of host CPU models between machines, if live migration +compatibility is required, use the newest CPU model that is compatible +across all desired hosts. + +``mips32r6-generic`` + MIPS32 Processor (Release 6, 2015) + +``P5600`` + MIPS32 Processor (P5600, 2014) + +``M14K``, ``M14Kc`` + MIPS32 Processor (M14K, 2009) + +``74Kf`` + MIPS32 Processor (74K, 2007) + +``34Kf`` + MIPS32 Processor (34K, 2006) + +``24Kc``, ``24KEc``, ``24Kf`` + MIPS32 Processor (24K, 2003) + +``4Kc``, ``4Km``, ``4KEcR1``, ``4KEmR1``, ``4KEc``, ``4KEm`` + MIPS32 Processor (4K, 1999) + + +Supported CPU models for MIPS64 hosts +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The following CPU models are supported for use on MIPS64 hosts. +Administrators / applications are recommended to use the CPU model that +matches the generation of the host CPUs in use. In a deployment with a +mixture of host CPU models between machines, if live migration +compatibility is required, use the newest CPU model that is compatible +across all desired hosts. + +``I6400`` + MIPS64 Processor (Release 6, 2014) + +``Loongson-2E`` + MIPS64 Processor (Loongson 2, 2006) + +``Loongson-2F`` + MIPS64 Processor (Loongson 2, 2008) + +``Loongson-3A1000`` + MIPS64 Processor (Loongson 3, 2010) + +``Loongson-3A4000`` + MIPS64 Processor (Loongson 3, 2018) + +``mips64dspr2`` + MIPS64 Processor (Release 2, 2006) + +``MIPS64R2-generic``, ``5KEc``, ``5KEf`` + MIPS64 Processor (Release 2, 2002) + +``20Kc`` + MIPS64 Processor (20K, 2000 + +``5Kc``, ``5Kf`` + MIPS64 Processor (5K, 1999) + +``VR5432`` + MIPS64 Processor (VR, 1998) + +``R4000`` + MIPS64 Processor (MIPS III, 1991) + + +Supported CPU models for nanoMIPS hosts +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The following CPU models are supported for use on nanoMIPS hosts. +Administrators / applications are recommended to use the CPU model that +matches the generation of the host CPUs in use. In a deployment with a +mixture of host CPU models between machines, if live migration +compatibility is required, use the newest CPU model that is compatible +across all desired hosts. 
+ +``I7200`` + MIPS I7200 (nanoMIPS, 2018) + +Preferred CPU models for MIPS hosts +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The following CPU models are preferred for use on different MIPS hosts: + +``MIPS III`` + R4000 + +``MIPS32R2`` + 34Kf + +``MIPS64R6`` + I6400 + +``nanoMIPS`` + I7200 + diff --git a/docs/system/cpu-models-x86-abi.csv b/docs/system/cpu-models-x86-abi.csv new file mode 100644 index 00000000..f3f3b60b --- /dev/null +++ b/docs/system/cpu-models-x86-abi.csv @@ -0,0 +1,67 @@ +Model,baseline,v2,v3,v4 +486-v1,,,, +Broadwell-v1,✅,✅,✅, +Broadwell-v2,✅,✅,✅, +Broadwell-v3,✅,✅,✅, +Broadwell-v4,✅,✅,✅, +Cascadelake-Server-v1,✅,✅,✅,✅ +Cascadelake-Server-v2,✅,✅,✅,✅ +Cascadelake-Server-v3,✅,✅,✅,✅ +Cascadelake-Server-v4,✅,✅,✅,✅ +Conroe-v1,✅,,, +Cooperlake-v1,✅,✅,✅,✅ +Denverton-v1,✅,✅,, +Denverton-v2,✅,✅,, +Dhyana-v1,✅,✅,✅, +EPYC-Milan-v1,✅,✅,✅, +EPYC-Rome-v1,✅,✅,✅, +EPYC-Rome-v2,✅,✅,✅, +EPYC-v1,✅,✅,✅, +EPYC-v2,✅,✅,✅, +EPYC-v3,✅,✅,✅, +Haswell-v1,✅,✅,✅, +Haswell-v2,✅,✅,✅, +Haswell-v3,✅,✅,✅, +Haswell-v4,✅,✅,✅, +Icelake-Client-v1,✅,✅,✅, +Icelake-Client-v2,✅,✅,✅, +Icelake-Server-v1,✅,✅,✅,✅ +Icelake-Server-v2,✅,✅,✅,✅ +Icelake-Server-v3,✅,✅,✅,✅ +Icelake-Server-v4,✅,✅,✅,✅ +IvyBridge-v1,✅,✅,, +IvyBridge-v2,✅,✅,, +KnightsMill-v1,✅,✅,✅, +Nehalem-v1,✅,✅,, +Nehalem-v2,✅,✅,, +Opteron_G1-v1,✅,,, +Opteron_G2-v1,✅,,, +Opteron_G3-v1,✅,,, +Opteron_G4-v1,✅,✅,, +Opteron_G5-v1,✅,✅,, +Penryn-v1,✅,,, +SandyBridge-v1,✅,✅,, +SandyBridge-v2,✅,✅,, +Skylake-Client-v1,✅,✅,✅, +Skylake-Client-v2,✅,✅,✅, +Skylake-Client-v3,✅,✅,✅, +Skylake-Server-v1,✅,✅,✅,✅ +Skylake-Server-v2,✅,✅,✅,✅ +Skylake-Server-v3,✅,✅,✅,✅ +Skylake-Server-v4,✅,✅,✅,✅ +Snowridge-v1,✅,✅,, +Snowridge-v2,✅,✅,, +Westmere-v1,✅,✅,, +Westmere-v2,✅,✅,, +athlon-v1,,,, +core2duo-v1,✅,,, +coreduo-v1,,,, +kvm32-v1,,,, +kvm64-v1,✅,,, +n270-v1,,,, +pentium-v1,,,, +pentium2-v1,,,, +pentium3-v1,,,, +phenom-v1,✅,,, +qemu32-v1,,,, +qemu64-v1,✅,,, diff --git a/docs/system/cpu-models-x86.rst.inc b/docs/system/cpu-models-x86.rst.inc new file mode 100644 index 00000000..7f6368f9 --- /dev/null +++ b/docs/system/cpu-models-x86.rst.inc @@ -0,0 +1,440 @@ +Recommendations for KVM CPU model configuration on x86 hosts +============================================================ + +The information that follows provides recommendations for configuring +CPU models on x86 hosts. The goals are to maximise performance, while +protecting guest OS against various CPU hardware flaws, and optionally +enabling live migration between hosts with heterogeneous CPU models. + + +Two ways to configure CPU models with QEMU / KVM +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +(1) **Host passthrough** + + This passes the host CPU model features, model, stepping, exactly to + the guest. Note that KVM may filter out some host CPU model features + if they cannot be supported with virtualization. Live migration is + unsafe when this mode is used as libvirt / QEMU cannot guarantee a + stable CPU is exposed to the guest across hosts. This is the + recommended CPU to use, provided live migration is not required. + +(2) **Named model** + + QEMU comes with a number of predefined named CPU models, that + typically refer to specific generations of hardware released by + Intel and AMD. These allow the guest VMs to have a degree of + isolation from the host CPU, allowing greater flexibility in live + migrating between hosts with differing hardware. @end table + +In both cases, it is possible to optionally add or remove individual CPU +features, to alter what is presented to the guest by default. 
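+
+A quick way to see which named CPU models and which feature flags a given
+QEMU binary knows about (the exact list varies with the QEMU version and
+target) is the built-in help for the ``-cpu`` option:
+
+.. parsed-literal::
+
+   |qemu_system| -cpu help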
+
+Libvirt supports a third way to configure CPU models known as "Host
+model". This uses the QEMU "Named model" feature, automatically picking
+a CPU model that is similar to the host CPU, and then adding extra
+features to approximate the host model as closely as possible. This does
+not guarantee the CPU family, stepping, etc. will precisely match the
+host CPU, as they would with "Host passthrough", but gives much of the
+benefit of passthrough, while making live migration safe.
+
+
+ABI compatibility levels for CPU models
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The x86_64 architecture has a number of `ABI compatibility levels`_
+defined. Traditionally most operating systems and toolchains would only
+target the original baseline ABI, but it is expected that future OSes
+and toolchains will increasingly target the newer ABIs. The table that
+follows illustrates which ABI compatibility levels can be satisfied by
+the QEMU CPU models. Note that the table only lists the long term
+stable CPU model versions (e.g. Haswell-v4). In addition to what is
+listed, there are also many CPU model aliases which resolve to a
+different CPU model version, depending on which machine type is in use.
+
+.. _ABI compatibility levels: https://gitlab.com/x86-psABIs/x86-64-ABI/
+
+.. csv-table:: x86-64 ABI compatibility levels
+   :file: cpu-models-x86-abi.csv
+   :widths: 40,15,15,15,15
+   :header-rows: 2
+
+
+Preferred CPU models for Intel x86 hosts
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following CPU models are preferred for use on Intel hosts.
+Administrators / applications are recommended to use the CPU model that
+matches the generation of the host CPUs in use. In a deployment with a
+mixture of host CPU models between machines, if live migration
+compatibility is required, use the newest CPU model that is compatible
+across all desired hosts.
+
+``Cascadelake-Server``, ``Cascadelake-Server-noTSX``
+    Intel Xeon Processor (Cascade Lake, 2019), with "stepping" levels 6
+    or 7 only. (The Cascade Lake Xeon processor with *stepping 5 is
+    vulnerable to MDS variants*.)
+
+``Skylake-Server``, ``Skylake-Server-IBRS``, ``Skylake-Server-IBRS-noTSX``
+    Intel Xeon Processor (Skylake, 2016)
+
+``Skylake-Client``, ``Skylake-Client-IBRS``, ``Skylake-Client-noTSX-IBRS``
+    Intel Core Processor (Skylake, 2015)
+
+``Broadwell``, ``Broadwell-IBRS``, ``Broadwell-noTSX``, ``Broadwell-noTSX-IBRS``
+    Intel Core Processor (Broadwell, 2014)
+
+``Haswell``, ``Haswell-IBRS``, ``Haswell-noTSX``, ``Haswell-noTSX-IBRS``
+    Intel Core Processor (Haswell, 2013)
+
+``IvyBridge``, ``IvyBridge-IBRS``
+    Intel Xeon E3-12xx v2 (Ivy Bridge, 2012)
+
+``SandyBridge``, ``SandyBridge-IBRS``
+    Intel Xeon E312xx (Sandy Bridge, 2011)
+
+``Westmere``, ``Westmere-IBRS``
+    Westmere E56xx/L56xx/X56xx (Nehalem-C, 2010)
+
+``Nehalem``, ``Nehalem-IBRS``
+    Intel Core i7 9xx (Nehalem Class Core i7, 2008)
+
+``Penryn``
+    Intel Core 2 Duo P9xxx (Penryn Class Core 2, 2007)
+
+``Conroe``
+    Intel Celeron_4x0 (Conroe/Merom Class Core 2, 2006)
+
+
+Important CPU features for Intel x86 hosts
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following are important CPU features that should be used on Intel
+x86 hosts, when available in the host CPU. Some of them require explicit
+configuration to enable, as they are not included by default in some, or
+all, of the named CPU models listed above. In general all of these
+features are included if using "Host passthrough" or "Host model".
+
+``pcid``
+    Recommended to mitigate the cost of the Meltdown (CVE-2017-5754) fix.
+ + Included by default in Haswell, Broadwell & Skylake Intel CPU models. + + Should be explicitly turned on for Westmere, SandyBridge, and + IvyBridge Intel CPU models. Note that some desktop/mobile Westmere + CPUs cannot support this feature. + +``spec-ctrl`` + Required to enable the Spectre v2 (CVE-2017-5715) fix. + + Included by default in Intel CPU models with -IBRS suffix. + + Must be explicitly turned on for Intel CPU models without -IBRS + suffix. + + Requires the host CPU microcode to support this feature before it + can be used for guest CPUs. + +``stibp`` + Required to enable stronger Spectre v2 (CVE-2017-5715) fixes in some + operating systems. + + Must be explicitly turned on for all Intel CPU models. + + Requires the host CPU microcode to support this feature before it can + be used for guest CPUs. + +``ssbd`` + Required to enable the CVE-2018-3639 fix. + + Not included by default in any Intel CPU model. + + Must be explicitly turned on for all Intel CPU models. + + Requires the host CPU microcode to support this feature before it + can be used for guest CPUs. + +``pdpe1gb`` + Recommended to allow guest OS to use 1GB size pages. + + Not included by default in any Intel CPU model. + + Should be explicitly turned on for all Intel CPU models. + + Note that not all CPU hardware will support this feature. + +``md-clear`` + Required to confirm the MDS (CVE-2018-12126, CVE-2018-12127, + CVE-2018-12130, CVE-2019-11091) fixes. + + Not included by default in any Intel CPU model. + + Must be explicitly turned on for all Intel CPU models. + + Requires the host CPU microcode to support this feature before it + can be used for guest CPUs. + +``mds-no`` + Recommended to inform the guest OS that the host is *not* vulnerable + to any of the MDS variants ([MFBDS] CVE-2018-12130, [MLPDS] + CVE-2018-12127, [MSBDS] CVE-2018-12126). + + This is an MSR (Model-Specific Register) feature rather than a CPUID feature, + so it will not appear in the Linux ``/proc/cpuinfo`` in the host or + guest. Instead, the host kernel uses it to populate the MDS + vulnerability file in ``sysfs``. + + So it should only be enabled for VMs if the host reports @code{Not + affected} in the ``/sys/devices/system/cpu/vulnerabilities/mds`` file. + +``taa-no`` + Recommended to inform that the guest that the host is ``not`` + vulnerable to CVE-2019-11135, TSX Asynchronous Abort (TAA). + + This too is an MSR feature, so it does not show up in the Linux + ``/proc/cpuinfo`` in the host or guest. + + It should only be enabled for VMs if the host reports ``Not affected`` + in the ``/sys/devices/system/cpu/vulnerabilities/tsx_async_abort`` + file. + +``tsx-ctrl`` + Recommended to inform the guest that it can disable the Intel TSX + (Transactional Synchronization Extensions) feature; or, if the + processor is vulnerable, use the Intel VERW instruction (a + processor-level instruction that performs checks on memory access) as + a mitigation for the TAA vulnerability. (For details, refer to + Intel's `deep dive into MDS + <https://software.intel.com/security-software-guidance/insights/deep-dive-intel-analysis-microarchitectural-data-sampling>`_.) + + Expose this to the guest OS if and only if: (a) the host has TSX + enabled; *and* (b) the guest has ``rtm`` CPU flag enabled. + + By disabling TSX, KVM-based guests can avoid paying the price of + mitigating TSX-based attacks. + + Note that ``tsx-ctrl`` too is an MSR feature, so it does not show + up in the Linux ``/proc/cpuinfo`` in the host or guest. 
+ + To validate that Intel TSX is indeed disabled for the guest, there are + two ways: (a) check for the *absence* of ``rtm`` in the guest's + ``/proc/cpuinfo``; or (b) the + ``/sys/devices/system/cpu/vulnerabilities/tsx_async_abort`` file in + the guest should report ``Mitigation: TSX disabled``. + + +Preferred CPU models for AMD x86 hosts +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The following CPU models are preferred for use on AMD hosts. +Administrators / applications are recommended to use the CPU model that +matches the generation of the host CPUs in use. In a deployment with a +mixture of host CPU models between machines, if live migration +compatibility is required, use the newest CPU model that is compatible +across all desired hosts. + +``EPYC``, ``EPYC-IBPB`` + AMD EPYC Processor (2017) + +``Opteron_G5`` + AMD Opteron 63xx class CPU (2012) + +``Opteron_G4`` + AMD Opteron 62xx class CPU (2011) + +``Opteron_G3`` + AMD Opteron 23xx (Gen 3 Class Opteron, 2009) + +``Opteron_G2`` + AMD Opteron 22xx (Gen 2 Class Opteron, 2006) + +``Opteron_G1`` + AMD Opteron 240 (Gen 1 Class Opteron, 2004) + + +Important CPU features for AMD x86 hosts +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The following are important CPU features that should be used on AMD x86 +hosts, when available in the host CPU. Some of them require explicit +configuration to enable, as they are not included by default in some, or +all, of the named CPU models listed above. In general all of these +features are included if using "Host passthrough" or "Host model". + +``ibpb`` + Required to enable the Spectre v2 (CVE-2017-5715) fix. + + Included by default in AMD CPU models with -IBPB suffix. + + Must be explicitly turned on for AMD CPU models without -IBPB suffix. + + Requires the host CPU microcode to support this feature before it + can be used for guest CPUs. + +``stibp`` + Required to enable stronger Spectre v2 (CVE-2017-5715) fixes in some + operating systems. + + Must be explicitly turned on for all AMD CPU models. + + Requires the host CPU microcode to support this feature before it + can be used for guest CPUs. + +``virt-ssbd`` + Required to enable the CVE-2018-3639 fix + + Not included by default in any AMD CPU model. + + Must be explicitly turned on for all AMD CPU models. + + This should be provided to guests, even if amd-ssbd is also provided, + for maximum guest compatibility. + + Note for some QEMU / libvirt versions, this must be force enabled when + when using "Host model", because this is a virtual feature that + doesn't exist in the physical host CPUs. + +``amd-ssbd`` + Required to enable the CVE-2018-3639 fix + + Not included by default in any AMD CPU model. + + Must be explicitly turned on for all AMD CPU models. + + This provides higher performance than ``virt-ssbd`` so should be + exposed to guests whenever available in the host. ``virt-ssbd`` should + none the less also be exposed for maximum guest compatibility as some + kernels only know about ``virt-ssbd``. + +``amd-no-ssb`` + Recommended to indicate the host is not vulnerable CVE-2018-3639 + + Not included by default in any AMD CPU model. + + Future hardware generations of CPU will not be vulnerable to + CVE-2018-3639, and thus the guest should be told not to enable + its mitigations, by exposing amd-no-ssb. This is mutually + exclusive with virt-ssbd and amd-ssbd. + +``pdpe1gb`` + Recommended to allow guest OS to use 1GB size pages + + Not included by default in any AMD CPU model. + + Should be explicitly turned on for all AMD CPU models. 
+ + Note that not all CPU hardware will support this feature. + + +Default x86 CPU models +^^^^^^^^^^^^^^^^^^^^^^ + +The default QEMU CPU models are designed such that they can run on all +hosts. If an application does not wish to do perform any host +compatibility checks before launching guests, the default is guaranteed +to work. + +The default CPU models will, however, leave the guest OS vulnerable to +various CPU hardware flaws, so their use is strongly discouraged. +Applications should follow the earlier guidance to setup a better CPU +configuration, with host passthrough recommended if live migration is +not needed. + +``qemu32``, ``qemu64`` + QEMU Virtual CPU version 2.5+ (32 & 64 bit variants) + +``qemu64`` is used for x86_64 guests and ``qemu32`` is used for i686 +guests, when no ``-cpu`` argument is given to QEMU, or no ``<cpu>`` is +provided in libvirt XML. + +Other non-recommended x86 CPUs +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The following CPUs models are compatible with most AMD and Intel x86 +hosts, but their usage is discouraged, as they expose a very limited +featureset, which prevents guests having optimal performance. + +``kvm32``, ``kvm64`` + Common KVM processor (32 & 64 bit variants). + + Legacy models just for historical compatibility with ancient QEMU + versions. + +``486``, ``athlon``, ``phenom``, ``coreduo``, ``core2duo``, ``n270``, ``pentium``, ``pentium2``, ``pentium3`` + Various very old x86 CPU models, mostly predating the introduction + of hardware assisted virtualization, that should thus not be + required for running virtual machines. + + +Syntax for configuring CPU models +================================= + +The examples below illustrate the approach to configuring the various +CPU models / features in QEMU and libvirt. + +QEMU command line +^^^^^^^^^^^^^^^^^ + +Host passthrough: + +.. parsed-literal:: + + |qemu_system| -cpu host + +Host passthrough with feature customization: + +.. parsed-literal:: + + |qemu_system| -cpu host,vmx=off,... + +Named CPU models: + +.. parsed-literal:: + + |qemu_system| -cpu Westmere + +Named CPU models with feature customization: + +.. parsed-literal:: + + |qemu_system| -cpu Westmere,pcid=on,... + +Libvirt guest XML +^^^^^^^^^^^^^^^^^ + +Host passthrough:: + + <cpu mode='host-passthrough'/> + +Host passthrough with feature customization:: + + <cpu mode='host-passthrough'> + <feature name="vmx" policy="disable"/> + ... + </cpu> + +Host model:: + + <cpu mode='host-model'/> + +Host model with feature customization:: + + <cpu mode='host-model'> + <feature name="vmx" policy="disable"/> + ... + </cpu> + +Named model:: + + <cpu mode='custom'> + <model name="Westmere"/> + </cpu> + +Named model with feature customization:: + + <cpu mode='custom'> + <model name="Westmere"/> + <feature name="pcid" policy="require"/> + ... + </cpu> diff --git a/docs/system/device-emulation.rst b/docs/system/device-emulation.rst new file mode 100644 index 00000000..05060060 --- /dev/null +++ b/docs/system/device-emulation.rst @@ -0,0 +1,95 @@ +.. _device-emulation: + +Device Emulation +---------------- + +QEMU supports the emulation of a large number of devices from +peripherals such network cards and USB devices to integrated systems +on a chip (SoCs). Configuration of these is often a source of +confusion so it helps to have an understanding of some of the terms +used to describes devices within QEMU. + +Common Terms +~~~~~~~~~~~~ + +Device Front End +================ + +A device front end is how a device is presented to the guest. 
The type +of device presented should match the hardware that the guest operating +system is expecting to see. All devices can be specified with the +``--device`` command line option. Running QEMU with the command line +options ``--device help`` will list all devices it is aware of. Using +the command line ``--device foo,help`` will list the additional +configuration options available for that device. + +A front end is often paired with a back end, which describes how the +host's resources are used in the emulation. + +Device Buses +============ + +Most devices will exist on a BUS of some sort. Depending on the +machine model you choose (``-M foo``) a number of buses will have been +automatically created. In most cases the BUS a device is attached to +can be inferred, for example PCI devices are generally automatically +allocated to the next free address of first PCI bus found. However in +complicated configurations you can explicitly specify what bus +(``bus=ID``) a device is attached to along with its address +(``addr=N``). + +Some devices, for example a PCI SCSI host controller, will add an +additional buses to the system that other devices can be attached to. +A hypothetical chain of devices might look like: + + --device foo,bus=pci.0,addr=0,id=foo + --device bar,bus=foo.0,addr=1,id=baz + +which would be a bar device (with the ID of baz) which is attached to +the first foo bus (foo.0) at address 1. The foo device which provides +that bus is itself is attached to the first PCI bus (pci.0). + + +Device Back End +=============== + +The back end describes how the data from the emulated device will be +processed by QEMU. The configuration of the back end is usually +specific to the class of device being emulated. For example serial +devices will be backed by a ``--chardev`` which can redirect the data +to a file or socket or some other system. Storage devices are handled +by ``--blockdev`` which will specify how blocks are handled, for +example being stored in a qcow2 file or accessing a raw host disk +partition. Back ends can sometimes be stacked to implement features +like snapshots. + +While the choice of back end is generally transparent to the guest, +there are cases where features will not be reported to the guest if +the back end is unable to support it. + +Device Pass Through +=================== + +Device pass through is where the device is actually given access to +the underlying hardware. This can be as simple as exposing a single +USB device on the host system to the guest or dedicating a video card +in a PCI slot to the exclusive use of the guest. + + +Emulated Devices +~~~~~~~~~~~~~~~~ + +.. toctree:: + :maxdepth: 1 + + devices/can.rst + devices/ccid.rst + devices/cxl.rst + devices/ivshmem.rst + devices/net.rst + devices/nvme.rst + devices/usb.rst + devices/vhost-user.rst + devices/virtio-pmem.rst + devices/vhost-user-rng.rst + devices/canokey.rst diff --git a/docs/system/device-url-syntax.rst.inc b/docs/system/device-url-syntax.rst.inc new file mode 100644 index 00000000..7dbc525f --- /dev/null +++ b/docs/system/device-url-syntax.rst.inc @@ -0,0 +1,210 @@ + +In addition to using normal file images for the emulated storage +devices, QEMU can also use networked resources such as iSCSI devices. +These are specified using a special URL syntax. + +``iSCSI`` + iSCSI support allows QEMU to access iSCSI resources directly and use + as images for the guest storage. Both disk and cdrom images are + supported. 
+ + Syntax for specifying iSCSI LUNs is + "iscsi://<target-ip>[:<port>]/<target-iqn>/<lun>" + + By default qemu will use the iSCSI initiator-name + 'iqn.2008-11.org.linux-kvm[:<name>]' but this can also be set from + the command line or a configuration file. + + Since version QEMU 2.4 it is possible to specify a iSCSI request + timeout to detect stalled requests and force a reestablishment of the + session. The timeout is specified in seconds. The default is 0 which + means no timeout. Libiscsi 1.15.0 or greater is required for this + feature. + + Example (without authentication): + + .. parsed-literal:: + + |qemu_system| -iscsi initiator-name=iqn.2001-04.com.example:my-initiator \\ + -cdrom iscsi://192.0.2.1/iqn.2001-04.com.example/2 \\ + -drive file=iscsi://192.0.2.1/iqn.2001-04.com.example/1 + + Example (CHAP username/password via URL): + + .. parsed-literal:: + + |qemu_system| -drive file=iscsi://user%password@192.0.2.1/iqn.2001-04.com.example/1 + + Example (CHAP username/password via environment variables): + + .. parsed-literal:: + + LIBISCSI_CHAP_USERNAME="user" \\ + LIBISCSI_CHAP_PASSWORD="password" \\ + |qemu_system| -drive file=iscsi://192.0.2.1/iqn.2001-04.com.example/1 + +``NBD`` + QEMU supports NBD (Network Block Devices) both using TCP protocol as + well as Unix Domain Sockets. With TCP, the default port is 10809. + + Syntax for specifying a NBD device using TCP, in preferred URI form: + "nbd://<server-ip>[:<port>]/[<export>]" + + Syntax for specifying a NBD device using Unix Domain Sockets; + remember that '?' is a shell glob character and may need quoting: + "nbd+unix:///[<export>]?socket=<domain-socket>" + + Older syntax that is also recognized: + "nbd:<server-ip>:<port>[:exportname=<export>]" + + Syntax for specifying a NBD device using Unix Domain Sockets + "nbd:unix:<domain-socket>[:exportname=<export>]" + + Example for TCP + + .. parsed-literal:: + + |qemu_system| --drive file=nbd:192.0.2.1:30000 + + Example for Unix Domain Sockets + + .. parsed-literal:: + + |qemu_system| --drive file=nbd:unix:/tmp/nbd-socket + +``SSH`` + QEMU supports SSH (Secure Shell) access to remote disks. + + Examples: + + .. parsed-literal:: + + |qemu_system| -drive file=ssh://user@host/path/to/disk.img + |qemu_system| -drive file.driver=ssh,file.user=user,file.host=host,file.port=22,file.path=/path/to/disk.img + + Currently authentication must be done using ssh-agent. Other + authentication methods may be supported in future. + +``GlusterFS`` + GlusterFS is a user space distributed file system. QEMU supports the + use of GlusterFS volumes for hosting VM disk images using TCP, Unix + Domain Sockets and RDMA transport protocols. + + Syntax for specifying a VM disk image on GlusterFS volume is + + .. parsed-literal:: + + URI: + gluster[+type]://[host[:port]]/volume/path[?socket=...][,debug=N][,logfile=...] + + JSON: + 'json:{"driver":"qcow2","file":{"driver":"gluster","volume":"testvol","path":"a.img","debug":N,"logfile":"...", + "server":[{"type":"tcp","host":"...","port":"..."}, + {"type":"unix","socket":"..."}]}}' + + Example + + .. 
parsed-literal:: + + URI: + |qemu_system| --drive file=gluster://192.0.2.1/testvol/a.img, + file.debug=9,file.logfile=/var/log/qemu-gluster.log + + JSON: + |qemu_system| 'json:{"driver":"qcow2", + "file":{"driver":"gluster", + "volume":"testvol","path":"a.img", + "debug":9,"logfile":"/var/log/qemu-gluster.log", + "server":[{"type":"tcp","host":"1.2.3.4","port":24007}, + {"type":"unix","socket":"/var/run/glusterd.socket"}]}}' + |qemu_system| -drive driver=qcow2,file.driver=gluster,file.volume=testvol,file.path=/path/a.img, + file.debug=9,file.logfile=/var/log/qemu-gluster.log, + file.server.0.type=tcp,file.server.0.host=1.2.3.4,file.server.0.port=24007, + file.server.1.type=unix,file.server.1.socket=/var/run/glusterd.socket + + See also http://www.gluster.org. + +``HTTP/HTTPS/FTP/FTPS`` + QEMU supports read-only access to files accessed over http(s) and + ftp(s). + + Syntax using a single filename: + + :: + + <protocol>://[<username>[:<password>]@]<host>/<path> + + where: + + ``protocol`` + 'http', 'https', 'ftp', or 'ftps'. + + ``username`` + Optional username for authentication to the remote server. + + ``password`` + Optional password for authentication to the remote server. + + ``host`` + Address of the remote server. + + ``path`` + Path on the remote server, including any query string. + + The following options are also supported: + + ``url`` + The full URL when passing options to the driver explicitly. + + ``readahead`` + The amount of data to read ahead with each range request to the + remote server. This value may optionally have the suffix 'T', 'G', + 'M', 'K', 'k' or 'b'. If it does not have a suffix, it will be + assumed to be in bytes. The value must be a multiple of 512 bytes. + It defaults to 256k. + + ``sslverify`` + Whether to verify the remote server's certificate when connecting + over SSL. It can have the value 'on' or 'off'. It defaults to + 'on'. + + ``cookie`` + Send this cookie (it can also be a list of cookies separated by + ';') with each outgoing request. Only supported when using + protocols such as HTTP which support cookies, otherwise ignored. + + ``timeout`` + Set the timeout in seconds of the CURL connection. This timeout is + the time that CURL waits for a response from the remote server to + get the size of the image to be downloaded. If not set, the + default timeout of 5 seconds is used. + + Note that when passing options to qemu explicitly, ``driver`` is the + value of <protocol>. + + Example: boot from a remote Fedora 20 live ISO image + + .. parsed-literal:: + + |qemu_system_x86| --drive media=cdrom,file=https://archives.fedoraproject.org/pub/archive/fedora/linux/releases/20/Live/x86_64/Fedora-Live-Desktop-x86_64-20-1.iso,readonly + + |qemu_system_x86| --drive media=cdrom,file.driver=http,file.url=http://archives.fedoraproject.org/pub/fedora/linux/releases/20/Live/x86_64/Fedora-Live-Desktop-x86_64-20-1.iso,readonly + + Example: boot from a remote Fedora 20 cloud image using a local + overlay for writes, copy-on-read, and a readahead of 64k + + .. 
parsed-literal:: + + qemu-img create -f qcow2 -o backing_file='json:{"file.driver":"http",, "file.url":"http://archives.fedoraproject.org/pub/archive/fedora/linux/releases/20/Images/x86_64/Fedora-x86_64-20-20131211.1-sda.qcow2",, "file.readahead":"64k"}' /tmp/Fedora-x86_64-20-20131211.1-sda.qcow2 + + |qemu_system_x86| -drive file=/tmp/Fedora-x86_64-20-20131211.1-sda.qcow2,copy-on-read=on + + Example: boot from an image stored on a VMware vSphere server with a + self-signed certificate using a local overlay for writes, a readahead + of 64k and a timeout of 10 seconds. + + .. parsed-literal:: + + qemu-img create -f qcow2 -o backing_file='json:{"file.driver":"https",, "file.url":"https://user:password@vsphere.example.com/folder/test/test-flat.vmdk?dcPath=Datacenter&dsName=datastore1",, "file.sslverify":"off",, "file.readahead":"64k",, "file.timeout":10}' /tmp/test.qcow2 + + |qemu_system_x86| -drive file=/tmp/test.qcow2 diff --git a/docs/system/devices/can.rst b/docs/system/devices/can.rst new file mode 100644 index 00000000..0af3d991 --- /dev/null +++ b/docs/system/devices/can.rst @@ -0,0 +1,189 @@ +CAN Bus Emulation Support +========================= +The CAN bus emulation provides mechanism to connect multiple +emulated CAN controller chips together by one or multiple CAN busses +(the controller device "canbus" parameter). The individual busses +can be connected to host system CAN API (at this time only Linux +SocketCAN is supported). + +The concept of busses is generic and different CAN controllers +can be implemented. + +The initial submission implemented SJA1000 controller which +is common and well supported by by drivers for the most operating +systems. + +The PCI addon card hardware has been selected as the first CAN +interface to implement because such device can be easily connected +to systems with different CPU architectures (x86, PowerPC, Arm, etc.). + +In 2020, CTU CAN FD controller model has been added as part +of the bachelor thesis of Jan Charvat. This controller is complete +open-source/design/hardware solution. The core designer +of the project is Ondrej Ille, the financial support has been +provided by CTU, and more companies including Volkswagen subsidiaries. + +The project has been initially started in frame of RTEMS GSoC 2013 +slot by Jin Yang under our mentoring The initial idea was to provide generic +CAN subsystem for RTEMS. But lack of common environment for code and RTEMS +testing lead to goal change to provide environment which provides complete +emulated environment for testing and RTEMS GSoC slot has been donated +to work on CAN hardware emulation on QEMU. + +Examples how to use CAN emulation for SJA1000 based boards +---------------------------------------------------------- +When QEMU with CAN PCI support is compiled then one of the next +CAN boards can be selected + +(1) CAN bus Kvaser PCI CAN-S (single SJA1000 channel) board. 
QEMU startup options:: + + -object can-bus,id=canbus0 + -device kvaser_pci,canbus=canbus0 + +Add "can-host-socketcan" object to connect device to host system CAN bus:: + + -object can-host-socketcan,id=canhost0,if=can0,canbus=canbus0 + +(2) CAN bus PCM-3680I PCI (dual SJA1000 channel) emulation:: + + -object can-bus,id=canbus0 + -device pcm3680_pci,canbus0=canbus0,canbus1=canbus0 + +Another example:: + + -object can-bus,id=canbus0 + -object can-bus,id=canbus1 + -device pcm3680_pci,canbus0=canbus0,canbus1=canbus1 + +(3) CAN bus MIOe-3680 PCI (dual SJA1000 channel) emulation:: + + -device mioe3680_pci,canbus0=canbus0 + +The ''kvaser_pci'' board/device model is compatible with and has been tested with +the ''kvaser_pci'' driver included in mainline Linux kernel. +The tested setup was Linux 4.9 kernel on the host and guest side. + +Example for qemu-system-x86_64:: + + qemu-system-x86_64 -accel kvm -kernel /boot/vmlinuz-4.9.0-4-amd64 \ + -initrd ramdisk.cpio \ + -virtfs local,path=shareddir,security_model=none,mount_tag=shareddir \ + -object can-bus,id=canbus0 \ + -object can-host-socketcan,id=canhost0,if=can0,canbus=canbus0 \ + -device kvaser_pci,canbus=canbus0 \ + -nographic -append "console=ttyS0" + +Example for qemu-system-arm:: + + qemu-system-arm -cpu arm1176 -m 256 -M versatilepb \ + -kernel kernel-qemu-arm1176-versatilepb \ + -hda rpi-wheezy-overlay \ + -append "console=ttyAMA0 root=/dev/sda2 ro init=/sbin/init-overlay" \ + -nographic \ + -virtfs local,path=shareddir,security_model=none,mount_tag=shareddir \ + -object can-bus,id=canbus0 \ + -object can-host-socketcan,id=canhost0,if=can0,canbus=canbus0 \ + -device kvaser_pci,canbus=canbus0,host=can0 \ + +The CAN interface of the host system has to be configured for proper +bitrate and set up. Configuration is not propagated from emulated +devices through bus to the physical host device. Example configuration +for 1 Mbit/s:: + + ip link set can0 type can bitrate 1000000 + ip link set can0 up + +Virtual (host local only) can interface can be used on the host +side instead of physical interface:: + + ip link add dev can0 type vcan + +The CAN interface on the host side can be used to analyze CAN +traffic with "candump" command which is included in "can-utils":: + + candump can0 + +CTU CAN FD support examples +--------------------------- +This open-source core provides CAN FD support. CAN FD drames are +delivered even to the host systems when SocketCAN interface is found +CAN FD capable. + +The PCIe board emulation is provided for now (the device identifier is +ctucan_pci). The default build defines two CTU CAN FD cores +on the board. + +Example how to connect the canbus0-bus (virtual wire) to the host +Linux system (SocketCAN used) and to both CTU CAN FD cores emulated +on the corresponding PCI card expects that host system CAN bus +is setup according to the previous SJA1000 section:: + + qemu-system-x86_64 -enable-kvm -kernel /boot/vmlinuz-4.19.52+ \ + -initrd ramdisk.cpio \ + -virtfs local,path=shareddir,security_model=none,mount_tag=shareddir \ + -vga cirrus \ + -append "console=ttyS0" \ + -object can-bus,id=canbus0-bus \ + -object can-host-socketcan,if=can0,canbus=canbus0-bus,id=canbus0-socketcan \ + -device ctucan_pci,canbus0=canbus0-bus,canbus1=canbus0-bus \ + -nographic + +Setup of CTU CAN FD controller in a guest Linux system:: + + insmod ctucanfd.ko || modprobe ctucanfd + insmod ctucanfd_pci.ko || modprobe ctucanfd_pci + + for ifc in /sys/class/net/can* ; do + if [ -e $ifc/device/vendor ] ; then + if ! 
grep -q 0x1760 $ifc/device/vendor ; then + continue; + fi + else + continue; + fi + if [ -e $ifc/device/device ] ; then + if ! grep -q 0xff00 $ifc/device/device ; then + continue; + fi + else + continue; + fi + ifc=$(basename $ifc) + /bin/ip link set $ifc type can bitrate 1000000 dbitrate 10000000 fd on + /bin/ip link set $ifc up + done + +The test can run for example:: + + candump can1 + +in the guest system and next commands in the host system for basic CAN:: + + cangen can0 + +for CAN FD without bitrate switch:: + + cangen can0 -f + +and with bitrate switch:: + + cangen can0 -b + +The test can also be run the other way around, generating messages in the +guest system and capturing them in the host system. Other combinations are +also possible. + +Links to other resources +------------------------ + + (1) `CAN related projects at Czech Technical University, Faculty of Electrical Engineering <http://canbus.pages.fel.cvut.cz>`_ + (2) `Repository with development can-pci branch at Czech Technical University <https://gitlab.fel.cvut.cz/canbus/qemu-canbus>`_ + (3) `RTEMS page describing project <https://devel.rtems.org/wiki/Developer/Simulators/QEMU/CANEmulation>`_ + (4) `RTLWS 2015 article about the project and its use with CANopen emulation <http://cmp.felk.cvut.cz/~pisa/can/doc/rtlws-17-pisa-qemu-can.pdf>`_ + (5) `GNU/Linux, CAN and CANopen in Real-time Control Applications Slides from LinuxDays 2017 (include updated RTLWS 2015 content) <https://www.linuxdays.cz/2017/video/Pavel_Pisa-CAN_canopen.pdf>`_ + (6) `Linux SocketCAN utilities <https://github.com/linux-can/can-utils>`_ + (7) `CTU CAN FD project including core VHDL design, Linux driver, test utilities etc. <https://gitlab.fel.cvut.cz/canbus/ctucanfd_ip_core>`_ + (8) `CTU CAN FD Core Datasheet Documentation <http://canbus.pages.fel.cvut.cz/ctucanfd_ip_core/doc/Datasheet.pdf>`_ + (9) `CTU CAN FD Core System Architecture Documentation <http://canbus.pages.fel.cvut.cz/ctucanfd_ip_core/doc/System_Architecture.pdf>`_ + (10) `CTU CAN FD Driver Documentation <https://canbus.pages.fel.cvut.cz/ctucanfd_ip_core/doc/linux_driver/build/ctucanfd-driver.html>`_ + (11) `Integration with PCIe interfacing for Intel/Altera Cyclone IV based board <https://gitlab.fel.cvut.cz/canbus/pcie-ctu_can_fd>`_ diff --git a/docs/system/devices/canokey.rst b/docs/system/devices/canokey.rst new file mode 100644 index 00000000..cfa6186e --- /dev/null +++ b/docs/system/devices/canokey.rst @@ -0,0 +1,158 @@ +.. _canokey: + +CanoKey QEMU +------------ + +CanoKey [1]_ is an open-source secure key with supports of + +* U2F / FIDO2 with Ed25519 and HMAC-secret +* OpenPGP Card V3.4 with RSA4096, Ed25519 and more [2]_ +* PIV (NIST SP 800-73-4) +* HOTP / TOTP +* NDEF + +All these platform-independent features are in canokey-core [3]_. + +For different platforms, CanoKey has different implementations, +including both hardware implementions and virtual cards: + +* CanoKey STM32 [4]_ +* CanoKey Pigeon [5]_ +* (virt-card) CanoKey USB/IP +* (virt-card) CanoKey FunctionFS + +In QEMU, yet another CanoKey virt-card is implemented. +CanoKey QEMU exposes itself as a USB device to the guest OS. + +With the same software configuration as a hardware key, +the guest OS can use all the functionalities of a secure key as if +there was actually an hardware key plugged in. 
+ +CanoKey QEMU provides much convenience for debugging: + +* libcanokey-qemu supports debugging output thus developers can + inspect what happens inside a secure key +* CanoKey QEMU supports trace event thus event +* QEMU USB stack supports pcap thus USB packet between the guest + and key can be captured and analysed + +Then for developers: + +* For developers on software with secure key support (e.g. FIDO2, OpenPGP), + they can see what happens inside the secure key +* For secure key developers, USB packets between guest OS and CanoKey + can be easily captured and analysed + +Also since this is a virtual card, it can be easily used in CI for testing +on code coping with secure key. + +Building +======== + +libcanokey-qemu is required to use CanoKey QEMU. + +.. code-block:: shell + + git clone https://github.com/canokeys/canokey-qemu + mkdir canokey-qemu/build + pushd canokey-qemu/build + +If you want to install libcanokey-qemu in a different place, +add ``-DCMAKE_INSTALL_PREFIX=/path/to/your/place`` to cmake below. + +.. code-block:: shell + + cmake .. + make + make install # may need sudo + popd + +Then configuring and building: + +.. code-block:: shell + + # depending on your env, lib/pkgconfig can be lib64/pkgconfig + export PKG_CONFIG_PATH=/path/to/your/place/lib/pkgconfig:$PKG_CONFIG_PATH + ./configure --enable-canokey && make + +Using CanoKey QEMU +================== + +CanoKey QEMU stores all its data on a file of the host specified by the argument +when invoking qemu. + +.. parsed-literal:: + + |qemu_system| -usb -device canokey,file=$HOME/.canokey-file + +Note: you should keep this file carefully as it may contain your private key! + +The first time when the file is used, it is created and initialized by CanoKey, +afterwards CanoKey QEMU would just read this file. + +After the guest OS boots, you can check that there is a USB device. + +For example, If the guest OS is an Linux machine. You may invoke lsusb +and find CanoKey QEMU there: + +.. code-block:: shell + + $ lsusb + Bus 001 Device 002: ID 20a0:42d4 Clay Logic CanoKey QEMU + +You may setup the key as guided in [6]_. The console for the key is at [7]_. + +Debugging +========= + +CanoKey QEMU consists of two parts, ``libcanokey-qemu.so`` and ``canokey.c``, +the latter of which resides in QEMU. The former provides core functionality +of a secure key while the latter provides platform-dependent functions: +USB packet handling. + +If you want to trace what happens inside the secure key, when compiling +libcanokey-qemu, you should add ``-DQEMU_DEBUG_OUTPUT=ON`` in cmake command +line: + +.. code-block:: shell + + cmake .. -DQEMU_DEBUG_OUTPUT=ON + +If you want to trace events happened in canokey.c, use + +.. parsed-literal:: + + |qemu_system| --trace "canokey_*" \\ + -usb -device canokey,file=$HOME/.canokey-file + +If you want to capture USB packets between the guest and the host, you can: + +.. parsed-literal:: + + |qemu_system| -usb -device canokey,file=$HOME/.canokey-file,pcap=key.pcap + +Limitations +=========== + +Currently libcanokey-qemu.so has dozens of global variables as it was originally +designed for embedded systems. Thus one qemu instance can not have +multiple CanoKey QEMU running, namely you can not + +.. parsed-literal:: + + |qemu_system| -usb -device canokey,file=$HOME/.canokey-file \\ + -device canokey,file=$HOME/.canokey-file2 + +Also, there is no lock on canokey-file, thus two CanoKey QEMU instance +can not read one canokey-file at the same time. + +References +========== + +.. 
[1] `<https://canokeys.org>`_ +.. [2] `<https://docs.canokeys.org/userguide/openpgp/#supported-algorithm>`_ +.. [3] `<https://github.com/canokeys/canokey-core>`_ +.. [4] `<https://github.com/canokeys/canokey-stm32>`_ +.. [5] `<https://github.com/canokeys/canokey-pigeon>`_ +.. [6] `<https://docs.canokeys.org/>`_ +.. [7] `<https://console.canokeys.org/>`_ diff --git a/docs/system/devices/ccid.rst b/docs/system/devices/ccid.rst new file mode 100644 index 00000000..3b8c2ab4 --- /dev/null +++ b/docs/system/devices/ccid.rst @@ -0,0 +1,171 @@ +Chip Card Interface Device (CCID) +================================= + +USB CCID device +--------------- +The USB CCID device is a USB device implementing the CCID specification, which +lets one connect smart card readers that implement the same spec. For more +information see the specification:: + + Universal Serial Bus + Device Class: Smart Card + CCID + Specification for + Integrated Circuit(s) Cards Interface Devices + Revision 1.1 + April 22rd, 2005 + +Smartcards are used for authentication, single sign on, decryption in +public/private schemes and digital signatures. A smartcard reader on the client +cannot be used on a guest with simple usb passthrough since it will then not be +available on the client, possibly locking the computer when it is "removed". On +the other hand this device can let you use the smartcard on both the client and +the guest machine. It is also possible to have a completely virtual smart card +reader and smart card (i.e. not backed by a physical device) using this device. + +Building +-------- +The cryptographic functions and access to the physical card is done via the +libcacard library, whose development package must be installed prior to +building QEMU: + +In redhat/fedora:: + + yum install libcacard-devel + +In ubuntu:: + + apt-get install libcacard-dev + +Configuring and building:: + + ./configure --enable-smartcard && make + +Using ccid-card-emulated with hardware +-------------------------------------- +Assuming you have a working smartcard on the host with the current +user, using libcacard, QEMU acts as another client using ccid-card-emulated:: + + qemu -usb -device usb-ccid -device ccid-card-emulated + +Using ccid-card-emulated with certificates stored in files +---------------------------------------------------------- +You must create the CA and card certificates. This is a one time process. +We use NSS certificates:: + + mkdir fake-smartcard + cd fake-smartcard + certutil -N -d sql:$PWD + certutil -S -d sql:$PWD -s "CN=Fake Smart Card CA" -x -t TC,TC,TC -n fake-smartcard-ca + certutil -S -d sql:$PWD -t ,, -s "CN=John Doe" -n id-cert -c fake-smartcard-ca + certutil -S -d sql:$PWD -t ,, -s "CN=John Doe (signing)" --nsCertType smime -n signing-cert -c fake-smartcard-ca + certutil -S -d sql:$PWD -t ,, -s "CN=John Doe (encryption)" --nsCertType sslClient -n encryption-cert -c fake-smartcard-ca + +Note: you must have exactly three certificates. 
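+
+Before wiring the database into QEMU, it can be worth listing it to confirm
+that the CA certificate and the three card certificates created above are
+present under the expected nicknames::
+
+    certutil -L -d sql:$PWD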
+ +You can use the emulated card type with the certificates backend:: + + qemu -usb -device usb-ccid -device ccid-card-emulated,backend=certificates,db=sql:$PWD,cert1=id-cert,cert2=signing-cert,cert3=encryption-cert + +To use the certificates in the guest, export the CA certificate:: + + certutil -L -r -d sql:$PWD -o fake-smartcard-ca.cer -n fake-smartcard-ca + +and import it in the guest:: + + certutil -A -d /etc/pki/nssdb -i fake-smartcard-ca.cer -t TC,TC,TC -n fake-smartcard-ca + +In a Linux guest you can then use the CoolKey PKCS #11 module to access +the card:: + + certutil -d /etc/pki/nssdb -L -h all + +It will prompt you for the PIN (which is the password you assigned to the +certificate database early on), and then show you all three certificates +together with the manually imported CA cert:: + + Certificate Nickname Trust Attributes + fake-smartcard-ca CT,C,C + John Doe:CAC ID Certificate u,u,u + John Doe:CAC Email Signature Certificate u,u,u + John Doe:CAC Email Encryption Certificate u,u,u + +If this does not happen, CoolKey is not installed or not registered with +NSS. Registration can be done from Firefox or the command line:: + + modutil -dbdir /etc/pki/nssdb -add "CAC Module" -libfile /usr/lib64/pkcs11/libcoolkeypk11.so + modutil -dbdir /etc/pki/nssdb -list + +Using ccid-card-passthru with client side hardware +-------------------------------------------------- +On the host specify the ccid-card-passthru device with a suitable chardev:: + + qemu -chardev socket,server=on,host=0.0.0.0,port=2001,id=ccid,wait=off \ + -usb -device usb-ccid -device ccid-card-passthru,chardev=ccid + +On the client run vscclient, built when you built QEMU:: + + vscclient <qemu-host> 2001 + +Using ccid-card-passthru with client side certificates +------------------------------------------------------ +This case is not particularly useful, but you can use it to debug +your setup. + +Follow instructions above, except run QEMU and vscclient as follows. + +Run qemu as per above, and run vscclient from the "fake-smartcard" +directory as follows:: + + qemu -chardev socket,server=on,host=0.0.0.0,port=2001,id=ccid,wait=off \ + -usb -device usb-ccid -device ccid-card-passthru,chardev=ccid + vscclient -e "db=\"sql:$PWD\" use_hw=no soft=(,Test,CAC,,id-cert,signing-cert,encryption-cert)" <qemu-host> 2001 + + +Passthrough protocol scenario +----------------------------- +This is a typical interchange of messages when using the passthru card device. +usb-ccid is a usb device. It defaults to an unattached usb device on startup. +usb-ccid expects a chardev and expects the protocol defined in +cac_card/vscard_common.h to be passed over that. +The usb-ccid device can be in one of three modes: + +* detached +* attached with no card +* attached with card + +A typical interchange is (the arrow shows who started each exchange, it can be client +originated or guest originated):: + + client event | vscclient | passthru | usb-ccid | guest event + ------------------------------------------------------------------------------------------------ + | VSC_Init | | | + | VSC_ReaderAdd | | attach | + | | | | sees new usb device. + card inserted -> | | | | + | VSC_ATR | insert | insert | see new card + | | | | + | VSC_APDU | VSC_APDU | | <- guest sends APDU + client <-> physical | | | | + card APDU exchange | | | | + client response -> | VSC_APDU | VSC_APDU | | receive APDU response + ... + [APDU<->APDU repeats several times] + ... + card removed -> | | | | + | VSC_CardRemove | remove | remove | card removed + ... 
+ [(card insert, apdu's, card remove) repeat] + ... + kill/quit | | | | + vscclient | | | | + | VSC_ReaderRemove | | detach | + | | | | usb device removed. + +libcacard +--------- +Both ccid-card-emulated and vscclient use libcacard as the card emulator. +libcacard implements a completely virtual CAC (DoD standard for smart +cards) compliant card and uses NSS to retrieve certificates and do +any encryption. The backend can then be a real reader and card, or +certificates stored in files. diff --git a/docs/system/devices/cxl.rst b/docs/system/devices/cxl.rst new file mode 100644 index 00000000..f25783a4 --- /dev/null +++ b/docs/system/devices/cxl.rst @@ -0,0 +1,386 @@ +Compute Express Link (CXL) +========================== +From the view of a single host, CXL is an interconnect standard that +targets accelerators and memory devices attached to a CXL host. +This description will focus on those aspects visible either to +software running on a QEMU emulated host or to the internals of +functional emulation. As such, it will skip over many of the +electrical and protocol elements that would be more of interest +for real hardware and will dominate more general introductions to CXL. +It will also completely ignore the fabric management aspects of CXL +by considering only a single host and a static configuration. + +CXL shares many concepts and much of the infrastructure of PCI Express, +with CXL Host Bridges, which have CXL Root Ports which may be directly +attached to CXL or PCI End Points. Alternatively there may be CXL Switches +with CXL and PCI Endpoints attached below them. In many cases additional +control and capabilities are exposed via PCI Express interfaces. +This sharing of interfaces and hence emulation code is reflected +in how the devices are emulated in QEMU. In most cases the various +CXL elements are built upon an equivalent PCIe devices. + +CXL devices support the following interfaces: + +* Most conventional PCIe interfaces + + - Configuration space access + - BAR mapped memory accesses used for registers and mailboxes. + - MSI/MSI-X + - AER + - DOE mailboxes + - IDE + - Many other PCI express defined interfaces.. + +* Memory operations + + - Equivalent of accessing DRAM / NVDIMMs. Any access / feature + supported by the host for normal memory should also work for + CXL attached memory devices. + +* Cache operations. The are mostly irrelevant to QEMU emulation as + QEMU is not emulating a coherency protocol. Any emulation related + to these will be device specific and is out of the scope of this + document. + +CXL 2.0 Device Types +-------------------- +CXL 2.0 End Points are often categorized into three types. + +**Type 1:** These support coherent caching of host memory. Example might +be a crypto accelerators. May also have device private memory accessible +via means such as PCI memory reads and writes to BARs. + +**Type 2:** These support coherent caching of host memory and host +managed device memory (HDM) for which the coherency protocol is managed +by the host. This is a complex topic, so for more information on CXL +coherency see the CXL 2.0 specification. + +**Type 3 Memory devices:** These devices act as a means of attaching +additional memory (HDM) to a CXL host including both volatile and +persistent memory. The CXL topology may support interleaving across a +number of Type 3 memory devices using HDM Decoders in the host, host +bridge, switch upstream port and endpoints. 
+
+Scope of CXL emulation in QEMU
+------------------------------
+The focus of CXL emulation is CXL revision 2.0 and later. Earlier CXL
+revisions defined a smaller set of features, leaving much of the control
+interface as implementation-defined or device-specific, making generic
+emulation challenging, with host-specific firmware being responsible
+for setup and the Endpoints being presented to operating systems
+as Root Complex Integrated End Points. CXL rev 2.0 looks a lot
+more like PCI Express, with fully specified discoverability
+of the CXL topology.
+
+CXL System components
+---------------------
+A CXL system is made up of a Host with a number of 'standard components',
+the control and capabilities of which are discoverable by system software
+using means described in the CXL 2.0 specification.
+
+CXL Fixed Memory Windows (CFMW)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A CFMW consists of a particular range of Host Physical Address space
+which is routed to particular CXL Host Bridges. At the time of generic
+software initialization it will have a particular interleaving
+configuration and associated Quality of Service Throttling Group (QTG).
+This information is available to system software when making
+decisions about how to configure interleave across available CXL
+memory devices. It is provided as CFMW Structures (CFMWS) in
+the CXL Early Discovery Table (CEDT), an ACPI table.
+
+Note: QTG 0 is the only one currently supported in QEMU.
+
+CXL Host Bridge (CXL HB)
+~~~~~~~~~~~~~~~~~~~~~~~~
+A CXL host bridge is similar to the PCIe equivalent, but with a
+specification-defined register interface called CXL Host Bridge
+Component Registers (CHBCR). The location of this CHBCR MMIO
+space is described to system software via a CXL Host Bridge
+Structure (CHBS) in the CEDT ACPI table. The actual interfaces
+are identical to those used for other parts of the CXL hierarchy
+as CXL Component Registers in PCI BARs.
+
+Interfaces provided include:
+
+* Configuration of HDM Decoders to route CXL Memory accesses with
+  a particular Host Physical Address range to the target port
+  below which the CXL device servicing that address lies. This
+  may be a mapping to a single Root Port (RP) or across a set of
+  target RPs.
+
+CXL Root Ports (CXL RP)
+~~~~~~~~~~~~~~~~~~~~~~~
+A CXL Root Port serves the same purpose as a PCIe Root Port.
+There are a number of CXL-specific Designated Vendor Specific
+Extended Capabilities (DVSEC) in PCIe Configuration Space
+and associated component register access via PCI BARs.
+
+CXL Switch
+~~~~~~~~~~
+Here we consider a simple CXL switch with only a single
+virtual hierarchy. Whilst more complex devices exist, their
+visibility to a particular host is generally the same as for
+a simple switch design. Hosts often have no awareness
+of complex rerouting and device pooling; they simply see
+devices being hot added or hot removed.
+
+A CXL switch has a similar architecture to those in PCIe,
+with a single upstream port, an internal PCI bus and multiple
+downstream ports.
+
+Both the CXL upstream and downstream ports have CXL-specific
+DVSECs in configuration space, and component registers in PCI
+BARs. The Upstream Port has the configuration interfaces for
+the HDM decoders which route incoming memory accesses to the
+appropriate downstream port.
+
+A CXL switch is created in a similar fashion to PCI switches,
+by creating an upstream port (cxl-upstream) and a number of
+downstream ports on the internal switch bus (cxl-downstream).
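+
+As a rough sketch of how those devices fit together on the command line
+(the IDs, chassis and slot numbers here are only illustrative, and the
+memory backend objects must be defined as in the example command lines
+later in this document)::
+
+    -device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \
+    -device cxl-upstream,bus=root_port0,id=us0 \
+    -device cxl-downstream,port=0,bus=us0,chassis=0,slot=4,id=swport0 \
+    -device cxl-type3,bus=swport0,memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0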
+ +CXL Memory Devices - Type 3 +~~~~~~~~~~~~~~~~~~~~~~~~~~~ +CXL type 3 devices use a PCI class code and are intended to be supported +by a generic operating system driver. They have HDM decoders +though in these EP devices, the decoder is responsible not for +routing but for translation of the incoming host physical address (HPA) +into a Device Physical Address (DPA). + +CXL Memory Interleave +--------------------- +To understand the interaction of different CXL hardware components which +are emulated in QEMU, let us consider a memory read in a fully configured +CXL topology. Note that system software is responsible for configuration +of all components with the exception of the CFMWs. System software is +responsible for allocating appropriate ranges from within the CFMWs +and exposing those via normal memory configurations as would be done +for system RAM. + +Example system Topology. x marks the match in each decoder level:: + + |<------------------SYSTEM PHYSICAL ADDRESS MAP (1)----------------->| + | __________ __________________________________ __________ | + | | | | | | | | + | | CFMW 0 | | CXL Fixed Memory Window 1 | | CFMW 1 | | + | | HB0 only | | Configured to interleave memory | | HB1 only | | + | | | | memory accesses across HB0/HB1 | | | | + | |__________| |_____x____________________________| |__________| | + | | | | + | | | | + | | | | + | Interleave Decoder | | + | Matches this HB | | + \_____________| |_____________/ + __________|__________ _____|_______________ + | | | | + (2) | CXL HB 0 | | CXL HB 1 | + | HB IntLv Decoders | | HB IntLv Decoders | + | PCI/CXL Root Bus 0c | | PCI/CXL Root Bus 0d | + | | | | + |___x_________________| |_____________________| + | | | | + | | | | + A HB 0 HDM Decoder | | | + matches this Port | | | + | | | | + ___________|___ __________|__ __|_________ ___|_________ + (3)| Root Port 0 | | Root Port 1 | | Root Port 2| | Root Port 3 | + | Appears in | | Appears in | | Appears in | | Appear in | + | PCI topology | | PCI Topology| | PCI Topo | | PCI Topo | + | As 0c:00.0 | | as 0c:01.0 | | as de:00.0 | | as de:01.0 | + |_______________| |_____________| |____________| |_____________| + | | | | + | | | | + _____|_________ ______|______ ______|_____ ______|_______ + (4)| x | | | | | | | + | CXL Type3 0 | | CXL Type3 1 | | CXL type3 2| | CLX Type 3 3 | + | | | | | | | | + | PMEM0(Vol LSA)| | PMEM1 (...) | | PMEM2 (...)| | PMEM3 (...) | + | Decoder to go | | | | | | | + | from host PA | | PCI 0e:00.0 | | PCI df:00.0| | PCI e0:00.0 | + | to device PA | | | | | | | + | PCI as 0d:00.0| | | | | | | + |_______________| |_____________| |____________| |______________| + +Notes: + +(1) **3 CXL Fixed Memory Windows (CFMW)** corresponding to different + ranges of the system physical address map. Each CFMW has + particular interleave setup across the CXL Host Bridges (HB) + CFMW0 provides uninterleaved access to HB0, CFW2 provides + uninterleaved access to HB1. CFW1 provides interleaved memory access + across HB0 and HB1. + +(2) **Two CXL Host Bridges**. Each of these has 2 CXL Root Ports and + programmable HDM decoders to route memory accesses either to + a single port or interleave them across multiple ports. + A complex configuration here, might be to use the following HDM + decoders in HB0. HDM0 routes CFMW0 requests to RP0 and hence + part of CXL Type3 0. HDM1 routes CFMW0 requests from a + different region of the CFMW0 PA range to RP2 and hence part + of CXL Type 3 1. 
HDM2 routes yet another PA range from within + CFMW0 to be interleaved across RP0 and RP1, providing 2 way + interleave of part of the memory provided by CXL Type3 0 and + CXL Type 3 1. HDM3 routes those interleaved accesses from + CFMW1 that target HB0 to RP 0 and another part of the memory of + CXL Type 3 0 (as part of a 2 way interleave at the system level + across for example CXL Type3 0 and CXL Type3 2. + HDM4 is used to enable system wide 4 way interleave across all + the present CXL type3 devices, by interleaving those (interleaved) + requests that HB0 receives from from CFMW1 across RP 0 and + RP 1 and hence to yet more regions of the memory of the + attached Type3 devices. Note this is a representative subset + of the full range of possible HDM decoder configurations in this + topology. + +(3) **Four CXL Root Ports.** In this case the CXL Type 3 devices are + directly attached to these ports. + +(4) **Four CXL Type3 memory expansion devices.** These will each have + HDM decoders, but in this case rather than performing interleave + they will take the Host Physical Addresses of accesses and map + them to their own local Device Physical Address Space (DPA). + +Example topology involving a switch:: + + |<------------------SYSTEM PHYSICAL ADDRESS MAP (1)----------------->| + | __________ __________________________________ __________ | + | | | | | | | | + | | CFMW 0 | | CXL Fixed Memory Window 1 | | CFMW 1 | | + | | HB0 only | | Configured to interleave memory | | HB1 only | | + | | | | memory accesses across HB0/HB1 | | | | + | |____x_____| |__________________________________| |__________| | + | | | | + | | | | + | | | + Interleave Decoder | | | + Matches this HB | | | + \_____________| |_____________/ + __________|__________ _____|_______________ + | | | | + | CXL HB 0 | | CXL HB 1 | + | HB IntLv Decoders | | HB IntLv Decoders | + | PCI/CXL Root Bus 0c | | PCI/CXL Root Bus 0d | + | | | | + |___x_________________| |_____________________| + | | | | + | + A HB 0 HDM Decoder + matches this Port + ___________|___ + | Root Port 0 | + | Appears in | + | PCI topology | + | As 0c:00.0 | + |___________x___| + | + | + \_____________________ + | + | + --------------------------------------------------- + | Switch 0 USP as PCI 0d:00.0 | + | USP has HDM decoder which direct traffic to | + | appropriate downstream port | + | Switch BUS appears as 0e | + |x__________________________________________________| + | | | | + | | | | + _____|_________ ______|______ ______|_____ ______|_______ + (4)| x | | | | | | | + | CXL Type3 0 | | CXL Type3 1 | | CXL type3 2| | CLX Type 3 3 | + | | | | | | | | + | PMEM0(Vol LSA)| | PMEM1 (...) | | PMEM2 (...)| | PMEM3 (...) | + | Decoder to go | | | | | | | + | from host PA | | PCI 10:00.0 | | PCI 11:00.0| | PCI 12:00.0 | + | to device PA | | | | | | | + | PCI as 0f:00.0| | | | | | | + |_______________| |_____________| |____________| |______________| + +Example command lines +--------------------- +A very simple setup with just one directly attached CXL Type 3 device:: + + qemu-system-aarch64 -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 -cpu max \ + ... 
+ -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=256M \ + -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=256M \ + -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \ + -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \ + -device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \ + -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G + +A setup suitable for 4 way interleave. Only one fixed window provided, to enable 2 way +interleave across 2 CXL host bridges. Each host bridge has 2 CXL Root Ports, with +the CXL Type3 device directly attached (no switches).:: + + qemu-system-aarch64 -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 -cpu max \ + ... + -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=256M \ + -object memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M \ + -object memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M \ + -object memory-backend-file,id=cxl-mem4,share=on,mem-path=/tmp/cxltest4.raw,size=256M \ + -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=256M \ + -object memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M \ + -object memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M \ + -object memory-backend-file,id=cxl-lsa4,share=on,mem-path=/tmp/lsa4.raw,size=256M \ + -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \ + -device pxb-cxl,bus_nr=222,bus=pcie.0,id=cxl.2 \ + -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \ + -device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \ + -device cxl-rp,port=1,bus=cxl.1,id=root_port14,chassis=0,slot=3 \ + -device cxl-type3,bus=root_port14,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem1 \ + -device cxl-rp,port=0,bus=cxl.2,id=root_port15,chassis=0,slot=5 \ + -device cxl-type3,bus=root_port15,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem2 \ + -device cxl-rp,port=1,bus=cxl.2,id=root_port16,chassis=0,slot=6 \ + -device cxl-type3,bus=root_port16,memdev=cxl-mem4,lsa=cxl-lsa4,id=cxl-pmem3 \ + -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.targets.1=cxl.2,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=8k + +An example of 4 devices below a switch suitable for 1, 2 or 4 way interleave:: + + qemu-system-aarch64 -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 -cpu max \ + ... 
+ -object memory-backend-file,id=cxl-mem0,share=on,mem-path=/tmp/cxltest.raw,size=256M \ + -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest1.raw,size=256M \ + -object memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M \ + -object memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M \ + -object memory-backend-file,id=cxl-lsa0,share=on,mem-path=/tmp/lsa0.raw,size=256M \ + -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa1.raw,size=256M \ + -object memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M \ + -object memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M \ + -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \ + -device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \ + -device cxl-rp,port=1,bus=cxl.1,id=root_port1,chassis=0,slot=1 \ + -device cxl-upstream,bus=root_port0,id=us0 \ + -device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \ + -device cxl-type3,bus=swport0,memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0,size=256M \ + -device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=5 \ + -device cxl-type3,bus=swport1,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem1,size=256M \ + -device cxl-downstream,port=2,bus=us0,id=swport2,chassis=0,slot=6 \ + -device cxl-type3,bus=swport2,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem2,size=256M \ + -device cxl-downstream,port=3,bus=us0,id=swport3,chassis=0,slot=7 \ + -device cxl-type3,bus=swport3,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem3,size=256M \ + -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=4k + +Kernel Configuration Options +---------------------------- + +In Linux 5.18 the following options are necessary to make use of +OS management of CXL memory devices as described here. + +* CONFIG_CXL_BUS +* CONFIG_CXL_PCI +* CONFIG_CXL_ACPI +* CONFIG_CXL_PMEM +* CONFIG_CXL_MEM +* CONFIG_CXL_PORT +* CONFIG_CXL_REGION + +References +---------- + + - Consortium website for specifications etc: + http://www.computeexpresslink.org + - Compute Express link Revision 2 specification, October 2020 + - CEDT CFMWS & QTG _DSM ECN May 2021 diff --git a/docs/system/devices/ivshmem.rst b/docs/system/devices/ivshmem.rst new file mode 100644 index 00000000..b03a48af --- /dev/null +++ b/docs/system/devices/ivshmem.rst @@ -0,0 +1,64 @@ +.. _pcsys_005fivshmem: + +Inter-VM Shared Memory device +----------------------------- + +On Linux hosts, a shared memory device is available. The basic syntax +is: + +.. parsed-literal:: + + |qemu_system_x86| -device ivshmem-plain,memdev=hostmem + +where hostmem names a host memory backend. For a POSIX shared memory +backend, use something like + +:: + + -object memory-backend-file,size=1M,share,mem-path=/dev/shm/ivshmem,id=hostmem + +If desired, interrupts can be sent between guest VMs accessing the same +shared memory region. Interrupt support requires using a shared memory +server and using a chardev socket to connect to it. The code for the +shared memory server is qemu.git/contrib/ivshmem-server. An example +syntax when using the shared memory server is: + +.. 
parsed-literal:: + + # First start the ivshmem server once and for all + ivshmem-server -p pidfile -S path -m shm-name -l shm-size -n vectors + + # Then start your qemu instances with matching arguments + |qemu_system_x86| -device ivshmem-doorbell,vectors=vectors,chardev=id + -chardev socket,path=path,id=id + +When using the server, the guest will be assigned a VM ID (>=0) that +allows guests using the same server to communicate via interrupts. +Guests can read their VM ID from a device register (see +ivshmem-spec.txt). + +Migration with ivshmem +~~~~~~~~~~~~~~~~~~~~~~ + +With device property ``master=on``, the guest will copy the shared +memory on migration to the destination host. With ``master=off``, the +guest will not be able to migrate with the device attached. In the +latter case, the device should be detached and then reattached after +migration using the PCI hotplug support. + +At most one of the devices sharing the same memory can be master. The +master must complete migration before you plug back the other devices. + +ivshmem and hugepages +~~~~~~~~~~~~~~~~~~~~~ + +Instead of specifying the <shm size> using POSIX shm, you may specify a +memory backend that has hugepage support: + +.. parsed-literal:: + + |qemu_system_x86| -object memory-backend-file,size=1G,mem-path=/dev/hugepages/my-shmem-file,share,id=mb1 + -device ivshmem-plain,memdev=mb1 + +ivshmem-server also supports hugepages mount points with the ``-m`` +memory path argument. diff --git a/docs/system/devices/net.rst b/docs/system/devices/net.rst new file mode 100644 index 00000000..4b2640c4 --- /dev/null +++ b/docs/system/devices/net.rst @@ -0,0 +1,100 @@ +.. _pcsys_005fnetwork: + +Network emulation +----------------- + +QEMU can simulate several network cards (e.g. PCI or ISA cards on the PC +target) and can connect them to a network backend on the host or an +emulated hub. The various host network backends can either be used to +connect the NIC of the guest to a real network (e.g. by using a TAP +devices or the non-privileged user mode network stack), or to other +guest instances running in another QEMU process (e.g. by using the +socket host network backend). + +Using TAP network interfaces +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This is the standard way to connect QEMU to a real network. QEMU adds a +virtual network device on your host (called ``tapN``), and you can then +configure it as if it was a real ethernet card. + +Linux host +^^^^^^^^^^ + +As an example, you can download the ``linux-test-xxx.tar.gz`` archive +and copy the script ``qemu-ifup`` in ``/etc`` and configure properly +``sudo`` so that the command ``ifconfig`` contained in ``qemu-ifup`` can +be executed as root. You must verify that your host kernel supports the +TAP network interfaces: the device ``/dev/net/tun`` must be present. + +See :ref:`sec_005finvocation` to have examples of command +lines using the TAP network interfaces. + +Windows host +^^^^^^^^^^^^ + +There is a virtual ethernet driver for Windows 2000/XP systems, called +TAP-Win32. But it is not included in standard QEMU for Windows, so you +will need to get it separately. It is part of OpenVPN package, so +download OpenVPN from : https://openvpn.net/. + +Using the user mode network stack +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +By using the option ``-net user`` (default configuration if no ``-net`` +option is specified), QEMU uses a completely user mode network stack +(you don't need root privilege to use the virtual network). 
The virtual +network configuration is the following:: + + guest (10.0.2.15) <------> Firewall/DHCP server <-----> Internet + | (10.0.2.2) + | + ----> DNS server (10.0.2.3) + | + ----> SMB server (10.0.2.4) + +The QEMU VM behaves as if it was behind a firewall which blocks all +incoming connections. You can use a DHCP client to automatically +configure the network in the QEMU VM. The DHCP server assign addresses +to the hosts starting from 10.0.2.15. + +In order to check that the user mode network is working, you can ping +the address 10.0.2.2 and verify that you got an address in the range +10.0.2.x from the QEMU virtual DHCP server. + +Note that ICMP traffic in general does not work with user mode +networking. ``ping``, aka. ICMP echo, to the local router (10.0.2.2) +shall work, however. If you're using QEMU on Linux >= 3.0, it can use +unprivileged ICMP ping sockets to allow ``ping`` to the Internet. The +host admin has to set the ping_group_range in order to grant access to +those sockets. To allow ping for GID 100 (usually users group):: + + echo 100 100 > /proc/sys/net/ipv4/ping_group_range + +When using the built-in TFTP server, the router is also the TFTP server. + +When using the ``'-netdev user,hostfwd=...'`` option, TCP or UDP +connections can be redirected from the host to the guest. It allows for +example to redirect X11, telnet or SSH connections. + +Hubs +~~~~ + +QEMU can simulate several hubs. A hub can be thought of as a virtual +connection between several network devices. These devices can be for +example QEMU virtual ethernet cards or virtual Host ethernet devices +(TAP devices). You can connect guest NICs or host network backends to +such a hub using the ``-netdev +hubport`` or ``-nic hubport`` options. The legacy ``-net`` option also +connects the given device to the emulated hub with ID 0 (i.e. the +default hub) unless you specify a netdev with ``-net nic,netdev=xxx`` +here. + +Connecting emulated networks between QEMU instances +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Using the ``-netdev socket`` (or ``-nic socket`` or ``-net socket``) +option, it is possible to create emulated networks that span several +QEMU instances. See the description of the ``-netdev socket`` option in +:ref:`sec_005finvocation` to have a basic +example. diff --git a/docs/system/devices/nvme.rst b/docs/system/devices/nvme.rst new file mode 100644 index 00000000..30f841ef --- /dev/null +++ b/docs/system/devices/nvme.rst @@ -0,0 +1,323 @@ +============== +NVMe Emulation +============== + +QEMU provides NVMe emulation through the ``nvme``, ``nvme-ns`` and +``nvme-subsys`` devices. + +See the following sections for specific information on + + * `Adding NVMe Devices`_, `additional namespaces`_ and `NVM subsystems`_. + * Configuration of `Optional Features`_ such as `Controller Memory Buffer`_, + `Simple Copy`_, `Zoned Namespaces`_, `metadata`_ and `End-to-End Data + Protection`_, + +Adding NVMe Devices +=================== + +Controller Emulation +-------------------- + +The QEMU emulated NVMe controller implements version 1.4 of the NVM Express +specification. All mandatory features are implement with a couple of exceptions +and limitations: + + * Accounting numbers in the SMART/Health log page are reset when the device + is power cycled. + * Interrupt Coalescing is not supported and is disabled by default. + +The simplest way to attach an NVMe controller on the QEMU PCI bus is to add the +following parameters: + +.. 
code-block:: console
+
+   -drive file=nvm.img,if=none,id=nvm
+   -device nvme,serial=deadbeef,drive=nvm
+
+There are a number of optional general parameters for the ``nvme`` device. Some
+are mentioned here, but see ``-device nvme,help`` to list all possible
+parameters.
+
+``max_ioqpairs=UINT32`` (default: ``64``)
+  Set the maximum number of allowed I/O queue pairs. This replaces the
+  deprecated ``num_queues`` parameter.
+
+``msix_qsize=UINT16`` (default: ``65``)
+  The number of MSI-X vectors that the device should support.
+
+``mdts=UINT8`` (default: ``7``)
+  Set the Maximum Data Transfer Size of the device.
+
+``use-intel-id`` (default: ``off``)
+  Since QEMU 5.2, the device uses a QEMU allocated "Red Hat" PCI Device and
+  Vendor ID. Set this to ``on`` to revert to the unallocated Intel ID
+  previously used.
+
+Additional Namespaces
+---------------------
+
+In the simplest possible invocation sketched above, the device only supports a
+single namespace with the namespace identifier ``1``. To support multiple
+namespaces and additional features, the ``nvme-ns`` device must be used.
+
+.. code-block:: console
+
+   -device nvme,id=nvme-ctrl-0,serial=deadbeef
+   -drive file=nvm-1.img,if=none,id=nvm-1
+   -device nvme-ns,drive=nvm-1
+   -drive file=nvm-2.img,if=none,id=nvm-2
+   -device nvme-ns,drive=nvm-2
+
+The namespaces defined by the ``nvme-ns`` device will attach to the most
+recently defined ``nvme-bus`` that is created by the ``nvme`` device. Namespace
+identifiers are allocated automatically, starting from ``1``.
+
+There are a number of parameters available:
+
+``nsid`` (default: ``0``)
+  Explicitly set the namespace identifier.
+
+``uuid`` (default: *autogenerated*)
+  Set the UUID of the namespace. This will be reported as a "Namespace UUID"
+  descriptor in the Namespace Identification Descriptor List.
+
+``eui64``
+  Set the EUI-64 of the namespace. This will be reported as an "IEEE Extended
+  Unique Identifier" descriptor in the Namespace Identification Descriptor List.
+  Since machine type 6.1, a non-zero default value is used if the parameter
+  is not provided. For earlier machine types the field defaults to 0.
+
+``bus``
+  If there are more ``nvme`` devices defined, this parameter may be used to
+  attach the namespace to a specific ``nvme`` device (identified by an ``id``
+  parameter on the controller device).
+
+NVM Subsystems
+--------------
+
+Additional features become available if the controller device (``nvme``) is
+linked to an NVM Subsystem device (``nvme-subsys``).
+
+The NVM Subsystem emulation allows features such as shared namespaces and
+multipath I/O.
+
+.. code-block:: console
+
+   -device nvme-subsys,id=nvme-subsys-0,nqn=subsys0
+   -device nvme,serial=deadbeef,subsys=nvme-subsys-0
+   -device nvme,serial=deadbeef,subsys=nvme-subsys-0
+
+This will create an NVM subsystem with two controllers. Having controllers
+linked to an ``nvme-subsys`` device allows additional ``nvme-ns`` parameters:
+
+``shared`` (default: ``on`` since 6.2)
+  Specifies that the namespace will be attached to all controllers in the
+  subsystem. If set to ``off``, the namespace will remain a private namespace
+  and may only be attached to a single controller at a time. Shared namespaces
+  are always automatically attached to all controllers (also when controllers
+  are hotplugged).
+
+``detached`` (default: ``off``)
+  If set to ``on``, the namespace will be available in the subsystem, but
+  not attached to any controllers initially.
A shared namespace with this set + to ``on`` will never be automatically attached to controllers. + +Thus, adding + +.. code-block:: console + + -drive file=nvm-1.img,if=none,id=nvm-1 + -device nvme-ns,drive=nvm-1,nsid=1 + -drive file=nvm-2.img,if=none,id=nvm-2 + -device nvme-ns,drive=nvm-2,nsid=3,shared=off,detached=on + +will cause NSID 1 will be a shared namespace that is initially attached to both +controllers. NSID 3 will be a private namespace due to ``shared=off`` and only +attachable to a single controller at a time. Additionally it will not be +attached to any controller initially (due to ``detached=on``) or to hotplugged +controllers. + +Optional Features +================= + +Controller Memory Buffer +------------------------ + +``nvme`` device parameters related to the Controller Memory Buffer support: + +``cmb_size_mb=UINT32`` (default: ``0``) + This adds a Controller Memory Buffer of the given size at offset zero in BAR + 2. + +``legacy-cmb`` (default: ``off``) + By default, the device uses the "v1.4 scheme" for the Controller Memory + Buffer support (i.e, the CMB is initially disabled and must be explicitly + enabled by the host). Set this to ``on`` to behave as a v1.3 device wrt. the + CMB. + +Simple Copy +----------- + +The device includes support for TP 4065 ("Simple Copy Command"). A number of +additional ``nvme-ns`` device parameters may be used to control the Copy +command limits: + +``mssrl=UINT16`` (default: ``128``) + Set the Maximum Single Source Range Length (``MSSRL``). This is the maximum + number of logical blocks that may be specified in each source range. + +``mcl=UINT32`` (default: ``128``) + Set the Maximum Copy Length (``MCL``). This is the maximum number of logical + blocks that may be specified in a Copy command (the total for all source + ranges). + +``msrc=UINT8`` (default: ``127``) + Set the Maximum Source Range Count (``MSRC``). This is the maximum number of + source ranges that may be used in a Copy command. This is a 0's based value. + +Zoned Namespaces +---------------- + +A namespaces may be "Zoned" as defined by TP 4053 ("Zoned Namespaces"). Set +``zoned=on`` on an ``nvme-ns`` device to configure it as a zoned namespace. + +The namespace may be configured with additional parameters + +``zoned.zone_size=SIZE`` (default: ``128MiB``) + Define the zone size (``ZSZE``). + +``zoned.zone_capacity=SIZE`` (default: ``0``) + Define the zone capacity (``ZCAP``). If left at the default (``0``), the zone + capacity will equal the zone size. + +``zoned.descr_ext_size=UINT32`` (default: ``0``) + Set the Zone Descriptor Extension Size (``ZDES``). Must be a multiple of 64 + bytes. + +``zoned.cross_read=BOOL`` (default: ``off``) + Set to ``on`` to allow reads to cross zone boundaries. + +``zoned.max_active=UINT32`` (default: ``0``) + Set the maximum number of active resources (``MAR``). The default (``0``) + allows all zones to be active. + +``zoned.max_open=UINT32`` (default: ``0``) + Set the maximum number of open resources (``MOR``). The default (``0``) + allows all zones to be open. If ``zoned.max_active`` is specified, this value + must be less than or equal to that. + +``zoned.zasl=UINT8`` (default: ``0``) + Set the maximum data transfer size for the Zone Append command. Like + ``mdts``, the value is specified as a power of two (2^n) and is in units of + the minimum memory page size (CAP.MPSMIN). The default value (``0``) + has this property inherit the ``mdts`` value. 
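+
+A rough sketch combining a few of the parameters above (the image path,
+sizes and limits chosen here are illustrative only):
+
+.. code-block:: console
+
+   -drive file=zns.img,if=none,id=nvm-zns
+   -device nvme,id=nvme-ctrl-0,serial=deadbeef
+   -device nvme-ns,drive=nvm-zns,zoned=on,zoned.zone_size=64M,zoned.max_open=16,zoned.max_active=32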
+ +Metadata +-------- + +The virtual namespace device supports LBA metadata in the form separate +metadata (``MPTR``-based) and extended LBAs. + +``ms=UINT16`` (default: ``0``) + Defines the number of metadata bytes per LBA. + +``mset=UINT8`` (default: ``0``) + Set to ``1`` to enable extended LBAs. + +End-to-End Data Protection +-------------------------- + +The virtual namespace device supports DIF- and DIX-based protection information +(depending on ``mset``). + +``pi=UINT8`` (default: ``0``) + Enable protection information of the specified type (type ``1``, ``2`` or + ``3``). + +``pil=UINT8`` (default: ``0``) + Controls the location of the protection information within the metadata. Set + to ``1`` to transfer protection information as the first eight bytes of + metadata. Otherwise, the protection information is transferred as the last + eight bytes. + +Virtualization Enhancements and SR-IOV (Experimental Support) +------------------------------------------------------------- + +The ``nvme`` device supports Single Root I/O Virtualization and Sharing +along with Virtualization Enhancements. The controller has to be linked to +an NVM Subsystem device (``nvme-subsys``) for use with SR-IOV. + +A number of parameters are present (**please note, that they may be +subject to change**): + +``sriov_max_vfs`` (default: ``0``) + Indicates the maximum number of PCIe virtual functions supported + by the controller. Specifying a non-zero value enables reporting of both + SR-IOV and ARI (Alternative Routing-ID Interpretation) capabilities + by the NVMe device. Virtual function controllers will not report SR-IOV. + +``sriov_vq_flexible`` + Indicates the total number of flexible queue resources assignable to all + the secondary controllers. Implicitly sets the number of primary + controller's private resources to ``(max_ioqpairs - sriov_vq_flexible)``. + +``sriov_vi_flexible`` + Indicates the total number of flexible interrupt resources assignable to + all the secondary controllers. Implicitly sets the number of primary + controller's private resources to ``(msix_qsize - sriov_vi_flexible)``. + +``sriov_max_vi_per_vf`` (default: ``0``) + Indicates the maximum number of virtual interrupt resources assignable + to a secondary controller. The default ``0`` resolves to + ``(sriov_vi_flexible / sriov_max_vfs)`` + +``sriov_max_vq_per_vf`` (default: ``0``) + Indicates the maximum number of virtual queue resources assignable to + a secondary controller. The default ``0`` resolves to + ``(sriov_vq_flexible / sriov_max_vfs)`` + +The simplest possible invocation enables the capability to set up one VF +controller and assign an admin queue, an IO queue, and a MSI-X interrupt. + +.. code-block:: console + + -device nvme-subsys,id=subsys0 + -device nvme,serial=deadbeef,subsys=subsys0,sriov_max_vfs=1, + sriov_vq_flexible=2,sriov_vi_flexible=1 + +The minimum steps required to configure a functional NVMe secondary +controller are: + + * unbind flexible resources from the primary controller + +.. code-block:: console + + nvme virt-mgmt /dev/nvme0 -c 0 -r 1 -a 1 -n 0 + nvme virt-mgmt /dev/nvme0 -c 0 -r 0 -a 1 -n 0 + + * perform a Function Level Reset on the primary controller to actually + release the resources + +.. code-block:: console + + echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset + + * enable VF + +.. code-block:: console + + echo 1 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs + + * assign the flexible resources to the VF and set it ONLINE + +.. 
code-block:: console + + nvme virt-mgmt /dev/nvme0 -c 1 -r 1 -a 8 -n 1 + nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 8 -n 2 + nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 9 -n 0 + + * bind the NVMe driver to the VF + +.. code-block:: console + + echo 0000:01:00.1 > /sys/bus/pci/drivers/nvme/bind
\ No newline at end of file diff --git a/docs/system/devices/usb.rst b/docs/system/devices/usb.rst new file mode 100644 index 00000000..37cb9b33 --- /dev/null +++ b/docs/system/devices/usb.rst @@ -0,0 +1,408 @@ +.. _pcsys_005fusb: + +USB emulation +------------- + +QEMU can emulate a PCI UHCI, OHCI, EHCI or XHCI USB controller. You can +plug virtual USB devices or real host USB devices (only works with +certain host operating systems). QEMU will automatically create and +connect virtual USB hubs as necessary to connect multiple USB devices. + +USB controllers +~~~~~~~~~~~~~~~ + +XHCI controller support +^^^^^^^^^^^^^^^^^^^^^^^ + +QEMU has XHCI host adapter support. The XHCI hardware design is much +more virtualization-friendly when compared to EHCI and UHCI, thus XHCI +emulation uses less resources (especially CPU). So if your guest +supports XHCI (which should be the case for any operating system +released around 2010 or later) we recommend using it: + + qemu -device qemu-xhci + +XHCI supports USB 1.1, USB 2.0 and USB 3.0 devices, so this is the +only controller you need. With only a single USB controller (and +therefore only a single USB bus) present in the system there is no +need to use the bus= parameter when adding USB devices. + + +EHCI controller support +^^^^^^^^^^^^^^^^^^^^^^^ + +The QEMU EHCI Adapter supports USB 2.0 devices. It can be used either +standalone or with companion controllers (UHCI, OHCI) for USB 1.1 +devices. The companion controller setup is more convenient to use +because it provides a single USB bus supporting both USB 2.0 and USB +1.1 devices. See next section for details. + +When running EHCI in standalone mode you can add UHCI or OHCI +controllers for USB 1.1 devices too. Each controller creates its own +bus though, so there are two completely separate USB buses: One USB +1.1 bus driven by the UHCI controller and one USB 2.0 bus driven by +the EHCI controller. Devices must be attached to the correct +controller manually. + +The easiest way to add a UHCI controller to a ``pc`` machine is the +``-usb`` switch. QEMU will create the UHCI controller as function of +the PIIX3 chipset. The USB 1.1 bus will carry the name ``usb-bus.0``. + +You can use the standard ``-device`` switch to add a EHCI controller to +your virtual machine. It is strongly recommended to specify an ID for +the controller so the USB 2.0 bus gets an individual name, for example +``-device usb-ehci,id=ehci``. This will give you a USB 2.0 bus named +``ehci.0``. + +When adding USB devices using the ``-device`` switch you can specify the +bus they should be attached to. Here is a complete example: + +.. parsed-literal:: + + |qemu_system| -M pc ${otheroptions} \\ + -drive if=none,id=usbstick,format=raw,file=/path/to/image \\ + -usb \\ + -device usb-ehci,id=ehci \\ + -device usb-tablet,bus=usb-bus.0 \\ + -device usb-storage,bus=ehci.0,drive=usbstick + +This attaches a USB tablet to the UHCI adapter and a USB mass storage +device to the EHCI adapter. + + +Companion controller support +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The UHCI and OHCI controllers can attach to a USB bus created by EHCI +as companion controllers. This is done by specifying the ``masterbus`` +and ``firstport`` properties. ``masterbus`` specifies the bus name the +controller should attach to. ``firstport`` specifies the first port the +controller should attach to, which is needed as usually one EHCI +controller with six ports has three UHCI companion controllers with +two ports each. 
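+
+Written out by hand, a companion controller setup along these lines might
+look like the following sketch. The device names and ``firstport`` values
+follow the ``ich9-ehci-uhci.cfg`` example configuration shipped with QEMU
+(which additionally places all four controller functions on a single PCI
+slot); treat this as an illustration rather than a tested recipe::
+
+   -device ich9-usb-ehci1,id=ehci \
+   -device ich9-usb-uhci1,masterbus=ehci.0,firstport=0 \
+   -device ich9-usb-uhci2,masterbus=ehci.0,firstport=2 \
+   -device ich9-usb-uhci3,masterbus=ehci.0,firstport=4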
+ +There is a config file in docs which will do all this for +you, which you can use like this: + +.. parsed-literal:: + + |qemu_system| -readconfig docs/config/ich9-ehci-uhci.cfg + +Then use ``bus=ehci.0`` to assign your USB devices to that bus. + +Using the ``-usb`` switch for ``q35`` machines will create a similar +USB controller configuration. + + +.. _Connecting USB devices: + +Connecting USB devices +~~~~~~~~~~~~~~~~~~~~~~ + +USB devices can be connected with the ``-device usb-...`` command line +option or the ``device_add`` monitor command. Available devices are: + +``usb-mouse`` + Virtual Mouse. This will override the PS/2 mouse emulation when + activated. + +``usb-tablet`` + Pointer device that uses absolute coordinates (like a touchscreen). + This means QEMU is able to report the mouse position without having + to grab the mouse. Also overrides the PS/2 mouse emulation when + activated. + +``usb-storage,drive=drive_id`` + Mass storage device backed by drive_id (see the :ref:`disk images` + chapter in the System Emulation Users Guide). This is the classic + bulk-only transport protocol used by 99% of USB sticks. This + example shows it connected to an XHCI USB controller and with + a drive backed by a raw format disk image: + + .. parsed-literal:: + + |qemu_system| [...] \\ + -drive if=none,id=stick,format=raw,file=/path/to/file.img \\ + -device nec-usb-xhci,id=xhci \\ + -device usb-storage,bus=xhci.0,drive=stick + +``usb-uas`` + USB attached SCSI device. This does not create a SCSI disk, so + you need to explicitly create a ``scsi-hd`` or ``scsi-cd`` device + on the command line, as well as using the ``-drive`` option to + specify what those disks are backed by. One ``usb-uas`` device can + handle multiple logical units (disks). This example creates three + logical units: two disks and one cdrom drive: + + .. parsed-literal:: + + |qemu_system| [...] \\ + -drive if=none,id=uas-disk1,format=raw,file=/path/to/file1.img \\ + -drive if=none,id=uas-disk2,format=raw,file=/path/to/file2.img \\ + -drive if=none,id=uas-cdrom,media=cdrom,format=raw,file=/path/to/image.iso \\ + -device nec-usb-xhci,id=xhci \\ + -device usb-uas,id=uas,bus=xhci.0 \\ + -device scsi-hd,bus=uas.0,scsi-id=0,lun=0,drive=uas-disk1 \\ + -device scsi-hd,bus=uas.0,scsi-id=0,lun=1,drive=uas-disk2 \\ + -device scsi-cd,bus=uas.0,scsi-id=0,lun=5,drive=uas-cdrom + +``usb-bot`` + Bulk-only transport storage device. This presents the guest with the + same USB bulk-only transport protocol interface as ``usb-storage``, but + the QEMU command line option works like ``usb-uas`` and does not + automatically create SCSI disks for you. ``usb-bot`` supports up to + 16 LUNs. Unlike ``usb-uas``, the LUN numbers must be continuous, + i.e. for three devices you must use 0+1+2. The 0+1+5 numbering from the + ``usb-uas`` example above won't work with ``usb-bot``. + +``usb-mtp,rootdir=dir`` + Media transfer protocol device, using dir as root of the file tree + that is presented to the guest. + +``usb-host,hostbus=bus,hostaddr=addr`` + Pass through the host device identified by bus and addr + +``usb-host,vendorid=vendor,productid=product`` + Pass through the host device identified by vendor and product ID + +``usb-wacom-tablet`` + Virtual Wacom PenPartner tablet. This device is similar to the + ``tablet`` above but it can be used with the tslib library because in + addition to touch coordinates it reports touch pressure. + +``usb-kbd`` + Standard USB keyboard. Will override the PS/2 keyboard (if present). 
+ +``usb-serial,chardev=id`` + Serial converter. This emulates an FTDI FT232BM chip connected to + host character device id. + +``usb-braille,chardev=id`` + Braille device. This emulates a Baum Braille device USB port. id has to + specify a character device defined with ``-chardev …,id=id``. One will + normally use BrlAPI to display the braille output on a BRLTTY-supported + device with + + .. parsed-literal:: + + |qemu_system| [...] -chardev braille,id=brl -device usb-braille,chardev=brl + + or alternatively, use the following equivalent shortcut: + + .. parsed-literal:: + + |qemu_system| [...] -usbdevice braille + +``usb-net[,netdev=id]`` + Network adapter that supports CDC ethernet and RNDIS protocols. id + specifies a netdev defined with ``-netdev …,id=id``. For instance, + user-mode networking can be used with + + .. parsed-literal:: + + |qemu_system| [...] -netdev user,id=net0 -device usb-net,netdev=net0 + +``usb-ccid`` + Smartcard reader device + +``usb-audio`` + USB audio device + +``u2f-{emulated,passthru}`` + Universal Second Factor device + +``canokey`` + An Open-source Secure Key implementing FIDO2, OpenPGP, PIV and more. + For more information, see :ref:`canokey`. + +Physical port addressing +^^^^^^^^^^^^^^^^^^^^^^^^ + +For all the above USB devices, by default QEMU will plug the device +into the next available port on the specified USB bus, or onto +some available USB bus if you didn't specify one explicitly. +If you need to, you can also specify the physical port where +the device will show up in the guest. This can be done using the +``port`` property. UHCI has two root ports (1,2). EHCI has six root +ports (1-6), and the emulated (1.1) USB hub has eight ports. + +Plugging a tablet into UHCI port 1 works like this:: + + -device usb-tablet,bus=usb-bus.0,port=1 + +Plugging a hub into UHCI port 2 works like this:: + + -device usb-hub,bus=usb-bus.0,port=2 + +Plugging a virtual USB stick into port 4 of the hub just plugged works +this way:: + + -device usb-storage,bus=usb-bus.0,port=2.4,drive=... + +In the monitor, the ``device_add` command also accepts a ``port`` +property specification. If you want to unplug devices too you should +specify some unique id which you can use to refer to the device. +You can then use ``device_del`` to unplug the device later. +For example:: + + (qemu) device_add usb-tablet,bus=usb-bus.0,port=1,id=my-tablet + (qemu) device_del my-tablet + +Hotplugging USB storage +~~~~~~~~~~~~~~~~~~~~~~~ + +The ``usb-bot`` and ``usb-uas`` devices can be hotplugged. In the hotplug +case they are added with ``attached = false`` so the guest will not see +the device until the ``attached`` property is explicitly set to true. +That allows you to attach one or more scsi devices before making the +device visible to the guest. The workflow looks like this: + +#. ``device-add usb-bot,id=foo`` +#. ``device-add scsi-{hd,cd},bus=foo.0,lun=0`` +#. optionally add more devices (luns 1 ... 15) +#. ``scripts/qmp/qom-set foo.attached = true`` + +.. _host_005fusb_005fdevices: + +Using host USB devices on a Linux host +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +WARNING: this is an experimental feature. QEMU will slow down when using +it. USB devices requiring real time streaming (i.e. USB Video Cameras) +are not supported yet. + +1. If you use an early Linux 2.4 kernel, verify that no Linux driver is + actually using the USB device. A simple way to do that is simply to + disable the corresponding kernel module by renaming it from + ``mydriver.o`` to ``mydriver.o.disabled``. + +2. 
Verify that ``/proc/bus/usb`` is working (most Linux distributions + should enable it by default). You should see something like that: + + :: + + ls /proc/bus/usb + 001 devices drivers + +3. Since only root can access to the USB devices directly, you can + either launch QEMU as root or change the permissions of the USB + devices you want to use. For testing, the following suffices: + + :: + + chown -R myuid /proc/bus/usb + +4. Launch QEMU and do in the monitor: + + :: + + info usbhost + Device 1.2, speed 480 Mb/s + Class 00: USB device 1234:5678, USB DISK + + You should see the list of the devices you can use (Never try to use + hubs, it won't work). + +5. Add the device in QEMU by using: + + :: + + device_add usb-host,vendorid=0x1234,productid=0x5678 + + Normally the guest OS should report that a new USB device is plugged. + You can use the option ``-device usb-host,...`` to do the same. + +6. Now you can try to use the host USB device in QEMU. + +When relaunching QEMU, you may have to unplug and plug again the USB +device to make it work again (this is a bug). + +``usb-host`` properties for specifying the host device +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The example above uses the ``vendorid`` and ``productid`` to +specify which host device to pass through, but this is not +the only way to specify the host device. ``usb-host`` supports +the following properties: + +``hostbus=<nr>`` + Specifies the bus number the device must be attached to +``hostaddr=<nr>`` + Specifies the device address the device got assigned by the guest os +``hostport=<str>`` + Specifies the physical port the device is attached to +``vendorid=<hexnr>`` + Specifies the vendor ID of the device +``productid=<hexnr>`` + Specifies the product ID of the device. + +In theory you can combine all these properties as you like. In +practice only a few combinations are useful: + +- ``vendorid`` and ``productid`` -- match for a specific device, pass it to + the guest when it shows up somewhere in the host. + +- ``hostbus`` and ``hostport`` -- match for a specific physical port in the + host, any device which is plugged in there gets passed to the + guest. + +- ``hostbus`` and ``hostaddr`` -- most useful for ad-hoc pass through as the + hostaddr isn't stable. The next time you plug the device into the host it + will get a new hostaddr. + +Note that on the host USB 1.1 devices are handled by UHCI/OHCI and USB +2.0 by EHCI. That means different USB devices plugged into the very +same physical port on the host may show up on different host buses +depending on the speed. Supposing that devices plugged into a given +physical port appear as bus 1 + port 1 for 2.0 devices and bus 3 + port 1 +for 1.1 devices, you can pass through any device plugged into that port +and also assign it to the correct USB bus in QEMU like this: + +.. parsed-literal:: + + |qemu_system| -M pc [...] \\ + -usb \\ + -device usb-ehci,id=ehci \\ + -device usb-host,bus=usb-bus.0,hostbus=3,hostport=1 \\ + -device usb-host,bus=ehci.0,hostbus=1,hostport=1 + +``usb-host`` properties for reset behavior +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The ``guest-reset`` and ``guest-reset-all`` properties control +whenever the guest is allowed to reset the physical usb device on the +host. There are three cases: + +``guest-reset=false`` + The guest is not allowed to reset the (physical) usb device. + +``guest-reset=true,guest-resets-all=false`` + The guest is allowed to reset the device when it is not yet + initialized (aka no usb bus address assigned). 
Usually this results + in one guest reset being allowed. This is the default behavior. + +``guest-reset=true,guest-resets-all=true`` + The guest is allowed to reset the device as it pleases. + +The reason for this existing are broken usb devices. In theory one +should be able to reset (and re-initialize) usb devices at any time. +In practice that may result in shitty usb device firmware crashing and +the device not responding any more until you power-cycle (aka un-plug +and re-plug) it. + +What works best pretty much depends on the behavior of the specific +usb device at hand, so it's a trial-and-error game. If the default +doesn't work, try another option and see whenever the situation +improves. + +record usb transfers +^^^^^^^^^^^^^^^^^^^^ + +All usb devices have support for recording the usb traffic. This can +be enabled using the ``pcap=<file>`` property, for example: + +``-device usb-mouse,pcap=mouse.pcap`` + +The pcap files are compatible with the linux kernels usbmon. Many +tools, including ``wireshark``, can decode and inspect these trace +files. diff --git a/docs/system/devices/vhost-user-rng.rst b/docs/system/devices/vhost-user-rng.rst new file mode 100644 index 00000000..a145d410 --- /dev/null +++ b/docs/system/devices/vhost-user-rng.rst @@ -0,0 +1,39 @@ +QEMU vhost-user-rng - RNG emulation +=================================== + +Background +---------- + +What follows builds on the material presented in vhost-user.rst - it should +be reviewed before moving forward with the content in this file. + +Description +----------- + +The vhost-user-rng device implementation was designed to work with a random +number generator daemon such as the one found in the vhost-device crate of +the rust-vmm project available on github [1]. + +[1]. https://github.com/rust-vmm/vhost-device + +Examples +-------- + +The daemon should be started first: + +:: + + host# vhost-device-rng --socket-path=rng.sock -c 1 -m 512 -p 1000 + +The QEMU invocation needs to create a chardev socket the device can +use to communicate as well as share the guests memory over a memfd. + +:: + + host# qemu-system \ + -chardev socket,path=$(PATH)/rng.sock,id=rng0 \ + -device vhost-user-rng-pci,chardev=rng0 \ + -m 4096 \ + -object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on \ + -numa node,memdev=mem \ + ... diff --git a/docs/system/devices/vhost-user.rst b/docs/system/devices/vhost-user.rst new file mode 100644 index 00000000..86128114 --- /dev/null +++ b/docs/system/devices/vhost-user.rst @@ -0,0 +1,59 @@ +.. _vhost_user: + +vhost-user back ends +-------------------- + +vhost-user back ends are way to service the request of VirtIO devices +outside of QEMU itself. To do this there are a number of things +required. + +vhost-user device +=================== + +These are simple stub devices that ensure the VirtIO device is visible +to the guest. The code is mostly boilerplate although each device has +a ``chardev`` option which specifies the ID of the ``--chardev`` +device that connects via a socket to the vhost-user *daemon*. + +vhost-user daemon +================= + +This is a separate process that is connected to by QEMU via a socket +following the :ref:`vhost_user_proto`. There are a number of daemons +that can be built when enabled by the project although any daemon that +meets the specification for a given device can be used. + +Shared memory object +==================== + +In order for the daemon to access the VirtIO queues to process the +requests it needs access to the guest's address space. 
This is
+achieved via the ``memory-backend-file`` or ``memory-backend-memfd``
+objects. A reference to a file-descriptor which can access this object
+will be passed via the socket as part of the protocol negotiation.
+
+Currently the shared memory object needs to match the size of the main
+system memory as defined by the ``-m`` argument.
+
+Example
+=======
+
+First start your daemon.
+
+.. parsed-literal::
+
+  $ virtio-foo --socket-path=/var/run/foo.sock $OTHER_ARGS
+
+Then you start your QEMU instance, specifying the device, chardev and
+memory objects.
+
+.. parsed-literal::
+
+  $ |qemu_system| \\
+    -m 4096 \\
+    -chardev socket,id=ba1,path=/var/run/foo.sock \\
+    -device vhost-user-foo,chardev=ba1,$OTHER_ARGS \\
+    -object memory-backend-memfd,id=mem,size=4G,share=on \\
+    -numa node,memdev=mem \\
+    ...
+
diff --git a/docs/system/devices/virtio-pmem.rst b/docs/system/devices/virtio-pmem.rst new file mode 100644 index 00000000..c82ac067 --- /dev/null +++ b/docs/system/devices/virtio-pmem.rst @@ -0,0 +1,76 @@ +
+===========
+virtio pmem
+===========
+
+This document explains the setup and usage of the virtio pmem device.
+The virtio pmem device is a paravirtualized persistent memory device
+on regular (i.e. non-NVDIMM) storage.
+
+Usecase
+-------
+
+Virtio pmem allows guests to bypass the guest page cache and directly use
+the host page cache. This reduces the guest memory footprint as the host
+can make efficient memory reclaim decisions under memory pressure.
+
+How does virtio-pmem compare to the nvdimm emulation?
+-----------------------------------------------------
+
+NVDIMM emulation on regular (i.e. non-NVDIMM) host storage does not
+persist the guest writes as there are no defined semantics in the device
+specification. The virtio pmem device provides guest write persistence
+on non-NVDIMM host storage.
+
+virtio pmem usage
+-----------------
+
+A virtio pmem device backed by a memory-backend-file can be created on
+the QEMU command line as in the following example::
+
+  -object memory-backend-file,id=mem1,share,mem-path=./virtio_pmem.img,size=4G
+  -device virtio-pmem-pci,memdev=mem1,id=nv1
+
+where:
+
+  - "object memory-backend-file,id=mem1,share,mem-path=<image>,size=<image size>"
+    creates a backend file with the specified size.
+
+  - "device virtio-pmem-pci,memdev=mem1,id=nv1" creates a virtio pmem
+    pci device whose storage is provided by the above memory backend device.
+
+Multiple virtio pmem devices can be created if multiple pairs of "-object"
+and "-device" are provided.
+
+Hotplug
+-------
+
+Virtio pmem devices can be hotplugged via the QEMU monitor. First, the
+memory backing has to be added via 'object_add'; afterwards, the virtio
+pmem device can be added via 'device_add'.
+
+For example, the following commands add another 4GB virtio pmem device to
+the guest::
+
+ (qemu) object_add memory-backend-file,id=mem2,share=on,mem-path=virtio_pmem2.img,size=4G
+ (qemu) device_add virtio-pmem-pci,id=virtio_pmem2,memdev=mem2
+
+Guest Data Persistence
+----------------------
+
+Guest data persistence on non-NVDIMM host storage requires guest userspace
+applications to perform fsync/msync. This is different from a real nvdimm
+backend where no additional fsync/msync is required. This is needed to
+persist guest writes in the host backing file, which otherwise remain in
+the host page cache and risk being lost in case of power failure.
+
+With the virtio pmem device, the MAP_SYNC mmap flag is not supported. This
+provides a hint to the application to perform fsync for write persistence.
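+
+As a rough guest-side sketch (the device name and mount point are
+illustrative; the exact ``/dev/pmem*`` name depends on the guest kernel),
+the device typically shows up as a persistent memory block device that
+can be mounted with DAX, with applications still issuing fsync/msync as
+described above::
+
+  guest# mkfs.ext4 /dev/pmem0
+  guest# mount -o dax /dev/pmem0 /mnt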
+ +Limitations +----------- + +- Real nvdimm device backend is not supported. +- virtio pmem hotunplug is not supported. +- ACPI NVDIMM features like regions/namespaces are not supported. +- ndctl command is not supported. diff --git a/docs/system/gdb.rst b/docs/system/gdb.rst new file mode 100644 index 00000000..453eb73f --- /dev/null +++ b/docs/system/gdb.rst @@ -0,0 +1,194 @@ +.. _GDB usage: + +GDB usage +--------- + +QEMU supports working with gdb via gdb's remote-connection facility +(the "gdbstub"). This allows you to debug guest code in the same +way that you might with a low-level debug facility like JTAG +on real hardware. You can stop and start the virtual machine, +examine state like registers and memory, and set breakpoints and +watchpoints. + +In order to use gdb, launch QEMU with the ``-s`` and ``-S`` options. +The ``-s`` option will make QEMU listen for an incoming connection +from gdb on TCP port 1234, and ``-S`` will make QEMU not start the +guest until you tell it to from gdb. (If you want to specify which +TCP port to use or to use something other than TCP for the gdbstub +connection, use the ``-gdb dev`` option instead of ``-s``. See +`Using unix sockets`_ for an example.) + +.. parsed-literal:: + + |qemu_system| -s -S -kernel bzImage -hda rootdisk.img -append "root=/dev/hda" + +QEMU will launch but will silently wait for gdb to connect. + +Then launch gdb on the 'vmlinux' executable:: + + > gdb vmlinux + +In gdb, connect to QEMU:: + + (gdb) target remote localhost:1234 + +Then you can use gdb normally. For example, type 'c' to launch the +kernel:: + + (gdb) c + +Here are some useful tips in order to use gdb on system code: + +1. Use ``info reg`` to display all the CPU registers. + +2. Use ``x/10i $eip`` to display the code at the PC position. + +3. Use ``set architecture i8086`` to dump 16 bit code. Then use + ``x/10i $cs*16+$eip`` to dump the code at the PC position. + +Debugging multicore machines +============================ + +GDB's abstraction for debugging targets with multiple possible +parallel flows of execution is a two layer one: it supports multiple +"inferiors", each of which can have multiple "threads". When the QEMU +machine has more than one CPU, QEMU exposes each CPU cluster as a +separate "inferior", where each CPU within the cluster is a separate +"thread". Most QEMU machine types have identical CPUs, so there is a +single cluster which has all the CPUs in it. A few machine types are +heterogeneous and have multiple clusters: for example the ``sifive_u`` +machine has a cluster with one E51 core and a second cluster with four +U54 cores. Here the E51 is the only thread in the first inferior, and +the U54 cores are all threads in the second inferior. + +When you connect gdb to the gdbstub, it will automatically +connect to the first inferior; you can display the CPUs in this +cluster using the gdb ``info thread`` command, and switch between +them using gdb's usual thread-management commands. + +For multi-cluster machines, unfortunately gdb does not by default +handle multiple inferiors, and so you have to explicitly connect +to them. First, you must connect with the ``extended-remote`` +protocol, not ``remote``:: + + (gdb) target extended-remote localhost:1234 + +Once connected, gdb will have a single inferior, for the +first cluster. 
You need to create inferiors for the other +clusters and attach to them, like this:: + + (gdb) add-inferior + Added inferior 2 + (gdb) inferior 2 + [Switching to inferior 2 [<null>] (<noexec>)] + (gdb) attach 2 + Attaching to process 2 + warning: No executable has been specified and target does not support + determining executable automatically. Try using the "file" command. + 0x00000000 in ?? () + +Once you've done this, ``info threads`` will show CPUs in +all the clusters you have attached to:: + + (gdb) info threads + Id Target Id Frame + 1.1 Thread 1.1 (cortex-m33-arm-cpu cpu [running]) 0x00000000 in ?? () + * 2.1 Thread 2.2 (cortex-m33-arm-cpu cpu [halted ]) 0x00000000 in ?? () + +You probably also want to set gdb to ``schedule-multiple`` mode, +so that when you tell gdb to ``continue`` it resumes all CPUs, +not just those in the cluster you are currently working on:: + + (gdb) set schedule-multiple on + +Using unix sockets +================== + +An alternate method for connecting gdb to the QEMU gdbstub is to use +a unix socket (if supported by your operating system). This is useful when +running several tests in parallel, or if you do not have a known free TCP +port (e.g. when running automated tests). + +First create a chardev with the appropriate options, then +instruct the gdbserver to use that device: + +.. parsed-literal:: + + |qemu_system| -chardev socket,path=/tmp/gdb-socket,server=on,wait=off,id=gdb0 -gdb chardev:gdb0 -S ... + +Start gdb as before, but this time connect using the path to +the socket:: + + (gdb) target remote /tmp/gdb-socket + +Note that to use a unix socket for the connection you will need +gdb version 9.0 or newer. + +Advanced debugging options +========================== + +Changing single-stepping behaviour +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The default single stepping behavior is step with the IRQs and timer +service routines off. It is set this way because when gdb executes a +single step it expects to advance beyond the current instruction. With +the IRQs and timer service routines on, a single step might jump into +the one of the interrupt or exception vectors instead of executing the +current instruction. This means you may hit the same breakpoint a number +of times before executing the instruction gdb wants to have executed. +Because there are rare circumstances where you want to single step into +an interrupt vector the behavior can be controlled from GDB. There are +three commands you can query and set the single step behavior: + +``maintenance packet qqemu.sstepbits`` + This will display the MASK bits used to control the single stepping + IE: + + :: + + (gdb) maintenance packet qqemu.sstepbits + sending: "qqemu.sstepbits" + received: "ENABLE=1,NOIRQ=2,NOTIMER=4" + +``maintenance packet qqemu.sstep`` + This will display the current value of the mask used when single + stepping IE: + + :: + + (gdb) maintenance packet qqemu.sstep + sending: "qqemu.sstep" + received: "0x7" + +``maintenance packet Qqemu.sstep=HEX_VALUE`` + This will change the single step mask, so if wanted to enable IRQs on + the single step, but not timers, you would use: + + :: + + (gdb) maintenance packet Qqemu.sstep=0x5 + sending: "qemu.sstep=0x5" + received: "OK" + +Examining physical memory +^^^^^^^^^^^^^^^^^^^^^^^^^ + +Another feature that QEMU gdbstub provides is to toggle the memory GDB +works with, by default GDB will show the current process memory respecting +the virtual address translation. 
+ +If you want to examine/change the physical memory you can set the gdbstub +to work with the physical memory rather with the virtual one. + +The memory mode can be checked by sending the following command: + +``maintenance packet qqemu.PhyMemMode`` + This will return either 0 or 1, 1 indicates you are currently in the + physical memory mode. + +``maintenance packet Qqemu.PhyMemMode:1`` + This will change the memory mode to physical memory. + +``maintenance packet Qqemu.PhyMemMode:0`` + This will change it back to normal memory mode. diff --git a/docs/system/generic-loader.rst b/docs/system/generic-loader.rst new file mode 100644 index 00000000..4f9fb005 --- /dev/null +++ b/docs/system/generic-loader.rst @@ -0,0 +1,120 @@ +.. + Copyright (c) 2016, Xilinx Inc. + + This work is licensed under the terms of the GNU GPL, version 2 or later. See + the COPYING file in the top-level directory. + +Generic Loader +-------------- + +The 'loader' device allows the user to load multiple images or values into +QEMU at startup. + +Loading Data into Memory Values +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +The loader device allows memory values to be set from the command line. This +can be done by following the syntax below:: + + -device loader,addr=<addr>,data=<data>,data-len=<data-len> \ + [,data-be=<data-be>][,cpu-num=<cpu-num>] + +``<addr>`` + The address to store the data in. + +``<data>`` + The value to be written to the address. The maximum size of the data + is 8 bytes. + +``<data-len>`` + The length of the data in bytes. This argument must be included if + the data argument is. + +``<data-be>`` + Set to true if the data to be stored on the guest should be written + as big endian data. The default is to write little endian data. + +``<cpu-num>`` + The number of the CPU's address space where the data should be + loaded. If not specified the address space of the first CPU is used. + +All values are parsed using the standard QemuOps parsing. This allows the user +to specify any values in any format supported. By default the values +will be parsed as decimal. To use hex values the user should prefix the number +with a '0x'. + +An example of loading value 0x8000000e to address 0xfd1a0104 is:: + + -device loader,addr=0xfd1a0104,data=0x8000000e,data-len=4 + +Setting a CPU's Program Counter +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The loader device allows the CPU's PC to be set from the command line. This +can be done by following the syntax below:: + + -device loader,addr=<addr>,cpu-num=<cpu-num> + +``<addr>`` + The value to use as the CPU's PC. + +``<cpu-num>`` + The number of the CPU whose PC should be set to the specified value. + +All values are parsed using the standard QemuOpts parsing. This allows the user +to specify any values in any format supported. By default the values +will be parsed as decimal. To use hex values the user should prefix the number +with a '0x'. + +An example of setting CPU 0's PC to 0x8000 is:: + + -device loader,addr=0x8000,cpu-num=0 + +Loading Files +^^^^^^^^^^^^^ + +The loader device also allows files to be loaded into memory. It can load ELF, +U-Boot, and Intel HEX executable formats as well as raw images. The syntax is +shown below: + + -device loader,file=<file>[,addr=<addr>][,cpu-num=<cpu-num>][,force-raw=<raw>] + +``<file>`` + A file to be loaded into memory + +``<addr>`` + The memory address where the file should be loaded. This is required + for raw images and ignored for non-raw files. + +``<cpu-num>`` + This specifies the CPU that should be used. 
This is an + optional argument and will cause the CPU's PC to be set to the + memory address where the raw file is loaded or the entry point + specified in the executable format header. This option should only + be used for the boot image. This will also cause the image to be + written to the specified CPU's address space. If not specified, the + default is CPU 0. + +``<force-raw>`` + Setting 'force-raw=on' forces the file to be treated as a raw image. + This can be used to load supported executable formats as if they + were raw. + +All values are parsed using the standard QemuOpts parsing. This allows the user +to specify any values in any format supported. By default the values +will be parsed as decimal. To use hex values the user should prefix the number +with a '0x'. + +An example of loading an ELF file which CPU0 will boot is shown below:: + + -device loader,file=./images/boot.elf,cpu-num=0 + +Restrictions and ToDos +^^^^^^^^^^^^^^^^^^^^^^ + +At the moment it is just assumed that if you specify a cpu-num then +you want to set the PC as well. This might not always be the case. In +future the internal state 'set_pc' (which exists in the generic loader +now) should be exposed to the user so that they can choose if the PC +is set or not. + + diff --git a/docs/system/guest-loader.rst b/docs/system/guest-loader.rst new file mode 100644 index 00000000..9ef9776b --- /dev/null +++ b/docs/system/guest-loader.rst @@ -0,0 +1,54 @@ +.. + Copyright (c) 2020, Linaro + +Guest Loader +------------ + +The guest loader is similar to the ``generic-loader`` although it is +aimed at a particular use case of loading hypervisor guests. This is +useful for debugging hypervisors without having to jump through the +hoops of firmware and boot-loaders. + +The guest loader does two things: + + - load blobs (kernels and initial ram disks) into memory + - sets platform FDT data so hypervisors can find and boot them + +This is what is typically done by a boot-loader like grub using it's +multi-boot capability. A typical example would look like: + +.. parsed-literal:: + + |qemu_system| -kernel ~/xen.git/xen/xen \ + -append "dom0_mem=1G,max:1G loglvl=all guest_loglvl=all" \ + -device guest-loader,addr=0x42000000,kernel=Image,bootargs="root=/dev/sda2 ro console=hvc0 earlyprintk=xen" \ + -device guest-loader,addr=0x47000000,initrd=rootfs.cpio + +In the above example the Xen hypervisor is loaded by the -kernel +parameter and passed it's boot arguments via -append. The Dom0 guest +is loaded into the areas of memory. Each blob will get +``/chosen/module@<addr>`` entry in the FDT to indicate it's location and +size. Additional information can be passed with by using additional +arguments. + +Currently the only supported machines which use FDT data to boot are +the ARM and RiscV ``virt`` machines. + +Arguments +^^^^^^^^^ + +The full syntax of the guest-loader is:: + + -device guest-loader,addr=<addr>[,kernel=<file>,[bootargs=<args>]][,initrd=<file>] + +``addr=<addr>`` + This is mandatory and indicates the start address of the blob. + +``kernel|initrd=<file>`` + Indicates the filename of the kernel or initrd blob. Both blobs will + have the "multiboot,module" compatibility string as well as + "multiboot,kernel" or "multiboot,ramdisk" as appropriate. + +``bootargs=<args>`` + This is an optional field for kernel blobs which will pass command + like via the ``/chosen/module@<addr>/bootargs`` node. 
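
If you want to double-check what the guest loader added, one option is to
dump the FDT that QEMU builds and look at the ``/chosen`` nodes. This is a
minimal sketch, assuming an Arm ``virt`` machine, the device-tree compiler
(``dtc``) installed on the host, and purely illustrative file names:

.. code-block:: bash

    $ qemu-system-aarch64 -M virt,dumpdtb=guest.dtb \
        -kernel ~/xen.git/xen/xen \
        -device guest-loader,addr=0x42000000,kernel=Image \
        -device guest-loader,addr=0x47000000,initrd=rootfs.cpio
    $ dtc -I dtb -O dts guest.dtb | grep -A 3 'module@'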
diff --git a/docs/system/i386/amd-memory-encryption.rst b/docs/system/i386/amd-memory-encryption.rst new file mode 100644 index 00000000..dcf4add0 --- /dev/null +++ b/docs/system/i386/amd-memory-encryption.rst @@ -0,0 +1,206 @@ +AMD Secure Encrypted Virtualization (SEV) +========================================= + +Secure Encrypted Virtualization (SEV) is a feature found on AMD processors. + +SEV is an extension to the AMD-V architecture which supports running encrypted +virtual machines (VMs) under the control of KVM. Encrypted VMs have their pages +(code and data) secured such that only the guest itself has access to the +unencrypted version. Each encrypted VM is associated with a unique encryption +key; if its data is accessed by a different entity using a different key the +encrypted guests data will be incorrectly decrypted, leading to unintelligible +data. + +Key management for this feature is handled by a separate processor known as the +AMD secure processor (AMD-SP), which is present in AMD SOCs. Firmware running +inside the AMD-SP provides commands to support a common VM lifecycle. This +includes commands for launching, snapshotting, migrating and debugging the +encrypted guest. These SEV commands can be issued via KVM_MEMORY_ENCRYPT_OP +ioctls. + +Secure Encrypted Virtualization - Encrypted State (SEV-ES) builds on the SEV +support to additionally protect the guest register state. In order to allow a +hypervisor to perform functions on behalf of a guest, there is architectural +support for notifying a guest's operating system when certain types of VMEXITs +are about to occur. This allows the guest to selectively share information with +the hypervisor to satisfy the requested function. + +Launching +--------- + +Boot images (such as bios) must be encrypted before a guest can be booted. The +``MEMORY_ENCRYPT_OP`` ioctl provides commands to encrypt the images: ``LAUNCH_START``, +``LAUNCH_UPDATE_DATA``, ``LAUNCH_MEASURE`` and ``LAUNCH_FINISH``. These four commands +together generate a fresh memory encryption key for the VM, encrypt the boot +images and provide a measurement than can be used as an attestation of a +successful launch. + +For a SEV-ES guest, the ``LAUNCH_UPDATE_VMSA`` command is also used to encrypt the +guest register state, or VM save area (VMSA), for all of the guest vCPUs. + +``LAUNCH_START`` is called first to create a cryptographic launch context within +the firmware. To create this context, guest owner must provide a guest policy, +its public Diffie-Hellman key (PDH) and session parameters. These inputs +should be treated as a binary blob and must be passed as-is to the SEV firmware. + +The guest policy is passed as plaintext. A hypervisor may choose to read it, +but should not modify it (any modification of the policy bits will result +in bad measurement). The guest policy is a 4-byte data structure containing +several flags that restricts what can be done on a running SEV guest. +See SEV API Spec ([SEVAPI]_) section 3 and 6.2 for more details. + +The guest policy can be provided via the ``policy`` property:: + + # ${QEMU} \ + sev-guest,id=sev0,policy=0x1...\ + +Setting the "SEV-ES required" policy bit (bit 2) will launch the guest as a +SEV-ES guest:: + + # ${QEMU} \ + sev-guest,id=sev0,policy=0x5...\ + +The guest owner provided DH certificate and session parameters will be used to +establish a cryptographic session with the guest owner to negotiate keys used +for the attestation. 
+ +The DH certificate and session blob can be provided via the ``dh-cert-file`` and +``session-file`` properties:: + + # ${QEMU} \ + sev-guest,id=sev0,dh-cert-file=<file1>,session-file=<file2> + +``LAUNCH_UPDATE_DATA`` encrypts the memory region using the cryptographic context +created via the ``LAUNCH_START`` command. If required, this command can be called +multiple times to encrypt different memory regions. The command also calculates +the measurement of the memory contents as it encrypts. + +``LAUNCH_UPDATE_VMSA`` encrypts all the vCPU VMSAs for a SEV-ES guest using the +cryptographic context created via the ``LAUNCH_START`` command. The command also +calculates the measurement of the VMSAs as it encrypts them. + +``LAUNCH_MEASURE`` can be used to retrieve the measurement of encrypted memory and, +for a SEV-ES guest, encrypted VMSAs. This measurement is a signature of the +memory contents and, for a SEV-ES guest, the VMSA contents, that can be sent +to the guest owner as an attestation that the memory and VMSAs were encrypted +correctly by the firmware. The guest owner may wait to provide the guest +confidential information until it can verify the attestation measurement. +Since the guest owner knows the initial contents of the guest at boot, the +attestation measurement can be verified by comparing it to what the guest owner +expects. + +``LAUNCH_FINISH`` finalizes the guest launch and destroys the cryptographic +context. + +See SEV API Spec ([SEVAPI]_) 'Launching a guest' usage flow (Appendix A) for the +complete flow chart. + +To launch a SEV guest:: + + # ${QEMU} \ + -machine ...,confidential-guest-support=sev0 \ + -object sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=1 + +To launch a SEV-ES guest:: + + # ${QEMU} \ + -machine ...,confidential-guest-support=sev0 \ + -object sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=1,policy=0x5 + +An SEV-ES guest has some restrictions as compared to a SEV guest. Because the +guest register state is encrypted and cannot be updated by the VMM/hypervisor, +a SEV-ES guest: + + - Does not support SMM - SMM support requires updating the guest register + state. + - Does not support reboot - a system reset requires updating the guest register + state. + - Requires in-kernel irqchip - the burden is placed on the hypervisor to + manage booting APs. + +Calculating expected guest launch measurement +--------------------------------------------- + +In order to verify the guest launch measurement, The Guest Owner must compute +it in the exact same way as it is calculated by the AMD-SP. SEV API Spec +([SEVAPI]_) section 6.5.1 describes the AMD-SP operations: + + GCTX.LD is finalized, producing the hash digest of all plaintext data + imported into the guest. + + The launch measurement is calculated as: + + HMAC(0x04 || API_MAJOR || API_MINOR || BUILD || GCTX.POLICY || GCTX.LD || MNONCE; GCTX.TIK) + + where "||" represents concatenation. + +The values of API_MAJOR, API_MINOR, BUILD, and GCTX.POLICY can be obtained +from the ``query-sev`` qmp command. + +The value of MNONCE is part of the response of ``query-sev-launch-measure``: it +is the last 16 bytes of the base64-decoded data field (see SEV API Spec +([SEVAPI]_) section 6.5.2 Table 52: LAUNCH_MEASURE Measurement Buffer). + +The value of GCTX.LD is +``SHA256(firmware_blob || kernel_hashes_blob || vmsas_blob)``, where: + +* ``firmware_blob`` is the content of the entire firmware flash file (for + example, ``OVMF.fd``). 
Note that you must build a stateless firmware file + which doesn't use an NVRAM store, because the NVRAM area is not measured, and + therefore it is not secure to use a firmware which uses state from an NVRAM + store. +* if kernel is used, and ``kernel-hashes=on``, then ``kernel_hashes_blob`` is + the content of PaddedSevHashTable (including the zero padding), which itself + includes the hashes of kernel, initrd, and cmdline that are passed to the + guest. The PaddedSevHashTable struct is defined in ``target/i386/sev.c``. +* if SEV-ES is enabled (``policy & 0x4 != 0``), ``vmsas_blob`` is the + concatenation of all VMSAs of the guest vcpus. Each VMSA is 4096 bytes long; + its content is defined inside Linux kernel code as ``struct vmcb_save_area``, + or in AMD APM Volume 2 ([APMVOL2]_) Table B-2: VMCB Layout, State Save Area. + +If kernel hashes are not used, or SEV-ES is disabled, use empty blobs for +``kernel_hashes_blob`` and ``vmsas_blob`` as needed. + +Debugging +--------- + +Since the memory contents of a SEV guest are encrypted, hypervisor access to +the guest memory will return cipher text. If the guest policy allows debugging, +then a hypervisor can use the DEBUG_DECRYPT and DEBUG_ENCRYPT commands to access +the guest memory region for debug purposes. This is not supported in QEMU yet. + +Snapshot/Restore +---------------- + +TODO + +Live Migration +--------------- + +TODO + +References +---------- + +`AMD Memory Encryption whitepaper +<https://developer.amd.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf>`_ + +.. [SEVAPI] `Secure Encrypted Virtualization API + <https://www.amd.com/system/files/TechDocs/55766_SEV-KM_API_Specification.pdf>`_ + +.. [APMVOL2] `AMD64 Architecture Programmer's Manual Volume 2: System Programming + <https://www.amd.com/system/files/TechDocs/24593.pdf>`_ + +KVM Forum slides: + +* `AMD’s Virtualization Memory Encryption (2016) + <http://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf>`_ +* `Extending Secure Encrypted Virtualization With SEV-ES (2018) + <https://www.linux-kvm.org/images/9/94/Extending-Secure-Encrypted-Virtualization-with-SEV-ES-Thomas-Lendacky-AMD.pdf>`_ + +`AMD64 Architecture Programmer's Manual: +<http://support.amd.com/TechDocs/24593.pdf>`_ + +* SME is section 7.10 +* SEV is section 15.34 +* SEV-ES is section 15.35 diff --git a/docs/system/i386/cpu.rst b/docs/system/i386/cpu.rst new file mode 100644 index 00000000..738719da --- /dev/null +++ b/docs/system/i386/cpu.rst @@ -0,0 +1 @@ +.. include:: ../cpu-models-x86.rst.inc diff --git a/docs/system/i386/hyperv.rst b/docs/system/i386/hyperv.rst new file mode 100644 index 00000000..2505dc4c --- /dev/null +++ b/docs/system/i386/hyperv.rst @@ -0,0 +1,288 @@ +Hyper-V Enlightenments +====================== + + +Description +----------- + +In some cases when implementing a hardware interface in software is slow, KVM +implements its own paravirtualized interfaces. This works well for Linux as +guest support for such features is added simultaneously with the feature itself. +It may, however, be hard-to-impossible to add support for these interfaces to +proprietary OSes, namely, Microsoft Windows. + +KVM on x86 implements Hyper-V Enlightenments for Windows guests. These features +make Windows and Hyper-V guests think they're running on top of a Hyper-V +compatible hypervisor and use Hyper-V specific features. + + +Setup +----- + +No Hyper-V enlightenments are enabled by default by either KVM or QEMU. 
In +QEMU, individual enlightenments can be enabled through CPU flags, e.g: + +.. parsed-literal:: + + |qemu_system| --enable-kvm --cpu host,hv_relaxed,hv_vpindex,hv_time, ... + +Sometimes there are dependencies between enlightenments, QEMU is supposed to +check that the supplied configuration is sane. + +When any set of the Hyper-V enlightenments is enabled, QEMU changes hypervisor +identification (CPUID 0x40000000..0x4000000A) to Hyper-V. KVM identification +and features are kept in leaves 0x40000100..0x40000101. + + +Existing enlightenments +----------------------- + +``hv-relaxed`` + This feature tells guest OS to disable watchdog timeouts as it is running on a + hypervisor. It is known that some Windows versions will do this even when they + see 'hypervisor' CPU flag. + +``hv-vapic`` + Provides so-called VP Assist page MSR to guest allowing it to work with APIC + more efficiently. In particular, this enlightenment allows paravirtualized + (exit-less) EOI processing. + +``hv-spinlocks`` = xxx + Enables paravirtualized spinlocks. The parameter indicates how many times + spinlock acquisition should be attempted before indicating the situation to the + hypervisor. A special value 0xffffffff indicates "never notify". + +``hv-vpindex`` + Provides HV_X64_MSR_VP_INDEX (0x40000002) MSR to the guest which has Virtual + processor index information. This enlightenment makes sense in conjunction with + hv-synic, hv-stimer and other enlightenments which require the guest to know its + Virtual Processor indices (e.g. when VP index needs to be passed in a + hypercall). + +``hv-runtime`` + Provides HV_X64_MSR_VP_RUNTIME (0x40000010) MSR to the guest. The MSR keeps the + virtual processor run time in 100ns units. This gives guest operating system an + idea of how much time was 'stolen' from it (when the virtual CPU was preempted + to perform some other work). + +``hv-crash`` + Provides HV_X64_MSR_CRASH_P0..HV_X64_MSR_CRASH_P5 (0x40000100..0x40000105) and + HV_X64_MSR_CRASH_CTL (0x40000105) MSRs to the guest. These MSRs are written to + by the guest when it crashes, HV_X64_MSR_CRASH_P0..HV_X64_MSR_CRASH_P5 MSRs + contain additional crash information. This information is outputted in QEMU log + and through QAPI. + Note: unlike under genuine Hyper-V, write to HV_X64_MSR_CRASH_CTL causes guest + to shutdown. This effectively blocks crash dump generation by Windows. + +``hv-time`` + Enables two Hyper-V-specific clocksources available to the guest: MSR-based + Hyper-V clocksource (HV_X64_MSR_TIME_REF_COUNT, 0x40000020) and Reference TSC + page (enabled via MSR HV_X64_MSR_REFERENCE_TSC, 0x40000021). Both clocksources + are per-guest, Reference TSC page clocksource allows for exit-less time stamp + readings. Using this enlightenment leads to significant speedup of all timestamp + related operations. + +``hv-synic`` + Enables Hyper-V Synthetic interrupt controller - an extension of a local APIC. + When enabled, this enlightenment provides additional communication facilities + to the guest: SynIC messages and Events. This is a pre-requisite for + implementing VMBus devices (not yet in QEMU). Additionally, this enlightenment + is needed to enable Hyper-V synthetic timers. SynIC is controlled through MSRs + HV_X64_MSR_SCONTROL..HV_X64_MSR_EOM (0x40000080..0x40000084) and + HV_X64_MSR_SINT0..HV_X64_MSR_SINT15 (0x40000090..0x4000009F) + + Requires: ``hv-vpindex`` + +``hv-stimer`` + Enables Hyper-V synthetic timers. 
There are four synthetic timers per virtual + CPU controlled through HV_X64_MSR_STIMER0_CONFIG..HV_X64_MSR_STIMER3_COUNT + (0x400000B0..0x400000B7) MSRs. These timers can work either in single-shot or + periodic mode. It is known that certain Windows versions revert to using HPET + (or even RTC when HPET is unavailable) extensively when this enlightenment is + not provided; this can lead to significant CPU consumption, even when virtual + CPU is idle. + + Requires: ``hv-vpindex``, ``hv-synic``, ``hv-time`` + +``hv-tlbflush`` + Enables paravirtualized TLB shoot-down mechanism. On x86 architecture, remote + TLB flush procedure requires sending IPIs and waiting for other CPUs to perform + local TLB flush. In virtualized environment some virtual CPUs may not even be + scheduled at the time of the call and may not require flushing (or, flushing + may be postponed until the virtual CPU is scheduled). hv-tlbflush enlightenment + implements TLB shoot-down through hypervisor enabling the optimization. + + Requires: ``hv-vpindex`` + +``hv-ipi`` + Enables paravirtualized IPI send mechanism. HvCallSendSyntheticClusterIpi + hypercall may target more than 64 virtual CPUs simultaneously, doing the same + through APIC requires more than one access (and thus exit to the hypervisor). + + Requires: ``hv-vpindex`` + +``hv-vendor-id`` = xxx + This changes Hyper-V identification in CPUID 0x40000000.EBX-EDX from the default + "Microsoft Hv". The parameter should be no longer than 12 characters. According + to the specification, guests shouldn't use this information and it is unknown + if there is a Windows version which acts differently. + Note: hv-vendor-id is not an enlightenment and thus doesn't enable Hyper-V + identification when specified without some other enlightenment. + +``hv-reset`` + Provides HV_X64_MSR_RESET (0x40000003) MSR to the guest allowing it to reset + itself by writing to it. Even when this MSR is enabled, it is not a recommended + way for Windows to perform system reboot and thus it may not be used. + +``hv-frequencies`` + Provides HV_X64_MSR_TSC_FREQUENCY (0x40000022) and HV_X64_MSR_APIC_FREQUENCY + (0x40000023) allowing the guest to get its TSC/APIC frequencies without doing + measurements. + +``hv-reenlightenment`` + The enlightenment is nested specific, it targets Hyper-V on KVM guests. When + enabled, it provides HV_X64_MSR_REENLIGHTENMENT_CONTROL (0x40000106), + HV_X64_MSR_TSC_EMULATION_CONTROL (0x40000107)and HV_X64_MSR_TSC_EMULATION_STATUS + (0x40000108) MSRs allowing the guest to get notified when TSC frequency changes + (only happens on migration) and keep using old frequency (through emulation in + the hypervisor) until it is ready to switch to the new one. This, in conjunction + with ``hv-frequencies``, allows Hyper-V on KVM to pass stable clocksource + (Reference TSC page) to its own guests. + + Note, KVM doesn't fully support re-enlightenment notifications and doesn't + emulate TSC accesses after migration so 'tsc-frequency=' CPU option also has to + be specified to make migration succeed. The destination host has to either have + the same TSC frequency or support TSC scaling CPU feature. + + Recommended: ``hv-frequencies`` + +``hv-evmcs`` + The enlightenment is nested specific, it targets Hyper-V on KVM guests. When + enabled, it provides Enlightened VMCS version 1 feature to the guest. The feature + implements paravirtualized protocol between L0 (KVM) and L1 (Hyper-V) + hypervisors making L2 exits to the hypervisor faster. The feature is Intel-only. 
+ + Note: some virtualization features (e.g. Posted Interrupts) are disabled when + hv-evmcs is enabled. It may make sense to measure your nested workload with and + without the feature to find out if enabling it is beneficial. + + Requires: ``hv-vapic`` + +``hv-stimer-direct`` + Hyper-V specification allows synthetic timer operation in two modes: "classic", + when expiration event is delivered as SynIC message and "direct", when the event + is delivered via normal interrupt. It is known that nested Hyper-V can only + use synthetic timers in direct mode and thus ``hv-stimer-direct`` needs to be + enabled. + + Requires: ``hv-vpindex``, ``hv-synic``, ``hv-time``, ``hv-stimer`` + +``hv-avic`` (``hv-apicv``) + The enlightenment allows to use Hyper-V SynIC with hardware APICv/AVIC enabled. + Normally, Hyper-V SynIC disables these hardware feature and suggests the guest + to use paravirtualized AutoEOI feature. + Note: enabling this feature on old hardware (without APICv/AVIC support) may + have negative effect on guest's performance. + +``hv-no-nonarch-coresharing`` = on/off/auto + This enlightenment tells guest OS that virtual processors will never share a + physical core unless they are reported as sibling SMT threads. This information + is required by Windows and Hyper-V guests to properly mitigate SMT related CPU + vulnerabilities. + + When the option is set to 'auto' QEMU will enable the feature only when KVM + reports that non-architectural coresharing is impossible, this means that + hyper-threading is not supported or completely disabled on the host. This + setting also prevents migration as SMT settings on the destination may differ. + When the option is set to 'on' QEMU will always enable the feature, regardless + of host setup. To keep guests secure, this can only be used in conjunction with + exposing correct vCPU topology and vCPU pinning. + +``hv-version-id-build``, ``hv-version-id-major``, ``hv-version-id-minor``, ``hv-version-id-spack``, ``hv-version-id-sbranch``, ``hv-version-id-snumber`` + This changes Hyper-V version identification in CPUID 0x40000002.EAX-EDX from the + default (WS2016). + + - ``hv-version-id-build`` sets 'Build Number' (32 bits) + - ``hv-version-id-major`` sets 'Major Version' (16 bits) + - ``hv-version-id-minor`` sets 'Minor Version' (16 bits) + - ``hv-version-id-spack`` sets 'Service Pack' (32 bits) + - ``hv-version-id-sbranch`` sets 'Service Branch' (8 bits) + - ``hv-version-id-snumber`` sets 'Service Number' (24 bits) + + Note: hv-version-id-* are not enlightenments and thus don't enable Hyper-V + identification when specified without any other enlightenments. + +``hv-syndbg`` + Enables Hyper-V synthetic debugger interface, this is a special interface used + by Windows Kernel debugger to send the packets through, rather than sending + them via serial/network . + When enabled, this enlightenment provides additional communication facilities + to the guest: SynDbg messages. + This new communication is used by Windows Kernel debugger rather than sending + packets via serial/network, adding significant performance boost over the other + comm channels. + This enlightenment requires a VMBus device (-device vmbus-bridge,irq=15). + + Requires: ``hv-relaxed``, ``hv_time``, ``hv-vapic``, ``hv-vpindex``, ``hv-synic``, ``hv-runtime``, ``hv-stimer`` + +``hv-emsr-bitmap`` + The enlightenment is nested specific, it targets Hyper-V on KVM guests. 
When + enabled, it allows L0 (KVM) and L1 (Hyper-V) hypervisors to collaborate to + avoid unnecessary updates to L2 MSR-Bitmap upon vmexits. While the protocol is + supported for both VMX (Intel) and SVM (AMD), the VMX implementation requires + Enlightened VMCS (``hv-evmcs``) feature to also be enabled. + + Recommended: ``hv-evmcs`` (Intel) + +``hv-xmm-input`` + Hyper-V specification allows to pass parameters for certain hypercalls using XMM + registers ("XMM Fast Hypercall Input"). When the feature is in use, it allows + for faster hypercalls processing as KVM can avoid reading guest's memory. + +``hv-tlbflush-ext`` + Allow for extended GVA ranges to be passed to Hyper-V TLB flush hypercalls + (HvFlushVirtualAddressList/HvFlushVirtualAddressListEx). + + Requires: ``hv-tlbflush`` + +``hv-tlbflush-direct`` + The enlightenment is nested specific, it targets Hyper-V on KVM guests. When + enabled, it allows L0 (KVM) to directly handle TLB flush hypercalls from L2 + guest without the need to exit to L1 (Hyper-V) hypervisor. While the feature is + supported for both VMX (Intel) and SVM (AMD), the VMX implementation requires + Enlightened VMCS (``hv-evmcs``) feature to also be enabled. + + Requires: ``hv-vapic`` + + Recommended: ``hv-evmcs`` (Intel) + +Supplementary features +---------------------- + +``hv-passthrough`` + In some cases (e.g. during development) it may make sense to use QEMU in + 'pass-through' mode and give Windows guests all enlightenments currently + supported by KVM. This pass-through mode is enabled by "hv-passthrough" CPU + flag. + + Note: ``hv-passthrough`` flag only enables enlightenments which are known to QEMU + (have corresponding 'hv-' flag) and copies ``hv-spinlocks`` and ``hv-vendor-id`` + values from KVM to QEMU. ``hv-passthrough`` overrides all other 'hv-' settings on + the command line. Also, enabling this flag effectively prevents migration as the + list of enabled enlightenments may differ between target and destination hosts. + +``hv-enforce-cpuid`` + By default, KVM allows the guest to use all currently supported Hyper-V + enlightenments when Hyper-V CPUID interface was exposed, regardless of if + some features were not announced in guest visible CPUIDs. ``hv-enforce-cpuid`` + feature alters this behavior and only allows the guest to use exposed Hyper-V + enlightenments. + + +Useful links +------------ +Hyper-V Top Level Functional specification and other information: + +- https://github.com/MicrosoftDocs/Virtualization-Documentation +- https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs + diff --git a/docs/system/i386/kvm-pv.rst b/docs/system/i386/kvm-pv.rst new file mode 100644 index 00000000..1e5a9923 --- /dev/null +++ b/docs/system/i386/kvm-pv.rst @@ -0,0 +1,100 @@ +Paravirtualized KVM features +============================ + +Description +----------- + +In some cases when implementing hardware interfaces in software is slow, ``KVM`` +implements its own paravirtualized interfaces. + +Setup +----- + +Paravirtualized ``KVM`` features are represented as CPU flags. The following +features are enabled by default for any CPU model when ``KVM`` acceleration is +enabled: + +- ``kvmclock`` +- ``kvm-nopiodelay`` +- ``kvm-asyncpf`` +- ``kvm-steal-time`` +- ``kvm-pv-eoi`` +- ``kvmclock-stable-bit`` + +``kvm-msi-ext-dest-id`` feature is enabled by default in x2apic mode with split +irqchip (e.g. "-machine ...,kernel-irqchip=split -cpu ...,x2apic"). 
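
Since these are ordinary CPU flags, individual features can also be turned
on or off explicitly. A minimal sketch, with the machine type and the
particular feature choices picked purely for illustration:

.. code-block:: bash

    $ qemu-system-x86_64 -enable-kvm \
        -machine q35,kernel-irqchip=split \
        -cpu host,x2apic=on,kvm-pv-unhalt=on,kvm-steal-time=off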
+ +Note: when CPU model ``host`` is used, QEMU passes through all supported +paravirtualized ``KVM`` features to the guest. + +Existing features +----------------- + +``kvmclock`` + Expose a ``KVM`` specific paravirtualized clocksource to the guest. Supported + since Linux v2.6.26. + +``kvm-nopiodelay`` + The guest doesn't need to perform delays on PIO operations. Supported since + Linux v2.6.26. + +``kvm-mmu`` + This feature is deprecated. + +``kvm-asyncpf`` + Enable asynchronous page fault mechanism. Supported since Linux v2.6.38. + Note: since Linux v5.10 the feature is deprecated and not enabled by ``KVM``. + Use ``kvm-asyncpf-int`` instead. + +``kvm-steal-time`` + Enable stolen (when guest vCPU is not running) time accounting. Supported + since Linux v3.1. + +``kvm-pv-eoi`` + Enable paravirtualized end-of-interrupt signaling. Supported since Linux + v3.10. + +``kvm-pv-unhalt`` + Enable paravirtualized spinlocks support. Supported since Linux v3.12. + +``kvm-pv-tlb-flush`` + Enable paravirtualized TLB flush mechanism. Supported since Linux v4.16. + +``kvm-pv-ipi`` + Enable paravirtualized IPI mechanism. Supported since Linux v4.19. + +``kvm-poll-control`` + Enable host-side polling on HLT control from the guest. Supported since Linux + v5.10. + +``kvm-pv-sched-yield`` + Enable paravirtualized sched yield feature. Supported since Linux v5.10. + +``kvm-asyncpf-int`` + Enable interrupt based asynchronous page fault mechanism. Supported since Linux + v5.10. + +``kvm-msi-ext-dest-id`` + Support 'Extended Destination ID' for external interrupts. The feature allows + to use up to 32768 CPUs without IRQ remapping (but other limits may apply making + the number of supported vCPUs for a given configuration lower). Supported since + Linux v5.10. + +``kvmclock-stable-bit`` + Tell the guest that guest visible TSC value can be fully trusted for kvmclock + computations and no warps are expected. Supported since Linux v2.6.35. + +Supplementary features +---------------------- + +``kvm-pv-enforce-cpuid`` + Limit the supported paravirtualized feature set to the exposed features only. + Note, by default, ``KVM`` allows the guest to use all currently supported + paravirtualized features even when they were not announced in guest visible + CPUIDs. Supported since Linux v5.10. + + +Useful links +------------ + +Please refer to Documentation/virt/kvm in Linux for additional details. diff --git a/docs/system/i386/microvm.rst b/docs/system/i386/microvm.rst new file mode 100644 index 00000000..1675e37d --- /dev/null +++ b/docs/system/i386/microvm.rst @@ -0,0 +1,128 @@ +'microvm' virtual platform (``microvm``) +======================================== + +``microvm`` is a machine type inspired by ``Firecracker`` and +constructed after its machine model. + +It's a minimalist machine type without ``PCI`` nor ``ACPI`` support, +designed for short-lived guests. microvm also establishes a baseline +for benchmarking and optimizing both QEMU and guest operating systems, +since it is optimized for both boot time and footprint. + + +Supported devices +----------------- + +The microvm machine type supports the following devices: + +- ISA bus +- i8259 PIC (optional) +- i8254 PIT (optional) +- MC146818 RTC (optional) +- One ISA serial port (optional) +- LAPIC +- IOAPIC (with kernel-irqchip=split by default) +- kvmclock (if using KVM) +- fw_cfg +- Up to eight virtio-mmio devices (configured by the user) + + +Limitations +----------- + +Currently, microvm does *not* support the following features: + +- PCI-only devices. 
+- Hotplug of any kind. +- Live migration across QEMU versions. + + +Using the microvm machine type +------------------------------ + +Machine-specific options +~~~~~~~~~~~~~~~~~~~~~~~~ + +It supports the following machine-specific options: + +- microvm.x-option-roms=bool (Set off to disable loading option ROMs) +- microvm.pit=OnOffAuto (Enable i8254 PIT) +- microvm.isa-serial=bool (Set off to disable the instantiation an ISA serial port) +- microvm.pic=OnOffAuto (Enable i8259 PIC) +- microvm.rtc=OnOffAuto (Enable MC146818 RTC) +- microvm.auto-kernel-cmdline=bool (Set off to disable adding virtio-mmio devices to the kernel cmdline) + + +Boot options +~~~~~~~~~~~~ + +By default, microvm uses ``qboot`` as its BIOS, to obtain better boot +times, but it's also compatible with ``SeaBIOS``. + +As no current FW is able to boot from a block device using +``virtio-mmio`` as its transport, a microvm-based VM needs to be run +using a host-side kernel and, optionally, an initrd image. + + +Running a microvm-based VM +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +By default, microvm aims for maximum compatibility, enabling both +legacy and non-legacy devices. In this example, a VM is created +without passing any additional machine-specific option, using the +legacy ``ISA serial`` device as console:: + + $ qemu-system-x86_64 -M microvm \ + -enable-kvm -cpu host -m 512m -smp 2 \ + -kernel vmlinux -append "earlyprintk=ttyS0 console=ttyS0 root=/dev/vda" \ + -nodefaults -no-user-config -nographic \ + -serial stdio \ + -drive id=test,file=test.img,format=raw,if=none \ + -device virtio-blk-device,drive=test \ + -netdev tap,id=tap0,script=no,downscript=no \ + -device virtio-net-device,netdev=tap0 + +While the example above works, you might be interested in reducing the +footprint further by disabling some legacy devices. If you're using +``KVM``, you can disable the ``RTC``, making the Guest rely on +``kvmclock`` exclusively. Additionally, if your host's CPUs have the +``TSC_DEADLINE`` feature, you can also disable both the i8259 PIC and +the i8254 PIT (make sure you're also emulating a CPU with such feature +in the guest). + +This is an example of a VM with all optional legacy features +disabled:: + + $ qemu-system-x86_64 \ + -M microvm,x-option-roms=off,pit=off,pic=off,isa-serial=off,rtc=off \ + -enable-kvm -cpu host -m 512m -smp 2 \ + -kernel vmlinux -append "console=hvc0 root=/dev/vda" \ + -nodefaults -no-user-config -nographic \ + -chardev stdio,id=virtiocon0 \ + -device virtio-serial-device \ + -device virtconsole,chardev=virtiocon0 \ + -drive id=test,file=test.img,format=raw,if=none \ + -device virtio-blk-device,drive=test \ + -netdev tap,id=tap0,script=no,downscript=no \ + -device virtio-net-device,netdev=tap0 + + +Triggering a guest-initiated shut down +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +As the microvm machine type includes just a small set of system +devices, some x86 mechanisms for rebooting or shutting down the +system, like sending a key sequence to the keyboard or writing to an +ACPI register, doesn't have any effect in the VM. + +The recommended way to trigger a guest-initiated shut down is by +generating a ``triple-fault``, which will cause the VM to initiate a +reboot. Additionally, if the ``-no-reboot`` argument is present in the +command line, QEMU will detect this event and terminate its own +execution gracefully. + +Linux does support this mechanism, but by default will only be used +after other options have been tried and failed, causing the reboot to +be delayed by a small number of seconds. 
It's possible to instruct it +to try the triple-fault mechanism first, by adding ``reboot=t`` to the +kernel's command line. diff --git a/docs/system/i386/pc.rst b/docs/system/i386/pc.rst new file mode 100644 index 00000000..d543c11a --- /dev/null +++ b/docs/system/i386/pc.rst @@ -0,0 +1,7 @@ +i440fx PC (``pc-i440fx``, ``pc``) +================================= + +Peripherals +~~~~~~~~~~~ + +.. include:: ../target-i386-desc.rst.inc diff --git a/docs/system/i386/sgx.rst b/docs/system/i386/sgx.rst new file mode 100644 index 00000000..0f0a73f7 --- /dev/null +++ b/docs/system/i386/sgx.rst @@ -0,0 +1,188 @@ +Software Guard eXtensions (SGX) +=============================== + +Overview +-------- + +Intel Software Guard eXtensions (SGX) is a set of instructions and mechanisms +for memory accesses in order to provide security accesses for sensitive +applications and data. SGX allows an application to use it's pariticular +address space as an *enclave*, which is a protected area provides confidentiality +and integrity even in the presence of privileged malware. Accesses to the +enclave memory area from any software not resident in the enclave are prevented, +including those from privileged software. + +Virtual SGX +----------- + +SGX feature is exposed to guest via SGX CPUID. Looking at SGX CPUID, we can +report the same CPUID info to guest as on host for most of SGX CPUID. With +reporting the same CPUID guest is able to use full capacity of SGX, and KVM +doesn't need to emulate those info. + +The guest's EPC base and size are determined by QEMU, and KVM needs QEMU to +notify such info to it before it can initialize SGX for guest. + +Virtual EPC +~~~~~~~~~~~ + +By default, QEMU does not assign EPC to a VM, i.e. fully enabling SGX in a VM +requires explicit allocation of EPC to the VM. Similar to other specialized +memory types, e.g. hugetlbfs, EPC is exposed as a memory backend. + +SGX EPC is enumerated through CPUID, i.e. EPC "devices" need to be realized +prior to realizing the vCPUs themselves, which occurs long before generic +devices are parsed and realized. This limitation means that EPC does not +require -maxmem as EPC is not treated as {cold,hot}plugged memory. + +QEMU does not artificially restrict the number of EPC sections exposed to a +guest, e.g. QEMU will happily allow you to create 64 1M EPC sections. Be aware +that some kernels may not recognize all EPC sections, e.g. the Linux SGX driver +is hardwired to support only 8 EPC sections. + +The following QEMU snippet creates two EPC sections, with 64M pre-allocated +to the VM and an additional 28M mapped but not allocated:: + + -object memory-backend-epc,id=mem1,size=64M,prealloc=on \ + -object memory-backend-epc,id=mem2,size=28M \ + -M sgx-epc.0.memdev=mem1,sgx-epc.1.memdev=mem2 + +Note: + +The size and location of the virtual EPC are far less restricted compared +to physical EPC. Because physical EPC is protected via range registers, +the size of the physical EPC must be a power of two (though software sees +a subset of the full EPC, e.g. 92M or 128M) and the EPC must be naturally +aligned. KVM SGX's virtual EPC is purely a software construct and only +requires the size and location to be page aligned. QEMU enforces the EPC +size is a multiple of 4k and will ensure the base of the EPC is 4k aligned. +To simplify the implementation, EPC is always located above 4g in the guest +physical address space. 
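
For instance, because only page alignment is required, an EPC section whose
size is not a power of two is perfectly acceptable for a guest. The 12M
figure below is arbitrary and only illustrates that point::

    -object memory-backend-epc,id=epc0,size=12M,prealloc=on \
    -M sgx-epc.0.memdev=epc0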
+ +Migration +~~~~~~~~~ + +QEMU/KVM doesn't prevent live migrating SGX VMs, although from hardware's +perspective, SGX doesn't support live migration, since both EPC and the SGX +key hierarchy are bound to the physical platform. However live migration +can be supported in the sense if guest software stack can support recreating +enclaves when it suffers sudden lose of EPC; and if guest enclaves can detect +SGX keys being changed, and handle gracefully. For instance, when ERESUME fails +with #PF.SGX, guest software can gracefully detect it and recreate enclaves; +and when enclave fails to unseal sensitive information from outside, it can +detect such error and sensitive information can be provisioned to it again. + +CPUID +~~~~~ + +Due to its myriad dependencies, SGX is currently not listed as supported +in any of QEMU's built-in CPU configuration. To expose SGX (and SGX Launch +Control) to a guest, you must either use ``-cpu host`` to pass-through the +host CPU model, or explicitly enable SGX when using a built-in CPU model, +e.g. via ``-cpu <model>,+sgx`` or ``-cpu <model>,+sgx,+sgxlc``. + +All SGX sub-features enumerated through CPUID, e.g. SGX2, MISCSELECT, +ATTRIBUTES, etc... can be restricted via CPUID flags. Be aware that enforcing +restriction of MISCSELECT, ATTRIBUTES and XFRM requires intercepting ECREATE, +i.e. may marginally reduce SGX performance in the guest. All SGX sub-features +controlled via -cpu are prefixed with "sgx", e.g.:: + + $ qemu-system-x86_64 -cpu help | xargs printf "%s\n" | grep sgx + sgx + sgx-debug + sgx-encls-c + sgx-enclv + sgx-exinfo + sgx-kss + sgx-mode64 + sgx-provisionkey + sgx-tokenkey + sgx1 + sgx2 + sgxlc + +The following QEMU snippet passes through the host CPU but restricts access to +the provision and EINIT token keys:: + + -cpu host,-sgx-provisionkey,-sgx-tokenkey + +SGX sub-features cannot be emulated, i.e. sub-features that are not present +in hardware cannot be forced on via '-cpu'. + +Virtualize SGX Launch Control +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +QEMU SGX support for Launch Control (LC) is passive, in the sense that it +does not actively change the LC configuration. QEMU SGX provides the user +the ability to set/clear the CPUID flag (and by extension the associated +IA32_FEATURE_CONTROL MSR bit in fw_cfg) and saves/restores the LE Hash MSRs +when getting/putting guest state, but QEMU does not add new controls to +directly modify the LC configuration. Similar to hardware behavior, locking +the LC configuration to a non-Intel value is left to guest firmware. Unlike +host bios setting for SGX launch control(LC), there is no special bios setting +for SGX guest by our design. If host is in locked mode, we can still allow +creating VM with SGX. + +Feature Control +~~~~~~~~~~~~~~~ + +QEMU SGX updates the ``etc/msr_feature_control`` fw_cfg entry to set the SGX +(bit 18) and SGX LC (bit 17) flags based on their respective CPUID support, +i.e. existing guest firmware will automatically set SGX and SGX LC accordingly, +assuming said firmware supports fw_cfg.msr_feature_control. + +Launching a guest +----------------- + +To launch a SGX guest: + +.. parsed-literal:: + + |qemu_system_x86| \\ + -cpu host,+sgx-provisionkey \\ + -object memory-backend-epc,id=mem1,size=64M,prealloc=on \\ + -M sgx-epc.0.memdev=mem1,sgx-epc.0.node=0 + +Utilizing SGX in the guest requires a kernel/OS with SGX support. 
+The support can be determined in guest by:: + + $ grep sgx /proc/cpuinfo + +and SGX epc info by:: + + $ dmesg | grep sgx + [ 0.182807] sgx: EPC section 0x140000000-0x143ffffff + [ 0.183695] sgx: [Firmware Bug]: Unable to map EPC section to online node. Fallback to the NUMA node 0. + +To launch a SGX numa guest: + +.. parsed-literal:: + + |qemu_system_x86| \\ + -cpu host,+sgx-provisionkey \\ + -object memory-backend-ram,size=2G,host-nodes=0,policy=bind,id=node0 \\ + -object memory-backend-epc,id=mem0,size=64M,prealloc=on,host-nodes=0,policy=bind \\ + -numa node,nodeid=0,cpus=0-1,memdev=node0 \\ + -object memory-backend-ram,size=2G,host-nodes=1,policy=bind,id=node1 \\ + -object memory-backend-epc,id=mem1,size=28M,prealloc=on,host-nodes=1,policy=bind \\ + -numa node,nodeid=1,cpus=2-3,memdev=node1 \\ + -M sgx-epc.0.memdev=mem0,sgx-epc.0.node=0,sgx-epc.1.memdev=mem1,sgx-epc.1.node=1 + +and SGX epc numa info by:: + + $ dmesg | grep sgx + [ 0.369937] sgx: EPC section 0x180000000-0x183ffffff + [ 0.370259] sgx: EPC section 0x184000000-0x185bfffff + + $ dmesg | grep SRAT + [ 0.009981] ACPI: SRAT: Node 0 PXM 0 [mem 0x180000000-0x183ffffff] + [ 0.009982] ACPI: SRAT: Node 1 PXM 1 [mem 0x184000000-0x185bfffff] + +References +---------- + +- `SGX Homepage <https://software.intel.com/sgx>`__ + +- `SGX SDK <https://github.com/intel/linux-sgx.git>`__ + +- SGX specification: Intel SDM Volume 3 diff --git a/docs/system/images.rst b/docs/system/images.rst new file mode 100644 index 00000000..d000bd6b --- /dev/null +++ b/docs/system/images.rst @@ -0,0 +1,85 @@ +.. _disk images: + +Disk Images +----------- + +QEMU supports many disk image formats, including growable disk images +(their size increase as non empty sectors are written), compressed and +encrypted disk images. + +.. _disk_005fimages_005fquickstart: + +Quick start for disk image creation +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +You can create a disk image with the command:: + + qemu-img create myimage.img mysize + +where myimage.img is the disk image filename and mysize is its size in +kilobytes. You can add an ``M`` suffix to give the size in megabytes and +a ``G`` suffix for gigabytes. + +See the ``qemu-img`` invocation documentation for more information. + +.. _disk_005fimages_005fsnapshot_005fmode: + +Snapshot mode +~~~~~~~~~~~~~ + +If you use the option ``-snapshot``, all disk images are considered as +read only. When sectors in written, they are written in a temporary file +created in ``/tmp``. You can however force the write back to the raw +disk images by using the ``commit`` monitor command (or C-a s in the +serial console). + +.. _vm_005fsnapshots: + +VM snapshots +~~~~~~~~~~~~ + +VM snapshots are snapshots of the complete virtual machine including CPU +state, RAM, device state and the content of all the writable disks. In +order to use VM snapshots, you must have at least one non removable and +writable block device using the ``qcow2`` disk image format. Normally +this device is the first virtual hard drive. + +Use the monitor command ``savevm`` to create a new VM snapshot or +replace an existing one. A human readable name can be assigned to each +snapshot in addition to its numerical ID. + +Use ``loadvm`` to restore a VM snapshot and ``delvm`` to remove a VM +snapshot. 
``info snapshots`` lists the available snapshots with their +associated information:: + + (qemu) info snapshots + Snapshot devices: hda + Snapshot list (from hda): + ID TAG VM SIZE DATE VM CLOCK + 1 start 41M 2006-08-06 12:38:02 00:00:14.954 + 2 40M 2006-08-06 12:43:29 00:00:18.633 + 3 msys 40M 2006-08-06 12:44:04 00:00:23.514 + +A VM snapshot is made of a VM state info (its size is shown in +``info snapshots``) and a snapshot of every writable disk image. The VM +state info is stored in the first ``qcow2`` non removable and writable +block device. The disk image snapshots are stored in every disk image. +The size of a snapshot in a disk image is difficult to evaluate and is +not shown by ``info snapshots`` because the associated disk sectors are +shared among all the snapshots to save disk space (otherwise each +snapshot would need a full copy of all the disk images). + +When using the (unrelated) ``-snapshot`` option +(:ref:`disk_005fimages_005fsnapshot_005fmode`), +you can always make VM snapshots, but they are deleted as soon as you +exit QEMU. + +VM snapshots currently have the following known limitations: + +- They cannot cope with removable devices if they are removed or + inserted after a snapshot is done. + +- A few device drivers still have incomplete snapshot support so their + state is not saved or restored properly (in particular USB). + +.. include:: qemu-block-drivers.rst.inc diff --git a/docs/system/index.rst b/docs/system/index.rst new file mode 100644 index 00000000..e3695649 --- /dev/null +++ b/docs/system/index.rst @@ -0,0 +1,38 @@ +---------------- +System Emulation +---------------- + +This section of the manual is the overall guide for users using QEMU +for full system emulation (as opposed to user-mode emulation). +This includes working with hypervisors such as KVM, Xen, Hax +or Hypervisor.Framework. + +.. toctree:: + :maxdepth: 3 + + quickstart + invocation + device-emulation + keys + mux-chardev + monitor + images + virtio-net-failover + linuxboot + generic-loader + guest-loader + barrier + vnc-security + tls + secrets + authz + gdb + replay + managed-startup + bootindex + cpu-hotplug + pr-manager + targets + security + multi-process + confidential-guest-support diff --git a/docs/system/invocation.rst b/docs/system/invocation.rst new file mode 100644 index 00000000..4ba38fc2 --- /dev/null +++ b/docs/system/invocation.rst @@ -0,0 +1,18 @@ +.. _sec_005finvocation: + +Invocation +---------- + +.. parsed-literal:: + + |qemu_system| [options] [disk_image] + +disk_image is a raw hard disk image for IDE hard disk 0. Some targets do +not need a disk image. + +.. hxtool-doc:: qemu-options.hx + +Device URL Syntax +~~~~~~~~~~~~~~~~~ + +.. include:: device-url-syntax.rst.inc diff --git a/docs/system/keys.rst b/docs/system/keys.rst new file mode 100644 index 00000000..e596ae6c --- /dev/null +++ b/docs/system/keys.rst @@ -0,0 +1,6 @@ +.. _pcsys_005fkeys: + +Keys in the graphical frontends +------------------------------- + +.. include:: keys.rst.inc diff --git a/docs/system/keys.rst.inc b/docs/system/keys.rst.inc new file mode 100644 index 00000000..bd9b8e5f --- /dev/null +++ b/docs/system/keys.rst.inc @@ -0,0 +1,35 @@ +During the graphical emulation, you can use special key combinations to +change modes. 
The default key mappings are shown below, but if you use +``-alt-grab`` then the modifier is Ctrl-Alt-Shift (instead of Ctrl-Alt) +and if you use ``-ctrl-grab`` then the modifier is the right Ctrl key +(instead of Ctrl-Alt): + +Ctrl-Alt-f + Toggle full screen + +Ctrl-Alt-+ + Enlarge the screen + +Ctrl-Alt\-- + Shrink the screen + +Ctrl-Alt-u + Restore the screen's un-scaled dimensions + +Ctrl-Alt-n + Switch to virtual console 'n'. Standard console mappings are: + + *1* + Target system display + + *2* + Monitor + + *3* + Serial port + +Ctrl-Alt + Toggle mouse and keyboard grab. + +In the virtual consoles, you can use Ctrl-Up, Ctrl-Down, Ctrl-PageUp and +Ctrl-PageDown to move in the back log. diff --git a/docs/system/linuxboot.rst b/docs/system/linuxboot.rst new file mode 100644 index 00000000..228650ab --- /dev/null +++ b/docs/system/linuxboot.rst @@ -0,0 +1,30 @@ +.. _direct_005flinux_005fboot: + +Direct Linux Boot +----------------- + +This section explains how to launch a Linux kernel inside QEMU without +having to make a full bootable image. It is very useful for fast Linux +kernel testing. + +The syntax is: + +.. parsed-literal:: + + |qemu_system| -kernel bzImage -hda rootdisk.img -append "root=/dev/hda" + +Use ``-kernel`` to provide the Linux kernel image and ``-append`` to +give the kernel command line arguments. The ``-initrd`` option can be +used to provide an INITRD image. + +If you do not need graphical output, you can disable it and redirect the +virtual serial port and the QEMU monitor to the console with the +``-nographic`` option. The typical command line is: + +.. parsed-literal:: + + |qemu_system| -kernel bzImage -hda rootdisk.img \ + -append "root=/dev/hda console=ttyS0" -nographic + +Use Ctrl-a c to switch between the serial console and the monitor (see +:ref:`pcsys_005fkeys`). diff --git a/docs/system/loongarch/loongson3.rst b/docs/system/loongarch/loongson3.rst new file mode 100644 index 00000000..489ea20f --- /dev/null +++ b/docs/system/loongarch/loongson3.rst @@ -0,0 +1,129 @@ +:orphan: + +========================================== +loongson3 virt generic platform (``virt``) +========================================== + +The ``virt`` machine use gpex host bridge, and there are some +emulated devices on virt board, such as loongson7a RTC device, +IOAPIC device, ACPI device and so on. + +Supported devices +----------------- + +The ``virt`` machine supports: +- Gpex host bridge +- Ls7a RTC device +- Ls7a IOAPIC device +- ACPI GED device +- Fw_cfg device +- PCI/PCIe devices +- Memory device +- CPU device. Type: la464-loongarch-cpu. + +CPU and machine Type +-------------------- + +The ``qemu-system-loongarch64`` provides emulation for virt +machine. You can specify the machine type ``virt`` and +cpu type ``la464-loongarch-cpu``. + +Boot options +------------ + +We can boot the LoongArch virt machine by specifying the uefi bios, +initrd, and linux kernel. And those source codes and binary files +can be accessed by following steps. + +(1) booting command: + +.. code-block:: bash + + $ qemu-system-loongarch64 -machine virt -m 4G -cpu la464-loongarch-cpu \ + -smp 1 -bios QEMU_EFI.fd -kernel vmlinuz.efi -initrd initrd.img \ + -append "root=/dev/ram rdinit=/sbin/init console=ttyS0,115200" \ + --nographic + +Note: The running speed may be a little slow, as the performance of our +qemu and uefi bios is not perfect, and it is being fixed. + +(2) cross compiler tools: + +.. 
code-block:: bash + + wget https://github.com/loongson/build-tools/releases/download/ \ + 2022.05.29/loongarch64-clfs-5.0-cross-tools-gcc-full.tar.xz + + tar -vxf loongarch64-clfs-5.0-cross-tools-gcc-full.tar.xz + +(3) qemu compile configure option: + +.. code-block:: bash + + ./configure --disable-rdma --disable-pvrdma --prefix=usr \ + --target-list="loongarch64-softmmu" \ + --disable-libiscsi --disable-libnfs --disable-libpmem \ + --disable-glusterfs --enable-libusb --enable-usb-redir \ + --disable-opengl --disable-xen --enable-spice \ + --enable-debug --disable-capstone --disable-kvm \ + --enable-profiler + make + +(4) uefi bios source code and compile method: + +.. code-block:: bash + + git clone https://github.com/loongson/edk2-LoongarchVirt.git + + cd edk2-LoongarchVirt + + git submodule update --init + + export PATH=$YOUR_COMPILER_PATH/bin:$PATH + + export WORKSPACE=`pwd` + + export PACKAGES_PATH=$WORKSPACE/edk2-LoongarchVirt + + export GCC5_LOONGARCH64_PREFIX=loongarch64-unknown-linux-gnu- + + edk2-LoongarchVirt/edksetup.sh + + make -C edk2-LoongarchVirt/BaseTools + + build --buildtarget=DEBUG --tagname=GCC5 --arch=LOONGARCH64 --platform=OvmfPkg/LoongArchQemu/Loongson.dsc + + build --buildtarget=RELEASE --tagname=GCC5 --arch=LOONGARCH64 --platform=OvmfPkg/LoongArchQemu/Loongson.dsc + +The efi binary file path: + + Build/LoongArchQemu/DEBUG_GCC5/FV/QEMU_EFI.fd + + Build/LoongArchQemu/RELEASE_GCC5/FV/QEMU_EFI.fd + +(5) linux kernel source code and compile method: + +.. code-block:: bash + + git clone https://github.com/loongson/linux.git + + export PATH=$YOUR_COMPILER_PATH/bin:$PATH + + export LD_LIBRARY_PATH=$YOUR_COMPILER_PATH/lib:$LD_LIBRARY_PATH + + export LD_LIBRARY_PATH=$YOUR_COMPILER_PATH/loongarch64-unknown-linux-gnu/lib/:$LD_LIBRARY_PATH + + make ARCH=loongarch CROSS_COMPILE=loongarch64-unknown-linux-gnu- loongson3_defconfig + + make ARCH=loongarch CROSS_COMPILE=loongarch64-unknown-linux-gnu- + + make ARCH=loongarch CROSS_COMPILE=loongarch64-unknown-linux-gnu- install + + make ARCH=loongarch CROSS_COMPILE=loongarch64-unknown-linux-gnu- modules_install + +Note: The branch of linux source code is loongarch-next. + +(6) initrd file: + + You can use busybox tool and the linux modules to make a initrd file. Or you can access the + binary files: https://github.com/yangxiaojuan-loongson/qemu-binary diff --git a/docs/system/managed-startup.rst b/docs/system/managed-startup.rst new file mode 100644 index 00000000..9bcf98ea --- /dev/null +++ b/docs/system/managed-startup.rst @@ -0,0 +1,35 @@ +Managed start up options +======================== + +In system mode emulation, it's possible to create a VM in a paused +state using the ``-S`` command line option. In this state the machine +is completely initialized according to command line options and ready +to execute VM code but VCPU threads are not executing any code. The VM +state in this paused state depends on the way QEMU was started. It +could be in: + +- initial state (after reset/power on state) +- with direct kernel loading, the initial state could be amended to execute + code loaded by QEMU in the VM's RAM and with incoming migration +- with incoming migration, initial state will be amended with the migrated + machine state after migration completes + +This paused state is typically used by users to query machine state and/or +additionally configure the machine (by hotplugging devices) in runtime before +allowing VM code to run. 
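
A minimal sketch of that flow, using the human monitor on stdio (the disk
image name is illustrative), is shown below; ``info status`` should report
the VM as paused in the ``prelaunch`` state, and ``cont`` lets the VCPUs
start executing:

.. code-block:: bash

    $ qemu-system-x86_64 -S -monitor stdio disk.img
    (qemu) info status
    (qemu) cont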
+ +However, at the ``-S`` pause point, it's impossible to configure options +that affect initial VM creation (like: ``-smp``/``-m``/``-numa`` ...) or +cold plug devices. The experimental ``--preconfig`` command line option +allows pausing QEMU before the initial VM creation, in a "preconfig" state, +where additional queries and configuration can be performed via QMP +before moving on to the resulting configuration startup. In the +preconfig state, QEMU only allows a limited set of commands over the +QMP monitor, where the commands do not depend on an initialized +machine, including but not limited to: + +- ``qmp_capabilities`` +- ``query-qmp-schema`` +- ``query-commands`` +- ``query-status`` +- ``x-exit-preconfig`` diff --git a/docs/system/monitor.rst b/docs/system/monitor.rst new file mode 100644 index 00000000..ff5c4346 --- /dev/null +++ b/docs/system/monitor.rst @@ -0,0 +1,31 @@ +.. _QEMU monitor: + +QEMU Monitor +------------ + +The QEMU monitor is used to give complex commands to the QEMU emulator. +You can use it to: + +- Remove or insert removable media images (such as CD-ROM or + floppies). + +- Freeze/unfreeze the Virtual Machine (VM) and save or restore its + state from a disk file. + +- Inspect the VM state without an external debugger. + +Commands +~~~~~~~~ + +The following commands are available: + +.. hxtool-doc:: hmp-commands.hx + +.. hxtool-doc:: hmp-commands-info.hx + +Integer expressions +~~~~~~~~~~~~~~~~~~~ + +The monitor understands integers expressions for every integer argument. +You can use register names to get the value of specifics CPU registers +by prefixing them with *$*. diff --git a/docs/system/multi-process.rst b/docs/system/multi-process.rst new file mode 100644 index 00000000..210531ee --- /dev/null +++ b/docs/system/multi-process.rst @@ -0,0 +1,64 @@ +Multi-process QEMU +================== + +This document describes how to configure and use multi-process qemu. +For the design document refer to docs/devel/qemu-multiprocess. + +1) Configuration +---------------- + +multi-process is enabled by default for targets that enable KVM + + +2) Usage +-------- + +Multi-process QEMU requires an orchestrator to launch. + +Following is a description of command-line used to launch mpqemu. + +* Orchestrator: + + - The Orchestrator creates a unix socketpair + + - It launches the remote process and passes one of the + sockets to it via command-line. + + - It then launches QEMU and specifies the other socket as an option + to the Proxy device object + +* Remote Process: + + - QEMU can enter remote process mode by using the "remote" machine + option. + + - The orchestrator creates a "remote-object" with details about + the device and the file descriptor for the device + + - The remaining options are no different from how one launches QEMU with + devices. + + - Example command-line for the remote process is as follows: + + /usr/bin/qemu-system-x86_64 \ + -machine x-remote \ + -device lsi53c895a,id=lsi0 \ + -drive id=drive_image2,file=/build/ol7-nvme-test-1.qcow2 \ + -device scsi-hd,id=drive2,drive=drive_image2,bus=lsi0.0,scsi-id=0 \ + -object x-remote-object,id=robj1,devid=lsi0,fd=4, + +* QEMU: + + - Since parts of the RAM are shared between QEMU & remote process, a + memory-backend-memfd is required to facilitate this, as follows: + + -object memory-backend-memfd,id=mem,size=2G + + - A "x-pci-proxy-dev" device is created for each of the PCI devices emulated + in the remote process. A "socket" sub-option specifies the other end of + unix channel created by orchestrator. 
The "id" sub-option must be specified + and should be the same as the "id" specified for the remote PCI device + + - Example commandline for QEMU is as follows: + + -device x-pci-proxy-dev,id=lsi0,socket=3 diff --git a/docs/system/mux-chardev.rst b/docs/system/mux-chardev.rst new file mode 100644 index 00000000..05064068 --- /dev/null +++ b/docs/system/mux-chardev.rst @@ -0,0 +1,6 @@ +.. _keys in the character backend multiplexer: + +Keys in the character backend multiplexer +----------------------------------------- + +.. include:: mux-chardev.rst.inc diff --git a/docs/system/mux-chardev.rst.inc b/docs/system/mux-chardev.rst.inc new file mode 100644 index 00000000..84ea12cb --- /dev/null +++ b/docs/system/mux-chardev.rst.inc @@ -0,0 +1,27 @@ +During emulation, if you are using a character backend multiplexer +(which is the default if you are using ``-nographic``) then several +commands are available via an escape sequence. These key sequences all +start with an escape character, which is Ctrl-a by default, but can be +changed with ``-echr``. The list below assumes you're using the default. + +Ctrl-a h + Print this help + +Ctrl-a x + Exit emulator + +Ctrl-a s + Save disk data back to file (if -snapshot) + +Ctrl-a t + Toggle console timestamps + +Ctrl-a b + Send break (magic sysrq in Linux) + +Ctrl-a c + Rotate between the frontends connected to the multiplexer (usually + this switches between the monitor and the console) + +Ctrl-a Ctrl-a + Send the escape character to the frontend diff --git a/docs/system/openrisc/cpu-features.rst b/docs/system/openrisc/cpu-features.rst new file mode 100644 index 00000000..aeb65e22 --- /dev/null +++ b/docs/system/openrisc/cpu-features.rst @@ -0,0 +1,15 @@ +CPU Features +============ + +The QEMU emulation of the OpenRISC architecture provides following built in +features. + +- Shadow GPRs +- MMU TLB with 128 entries, 1 way +- Power Management (PM) +- Programmable Interrupt Controller (PIC) +- Tick Timer + +These features are on by default and the presence can be confirmed by checking +the contents of the Unit Presence Register (``UPR``) and CPU Configuration +Register (``CPUCFGR``). diff --git a/docs/system/openrisc/emulation.rst b/docs/system/openrisc/emulation.rst new file mode 100644 index 00000000..0af898ab --- /dev/null +++ b/docs/system/openrisc/emulation.rst @@ -0,0 +1,17 @@ +OpenRISC 1000 CPU architecture support +====================================== + +QEMU's TCG emulation includes support for the OpenRISC or1200 implementation of +the OpenRISC 1000 cpu architecture. + +The or1200 cpu also has support for the following instruction subsets: + +- ORBIS32 (OpenRISC Basic Instruction Set) +- ORFPX32 (OpenRISC Floating-Point eXtension) + +In addition to the instruction subsets the QEMU TCG emulation also has support +for most Class II (optional) instructions. + +For information on all OpenRISC instructions please refer to the latest +architecture manual available on the OpenRISC website in the +`OpenRISC Architecture <https://openrisc.io/architecture>`_ section. diff --git a/docs/system/openrisc/or1k-sim.rst b/docs/system/openrisc/or1k-sim.rst new file mode 100644 index 00000000..ef104397 --- /dev/null +++ b/docs/system/openrisc/or1k-sim.rst @@ -0,0 +1,43 @@ +Or1ksim board +============= + +The QEMU Or1ksim machine emulates the standard OpenRISC board simulator which is +also the standard SoC configuration. 
+ +Supported devices +----------------- + + * 16550A UART + * ETHOC Ethernet controller + * SMP (OpenRISC multicore using ompic) + +Boot options +------------ + +The Or1ksim machine can be started using the ``-kernel`` and ``-initrd`` options +to load a Linux kernel and optional disk image. + +.. code-block:: bash + + $ qemu-system-or1k -cpu or1220 -M or1k-sim -nographic \ + -kernel vmlinux \ + -initrd initramfs.cpio.gz \ + -m 128 + +Linux guest kernel configuration +"""""""""""""""""""""""""""""""" + +The 'or1ksim_defconfig' for Linux openrisc kernels includes the right +drivers for the or1ksim machine. If you would like to run an SMP system +choose the 'simple_smp_defconfig' config. + +Hardware configuration information +"""""""""""""""""""""""""""""""""" + +The ``or1k-sim`` board automatically generates a device tree blob ("dtb") +which it passes to the guest. This provides information about the +addresses, interrupt lines and other configuration of the various devices +in the system. + +The location of the DTB will be passed in register ``r3`` to the guest operating +system. diff --git a/docs/system/openrisc/virt.rst b/docs/system/openrisc/virt.rst new file mode 100644 index 00000000..2fe61ac9 --- /dev/null +++ b/docs/system/openrisc/virt.rst @@ -0,0 +1,50 @@ +'virt' generic virtual platform +=============================== + +The ``virt`` board is a platform which does not correspond to any +real hardware; it is designed for use in virtual machines. +It is the recommended board type if you simply want to run +a guest such as Linux and do not care about reproducing the +idiosyncrasies and limitations of a particular bit of real-world +hardware. + +Supported devices +----------------- + + * PCI/PCIe devices + * 8 virtio-mmio transport devices + * 16550A UART + * Goldfish RTC + * SiFive Test device for poweroff and reboot + * SMP (OpenRISC multicore using ompic) + +Boot options +------------ + +The virt machine can be started using the ``-kernel`` and ``-initrd`` options +to load a Linux kernel and optional disk image. For example: + +.. code-block:: bash + + $ qemu-system-or1k -cpu or1220 -M or1k-sim -nographic \ + -device virtio-net-device,netdev=user -netdev user,id=user,net=10.9.0.1/24,host=10.9.0.100 \ + -device virtio-blk-device,drive=d0 -drive file=virt.qcow2,id=d0,if=none,format=qcow2 \ + -kernel vmlinux \ + -initrd initramfs.cpio.gz \ + -m 128 + +Linux guest kernel configuration +"""""""""""""""""""""""""""""""" + +The 'virt_defconfig' for Linux openrisc kernels includes the right drivers for +the ``virt`` machine. + +Hardware configuration information +"""""""""""""""""""""""""""""""""" + +The ``virt`` board automatically generates a device tree blob ("dtb") which it +passes to the guest. This provides information about the addresses, interrupt +lines and other configuration of the various devices in the system. + +The location of the DTB will be passed in register ``r3`` to the guest operating +system. 
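If you want to look at the device tree the board generated, one possible approach,
assuming a Linux guest that exposes ``/sys/firmware/fdt`` and a root filesystem that
ships the ``dtc`` tool (neither is guaranteed), is to decompile it from inside the
guest:

.. code-block:: bash

    # run inside the guest; /sys/firmware/fdt and dtc availability are assumptions
    $ dtc -I dtb -O dts /sys/firmware/fdt > board.dts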
diff --git a/docs/system/ppc/embedded.rst b/docs/system/ppc/embedded.rst new file mode 100644 index 00000000..af3b3d9f --- /dev/null +++ b/docs/system/ppc/embedded.rst @@ -0,0 +1,9 @@ +Embedded family boards +====================== + +- ``bamboo`` bamboo +- ``mpc8544ds`` mpc8544ds +- ``ppce500`` generic paravirt e500 platform +- ``ref405ep`` ref405ep +- ``sam460ex`` aCube Sam460ex +- ``virtex-ml507`` Xilinx Virtex ML507 reference design diff --git a/docs/system/ppc/powermac.rst b/docs/system/ppc/powermac.rst new file mode 100644 index 00000000..04334ba2 --- /dev/null +++ b/docs/system/ppc/powermac.rst @@ -0,0 +1,34 @@ +PowerMac family boards (``g3beige``, ``mac99``) +================================================================== + +Use the executable ``qemu-system-ppc`` to simulate a complete PowerMac +PowerPC system. + +- ``g3beige`` Heathrow based PowerMAC +- ``mac99`` Mac99 based PowerMAC + +Supported devices +----------------- + +QEMU emulates the following PowerMac peripherals: + + * UniNorth or Grackle PCI Bridge + * PCI VGA compatible card with VESA Bochs Extensions + * 2 PMAC IDE interfaces with hard disk and CD-ROM support + * NE2000 PCI adapters + * Non Volatile RAM + * VIA-CUDA with ADB keyboard and mouse. + + +Missing devices +--------------- + + * To be identified + +Firmware +-------- + +Since version 0.9.1, QEMU uses OpenBIOS https://www.openbios.org/ for +the g3beige and mac99 PowerMac and the 40p machines. OpenBIOS is a free +(GPL v2) portable firmware implementation. The goal is to implement a +100% IEEE 1275-1994 (referred to as Open Firmware) compliant firmware. diff --git a/docs/system/ppc/powernv.rst b/docs/system/ppc/powernv.rst new file mode 100644 index 00000000..c8f97623 --- /dev/null +++ b/docs/system/ppc/powernv.rst @@ -0,0 +1,206 @@ +PowerNV family boards (``powernv8``, ``powernv9``, ``powernv10``) +================================================================== + +PowerNV (as Non-Virtualized) is the "bare metal" platform using the +OPAL firmware. It runs Linux on IBM and OpenPOWER systems and it can +be used as an hypervisor OS, running KVM guests, or simply as a host +OS. + +The PowerNV QEMU machine tries to emulate a PowerNV system at the +level of the skiboot firmware, which loads the OS and provides some +runtime services. Power Systems have a lower firmware (HostBoot) that +does low level system initialization, like DRAM training. This is +beyond the scope of what QEMU addresses today. + +Supported devices +----------------- + + * Multi processor support for POWER8, POWER8NVL and POWER9. + * XSCOM, serial communication sideband bus to configure chiplets. + * Simple LPC Controller. + * Processor Service Interface (PSI) Controller. + * Interrupt Controller, XICS (POWER8) and XIVE (POWER9) and XIVE2 (Power10). + * POWER8 PHB3 PCIe Host bridge and POWER9 PHB4 PCIe Host bridge. + * Simple OCC is an on-chip micro-controller used for power management tasks. + * iBT device to handle BMC communication, with the internal BMC simulator + provided by QEMU or an external BMC such as an Aspeed QEMU machine. + * PNOR containing the different firmware partitions. + +Missing devices +--------------- + +A lot is missing, among which : + + * I2C controllers (yet to be merged). + * NPU/NPU2/NPU3 controllers. + * EEH support for PCIe Host bridge controllers. + * NX controller. + * VAS controller. + * chipTOD (Time Of Day). + * Self Boot Engine (SBE). + * FSI bus. 
+ +Firmware +-------- + +The OPAL firmware (OpenPower Abstraction Layer) for OpenPower systems +includes the runtime services ``skiboot`` and the bootloader kernel and +initramfs ``skiroot``. Source code can be found on the `OpenPOWER account at +GitHub <https://github.com/open-power>`_. + +Prebuilt images of ``skiboot`` and ``skiroot`` are made available on the +`OpenPOWER <https://github.com/open-power/op-build/releases/>`__ site. + +QEMU includes a prebuilt image of ``skiboot`` which is updated when a +more recent version is required by the models. + +Current acceleration status +--------------------------- + +KVM acceleration in Linux Power hosts is provided by the kvm-hv and +kvm-pr modules. kvm-hv is adherent to PAPR and it's not compliant with +powernv. kvm-pr in theory could be used as a valid accel option but +this isn't supported by kvm-pr at this moment. + +To spare users from dealing with not so informative errors when attempting +to use accel=kvm, the powernv machine will throw an error informing that +KVM is not supported. This can be revisited in the future if kvm-pr (or +any other KVM alternative) is usable as KVM accel for this machine. + +Boot options +------------ + +Here is a simple setup with one e1000e NIC : + +.. code-block:: bash + + $ qemu-system-ppc64 -m 2G -machine powernv9 -smp 2,cores=2,threads=1 \ + -accel tcg,thread=single \ + -device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pcie.0,addr=0x0 \ + -netdev user,id=net0,hostfwd=::20022-:22,hostname=pnv \ + -kernel ./zImage.epapr \ + -initrd ./rootfs.cpio.xz \ + -nographic + +and a SATA disk : + +.. code-block:: bash + + -device ich9-ahci,id=sata0,bus=pcie.1,addr=0x0 \ + -drive file=./ubuntu-ppc64le.qcow2,if=none,id=drive0,format=qcow2,cache=none \ + -device ide-hd,bus=sata0.0,unit=0,drive=drive0,id=ide,bootindex=1 \ + +Complex PCIe configuration +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Six PHBs are defined per chip (POWER9) but no default PCI layout is +provided (to be compatible with libvirt). One PCI device can be added +on any of the available PCIe slots using command line options such as: + +.. code-block:: bash + + -device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pcie.0,addr=0x0 + -netdev bridge,id=net0,helper=/usr/libexec/qemu-bridge-helper,br=virbr0,id=hostnet0 + + -device megasas,id=scsi0,bus=pcie.0,addr=0x0 + -drive file=./ubuntu-ppc64le.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none + -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2 + +Here is a full example with two different storage controllers on +different PHBs, each with a disk, the second PHB is empty : + +.. 
code-block:: bash + + $ qemu-system-ppc64 -m 2G -machine powernv9 -smp 2,cores=2,threads=1 -accel tcg,thread=single \ + -kernel ./zImage.epapr -initrd ./rootfs.cpio.xz -bios ./skiboot.lid \ + \ + -device megasas,id=scsi0,bus=pcie.0,addr=0x0 \ + -drive file=./rhel7-ppc64le.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none \ + -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2 \ + \ + -device pcie-pci-bridge,id=bridge1,bus=pcie.1,addr=0x0 \ + \ + -device ich9-ahci,id=sata0,bus=bridge1,addr=0x1 \ + -drive file=./ubuntu-ppc64le.qcow2,if=none,id=drive0,format=qcow2,cache=none \ + -device ide-hd,bus=sata0.0,unit=0,drive=drive0,id=ide,bootindex=1 \ + -device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=bridge1,addr=0x2 \ + -netdev bridge,helper=/usr/libexec/qemu-bridge-helper,br=virbr0,id=net0 \ + -device nec-usb-xhci,bus=bridge1,addr=0x7 \ + \ + -serial mon:stdio -nographic + +You can also use VIRTIO devices : + +.. code-block:: bash + + -drive file=./fedora-ppc64le.qcow2,if=none,snapshot=on,id=drive0 \ + -device virtio-blk-pci,drive=drive0,id=blk0,bus=pcie.0 \ + \ + -netdev tap,helper=/usr/lib/qemu/qemu-bridge-helper,br=virbr0,id=netdev0 \ + -device virtio-net-pci,netdev=netdev0,id=net0,bus=pcie.1 \ + \ + -fsdev local,id=fsdev0,path=$HOME,security_model=passthrough \ + -device virtio-9p-pci,fsdev=fsdev0,mount_tag=host,bus=pcie.2 + +Multi sockets +~~~~~~~~~~~~~ + +The number of sockets is deduced from the number of CPUs and the +number of cores. ``-smp 2,cores=1`` will define a machine with 2 +sockets of 1 core, whereas ``-smp 2,cores=2`` will define a machine +with 1 socket of 2 cores. ``-smp 8,cores=2``, 4 sockets of 2 cores. + +BMC configuration +~~~~~~~~~~~~~~~~~ + +OpenPOWER systems negotiate the shutdown and reboot with their +BMC. The QEMU PowerNV machine embeds an IPMI BMC simulator using the +iBT interface and should offer the same power features. + +If you want to define your own BMC, use ``-nodefaults`` and specify +one on the command line : + +.. code-block:: bash + + -device ipmi-bmc-sim,id=bmc0 -device isa-ipmi-bt,bmc=bmc0,irq=10 + +The files `palmetto-SDR.bin <http://www.kaod.org/qemu/powernv/palmetto-SDR.bin>`__ +and `palmetto-FRU.bin <http://www.kaod.org/qemu/powernv/palmetto-FRU.bin>`__ +define a Sensor Data Record repository and a Field Replaceable Unit +inventory for a Palmetto BMC. They can be used to extend the QEMU BMC +simulator. + +.. code-block:: bash + + -device ipmi-bmc-sim,sdrfile=./palmetto-SDR.bin,fruareasize=256,frudatafile=./palmetto-FRU.bin,id=bmc0 \ + -device isa-ipmi-bt,bmc=bmc0,irq=10 + +The PowerNV machine can also be run with an external IPMI BMC device +connected to a remote QEMU machine acting as BMC, using these options +: + +.. code-block:: bash + + -chardev socket,id=ipmi0,host=localhost,port=9002,reconnect=10 \ + -device ipmi-bmc-extern,id=bmc0,chardev=ipmi0 \ + -device isa-ipmi-bt,bmc=bmc0,irq=10 \ + -nodefaults + +NVRAM +~~~~~ + +Use a MTD drive to add a PNOR to the machine, and get a NVRAM : + +.. code-block:: bash + + -drive file=./witherspoon.pnor,format=raw,if=mtd + +CAVEATS +------- + + * No support for multiple HW threads (SMT=1). Same as pseries. 
+ +Maintainer contact information +------------------------------ + +Cédric Le Goater <clg@kaod.org> diff --git a/docs/system/ppc/ppce500.rst b/docs/system/ppc/ppce500.rst new file mode 100644 index 00000000..fa40e57d --- /dev/null +++ b/docs/system/ppc/ppce500.rst @@ -0,0 +1,182 @@ +ppce500 generic platform (``ppce500``) +====================================== + +QEMU for PPC supports a special ``ppce500`` machine designed for emulation and +virtualization purposes. + +Supported devices +----------------- + +The ``ppce500`` machine supports the following devices: + +* PowerPC e500 series core (e500v2/e500mc/e5500/e6500) +* Configuration, Control, and Status Register (CCSR) +* Multicore Programmable Interrupt Controller (MPIC) with MSI support +* 1 16550A UART device +* 1 Freescale MPC8xxx I2C controller +* 1 Pericom pt7c4338 RTC via I2C +* 1 Freescale MPC8xxx GPIO controller +* Power-off functionality via one GPIO pin +* 1 Freescale MPC8xxx PCI host controller +* VirtIO devices via PCI bus +* 1 Freescale Enhanced Triple Speed Ethernet controller (eTSEC) + +Hardware configuration information +---------------------------------- + +The ``ppce500`` machine automatically generates a device tree blob ("dtb") +which it passes to the guest, if there is no ``-dtb`` option. This provides +information about the addresses, interrupt lines and other configuration of +the various devices in the system. + +If users want to provide their own DTB, they can use the ``-dtb`` option. +These DTBs should have the following requirements: + +* The number of subnodes under /cpus node should match QEMU's ``-smp`` option +* The /memory reg size should match QEMU’s selected ram_size via ``-m`` + +Both ``qemu-system-ppc`` and ``qemu-system-ppc64`` provide emulation for the +following 32-bit PowerPC CPUs: + +* e500v2 +* e500mc + +Additionally ``qemu-system-ppc64`` provides support for the following 64-bit +PowerPC CPUs: + +* e5500 +* e6500 + +The CPU type can be specified via the ``-cpu`` command line. If not specified, +it creates a machine with e500v2 core. The following example shows an e6500 +based machine creation: + +.. code-block:: bash + + $ qemu-system-ppc64 -nographic -M ppce500 -cpu e6500 + +Boot options +------------ + +The ``ppce500`` machine can start using the standard -kernel functionality +for loading a payload like an OS kernel (e.g.: Linux), or U-Boot firmware. + +When -bios is omitted, the default pc-bios/u-boot.e500 firmware image is used +as the BIOS. QEMU follows below truth table to select which payload to execute: + +===== ========== ======= +-bios -kernel payload +===== ========== ======= + N N u-boot + N Y kernel + Y don't care u-boot +===== ========== ======= + +When both -bios and -kernel are present, QEMU loads U-Boot and U-Boot in turns +automatically loads the kernel image specified by the -kernel parameter via +U-Boot's built-in "bootm" command, hence a legacy uImage format is required in +such scenario. + +Running Linux kernel +-------------------- + +Linux mainline v5.11 release is tested at the time of writing. To build a +Linux mainline kernel that can be booted by the ``ppce500`` machine in +64-bit mode, simply configure the kernel using the defconfig configuration: + +.. 
code-block:: bash + + $ export ARCH=powerpc + $ export CROSS_COMPILE=powerpc-linux- + $ make corenet64_smp_defconfig + $ make menuconfig + +then manually select the following configuration: + + Platform support > Freescale Book-E Machine Type > QEMU generic e500 platform + +To boot the newly built Linux kernel in QEMU with the ``ppce500`` machine: + +.. code-block:: bash + + $ qemu-system-ppc64 -M ppce500 -cpu e5500 -smp 4 -m 2G \ + -display none -serial stdio \ + -kernel vmlinux \ + -initrd /path/to/rootfs.cpio \ + -append "root=/dev/ram" + +To build a Linux mainline kernel that can be booted by the ``ppce500`` machine +in 32-bit mode, use the same 64-bit configuration steps except the defconfig +file should use corenet32_smp_defconfig. + +To boot the 32-bit Linux kernel: + +.. code-block:: bash + + $ qemu-system-ppc64 -M ppce500 -cpu e500mc -smp 4 -m 2G \ + -display none -serial stdio \ + -kernel vmlinux \ + -initrd /path/to/rootfs.cpio \ + -append "root=/dev/ram" + +Running U-Boot +-------------- + +U-Boot mainline v2021.07 release is tested at the time of writing. To build a +U-Boot mainline bootloader that can be booted by the ``ppce500`` machine, use +the qemu-ppce500_defconfig with similar commands as described above for Linux: + +.. code-block:: bash + + $ export CROSS_COMPILE=powerpc-linux- + $ make qemu-ppce500_defconfig + +You will get u-boot file in the build tree. + +When U-Boot boots, you will notice the following if using with ``-cpu e6500``: + +.. code-block:: none + + CPU: Unknown, Version: 0.0, (0x00000000) + Core: e6500, Version: 2.0, (0x80400020) + +This is because we only specified a core name to QEMU and it does not have a +meaningful SVR value which represents an actual SoC that integrates such core. +You can specify a real world SoC device that QEMU has built-in support but all +these SoCs are e500v2 based MPC85xx series, hence you cannot test anything +built for P4080 (e500mc), P5020 (e5500) and T2080 (e6500). + +Networking +---------- + +By default a VirtIO standard PCI networking device is connected as an ethernet +interface at PCI address 0.1.0, but we can switch that to an e1000 NIC by: + +.. code-block:: bash + + $ qemu-system-ppc64 -M ppce500 -smp 4 -m 2G \ + -display none -serial stdio \ + -bios u-boot \ + -nic tap,ifname=tap0,script=no,downscript=no,model=e1000 + +The QEMU ``ppce500`` machine can also dynamically instantiate an eTSEC device +if “-device eTSEC” is given to QEMU: + +.. code-block:: bash + + -netdev tap,ifname=tap0,script=no,downscript=no,id=net0 -device eTSEC,netdev=net0 + +Root file system on flash drive +------------------------------- + +Rather than using a root file system on ram disk, it is possible to have it on +CFI flash. Given an ext2 image whose size must be a power of two, it can be used +as follows: + +.. 
code-block:: bash + + $ qemu-system-ppc64 -M ppce500 -cpu e500mc -smp 4 -m 2G \ + -display none -serial stdio \ + -kernel vmlinux \ + -drive if=pflash,file=/path/to/rootfs.ext2,format=raw \ + -append "rootwait root=/dev/mtdblock0" diff --git a/docs/system/ppc/prep.rst b/docs/system/ppc/prep.rst new file mode 100644 index 00000000..bd9eb8ea --- /dev/null +++ b/docs/system/ppc/prep.rst @@ -0,0 +1,18 @@ +Prep machine (``40p``) +================================================================== + +Use the executable ``qemu-system-ppc`` to simulate a complete 40P (PREP) + +Supported devices +----------------- + +QEMU emulates the following 40P (PREP) peripherals: + + * PCI Bridge + * PCI VGA compatible card with VESA Bochs Extensions + * 2 IDE interfaces with hard disk and CD-ROM support + * Floppy disk + * PCnet network adapters + * Serial port + * PREP Non Volatile RAM + * PC compatible keyboard and mouse. diff --git a/docs/system/ppc/pseries.rst b/docs/system/ppc/pseries.rst new file mode 100644 index 00000000..a876d897 --- /dev/null +++ b/docs/system/ppc/pseries.rst @@ -0,0 +1,298 @@ +=================================== +pSeries family boards (``pseries``) +=================================== + +The Power machine para-virtualized environment described by the Linux on Power +Architecture Reference ([LoPAR]_) document is called pSeries. This environment +is also known as sPAPR, System p guests, or simply Power Linux guests (although +it is capable of running other operating systems, such as AIX). + +Even though pSeries is designed to behave as a guest environment, it is also +capable of acting as a hypervisor OS, providing, on that role, nested +virtualization capabilities. + +Supported devices +================= + + * Multi processor support for many Power processors generations: POWER7, + POWER7+, POWER8, POWER8NVL, POWER9, and Power10. Support for POWER5+ exists, + but its state is unknown. + * Interrupt Controller, XICS (POWER8) and XIVE (POWER9 and Power10) + * vPHB PCIe Host bridge. + * vscsi and vnet devices, compatible with the same devices available on a + PowerVM hypervisor with VIOS managing LPARs. + * Virtio based devices. + * PCIe device pass through. + +Missing devices +=============== + + * SPICE support. + +Firmware +======== + +The pSeries platform in QEMU comes with 2 firmwares: + +`SLOF <https://github.com/aik/SLOF>`_ (Slimline Open Firmware) is an +implementation of the `IEEE 1275-1994, Standard for Boot (Initialization +Configuration) Firmware: Core Requirements and Practices +<https://standards.ieee.org/standard/1275-1994.html>`_. + +SLOF performs bus scanning, PCI resource allocation, provides the client +interface to boot from block devices and network. + +QEMU includes a prebuilt image of SLOF which is updated when a more recent +version is required. + +VOF (Virtual Open Firmware) is a minimalistic firmware to work with +``-machine pseries,x-vof=on``. When enabled, the firmware acts as a slim +shim and QEMU implements parts of the IEEE 1275 Open Firmware interface. + +VOF does not have device drivers, does not do PCI resource allocation and +relies on ``-kernel`` used with Linux kernels recent enough (v5.4+) +to PCI resource assignment. It is ideal to use with petitboot. 
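As a minimal sketch of a VOF-based direct kernel boot (the kernel and initramfs file
names are placeholders):

.. code-block:: bash

    $ qemu-system-ppc64 -M pseries,x-vof=on -m 2G \
        -kernel vmlinux -initrd rootfs.cpio.gz \
        -nographic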
+ +Booting via ``-kernel`` supports the following: + ++-------------------+-------------------+------------------+ +| kernel | pseries,x-vof=off | pseries,x-vof=on | ++===================+===================+==================+ +| vmlinux BE | ✓ | ✓ | ++-------------------+-------------------+------------------+ +| vmlinux LE | ✓ | ✓ | ++-------------------+-------------------+------------------+ +| zImage.pseries BE | ✓¹ | ✓¹ | ++-------------------+-------------------+------------------+ +| zImage.pseries LE | ✓ | ✓ | ++-------------------+-------------------+------------------+ + +¹ must set kernel-addr=0 + +Build directions +================ + +.. code-block:: bash + + ./configure --target-list=ppc64-softmmu && make + +Running instructions +==================== + +Someone can select the pSeries machine type by running QEMU with the following +options: + +.. code-block:: bash + + qemu-system-ppc64 -M pseries <other QEMU arguments> + +sPAPR devices +============= + +The sPAPR specification defines a set of para-virtualized devices, which are +also supported by the pSeries machine in QEMU and can be instantiated with the +``-device`` option: + +* ``spapr-vlan`` : a virtual network interface. +* ``spapr-vscsi`` : a virtual SCSI disk interface. +* ``spapr-rng`` : a pseudo-device for passing random number generator data to the + guest (see the `H_RANDOM hypercall feature + <https://wiki.qemu.org/Features/HRandomHypercall>`_ for details). +* ``spapr-vty``: a virtual teletype. +* ``spapr-pci-host-bridge``: a PCI host bridge. +* ``tpm-spapr``: a Trusted Platform Module (TPM). +* ``spapr-tpm-proxy``: a TPM proxy. + +These are compatible with the devices historically available for use when +running the IBM PowerVM hypervisor with LPARs. + +However, since these devices have originally been specified with another +hypervisor and non-Linux guests in mind, you should use the virtio counterparts +(virtio-net, virtio-blk/scsi and virtio-rng for instance) if possible instead, +since they will most probably give you better performance with Linux guests in a +QEMU environment. + +The pSeries machine in QEMU is always instantiated with the following devices: + +* A NVRAM device (``spapr-nvram``). +* A virtual teletype (``spapr-vty``). +* A PCI host bridge (``spapr-pci-host-bridge``). + +Hence, it is not needed to add them manually, unless you use the ``-nodefaults`` +command line option in QEMU. + +In the case of the default ``spapr-nvram`` device, if someone wants to make the +contents of the NVRAM device persistent, they will need to specify a PFLASH +device when starting QEMU, i.e. either use +``-drive if=pflash,file=<filename>,format=raw`` to set the default PFLASH +device, or specify one with an ID +(``-drive if=none,file=<filename>,format=raw,id=pfid``) and pass that ID to the +NVRAM device with ``-global spapr-nvram.drive=pfid``. + +sPAPR specification +------------------- + +The main source of documentation on the sPAPR standard is the [LoPAR]_ document. +However, documentation specific to QEMU's implementation of the specification +can also be found in QEMU documentation: + +.. toctree:: + :maxdepth: 1 + + ../../specs/ppc-spapr-hotplug.rst + ../../specs/ppc-spapr-hcalls.rst + ../../specs/ppc-spapr-numa.rst + ../../specs/ppc-spapr-uv-hcalls.rst + ../../specs/ppc-spapr-xive.rst + +Switching between the KVM-PR and KVM-HV kernel module +===================================================== + +Currently, there are two implementations of KVM on Power, ``kvm_hv.ko`` and +``kvm_pr.ko``. 
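On a Linux Power host you can check which of the two modules is currently loaded, and
load one explicitly if needed (a sketch, assuming a kernel that builds KVM as
modules):

.. code-block:: bash

    $ lsmod | grep kvm            # list the KVM modules currently loaded
    $ sudo modprobe kvm_hv        # or: sudo modprobe kvm_pr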
+ + +If a host supports both KVM modes, and both KVM kernel modules are loaded, it is +possible to switch between the two modes with the ``kvm-type`` parameter: + +* Use ``qemu-system-ppc64 -M pseries,accel=kvm,kvm-type=PR`` to use the + ``kvm_pr.ko`` kernel module. +* Use ``qemu-system-ppc64 -M pseries,accel=kvm,kvm-type=HV`` to use ``kvm_hv.ko`` + instead. + +KVM-PR +------ + +KVM-PR uses the so-called **PR**\ oblem state of the PPC CPUs to run the guests, +i.e. the virtual machine is run in user mode and all privileged instructions +trap and have to be emulated by the host. That means you can run KVM-PR inside +a pSeries guest (or a PowerVM LPAR for that matter), and that is where it has +originated, as historically (prior to POWER7) it was not possible to run Linux +on hypervisor mode on a Power processor (this function was restricted to +PowerVM, the IBM proprietary hypervisor). + +Because all privileged instructions are trapped, guests that use a lot of +privileged instructions run quite slow with KVM-PR. On the other hand, because +of that, this kernel module can run on pretty much every PPC hardware, and is +able to emulate a lot of guests CPUs. This module can even be used to run other +PowerPC guests like an emulated PowerMac. + +As KVM-PR can be run inside a pSeries guest, it can also provide nested +virtualization capabilities (i.e. running a guest from within a guest). + +It is important to notice that, as KVM-HV provides a much better execution +performance, maintenance work has been much more focused on it in the past +years. Maintenance for KVM-PR has been minimal. + +In order to run KVM-PR guests with POWER9 processors, someone will need to start +QEMU with ``kernel_irqchip=off`` command line option. + +KVM-HV +------ + +KVM-HV uses the hypervisor mode of more recent Power processors, that allow +access to the bare metal hardware directly. Although POWER7 had this capability, +it was only starting with POWER8 that this was officially supported by IBM. + +Originally, KVM-HV was only available when running on a PowerNV platform (a.k.a. +Power bare metal). Although it runs on a PowerNV platform, it can only be used +to start pSeries guests. As the pSeries guest doesn't have access to the +hypervisor mode of the Power CPU, it wasn't possible to run KVM-HV on a guest. +This limitation has been lifted, and now it is possible to run KVM-HV inside +pSeries guests as well, making nested virtualization possible with KVM-HV. + +As KVM-HV has access to privileged instructions, guests that use a lot of these +can run much faster than with KVM-PR. On the other hand, the guest CPU has to be +of the same type as the host CPU this way, e.g. it is not possible to specify an +embedded PPC CPU for the guest with KVM-HV. However, there is at least the +possibility to run the guest in a backward-compatibility mode of the previous +CPUs generations, e.g. you can run a POWER7 guest on a POWER8 host by using +``-cpu POWER8,compat=power7`` as parameter to QEMU. + +Modules support +=============== + +As noticed in the sections above, each module can run in a different +environment. The following table shows with which environment each module can +run. As long as you are in a supported environment, you can run KVM-PR or KVM-HV +nested. Combinations not shown in the table are not available. 
+ ++--------------+------------+------+-------------------+----------+--------+ +| Platform | Host type | Bits | Page table format | KVM-HV | KVM-PR | ++==============+============+======+===================+==========+========+ +| PowerNV | bare metal | 32 | hash | no | yes | +| | | +-------------------+----------+--------+ +| | | | radix | N/A | N/A | +| | +------+-------------------+----------+--------+ +| | | 64 | hash | yes | yes | +| | | +-------------------+----------+--------+ +| | | | radix | yes | no | ++--------------+------------+------+-------------------+----------+--------+ +| pSeries [1]_ | PowerNV | 32 | hash | no | yes | +| | | +-------------------+----------+--------+ +| | | | radix | N/A | N/A | +| | +------+-------------------+----------+--------+ +| | | 64 | hash | no | yes | +| | | +-------------------+----------+--------+ +| | | | radix | yes [2]_ | no | +| +------------+------+-------------------+----------+--------+ +| | PowerVM | 32 | hash | no | yes | +| | | +-------------------+----------+--------+ +| | | | radix | N/A | N/A | +| | +------+-------------------+----------+--------+ +| | | 64 | hash | no | yes | +| | | +-------------------+----------+--------+ +| | | | radix [3]_ | no | yes | ++--------------+------------+------+-------------------+----------+--------+ + +.. [1] On POWER9 DD2.1 processors, the page table format on the host and guest + must be the same. + +.. [2] KVM-HV cannot run nested on POWER8 machines. + +.. [3] Introduced on Power10 machines. + + +.. _power-papr-protected-execution-facility-pef: + +POWER (PAPR) Protected Execution Facility (PEF) +----------------------------------------------- + +Protected Execution Facility (PEF), also known as Secure Guest support +is a feature found on IBM POWER9 and POWER10 processors. + +If a suitable firmware including an Ultravisor is installed, it adds +an extra memory protection mode to the CPU. The ultravisor manages a +pool of secure memory which cannot be accessed by the hypervisor. + +When this feature is enabled in QEMU, a guest can use ultracalls to +enter "secure mode". This transfers most of its memory to secure +memory, where it cannot be eavesdropped by a compromised hypervisor. + +Launching +^^^^^^^^^ + +To launch a guest which will be permitted to enter PEF secure mode:: + + $ qemu-system-ppc64 \ + -object pef-guest,id=pef0 \ + -machine confidential-guest-support=pef0 \ + ... + +Live Migration +^^^^^^^^^^^^^^ + +Live migration is not yet implemented for PEF guests. For +consistency, QEMU currently prevents migration if the PEF feature is +enabled, whether or not the guest has actually entered secure mode. + + +Maintainer contact information +============================== + +Cédric Le Goater <clg@kaod.org> + +Daniel Henrique Barboza <danielhb413@gmail.com> + +.. [LoPAR] `Linux on Power Architecture Reference document (LoPAR) revision + 2.9 <https://openpowerfoundation.org/wp-content/uploads/2020/07/LoPAR-20200812.pdf>`_. diff --git a/docs/system/pr-manager.rst b/docs/system/pr-manager.rst new file mode 100644 index 00000000..b19a0c15 --- /dev/null +++ b/docs/system/pr-manager.rst @@ -0,0 +1,83 @@ +=============================== +Persistent reservation managers +=============================== + +SCSI persistent reservations allow restricting access to block devices +to specific initiators in a shared storage setup. When implementing +clustering of virtual machines, it is a common requirement for virtual +machines to send persistent reservation SCSI commands. 
However, +the operating system restricts sending these commands to unprivileged +programs because incorrect usage can disrupt regular operation of the +storage fabric. + +For this reason, QEMU's SCSI passthrough devices, ``scsi-block`` +and ``scsi-generic`` (both are only available on Linux) can delegate +implementation of persistent reservations to a separate object, +the "persistent reservation manager". Only PERSISTENT RESERVE OUT and +PERSISTENT RESERVE IN commands are passed to the persistent reservation +manager object; other commands are processed by QEMU as usual. + +----------------------------------------- +Defining a persistent reservation manager +----------------------------------------- + +A persistent reservation manager is an instance of a subclass of the +"pr-manager" QOM class. + +Right now only one subclass is defined, ``pr-manager-helper``, which +forwards the commands to an external privileged helper program +over Unix sockets. The helper program only allows sending persistent +reservation commands to devices for which QEMU has a file descriptor, +so that QEMU will not be able to effect persistent reservations +unless it has access to both the socket and the device. + +``pr-manager-helper`` has a single string property, ``path``, which +accepts the path to the helper program's Unix socket. For example, +the following command line defines a ``pr-manager-helper`` object and +attaches it to a SCSI passthrough device:: + + $ qemu-system-x86_64 + -device virtio-scsi \ + -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock + -drive if=none,id=hd,driver=raw,file.filename=/dev/sdb,file.pr-manager=helper0 + -device scsi-block,drive=hd + +Alternatively, using ``-blockdev``:: + + $ qemu-system-x86_64 + -device virtio-scsi \ + -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock + -blockdev node-name=hd,driver=raw,file.driver=host_device,file.filename=/dev/sdb,file.pr-manager=helper0 + -device scsi-block,drive=hd + +You will also need to ensure that the helper program +:command:`qemu-pr-helper` is running, and that it has been +set up to use the same socket filename as your QEMU commandline +specifies. See the qemu-pr-helper documentation or manpage for +further details. + +--------------------------------------------- +Multipath devices and persistent reservations +--------------------------------------------- + +Proper support of persistent reservation for multipath devices requires +communication with the multipath daemon, so that the reservation is +registered and applied when a path is newly discovered or becomes online +again. :command:`qemu-pr-helper` can do this if the ``libmpathpersist`` +library was available on the system at build time. + +As of August 2017, a reservation key must be specified in ``multipath.conf`` +for ``multipathd`` to check for persistent reservation for newly +discovered paths or reinstated paths. The attribute can be added +to the ``defaults`` section or the ``multipaths`` section; for example:: + + multipaths { + multipath { + wwid XXXXXXXXXXXXXXXX + alias yellow + reservation_key 0x123abc + } + } + +Linking :program:`qemu-pr-helper` to ``libmpathpersist`` does not impede +its usage on regular SCSI devices. 
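To tie the pieces together, the helper is typically started as a daemon on the same
socket path that the QEMU command line references (a sketch; the socket location is
the example path used above and privilege handling is left out):

.. code-block:: bash

    # start the privileged helper listening on the Unix socket used above
    $ qemu-pr-helper -d -k /var/run/qemu-pr-helper.sock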
diff --git a/docs/system/qemu-block-drivers.rst b/docs/system/qemu-block-drivers.rst new file mode 100644 index 00000000..c2c0114c --- /dev/null +++ b/docs/system/qemu-block-drivers.rst @@ -0,0 +1,24 @@ +:orphan: + +============================ +QEMU block drivers reference +============================ + +-------- +Synopsis +-------- + +QEMU block driver reference manual + +----------- +Description +----------- + +.. include:: qemu-block-drivers.rst.inc + +-------- +See also +-------- + +The HTML documentation of QEMU for more precise information and Linux +user mode emulator invocation. diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc new file mode 100644 index 00000000..dfe5d229 --- /dev/null +++ b/docs/system/qemu-block-drivers.rst.inc @@ -0,0 +1,933 @@ +Disk image file formats +~~~~~~~~~~~~~~~~~~~~~~~ + +QEMU supports many image file formats that can be used with VMs as well as with +any of the tools (like ``qemu-img``). This includes the preferred formats +raw and qcow2 as well as formats that are supported for compatibility with +older QEMU versions or other hypervisors. + +Depending on the image format, different options can be passed to +``qemu-img create`` and ``qemu-img convert`` using the ``-o`` option. +This section describes each format and the options that are supported for it. + +.. program:: image-formats +.. option:: raw + + Raw disk image format. This format has the advantage of + being simple and easily exportable to all other emulators. If your + file system supports *holes* (for example in ext2 or ext3 on + Linux or NTFS on Windows), then only the written sectors will reserve + space. Use ``qemu-img info`` to know the real size used by the + image or ``ls -ls`` on Unix/Linux. + + Supported options: + + .. program:: raw + .. option:: preallocation + + Preallocation mode (allowed values: ``off``, ``falloc``, + ``full``). ``falloc`` mode preallocates space for image by + calling ``posix_fallocate()``. ``full`` mode preallocates space + for image by writing data to underlying storage. This data may or + may not be zero, depending on the storage location. + +.. program:: image-formats +.. option:: qcow2 + + QEMU image format, the most versatile format. Use it to have smaller + images (useful if your filesystem does not supports holes, for example + on Windows), zlib based compression and support of multiple VM + snapshots. + + Supported options: + + .. program:: qcow2 + .. option:: compat + + Determines the qcow2 version to use. ``compat=0.10`` uses the + traditional image format that can be read by any QEMU since 0.10. + ``compat=1.1`` enables image format extensions that only QEMU 1.1 and + newer understand (this is the default). Amongst others, this includes + zero clusters, which allow efficient copy-on-read for sparse images. + + .. option:: backing_file + + File name of a base image (see ``create`` subcommand) + + .. option:: backing_fmt + + Image format of the base image + + .. option:: encryption + + This option is deprecated and equivalent to ``encrypt.format=aes`` + + .. option:: encrypt.format + + If this is set to ``luks``, it requests that the qcow2 payload (not + qcow2 header) be encrypted using the LUKS format. The passphrase to + use to unlock the LUKS key slot is given by the ``encrypt.key-secret`` + parameter. LUKS encryption parameters can be tuned with the other + ``encrypt.*`` parameters. + + If this is set to ``aes``, the image is encrypted with 128-bit AES-CBC. 
+ The encryption key is given by the ``encrypt.key-secret`` parameter. + This encryption format is considered to be flawed by modern cryptography + standards, suffering from a number of design problems: + + - The AES-CBC cipher is used with predictable initialization vectors based + on the sector number. This makes it vulnerable to chosen plaintext attacks + which can reveal the existence of encrypted data. + - The user passphrase is directly used as the encryption key. A poorly + chosen or short passphrase will compromise the security of the encryption. + - In the event of the passphrase being compromised there is no way to + change the passphrase to protect data in any qcow images. The files must + be cloned, using a different encryption passphrase in the new file. The + original file must then be securely erased using a program like shred, + though even this is ineffective with many modern storage technologies. + + The use of this is no longer supported in system emulators. Support only + remains in the command line utilities, for the purposes of data liberation + and interoperability with old versions of QEMU. The ``luks`` format + should be used instead. + + .. option:: encrypt.key-secret + + Provides the ID of a ``secret`` object that contains the passphrase + (``encrypt.format=luks``) or encryption key (``encrypt.format=aes``). + + .. option:: encrypt.cipher-alg + + Name of the cipher algorithm and key length. Currently defaults + to ``aes-256``. Only used when ``encrypt.format=luks``. + + .. option:: encrypt.cipher-mode + + Name of the encryption mode to use. Currently defaults to ``xts``. + Only used when ``encrypt.format=luks``. + + .. option:: encrypt.ivgen-alg + + Name of the initialization vector generator algorithm. Currently defaults + to ``plain64``. Only used when ``encrypt.format=luks``. + + .. option:: encrypt.ivgen-hash-alg + + Name of the hash algorithm to use with the initialization vector generator + (if required). Defaults to ``sha256``. Only used when ``encrypt.format=luks``. + + .. option:: encrypt.hash-alg + + Name of the hash algorithm to use for PBKDF algorithm + Defaults to ``sha256``. Only used when ``encrypt.format=luks``. + + .. option:: encrypt.iter-time + + Amount of time, in milliseconds, to use for PBKDF algorithm per key slot. + Defaults to ``2000``. Only used when ``encrypt.format=luks``. + + .. option:: cluster_size + + Changes the qcow2 cluster size (must be between 512 and 2M). Smaller cluster + sizes can improve the image file size whereas larger cluster sizes generally + provide better performance. + + .. option:: preallocation + + Preallocation mode (allowed values: ``off``, ``metadata``, ``falloc``, + ``full``). An image with preallocated metadata is initially larger but can + improve performance when the image needs to grow. ``falloc`` and ``full`` + preallocations are like the same options of ``raw`` format, but sets up + metadata also. + + .. option:: lazy_refcounts + + If this option is set to ``on``, reference count updates are postponed with + the goal of avoiding metadata I/O and improving performance. This is + particularly interesting with :option:`cache=writethrough` which doesn't batch + metadata updates. The tradeoff is that after a host crash, the reference count + tables must be rebuilt, i.e. on the next open an (automatic) ``qemu-img + check -r all`` is required, which may take some time. + + This option can only be enabled if ``compat=1.1`` is specified. + + .. 
option:: nocow + + If this option is set to ``on``, it will turn off COW of the file. It's only + valid on btrfs, no effect on other file systems. + + Btrfs has low performance when hosting a VM image file, even more + when the guest on the VM also using btrfs as file system. Turning off + COW is a way to mitigate this bad performance. Generally there are two + ways to turn off COW on btrfs: + + - Disable it by mounting with nodatacow, then all newly created files + will be NOCOW. + - For an empty file, add the NOCOW file attribute. That's what this + option does. + + Note: this option is only valid to new or empty files. If there is + an existing file which is COW and has data blocks already, it couldn't + be changed to NOCOW by setting ``nocow=on``. One can issue ``lsattr + filename`` to check if the NOCOW flag is set or not (Capital 'C' is + NOCOW flag). + +.. program:: image-formats +.. option:: qed + + Old QEMU image format with support for backing files and compact image files + (when your filesystem or transport medium does not support holes). + + When converting QED images to qcow2, you might want to consider using the + ``lazy_refcounts=on`` option to get a more QED-like behaviour. + + Supported options: + + .. program:: qed + .. option:: backing_file + + File name of a base image (see ``create`` subcommand). + + .. option:: backing_fmt + + Image file format of backing file (optional). Useful if the format cannot be + autodetected because it has no header, like some vhd/vpc files. + + .. option:: cluster_size + + Changes the cluster size (must be power-of-2 between 4K and 64K). Smaller + cluster sizes can improve the image file size whereas larger cluster sizes + generally provide better performance. + + .. option:: table_size + + Changes the number of clusters per L1/L2 table (must be + power-of-2 between 1 and 16). There is normally no need to + change this value but this option can between used for + performance benchmarking. + +.. program:: image-formats +.. option:: qcow + + Old QEMU image format with support for backing files, compact image files, + encryption and compression. + + Supported options: + + .. program:: qcow + .. option:: backing_file + + File name of a base image (see ``create`` subcommand) + + .. option:: encryption + + This option is deprecated and equivalent to ``encrypt.format=aes`` + + .. option:: encrypt.format + + If this is set to ``aes``, the image is encrypted with 128-bit AES-CBC. + The encryption key is given by the ``encrypt.key-secret`` parameter. + This encryption format is considered to be flawed by modern cryptography + standards, suffering from a number of design problems enumerated previously + against the ``qcow2`` image format. + + The use of this is no longer supported in system emulators. Support only + remains in the command line utilities, for the purposes of data liberation + and interoperability with old versions of QEMU. + + Users requiring native encryption should use the ``qcow2`` format + instead with ``encrypt.format=luks``. + + .. option:: encrypt.key-secret + + Provides the ID of a ``secret`` object that contains the encryption + key (``encrypt.format=aes``). + +.. program:: image-formats +.. option:: luks + + LUKS v1 encryption format, compatible with Linux dm-crypt/cryptsetup + + Supported options: + + .. program:: luks + .. option:: key-secret + + Provides the ID of a ``secret`` object that contains the passphrase. + + .. option:: cipher-alg + + Name of the cipher algorithm and key length. 
Currently defaults + to ``aes-256``. + + .. option:: cipher-mode + + Name of the encryption mode to use. Currently defaults to ``xts``. + + .. option:: ivgen-alg + + Name of the initialization vector generator algorithm. Currently defaults + to ``plain64``. + + .. option:: ivgen-hash-alg + + Name of the hash algorithm to use with the initialization vector generator + (if required). Defaults to ``sha256``. + + .. option:: hash-alg + + Name of the hash algorithm to use for PBKDF algorithm + Defaults to ``sha256``. + + .. option:: iter-time + + Amount of time, in milliseconds, to use for PBKDF algorithm per key slot. + Defaults to ``2000``. + +.. program:: image-formats +.. option:: vdi + + VirtualBox 1.1 compatible image format. + + Supported options: + + .. program:: vdi + .. option:: static + + If this option is set to ``on``, the image is created with metadata + preallocation. + +.. program:: image-formats +.. option:: vmdk + + VMware 3 and 4 compatible image format. + + Supported options: + + .. program: vmdk + .. option:: backing_file + + File name of a base image (see ``create`` subcommand). + + .. option:: compat6 + + Create a VMDK version 6 image (instead of version 4) + + .. option:: hwversion + + Specify vmdk virtual hardware version. Compat6 flag cannot be enabled + if hwversion is specified. + + .. option:: subformat + + Specifies which VMDK subformat to use. Valid options are + ``monolithicSparse`` (default), + ``monolithicFlat``, + ``twoGbMaxExtentSparse``, + ``twoGbMaxExtentFlat`` and + ``streamOptimized``. + +.. program:: image-formats +.. option:: vpc + + VirtualPC compatible image format (VHD). + + Supported options: + + .. program:: vpc + .. option:: subformat + + Specifies which VHD subformat to use. Valid options are + ``dynamic`` (default) and ``fixed``. + +.. program:: image-formats +.. option:: VHDX + + Hyper-V compatible image format (VHDX). + + Supported options: + + .. program:: VHDX + .. option:: subformat + + Specifies which VHDX subformat to use. Valid options are + ``dynamic`` (default) and ``fixed``. + + .. option:: block_state_zero + + Force use of payload blocks of type 'ZERO'. Can be set to ``on`` (default) + or ``off``. When set to ``off``, new blocks will be created as + ``PAYLOAD_BLOCK_NOT_PRESENT``, which means parsers are free to return + arbitrary data for those blocks. Do not set to ``off`` when using + ``qemu-img convert`` with ``subformat=dynamic``. + + .. option:: block_size + + Block size; min 1 MB, max 256 MB. 0 means auto-calculate based on + image size. + + .. option:: log_size + + Log size; min 1 MB. + +Read-only formats +~~~~~~~~~~~~~~~~~ + +More disk image file formats are supported in a read-only mode. + +.. program:: image-formats +.. option:: bochs + + Bochs images of ``growing`` type. + +.. program:: image-formats +.. option:: cloop + + Linux Compressed Loop image, useful only to reuse directly compressed + CD-ROM images present for example in the Knoppix CD-ROMs. + +.. program:: image-formats +.. option:: dmg + + Apple disk image. + +.. program:: image-formats +.. option:: parallels + + Parallels disk image format. + +Using host drives +~~~~~~~~~~~~~~~~~ + +In addition to disk image files, QEMU can directly access host +devices. We describe here the usage for QEMU version >= 0.8.3. + +Linux +^^^^^ + +On Linux, you can directly use the host device filename instead of a +disk image filename provided you have enough privileges to access +it. For example, use ``/dev/cdrom`` to access to the CDROM. 
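For instance, with a guest image ``linux.img`` the host CD-ROM drive can be attached
directly (the device node depends on your host):

.. parsed-literal::

   |qemu_system| linux.img -cdrom /dev/cdrom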
+ +CD + You can specify a CDROM device even if no CDROM is loaded. QEMU has + specific code to detect CDROM insertion or removal. CDROM ejection by + the guest OS is supported. Currently only data CDs are supported. + +Floppy + You can specify a floppy device even if no floppy is loaded. Floppy + removal is currently not detected accurately (if you change floppy + without doing floppy access while the floppy is not loaded, the guest + OS will think that the same floppy is loaded). + Use of the host's floppy device is deprecated, and support for it will + be removed in a future release. + +Hard disks + Hard disks can be used. Normally you must specify the whole disk + (``/dev/hdb`` instead of ``/dev/hdb1``) so that the guest OS can + see it as a partitioned disk. WARNING: unless you know what you do, it + is better to only make READ-ONLY accesses to the hard disk otherwise + you may corrupt your host data (use the ``-snapshot`` command + line option or modify the device permissions accordingly). + +Windows +^^^^^^^ + +CD + The preferred syntax is the drive letter (e.g. ``d:``). The + alternate syntax ``\\.\d:`` is supported. ``/dev/cdrom`` is + supported as an alias to the first CDROM drive. + + Currently there is no specific code to handle removable media, so it + is better to use the ``change`` or ``eject`` monitor commands to + change or eject media. + +Hard disks + Hard disks can be used with the syntax: ``\\.\PhysicalDriveN`` + where *N* is the drive number (0 is the first hard disk). + + WARNING: unless you know what you do, it is better to only make + READ-ONLY accesses to the hard disk otherwise you may corrupt your + host data (use the ``-snapshot`` command line so that the + modifications are written in a temporary file). + +Mac OS X +^^^^^^^^ + +``/dev/cdrom`` is an alias to the first CDROM. + +Currently there is no specific code to handle removable media, so it +is better to use the ``change`` or ``eject`` monitor commands to +change or eject media. + +Virtual FAT disk images +~~~~~~~~~~~~~~~~~~~~~~~ + +QEMU can automatically create a virtual FAT disk image from a +directory tree. In order to use it, just type: + +.. parsed-literal:: + + |qemu_system| linux.img -hdb fat:/my_directory + +Then you access access to all the files in the ``/my_directory`` +directory without having to copy them in a disk image or to export +them via SAMBA or NFS. The default access is *read-only*. + +Floppies can be emulated with the ``:floppy:`` option: + +.. parsed-literal:: + + |qemu_system| linux.img -fda fat:floppy:/my_directory + +A read/write support is available for testing (beta stage) with the +``:rw:`` option: + +.. parsed-literal:: + + |qemu_system| linux.img -fda fat:floppy:rw:/my_directory + +What you should *never* do: + +- use non-ASCII filenames +- use "-snapshot" together with ":rw:" +- expect it to work when loadvm'ing +- write to the FAT directory on the host system while accessing it with the guest system + +NBD access +~~~~~~~~~~ + +QEMU can access directly to block device exported using the Network Block Device +protocol. + +.. parsed-literal:: + + |qemu_system| linux.img -hdb nbd://my_nbd_server.mydomain.org:1024/ + +If the NBD server is located on the same host, you can use an unix socket instead +of an inet socket: + +.. parsed-literal:: + + |qemu_system| linux.img -hdb nbd+unix://?socket=/tmp/my_socket + +In this case, the block device must be exported using ``qemu-nbd``: + +.. 
parsed-literal:: + + qemu-nbd --socket=/tmp/my_socket my_disk.qcow2 + +The use of ``qemu-nbd`` allows sharing of a disk between several guests: + +.. parsed-literal:: + + qemu-nbd --socket=/tmp/my_socket --share=2 my_disk.qcow2 + +and then you can use it with two guests: + +.. parsed-literal:: + + |qemu_system| linux1.img -hdb nbd+unix://?socket=/tmp/my_socket + |qemu_system| linux2.img -hdb nbd+unix://?socket=/tmp/my_socket + +If the ``nbd-server`` uses named exports (supported since NBD 2.9.18, or with QEMU's +own embedded NBD server), you must specify an export name in the URI: + +.. parsed-literal:: + + |qemu_system| -cdrom nbd://localhost/debian-500-ppc-netinst + |qemu_system| -cdrom nbd://localhost/openSUSE-11.1-ppc-netinst + +The URI syntax for NBD is supported since QEMU 1.3. An alternative syntax is +also available. Here are some example of the older syntax: + +.. parsed-literal:: + + |qemu_system| linux.img -hdb nbd:my_nbd_server.mydomain.org:1024 + |qemu_system| linux2.img -hdb nbd:unix:/tmp/my_socket + |qemu_system| -cdrom nbd:localhost:10809:exportname=debian-500-ppc-netinst + +iSCSI LUNs +~~~~~~~~~~ + +iSCSI is a popular protocol used to access SCSI devices across a computer +network. + +There are two different ways iSCSI devices can be used by QEMU. + +The first method is to mount the iSCSI LUN on the host, and make it appear as +any other ordinary SCSI device on the host and then to access this device as a +/dev/sd device from QEMU. How to do this differs between host OSes. + +The second method involves using the iSCSI initiator that is built into +QEMU. This provides a mechanism that works the same way regardless of which +host OS you are running QEMU on. This section will describe this second method +of using iSCSI together with QEMU. + +In QEMU, iSCSI devices are described using special iSCSI URLs. URL syntax: + +:: + + iscsi://[<username>[%<password>]@]<host>[:<port>]/<target-iqn-name>/<lun> + +Username and password are optional and only used if your target is set up +using CHAP authentication for access control. +Alternatively the username and password can also be set via environment +variables to have these not show up in the process list: + +:: + + export LIBISCSI_CHAP_USERNAME=<username> + export LIBISCSI_CHAP_PASSWORD=<password> + iscsi://<host>/<target-iqn-name>/<lun> + +Various session related parameters can be set via special options, either +in a configuration file provided via '-readconfig' or directly on the +command line. + +If the initiator-name is not specified qemu will use a default name +of 'iqn.2008-11.org.linux-kvm[:<uuid>'] where <uuid> is the UUID of the +virtual machine. If the UUID is not specified qemu will use +'iqn.2008-11.org.linux-kvm[:<name>'] where <name> is the name of the +virtual machine. 
+ +Setting a specific initiator name to use when logging in to the target: + +:: + + -iscsi initiator-name=iqn.qemu.test:my-initiator + +Controlling which type of header digest to negotiate with the target: + +:: + + -iscsi header-digest=CRC32C|CRC32C-NONE|NONE-CRC32C|NONE + +These can also be set via a configuration file: + +:: + + [iscsi] + user = "CHAP username" + password = "CHAP password" + initiator-name = "iqn.qemu.test:my-initiator" + # header digest is one of CRC32C|CRC32C-NONE|NONE-CRC32C|NONE + header-digest = "CRC32C" + +Setting the target name allows different options for different targets: + +:: + + [iscsi "iqn.target.name"] + user = "CHAP username" + password = "CHAP password" + initiator-name = "iqn.qemu.test:my-initiator" + # header digest is one of CRC32C|CRC32C-NONE|NONE-CRC32C|NONE + header-digest = "CRC32C" + +How to use a configuration file to set iSCSI configuration options: + +.. parsed-literal:: + + cat >iscsi.conf <<EOF + [iscsi] + user = "me" + password = "my password" + initiator-name = "iqn.qemu.test:my-initiator" + header-digest = "CRC32C" + EOF + + |qemu_system| -drive file=iscsi://127.0.0.1/iqn.qemu.test/1 \\ + -readconfig iscsi.conf + +How to set up a simple iSCSI target on loopback and access it via QEMU: +this example shows how to set up an iSCSI target with one CDROM and one DISK +using the Linux STGT software target. This target is available on Red Hat based +systems as the package 'scsi-target-utils'. + +.. parsed-literal:: + + tgtd --iscsi portal=127.0.0.1:3260 + tgtadm --lld iscsi --op new --mode target --tid 1 -T iqn.qemu.test + tgtadm --lld iscsi --mode logicalunit --op new --tid 1 --lun 1 \\ + -b /IMAGES/disk.img --device-type=disk + tgtadm --lld iscsi --mode logicalunit --op new --tid 1 --lun 2 \\ + -b /IMAGES/cd.iso --device-type=cd + tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL + + |qemu_system| -iscsi initiator-name=iqn.qemu.test:my-initiator \\ + -boot d -drive file=iscsi://127.0.0.1/iqn.qemu.test/1 \\ + -cdrom iscsi://127.0.0.1/iqn.qemu.test/2 + +GlusterFS disk images +~~~~~~~~~~~~~~~~~~~~~ + +GlusterFS is a user space distributed file system. + +You can boot from the GlusterFS disk image with the command: + +URI: + +.. parsed-literal:: + + |qemu_system| -drive file=gluster[+TYPE]://[HOST}[:PORT]]/VOLUME/PATH + [?socket=...][,file.debug=9][,file.logfile=...] + +JSON: + +.. parsed-literal:: + + |qemu_system| 'json:{"driver":"qcow2", + "file":{"driver":"gluster", + "volume":"testvol","path":"a.img","debug":9,"logfile":"...", + "server":[{"type":"tcp","host":"...","port":"..."}, + {"type":"unix","socket":"..."}]}}' + +*gluster* is the protocol. + +*TYPE* specifies the transport type used to connect to gluster +management daemon (glusterd). Valid transport types are +tcp and unix. In the URI form, if a transport type isn't specified, +then tcp type is assumed. + +*HOST* specifies the server where the volume file specification for +the given volume resides. This can be either a hostname or an ipv4 address. +If transport type is unix, then *HOST* field should not be specified. +Instead *socket* field needs to be populated with the path to unix domain +socket. + +*PORT* is the port number on which glusterd is listening. This is optional +and if not specified, it defaults to port 24007. If the transport type is unix, +then *PORT* should not be specified. + +*VOLUME* is the name of the gluster volume which contains the disk image. + +*PATH* is the path to the actual disk image that resides on gluster volume. 
+ +*debug* is the logging level of the gluster protocol driver. Debug levels +are 0-9, with 9 being the most verbose, and 0 representing no debugging output. +The default level is 4. The current logging levels defined in the gluster source +are 0 - None, 1 - Emergency, 2 - Alert, 3 - Critical, 4 - Error, 5 - Warning, +6 - Notice, 7 - Info, 8 - Debug, 9 - Trace + +*logfile* is a commandline option to mention log file path which helps in +logging to the specified file and also help in persisting the gfapi logs. The +default is stderr. + +You can create a GlusterFS disk image with the command: + +.. parsed-literal:: + + qemu-img create gluster://HOST/VOLUME/PATH SIZE + +Examples + +.. parsed-literal:: + + |qemu_system| -drive file=gluster://1.2.3.4/testvol/a.img + |qemu_system| -drive file=gluster+tcp://1.2.3.4/testvol/a.img + |qemu_system| -drive file=gluster+tcp://1.2.3.4:24007/testvol/dir/a.img + |qemu_system| -drive file=gluster+tcp://[1:2:3:4:5:6:7:8]/testvol/dir/a.img + |qemu_system| -drive file=gluster+tcp://[1:2:3:4:5:6:7:8]:24007/testvol/dir/a.img + |qemu_system| -drive file=gluster+tcp://server.domain.com:24007/testvol/dir/a.img + |qemu_system| -drive file=gluster+unix:///testvol/dir/a.img?socket=/tmp/glusterd.socket + |qemu_system| -drive file=gluster+rdma://1.2.3.4:24007/testvol/a.img + |qemu_system| -drive file=gluster://1.2.3.4/testvol/a.img,file.debug=9,file.logfile=/var/log/qemu-gluster.log + |qemu_system| 'json:{"driver":"qcow2", + "file":{"driver":"gluster", + "volume":"testvol","path":"a.img", + "debug":9,"logfile":"/var/log/qemu-gluster.log", + "server":[{"type":"tcp","host":"1.2.3.4","port":24007}, + {"type":"unix","socket":"/var/run/glusterd.socket"}]}}' + |qemu_system| -drive driver=qcow2,file.driver=gluster,file.volume=testvol,file.path=/path/a.img, + file.debug=9,file.logfile=/var/log/qemu-gluster.log, + file.server.0.type=tcp,file.server.0.host=1.2.3.4,file.server.0.port=24007, + file.server.1.type=unix,file.server.1.socket=/var/run/glusterd.socket + +Secure Shell (ssh) disk images +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +You can access disk images located on a remote ssh server +by using the ssh protocol: + +.. parsed-literal:: + + |qemu_system| -drive file=ssh://[USER@]SERVER[:PORT]/PATH[?host_key_check=HOST_KEY_CHECK] + +Alternative syntax using properties: + +.. parsed-literal:: + + |qemu_system| -drive file.driver=ssh[,file.user=USER],file.host=SERVER[,file.port=PORT],file.path=PATH[,file.host_key_check=HOST_KEY_CHECK] + +*ssh* is the protocol. + +*USER* is the remote user. If not specified, then the local +username is tried. + +*SERVER* specifies the remote ssh server. Any ssh server can be +used, but it must implement the sftp-server protocol. Most Unix/Linux +systems should work without requiring any extra configuration. + +*PORT* is the port number on which sshd is listening. By default +the standard ssh port (22) is used. + +*PATH* is the path to the disk image. + +The optional *HOST_KEY_CHECK* parameter controls how the remote +host's key is checked. The default is ``yes`` which means to use +the local ``.ssh/known_hosts`` file. Setting this to ``no`` +turns off known-hosts checking. Or you can check that the host key +matches a specific fingerprint. The fingerprint can be provided in +``md5``, ``sha1``, or ``sha256`` format, however, it is strongly +recommended to only use ``sha256``, since the other options are +considered insecure by modern standards. 
The fingerprint value +must be given as a hex encoded string:: + + host_key_check=sha256:04ce2ae89ff4295a6b9c4111640bdcb3297858ee55cb434d9dd88796e93aa795 + +The key string may optionally contain ":" separators between +each pair of hex digits. + +The ``$HOME/.ssh/known_hosts`` file contains the base64 encoded +host keys. These can be converted into the format needed for +QEMU using a command such as:: + + $ for key in `grep 10.33.8.112 known_hosts | awk '{print $3}'` + do + echo $key | base64 -d | sha256sum + done + 6c3aa525beda9dc83eadfbd7e5ba7d976ecb59575d1633c87cd06ed2ed6e366f - + 12214fd9ea5b408086f98ecccd9958609bd9ac7c0ea316734006bc7818b45dc8 - + d36420137bcbd101209ef70c3b15dc07362fbe0fa53c5b135eba6e6afa82f0ce - + +Note that there can be multiple keys present per host, each with +different key ciphers. Care is needed to pick the key fingerprint +that matches the cipher QEMU will negotiate with the remote server. + +Currently authentication must be done using ssh-agent. Other +authentication methods may be supported in future. + +Note: Many ssh servers do not support an ``fsync``-style operation. +The ssh driver cannot guarantee that disk flush requests are +obeyed, and this causes a risk of disk corruption if the remote +server or network goes down during writes. The driver will +print a warning when ``fsync`` is not supported: + +:: + + warning: ssh server ssh.example.com:22 does not support fsync + +With sufficiently new versions of libssh and OpenSSH, ``fsync`` is +supported. + +NVMe disk images +~~~~~~~~~~~~~~~~ + +NVM Express (NVMe) storage controllers can be accessed directly by a userspace +driver in QEMU. This bypasses the host kernel file system and block layers +while retaining QEMU block layer functionalities, such as block jobs, I/O +throttling, image formats, etc. Disk I/O performance is typically higher than +with ``-drive file=/dev/sda`` using either thread pool or linux-aio. + +The controller will be exclusively used by the QEMU process once started. To be +able to share storage between multiple VMs and other applications on the host, +please use the file based protocols. + +Before starting QEMU, bind the host NVMe controller to the host vfio-pci +driver. For example: + +.. parsed-literal:: + + # modprobe vfio-pci + # lspci -n -s 0000:06:0d.0 + 06:0d.0 0401: 1102:0002 (rev 08) + # echo 0000:06:0d.0 > /sys/bus/pci/devices/0000:06:0d.0/driver/unbind + # echo 1102 0002 > /sys/bus/pci/drivers/vfio-pci/new_id + + # |qemu_system| -drive file=nvme://HOST:BUS:SLOT.FUNC/NAMESPACE + +Alternative syntax using properties: + +.. parsed-literal:: + + |qemu_system| -drive file.driver=nvme,file.device=HOST:BUS:SLOT.FUNC,file.namespace=NAMESPACE + +*HOST*:*BUS*:*SLOT*.\ *FUNC* is the NVMe controller's PCI device +address on the host. + +*NAMESPACE* is the NVMe namespace number, starting from 1. + +Disk image file locking +~~~~~~~~~~~~~~~~~~~~~~~ + +By default, QEMU tries to protect image files from unexpected concurrent +access, as long as it's supported by the block protocol driver and host +operating system. If multiple QEMU processes (including QEMU emulators and +utilities) try to open the same image with conflicting accessing modes, all but +the first one will get an error. + +This feature is currently supported by the file protocol on Linux with the Open +File Descriptor (OFD) locking API, and can be configured to fall back to POSIX +locking if the POSIX host doesn't support Linux OFD locking. 
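+
+For example, if a guest is already running with ``disk.qcow2`` attached,
+a second process that opens the same image with the default locking mode
+is rejected, while a read-only query that opts out of locking with ``-U``
+(``--force-share``) still works. This is only a sketch; the exact error
+text may differ between QEMU versions::
+
+   $ qemu-img info disk.qcow2
+   qemu-img: Could not open 'disk.qcow2': Failed to get shared "write" lock
+   Is another process using the image [disk.qcow2]?
+   $ qemu-img info -U disk.qcow2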
+ +To explicitly enable image locking, specify "locking=on" in the file protocol +driver options. If OFD locking is not possible, a warning will be printed and +the POSIX locking API will be used. In this case there is a risk that the lock +will get silently lost when doing hot plugging and block jobs, due to the +shortcomings of the POSIX locking API. + +QEMU transparently handles lock handover during shared storage migration. For +shared virtual disk images between multiple VMs, the "share-rw" device option +should be used. + +By default, the guest has exclusive write access to its disk image. If the +guest can safely share the disk image with other writers the +``-device ...,share-rw=on`` parameter can be used. This is only safe if +the guest is running software, such as a cluster file system, that +coordinates disk accesses to avoid corruption. + +Note that share-rw=on only declares the guest's ability to share the disk. +Some QEMU features, such as image file formats, require exclusive write access +to the disk image and this is unaffected by the share-rw=on option. + +Alternatively, locking can be fully disabled by "locking=off" block device +option. In the command line, the option is usually in the form of +"file.locking=off" as the protocol driver is normally placed as a "file" child +under a format driver. For example: + +:: + + -blockdev driver=qcow2,file.filename=/path/to/image,file.locking=off,file.driver=file + +To check if image locking is active, check the output of the "lslocks" command +on host and see if there are locks held by the QEMU process on the image file. +More than one byte could be locked by the QEMU instance, each byte of which +reflects a particular permission that is acquired or protected by the running +block driver. + +Filter drivers +~~~~~~~~~~~~~~ + +QEMU supports several filter drivers, which don't store any data, but perform +some additional tasks, hooking io requests. + +.. program:: filter-drivers +.. option:: preallocate + + The preallocate filter driver is intended to be inserted between format + and protocol nodes and preallocates some additional space + (expanding the protocol file) when writing past the file’s end. This can be + useful for file-systems with slow allocation. + + Supported options: + + .. program:: preallocate + .. option:: prealloc-align + + On preallocation, align the file length to this value (in bytes), default 1M. + + .. program:: preallocate + .. option:: prealloc-size + + How much to preallocate (in bytes), default 128M. diff --git a/docs/system/qemu-cpu-models.rst b/docs/system/qemu-cpu-models.rst new file mode 100644 index 00000000..5cf6e46f --- /dev/null +++ b/docs/system/qemu-cpu-models.rst @@ -0,0 +1,24 @@ +:orphan: + +================================== +QEMU / KVM CPU model configuration +================================== + +-------- +Synopsis +-------- + +QEMU CPU Modelling Infrastructure manual + +----------- +Description +----------- + +.. include:: cpu-models-x86.rst.inc +.. include:: cpu-models-mips.rst.inc + +-------- +See also +-------- + +The HTML documentation of QEMU for more precise information and Linux user mode emulator invocation. diff --git a/docs/system/qemu-manpage.rst b/docs/system/qemu-manpage.rst new file mode 100644 index 00000000..c47a4127 --- /dev/null +++ b/docs/system/qemu-manpage.rst @@ -0,0 +1,51 @@ +:orphan: + +.. + This file is the skeleton for the qemu.1 manpage. 
It mostly
+   should simply include the .rst.inc files corresponding to the
+   parts of the documentation that go in the manpage as well as the
+   HTML manual.
+
+=======================
+QEMU User Documentation
+=======================
+
+--------
+Synopsis
+--------
+
+.. parsed-literal::
+
+   |qemu_system| [options] [disk_image]
+
+-----------
+Description
+-----------
+
+.. include:: target-i386-desc.rst.inc
+
+-------
+Options
+-------
+
+disk_image is a raw hard disk image for IDE hard disk 0. Some targets do
+not need a disk image.
+
+.. hxtool-doc:: qemu-options.hx
+
+.. include:: keys.rst.inc
+
+.. include:: mux-chardev.rst.inc
+
+-----
+Notes
+-----
+
+.. include:: device-url-syntax.rst.inc
+
+--------
+See also
+--------
+
+The HTML documentation of QEMU for more precise information and Linux
+user mode emulator invocation.
diff --git a/docs/system/quickstart.rst b/docs/system/quickstart.rst
new file mode 100644
index 00000000..681678c8
--- /dev/null
+++ b/docs/system/quickstart.rst
@@ -0,0 +1,21 @@
+.. _pcsys_005fquickstart:
+
+Quick Start
+-----------
+
+Download and uncompress a PC hard disk image with Linux installed (e.g.
+``linux.img``) and type:
+
+.. parsed-literal::
+
+   |qemu_system| linux.img
+
+Linux should boot and give you a prompt.
+
+Users should be aware that the above example elides a lot of the complexity
+of setting up a VM with x86_64 specific defaults and assumes the
+first non-switch argument is a PC compatible disk image with a boot
+sector. For a non-x86 system where we emulate a broad range of machine
+types, the command lines are generally more explicit in defining the
+machine and boot behaviour. You will find more example command lines
+in the :ref:`system-targets-ref` section of the manual.
diff --git a/docs/system/replay.rst b/docs/system/replay.rst
new file mode 100644
index 00000000..31053274
--- /dev/null
+++ b/docs/system/replay.rst
@@ -0,0 +1,237 @@
+.. _replay:
+
+..
+   Copyright (c) 2010-2022 Institute for System Programming
+   of the Russian Academy of Sciences.
+
+   This work is licensed under the terms of the GNU GPL, version 2 or later.
+   See the COPYING file in the top-level directory.
+
+Record/replay
+=============
+
+Record/replay functions are used for the deterministic replay of QEMU
+execution. Execution recording writes a log of non-deterministic events,
+which can later be used for replaying the execution anywhere and an
+unlimited number of times. It also supports checkpointing for faster rewind
+to a specific replay moment. Execution replaying reads the log and replays
+all non-deterministic events including external input, hardware clocks,
+and interrupts.
+
+Deterministic replay has the following features:
+
+ * Deterministically replays whole system execution and all contents of
+   the memory, state of the hardware devices, clocks, and screen of the VM.
+ * Writes the execution log into a file for later replaying multiple times
+   on different machines.
+ * Supports i386, x86_64, ARM, AArch64, RISC-V, MIPS, MIPS64, S390X, Alpha,
+   PowerPC, PowerPC64, M68000, Microblaze, OpenRISC, Nios II, SPARC,
+   and Xtensa hardware platforms.
+ * Performs deterministic replay of all operations with keyboard and mouse
+   input devices, serial ports, and network.
+
+Usage of record/replay:
+
+ * First, record the execution with the following command line:
+
+    .. 
parsed-literal::
+        |qemu_system| \\
+        -icount shift=auto,rr=record,rrfile=replay.bin \\
+        -drive file=disk.qcow2,if=none,snapshot,id=img-direct \\
+        -drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay \\
+        -device ide-hd,drive=img-blkreplay \\
+        -netdev user,id=net1 -device rtl8139,netdev=net1 \\
+        -object filter-replay,id=replay,netdev=net1
+
+ * After recording, you can replay it by using another command line:
+
+    .. parsed-literal::
+        |qemu_system| \\
+        -icount shift=auto,rr=replay,rrfile=replay.bin \\
+        -drive file=disk.qcow2,if=none,snapshot,id=img-direct \\
+        -drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay \\
+        -device ide-hd,drive=img-blkreplay \\
+        -netdev user,id=net1 -device rtl8139,netdev=net1 \\
+        -object filter-replay,id=replay,netdev=net1
+
+   The only difference from recording is changing the rr option
+   from record to replay.
+ * Block device images are not actually changed in recording mode,
+   because all of the changes are written to the temporary overlay file.
+   This behavior is enabled by using the blkreplay driver. It should be used
+   for every enabled block device, as described in the :ref:`block-label` section.
+ * The ``-net none`` option should be specified when the network is not used,
+   because QEMU adds a network card by default. When the network is needed,
+   it should be configured explicitly with the replay filter, as described
+   in the :ref:`network-label` section.
+ * Interactions with audio devices and serial ports are recorded and replayed
+   automatically when such devices are enabled.
+
+Core idea
+---------
+
+The record/replay system is based on saving and replaying non-deterministic
+events (e.g. keyboard input) and simulating deterministic ones (e.g. reading
+from HDD or memory of the VM). Saving only non-deterministic events makes
+the log file smaller and simulation faster.
+
+The following non-deterministic data from peripheral devices is saved into
+the log: mouse and keyboard input, network packets, audio controller input,
+serial port input, and hardware clocks (they are non-deterministic
+too, because their values are taken from the host machine). Inputs from
+simulated hardware, memory of the VM, software interrupts, and execution of
+instructions are not saved into the log, because they are deterministic and
+can be replayed by simulating the behavior of the virtual machine starting
+from its initial state.
+
+Instruction counting
+--------------------
+
+QEMU should work in icount mode to use the record/replay feature. icount was
+designed to allow deterministic execution in the absence of external inputs
+to the virtual machine. The record/replay feature is enabled through the
+``-icount`` command-line option, making deterministic execution of the
+machine possible while it interacts with the user or the network.
+
+.. _block-label:
+
+Block devices
+-------------
+
+The block devices record/replay module intercepts calls of
+bdrv coroutine functions at the top of the block drivers stack.
+To record and replay block operations the drive must be configured
+as follows:
+
+.. parsed-literal::
+    -drive file=disk.qcow2,if=none,snapshot,id=img-direct
+    -drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay
+    -device ide-hd,drive=img-blkreplay
+
+The blkreplay driver should be inserted between the disk image and the
+virtual driver controller. Therefore all disk requests may be recorded
+and replayed.
+
+.. _snapshotting-label:
+
+Snapshotting
+------------
+
+New VM snapshots may be created in replay mode. They can be used later
+to recover the desired VM state. 
All VM states created in replay mode
+are associated with a moment of time in the replay scenario.
+After recovering the VM state, replay will start from that position.
+
+The default starting snapshot name may be specified with the icount field
+rrsnapshot as follows:
+
+.. parsed-literal::
+    -icount shift=auto,rr=record,rrfile=replay.bin,rrsnapshot=snapshot_name
+
+This snapshot is created at the start of recording and restored at the start
+of replaying. It can also be loaded while replaying to roll back
+the execution.
+
+The ``snapshot`` flag of the disk image must be removed to save the snapshots
+in the overlay (or original image) instead of using the temporary overlay.
+
+.. parsed-literal::
+    -drive file=disk.ovl,if=none,id=img-direct
+    -drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay
+    -device ide-hd,drive=img-blkreplay
+
+Use the QEMU monitor to create additional snapshots. The ``savevm <name>``
+command creates a snapshot and ``loadvm <name>`` restores it. To prevent
+corruption of the original disk image, use overlay files linked to the
+original images. Therefore all new snapshots (including the starting one)
+will be saved in overlays and the original image remains unchanged.
+
+When you need to use snapshots with a diskless virtual machine,
+it must be started with an "orphan" qcow2 image. This image will be used
+for storing VM snapshots. Here is an example of the command line for this:
+
+.. parsed-literal::
+    |qemu_system| \\
+    -icount shift=auto,rr=replay,rrfile=record.bin,rrsnapshot=init \\
+    -net none -drive file=empty.qcow2,if=none,id=rr
+
+The ``empty.qcow2`` drive is not connected to any virtual block device and
+is used for VM snapshots only.
+
+.. _network-label:
+
+Network devices
+---------------
+
+Record and replay for network interactions is performed with the network filter.
+Each backend must have its own instance of the replay filter as follows:
+
+.. parsed-literal::
+    -netdev user,id=net1 -device rtl8139,netdev=net1
+    -object filter-replay,id=replay,netdev=net1
+
+The replay network filter is used to record and replay network packets. While
+recording the virtual machine, this filter puts all packets coming from
+the outer world into the log. In replay mode, packets from the log are
+injected into the network device. All interactions with the network backend
+in replay mode are disabled.
+
+Audio devices
+-------------
+
+Audio data is recorded and replayed automatically. The command line for
+recording and replaying must contain identical specifications of audio
+hardware, e.g.:
+
+.. parsed-literal::
+    -soundhw ac97
+
+Serial ports
+------------
+
+Serial port input is recorded and replayed automatically. The command lines
+for recording and replaying must contain an identical number of ports in
+record and replay modes, but their backends may differ.
+E.g., ``-serial stdio`` in record mode, and ``-serial null`` in replay mode.
+
+Reverse debugging
+-----------------
+
+Reverse debugging allows "executing" the program in the reverse direction.
+The GDB remote protocol supports "reverse step" and "reverse continue"
+commands. The first one steps a single instruction backwards in time,
+and the second one finds the last breakpoint in the past.
+
+Recorded executions may be used to enable reverse debugging. QEMU can't
+execute the code in the backward direction, but can load a snapshot and
+replay forward to find the desired position or breakpoint.
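+
+A typical reverse debugging session (a sketch, assuming the execution was
+recorded with ``rrsnapshot=init`` and a persistent overlay as described in
+the :ref:`snapshotting-label` section) starts the replay with the QEMU
+gdbstub enabled and the machine paused:
+
+.. parsed-literal::
+    |qemu_system| \\
+    -icount shift=auto,rr=replay,rrfile=replay.bin,rrsnapshot=init \\
+    -drive file=disk.ovl,if=none,id=img-direct \\
+    -drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay \\
+    -device ide-hd,drive=img-blkreplay \\
+    -net none -s -S
+
+Here ``-s`` opens the gdbstub on TCP port 1234 and ``-S`` pauses the machine
+at startup. Then attach GDB (assuming a Linux guest and a ``vmlinux`` with
+debug symbols; the breakpoint location is just a placeholder) and step
+backwards once the breakpoint has been hit::
+
+    $ gdb vmlinux
+    (gdb) target remote :1234
+    (gdb) break start_kernel
+    (gdb) continue
+    (gdb) reverse-stepi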
+
+The following GDB commands are supported:
+
+ - ``reverse-stepi`` (or ``rsi``) - step one instruction backwards
+ - ``reverse-continue`` (or ``rc``) - find the last breakpoint in the past
+
+Reverse step loads the nearest snapshot and replays the execution until
+the required instruction is met.
+
+Reverse continue may include several passes of examining the execution
+between the snapshots. Each of the passes includes the following steps:
+
+ #. loading the snapshot
+ #. replaying to examine the breakpoints
+ #. if a breakpoint or watchpoint was met
+
+    * loading the snapshot again
+    * replaying to the required breakpoint
+
+ #. else
+
+    * proceeding to step 1 with the earlier snapshot
+
+Therefore, using reverse debugging requires at least one snapshot to have
+been created. This can be done by omitting the ``snapshot`` option
+for the block drives and adding ``rrsnapshot`` to both the record and replay
+command lines.
+See the :ref:`snapshotting-label` section to learn more about running record/replay
+and creating the snapshot in these modes.
+
+When ``rrsnapshot`` is not used, a snapshot named ``start_debugging``
+is created in the temporary overlay. This allows using reverse debugging,
+but with temporary snapshots (existing only within the session).
diff --git a/docs/system/riscv/microchip-icicle-kit.rst b/docs/system/riscv/microchip-icicle-kit.rst
new file mode 100644
index 00000000..40798b1a
--- /dev/null
+++ b/docs/system/riscv/microchip-icicle-kit.rst
@@ -0,0 +1,149 @@
+Microchip PolarFire SoC Icicle Kit (``microchip-icicle-kit``)
+==============================================================
+
+The Microchip PolarFire SoC Icicle Kit integrates a PolarFire SoC, with one
+SiFive E51 core plus four U54 cores, many on-chip peripherals, and an FPGA.
+
+For more details about Microchip PolarFire SoC, please see:
+https://www.microsemi.com/product-directory/soc-fpgas/5498-polarfire-soc-fpga
+
+The Icicle Kit board information can be found here:
+https://www.microsemi.com/existing-parts/parts/152514
+
+Supported devices
+-----------------
+
+The ``microchip-icicle-kit`` machine supports the following devices:
+
+* 1 E51 core
+* 4 U54 cores
+* Core Level Interruptor (CLINT)
+* Platform-Level Interrupt Controller (PLIC)
+* L2 Loosely Integrated Memory (L2-LIM)
+* DDR memory controller
+* 5 MMUARTs
+* 1 DMA controller
+* 2 GEM Ethernet controllers
+* 1 SDHC storage controller
+
+Boot options
+------------
+
+The ``microchip-icicle-kit`` machine can start using the standard -bios
+functionality for loading its BIOS image, aka Hart Software Services (HSS_).
+HSS loads the second stage bootloader U-Boot from an SD card. Then a kernel
+can be loaded from U-Boot. It also supports direct kernel booting via the
+-kernel option along with the device tree blob via -dtb. When direct kernel
+boot is used, the OpenSBI fw_dynamic BIOS image is used to boot a payload
+like U-Boot or OS kernel directly. 
+ +The user provided DTB should have the following requirements: + +* The /cpus node should contain at least one subnode for E51 and the number + of subnodes should match QEMU's ``-smp`` option +* The /memory reg size should match QEMU’s selected ram_size via ``-m`` +* Should contain a node for the CLINT device with a compatible string + "riscv,clint0" + +QEMU follows below truth table to select which payload to execute: + +===== ========== ========== ======= +-bios -kernel -dtb payload +===== ========== ========== ======= + N N don't care HSS + Y don't care don't care HSS + N Y Y kernel +===== ========== ========== ======= + +The memory is set to 1537 MiB by default which is the minimum required high +memory size by HSS. A sanity check on ram size is performed in the machine +init routine to prompt user to increase the RAM size to > 1537 MiB when less +than 1537 MiB ram is detected. + +Running HSS +----------- + +HSS 2020.12 release is tested at the time of writing. To build an HSS image +that can be booted by the ``microchip-icicle-kit`` machine, type the following +in the HSS source tree: + +.. code-block:: bash + + $ export CROSS_COMPILE=riscv64-linux- + $ cp boards/mpfs-icicle-kit-es/def_config .config + $ make BOARD=mpfs-icicle-kit-es + +Download the official SD card image released by Microchip and prepare it for +QEMU usage: + +.. code-block:: bash + + $ wget ftp://ftpsoc.microsemi.com/outgoing/core-image-minimal-dev-icicle-kit-es-sd-20201009141623.rootfs.wic.gz + $ gunzip core-image-minimal-dev-icicle-kit-es-sd-20201009141623.rootfs.wic.gz + $ qemu-img resize core-image-minimal-dev-icicle-kit-es-sd-20201009141623.rootfs.wic 4G + +Then we can boot the machine by: + +.. code-block:: bash + + $ qemu-system-riscv64 -M microchip-icicle-kit -smp 5 \ + -bios path/to/hss.bin -sd path/to/sdcard.img \ + -nic user,model=cadence_gem \ + -nic tap,ifname=tap,model=cadence_gem,script=no \ + -display none -serial stdio \ + -chardev socket,id=serial1,path=serial1.sock,server=on,wait=on \ + -serial chardev:serial1 + +With above command line, current terminal session will be used for the first +serial port. Open another terminal window, and use ``minicom`` to connect the +second serial port. + +.. code-block:: bash + + $ minicom -D unix\#serial1.sock + +HSS output is on the first serial port (stdio) and U-Boot outputs on the +second serial port. U-Boot will automatically load the Linux kernel from +the SD card image. + +Direct Kernel Boot +------------------ + +Sometimes we just want to test booting a new kernel, and transforming the +kernel image to the format required by the HSS bootflow is tedious. We can +use '-kernel' for direct kernel booting just like other RISC-V machines do. + +In this mode, the OpenSBI fw_dynamic BIOS image for 'generic' platform is +used to boot an S-mode payload like U-Boot or OS kernel directly. + +For example, the following commands show building a U-Boot image from U-Boot +mainline v2021.07 for the Microchip Icicle Kit board: + +.. code-block:: bash + + $ export CROSS_COMPILE=riscv64-linux- + $ make microchip_mpfs_icicle_defconfig + +Then we can boot the machine by: + +.. 
code-block:: bash + + $ qemu-system-riscv64 -M microchip-icicle-kit -smp 5 -m 2G \ + -sd path/to/sdcard.img \ + -nic user,model=cadence_gem \ + -nic tap,ifname=tap,model=cadence_gem,script=no \ + -display none -serial stdio \ + -kernel path/to/u-boot/build/dir/u-boot.bin \ + -dtb path/to/u-boot/build/dir/u-boot.dtb + +CAVEATS: + +* Check the "stdout-path" property in the /chosen node in the DTB to determine + which serial port is used for the serial console, e.g.: if the console is set + to the second serial port, change to use "-serial null -serial stdio". +* The default U-Boot configuration uses CONFIG_OF_SEPARATE hence the ELF image + ``u-boot`` cannot be passed to "-kernel" as it does not contain the DTB hence + ``u-boot.bin`` has to be used which does contain one. To use the ELF image, + we need to change to CONFIG_OF_EMBED or CONFIG_OF_PRIOR_STAGE. + +.. _HSS: https://github.com/polarfire-soc/hart-software-services diff --git a/docs/system/riscv/shakti-c.rst b/docs/system/riscv/shakti-c.rst new file mode 100644 index 00000000..fea57f7b --- /dev/null +++ b/docs/system/riscv/shakti-c.rst @@ -0,0 +1,82 @@ +Shakti C Reference Platform (``shakti_c``) +========================================== + +Shakti C Reference Platform is a reference platform based on arty a7 100t +for the Shakti SoC. + +Shakti SoC is a SoC based on the Shakti C-class processor core. Shakti C +is a 64bit RV64GCSUN processor core. + +For more details on Shakti SoC, please see: +https://gitlab.com/shaktiproject/cores/shakti-soc/-/blob/master/fpga/boards/artya7-100t/c-class/README.rst + +For more info on the Shakti C-class core, please see: +https://c-class.readthedocs.io/en/latest/ + +Supported devices +----------------- + +The ``shakti_c`` machine supports the following devices: + + * 1 C-class core + * Core Level Interruptor (CLINT) + * Platform-Level Interrupt Controller (PLIC) + * 1 UART + +Boot options +------------ + +The ``shakti_c`` machine can start using the standard -bios +functionality for loading the baremetal application or opensbi. + +Boot the machine +---------------- + +Shakti SDK +~~~~~~~~~~ +Shakti SDK can be used to generate the baremetal example UART applications. + +.. code-block:: bash + + $ git clone https://gitlab.com/behindbytes/shakti-sdk.git + $ cd shakti-sdk + $ make software PROGRAM=loopback TARGET=artix7_100t + +Binary would be generated in: + software/examples/uart_applns/loopback/output/loopback.shakti + +You could also download the precompiled example applications using below +commands. + +.. code-block:: bash + + $ wget -c https://gitlab.com/behindbytes/shakti-binaries/-/raw/master/sdk/shakti_sdk_qemu.zip + $ unzip shakti_sdk_qemu.zip + +Then we can run the UART example using: + +.. code-block:: bash + + $ qemu-system-riscv64 -M shakti_c -nographic \ + -bios path/to/shakti_sdk_qemu/loopback.shakti + +OpenSBI +~~~~~~~ +We can also run OpenSBI with Test Payload. + +.. code-block:: bash + + $ git clone https://github.com/riscv/opensbi.git -b v0.9 + $ cd opensbi + $ wget -c https://gitlab.com/behindbytes/shakti-binaries/-/raw/master/dts/shakti.dtb + $ export CROSS_COMPILE=riscv64-unknown-elf- + $ export FW_FDT_PATH=./shakti.dtb + $ make PLATFORM=generic + +fw_payload.elf would be generated in build/platform/generic/firmware/fw_payload.elf. +Boot it using the below qemu command. + +.. 
code-block:: bash + + $ qemu-system-riscv64 -M shakti_c -nographic \ + -bios path/to/fw_payload.elf diff --git a/docs/system/riscv/sifive_u.rst b/docs/system/riscv/sifive_u.rst new file mode 100644 index 00000000..7b166567 --- /dev/null +++ b/docs/system/riscv/sifive_u.rst @@ -0,0 +1,375 @@ +SiFive HiFive Unleashed (``sifive_u``) +====================================== + +SiFive HiFive Unleashed Development Board is the ultimate RISC-V development +board featuring the Freedom U540 multi-core RISC-V processor. + +Supported devices +----------------- + +The ``sifive_u`` machine supports the following devices: + +* 1 E51 / E31 core +* Up to 4 U54 / U34 cores +* Core Local Interruptor (CLINT) +* Platform-Level Interrupt Controller (PLIC) +* Power, Reset, Clock, Interrupt (PRCI) +* L2 Loosely Integrated Memory (L2-LIM) +* DDR memory controller +* 2 UARTs +* 1 GEM Ethernet controller +* 1 GPIO controller +* 1 One-Time Programmable (OTP) memory with stored serial number +* 1 DMA controller +* 2 QSPI controllers +* 1 ISSI 25WP256 flash +* 1 SD card in SPI mode +* PWM0 and PWM1 + +Please note the real world HiFive Unleashed board has a fixed configuration of +1 E51 core and 4 U54 core combination and the RISC-V core boots in 64-bit mode. +With QEMU, one can create a machine with 1 E51 core and up to 4 U54 cores. It +is also possible to create a 32-bit variant with the same peripherals except +that the RISC-V cores are replaced by the 32-bit ones (E31 and U34), to help +testing of 32-bit guest software. + +Hardware configuration information +---------------------------------- + +The ``sifive_u`` machine automatically generates a device tree blob ("dtb") +which it passes to the guest, if there is no ``-dtb`` option. This provides +information about the addresses, interrupt lines and other configuration of +the various devices in the system. Guest software should discover the devices +that are present in the generated DTB instead of using a DTB for the real +hardware, as some of the devices are not modeled by QEMU and trying to access +these devices may cause unexpected behavior. + +If users want to provide their own DTB, they can use the ``-dtb`` option. +These DTBs should have the following requirements: + +* The /cpus node should contain at least one subnode for E51 and the number + of subnodes should match QEMU's ``-smp`` option +* The /memory reg size should match QEMU’s selected ram_size via ``-m`` +* Should contain a node for the CLINT device with a compatible string + "riscv,clint0" if using with OpenSBI BIOS images + +Boot options +------------ + +The ``sifive_u`` machine can start using the standard -kernel functionality +for loading a Linux kernel, a VxWorks kernel, a modified U-Boot bootloader +(S-mode) or ELF executable with the default OpenSBI firmware image as the +-bios. It also supports booting the unmodified U-Boot bootloader using the +standard -bios functionality. + +Machine-specific options +------------------------ + +The following machine-specific options are supported: + +- serial=nnn + + The board serial number. When not given, the default serial number 1 is used. + + SiFive reserves the first 1 KiB of the 16 KiB OTP memory for internal use. + The current usage is only used to store the serial number of the board at + offset 0xfc. U-Boot reads the serial number from the OTP memory, and uses + it to generate a unique MAC address to be programmed to the on-chip GEM + Ethernet controller. 
When multiple QEMU ``sifive_u`` machines are created + and connected to the same subnet, they all have the same MAC address hence + it creates an unusable network. In such scenario, user should give different + values to serial= when creating different ``sifive_u`` machines. + +- start-in-flash + + When given, QEMU's ROM codes jump to QSPI memory-mapped flash directly. + Otherwise QEMU will jump to DRAM or L2LIM depending on the msel= value. + When not given, it defaults to direct DRAM booting. + +- msel=[6|11] + + Mode Select (MSEL[3:0]) pins value, used to control where to boot from. + + The FU540 SoC supports booting from several sources, which are controlled + using the Mode Select pins on the chip. Typically, the boot process runs + through several stages before it begins execution of user-provided programs. + These stages typically include the following: + + 1. Zeroth Stage Boot Loader (ZSBL), which is contained in an on-chip mask + ROM and provided by QEMU. Note QEMU implemented ROM codes are not the + same as what is programmed in the hardware. The QEMU one is a simplified + version, but it provides the same functionality as the hardware. + 2. First Stage Boot Loader (FSBL), which brings up PLLs and DDR memory. + This is U-Boot SPL. + 3. Second Stage Boot Loader (SSBL), which further initializes additional + peripherals as needed. This is U-Boot proper combined with an OpenSBI + fw_dynamic firmware image. + + msel=6 means FSBL and SSBL are both on the QSPI flash. msel=11 means FSBL + and SSBL are both on the SD card. + +Running Linux kernel +-------------------- + +Linux mainline v5.10 release is tested at the time of writing. To build a +Linux mainline kernel that can be booted by the ``sifive_u`` machine in +64-bit mode, simply configure the kernel using the defconfig configuration: + +.. code-block:: bash + + $ export ARCH=riscv + $ export CROSS_COMPILE=riscv64-linux- + $ make defconfig + $ make + +To boot the newly built Linux kernel in QEMU with the ``sifive_u`` machine: + +.. code-block:: bash + + $ qemu-system-riscv64 -M sifive_u -smp 5 -m 2G \ + -display none -serial stdio \ + -kernel arch/riscv/boot/Image \ + -initrd /path/to/rootfs.ext4 \ + -append "root=/dev/ram" + +Alternatively, we can use a custom DTB to boot the machine by inserting a CLINT +node in fu540-c000.dtsi in the Linux kernel, + +.. code-block:: none + + clint: clint@2000000 { + compatible = "riscv,clint0"; + interrupts-extended = <&cpu0_intc 3 &cpu0_intc 7 + &cpu1_intc 3 &cpu1_intc 7 + &cpu2_intc 3 &cpu2_intc 7 + &cpu3_intc 3 &cpu3_intc 7 + &cpu4_intc 3 &cpu4_intc 7>; + reg = <0x00 0x2000000 0x00 0x10000>; + }; + +with the following command line options: + +.. code-block:: bash + + $ qemu-system-riscv64 -M sifive_u -smp 5 -m 8G \ + -display none -serial stdio \ + -kernel arch/riscv/boot/Image \ + -dtb arch/riscv/boot/dts/sifive/hifive-unleashed-a00.dtb \ + -initrd /path/to/rootfs.ext4 \ + -append "root=/dev/ram" + +To build a Linux mainline kernel that can be booted by the ``sifive_u`` machine +in 32-bit mode, use the rv32_defconfig configuration. A patch is required to +fix the 32-bit boot issue for Linux kernel v5.10. + +.. 
code-block:: bash + + $ export ARCH=riscv + $ export CROSS_COMPILE=riscv64-linux- + $ curl https://patchwork.kernel.org/project/linux-riscv/patch/20201219001356.2887782-1-atish.patra@wdc.com/mbox/ > riscv.patch + $ git am riscv.patch + $ make rv32_defconfig + $ make + +Replace ``qemu-system-riscv64`` with ``qemu-system-riscv32`` in the command +line above to boot the 32-bit Linux kernel. A rootfs image containing 32-bit +applications shall be used in order for kernel to boot to user space. + +Running VxWorks kernel +---------------------- + +VxWorks 7 SR0650 release is tested at the time of writing. To build a 64-bit +VxWorks mainline kernel that can be booted by the ``sifive_u`` machine, simply +create a VxWorks source build project based on the sifive_generic BSP, and a +VxWorks image project to generate the bootable VxWorks image, by following the +BSP documentation instructions. + +A pre-built 64-bit VxWorks 7 image for HiFive Unleashed board is available as +part of the VxWorks SDK for testing as well. Instructions to download the SDK: + +.. code-block:: bash + + $ wget https://labs.windriver.com/downloads/wrsdk-vxworks7-sifive-hifive-1.01.tar.bz2 + $ tar xvf wrsdk-vxworks7-sifive-hifive-1.01.tar.bz2 + $ ls bsps/sifive_generic_1_0_0_0/uboot/uVxWorks + +To boot the VxWorks kernel in QEMU with the ``sifive_u`` machine, use: + +.. code-block:: bash + + $ qemu-system-riscv64 -M sifive_u -smp 5 -m 2G \ + -display none -serial stdio \ + -nic tap,ifname=tap0,script=no,downscript=no \ + -kernel /path/to/vxWorks \ + -append "gem(0,0)host:vxWorks h=192.168.200.1 e=192.168.200.2:ffffff00 u=target pw=vxTarget f=0x01" + +It is also possible to test 32-bit VxWorks on the ``sifive_u`` machine. Create +a 32-bit project to build the 32-bit VxWorks image, and use exact the same +command line options with ``qemu-system-riscv32``. + +Running U-Boot +-------------- + +U-Boot mainline v2021.07 release is tested at the time of writing. To build a +U-Boot mainline bootloader that can be booted by the ``sifive_u`` machine, use +the sifive_unleashed_defconfig with similar commands as described above for +Linux: + +.. code-block:: bash + + $ export CROSS_COMPILE=riscv64-linux- + $ export OPENSBI=/path/to/opensbi-riscv64-generic-fw_dynamic.bin + $ make sifive_unleashed_defconfig + +You will get spl/u-boot-spl.bin and u-boot.itb file in the build tree. + +To start U-Boot using the ``sifive_u`` machine, prepare an SPI flash image, or +SD card image that is properly partitioned and populated with correct contents. +genimage_ can be used to generate these images. + +A sample configuration file for a 128 MiB SD card image is: + +.. code-block:: bash + + $ cat genimage_sdcard.cfg + image sdcard.img { + size = 128M + + hdimage { + gpt = true + } + + partition u-boot-spl { + image = "u-boot-spl.bin" + offset = 17K + partition-type-uuid = 5B193300-FC78-40CD-8002-E86C45580B47 + } + + partition u-boot { + image = "u-boot.itb" + offset = 1041K + partition-type-uuid = 2E54B353-1271-4842-806F-E436D6AF6985 + } + } + +SPI flash image has slightly different partition offsets, and the size has to +be 32 MiB to match the ISSI 25WP256 flash on the real board: + +.. 
code-block:: bash + + $ cat genimage_spi-nor.cfg + image spi-nor.img { + size = 32M + + hdimage { + gpt = true + } + + partition u-boot-spl { + image = "u-boot-spl.bin" + offset = 20K + partition-type-uuid = 5B193300-FC78-40CD-8002-E86C45580B47 + } + + partition u-boot { + image = "u-boot.itb" + offset = 1044K + partition-type-uuid = 2E54B353-1271-4842-806F-E436D6AF6985 + } + } + +Assume U-Boot binaries are put in the same directory as the config file, +we can generate the image by: + +.. code-block:: bash + + $ genimage --config genimage_<boot_src>.cfg --inputpath . + +Boot U-Boot from SD card, by specifying msel=11 and pass the SD card image +to QEMU ``sifive_u`` machine: + +.. code-block:: bash + + $ qemu-system-riscv64 -M sifive_u,msel=11 -smp 5 -m 8G \ + -display none -serial stdio \ + -bios /path/to/u-boot-spl.bin \ + -drive file=/path/to/sdcard.img,if=sd + +Changing msel= value to 6, allows booting U-Boot from the SPI flash: + +.. code-block:: bash + + $ qemu-system-riscv64 -M sifive_u,msel=6 -smp 5 -m 8G \ + -display none -serial stdio \ + -bios /path/to/u-boot-spl.bin \ + -drive file=/path/to/spi-nor.img,if=mtd + +Note when testing U-Boot, QEMU automatically generated device tree blob is +not used because U-Boot itself embeds device tree blobs for U-Boot SPL and +U-Boot proper. Hence the number of cores and size of memory have to match +the real hardware, ie: 5 cores (-smp 5) and 8 GiB memory (-m 8G). + +Above use case is to run upstream U-Boot for the SiFive HiFive Unleashed +board on QEMU ``sifive_u`` machine out of the box. This allows users to +develop and test the recommended RISC-V boot flow with a real world use +case: ZSBL (in QEMU) loads U-Boot SPL from SD card or SPI flash to L2LIM, +then U-Boot SPL loads the combined payload image of OpenSBI fw_dynamic +firmware and U-Boot proper. + +However sometimes we want to have a quick test of booting U-Boot on QEMU +without the needs of preparing the SPI flash or SD card images, an alternate +way can be used, which is to create a U-Boot S-mode image by modifying the +configuration of U-Boot: + +.. code-block:: bash + + $ export CROSS_COMPILE=riscv64-linux- + $ make sifive_unleashed_defconfig + $ make menuconfig + +then manually select the following configuration: + + * Device Tree Control ---> Provider of DTB for DT Control ---> Prior Stage bootloader DTB + +and unselect the following configuration: + + * Library routines ---> Allow access to binman information in the device tree + +This changes U-Boot to use the QEMU generated device tree blob, and bypass +running the U-Boot SPL stage. + +Boot the 64-bit U-Boot S-mode image directly: + +.. code-block:: bash + + $ qemu-system-riscv64 -M sifive_u -smp 5 -m 2G \ + -display none -serial stdio \ + -kernel /path/to/u-boot.bin + +It's possible to create a 32-bit U-Boot S-mode image as well. + +.. code-block:: bash + + $ export CROSS_COMPILE=riscv64-linux- + $ make sifive_unleashed_defconfig + $ make menuconfig + +then manually update the following configuration in U-Boot: + + * Device Tree Control ---> Provider of DTB for DT Control ---> Prior Stage bootloader DTB + * RISC-V architecture ---> Base ISA ---> RV32I + * Boot options ---> Boot images ---> Text Base ---> 0x80400000 + +and unselect the following configuration: + + * Library routines ---> Allow access to binman information in the device tree + +Use the same command line options to boot the 32-bit U-Boot S-mode image: + +.. 
code-block:: bash + + $ qemu-system-riscv32 -M sifive_u -smp 5 -m 2G \ + -display none -serial stdio \ + -kernel /path/to/u-boot.bin + +.. _genimage: https://github.com/pengutronix/genimage diff --git a/docs/system/riscv/virt.rst b/docs/system/riscv/virt.rst new file mode 100644 index 00000000..4b16e41d --- /dev/null +++ b/docs/system/riscv/virt.rst @@ -0,0 +1,189 @@ +'virt' Generic Virtual Platform (``virt``) +========================================== + +The ``virt`` board is a platform which does not correspond to any real hardware; +it is designed for use in virtual machines. It is the recommended board type +if you simply want to run a guest such as Linux and do not care about +reproducing the idiosyncrasies and limitations of a particular bit of +real-world hardware. + +Supported devices +----------------- + +The ``virt`` machine supports the following devices: + +* Up to 8 generic RV32GC/RV64GC cores, with optional extensions +* Core Local Interruptor (CLINT) +* Platform-Level Interrupt Controller (PLIC) +* CFI parallel NOR flash memory +* 1 NS16550 compatible UART +* 1 Google Goldfish RTC +* 1 SiFive Test device +* 8 virtio-mmio transport devices +* 1 generic PCIe host bridge +* The fw_cfg device that allows a guest to obtain data from QEMU + +The hypervisor extension has been enabled for the default CPU, so virtual +machines with hypervisor extension can simply be used without explicitly +declaring. + +Hardware configuration information +---------------------------------- + +The ``virt`` machine automatically generates a device tree blob ("dtb") +which it passes to the guest, if there is no ``-dtb`` option. This provides +information about the addresses, interrupt lines and other configuration of +the various devices in the system. Guest software should discover the devices +that are present in the generated DTB. + +If users want to provide their own DTB, they can use the ``-dtb`` option. +These DTBs should have the following requirements: + +* The number of subnodes of the /cpus node should match QEMU's ``-smp`` option +* The /memory reg size should match QEMU’s selected ram_size via ``-m`` +* Should contain a node for the CLINT device with a compatible string + "riscv,clint0" if using with OpenSBI BIOS images + +Boot options +------------ + +The ``virt`` machine can start using the standard -kernel functionality +for loading a Linux kernel, a VxWorks kernel, an S-mode U-Boot bootloader +with the default OpenSBI firmware image as the -bios. It also supports +the recommended RISC-V bootflow: U-Boot SPL (M-mode) loads OpenSBI fw_dynamic +firmware and U-Boot proper (S-mode), using the standard -bios functionality. + +Machine-specific options +------------------------ + +The following machine-specific options are supported: + +- aclint=[on|off] + + When this option is "on", ACLINT devices will be emulated instead of + SiFive CLINT. When not specified, this option is assumed to be "off". + +- aia=[none|aplic|aplic-imsic] + + This option allows selecting interrupt controller defined by the AIA + (advanced interrupt architecture) specification. The "aia=aplic" selects + APLIC (advanced platform level interrupt controller) to handle wired + interrupts whereas the "aia=aplic-imsic" selects APLIC and IMSIC (incoming + message signaled interrupt controller) to handle both wired interrupts and + MSIs. When not specified, this option is assumed to be "none" which selects + SiFive PLIC to handle wired interrupts. 
+ +- aia-guests=nnn + + The number of per-HART VS-level AIA IMSIC pages to be emulated for a guest + having AIA IMSIC (i.e. "aia=aplic-imsic" selected). When not specified, + the default number of per-HART VS-level AIA IMSIC pages is 0. + +Running Linux kernel +-------------------- + +Linux mainline v5.12 release is tested at the time of writing. To build a +Linux mainline kernel that can be booted by the ``virt`` machine in +64-bit mode, simply configure the kernel using the defconfig configuration: + +.. code-block:: bash + + $ export ARCH=riscv + $ export CROSS_COMPILE=riscv64-linux- + $ make defconfig + $ make + +To boot the newly built Linux kernel in QEMU with the ``virt`` machine: + +.. code-block:: bash + + $ qemu-system-riscv64 -M virt -smp 4 -m 2G \ + -display none -serial stdio \ + -kernel arch/riscv/boot/Image \ + -initrd /path/to/rootfs.cpio \ + -append "root=/dev/ram" + +To build a Linux mainline kernel that can be booted by the ``virt`` machine +in 32-bit mode, use the rv32_defconfig configuration. A patch is required to +fix the 32-bit boot issue for Linux kernel v5.12. + +.. code-block:: bash + + $ export ARCH=riscv + $ export CROSS_COMPILE=riscv64-linux- + $ curl https://patchwork.kernel.org/project/linux-riscv/patch/20210627135117.28641-1-bmeng.cn@gmail.com/mbox/ > riscv.patch + $ git am riscv.patch + $ make rv32_defconfig + $ make + +Replace ``qemu-system-riscv64`` with ``qemu-system-riscv32`` in the command +line above to boot the 32-bit Linux kernel. A rootfs image containing 32-bit +applications shall be used in order for kernel to boot to user space. + +Running U-Boot +-------------- + +U-Boot mainline v2021.04 release is tested at the time of writing. To build an +S-mode U-Boot bootloader that can be booted by the ``virt`` machine, use +the qemu-riscv64_smode_defconfig with similar commands as described above for Linux: + +.. code-block:: bash + + $ export CROSS_COMPILE=riscv64-linux- + $ make qemu-riscv64_smode_defconfig + +Boot the 64-bit U-Boot S-mode image directly: + +.. code-block:: bash + + $ qemu-system-riscv64 -M virt -smp 4 -m 2G \ + -display none -serial stdio \ + -kernel /path/to/u-boot.bin + +To test booting U-Boot SPL which in M-mode, which in turn loads a FIT image +that bundles OpenSBI fw_dynamic firmware and U-Boot proper (S-mode) together, +build the U-Boot images using riscv64_spl_defconfig: + +.. code-block:: bash + + $ export CROSS_COMPILE=riscv64-linux- + $ export OPENSBI=/path/to/opensbi-riscv64-generic-fw_dynamic.bin + $ make qemu-riscv64_spl_defconfig + +The minimal QEMU commands to run U-Boot SPL are: + +.. code-block:: bash + + $ qemu-system-riscv64 -M virt -smp 4 -m 2G \ + -display none -serial stdio \ + -bios /path/to/u-boot-spl \ + -device loader,file=/path/to/u-boot.itb,addr=0x80200000 + +To test 32-bit U-Boot images, switch to use qemu-riscv32_smode_defconfig and +riscv32_spl_defconfig builds, and replace ``qemu-system-riscv64`` with +``qemu-system-riscv32`` in the command lines above to boot the 32-bit U-Boot. + +Enabling TPM +------------ + +A TPM device can be connected to the virt board by following the steps below. + +First launch the TPM emulator: + +.. code-block:: bash + + $ swtpm socket --tpm2 -t -d --tpmstate dir=/tmp/tpm \ + --ctrl type=unixio,path=swtpm-sock + +Then launch QEMU with some additional arguments to link a TPM device to the backend: + +.. code-block:: bash + + $ qemu-system-riscv64 \ + ... other args .... 
\ + -chardev socket,id=chrtpm,path=swtpm-sock \ + -tpmdev emulator,id=tpm0,chardev=chrtpm \ + -device tpm-tis-device,tpmdev=tpm0 + +The TPM device can be seen in the memory tree and the generated device +tree and should be accessible from the guest software. diff --git a/docs/system/s390x/3270.rst b/docs/system/s390x/3270.rst new file mode 100644 index 00000000..0e173b32 --- /dev/null +++ b/docs/system/s390x/3270.rst @@ -0,0 +1,63 @@ +3270 devices +============ + +The 3270 is the classic 'green-screen' console of the mainframes (see the +`IBM 3270 Wikipedia article <https://en.wikipedia.org/wiki/IBM_3270>`__). + +The 3270 data stream is not implemented within QEMU; the device only provides +TN3270 (a telnet extension; see `RFC 854 <https://tools.ietf.org/html/rfc854>`__ +and `RFC 1576 <https://tools.ietf.org/html/rfc1576>`__) and leaves the heavy +lifting to an external 3270 terminal emulator (such as ``x3270``) to make a +single 3270 device available to a guest. Note that this supports basic +features only. + +To provide a 3270 device to a guest, create a ``x-terminal3270`` linked to +a ``tn3270`` chardev. The guest will see a 3270 channel device. In order +to actually be able to use it, attach the ``x3270`` emulator to the chardev. + +Example configuration +--------------------- + +* Make sure that 3270 support is enabled in the guest's Linux kernel. You need + ``CONFIG_TN3270`` and at least one of ``CONFIG_TN3270_TTY`` (for additional + ttys) or ``CONFIG_TN3270_CONSOLE`` (for a 3270 console). + +* Add a ``tn3270`` chardev and a ``x-terminal3270`` to the QEMU command line:: + + -chardev socket,id=ch0,host=0.0.0.0,port=2300,wait=off,server=on,tn3270=on + -device x-terminal3270,chardev=ch0,devno=fe.0.000a,id=terminal0 + +* Start the guest. In the guest, use ``chccwdev -e 0.0.000a`` to enable + the device. + +* On the host, start the ``x3270`` emulator:: + + x3270 <host>:2300 + +* In the guest, locate the 3270 device node under ``/dev/3270/`` (say, + ``tty1``) and start a getty on it:: + + systemctl start serial-getty@3270-tty1.service + + This should get you an additional tty for logging into the guest. + +* If you want to use the 3270 device as the Linux kernel console instead of + an additional tty, you can also append ``conmode=3270 condev=000a`` to + the guest's kernel command line. The kernel then should use the 3270 as + console after the next boot. + +Restrictions +------------ + +3270 support is very basic. In particular: + +* Only one 3270 device is supported. + +* It has only been tested with Linux guests and the x3270 emulator. + +* TLS/SSL is not supported. + +* Resizing on reattach is not supported. + +* Multiple commands in one inbound buffer (for example, when the reset key + is pressed while the network is slow) are not supported. diff --git a/docs/system/s390x/bootdevices.rst b/docs/system/s390x/bootdevices.rst new file mode 100644 index 00000000..1a7a18b4 --- /dev/null +++ b/docs/system/s390x/bootdevices.rst @@ -0,0 +1,108 @@ +Boot devices on s390x +===================== + +Booting with bootindex parameter +-------------------------------- + +For classical mainframe guests (i.e. LPAR or z/VM installations), you always +have to explicitly specify the disk where you want to boot from (or "IPL" from, +in s390x-speak -- IPL means "Initial Program Load"). In particular, there can +also be only one boot device according to the architecture specification, thus +specifying multiple boot devices is not possible (yet). 
+
+So for booting an s390x guest in QEMU, you should always mark the
+device where you want to boot from with the ``bootindex`` property, for
+example::
+
+  qemu-system-s390x -drive if=none,id=dr1,file=guest.qcow2 \
+                    -device virtio-blk,drive=dr1,bootindex=1
+
+For booting from a CD-ROM ISO image (which needs to include El-Torito boot
+information in order to be bootable), it is recommended to specify a ``scsi-cd``
+device, for example like this::
+
+  qemu-system-s390x -blockdev file,node-name=c1,filename=... \
+                    -device virtio-scsi \
+                    -device scsi-cd,drive=c1,bootindex=1
+
+Note that you really have to use the ``bootindex`` property to select the
+boot device. The old-fashioned ``-boot order=...`` command of QEMU (and
+also ``-boot once=...``) is not supported on s390x.
+
+
+Booting without bootindex parameter
+-----------------------------------
+
+The QEMU guest firmware (the so-called s390-ccw bios) also has some rudimentary
+support for scanning through the available block devices. So in case you did
+not specify a boot device with the ``bootindex`` property, there is still a
+chance that it finds a bootable device on its own and starts a guest operating
+system from it. However, this scanning algorithm is still very rough and may
+be incomplete, so it might fail to detect a bootable device in many cases.
+It is really recommended to always specify the boot device with the
+``bootindex`` property instead.
+
+This also means that you should avoid the classical short-cut commands like
+``-hda``, ``-cdrom`` or ``-drive if=virtio``, since it is not possible to
+specify the ``bootindex`` with these commands. Note that the convenience
+``-cdrom`` option does not even give you a real (virtio-scsi) CD-ROM device on
+s390x. Due to technical limitations in the QEMU code base, you will get a
+virtio-blk device with this parameter instead, which might not be the right
+device type for installing a Linux distribution via ISO image. It is
+recommended to specify a CD-ROM device via ``-device scsi-cd`` (as mentioned
+above) instead.
+
+
+Selecting kernels with the ``loadparm`` property
+------------------------------------------------
+
+The ``s390-ccw-virtio`` machine supports the so-called ``loadparm`` parameter
+which can be used to select the kernel on the disk of the guest that the
+s390-ccw bios should boot. When starting QEMU, it can be specified like this::
+
+  qemu-system-s390x -machine s390-ccw-virtio,loadparm=<string>
+
+The first way to use this parameter is to use the word ``PROMPT`` as the
+``<string>`` here. In that case the s390-ccw bios will show a list of
+installed kernels on the disk of the guest and ask the user to enter a number
+to choose which kernel should be booted -- similar to what can be achieved by
+specifying the ``-boot menu=on`` option when starting QEMU. Note that the menu
+list will only show the names of the installed kernels when using a DASD-like
+disk image with 4k byte sectors. On normal SCSI-style disks with 512-byte
+sectors, there is not enough space for the zipl loader on the disk to store
+the kernel names, so you only get a list without names here.
+
+The second way to use this parameter is to use a number in the range from 0
+to 31. The numbers that can be used here correspond to the numbers that are
+shown when using the ``PROMPT`` option, and the s390-ccw bios will then try
+to automatically boot the kernel that is associated with the given number.
+Note that ``0`` can be used to boot the default entry.
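+
+For example, to boot the kernel that is listed as entry number 2 in that menu,
+the ``loadparm`` machine option can be combined with the ``bootindex`` setup
+shown earlier (the disk image name and device ids here are only placeholders)::
+
+  qemu-system-s390x -machine s390-ccw-virtio,loadparm=2 \
+                    -drive if=none,id=dr1,file=guest.qcow2 \
+                    -device virtio-blk,drive=dr1,bootindex=1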
+ + +Booting from a network device +----------------------------- + +Beside the normal guest firmware (which is loaded from the file ``s390-ccw.img`` +in the data directory of QEMU, or via the ``-bios`` option), QEMU ships with +a small TFTP network bootloader firmware for virtio-net-ccw devices, too. This +firmware is loaded from a file called ``s390-netboot.img`` in the QEMU data +directory. In case you want to load it from a different filename instead, +you can specify it via the ``-global s390-ipl.netboot_fw=filename`` +command line option. + +The ``bootindex`` property is especially important for booting via the network. +If you don't specify the ``bootindex`` property here, the network bootloader +firmware code won't get loaded into the guest memory so that the network boot +will fail. For a successful network boot, try something like this:: + + qemu-system-s390x -netdev user,id=n1,tftp=...,bootfile=... \ + -device virtio-net-ccw,netdev=n1,bootindex=1 + +The network bootloader firmware also has basic support for pxelinux.cfg-style +configuration files. See the `PXELINUX Configuration page +<https://wiki.syslinux.org/wiki/index.php?title=PXELINUX#Configuration>`__ +for details how to set up the configuration file on your TFTP server. +The supported configuration file entries are ``DEFAULT``, ``LABEL``, +``KERNEL``, ``INITRD`` and ``APPEND`` (see the `Syslinux Config file syntax +<https://wiki.syslinux.org/wiki/index.php?title=Config>`__ for more +information). diff --git a/docs/system/s390x/css.rst b/docs/system/s390x/css.rst new file mode 100644 index 00000000..3b401611 --- /dev/null +++ b/docs/system/s390x/css.rst @@ -0,0 +1,86 @@ +The virtual channel subsystem +============================= + +QEMU implements a virtual channel subsystem with subchannels, (mostly +functionless) channel paths, and channel devices (virtio-ccw, 3270, and +devices passed via vfio-ccw). It supports multiple subchannel sets (MSS) and +multiple channel subsystems extended (MCSS-E). + +All channel devices support the ``devno`` property, which takes a parameter +in the form ``<cssid>.<ssid>.<device number>``. + +The default channel subsystem image id (``<cssid>``) is ``0xfe``. Devices in +there will show up in channel subsystem image ``0`` to guests that do not +enable MCSS-E. Note that devices with a different cssid will not be visible +if the guest OS does not enable MCSS-E (which is true for all supported guest +operating systems today). + +Supported values for the subchannel set id (``<ssid>``) range from ``0-3``. +Devices with a ssid that is not ``0`` will not be visible if the guest OS +does not enable MSS (any Linux version that supports virtio also enables MSS). +Any device may be put into any subchannel set, there is no restriction by +device type. + +The device number can range from ``0-0xffff``. + +If the ``devno`` property is not specified for a device, QEMU will choose the +next free device number in subchannel set 0, skipping to the next subchannel +set if no more device numbers are free. + +QEMU places a device at the first free subchannel in the specified subchannel +set. If a device is hotunplugged and later replugged, it may appear at a +different subchannel. (This is similar to how z/VM works.) + + +Examples +-------- + +* a virtio-net device, cssid/ssid/devno automatically assigned:: + + -device virtio-net-ccw + + In a Linux guest (without default devices and no other devices specified + prior to this one), this will show up as ``0.0.0000`` under subchannel + ``0.0.0000``. 
+ + The auto-assigned-properties in QEMU (as seen via e.g. ``info qtree``) + would be ``dev_id = "fe.0.0000"`` and ``subch_id = "fe.0.0000"``. + +* a virtio-rng device in subchannel set ``0``:: + + -device virtio-rng-ccw,devno=fe.0.0042 + + If added to the same Linux guest as above, it would show up as ``0.0.0042`` + under subchannel ``0.0.0001``. + + The properties for the device would be ``dev_id = "fe.0.0042"`` and + ``subch_id = "fe.0.0001"``. + +* a virtio-gpu device in subchannel set ``2``:: + + -device virtio-gpu-ccw,devno=fe.2.1111 + + If added to the same Linux guest as above, it would show up as ``0.2.1111`` + under subchannel ``0.2.0000``. + + The properties for the device would be ``dev_id = "fe.2.1111"`` and + ``subch_id = "fe.2.0000"``. + +* a virtio-mouse device in a non-standard channel subsystem image:: + + -device virtio-mouse-ccw,devno=2.0.2222 + + This would not show up in a standard Linux guest. + + The properties for the device would be ``dev_id = "2.0.2222"`` and + ``subch_id = "2.0.0000"``. + +* a virtio-keyboard device in another non-standard channel subsystem image:: + + -device virtio-keyboard-ccw,devno=0.0.1234 + + This would not show up in a standard Linux guest, either, as ``0`` is not + the standard channel subsystem image id. + + The properties for the device would be ``dev_id = "0.0.1234"`` and + ``subch_id = "0.0.0000"``. diff --git a/docs/system/s390x/protvirt.rst b/docs/system/s390x/protvirt.rst new file mode 100644 index 00000000..aee63ed7 --- /dev/null +++ b/docs/system/s390x/protvirt.rst @@ -0,0 +1,67 @@ +Protected Virtualization on s390x +================================= + +The memory and most of the registers of Protected Virtual Machines +(PVMs) are encrypted or inaccessible to the hypervisor, effectively +prohibiting VM introspection when the VM is running. At rest, PVMs are +encrypted and can only be decrypted by the firmware, represented by an +entity called Ultravisor, of specific IBM Z machines. + + +Prerequisites +------------- + +To run PVMs, a machine with the Protected Virtualization feature, as +indicated by the Ultravisor Call facility (stfle bit 158), is +required. The Ultravisor needs to be initialized at boot by setting +``prot_virt=1`` on the host's kernel command line. + +Running PVMs requires using the KVM hypervisor. + +If those requirements are met, the capability ``KVM_CAP_S390_PROTECTED`` +will indicate that KVM can support PVMs on that LPAR. + + +Running a Protected Virtual Machine +----------------------------------- + +To run a PVM you will need to select a CPU model which includes the +``Unpack facility`` (stfle bit 161 represented by the feature +``unpack``/``S390_FEAT_UNPACK``), and add these options to the command line:: + + -object s390-pv-guest,id=pv0 \ + -machine confidential-guest-support=pv0 + +Adding these options will: + +* Ensure the ``unpack`` facility is available +* Enable the IOMMU by default for all I/O devices +* Initialize the PV mechanism + +Passthrough (vfio) devices are currently not supported. + +Host huge page backings are not supported. However guests can use huge +pages as indicated by its facilities. + + +Boot Process +------------ + +A secure guest image can either be loaded from disk or supplied on the +QEMU command line. Booting from disk is done by the unmodified +s390-ccw BIOS. I.e., the bootmap is interpreted, multiple components +are read into memory and control is transferred to one of the +components (zipl stage3). 
Stage3 does some fixups and then transfers +control to some program residing in guest memory, which is normally +the OS kernel. The secure image has another component prepended +(stage3a) that uses the new diag308 subcodes 8 and 10 to trigger the +transition into secure mode. + +Booting from the image supplied on the QEMU command line requires that +the file passed via -kernel has the same memory layout as would result +from the disk boot. This memory layout includes the encrypted +components (kernel, initrd, cmdline), the stage3a loader and +metadata. In case this boot method is used, the command line +options -initrd and -cmdline are ineffective. The preparation of a PVM +image is done via the ``genprotimg`` tool from the s390-tools +collection. diff --git a/docs/system/s390x/vfio-ap.rst b/docs/system/s390x/vfio-ap.rst new file mode 100644 index 00000000..084ba9c4 --- /dev/null +++ b/docs/system/s390x/vfio-ap.rst @@ -0,0 +1,916 @@ +Adjunct Processor (AP) Device +============================= + +.. contents:: + +Introduction +------------ + +The IBM Adjunct Processor (AP) Cryptographic Facility is comprised +of three AP instructions and from 1 to 256 PCIe cryptographic adapter cards. +These AP devices provide cryptographic functions to all CPUs assigned to a +linux system running in an IBM Z system LPAR. + +On s390x, AP adapter cards are exposed via the AP bus. This document +describes how those cards may be made available to KVM guests using the +VFIO mediated device framework. + +AP Architectural Overview +------------------------- + +In order understand the terminology used in the rest of this document, let's +start with some definitions: + +* AP adapter + + An AP adapter is an IBM Z adapter card that can perform cryptographic + functions. There can be from 0 to 256 adapters assigned to an LPAR depending + on the machine model. Adapters assigned to the LPAR in which a linux host is + running will be available to the linux host. Each adapter is identified by a + number from 0 to 255; however, the maximum adapter number allowed is + determined by machine model. When installed, an AP adapter is accessed by + AP instructions executed by any CPU. + +* AP domain + + An adapter is partitioned into domains. Each domain can be thought of as + a set of hardware registers for processing AP instructions. An adapter can + hold up to 256 domains; however, the maximum domain number allowed is + determined by machine model. Each domain is identified by a number from 0 to + 255. Domains can be further classified into two types: + + * Usage domains are domains that can be accessed directly to process AP + commands + + * Control domains are domains that are accessed indirectly by AP + commands sent to a usage domain to control or change the domain; for + example, to set a secure private key for the domain. + +* AP Queue + + An AP queue is the means by which an AP command-request message is sent to an + AP usage domain inside a specific AP. An AP queue is identified by a tuple + comprised of an AP adapter ID (APID) and an AP queue index (APQI). The + APQI corresponds to a given usage domain number within the adapter. This tuple + forms an AP Queue Number (APQN) uniquely identifying an AP queue. AP + instructions include a field containing the APQN to identify the AP queue to + which the AP command-request message is to be sent for processing. 
+ +* AP Instructions: + + There are three AP instructions: + + * NQAP: to enqueue an AP command-request message to a queue + * DQAP: to dequeue an AP command-reply message from a queue + * PQAP: to administer the queues + + AP instructions identify the domain that is targeted to process the AP + command; this must be one of the usage domains. An AP command may modify a + domain that is not one of the usage domains, but the modified domain + must be one of the control domains. + +Start Interpretive Execution (SIE) Instruction +---------------------------------------------- + +A KVM guest is started by executing the Start Interpretive Execution (SIE) +instruction. The SIE state description is a control block that contains the +state information for a KVM guest and is supplied as input to the SIE +instruction. The SIE state description contains a satellite control block called +the Crypto Control Block (CRYCB). The CRYCB contains three fields to identify +the adapters, usage domains and control domains assigned to the KVM guest: + +* The AP Mask (APM) field is a bit mask that identifies the AP adapters assigned + to the KVM guest. Each bit in the mask, from left to right, corresponds to + an APID from 0-255. If a bit is set, the corresponding adapter is valid for + use by the KVM guest. + +* The AP Queue Mask (AQM) field is a bit mask identifying the AP usage domains + assigned to the KVM guest. Each bit in the mask, from left to right, + corresponds to an AP queue index (APQI) from 0-255. If a bit is set, the + corresponding queue is valid for use by the KVM guest. + +* The AP Domain Mask field is a bit mask that identifies the AP control domains + assigned to the KVM guest. The ADM bit mask controls which domains can be + changed by an AP command-request message sent to a usage domain from the + guest. Each bit in the mask, from left to right, corresponds to a domain from + 0-255. If a bit is set, the corresponding domain can be modified by an AP + command-request message sent to a usage domain. + +If you recall from the description of an AP Queue, AP instructions include +an APQN to identify the AP adapter and AP queue to which an AP command-request +message is to be sent (NQAP and PQAP instructions), or from which a +command-reply message is to be received (DQAP instruction). The validity of an +APQN is defined by the matrix calculated from the APM and AQM; it is the +cross product of all assigned adapter numbers (APM) with all assigned queue +indexes (AQM). For example, if adapters 1 and 2 and usage domains 5 and 6 are +assigned to a guest, the APQNs (1,5), (1,6), (2,5) and (2,6) will be valid for +the guest. + +The APQNs can provide secure key functionality - i.e., a private key is stored +on the adapter card for each of its domains - so each APQN must be assigned to +at most one guest or the linux host. + +Example 1: Valid configuration +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + ++----------+--------+--------+ +| | Guest1 | Guest2 | ++==========+========+========+ +| adapters | 1, 2 | 1, 2 | ++----------+--------+--------+ +| domains | 5, 6 | 7 | ++----------+--------+--------+ + +This is valid because both guests have a unique set of APQNs: + +* Guest1 has APQNs (1,5), (1,6), (2,5) and (2,6); +* Guest2 has APQNs (1,7) and (2,7). 
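+
+The validity rule above is simply a disjointness check on the cross products of
+the assigned adapter and domain numbers. As a purely illustrative sketch (plain
+Python, not part of QEMU or the vfio_ap tooling), the check for Example 1 could
+be written as::
+
+  from itertools import product
+
+  def apqns(adapters, domains):
+      # Cross product of adapter numbers (APM) and usage domain indexes (AQM)
+      return set(product(adapters, domains))
+
+  guest1 = apqns([1, 2], [5, 6])    # (1,5), (1,6), (2,5), (2,6)
+  guest2 = apqns([1, 2], [7])       # (1,7), (2,7)
+
+  # The configuration is valid only if no APQN is shared between the guests.
+  print(guest1.isdisjoint(guest2))  # True -> valid configuration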
+ +Example 2: Valid configuration +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + ++----------+--------+--------+ +| | Guest1 | Guest2 | ++==========+========+========+ +| adapters | 1, 2 | 3, 4 | ++----------+--------+--------+ +| domains | 5, 6 | 5, 6 | ++----------+--------+--------+ + +This is also valid because both guests have a unique set of APQNs: + +* Guest1 has APQNs (1,5), (1,6), (2,5), (2,6); +* Guest2 has APQNs (3,5), (3,6), (4,5), (4,6) + +Example 3: Invalid configuration +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + ++----------+--------+--------+ +| | Guest1 | Guest2 | ++==========+========+========+ +| adapters | 1, 2 | 1 | ++----------+--------+--------+ +| domains | 5, 6 | 6, 7 | ++----------+--------+--------+ + +This is an invalid configuration because both guests have access to +APQN (1,6). + +AP Matrix Configuration on Linux Host +------------------------------------- + +A linux system is a guest of the LPAR in which it is running and has access to +the AP resources configured for the LPAR. The LPAR's AP matrix is +configured via its Activation Profile which can be edited on the HMC. When the +linux system is started, the AP bus will detect the AP devices assigned to the +LPAR and create the following in sysfs:: + + /sys/bus/ap + ... [devices] + ...... xx.yyyy + ...... ... + ...... cardxx + ...... ... + +Where: + +``cardxx`` + is AP adapter number xx (in hex) + +``xx.yyyy`` + is an APQN with xx specifying the APID and yyyy specifying the APQI + +For example, if AP adapters 5 and 6 and domains 4, 71 (0x47), 171 (0xab) and +255 (0xff) are configured for the LPAR, the sysfs representation on the linux +host system would look like this:: + + /sys/bus/ap + ... [devices] + ...... 05.0004 + ...... 05.0047 + ...... 05.00ab + ...... 05.00ff + ...... 06.0004 + ...... 06.0047 + ...... 06.00ab + ...... 06.00ff + ...... card05 + ...... card06 + +A set of default device drivers are also created to control each type of AP +device that can be assigned to the LPAR on which a linux host is running:: + + /sys/bus/ap + ... [drivers] + ...... [cex2acard] for Crypto Express 2/3 accelerator cards + ...... [cex2aqueue] for AP queues served by Crypto Express 2/3 + accelerator cards + ...... [cex4card] for Crypto Express 4/5/6 accelerator and coprocessor + cards + ...... [cex4queue] for AP queues served by Crypto Express 4/5/6 + accelerator and coprocessor cards + ...... [pcixcccard] for Crypto Express 2/3 coprocessor cards + ...... [pcixccqueue] for AP queues served by Crypto Express 2/3 + coprocessor cards + +Binding AP devices to device drivers +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +There are two sysfs files that specify bitmasks marking a subset of the APQN +range as 'usable by the default AP queue device drivers' or 'not usable by the +default device drivers' and thus available for use by the alternate device +driver(s). The sysfs locations of the masks are:: + + /sys/bus/ap/apmask + /sys/bus/ap/aqmask + +The ``apmask`` is a 256-bit mask that identifies a set of AP adapter IDs +(APID). Each bit in the mask, from left to right (i.e., from most significant +to least significant bit in big endian order), corresponds to an APID from +0-255. If a bit is set, the APID is marked as usable only by the default AP +queue device drivers; otherwise, the APID is usable by the vfio_ap +device driver. + +The ``aqmask`` is a 256-bit mask that identifies a set of AP queue indexes +(APQI). 
Each bit in the mask, from left to right (i.e., from most significant +to least significant bit in big endian order), corresponds to an APQI from +0-255. If a bit is set, the APQI is marked as usable only by the default AP +queue device drivers; otherwise, the APQI is usable by the vfio_ap device +driver. + +Take, for example, the following mask:: + + 0x7dffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff + +It indicates: + + 1, 2, 3, 4, 5, and 7-255 belong to the default drivers' pool, and 0 and 6 + belong to the vfio_ap device driver's pool. + +The APQN of each AP queue device assigned to the linux host is checked by the +AP bus against the set of APQNs derived from the cross product of APIDs +and APQIs marked as usable only by the default AP queue device drivers. If a +match is detected, only the default AP queue device drivers will be probed; +otherwise, the vfio_ap device driver will be probed. + +By default, the two masks are set to reserve all APQNs for use by the default +AP queue device drivers. There are two ways the default masks can be changed: + + 1. The sysfs mask files can be edited by echoing a string into the + respective sysfs mask file in one of two formats: + + * An absolute hex string starting with 0x - like "0x12345678" - sets + the mask. If the given string is shorter than the mask, it is padded + with 0s on the right; for example, specifying a mask value of 0x41 is + the same as specifying:: + + 0x4100000000000000000000000000000000000000000000000000000000000000 + + Keep in mind that the mask reads from left to right (i.e., most + significant to least significant bit in big endian order), so the mask + above identifies device numbers 1 and 7 (``01000001``). + + If the string is longer than the mask, the operation is terminated with + an error (EINVAL). + + * Individual bits in the mask can be switched on and off by specifying + each bit number to be switched in a comma separated list. Each bit + number string must be prepended with a (``+``) or minus (``-``) to indicate + the corresponding bit is to be switched on (``+``) or off (``-``). Some + valid values are:: + + "+0" switches bit 0 on + "-13" switches bit 13 off + "+0x41" switches bit 65 on + "-0xff" switches bit 255 off + + The following example:: + + +0,-6,+0x47,-0xf0 + + Switches bits 0 and 71 (0x47) on + Switches bits 6 and 240 (0xf0) off + + Note that the bits not specified in the list remain as they were before + the operation. + + 2. The masks can also be changed at boot time via parameters on the kernel + command line like this:: + + ap.apmask=0xffff ap.aqmask=0x40 + + This would create the following masks: + + apmask:: + + 0xffff000000000000000000000000000000000000000000000000000000000000 + + aqmask:: + + 0x4000000000000000000000000000000000000000000000000000000000000000 + + Resulting in these two pools:: + + default drivers pool: adapter 0-15, domain 1 + alternate drivers pool: adapter 16-255, domains 0, 2-255 + +Configuring an AP matrix for a linux guest +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The sysfs interfaces for configuring an AP matrix for a guest are built on the +VFIO mediated device framework. To configure an AP matrix for a guest, a +mediated matrix device must first be created for the ``/sys/devices/vfio_ap/matrix`` +device. When the vfio_ap device driver is loaded, it registers with the VFIO +mediated device framework. When the driver registers, the sysfs interfaces for +creating mediated matrix devices is created:: + + /sys/devices + ... 
[vfio_ap] + ......[matrix] + ......... [mdev_supported_types] + ............ [vfio_ap-passthrough] + ............... create + ............... [devices] + +A mediated AP matrix device is created by writing a UUID to the attribute file +named ``create``, for example:: + + uuidgen > create + +or + +:: + + echo $uuid > create + +When a mediated AP matrix device is created, a sysfs directory named after +the UUID is created in the ``devices`` subdirectory:: + + /sys/devices + ... [vfio_ap] + ......[matrix] + ......... [mdev_supported_types] + ............ [vfio_ap-passthrough] + ............... create + ............... [devices] + .................. [$uuid] + +There will also be three sets of attribute files created in the mediated +matrix device's sysfs directory to configure an AP matrix for the +KVM guest:: + + /sys/devices + ... [vfio_ap] + ......[matrix] + ......... [mdev_supported_types] + ............ [vfio_ap-passthrough] + ............... create + ............... [devices] + .................. [$uuid] + ..................... assign_adapter + ..................... assign_control_domain + ..................... assign_domain + ..................... matrix + ..................... unassign_adapter + ..................... unassign_control_domain + ..................... unassign_domain + +``assign_adapter`` + To assign an AP adapter to the mediated matrix device, its APID is written + to the ``assign_adapter`` file. This may be done multiple times to assign more + than one adapter. The APID may be specified using conventional semantics + as a decimal, hexadecimal, or octal number. For example, to assign adapters + 4, 5 and 16 to a mediated matrix device in decimal, hexadecimal and octal + respectively:: + + echo 4 > assign_adapter + echo 0x5 > assign_adapter + echo 020 > assign_adapter + + In order to successfully assign an adapter: + + * The adapter number specified must represent a value from 0 up to the + maximum adapter number allowed by the machine model. If an adapter number + higher than the maximum is specified, the operation will terminate with + an error (ENODEV). + + * All APQNs that can be derived from the adapter ID being assigned and the + IDs of the previously assigned domains must be bound to the vfio_ap device + driver. If no domains have yet been assigned, then there must be at least + one APQN with the specified APID bound to the vfio_ap driver. If no such + APQNs are bound to the driver, the operation will terminate with an + error (EADDRNOTAVAIL). + + * No APQN that can be derived from the adapter ID and the IDs of the + previously assigned domains can be assigned to another mediated matrix + device. If an APQN is assigned to another mediated matrix device, the + operation will terminate with an error (EADDRINUSE). + +``unassign_adapter`` + To unassign an AP adapter, its APID is written to the ``unassign_adapter`` + file. This may also be done multiple times to unassign more than one adapter. + +``assign_domain`` + To assign a usage domain, the domain number is written into the + ``assign_domain`` file. This may be done multiple times to assign more than one + usage domain. The domain number is specified using conventional semantics as + a decimal, hexadecimal, or octal number. 
For example, to assign usage domains
+  4, 8, and 71 to a mediated matrix device in decimal, hexadecimal and octal
+  respectively::
+
+       echo 4 > assign_domain
+       echo 0x8 > assign_domain
+       echo 0107 > assign_domain
+
+  In order to successfully assign a domain:
+
+  * The domain number specified must represent a value from 0 up to the
+    maximum domain number allowed by the machine model. If a domain number
+    higher than the maximum is specified, the operation will terminate with
+    an error (ENODEV).
+
+  * All APQNs that can be derived from the domain ID being assigned and the IDs
+    of the previously assigned adapters must be bound to the vfio_ap device
+    driver. If no domains have yet been assigned, then there must be at least
+    one APQN with the specified APQI bound to the vfio_ap driver. If no such
+    APQNs are bound to the driver, the operation will terminate with an
+    error (EADDRNOTAVAIL).
+
+  * No APQN that can be derived from the domain ID being assigned and the IDs
+    of the previously assigned adapters can be assigned to another mediated
+    matrix device. If an APQN is assigned to another mediated matrix device,
+    the operation will terminate with an error (EADDRINUSE).
+
+``unassign_domain``
+  To unassign a usage domain, the domain number is written into the
+  ``unassign_domain`` file. This may be done multiple times to unassign more than
+  one usage domain.
+
+``assign_control_domain``
+  To assign a control domain, the domain number is written into the
+  ``assign_control_domain`` file. This may be done multiple times to
+  assign more than one control domain. The domain number may be specified using
+  conventional semantics as a decimal, hexadecimal, or octal number. For
+  example, to assign control domains 4, 8, and 71 to a mediated matrix device
+  in decimal, hexadecimal and octal respectively::
+
+       echo 4 > assign_control_domain
+       echo 0x8 > assign_control_domain
+       echo 0107 > assign_control_domain
+
+  In order to successfully assign a control domain, the domain number
+  specified must represent a value from 0 up to the maximum domain number
+  allowed by the machine model. If a control domain number higher than the
+  maximum is specified, the operation will terminate with an error (ENODEV).
+
+``unassign_control_domain``
+  To unassign a control domain, the domain number is written into the
+  ``unassign_control_domain`` file. This may be done multiple times to unassign
+  more than one control domain.
+
+Note: No changes to the AP matrix will be allowed while a guest using
+the mediated matrix device is running. Attempts to assign an adapter,
+domain or control domain will be rejected and an error (EBUSY) returned.
+
+Starting a Linux Guest Configured with an AP Matrix
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To provide a mediated matrix device for use by a guest, the following option
+must be specified on the QEMU command line::
+
+  -device vfio-ap,sysfsdev=$path-to-mdev
+
+The ``sysfsdev`` parameter specifies the path to the mediated matrix device.
+There are a number of ways to specify this path::
+
+  /sys/devices/vfio_ap/matrix/$uuid
+  /sys/bus/mdev/devices/$uuid
+  /sys/bus/mdev/drivers/vfio_mdev/$uuid
+  /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough/devices/$uuid
+
+When the linux guest is started, the guest will open the mediated
+matrix device's file descriptor to get information about the mediated matrix
+device.
The ``vfio_ap`` device driver will update the APM, AQM, and ADM fields in +the guest's CRYCB with the adapter, usage domain and control domains assigned +via the mediated matrix device's sysfs attribute files. Programs running on the +linux guest will then: + +1. Have direct access to the APQNs derived from the cross product of the AP + adapter numbers (APID) and queue indexes (APQI) specified in the APM and AQM + fields of the guests's CRYCB respectively. These APQNs identify the AP queues + that are valid for use by the guest; meaning, AP commands can be sent by the + guest to any of these queues for processing. + +2. Have authorization to process AP commands to change a control domain + identified in the ADM field of the guest's CRYCB. The AP command must be sent + to a valid APQN (see 1 above). + +CPU model features: + +Three CPU model features are available for controlling guest access to AP +facilities: + +1. AP facilities feature + + The AP facilities feature indicates that AP facilities are installed on the + guest. This feature will be exposed for use only if the AP facilities + are installed on the host system. The feature is s390-specific and is + represented as a parameter of the -cpu option on the QEMU command line:: + + qemu-system-s390x -cpu $model,ap=on|off + + Where: + + ``$model`` + is the CPU model defined for the guest (defaults to the model of + the host system if not specified). + + ``ap=on|off`` + indicates whether AP facilities are installed (on) or not + (off). The default for CPU models zEC12 or newer + is ``ap=on``. AP facilities must be installed on the guest if a + vfio-ap device (``-device vfio-ap,sysfsdev=$path``) is configured + for the guest, or the guest will fail to start. + +2. Query Configuration Information (QCI) facility + + The QCI facility is used by the AP bus running on the guest to query the + configuration of the AP facilities. This facility will be available + only if the QCI facility is installed on the host system. The feature is + s390-specific and is represented as a parameter of the -cpu option on the + QEMU command line:: + + qemu-system-s390x -cpu $model,apqci=on|off + + Where: + + ``$model`` + is the CPU model defined for the guest + + ``apqci=on|off`` + indicates whether the QCI facility is installed (on) or + not (off). The default for CPU models zEC12 or newer + is ``apqci=on``; for older models, QCI will not be installed. + + If QCI is installed (``apqci=on``) but AP facilities are not + (``ap=off``), an error message will be logged, but the guest + will be allowed to start. It makes no sense to have QCI + installed if the AP facilities are not; this is considered + an invalid configuration. + + If the QCI facility is not installed, APQNs with an APQI + greater than 15 will not be detected by the AP bus + running on the guest. + +3. Adjunct Process Facility Test (APFT) facility + + The APFT facility is used by the AP bus running on the guest to test the + AP facilities available for a given AP queue. This facility will be available + only if the APFT facility is installed on the host system. The feature is + s390-specific and is represented as a parameter of the -cpu option on the + QEMU command line:: + + qemu-system-s390x -cpu $model,apft=on|off + + Where: + + ``$model`` + is the CPU model defined for the guest (defaults to the model of + the host system if not specified). + + ``apft=on|off`` + indicates whether the APFT facility is installed (on) or + not (off). 
The default for CPU models zEC12 and + newer is ``apft=on`` for older models, APFT will not be + installed. + + If APFT is installed (``apft=on``) but AP facilities are not + (``ap=off``), an error message will be logged, but the guest + will be allowed to start. It makes no sense to have APFT + installed if the AP facilities are not; this is considered + an invalid configuration. + + It also makes no sense to turn APFT off because the AP bus + running on the guest will not detect CEX4 and newer devices + without it. Since only CEX4 and newer devices are supported + for guest usage, no AP devices can be made accessible to a + guest started without APFT installed. + +Hot plug a vfio-ap device into a running guest +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Only one vfio-ap device can be attached to the virtual machine's ap-bus, so a +vfio-ap device can be hot plugged if and only if no vfio-ap device is attached +to the bus already, whether via the QEMU command line or a prior hot plug +action. + +To hot plug a vfio-ap device, use the QEMU ``device_add`` command:: + + (qemu) device_add vfio-ap,sysfsdev="$path-to-mdev",id="$id" + +Where the ``$path-to-mdev`` value specifies the absolute path to a mediated +device to which AP resources to be used by the guest have been assigned. +``$id`` is the name value for the optional id parameter. + +Note that on Linux guests, the AP devices will be created in the +``/sys/bus/ap/devices`` directory when the AP bus subsequently performs its periodic +scan, so there may be a short delay before the AP devices are accessible on the +guest. + +The command will fail if: + +* A vfio-ap device has already been attached to the virtual machine's ap-bus. + +* The CPU model features for controlling guest access to AP facilities are not + enabled (see 'CPU model features' subsection in the previous section). + +Hot unplug a vfio-ap device from a running guest +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +A vfio-ap device can be unplugged from a running KVM guest if a vfio-ap device +has been attached to the virtual machine's ap-bus via the QEMU command line +or a prior hot plug action. + +To hot unplug a vfio-ap device, use the QEMU ``device_del`` command:: + + (qemu) device_del "$id" + +Where ``$id`` is the same id that was specified at device creation. + +On a Linux guest, the AP devices will be removed from the ``/sys/bus/ap/devices`` +directory on the guest when the AP bus subsequently performs its periodic scan, +so there may be a short delay before the AP devices are no longer accessible by +the guest. + +The command will fail if the ``$path-to-mdev`` specified on the ``device_del`` command +does not match the value specified when the vfio-ap device was attached to +the virtual machine's ap-bus. + +Example: Configure AP Matrices for Three Linux Guests +----------------------------------------------------- + +Let's now provide an example to illustrate how KVM guests may be given +access to AP facilities. 
For this example, we will show how to configure +three guests such that executing the lszcrypt command on the guests would +look like this: + +Guest1:: + + CARD.DOMAIN TYPE MODE + ------------------------------ + 05 CEX5C CCA-Coproc + 05.0004 CEX5C CCA-Coproc + 05.00ab CEX5C CCA-Coproc + 06 CEX5A Accelerator + 06.0004 CEX5A Accelerator + 06.00ab CEX5C CCA-Coproc + +Guest2:: + + CARD.DOMAIN TYPE MODE + ------------------------------ + 05 CEX5A Accelerator + 05.0047 CEX5A Accelerator + 05.00ff CEX5A Accelerator + +Guest3:: + + CARD.DOMAIN TYPE MODE + ------------------------------ + 06 CEX5A Accelerator + 06.0047 CEX5A Accelerator + 06.00ff CEX5A Accelerator + +These are the steps: + +1. Install the vfio_ap module on the linux host. The dependency chain for the + vfio_ap module is: + + * iommu + * s390 + * zcrypt + * vfio + * vfio_mdev + * vfio_mdev_device + * KVM + + To build the vfio_ap module, the kernel build must be configured with the + following Kconfig elements selected: + + * IOMMU_SUPPORT + * S390 + * ZCRYPT + * S390_AP_IOMMU + * VFIO + * VFIO_MDEV + * VFIO_MDEV_DEVICE + * KVM + + If using make menuconfig select the following to build the vfio_ap module:: + -> Device Drivers + -> IOMMU Hardware Support + select S390 AP IOMMU Support + -> VFIO Non-Privileged userspace driver framework + -> Mediated device driver framework + -> VFIO driver for Mediated devices + -> I/O subsystem + -> VFIO support for AP devices + +2. Secure the AP queues to be used by the three guests so that the host can not + access them. To secure the AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, + 06.0004, 06.0047, 06.00ab, and 06.00ff for use by the vfio_ap device driver, + the corresponding APQNs must be removed from the default queue drivers pool + as follows:: + + echo -5,-6 > /sys/bus/ap/apmask + + echo -4,-0x47,-0xab,-0xff > /sys/bus/ap/aqmask + + This will result in AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, + 06.0047, 06.00ab, and 06.00ff getting bound to the vfio_ap device driver. The + sysfs directory for the vfio_ap device driver will now contain symbolic links + to the AP queue devices bound to it:: + + /sys/bus/ap + ... [drivers] + ...... [vfio_ap] + ......... [05.0004] + ......... [05.0047] + ......... [05.00ab] + ......... [05.00ff] + ......... [06.0004] + ......... [06.0047] + ......... [06.00ab] + ......... [06.00ff] + + Keep in mind that only type 10 and newer adapters (i.e., CEX4 and later) + can be bound to the vfio_ap device driver. The reason for this is to + simplify the implementation by not needlessly complicating the design by + supporting older devices that will go out of service in the relatively near + future, and for which there are few older systems on which to test. + + The administrator, therefore, must take care to secure only AP queues that + can be bound to the vfio_ap device driver. The device type for a given AP + queue device can be read from the parent card's sysfs directory. For example, + to see the hardware type of the queue 05.0004:: + + cat /sys/bus/ap/devices/card05/hwtype + + The hwtype must be 10 or higher (CEX4 or newer) in order to be bound to the + vfio_ap device driver. + +3. Create the mediated devices needed to configure the AP matrixes for the + three guests and to provide an interface to the vfio_ap driver for + use by the guests:: + + /sys/devices/vfio_ap/matrix/ + ... [mdev_supported_types] + ...... [vfio_ap-passthrough] (passthrough mediated matrix device type) + ......... create + ......... 
[devices] + + To create the mediated devices for the three guests:: + + uuidgen > create + uuidgen > create + uuidgen > create + + or + + :: + + echo $uuid1 > create + echo $uuid2 > create + echo $uuid3 > create + + This will create three mediated devices in the [devices] subdirectory named + after the UUID used to create the mediated device. We'll call them $uuid1, + $uuid2 and $uuid3 and this is the sysfs directory structure after creation:: + + /sys/devices/vfio_ap/matrix/ + ... [mdev_supported_types] + ...... [vfio_ap-passthrough] + ......... [devices] + ............ [$uuid1] + ............... assign_adapter + ............... assign_control_domain + ............... assign_domain + ............... matrix + ............... unassign_adapter + ............... unassign_control_domain + ............... unassign_domain + + ............ [$uuid2] + ............... assign_adapter + ............... assign_control_domain + ............... assign_domain + ............... matrix + ............... unassign_adapter + ............... unassign_control_domain + ............... unassign_domain + + ............ [$uuid3] + ............... assign_adapter + ............... assign_control_domain + ............... assign_domain + ............... matrix + ............... unassign_adapter + ............... unassign_control_domain + ............... unassign_domain + +4. The administrator now needs to configure the matrixes for the mediated + devices $uuid1 (for Guest1), $uuid2 (for Guest2) and $uuid3 (for Guest3). + + This is how the matrix is configured for Guest1:: + + echo 5 > assign_adapter + echo 6 > assign_adapter + echo 4 > assign_domain + echo 0xab > assign_domain + + Control domains can similarly be assigned using the assign_control_domain + sysfs file. + + If a mistake is made configuring an adapter, domain or control domain, + you can use the ``unassign_xxx`` interfaces to unassign the adapter, domain or + control domain. + + To display the matrix configuration for Guest1:: + + cat matrix + + The output will display the APQNs in the format ``xx.yyyy``, where xx is + the adapter number and yyyy is the domain number. The output for Guest1 + will look like this:: + + 05.0004 + 05.00ab + 06.0004 + 06.00ab + + This is how the matrix is configured for Guest2:: + + echo 5 > assign_adapter + echo 0x47 > assign_domain + echo 0xff > assign_domain + + This is how the matrix is configured for Guest3:: + + echo 6 > assign_adapter + echo 0x47 > assign_domain + echo 0xff > assign_domain + +5. Start Guest1:: + + /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid1 ... + +7. Start Guest2:: + + /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid2 ... + +7. Start Guest3:: + + /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid3 ... + +When the guest is shut down, the mediated matrix devices may be removed. + +Using our example again, to remove the mediated matrix device $uuid1:: + + /sys/devices/vfio_ap/matrix/ + ... [mdev_supported_types] + ...... [vfio_ap-passthrough] + ......... [devices] + ............ [$uuid1] + ............... remove + + + echo 1 > remove + +This will remove all of the mdev matrix device's sysfs structures including +the mdev device itself. To recreate and reconfigure the mdev matrix device, +all of the steps starting with step 3 will have to be performed again. 
Note +that the remove will fail if a guest using the mdev is still running. + +It is not necessary to remove an mdev matrix device, but one may want to +remove it if no guest will use it during the remaining lifetime of the linux +host. If the mdev matrix device is removed, one may want to also reconfigure +the pool of adapters and queues reserved for use by the default drivers. + +Limitations +----------- + +* The KVM/kernel interfaces do not provide a way to prevent restoring an APQN + to the default drivers pool of a queue that is still assigned to a mediated + device in use by a guest. It is incumbent upon the administrator to + ensure there is no mediated device in use by a guest to which the APQN is + assigned lest the host be given access to the private data of the AP queue + device, such as a private key configured specifically for the guest. + +* Dynamically assigning AP resources to or unassigning AP resources from a + mediated matrix device - see `Configuring an AP matrix for a linux guest`_ + section above - while a running guest is using it is currently not supported. + +* Live guest migration is not supported for guests using AP devices. If a guest + is using AP devices, the vfio-ap device configured for the guest must be + unplugged before migrating the guest (see `Hot unplug a vfio-ap device from a + running guest`_ section above.) diff --git a/docs/system/s390x/vfio-ccw.rst b/docs/system/s390x/vfio-ccw.rst new file mode 100644 index 00000000..41e0bad5 --- /dev/null +++ b/docs/system/s390x/vfio-ccw.rst @@ -0,0 +1,77 @@ +Subchannel passthrough via vfio-ccw +=================================== + +vfio-ccw (based upon the mediated vfio device infrastructure) allows to +make certain I/O subchannels and their devices available to a guest. The +host will not interact with those subchannels/devices any more. + +Note that while vfio-ccw should work with most non-QDIO devices, only ECKD +DASDs have really been tested. + +Example configuration +--------------------- + +Step 1: configure the host device +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +As every mdev is identified by a uuid, the first step is to obtain one:: + + [root@host ~]# uuidgen + 7e270a25-e163-4922-af60-757fc8ed48c6 + +Note: it is recommended to use the ``mdevctl`` tool for actually configuring +the host device. + +To define the same device as configured below to be started +automatically, use + +:: + + [root@host ~]# driverctl -b css set-override 0.0.0313 vfio_ccw + [root@host ~]# mdevctl define -u 7e270a25-e163-4922-af60-757fc8ed48c6 \ + -p 0.0.0313 -t vfio_ccw-io -a + +If using ``mdevctl`` is not possible or wanted, follow the manual procedure +below. 
+
+* Locate the subchannel for the device (in this example, ``0.0.2b09``)::
+
+  [root@host ~]# lscss | grep 0.0.2b09 | awk '{print $2}'
+  0.0.0313
+
+* Unbind the subchannel (in this example, ``0.0.0313``) from the standard
+  I/O subchannel driver and bind it to the vfio-ccw driver::
+
+  [root@host ~]# echo 0.0.0313 > /sys/bus/css/devices/0.0.0313/driver/unbind
+  [root@host ~]# echo 0.0.0313 > /sys/bus/css/drivers/vfio_ccw/bind
+
+* Create the mediated device (identified by the uuid)::
+
+  [root@host ~]# echo "7e270a25-e163-4922-af60-757fc8ed48c6" > \
+  /sys/bus/css/devices/0.0.0313/mdev_supported_types/vfio_ccw-io/create
+
+Step 2: configure QEMU
+~~~~~~~~~~~~~~~~~~~~~~
+
+* Reference the created mediated device and (optionally) pick a device id to
+  be presented in the guest (here, ``fe.0.1234``, which will end up visible
+  in the guest as ``0.0.1234``)::
+
+    -device vfio-ccw,devno=fe.0.1234,sysfsdev=\
+     /sys/bus/mdev/devices/7e270a25-e163-4922-af60-757fc8ed48c6
+
+* Start the guest. The device (here, ``0.0.1234``) should now be usable::
+
+    [root@guest ~]# lscss -d 0.0.1234
+    Device   Subchan.  DevType CU Type Use  PIM PAM POM  CHPID
+    ----------------------------------------------------------------------
+    0.0.1234 0.0.0007  3390/0e 3990/e9      f0  f0  ff   1a2a3a0a 00000000
+    [root@guest ~]# chccwdev -e 0.0.1234
+    Setting device 0.0.1234 online
+    Done
+    [root@guest ~]# dmesg -t
+    (...)
+    dasd-eckd 0.0.1234: A channel path to the device has become operational
+    dasd-eckd 0.0.1234: New DASD 3390/0E (CU 3990/01) with 10017 cylinders, 15 heads, 224 sectors
+    dasd-eckd 0.0.1234: DASD with 4 KB/block, 7212240 KB total size, 48 KB/track, compatible disk layout
+    dasda:VOL1/  0X2B09: dasda1
diff --git a/docs/system/secrets.rst b/docs/system/secrets.rst
new file mode 100644
index 00000000..4a177369
--- /dev/null
+++ b/docs/system/secrets.rst
@@ -0,0 +1,162 @@
+.. _secret data:
+
+Providing secret data to QEMU
+-----------------------------
+
+There are a variety of objects in QEMU which require secret data to be provided
+by the administrator or management application. For example, network block
+devices often require a password, LUKS block devices require a passphrase to
+unlock key material, and remote desktop services require an access password.
+QEMU has a general purpose mechanism for providing secret data to QEMU in a
+secure manner, using the ``secret`` object type.
+
+At startup this can be done using the ``-object secret,...`` command line
+argument. At runtime this can be done using the ``object_add`` QMP / HMP
+monitor commands. The examples that follow will illustrate use of ``-object``
+command lines, but they all apply equivalently in QMP / HMP. When creating
+a ``secret`` object it must be given a unique ID string. This ID is then
+used to identify the object when configuring the thing which needs the data.
+
+
+INSECURE: Passing secrets as clear text inline
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+**The following should never be done in a production environment or on a
+multi-user host. Command line arguments are usually visible in the process
+listings and are often collected in log files by system monitoring agents
+or bug reporting tools. QMP/HMP commands and their arguments are also often
+logged and attached to bug reports. This all risks compromising secrets that
+are passed inline.**
+
+For the convenience of people debugging / developing with QEMU, it is possible
+to pass secret data inline on the command line.
+
+::
+
+   -object secret,id=secvnc0,data=87539319
+
+
+It is also possible to provide the data in base64 encoded format, which is
+particularly useful if the data contains binary characters that would clash
+with argument parsing.
+
+::
+
+   -object secret,id=secvnc0,data=ODc1MzkzMTk=,format=base64
+
+
+**Note: base64 encoding does not provide any security benefit.**
+
+Passing secrets as clear text via a file
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The simplest approach to providing data securely is to use a file to store
+the secret:
+
+::
+
+   -object secret,id=secvnc0,file=vnc-password.txt
+
+
+In this example the file ``vnc-password.txt`` contains the plain text secret
+data. It is important to note that the contents of the file are treated as an
+opaque blob. The entire raw file contents are used as the value, thus it is
+important not to mistakenly add any trailing newline character in the file if
+this newline is not intended to be part of the secret data.
+
+In some cases it might be more convenient to pass the secret data in base64
+format and have QEMU decode it to get the raw bytes before use:
+
+::
+
+   -object secret,id=sec0,file=vnc-password.txt,format=base64
+
+
+The file should generally be given mode ``0600`` or ``0400`` permissions, and
+have its user/group ownership set to the same account that the QEMU process
+will be launched under. If using mandatory access control such as SELinux, then
+the file should be labelled to only grant access to the specific QEMU process
+that needs access. This will prevent other processes/users from compromising the
+secret data.
+
+
+Passing secrets as cipher text inline
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To address the insecurity of passing secrets inline as clear text, it is
+possible to configure a second secret as an AES key to use for decrypting
+the data.
+
+The secret used as the AES key must always be configured using the file based
+storage mechanism:
+
+::
+
+   -object secret,id=secmaster,file=masterkey.data,format=base64
+
+
+In this case the ``masterkey.data`` file would be initialized with 32
+cryptographically secure random bytes, which are then base64 encoded.
+The contents of this file will be used as an AES-256 key to encrypt the
+real secret that can now be safely passed to QEMU inline as cipher text:
+
+::
+
+   -object secret,id=secvnc0,keyid=secmaster,data=BASE64-CIPHERTEXT,iv=BASE64-IV,format=base64
+
+
+In this example ``BASE64-CIPHERTEXT`` is the result of AES-256-CBC encrypting
+the secret with ``masterkey.data`` and then base64 encoding the ciphertext.
+The ``BASE64-IV`` data is 16 random bytes which have been base64 encoded.
+These bytes are used as the initialization vector for the AES-256-CBC value.
+
+A single master key can be used to encrypt all subsequent secrets, **but it is
+critical that a different initialization vector is used for every secret**.
+
+Passing secrets via the Linux keyring
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The earlier mechanisms described are platform agnostic. If using QEMU on a Linux
+host, it is further possible to pass secrets to QEMU using the Linux keyring:
+
+::
+
+   -object secret_keyring,id=secvnc0,serial=1729
+
+
+This instructs QEMU to load data from the Linux keyring secret identified by
+the serial number ``1729``.
It is possible to combine use of the keyring with +other features mentioned earlier such as base64 encoding: + +:: + + -object secret_keyring,id=secvnc0,serial=1729,format=base64 + + +and also encryption with a master key: + +:: + + -object secret_keyring,id=secvnc0,keyid=secmaster,serial=1729,iv=BASE64-IV + + +Best practice +~~~~~~~~~~~~~ + +It is recommended for production deployments to use a master key secret, and +then pass all subsequent inline secrets encrypted with the master key. + +Each QEMU instance must have a distinct master key, and that must be generated +from a cryptographically secure random data source. The master key should be +deleted immediately upon QEMU shutdown. If passing the master key as a file, +the key file must have access control rules applied that restrict access to +just the one QEMU process that is intended to use it. Alternatively the Linux +keyring can be used to pass the master key to QEMU. + +The secrets for individual QEMU device backends must all then be encrypted +with this master key. + +This procedure helps ensure that the individual secrets for QEMU backends will +not be compromised, even if ``-object`` CLI args or ``object_add`` monitor +commands are collected in log files and attached to public bug support tickets. +The only item that needs strongly protecting is the master key file. diff --git a/docs/system/security.rst b/docs/system/security.rst new file mode 100644 index 00000000..f2092c87 --- /dev/null +++ b/docs/system/security.rst @@ -0,0 +1,173 @@ +Security +======== + +Overview +-------- + +This chapter explains the security requirements that QEMU is designed to meet +and principles for securely deploying QEMU. + +Security Requirements +--------------------- + +QEMU supports many different use cases, some of which have stricter security +requirements than others. The community has agreed on the overall security +requirements that users may depend on. These requirements define what is +considered supported from a security perspective. + +Virtualization Use Case +''''''''''''''''''''''' + +The virtualization use case covers cloud and virtual private server (VPS) +hosting, as well as traditional data center and desktop virtualization. These +use cases rely on hardware virtualization extensions to execute guest code +safely on the physical CPU at close-to-native speed. + +The following entities are untrusted, meaning that they may be buggy or +malicious: + +- Guest +- User-facing interfaces (e.g. VNC, SPICE, WebSocket) +- Network protocols (e.g. NBD, live migration) +- User-supplied files (e.g. disk images, kernels, device trees) +- Passthrough devices (e.g. PCI, USB) + +Bugs affecting these entities are evaluated on whether they can cause damage in +real-world use cases and treated as security bugs if this is the case. + +Non-virtualization Use Case +''''''''''''''''''''''''''' + +The non-virtualization use case covers emulation using the Tiny Code Generator +(TCG). In principle the TCG and device emulation code used in conjunction with +the non-virtualization use case should meet the same security requirements as +the virtualization use case. However, for historical reasons much of the +non-virtualization use case code was not written with these security +requirements in mind. + +Bugs affecting the non-virtualization use case are not considered security +bugs at this time. Users with non-virtualization use cases must not rely on +QEMU to provide guest isolation or any security guarantees. 
+ +Architecture +------------ + +This section describes the design principles that ensure the security +requirements are met. + +Guest Isolation +''''''''''''''' + +Guest isolation is the confinement of guest code to the virtual machine. When +guest code gains control of execution on the host this is called escaping the +virtual machine. Isolation also includes resource limits such as throttling of +CPU, memory, disk, or network. Guests must be unable to exceed their resource +limits. + +QEMU presents an attack surface to the guest in the form of emulated devices. +The guest must not be able to gain control of QEMU. Bugs in emulated devices +could allow malicious guests to gain code execution in QEMU. At this point the +guest has escaped the virtual machine and is able to act in the context of the +QEMU process on the host. + +Guests often interact with other guests and share resources with them. A +malicious guest must not gain control of other guests or access their data. +Disk image files and network traffic must be protected from other guests unless +explicitly shared between them by the user. + +Principle of Least Privilege +'''''''''''''''''''''''''''' + +The principle of least privilege states that each component only has access to +the privileges necessary for its function. In the case of QEMU this means that +each process only has access to resources belonging to the guest. + +The QEMU process should not have access to any resources that are inaccessible +to the guest. This way the guest does not gain anything by escaping into the +QEMU process since it already has access to those same resources from within +the guest. + +Following the principle of least privilege immediately fulfills guest isolation +requirements. For example, guest A only has access to its own disk image file +``a.img`` and not guest B's disk image file ``b.img``. + +In reality certain resources are inaccessible to the guest but must be +available to QEMU to perform its function. For example, host system calls are +necessary for QEMU but are not exposed to guests. A guest that escapes into +the QEMU process can then begin invoking host system calls. + +New features must be designed to follow the principle of least privilege. +Should this not be possible for technical reasons, the security risk must be +clearly documented so users are aware of the trade-off of enabling the feature. + +Isolation mechanisms +'''''''''''''''''''' + +Several isolation mechanisms are available to realize this architecture of +guest isolation and the principle of least privilege. With the exception of +Linux seccomp, these mechanisms are all deployed by management tools that +launch QEMU, such as libvirt. They are also platform-specific so they are only +described briefly for Linux here. + +The fundamental isolation mechanism is that QEMU processes must run as +unprivileged users. Sometimes it seems more convenient to launch QEMU as +root to give it access to host devices (e.g. ``/dev/net/tun``) but this poses a +huge security risk. File descriptor passing can be used to give an otherwise +unprivileged QEMU process access to host devices without running QEMU as root. +It is also possible to launch QEMU as a non-root user and configure UNIX groups +for access to ``/dev/kvm``, ``/dev/net/tun``, and other device nodes. +Some Linux distros already ship with UNIX groups for these devices by default. + +- SELinux and AppArmor make it possible to confine processes beyond the + traditional UNIX process and file permissions model. 
They restrict the QEMU + process from accessing processes and files on the host system that are not + needed by QEMU. + +- Resource limits and cgroup controllers provide throughput and utilization + limits on key resources such as CPU time, memory, and I/O bandwidth. + +- Linux namespaces can be used to make process, file system, and other system + resources unavailable to QEMU. A namespaced QEMU process is restricted to only + those resources that were granted to it. + +- Linux seccomp is available via the QEMU ``--sandbox`` option. It disables + system calls that are not needed by QEMU, thereby reducing the host kernel + attack surface. + +Sensitive configurations +------------------------ + +There are aspects of QEMU that can have security implications which users & +management applications must be aware of. + +Monitor console (QMP and HMP) +''''''''''''''''''''''''''''' + +The monitor console (whether used with QMP or HMP) provides an interface +to dynamically control many aspects of QEMU's runtime operation. Many of the +commands exposed will instruct QEMU to access content on the host file system +and/or trigger spawning of external processes. + +For example, the ``migrate`` command allows for the spawning of arbitrary +processes for the purpose of tunnelling the migration data stream. The +``blockdev-add`` command instructs QEMU to open arbitrary files, exposing +their content to the guest as a virtual disk. + +Unless QEMU is otherwise confined using technologies such as SELinux, AppArmor, +or Linux namespaces, the monitor console should be considered to have privileges +equivalent to those of the user account QEMU is running under. + +It is further important to consider the security of the character device backend +over which the monitor console is exposed. It needs to have protection against +malicious third parties which might try to make unauthorized connections, or +perform man-in-the-middle attacks. Many of the character device backends do not +satisfy this requirement and so must not be used for the monitor console. + +The general recommendation is that the monitor console should be exposed over +a UNIX domain socket backend to the local host only. Use of the TCP based +character device backend is inappropriate unless configured to use both TLS +encryption and authorization control policy on client connections. + +In summary, the monitor console is considered a privileged control interface to +QEMU and as such should only be made accessible to a trusted management +application or user. diff --git a/docs/system/target-arm.rst b/docs/system/target-arm.rst new file mode 100644 index 00000000..91ebc26c --- /dev/null +++ b/docs/system/target-arm.rst @@ -0,0 +1,120 @@ +.. _ARM-System-emulator: + +Arm System emulator +------------------- + +QEMU can emulate both 32-bit and 64-bit Arm CPUs. Use the +``qemu-system-aarch64`` executable to simulate a 64-bit Arm machine. +You can use either ``qemu-system-arm`` or ``qemu-system-aarch64`` +to simulate a 32-bit Arm machine: in general, command lines that +work for ``qemu-system-arm`` will behave the same when used with +``qemu-system-aarch64``. + +QEMU has generally good support for Arm guests. It has support for +nearly fifty different machines. The reason we support so many is that +Arm hardware is much more widely varying than x86 hardware. 
Arm CPUs +are generally built into "system-on-chip" (SoC) designs created by +many different companies with different devices, and these SoCs are +then built into machines which can vary still further even if they use +the same SoC. Even with fifty boards QEMU does not cover more than a +small fraction of the Arm hardware ecosystem. + +The situation for 64-bit Arm is fairly similar, except that we don't +implement so many different machines. + +As well as the more common "A-profile" CPUs (which have MMUs and will +run Linux) QEMU also supports "M-profile" CPUs such as the Cortex-M0, +Cortex-M4 and Cortex-M33 (which are microcontrollers used in very +embedded boards). For most boards the CPU type is fixed (matching what +the hardware has), so typically you don't need to specify the CPU type +by hand, except for special cases like the ``virt`` board. + +Choosing a board model +====================== + +For QEMU's Arm system emulation, you must specify which board +model you want to use with the ``-M`` or ``--machine`` option; +there is no default. + +Because Arm systems differ so much and in fundamental ways, typically +operating system or firmware images intended to run on one machine +will not run at all on any other. This is often surprising for new +users who are used to the x86 world where every system looks like a +standard PC. (Once the kernel has booted, most userspace software +cares much less about the detail of the hardware.) + +If you already have a system image or a kernel that works on hardware +and you want to boot with QEMU, check whether QEMU lists that machine +in its ``-machine help`` output. If it is listed, then you can probably +use that board model. If it is not listed, then unfortunately your image +will almost certainly not boot on QEMU. (You might be able to +extract the filesystem and use that with a different kernel which +boots on a system that QEMU does emulate.) + +If you don't care about reproducing the idiosyncrasies of a particular +bit of hardware, such as small amount of RAM, no PCI or other hard +disk, etc., and just want to run Linux, the best option is to use the +``virt`` board. This is a platform which doesn't correspond to any +real hardware and is designed for use in virtual machines. You'll +need to compile Linux with a suitable configuration for running on +the ``virt`` board. ``virt`` supports PCI, virtio, recent CPUs and +large amounts of RAM. It also supports 64-bit CPUs. + +Board-specific documentation +============================ + +Unfortunately many of the Arm boards QEMU supports are currently +undocumented; you can get a complete list by running +``qemu-system-aarch64 --machine help``. + +.. + This table of contents should be kept sorted alphabetically + by the title text of each file, which isn't the same ordering + as an alphabetical sort by filename. + +.. toctree:: + :maxdepth: 1 + + arm/integratorcp + arm/mps2 + arm/musca + arm/realview + arm/sbsa + arm/versatile + arm/vexpress + arm/aspeed + arm/sabrelite + arm/digic + arm/cubieboard + arm/emcraft-sf2 + arm/highbank + arm/musicpal + arm/gumstix + arm/mainstone + arm/kzm + arm/nrf + arm/nseries + arm/nuvoton + arm/imx25-pdk + arm/orangepi + arm/palm + arm/raspi + arm/xscale + arm/collie + arm/sx1 + arm/stellaris + arm/stm32 + arm/virt + arm/xlnx-versal-virt + +Emulated CPU architecture support +================================= + +.. toctree:: + arm/emulation + +Arm CPU features +================ + +.. 
toctree:: + arm/cpu-features diff --git a/docs/system/target-avr.rst b/docs/system/target-avr.rst new file mode 100644 index 00000000..03d5ab51 --- /dev/null +++ b/docs/system/target-avr.rst @@ -0,0 +1,48 @@ +.. _AVR-System-emulator: + +AVR System emulator +------------------- + +Use the executable ``qemu-system-avr`` to emulate a AVR 8 bit based machine. +These can have one of the following cores: avr1, avr2, avr25, avr3, avr31, +avr35, avr4, avr5, avr51, avr6, avrtiny, xmega2, xmega3, xmega4, xmega5, +xmega6 and xmega7. + +As for now it supports few Arduino boards for educational and testing purposes. +These boards use a ATmega controller, which model is limited to USART & 16-bit +timer devices, enough to run FreeRTOS based applications (like +https://github.com/seharris/qemu-avr-tests/blob/master/free-rtos/Demo/AVR_ATMega2560_GCC/demo.elf +). + +Following are examples of possible usages, assuming demo.elf is compiled for +AVR cpu + +- Continuous non interrupted execution:: + + qemu-system-avr -machine mega2560 -bios demo.elf + +- Continuous non interrupted execution with serial output into telnet window:: + + qemu-system-avr -M mega2560 -bios demo.elf -nographic \ + -serial tcp::5678,server=on,wait=off + + and then in another shell:: + + telnet localhost 5678 + +- Debugging with GDB debugger:: + + qemu-system-avr -machine mega2560 -bios demo.elf -s -S + + and then in another shell:: + + avr-gdb demo.elf + + and then within GDB shell:: + + target remote :1234 + +- Print out executed instructions (that have not been translated by the JIT + compiler yet):: + + qemu-system-avr -machine mega2560 -bios demo.elf -d in_asm diff --git a/docs/system/target-i386-desc.rst.inc b/docs/system/target-i386-desc.rst.inc new file mode 100644 index 00000000..7d1fffac --- /dev/null +++ b/docs/system/target-i386-desc.rst.inc @@ -0,0 +1,73 @@ +The QEMU PC System emulator simulates the following peripherals: + +- i440FX host PCI bridge and PIIX3 PCI to ISA bridge + +- Cirrus CLGD 5446 PCI VGA card or dummy VGA card with Bochs VESA + extensions (hardware level, including all non standard modes). + +- PS/2 mouse and keyboard + +- 2 PCI IDE interfaces with hard disk and CD-ROM support + +- Floppy disk + +- PCI and ISA network adapters + +- Serial ports + +- IPMI BMC, either and internal or external one + +- Creative SoundBlaster 16 sound card + +- ENSONIQ AudioPCI ES1370 sound card + +- Intel 82801AA AC97 Audio compatible sound card + +- Intel HD Audio Controller and HDA codec + +- Adlib (OPL2) - Yamaha YM3812 compatible chip + +- Gravis Ultrasound GF1 sound card + +- CS4231A compatible sound card + +- PC speaker + +- PCI UHCI, OHCI, EHCI or XHCI USB controller and a virtual USB-1.1 + hub. + +SMP is supported with up to 255 CPUs. + +QEMU uses the PC BIOS from the Seabios project and the Plex86/Bochs LGPL +VGA BIOS. + +QEMU uses YM3812 emulation by Tatsuyuki Satoh. + +QEMU uses GUS emulation (GUSEMU32 http://www.deinmeister.de/gusemu/) by +Tibor \"TS\" Schütz. + +Note that, by default, GUS shares IRQ(7) with parallel ports and so QEMU +must be told to not have parallel ports to have working GUS. + +.. parsed-literal:: + + |qemu_system_x86| dos.img -device gus -parallel none + +Alternatively: + +.. parsed-literal:: + + |qemu_system_x86| dos.img -device gus,irq=5 + +Or some other unclaimed IRQ. + +CS4231A is the chip used in Windows Sound System and GUSMAX products + +The PC speaker audio device can be configured using the pcspk-audiodev +machine property, i.e. + +.. 
parsed-literal:: + + |qemu_system_x86| some.img \ + -audiodev <backend>,id=<name> \ + -machine pcspk-audiodev=<name> diff --git a/docs/system/target-i386.rst b/docs/system/target-i386.rst new file mode 100644 index 00000000..e64c0130 --- /dev/null +++ b/docs/system/target-i386.rst @@ -0,0 +1,42 @@ +.. _QEMU-PC-System-emulator: + +x86 System emulator +------------------- + +.. _pcsys_005fdevices: + +Board-specific documentation +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. + This table of contents should be kept sorted alphabetically + by the title text of each file, which isn't the same ordering + as an alphabetical sort by filename. + +.. toctree:: + :maxdepth: 1 + + i386/microvm + i386/pc + +Architectural features +~~~~~~~~~~~~~~~~~~~~~~ + +.. toctree:: + :maxdepth: 1 + + i386/cpu + i386/hyperv + i386/kvm-pv + i386/sgx + i386/amd-memory-encryption + +.. _pcsys_005freq: + +OS requirements +~~~~~~~~~~~~~~~ + +On x86_64 hosts, the default set of CPU features enabled by the KVM +accelerator require the host to be running Linux v4.5 or newer. Red Hat +Enterprise Linux 7 is also supported, since the required +functionality was backported. diff --git a/docs/system/target-m68k.rst b/docs/system/target-m68k.rst new file mode 100644 index 00000000..d28d3b92 --- /dev/null +++ b/docs/system/target-m68k.rst @@ -0,0 +1,21 @@ +.. _ColdFire-System-emulator: + +ColdFire System emulator +------------------------ + +Use the executable ``qemu-system-m68k`` to simulate a ColdFire machine. +The emulator is able to boot a uClinux kernel. + +The M5208EVB emulation includes the following devices: + +- MCF5208 ColdFire V2 Microprocessor (ISA A+ with EMAC). + +- Three Two on-chip UARTs. + +- Fast Ethernet Controller (FEC) + +The AN5206 emulation includes the following devices: + +- MCF5206 ColdFire V2 Microprocessor. + +- Two on-chip UARTs. diff --git a/docs/system/target-mips.rst b/docs/system/target-mips.rst new file mode 100644 index 00000000..138441bd --- /dev/null +++ b/docs/system/target-mips.rst @@ -0,0 +1,130 @@ +.. _MIPS-System-emulator: + +MIPS System emulator +-------------------- + +Four executables cover simulation of 32 and 64-bit MIPS systems in both +endian options, ``qemu-system-mips``, ``qemu-system-mipsel`` +``qemu-system-mips64`` and ``qemu-system-mips64el``. Five different +machine types are emulated: + +- A generic ISA PC-like machine \"mips\" + +- The MIPS Malta prototype board \"malta\" + +- An ACER Pica \"pica61\". This machine needs the 64-bit emulator. + +- MIPS emulator pseudo board \"mipssim\" + +- A MIPS Magnum R4000 machine \"magnum\". This machine needs the + 64-bit emulator. + +The generic emulation is supported by Debian 'Etch' and is able to +install Debian into a virtual disk image. 
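+
+As a minimal illustration (the kernel and disk image file names here are
+only placeholders), such a system could be started with::
+
+    qemu-system-mips -M mips -kernel vmlinux \
+        -hda debian-etch-mips.img -append "root=/dev/hda1 console=ttyS0"
+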
The following devices are +emulated: + +- A range of MIPS CPUs, default is the 24Kf + +- PC style serial port + +- PC style IDE disk + +- NE2000 network card + +The Malta emulation supports the following devices: + +- Core board with MIPS 24Kf CPU and Galileo system controller + +- PIIX4 PCI/USB/SMbus controller + +- The Multi-I/O chip's serial device + +- PCI network cards (PCnet32 and others) + +- Malta FPGA serial device + +- Cirrus (default) or any other PCI VGA graphics card + +The Boston board emulation supports the following devices: + +- Xilinx FPGA, which includes a PCIe root port and an UART + +- Intel EG20T PCH connects the I/O peripherals, but only the SATA bus + is emulated + +The ACER Pica emulation supports: + +- MIPS R4000 CPU + +- PC-style IRQ and DMA controllers + +- PC Keyboard + +- IDE controller + +The MIPS Magnum R4000 emulation supports: + +- MIPS R4000 CPU + +- PC-style IRQ controller + +- PC Keyboard + +- SCSI controller + +- G364 framebuffer + +The Fuloong 2E emulation supports: + +- Loongson 2E CPU + +- Bonito64 system controller as North Bridge + +- VT82C686 chipset as South Bridge + +- RTL8139D as a network card chipset + +The Loongson-3 virtual platform emulation supports: + +- Loongson 3A CPU + +- LIOINTC as interrupt controller + +- GPEX and virtio as peripheral devices + +- Both KVM and TCG supported + +The mipssim pseudo board emulation provides an environment similar to +what the proprietary MIPS emulator uses for running Linux. It supports: + +- A range of MIPS CPUs, default is the 24Kf + +- PC style serial port + +- MIPSnet network emulation + +.. include:: cpu-models-mips.rst.inc + +.. _nanoMIPS-System-emulator: + +nanoMIPS System emulator +~~~~~~~~~~~~~~~~~~~~~~~~ + +Executable ``qemu-system-mipsel`` also covers simulation of 32-bit +nanoMIPS system in little endian mode: + +- nanoMIPS I7200 CPU + +Example of ``qemu-system-mipsel`` usage for nanoMIPS is shown below: + +Download ``<disk_image_file>`` from +https://mipsdistros.mips.com/LinuxDistro/nanomips/buildroot/index.html. + +Download ``<kernel_image_file>`` from +https://mipsdistros.mips.com/LinuxDistro/nanomips/kernels/v4.15.18-432-gb2eb9a8b07a1-20180627102142/index.html. + +Start system emulation of Malta board with nanoMIPS I7200 CPU:: + + qemu-system-mipsel -cpu I7200 -kernel <kernel_image_file> \ + -M malta -serial stdio -m <memory_size> -hda <disk_image_file> \ + -append "mem=256m@0x0 rw console=ttyS0 vga=cirrus vesa=0x111 root=/dev/sda" diff --git a/docs/system/target-openrisc.rst b/docs/system/target-openrisc.rst new file mode 100644 index 00000000..22cb2217 --- /dev/null +++ b/docs/system/target-openrisc.rst @@ -0,0 +1,71 @@ +.. _OpenRISC-System-emulator: + +OpenRISC System emulator +~~~~~~~~~~~~~~~~~~~~~~~~ + +QEMU can emulate 32-bit OpenRISC CPUs using the ``qemu-system-or1k`` executable. + +OpenRISC CPUs are generally built into "system-on-chip" (SoC) designs that run +on FPGAs. These SoCs are based on the same core architecture as the or1ksim +(the original OpenRISC instruction level simulator) which QEMU supports. For +this reason QEMU does not need to support many different boards to support the +OpenRISC hardware ecosystem. + +The OpenRISC CPU supported by QEMU is the ``or1200``, it supports an MMU and can +run linux. + +Choosing a board model +====================== + +For QEMU's OpenRISC system emulation, you must specify which board model you +want to use with the ``-M`` or ``--machine`` option; the default machine is +``or1k-sim``. 
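+
+As a minimal sketch (the kernel file name is hypothetical), the simulator
+board can be selected explicitly like this::
+
+    qemu-system-or1k -M or1k-sim -nographic -kernel vmlinux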
+ +If you intend to boot Linux, it is possible to have a single kernel image that +will boot on any of the QEMU machines. To do this one would compile all required +drivers into the kernel. This is possible because QEMU will create a device tree +structure that describes the QEMU machine and pass a pointer to the structure to +the kernel. The kernel can then use this to configure itself for the machine. + +However, typically users will have specific firmware images for a specific machine. + +If you already have a system image or a kernel that works on hardware and you +want to boot with QEMU, check whether QEMU lists that machine in its ``-machine +help`` output. If it is listed, then you can probably use that board model. If +it is not listed, then unfortunately your image will almost certainly not boot +on QEMU. (You might be able to extract the filesystem and use that with a +different kernel which boots on a system that QEMU does emulate.) + +If you don't care about reproducing the idiosyncrasies of a particular +bit of hardware, such as small amount of RAM, no PCI or other hard disk, etc., +and just want to run Linux, the best option is to use the ``virt`` board. This +is a platform which doesn't correspond to any real hardware and is designed for +use in virtual machines. You'll need to compile Linux with a suitable +configuration for running on the ``virt`` board. ``virt`` supports PCI, virtio +and large amounts of RAM. + +Board-specific documentation +============================ + +.. + This table of contents should be kept sorted alphabetically + by the title text of each file, which isn't the same ordering + as an alphabetical sort by filename. + +.. toctree:: + :maxdepth: 1 + + openrisc/or1k-sim + openrisc/virt + +Emulated CPU architecture support +================================= + +.. toctree:: + openrisc/emulation + +OpenRISC CPU features +===================== + +.. toctree:: + openrisc/cpu-features diff --git a/docs/system/target-ppc.rst b/docs/system/target-ppc.rst new file mode 100644 index 00000000..4f6eb93b --- /dev/null +++ b/docs/system/target-ppc.rst @@ -0,0 +1,25 @@ +.. _PowerPC-System-emulator: + +PowerPC System emulator +----------------------- + +Board-specific documentation +============================ + +You can get a complete list by running ``qemu-system-ppc64 --machine +help``. + +.. + This table of contents should be kept sorted alphabetically + by the title text of each file, which isn't the same ordering + as an alphabetical sort by filename. + +.. toctree:: + :maxdepth: 1 + + ppc/embedded + ppc/powermac + ppc/powernv + ppc/ppce500 + ppc/prep + ppc/pseries diff --git a/docs/system/target-riscv.rst b/docs/system/target-riscv.rst new file mode 100644 index 00000000..89a866e4 --- /dev/null +++ b/docs/system/target-riscv.rst @@ -0,0 +1,86 @@ +.. _RISC-V-System-emulator: + +RISC-V System emulator +====================== + +QEMU can emulate both 32-bit and 64-bit RISC-V CPUs. Use the +``qemu-system-riscv64`` executable to simulate a 64-bit RISC-V machine, +``qemu-system-riscv32`` executable to simulate a 32-bit RISC-V machine. + +QEMU has generally good support for RISC-V guests. It has support for +several different machines. The reason we support so many is that +RISC-V hardware is much more widely varying than x86 hardware. RISC-V +CPUs are generally built into "system-on-chip" (SoC) designs created by +many different companies with different devices, and these SoCs are +then built into machines which can vary still further even if they use +the same SoC. 
+ +For most boards the CPU type is fixed (matching what the hardware has), +so typically you don't need to specify the CPU type by hand, except for +special cases like the ``virt`` board. + +Choosing a board model +---------------------- + +For QEMU's RISC-V system emulation, you must specify which board +model you want to use with the ``-M`` or ``--machine`` option; +there is no default. + +Because RISC-V systems differ so much and in fundamental ways, typically +operating system or firmware images intended to run on one machine +will not run at all on any other. This is often surprising for new +users who are used to the x86 world where every system looks like a +standard PC. (Once the kernel has booted, most user space software +cares much less about the detail of the hardware.) + +If you already have a system image or a kernel that works on hardware +and you want to boot with QEMU, check whether QEMU lists that machine +in its ``-machine help`` output. If it is listed, then you can probably +use that board model. If it is not listed, then unfortunately your image +will almost certainly not boot on QEMU. (You might be able to +extract the file system and use that with a different kernel which +boots on a system that QEMU does emulate.) + +If you don't care about reproducing the idiosyncrasies of a particular +bit of hardware, such as small amount of RAM, no PCI or other hard +disk, etc., and just want to run Linux, the best option is to use the +``virt`` board. This is a platform which doesn't correspond to any +real hardware and is designed for use in virtual machines. You'll +need to compile Linux with a suitable configuration for running on +the ``virt`` board. ``virt`` supports PCI, virtio, recent CPUs and +large amounts of RAM. It also supports 64-bit CPUs. + +Board-specific documentation +---------------------------- + +Unfortunately many of the RISC-V boards QEMU supports are currently +undocumented; you can get a complete list by running +``qemu-system-riscv64 --machine help``, or +``qemu-system-riscv32 --machine help``. + +.. + This table of contents should be kept sorted alphabetically + by the title text of each file, which isn't the same ordering + as an alphabetical sort by filename. + +.. toctree:: + :maxdepth: 1 + + riscv/microchip-icicle-kit + riscv/shakti-c + riscv/sifive_u + riscv/virt + +RISC-V CPU firmware +------------------- + +When using the ``sifive_u`` or ``virt`` machine there are three different +firmware boot options: +1. ``-bios default`` - This is the default behaviour if no -bios option +is included. This option will load the default OpenSBI firmware automatically. +The firmware is included with the QEMU release and no user interaction is +required. All a user needs to do is specify the kernel they want to boot +with the -kernel option +2. ``-bios none`` - QEMU will not automatically load any firmware. It is up +to the user to load all the images they need. +3. ``-bios <file>`` - Tells QEMU to load the specified file as the firmware. diff --git a/docs/system/target-rx.rst b/docs/system/target-rx.rst new file mode 100644 index 00000000..4a20a89a --- /dev/null +++ b/docs/system/target-rx.rst @@ -0,0 +1,36 @@ +.. _RX-System-emulator: + +RX System emulator +-------------------- + +Use the executable ``qemu-system-rx`` to simulate RX target (GDB simulator). +This target emulated following devices. 
+ +- R5F562N8 MCU + + - On-chip memory (ROM 512KB, RAM 96KB) + - Interrupt Control Unit (ICUa) + - 8Bit Timer x 1CH (TMR0,1) + - Compare Match Timer x 2CH (CMT0,1) + - Serial Communication Interface x 1CH (SCI0) + +- External memory 16MByte + +Example of ``qemu-system-rx`` usage for RX is shown below: + +Download ``<u-boot_image_file>`` from +https://osdn.net/users/ysato/pf/qemu/dl/u-boot.bin.gz + +Start emulation of rx-virt:: + qemu-system-rx -M gdbsim-r5f562n8 -bios <u-boot_image_file> + +Download ``kernel_image_file`` from +https://osdn.net/users/ysato/pf/qemu/dl/zImage + +Download ``device_tree_blob`` from +https://osdn.net/users/ysato/pf/qemu/dl/rx-virt.dtb + +Start emulation of rx-virt:: + qemu-system-rx -M gdbsim-r5f562n8 \ + -kernel <kernel_image_file> -dtb <device_tree_blob> \ + -append "earlycon" diff --git a/docs/system/target-s390x.rst b/docs/system/target-s390x.rst new file mode 100644 index 00000000..c636f641 --- /dev/null +++ b/docs/system/target-s390x.rst @@ -0,0 +1,35 @@ +.. _s390x-System-emulator: + +s390x System emulator +--------------------- + +QEMU can emulate z/Architecture (in particular, 64 bit) s390x systems +via the ``qemu-system-s390x`` binary. Only one machine type, +``s390-ccw-virtio``, is supported (with versioning for compatibility +handling). + +When using KVM as accelerator, QEMU can emulate CPUs up to the generation +of the host. When using the default cpu model with TCG as accelerator, +QEMU will emulate a subset of z13 cpu features that should be enough to run +distributions built for the z13. + +Device support +============== + +QEMU will not emulate most of the traditional devices found under LPAR or +z/VM; virtio devices (especially using virtio-ccw) make up the bulk of +the available devices. Passthrough of host devices via vfio-pci, vfio-ccw, +or vfio-ap is also available. + +.. toctree:: + s390x/vfio-ap + s390x/css + s390x/3270 + s390x/vfio-ccw + +Architectural features +====================== + +.. toctree:: + s390x/bootdevices + s390x/protvirt diff --git a/docs/system/target-sparc.rst b/docs/system/target-sparc.rst new file mode 100644 index 00000000..b55f8d09 --- /dev/null +++ b/docs/system/target-sparc.rst @@ -0,0 +1,62 @@ +.. _Sparc32-System-emulator: + +Sparc32 System emulator +----------------------- + +Use the executable ``qemu-system-sparc`` to simulate the following Sun4m +architecture machines: + +- SPARCstation 4 + +- SPARCstation 5 + +- SPARCstation 10 + +- SPARCstation 20 + +- SPARCserver 600MP + +- SPARCstation LX + +- SPARCstation Voyager + +- SPARCclassic + +- SPARCbook + +The emulation is somewhat complete. SMP up to 16 CPUs is supported, but +Linux limits the number of usable CPUs to 4. + +QEMU emulates the following sun4m peripherals: + +- IOMMU + +- TCX or cgthree Frame buffer + +- Lance (Am7990) Ethernet + +- Non Volatile RAM M48T02/M48T08 + +- Slave I/O: timers, interrupt controllers, Zilog serial ports, + keyboard and power/reset logic + +- ESP SCSI controller with hard disk and CD-ROM support + +- Floppy drive (not on SS-600MP) + +- CS4231 sound device (only on SS-5, not working yet) + +The number of peripherals is fixed in the architecture. Maximum memory +size depends on the machine type, for SS-5 it is 256MB and for others +2047MB. + +Since version 0.8.2, QEMU uses OpenBIOS https://www.openbios.org/. +OpenBIOS is a free (GPL v2) portable firmware implementation. The goal +is to implement a 100% IEEE 1275-1994 (referred to as Open Firmware) +compliant firmware. 
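+
+For example (the kernel and initrd file names below are only placeholders),
+a SPARCstation 5 guest with the maximum 256MB of RAM could be started
+with::
+
+    qemu-system-sparc -M SS-5 -m 256 -nographic \
+        -kernel vmlinux-sparc -initrd initrd.img -append "console=ttyS0"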
+ +A sample Linux 2.6 series kernel and ram disk image are available on the +QEMU web site. There are still issues with NetBSD and OpenBSD, but most +kernel versions work. Please note that currently older Solaris kernels +don't work probably due to interface issues between OpenBIOS and +Solaris. diff --git a/docs/system/target-sparc64.rst b/docs/system/target-sparc64.rst new file mode 100644 index 00000000..97e334b9 --- /dev/null +++ b/docs/system/target-sparc64.rst @@ -0,0 +1,37 @@ +.. _Sparc64-System-emulator: + +Sparc64 System emulator +----------------------- + +Use the executable ``qemu-system-sparc64`` to simulate a Sun4u +(UltraSPARC PC-like machine), Sun4v (T1 PC-like machine), or generic +Niagara (T1) machine. The Sun4u emulator is mostly complete, being able +to run Linux, NetBSD and OpenBSD in headless (-nographic) mode. The +Sun4v emulator is still a work in progress. + +The Niagara T1 emulator makes use of firmware and OS binaries supplied +in the S10image/ directory of the OpenSPARC T1 project +http://download.oracle.com/technetwork/systems/opensparc/OpenSPARCT1_Arch.1.5.tar.bz2 +and is able to boot the disk.s10hw2 Solaris image. + +:: + + qemu-system-sparc64 -M niagara -L /path-to/S10image/ \ + -nographic -m 256 \ + -drive if=pflash,readonly=on,file=/S10image/disk.s10hw2 + +QEMU emulates the following peripherals: + +- UltraSparc IIi APB PCI Bridge + +- PCI VGA compatible card with VESA Bochs Extensions + +- PS/2 mouse and keyboard + +- Non Volatile RAM M48T59 + +- PC-compatible serial ports + +- 2 PCI IDE interfaces with hard disk and CD-ROM support + +- Floppy disk diff --git a/docs/system/target-xtensa.rst b/docs/system/target-xtensa.rst new file mode 100644 index 00000000..8d703ad7 --- /dev/null +++ b/docs/system/target-xtensa.rst @@ -0,0 +1,27 @@ +.. _Xtensa-System-emulator: + +Xtensa System emulator +---------------------- + +Two executables cover simulation of both Xtensa endian options, +``qemu-system-xtensa`` and ``qemu-system-xtensaeb``. Two different +machine types are emulated: + +- Xtensa emulator pseudo board \"sim\" + +- Avnet LX60/LX110/LX200 board + +The sim pseudo board emulation provides an environment similar to one +provided by the proprietary Tensilica ISS. It supports: + +- A range of Xtensa CPUs, default is the DC232B + +- Console and filesystem access via semihosting calls + +The Avnet LX60/LX110/LX200 emulation supports: + +- A range of Xtensa CPUs, default is the DC232B + +- 16550 UART + +- OpenCores 10/100 Mbps Ethernet MAC diff --git a/docs/system/targets.rst b/docs/system/targets.rst new file mode 100644 index 00000000..224fadae --- /dev/null +++ b/docs/system/targets.rst @@ -0,0 +1,31 @@ +.. _system-targets-ref: + +QEMU System Emulator Targets +============================ + +QEMU is a generic emulator and it emulates many machines. Most of the +options are similar for all machines. Specific information about the +various targets are mentioned in the following sections. + +Contents: + +.. + This table of contents should be kept sorted alphabetically + by the title text of each file, which isn't the same ordering + as an alphabetical sort by filename. + +.. toctree:: + + target-arm + target-avr + target-m68k + target-mips + target-ppc + target-openrisc + target-riscv + target-rx + target-s390x + target-sparc + target-sparc64 + target-i386 + target-xtensa diff --git a/docs/system/tls.rst b/docs/system/tls.rst new file mode 100644 index 00000000..e284c828 --- /dev/null +++ b/docs/system/tls.rst @@ -0,0 +1,328 @@ +.. 
_network_005ftls: + +TLS setup for network services +------------------------------ + +Almost all network services in QEMU have the ability to use TLS for +session data encryption, along with x509 certificates for simple client +authentication. What follows is a description of how to generate +certificates suitable for usage with QEMU, and applies to the VNC +server, character devices with the TCP backend, NBD server and client, +and migration server and client. + +At a high level, QEMU requires certificates and private keys to be +provided in PEM format. Aside from the core fields, the certificates +should include various extension data sets, including v3 basic +constraints data, key purpose, key usage and subject alt name. + +The GnuTLS package includes a command called ``certtool`` which can be +used to easily generate certificates and keys in the required format +with expected data present. Alternatively a certificate management +service may be used. + +At a minimum it is necessary to setup a certificate authority, and issue +certificates to each server. If using x509 certificates for +authentication, then each client will also need to be issued a +certificate. + +Assuming that the QEMU network services will only ever be exposed to +clients on a private intranet, there is no need to use a commercial +certificate authority to create certificates. A self-signed CA is +sufficient, and in fact likely to be more secure since it removes the +ability of malicious 3rd parties to trick the CA into mis-issuing certs +for impersonating your services. The only likely exception where a +commercial CA might be desirable is if enabling the VNC websockets +server and exposing it directly to remote browser clients. In such a +case it might be useful to use a commercial CA to avoid needing to +install custom CA certs in the web browsers. + +The recommendation is for the server to keep its certificates in either +``/etc/pki/qemu`` or for unprivileged users in ``$HOME/.pki/qemu``. + +.. _tls_005fgenerate_005fca: + +Setup the Certificate Authority +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This step only needs to be performed once per organization / +organizational unit. First the CA needs a private key. This key must be +kept VERY secret and secure. If this key is compromised the entire trust +chain of the certificates issued with it is lost. + +:: + + # certtool --generate-privkey > ca-key.pem + +To generate a self-signed certificate requires one core piece of +information, the name of the organization. A template file ``ca.info`` +should be populated with the desired data to avoid having to deal with +interactive prompts from certtool:: + + # cat > ca.info <<EOF + cn = Name of your organization + ca + cert_signing_key + EOF + # certtool --generate-self-signed \ + --load-privkey ca-key.pem \ + --template ca.info \ + --outfile ca-cert.pem + +The ``ca`` keyword in the template sets the v3 basic constraints +extension to indicate this certificate is for a CA, while +``cert_signing_key`` sets the key usage extension to indicate this will +be used for signing other keys. The generated ``ca-cert.pem`` file +should be copied to all servers and clients wishing to utilize TLS +support in the VNC server. The ``ca-key.pem`` must not be +disclosed/copied anywhere except the host responsible for issuing +certificates. + +.. _tls_005fgenerate_005fserver: + +Issuing server certificates +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Each server (or host) needs to be issued with a key and certificate. 
+When connecting the certificate is sent to the client which validates it +against the CA certificate. The core pieces of information for a server +certificate are the hostnames and/or IP addresses that will be used by +clients when connecting. The hostname / IP address that the client +specifies when connecting will be validated against the hostname(s) and +IP address(es) recorded in the server certificate, and if no match is +found the client will close the connection. + +Thus it is recommended that the server certificate include both the +fully qualified and unqualified hostnames. If the server will have +permanently assigned IP address(es), and clients are likely to use them +when connecting, they may also be included in the certificate. Both IPv4 +and IPv6 addresses are supported. Historically certificates only +included 1 hostname in the ``CN`` field, however, usage of this field +for validation is now deprecated. Instead modern TLS clients will +validate against the Subject Alt Name extension data, which allows for +multiple entries. In the future usage of the ``CN`` field may be +discontinued entirely, so providing SAN extension data is strongly +recommended. + +On the host holding the CA, create template files containing the +information for each server, and use it to issue server certificates. + +:: + + # cat > server-hostNNN.info <<EOF + organization = Name of your organization + cn = hostNNN.foo.example.com + dns_name = hostNNN + dns_name = hostNNN.foo.example.com + ip_address = 10.0.1.87 + ip_address = 192.8.0.92 + ip_address = 2620:0:cafe::87 + ip_address = 2001:24::92 + tls_www_server + encryption_key + signing_key + EOF + # certtool --generate-privkey > server-hostNNN-key.pem + # certtool --generate-certificate \ + --load-ca-certificate ca-cert.pem \ + --load-ca-privkey ca-key.pem \ + --load-privkey server-hostNNN-key.pem \ + --template server-hostNNN.info \ + --outfile server-hostNNN-cert.pem + +The ``dns_name`` and ``ip_address`` fields in the template are setting +the subject alt name extension data. The ``tls_www_server`` keyword is +the key purpose extension to indicate this certificate is intended for +usage in a web server. Although QEMU network services are not in fact +HTTP servers (except for VNC websockets), setting this key purpose is +still recommended. The ``encryption_key`` and ``signing_key`` keyword is +the key usage extension to indicate this certificate is intended for +usage in the data session. + +The ``server-hostNNN-key.pem`` and ``server-hostNNN-cert.pem`` files +should now be securely copied to the server for which they were +generated, and renamed to ``server-key.pem`` and ``server-cert.pem`` +when added to the ``/etc/pki/qemu`` directory on the target host. The +``server-key.pem`` file is security sensitive and should be kept +protected with file mode 0600 to prevent disclosure. + +.. _tls_005fgenerate_005fclient: + +Issuing client certificates +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The QEMU x509 TLS credential setup defaults to enabling client +verification using certificates, providing a simple authentication +mechanism. If this default is used, each client also needs to be issued +a certificate. The client certificate contains enough metadata to +uniquely identify the client with the scope of the certificate +authority. The client certificate would typically include fields for +organization, state, city, building, etc. 
+ +Once again on the host holding the CA, create template files containing +the information for each client, and use it to issue client +certificates. + +:: + + # cat > client-hostNNN.info <<EOF + country = GB + state = London + locality = City Of London + organization = Name of your organization + cn = hostNNN.foo.example.com + tls_www_client + encryption_key + signing_key + EOF + # certtool --generate-privkey > client-hostNNN-key.pem + # certtool --generate-certificate \ + --load-ca-certificate ca-cert.pem \ + --load-ca-privkey ca-key.pem \ + --load-privkey client-hostNNN-key.pem \ + --template client-hostNNN.info \ + --outfile client-hostNNN-cert.pem + +The subject alt name extension data is not required for clients, so +the ``dns_name`` and ``ip_address`` fields are not included. The +``tls_www_client`` keyword is the key purpose extension to indicate this +certificate is intended for usage in a web client. Although QEMU network +clients are not in fact HTTP clients, setting this key purpose is still +recommended. The ``encryption_key`` and ``signing_key`` keyword is the +key usage extension to indicate this certificate is intended for usage +in the data session. + +The ``client-hostNNN-key.pem`` and ``client-hostNNN-cert.pem`` files +should now be securely copied to the client for which they were +generated, and renamed to ``client-key.pem`` and ``client-cert.pem`` +when added to the ``/etc/pki/qemu`` directory on the target host. The +``client-key.pem`` file is security sensitive and should be kept +protected with file mode 0600 to prevent disclosure. + +If a single host is going to be using TLS in both a client and server +role, it is possible to create a single certificate to cover both roles. +This would be quite common for the migration and NBD services, where a +QEMU process will be started by accepting a TLS protected incoming +migration, and later itself be migrated out to another host. To generate +a single certificate, simply include the template data from both the +client and server instructions in one. + +:: + + # cat > both-hostNNN.info <<EOF + country = GB + state = London + locality = City Of London + organization = Name of your organization + cn = hostNNN.foo.example.com + dns_name = hostNNN + dns_name = hostNNN.foo.example.com + ip_address = 10.0.1.87 + ip_address = 192.8.0.92 + ip_address = 2620:0:cafe::87 + ip_address = 2001:24::92 + tls_www_server + tls_www_client + encryption_key + signing_key + EOF + # certtool --generate-privkey > both-hostNNN-key.pem + # certtool --generate-certificate \ + --load-ca-certificate ca-cert.pem \ + --load-ca-privkey ca-key.pem \ + --load-privkey both-hostNNN-key.pem \ + --template both-hostNNN.info \ + --outfile both-hostNNN-cert.pem + +When copying the PEM files to the target host, save them twice, once as +``server-cert.pem`` and ``server-key.pem``, and again as +``client-cert.pem`` and ``client-key.pem``. + +.. _tls_005fcreds_005fsetup: + +TLS x509 credential configuration +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +QEMU has a standard mechanism for loading x509 credentials that will be +used for network services and clients. It requires specifying the +``tls-creds-x509`` class name to the ``--object`` command line argument +for the system emulators. Each set of credentials loaded should be given +a unique string identifier via the ``id`` parameter. A single set of TLS +credentials can be used for multiple network backends, so VNC, +migration, NBD, character devices can all share the same credentials. 
+Note, however, that credentials for use in a client endpoint must be +loaded separately from those used in a server endpoint. + +When specifying the object, the ``dir`` parameters specifies which +directory contains the credential files. This directory is expected to +contain files with the names mentioned previously, ``ca-cert.pem``, +``server-key.pem``, ``server-cert.pem``, ``client-key.pem`` and +``client-cert.pem`` as appropriate. It is also possible to include a set +of pre-generated Diffie-Hellman (DH) parameters in a file +``dh-params.pem``, which can be created using the +``certtool --generate-dh-params`` command. If omitted, QEMU will +dynamically generate DH parameters when loading the credentials. + +The ``endpoint`` parameter indicates whether the credentials will be +used for a network client or server, and determines which PEM files are +loaded. + +The ``verify`` parameter determines whether x509 certificate validation +should be performed. This defaults to enabled, meaning clients will +always validate the server hostname against the certificate subject alt +name fields and/or CN field. It also means that servers will request +that clients provide a certificate and validate them. Verification +should never be turned off for client endpoints, however, it may be +turned off for server endpoints if an alternative mechanism is used to +authenticate clients. For example, the VNC server can use SASL to +authenticate clients instead. + +To load server credentials with client certificate validation enabled + +.. parsed-literal:: + + |qemu_system| -object tls-creds-x509,id=tls0,dir=/etc/pki/qemu,endpoint=server + +while to load client credentials use + +.. parsed-literal:: + + |qemu_system| -object tls-creds-x509,id=tls0,dir=/etc/pki/qemu,endpoint=client + +Network services which support TLS will all have a ``tls-creds`` +parameter which expects the ID of the TLS credentials object. For +example with VNC: + +.. parsed-literal:: + + |qemu_system| -vnc 0.0.0.0:0,tls-creds=tls0 + +.. _tls_005fpsk: + +TLS Pre-Shared Keys (PSK) +~~~~~~~~~~~~~~~~~~~~~~~~~ + +Instead of using certificates, you may also use TLS Pre-Shared Keys +(TLS-PSK). This can be simpler to set up than certificates but is less +scalable. + +Use the GnuTLS ``psktool`` program to generate a ``keys.psk`` file +containing one or more usernames and random keys:: + + mkdir -m 0700 /tmp/keys + psktool -u rich -p /tmp/keys/keys.psk + +TLS-enabled servers such as ``qemu-nbd`` can use this directory like so:: + + qemu-nbd \ + -t -x / \ + --object tls-creds-psk,id=tls0,endpoint=server,dir=/tmp/keys \ + --tls-creds tls0 \ + image.qcow2 + +When connecting from a qemu-based client you must specify the directory +containing ``keys.psk`` and an optional username (defaults to "qemu"):: + + qemu-img info \ + --object tls-creds-psk,id=tls0,dir=/tmp/keys,username=rich,endpoint=client \ + --image-opts \ + file.driver=nbd,file.host=localhost,file.port=10809,file.tls-creds=tls0,file.export=/ diff --git a/docs/system/virtio-net-failover.rst b/docs/system/virtio-net-failover.rst new file mode 100644 index 00000000..6002dc5d --- /dev/null +++ b/docs/system/virtio-net-failover.rst @@ -0,0 +1,68 @@ +====================================== +QEMU virtio-net standby (net_failover) +====================================== + +This document explains the setup and usage of virtio-net standby feature which +is used to create a net_failover pair of devices. + +The general idea is that we have a pair of devices, a (vfio-)pci and a +virtio-net device. 
Before migration the vfio device is unplugged and data flows +through the virtio-net device, on the target side another vfio-pci device is +plugged in to take over the data-path. In the guest the net_failover kernel +module will pair net devices with the same MAC address. + +The two devices are called primary and standby device. The fast hardware based +networking device is called the primary device and the virtio-net device is the +standby device. + +Restrictions +------------ + +Currently only PCIe devices are allowed as primary devices, this restriction +can be lifted in the future with enhanced QEMU support. Also, only networking +devices are allowed as primary device. The user needs to ensure that primary +and standby devices are not plugged into the same PCIe slot. + +Usecase +------- + + Virtio-net standby allows easy migration while using a passed-through fast + networking device by falling back to a virtio-net device for the duration of + the migration. It is like a simple version of a bond, the difference is that it + requires no configuration in the guest. When a guest is live-migrated to + another host QEMU will unplug the primary device via the PCIe based hotplug + handler and traffic will go through the virtio-net device. On the target + system the primary device will be automatically plugged back and the + net_failover module registers it again as the primary device. + +Usage +----- + + The primary device can be hotplugged or be part of the startup configuration + + -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc, \ + bus=root2,failover=on + + With the parameter failover=on the VIRTIO_NET_F_STANDBY feature will be enabled. + + -device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,failover_pair_id=net1 + + failover_pair_id references the id of the virtio-net standby device. This + is only for pairing the devices within QEMU. The guest kernel module + net_failover will match devices with identical MAC addresses. + +Hotplug +------- + + Both primary and standby device can be hotplugged via the QEMU monitor. Note + that if the virtio-net device is plugged first a warning will be issued that it + couldn't find the primary device. + +Migration +--------- + + A new migration state wait-unplug was added for this feature. If failover primary + devices are present in the configuration, migration will go into this state. + It will wait until the device unplug is completed in the guest and then move into + active state. On the target system the primary devices will be automatically hotplugged + when the feature bit was negotiated for the virtio-net standby device. diff --git a/docs/system/vnc-security.rst b/docs/system/vnc-security.rst new file mode 100644 index 00000000..4c1769ee --- /dev/null +++ b/docs/system/vnc-security.rst @@ -0,0 +1,203 @@ +.. _VNC security: + +VNC security +------------ + +The VNC server capability provides access to the graphical console of +the guest VM across the network. This has a number of security +considerations depending on the deployment scenarios. + +.. _vnc_005fsec_005fnone: + +Without passwords +~~~~~~~~~~~~~~~~~ + +The simplest VNC server setup does not include any form of +authentication. For this setup it is recommended to restrict it to +listen on a UNIX domain socket only. For example + +.. parsed-literal:: + + |qemu_system| [...OPTIONS...] -vnc unix:/home/joebloggs/.qemu-myvm-vnc + +This ensures that only users on local box with read/write access to that +path can access the VNC server. 
To securely access the VNC server from a +remote machine, a combination of netcat+ssh can be used to provide a +secure tunnel. + +.. _vnc_005fsec_005fpassword: + +With passwords +~~~~~~~~~~~~~~ + +The VNC protocol has limited support for password based authentication. +Since the protocol limits passwords to 8 characters it should not be +considered to provide high security. The password can be fairly easily +brute-forced by a client making repeat connections. For this reason, a +VNC server using password authentication should be restricted to only +listen on the loopback interface or UNIX domain sockets. Password +authentication is not supported when operating in FIPS 140-2 compliance +mode as it requires the use of the DES cipher. Password authentication +is requested with the ``password`` option, and then once QEMU is running +the password is set with the monitor. Until the monitor is used to set +the password all clients will be rejected. + +.. parsed-literal:: + + |qemu_system| [...OPTIONS...] -vnc :1,password=on -monitor stdio + (qemu) change vnc password + Password: ******** + (qemu) + +.. _vnc_005fsec_005fcertificate: + +With x509 certificates +~~~~~~~~~~~~~~~~~~~~~~ + +The QEMU VNC server also implements the VeNCrypt extension allowing use +of TLS for encryption of the session, and x509 certificates for +authentication. The use of x509 certificates is strongly recommended, +because TLS on its own is susceptible to man-in-the-middle attacks. +Basic x509 certificate support provides a secure session, but no +authentication. This allows any client to connect, and provides an +encrypted session. + +.. parsed-literal:: + + |qemu_system| [...OPTIONS...] \ + -object tls-creds-x509,id=tls0,dir=/etc/pki/qemu,endpoint=server,verify-peer=off \ + -vnc :1,tls-creds=tls0 -monitor stdio + +In the above example ``/etc/pki/qemu`` should contain at least three +files, ``ca-cert.pem``, ``server-cert.pem`` and ``server-key.pem``. +Unprivileged users will want to use a private directory, for example +``$HOME/.pki/qemu``. NB the ``server-key.pem`` file should be protected +with file mode 0600 to only be readable by the user owning it. + +.. _vnc_005fsec_005fcertificate_005fverify: + +With x509 certificates and client verification +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Certificates can also provide a means to authenticate the client +connecting. The server will request that the client provide a +certificate, which it will then validate against the CA certificate. +This is a good choice if deploying in an environment with a private +internal certificate authority. It uses the same syntax as previously, +but with ``verify-peer`` set to ``on`` instead. + +.. parsed-literal:: + + |qemu_system| [...OPTIONS...] \ + -object tls-creds-x509,id=tls0,dir=/etc/pki/qemu,endpoint=server,verify-peer=on \ + -vnc :1,tls-creds=tls0 -monitor stdio + +.. _vnc_005fsec_005fcertificate_005fpw: + +With x509 certificates, client verification and passwords +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Finally, the previous method can be combined with VNC password +authentication to provide two layers of authentication for clients. + +.. parsed-literal:: + + |qemu_system| [...OPTIONS...] \ + -object tls-creds-x509,id=tls0,dir=/etc/pki/qemu,endpoint=server,verify-peer=on \ + -vnc :1,tls-creds=tls0,password=on -monitor stdio + (qemu) change vnc password + Password: ******** + (qemu) + +.. 
_vnc_005fsec_005fsasl: + +With SASL authentication +~~~~~~~~~~~~~~~~~~~~~~~~ + +The SASL authentication method is a VNC extension, that provides an +easily extendable, pluggable authentication method. This allows for +integration with a wide range of authentication mechanisms, such as PAM, +GSSAPI/Kerberos, LDAP, SQL databases, one-time keys and more. The +strength of the authentication depends on the exact mechanism +configured. If the chosen mechanism also provides a SSF layer, then it +will encrypt the datastream as well. + +Refer to the later docs on how to choose the exact SASL mechanism used +for authentication, but assuming use of one supporting SSF, then QEMU +can be launched with: + +.. parsed-literal:: + + |qemu_system| [...OPTIONS...] -vnc :1,sasl=on -monitor stdio + +.. _vnc_005fsec_005fcertificate_005fsasl: + +With x509 certificates and SASL authentication +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +If the desired SASL authentication mechanism does not supported SSF +layers, then it is strongly advised to run it in combination with TLS +and x509 certificates. This provides securely encrypted data stream, +avoiding risk of compromising of the security credentials. This can be +enabled, by combining the 'sasl' option with the aforementioned TLS + +x509 options: + +.. parsed-literal:: + + |qemu_system| [...OPTIONS...] \ + -object tls-creds-x509,id=tls0,dir=/etc/pki/qemu,endpoint=server,verify-peer=on \ + -vnc :1,tls-creds=tls0,sasl=on -monitor stdio + +.. _vnc_005fsetup_005fsasl: + +Configuring SASL mechanisms +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The following documentation assumes use of the Cyrus SASL implementation +on a Linux host, but the principles should apply to any other SASL +implementation or host. When SASL is enabled, the mechanism +configuration will be loaded from system default SASL service config +/etc/sasl2/qemu.conf. If running QEMU as an unprivileged user, an +environment variable SASL_CONF_PATH can be used to make it search +alternate locations for the service config file. + +If the TLS option is enabled for VNC, then it will provide session +encryption, otherwise the SASL mechanism will have to provide +encryption. In the latter case the list of possible plugins that can be +used is drastically reduced. In fact only the GSSAPI SASL mechanism +provides an acceptable level of security by modern standards. Previous +versions of QEMU referred to the DIGEST-MD5 mechanism, however, it has +multiple serious flaws described in detail in RFC 6331 and thus should +never be used any more. The SCRAM-SHA-256 mechanism provides a simple +username/password auth facility similar to DIGEST-MD5, but does not +support session encryption, so can only be used in combination with TLS. + +When not using TLS the recommended configuration is + +:: + + mech_list: gssapi + keytab: /etc/qemu/krb5.tab + +This says to use the 'GSSAPI' mechanism with the Kerberos v5 protocol, +with the server principal stored in /etc/qemu/krb5.tab. For this to work +the administrator of your KDC must generate a Kerberos principal for the +server, with a name of 'qemu/somehost.example.com@EXAMPLE.COM' replacing +'somehost.example.com' with the fully qualified host name of the machine +running QEMU, and 'EXAMPLE.COM' with the Kerberos Realm. + +When using TLS, if username+password authentication is desired, then a +reasonable configuration is + +:: + + mech_list: scram-sha-256 + sasldb_path: /etc/qemu/passwd.db + +The ``saslpasswd2`` program can be used to populate the ``passwd.db`` +file with accounts. 
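+
+For example, an entry for a (purely illustrative) user ``rich`` could be
+added like this, with the password being prompted for interactively::
+
+    saslpasswd2 -f /etc/qemu/passwd.db rich
+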
Note that the ``passwd.db`` file stores passwords
+in clear text.
+
+Other SASL configurations will be left as an exercise for the reader.
+Note that all mechanisms, except GSSAPI, should be combined with use of
+TLS to ensure a secure data channel.