Both Windows guest and Linux guest use the MSR MSR_IA32_CSTAR, while
Linux uses it rarely. Now vcpu context switch doesn't save/restore it.
Windows detects the change of the MSR and rises a exception.
Do the save/resotre MSR_IA32_CSTAR during context switch.
Tracked-On: #5899
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
This reverts commit ce9d5e8779.
Before readiness of better solution to fix SOS VM e820 and efi memmap mismatch
issue, revert this patch.
Tracked-On: #5626
Signed-off-by: Victor Sun <victor.sun@intel.com>
RTVM is enforced to use 4KB pages to mitigate CVE-2018-12207 and performance jitter,
which may be introduced by splitting large page into 4KB pages on demand. It works
fine in previous hardware platform where the size of address space for the RTVM is
relatively small. However, this is a problem when the platforms support 64 bits
high MMIO space, which could be super large and therefore consumes large # of
EPT page table pages.
This patch optimize it by using large page for purely data pages, such as MMIO spaces,
even for the RTVM.
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Tracked-On: #5788
Add free_page to free page when unmap pagetable.
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Tracked-On: #5788
For FuSa's case, we remove all dynamic memory allocation use in ACRN HV. Instead,
we use static memory allocation or embedded data structure. For pagetable page,
we prefer to use an index (hva for MMU, gpa for EPT) to get a page from a special
page pool. The special page pool should be big enougn for each possible index.
This is not a big problem when we don't support 64 bits MMIO. Without 64 bits MMIO
support, we could use the index to search addrss not larger than DRAM_SIZE + 4G.
However, if ACRN plan to support 64 bits MMIO in SOS, we could not use the static
memory alocation any more. This is because there's a very huge hole between the
top DRAM address and the bottom 64 bits MMIO address. We could not reserve such
many pages for pagetable mapping as the CPU physical address bits may very large.
This patch will use dynamic page allocation for pagetable mapping. We also need
reserve a big enough page pool at first. For HV MMU, we don't use 4K granularity
page table mapping, we need reserve PML4, PDPT and PD pages according the maximum
physical address space (PPT va and pa are identical mapping); For each VM EPT,
we reserve PML4, PDPT and PD pages according to the maximum physical address space
too, (the EPT address sapce can't beyond the physical address space), and we reserve
PT pages by real use cases of DRAM, low MMIO and high MMIO.
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Tracked-On: #5788
memory_ops structure will be changed to store page table related fields.
However, secure world memory base address is not one of them, it's VM
related. So save sworld_memory_base_hva in vm_arch structure directly.
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Tracked-On: #5788
Per-core software SRAM L2 cache may be flushed by 'mwait'
extension instruction, which guest VM may execute to enter
core deep sleep. Such kind of flushing is not expected when
software SRAM is enabled for RTVM.
Hypervisor disables MONITOR-WAIT support on both hypervisor
and VMs sides to protect above software SRAM from being flushed.
This patch disable ACRN guest MONITOR-WAIT support if software
SRAM is configured.
Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Per-core software SRAM L2 cache may be flushed by 'mwait'
extension instruction, which guest VM may execute to enter
core deep sleep. Such kind of flushing is not expected when
software SRAM is enabled for RTVM.
Hypervisor disables MONITOR-WAIT support on both hypervisor
and VMs sides to protect above software SRAM from being flushed.
This patch disable hypervisor(host) MONITOR-WAIT support.
Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Below boolean function are defined in this patch:
- is_software_sram_enabled() to check if SW SRAM
feature is enabled or not.
- set global variable 'is_sw_sram_initialized'
to file static.
Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Accessing to software SRAM region is not allowed when
software SRAM is pass-thru to prelaunch RTVM.
This patch removes software SRAM region from service VM
EPT if it is enabled for prelaunch RTVM.
Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Provide EFI support for SOS could cause weird issues. For example, hypervisor
works based on E820 table whereras it's possible that the memory map from EFI
table is not aligned with E820 table. The SOS kernel kaslr will try to find the
random address for extracted kernel image in EFI table first. So it's possible
that none-RAM in E820 is picked for extracted kernel image. This will make
kernel boot fail.
This patch removes EFI support for SOS by not passing struct boot_efi_info to
SOS kernel zeropage, and reserve a memory to store RSDP table for SOS and pass
the RSDP address to SOS kernel zeropage for SOS to locate ACPI table.
The patch requires SOS kernel version be high than 4.20, otherwise the kernel
might fail to find the RSDP.
Tracked-On: #5626
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
This patch denies Service VM the access permission to device resources
owned by hypervisor.
HV may own these devices: (1) debug uart pci device for debug version
(2) type 1 pci device if have pre-launched VMs.
Current implementation exposes the mmio/pio resource of HV owned devices
to SOS, should remove them from SOS.
Tracked-On: #5615
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
This patch denies Service VM the access permission to device
resources owned by pre-launched VMs.
Rationale:
* Pre-launched VMs in ACRN are independent of service VM,
and should be immune to attacks from service VM. However,
current implementation exposes the bar resource of passthru
devices to service VM for some reason. This makes it possible
for service VM to crash or attack pre-launched VMs.
* It is same for hypervisor owned devices.
NOTE:
* The MMIO spaces pre-allocated to VFs are still presented to
Service VM. The SR-IOV capable devices assigned to pre-launched
VMs doesn't have the SR-IOV capability. So the MMIO address spaces
pre-allocated by BIOS for VFs are not decoded by hardware and
couldn't be enabled by guest. SOS may live with seeing the address
space or not. We will revisit later.
Tracked-On: #5615
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The logical processor scoped IWKey can be copied to or from a
platform-scope storage copy called IWKeyBackup. Copying IWKey to
IWKeyBackup is called ‘backing up IWKey’ and copying from IWKeyBackup to
IWKey is called ‘restoring IWKey’.
IWKeyBackup and the path between it and IWKey are protected against
software and simple hardware attacks. This means that IWKeyBackup can be
used to distribute an IWKey within the logical processors in a platform
in a protected manner.
Linux keylocker implementation uses this feature, so they are
introduced by this patch.
Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Different vCPU may have different IWKeys. Hypervisor need do the iwkey
context switch.
This patch introduce a load_iwkey() function to do that. Switches the
host iwkey when the switch_in vCPU satisfies:
1) keylocker feature enabled
2) Different from the current loaded one.
Two opportunities to do the load_iwkey():
1) Guest enables CR4.KL bit.
2) vCPU thread context switch.
load_iwkey() costs ~600 cycles when do the load IWKey action.
Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
KeyLocker is a new security feature available in new Intel CPUs that
protects data-encryption keys for the Advanced Encryption Standard (AES)
algorithm. These keys are more valuable than what they guard. If stolen
once, the key can be repeatedly used even on another system and even
after vulnerability closed.
It also introduces a CPU-internal wrapping key (IWKey), which is a key-
encryption key to wrap AES keys into handles. While the IWKey is
inaccessible to software, randomizing the value during the boot-time
helps its value unpredictable.
Keylocker usage:
- New “ENCODEKEY” instructions take original key input and returns HANDLE
crypted by an internal wrap key (IWKey, init by “LOADIWKEY” instruction)
- Software can then delete the original key from memory
- Early in boot/software, less likely to have vulnerability that allows
stealing original key
- Later encrypt/decrypt can use the HANDLE through new AES KeyLocker
instructions
- Note:
* Software can use original key without knowing it (use HANDLE)
* HANDLE cannot be used on other systems or after warm/cold reset
* IWKey cannot be read from CPU after it's loaded (this is the
nature of this feature) and only 1 copy of IWKey inside CPU.
The virtualization implementation of Key Locker on ACRN is:
- Each vCPU has a 'struct iwkey' to store its IWKey in struct
acrn_vcpu_arch.
- At initilization, every vCPU is created with a random IWKey.
- Hypervisor traps the execution of LOADIWKEY (by 'LOADIWKEY exiting'
VM-exectuion control) of vCPU to capture and save the IWKey if guest
set a new IWKey. Don't support randomization (emulate CPUID to
disable) of the LOADIWKEY as hypervisor cannot capture and save the
random IWKey. From keylocker spec:
"Note that a VMM may wish to enumerate no support for HW random IWKeys
to the guest (i.e. enumerate CPUID.19H:ECX[1] as 0) as such IWKeys
cannot be easily context switched. A guest ENCODEKEY will return the
type of IWKey used (IWKey.KeySource) and thus will notice if a VMM
virtualized a HW random IWKey with a SW specified IWKey."
- In context_switch_in() of each vCPU, hypervisor loads that vCPU's
IWKey into pCPU by LOADIWKEY instruction.
- There is an assumption that ACRN hypervisor will never use the
KeyLocker feature itself.
This patch implements the vCPU's IWKey management and the next patch
implements host context save/restore IWKey logic.
Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In order for a VMM to capture the IWKey values of guests, processors
that support Key Locker also support a new "LOADIWKEY exiting"
VM-execution control in bit 0 of the tertiary processor-based
VM-execution controls.
This patch enables the tertiary VM-execution controls.
Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
KeyLocker is a new security feature available in new Intel CPUs that
protects data-encryption keys for the Advanced Encryption Standard (AES)
algorithm.
This patch emulates Keylocker CPUID leaf 19H to support Keylocker
feature for guest VM.
To make the hypervisor being able to manage the IWKey correctly, this
patch doesn't expose hardware random IWKey capability
(CPUID.0x19.ECX[1]) to guest VM.
Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
Bit19 (CR4_KL) of CR4 is CPU KeyLocker feature enable bit. Hypervisor
traps the bit's writing to track the keylocker feature on/off of guest.
While the bit is set by guest,
- set cr4_kl_enabled to indicate the vcpu's keylocker feature enabled status
- load vcpu's IWKey in host (will add in later patch)
While the bit is clear by guest,
- clear cr4_kl_enabled
This patch trap and passthru the CR4_KL bit to guest for operation.
Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Current implementation, SOS may allocate the memory region belonging to
hypervisor/pre-launched VM to a post-launched VM. Because it only verifies
the start address rather than the entire memory region.
This patch verifies the validity of the entire memory region before
allocating to a post-launched VM so that the specified memory can only
be allocated to a post-launched VM if the entire memory region is mapped
in SOS’s EPT.
Tracked-On: #5555
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Yonghua Huang <yonghua.huang@intel.com>
Currently, we hardcode the GPA base of Software SRAM
to an address that is derived from TGL platform,
as this GPA is identical with HPA for Pre-launch VM,
This hardcoded address may not work on other platforms
if the HPA bases of Software SRAM are different.
Now, Offline tool configures above GPA based on the
detection of Software SRAM on specific platform.
This patch removes the hardcoding GPA of Software SRAM,
and also renames MACRO 'SOFTWARE_SRAM_BASE_GPA' to
'PRE_RTVM_SW_SRAM_BASE_GPA' to avoid confusing, as it
is for Prelaunch VM only.
Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Physical address to SW SRAM region maybe different
on different platforms, this hardcoded address may
result in address mismatch for SW SRAM operations.
This patch removes above hardcoded address and uses
the physical address parsed from native RTCT.
Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
'ptcm' and 'ptct' are legacy name according
to the latest TCC spec, hence rename below files
to avoid confusing:
ptcm.c -> rtcm.c
ptcm.h -> rtcm.h
ptct.h -> rtct.h
Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Split off definition of "struct efi_info" into a separate header
file lib/efi.h.
Tracked-On: #5661
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Move multiboot specific declarations from boot.h to multiboot.h.
Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The commit 'Fix: HV: VM OS failed to assign new address to pci-vuart
BARs' need more reshuffle.
Tracked-On: #5491
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Signed-off-by: Eddie Dong <eddie.dong@intel.com>
This patch move the split-lock logic into dedicated file
to reduce LOC. This may make the logic more clear.
Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
This patch adds a cache register for VMX_PROC_VM_EXEC_CONTROLS
to avoid the frequent VMCS access.
Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
When wrong BAR address is set for pci-vuart, OS may assign a
new BAR address to it. Pci-vuart BAR can't be reprogrammed,
for its wrong fixed value. That can may because pci_vbar.fixed and
pci_vbar.type has overlap in abstraction, pci_vbar.fixed
has a confusing name, pci_vbar.type has PCIBAR_MEM64HI which is not
really a type of pci BARs.
So replace pci_vbar.type with pci_vbar.is_mem64hi, and change
pci_vbar.fixed to an union type with new name pci_vbar.bar_type.
Tracked-On: #5491
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
xchg may also cause the #AC for split-lock check.
This patch adds this emulation.
1. Kick other vcpus of the guest to stop execution
if the guest has more than one vcpu.
2. Emulate the xchg instruction.
3. Notify other vcpus (if any) to restart execution.
Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This patch adds the split-lock emulation.
If a #AC is caused by instruction with LOCK prefix then
emulate it, otherwise, inject it back as it used to be.
1. Kick other vcpus of the guest to stop execution
and set the TF flag to have #DB if the guest has more
than one vcpu.
2. Skip over the LOCK prefix and resume the current
vcpu back to guest for execution.
3. Notify other vcpus to restart exception at the end
of handling the #DB since we have completed
the LOCK prefix instruction emulation.
Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Check hardware support for all features in CR4,
and hide bits from guest by vcpuid if they're not supported
for guests OS.
Tracked-On: #5586
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
- The current code to virtualize CR0/CR4 is not
well designed, and hard to read.
This patch reshuffle the logic to make it clear
and classify those bits into PASSTHRU,
TRAP_AND_PASSTHRU, TRAP_AND_EMULATE & reserved bits.
Tracked-On: #5586
Signed-off-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
It is possible for more than one vCPUs to trigger shutdown on an RTVM.
We need to avoid entering VM_READY_TO_POWEROFF state again after the
RTVM has been paused or shut down.
Also, make sure an RTVM enters VM_READY_TO_POWEROFF state before it can
be paused.
v1 -> v2:
- rename to poweroff_if_rt_vm for better clarity
Tracked-On: #5411
Signed-off-by: Peter Fang <peter.fang@intel.com>
Currently, ACRN only support shutdown when triple fault happens, because ACRN
doesn't present/emulate a virtual HW, i.e. port IO, to support shutdown. This
patch emulate a virtual shutdown component, and the vACPI method for guest OS
to use.
Pre-launched VM uses ACPI reduced HW mode, intercept the virtual sleep control/status
registers for pre-launched VMs shutdown
Tracked-On: #5411
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
More than one VM may request shutdown on the same pCPU before
shutdown_vm_from_idle() is called in the idle thread when pCPUs are
shared among VMs.
Use a per-pCPU bitmap to store all the VMIDs requesting shutdown.
v1 -> v2:
- use vm_lock to avoid a race on shutdown
Tracked-On: #5411
Signed-off-by: Peter Fang <peter.fang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Rename hv_access_memory_region_update to ppt_clear_user_bit to
verb + object style.
Tracked-On: #5330
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Temporarily remove NX bit of PTCM binary in pagetable during pSRAM
initialization:
1.added a function ppt_set_nx_bit to temporarily remove/restore the NX bit of
a given area in pagetable.
2.Temporarily remove NX bit of PTCM binary during pSRAM initialization to make
PTCM codes executable.
3. TODO: We may use SMP call to flush TLB and do pSRAM initilization on APs.
Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The added parse_ptct function will parse native ACPI PTCT table to
acquire information like pSRAM location/size/level and PTCM location,
and save them.
Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
1.We added a function init_psram to initialize pSRAM as well as some definitions.
Both AP and BSP shall call init_psram to make sure pSRAM is initialized, which is
required by PTCM.
BSP:
To parse PTCT and find the entry of PTCM command function, then call PTCM ABI.
AP:
Wait until BSP has done the parsing work, then call the PTCM ABI.
Synchronization of AP and BSP is ensured, both inside and outside PTCM.
2. Added calls of init_psram in init_pcpu_post to initialize pSRAM in HV booting phase
Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Add cteate method for vmcs9900 vdev in hypercalls.
The destroy method of ivshmem is also suitable for other emulated vdev,
move it into hcall_destroy_vdev() for all emulated vdevs
Tracked-On: #5394
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
if vuart type is pci-vuart, then use MSI interrupt
split vuart_toggle_intr() control flow into vuart_trigger_level_intr() &
trigger_vmcs9900_msix(), because MSI is edge triggered, no deassertion
operation. Only trigger MSI for pci-vuart when assert interrupt.
Tracked-On: #5394
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
support pci-vuart type, and refine:
1.Rename init_vuart() to init_legacy_vuarts(), only init PIO type.
2.Rename deinit_vuart() to deinit_legacy_vuarts(), only deinit PIO type.
3.Move io handler code out of setup_vuart(), into init_legacy_vuarts()
4.add init_pci_vuart(), deinit_pci_vuart, for one pci vuart vdev.
and some change from requirement:
1.Increase MAX_VUART_NUM_PER_VM to 8.
Tracked-On: #5394
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
The vuart_read()/vuart_write() are coupled with PIO vuart type. Move
the non-type related code into vuart_read_reg()/vuart_write_reg(), so
that we can re-use them to handle MMIO request of pci-vuart type.
Tracked-On: #5394
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
- Refactor pci_dev_c.py to insert devices information per VMs
- Add function to get unused vbdf form bus:dev.func 00:00.0 to 00:1F.7
Add pci devices variables to vm_configurations.c
- To pass the pci vuart information form tool, add pci_dev_num and
pci_devs initialization by tool
- Change CONFIG_SOS_VM in hypervisor/include/arch/x86/vm_config.h to
compromise vm_configurations.c
Tracked-On: #5426
Signed-off-by: Yang, Yu-chu <yu-chu.yang@intel.com>
The new (1.8.17) release of doxygen is complaining about errors in the
doxygen comments that were's reported by our current 1.8.13 release.
Let's fix these now. In a separate PR we'll also update some
configuration settings that will be obsolete, in preparation for moving
to this newer version.
[External_System_ID]ACRN-6774
Tracked-On: #5385
Signed-off-by: David B. Kinder <david.b.kinder@intel.com>
- Since de-privilege boot is removed, we no longer need to save boot
context in boot time.
- cpu_primary_start_64 is not an entry for ACRN hypervisor any more,
and can be removed.
Tracked-On: #5197
Signed-off-by: Zide Chen <zide.chen@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>