When hypervisor boots, the multiboot modules have been loaded to host space
by bootloader already. The space range of pre-launched VM modules is also
exposed to SOS VM, so SOS VM kernel might pick this range to extract kernel
when KASLR enabled. This would corrupt pre-launched VM modules and result in
pre-launched VM boot fail.
This patch will try to fix this issue. The SOS VM will not be loaded to guest
space until all pre-launched VMs are loaded successfully.
Tracked-On: #5879
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
We should not hardcode the VM ramdisk load address right after kernel
load address because of two reasons:
1. Per Linux kernel boot protocol, the Kernel need a size of
contiguous memory(i.e. init_size field in zeropage) from
its load address to boot, then the address would overlap
with ramdisk;
2. The hardcoded address could not be ensured as a valid address
in guest e820 table, especially with a huge ramdisk;
Also we should not hardcode the VM kernel load address to its pref_address
which work for non-relocatable kernel only. For a relocatable kernel,
it could run from any valid address where bootloader load to.
The patch will set the VM kernel and ramdisk load address by scanning
guest e820 table with find_space_from_ve820() api:
1. For SOS VM, the ramdisk has been loaded by multiboot bootloader
already so set the load address as module source address,
the relocatable kernel would be relocated to a appropriate address
out space of hypervisor and boot modules to avoid guest memory
copy corruption;
2. For pre-launched VM, the kernel would be loaded to pref_address
first, then ramdisk will be put to a appropriate address out space
of kernel according to guest memory layout and maximum ramdisk
address limit under 4GB;
Tracked-On: #5879
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The SOS VM should not use host efi memmap directly, since there are some
memory ranges which reserved by hypersior and pre-launched VM should not
be exposed to SOS VM. These memory ranges should be filtered from SOS VM
efi memmap, otherwise it would caused unexpected issues. For example, The
SOS kernel kaslr will try to find the random address for extracted kernel
image in EFI table first. So it's possible that these reserved memory is
picked for extracted kernel image. This will make SOS kernel boot fail.
The patch would create efi memmory map for SOS VM and pass the memory map
info to zeropage for loading SOS VM kernel. The boot service related region
in host efi memmap is also kept for SOS VM so that SOS VM could have full
capability of EFI services as host.
Tracked-On: #5626
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The bootargs module represents a string buffer and there is a NULL char at
the end so its size should not be calculated by strnlen_s(), otherwise the
NULL char will be ignored in gpa copy and result in kernel boot fail;
Tracked-On: #6162
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Previously the load GPA of LaaG boot params like zeropage/cmdline and
initgdt are all hard-coded, this would bring potential LaaG boot issues.
The patch will try to fix this issue by finding a 32KB load_params memory
block for LaaG to store these guest boot params.
For other guest with raw image, in general only vgdt need to be cared of so
the load_params will be put at 0x800 since it is a common place that most
guests won't touch for entering protected mode.
Tracked-On: #5626
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The API would search ve820 table and return a valid GPA when the requested
size of memory is available in the specified memory range, or return
INVALID_GPA if the requested memory slot is not available;
Tracked-On: #5626
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The memory range of [0xA0000, 0xFFFFF] is a known reserved area for BIOS,
actually Linux kernel would enforce this area to be reserved during its
boot stage. Set this area to usable would cause potential compatibility
issues.
The patch set the range to reserved type to make it consistent with the
real world.
BTW, There should be a EBDA(Entended BIOS DATA Area) with reserved type
exist right before 0xA0000 in real world for non-EFI boot. But given ACRN
has no legacy BIOS emulation, we simply skipped the EBDA in vE820.
Tracked-On: #5626
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Hypervisor use e820_alloc_memory() api to allocate memory for trampoline code
and ept pages, whereas the usable ram in hv_e820 might include efi boot service
region if system boot from uefi environment, this would result in some uefi
service broken in SOS. These boot service region should be filtered from
hv_e820.
This patch will parse the efi memory descriptor entries info from efi memory
map pointer when system boot from uefi environment, and then initialize hv_e820
accordingly, that all efi boot service region would be kept as reserved in
hv_e820.
Please note the original efi memory map could be above 4GB address space,
so the efi memory parsing process must be done after enable_paging().
Tracked-On: #5626
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
When hypervisor boot from efi environment, the efi memory layout should be
considered as main memory map reference for hypervisor use. This patch add
function that parses the efi memory descriptor entries info from efi memory
map pointer and stores the info into a static hv_memdesc[] array.
Tracked-On: #5626
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
With this patch, the hv_e820 will be initialized after enable paging. This
is because the hv_e820 will be initialized from efi mmap when system boot
from uefi, which the efi mmap could be above 4G space.
Tracked-On: #5626
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The simply rename mi_acpi_rsdp_va in acrn_boot_info struct to acpi_rsdp_va;
Tracked-On: #5661
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
This patch has below changes:
1. rename mi_efi_info to uefi_info in struct acrn_boot_info;
2. remove redundant "efi_" prefix for efi_info struct members;
3. The efi_info structure in acrn_boot_info struct is defined as
same as Linux kernel so the native efi info from boot loader
is passed to SOS zeropage with memcpy() api directly. Now replace
memcpy() with detailed struct member assignment;
4. add boot_from_uefi() api;
Tracked-On: #5661
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Use more generic abi_mmap struct to replace multiboot_mmap struct in
acrn_boot_info;
Tracked-On: #5661
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Use more generic abi_module struct to replace multiboot_module struct in
acrn_boot_info;
Tracked-On: #5661
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The patch has below changes:
1. rename mi_loader_name in acrn_boot_info struct to loader_name;
2. change loader_name type from pointer to array to avoid accessing
original multiboot info region;
3. remove mi_drivers_length and mi_drivers_addr which are never used;
Tracked-On: #5661
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The name of mi_cmdline in acrn_boot_info structure would cause confusion with
mi_cmdline in multiboot_info structure, rename it to cmdline. At the same time,
the data type is changed from pointer to array to avoid accessing the original
multiboot info region which might be used by other software modules.
Tracked-On: #5661
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Add a wrapper API init_acrn_boot_info() so that it could be used to boot
ACRN with any boot protocol;
Another change is change term of multiboot1 to multiboot because there is
no such term officially;
Tracked-On: #5661
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Given the structure in multiboot.h could be used for any boot protocol,
use a more generic name "boot.h" instead;
Tracked-On: #5661
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The mi_flags is not needed any more so remove it from acrn_boot_info struct;
Tracked-On: #5661
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The acrn_multiboot_info structure stores acrn specific boot info and should
not be limited to support multiboot protocol related structure only.
This patch only do below changes:
1. change name of acrn_multiboot_info to acrn_boot_info;
2. change name of mbi to abi because of the change in 1, also the
naming might bring confusion with native multiboot info;
Tracked-On: #5661
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
ACRN used to support deprivileged boot mode which do not need multiboot
modules, while direct boot mode need multiboot modules at lease for
service VM bzImage, so ACRN postponed the multiboot modules sanity check
in init_vm_boot_info.
Now deprivileged boot mode was totally removed, so we can do multiboot
module check in sanitize_acrn_multiboot_info().
Tracked-On: #5661
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Replace rdstc() and get_tsc_khz() with their architectural agnostic
counterparts cpu_ticks() and cpu_tickrate().
Tracked-On: #5920
Signed-off-by: Yi Liang <yi.liang@intel.com>
e820_alloc_memory() splits one E820 entry into two entries. With vEPT
enabled, e820_alloc_memory() is called one more. On some platforms, the
e820 entries might exceed 32.
Enlarge E820_MAX_ENTRIES to 64. Please note, it must be less than 128
due to constrain of zeropage. Linux kernel defines it as 128.
Tracked-On: #6168
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
vmptrld_vmexit_handler() has a same code snippet with
vmclear_vmexit_handler(). Wrap the same code snippet as a static
function clear_vmcs02().
There is only a small logic change that add
nested->current_vmcs12_ptr = INVALID_GPA
in vmptrld_vmexit_handler() for the old VMCS. That's reasonable.
Tracked-On: #5923
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
get_ept_entry() actually returns the EPTP of a VM. So rename it to
get_eptp() for readability.
Tracked-On: #5923
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
We need to deny accesses from SOS to the HV owned UART device, otherwise
SOS could have direct access to this physical device and mess up the HV
console.
If ACRN debug UART is configured as PIO based, For example,
CONFIG_SERIAL_PIO_BASE is generated from acrn-config tool, or the UART
config is overwritten by hypervisor parameter "uart=port@<port address>",
it could run into problem if ACRN doesn't emulate this UART PIO port
to SOS. For example:
- none of the ACRN emulated vUART devices has same PIO port with the
port of the debug UART device.
- ACRN emulates PCI vUART for SOS (configure "console_vuart" with
PCI_VUART in the scenario configuration)
This patch fixes the above issue by masking PIO accesses from SOS.
deny_hv_owned_devices() is moved after setup_io_bitmap() where
vm->arch_vm.io_bitmap is initialized.
Commit 50d852561 ("HV: deny HV owned PCI bar access from SOS") handles
the case that ACRN debug UART is configured as a PCI device. e.g.,
hypervisor parameter "uart=bdf@<BDF value>" is appended.
If the hypervisor debug UART is MMIO based, need to configured it as
a PCI type device, so that it can be hidden from SOS.
Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Malicious input 'index' may trigger buffer
overflow on array 'irte_alloc_bitmap[]'.
This patch validate that 'index' shall be
less than 'CONFIG_MAX_IR_ENTRIES' and also
remove unnecessary check on 'index' in
'ptirq_free_irte()' function with this fix.
Tracked-On: #6132
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
vlapic_write handle 'offset' that is valid and ignore
all other invalid 'offset'. so ASSERT on this 'offset'
input is unnecessary.
This patch removes above ASSERT to avoid potential
hypervisor crash by guest malicious input when debug
build is used.
Tracked-On: #6131
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
For a pci BAR, its size aligned bits have fixed to 0(except the memory
type bits, they have another fixed value), they are read-only.
When write ~0U to BAR for sizing, (type_bits | size_mask) is written
into BAR.
So do not need to distinguish between sizing vBAR and programming vBAR.
When write a value to vBAR, always store (value & size_mask | type_bit)
to vfcg.
pci_vdev_read_vbar() is unnecessary, because it is only need to read
vcfg.
Tracked-On: #6011
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Li Fei <fei1.li@intel.com>
When guest doing BAR re-programming, we should check whether
the base address of the BAR is valid.This patch does this check by:
1. whether the gpa is located in the responding MMIO window
2. whether the gpa is aligned with the BAR size
Tracked-On: #6011
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Li Fei <fei1.li@intel.com>
Now we use pci_vdev_update_vbar_base to update vBAR base address when
guest re-programming BAR. For a IO BAR, we would calculate the 32 bits
base address then mask the high 16 bits. However, the mask code would
never be called since the first if condition statement is always true.
This patch fix it by move the unamsk code into the first if condition
statement.
Tracked-On: #6011
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Li Fei <fei1.li@intel.com>
generate_shadow_ept_entry() didn't verify the correctness of the requested
guest EPT mapping. That might leak host memory access to L2 VM.
To simplify the implementation of the guest EPT audit, hide capabilities
'map 2-Mbyte page' and 'map 1-Gbyte page' from L1 VM. In addition,
minimize the attribute bits of EPT entry when create a shadow EPT entry.
Also, for invalid requested mapping address, reflect the EPT_VIOLATION to
L1 VM.
Here, we have some TODOs:
1) Enable large page support in generate_shadow_ept_entry()
2) Evaluate if need to emulate the invalid GPA access of L2 in HV directly.
3) Minimize EPT entry attributes.
Tracked-On: #5923
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
L1 VM changes the guest EPT and do INVEPT to invalidate the previous
TLB cache of EPT entries. The shadow EPT replies on INVEPT instruction
to do the update.
The target shadow EPTs can be found according to the 'type' of INVEPT.
Here are two types and their target shadow EPT,
1) Single-context invalidation
Get the EPTP from the INVEPT descriptor. Then find the target
shadow EPT.
2) Global invalidation
All shadow EPTs of the L1 VM.
The INVEPT emulation handler invalidate all the EPT entries of the
target shadow EPTs.
Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
When a shadow EPT is not used anymore, its resources need to be
released.
free_sept_table() is introduced to walk the whole shadow EPT table and
free the pagetable pages.
Please note, the PML4E page of shadow EPT is not freed by
free_sept_table() as it still be used to present a shadow EPT pointer.
Tracked-On: #5923
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
With shadow EPT, the hypervisor walks through guest EPT table:
* If the entry is not present in guest EPT, ACRN injects EPT_VIOLATION
to L1 VM and resumes to L1 VM.
* If the entry is present in guest EPT, do the EPT_MISCONFIG check.
Inject EPT_MISCONFIG to L1 VM if the check failed.
* If the entry is present in guest EPT, do permission check.
Reflect EPT_VIOLATION to L1 VM if the check failed.
* If the entry is present in guest EPT but shadow EPT entry is not
present, create the shadow entry and resumes to L2 VM.
* If the entry is present in guest EPT but the GPA in the entry is
invalid, injects EPT_VIOLATION to L1 VM and resumes L1 VM.
Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
* Hide 5 level EPT capability, let L1 guest stick to 4 level EPT.
* Access/Dirty bits are not support currently, hide corresponding EPT
capability bits.
* "Mode-based execute control for EPT" is also not support well
currently, hide its capability bit from MSR_IA32_VMX_PROCBASED_CTLS2.
Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
'struct nept_desc' is used to associate guest EPTP with a shadow EPTP.
It's created in the first reference and be freed while no reference.
The life cycle seems like,
While guest VMCS VMX_EPT_POINTER_FULL is changed, the 'struct nept_desc'
of the new guest EPTP is referenced; the 'struct nept_desc' of the old
guest EPTP is dereferenced.
While guest VMCS be cleared(by VMCLEAR in L1 VM), the 'struct nept_desc'
of the old guest EPTP is dereferenced.
While a new guest VMCS be loaded(by VMPTRLD in L1 VM), the 'struct
nept_desc' of the new guest EPTP is referenced. The 'struct nept_desc'
of the old guest EPTP is dereferenced.
Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
To shadow guest EPT, the hypervisor needs construct a shadow EPT for each
guest EPT. The key to associate a shadow EPT and a guest EPT is the EPTP
(EPT pointer). This patch provides following structure to do the association.
struct nept_desc {
/*
* A shadow EPTP.
* The format is same with 'EPT pointer' in VMCS.
* Its PML4 address field is a HVA of the hypervisor.
*/
uint64_t shadow_eptp;
/*
* An guest EPTP configured by L1 VM.
* The format is same with 'EPT pointer' in VMCS.
* Its PML4 address field is a GPA of the L1 VM.
*/
uint64_t guest_eptp;
uint32_t ref_count;
};
Due to lack of dynamic memory allocation of the hypervisor, a array
nept_bucket of type 'struct nept_desc' is introduced to store those
association information. A guest EPT might be shared between different
L2 vCPUs, so this patch provides several functions to handle the
reference of the structure.
Interface get_shadow_eptp() also is introduced. To find the shadow EPTP
of a specified guest EPTP.
Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Shadow EPT uses lots of pages to construct the shadow page table. To
utilize the memory more efficient, a page poll sept_page_pool is
introduced.
For simplicity, total platform RAM size is considered to calculate the
memory needed for shadow page tables. This is not an accurate upper
bound. This can satisfy typical use-cases where there is not a lot
of overcommitment and sharing of memory between L2 VMs.
Memory of the pool is marked as reserved from E820 table in early stage.
Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Nested VM exits happen when vCPU is in guest mode (VMCS02 is current).
Initially we reflect all nested VM exits to L1 hypervisor. To prepare
the environment to run L1 guest:
- restore some VMCS fields to the value as what L1 hypervisor programmed.
- VMCLEAR VMCS02, VMPTRLD VMCS01 and enable VMCS shadowing.
- load the non-shadowing host states from VMCS12 to VMCS01 guest states.
- VMRESUME to L1 guest with this modified VMCS01.
Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Alexander Merritt <alex.merritt@intel.com>
Since L2 guest vCPU mode and VPID are managed by L1 hypervisor, so we
can skip these handling in run_vcpu().
And be careful that we can't cache L2 registers in struct acrn_vcpu.
Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
invvpid and invept instructions cause VM exits unconditionally.
For initial support, we pass all the instruction operands as is
to the pCPU.
Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Implement the VMLAUNCH and VMRESUME instructions, allowing a L1
hypervisor to run nested guests.
- merge VMCS control fields and VMCS guest fields to VMCS02
- clear shadow VMCS indicator on VMCS02 and load VMCS02 as current
- set VMCS12 launch state to "launched" in VMLAUNCH handler
Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Alex Merritt <alex.merritt@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signature of RTCT ACPI table maybe "PTCT"(v1) or "RTCT"(v2).
and the MAGIC number in CRL header is also changed from "PTCM"
to "RTCM".
This patch refine the code to detect RTCT table for both
v1 and v2.
Tracked-On: #6020
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
For post launch VM, ACRN supports PTM under these conditions:
1. HW implements a simple PTM hierarchy: PTM requestor device (ep) is
directly connected to PTM root capable root port. Or
2. ptm requestor itself is root complex integrated ep.
Currently acrn doesn't support emulation of other type of PTM hiearchy, such
as if there is an intermediate PTM node (for example, switch) inbetween
PTM requestor and PTM root.
To avoid VM touching physical hardware, acrn hv ensures PTM is always enabled
in the hardware.
During hv's pci init, if root port is ptm capable,
hv will enable PTM on that root port. In addition,
log error (and don't enable PTM) if ptm root
capability is on intermediate node other than root port.
V2:
- Modify commit messages to clarify the limitation
of current PTM implementation.
- Fix code that may fail FUSA
- Remove pci_ptm_info() and put info log inside pci_enable_ptm_root().
Tracked-On: #5915
Signed-off-by: Rong Liu <rong.l.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In physical destination mode, the destination processor is specified by its
local APIC ID. When a CPU switch xAPIC Mode to x2APIC Mode or vice versa,
the local APIC ID is not changed. So a vcpu in x2APIC Mode could use physical
Destination Mode to send an IPI to another vcpu in xAPIC Mode by writing ICR.
This patch adds support for a vCPU A could write ICR to send IPI to another
vCPU B which is in different APIC mode.
Tracked-On: #5923
Signed-off-by: Li Fei1 <fei1.li@intel.com>
So that DM can retrieve physical APIC IDs and use them to fill in the ACPI MADT table for
post-launched VMs.
Note:
1. DM needs to use the same logic as hypervisor to calculate vLAPIC IDs based on physical APIC IDs
and CPU affinity setting
2. Using reserved0[] in struct hc_platform_info to pass physical APIC IDs means we can only support at
most 116 cores. And it assumes LAPIC ID is 8bits (X2APIC mode supports 32 bits).
Cat IDs shift will be used by DM RTCT V2
Tracked-On: #6020
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Using physical APIC IDs as vLAPIC IDs for pre-Launched and post-launched VMs
is not sufficient to replicate the host CPU and cache topologies in guest VMs,
we also need to passthrough host CPUID leaf.0BH to guest VMs, otherwise,
guest VMs may see weird CPU topology.
Note that in current code, ACRN has already passthroughed host cache CPUID
leaf 04H to guest VMs
Tracked-On: #6020
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
In current code, ACRN uses physical APIC IDs as vLAPIC IDs for SOS,
and vCPU ids (contiguous) as vLAPIC IDs for pre-Launched and post-Launched VMs.
Using vCPU ids as vLAPIC IDs for pre-Launched and post-Launched VMs
would result in wrong CPU and cache topologies showing in the guest VMs,
and could adversely affect performance if the guest VM chooses to detect
CPU and cache topologies and optimize its behavior accordingly.
Uses physical APIC IDs as vLAPIC IDs (and related CPU/cache topology enumeration
CPUIDs passthrough) will replicate the host CPU and cache topologies in pre-Launched
and post-Launched VMs.
Tracked-On: #6020
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Remove the direct calls to exec_vmptrld() or exec_vmclear(), and replace
with the wrapper APIs load_va_vmcs() and clear_va_vmcs().
Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
This patch implements the VMREAD and VMWRITE instructions.
When L1 guest is running with an active VMCS12, the “VMCS shadowing”
VM-execution control is always set to 1 in VMCS01. Thus the possible
behavior of VMREAD or VMWRITE from L1 could be:
- It causes a VM exit to L0 if the bit corresponds to the target VMCS
field in the VMREAD bitmap or VMWRITE bitmap is set to 1.
- It accesses the VMCS referenced by VMCS01 link pointer (VMCS02 in
our case) if the above mentioned bit is set to 0.
This patch handles the VMREAD and VMWRITE VM exits in this way:
- on VMWRITE, it writes the desired VMCS value to the respective field
in the cached VMCS12. For VMCS fields that need to be synced to VMCS02,
sets the corresponding dirty flag.
- on VMREAD, it reads the desired VMCS value from the cached VMCS12.
Tracked-On: #5923
Signed-off-by: Alex Merritt <alex.merritt@intel.com>
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
This patch is to emulate VMCLEAR instruction.
L1 hypervisor issues VMCLEAR on a VMCS12 whose state could be any of
these: active and current, active but not current, not yet VMPTRLDed.
To emulate the VMCLEAR instruction, ACRN sets the VMCS12 launch state to
"clear", and if L0 already cached this VMCS12, need to sync it back to
guest memory:
- sync shadow fields from shadow VMCS VMCS to cache VMCS12
- copy cache VMCS12 to L1 guest memory
Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Enable VMCS shadowing for most of the VMCS fields, so that execution of
the VMREAD or VMWRITE on these shadow VMCS fields from L1 hypervisor
won't cause VM exits, but read from or write to the shadow VMCS.
Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Alexander Merritt <alex.merritt@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Software layout of VMCS12 data is a contract between L1 guest and L0
hypervisor to run a L2 guest.
ACRN hypervisor caches the VMCS12 which is passed down from L1 hypervisor
by the VMPTRLD instructin. At the time of VMCLEAR, ACRN syncs the cached
VMCS12 back to L1 guest memory.
Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
This patch emulates the VMPTRLD instruction. L0 hypervisor (ACRN) caches
the VMCS12 that is passed down from the VMPTRLD instruction, and merges it
with VMCS01 to create VMCS02 to run the nested VM.
- Currently ACRN can't cache multiple VMCS12 on one vCPU, so it needs to
flushes active but not current VMCS12s to L1 guest.
- ACRN creates VMCS02 to run nested VM based on VMCS12:
1) copy VMCS12 from guest memory to the per vCPU cache VMCS12
2) initialize VMCS02 revision ID and host-state area
3) load shadow fields from cache VMCS12 to VMCS02
4) enable VMCS shadowing before L1 Vm entry
Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
This patch implements the VMXOFF instruction. By issuing VMXOFF,
L1 guest Leaves VMX Operation.
- cleanup VCPU nested virtualization context states in VMXOFF handler.
- implement check_vmx_permission() to check permission for VMX operation
for VMXOFF and other VMX instructions.
Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
According to VMXON Instruction Reference, do the following checks in the
virtual hardware environment: vCPU CPL, guest CR0, CR4, revision ID
in VMXON region, etc.
Currently ACRN doesn't support 32-bit L1 hypervisor, and injects an #UD
exception if L1 hypervisor is not running in 64-bit mode.
Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
This patch emulates VMXON instruction. Basically checks some
prerequisites to enable VMX operation on L1 guest (next patch), and
prepares some virtual hardware environment in L0.
Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
The commit 2ab70f43e5
HV: cache: Fix page fault by flushing cache for VM trusty RAM in HV
It is wrong in using stac()/clac()
Tracked-On: #6020
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Now guest would use `Destination Shorthand` to broadcast IPIs if there're more
than one destination. However, it is not supported when the guest is in LAPIC
passthru situation, and all active VCPUs are working in X2APIC mode. As a result,
the guest would not work properly since this kind broadcast IPIs was ignored
by ACRN. What's worse, ACRN Hypervisor would inject GP to the guest in this case.
This patch extend vlapic_x2apic_pt_icr_access to support more destination modes
(both `Physical` and `Logical`) and destination shorthand (`No Shorthand`, `Self`,
`All Including Self` and `All Excluding Self`).
Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The accrss right of HV RAM can be changed to PAGE_USER (eg. trusty RAM
of post-launched VM). So before using clflush(or clflushopt) to flush
HV RAM cache, must allow explicit supervisor-mode data accesses to
user-mode pages. Otherwise, it may trigger page fault.
Tracked-On: #6020
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Hypervisor does not need to care about hugepage settings in SOS kernel, user
could enable these settings in the scenario config file or GRUB menu.
Tracked-On: #5815
Signed-off-by: Victor Sun <victor.sun@intel.com>
changes:
1. The VM load order type condition is not needed, since the function
is called only when create SOS VM or pre-launched VM;
2. Fixed wrong parameter of fill_seed_arg() which introduced by commit
80262f0602.
3. More comments on why multiboot string could override the pre-
configured VM bootargs and why append multiboot cmdline to SOS VM
bootargs;
Tracked-On: #5815
Signed-off-by: Victor Sun <victor.sun@intel.com>
Use BUILD_VERSION an BUILD_TAG variable also for hypervisor,
acrnprobe and crashlog. This eases build from an archive without
git available.
Tracked-On: #6035
Signed-off-by: Helmut Buchsbaum <helmut.buchsbaum@opensource.tttech-industrial.com>
Make builds reproducible by honoring SOURCE_DATE_EPOCH and USER
environment variables in the respective Makefiles. Just follow the
recommendations at https://reproducible-builds.org/
Build tools (e.g. Debian packaging, Yocto) use this to ensure reproducibility
of packages.
Tracked-On: #6035
Signed-off-by: Helmut Buchsbaum <helmut.buchsbaum@opensource.tttech-industrial.com>
Create virtual root port through add_vdev hypercall. add_vdev
identifies the virtual device to add by its vendor id and device id, then
call the corresponding function to create virtual device.
-create_vrp(): Find the right virtual root port to create
by its secondary bus number, then initialize the virtual root port.
And finally initialize PTM related configurations.
-destroy_vrp(): nothing to destroy
Tracked-On: #5915
Signed-off-by: Rong Liu <rong.l.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Acked-by: Jason Chen <jason.cj.chen@intel.com>
Acked-by: Yu Wang <yu1.wang@intel.com>
Add virtual root port that supports the most basic pci-e bridge and root port operations.
- init_vroot_port(): init vroot_port's basic registers.
- deinit_vroot_port(): reset vroot_port
- read_vroot_port_cfg(): read from vroot_port's virtual config space.
- write_vroot_port_cfg(): write to vroot_port's virtual config space.
Tracked-On: #5915
Signed-off-by: Rong Liu <rong.l.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Acked-by: Jason Chen <jason.cj.chen@intel.com>
Acked-by: Yu Wang <yu1.wang@intel.com>
If PTM can be enabled on passthru device, a virtual root port
is added to vm to act as ptm root. And the passthru device is
connected to the virtual root port instead of the virtual host bridge.
Tracked-On: #5915
Signed-off-by: Rong Liu <rong.l.liu@intel.com>
Acked-by: Yu Wang <yu1.wang@intel.com>
1. do not allow external modules to touch internal field of a timer.
2. make timer mode internal, period_in_ticks will decide the mode.
API wise:
1. the "mode" parameter was taken out of initialize_timer().
2. a new function update_timer() was added to update the timeout and
period fields.
3. the timer_expired() function was extended with an output parameter
to return the remaining cycles before expiration.
Also, the "fire_tsc" field name of hv_timer was renamed to "timeout".
With the new API, however, this change should not concern user code.
Tracked-On: #5920
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
x86/timer.[ch] was moved to the common directory largely unchanged.
x86 specific code now resides in x86/tsc_deadline_timer.c and its
interface was defined in hw/hw_timer.h. The interface defines two
functions: init_hw_timer() and set_hw_timeout() that provides HW
specific initialization and timer interrupt source.
Other than these two functions, the timer module is largely arch
agnostic.
Tracked-On: #5920
Signed-off-by: Rong Liu <rong2.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Modules that use udelay() should include "delay.h" explicitly.
Tracked-On: #5920
Signed-off-by: Rong Liu <rong2.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Generalize and split basic cpu cycle/tick routines from x86/timer:
- Instead of rdstc(), use cpu_ticks() in generic code.
- Instead of get_tsc_khz(), use cpu_tickrate() in generic code.
- Include "common/ticks.h" instead of "x86/timer.h" in generic code.
- CYCLES_PER_MS is renamed to TICKS_PER_MS.
The x86 specific API rdstc() and get_tsc_khz(), as well as TSC_PER_MS
are still available in arch/x86/tsc.h but only for x86 specific usage.
Tracked-On: #5920
Signed-off-by: Rong Liu <rong2.liu@intel.com>
Signed-off-by: Yi Liang <yi.liang@intel.com>
RTCT has been updated to version 2,
this patch updates hypervisor RTCT parser to support
both version 1 and version 2 of RTCT.
Tracked-On: #6020
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Jason CJ Chen <jason.cj.chen@intel.com>
'psram' and 'PSRAM' are legacy names and replaced
with 'ssram' and 'SSRAM' respectively.
Tracked-On: #6012
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Shuang Zheng <shuang.zheng@intel.com>
Define LIST_OF_VMX_MSRS which includes a list of MSRs that are visible to
L1 guests if nested virtualization is enabled.
- If CONFIG_NVMX_ENABLED is set, these MSRs are included in
emulated_guest_msrs[].
- otherwise, they are included in unsupported_msrs[].
In this way we can take advantage of the existing infrastructure to
emulate these MSRs.
Tracked-On: #5923
Spick igned-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In order to support nested virtualization, need to expose the "Enable VMX
outside SMX operation" bit to L1 hypervisor.
Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
For simplification purpose, use 'ssram' instead of
'software sram' for local names inside rtcm module.
Tracked-On: #6015
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Move Cache/TLB arch specific parts into cpu.h
After this change, we should not expose arch specific parts out from mmu.h
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Allow guest set CR4_VMXE if CONFIG_NVMX_ENABLED is set:
- move CR4_VMXE from CR4_EMULATED_RESERVE_BITS to CR4_TRAP_AND_EMULATE_BITS
so that CR4_VMXE is removed from cr4_reserved_bits_mask.
- force CR4_VMXE to be removed from cr4_rsv_bits_guest_value so that CR4_VMXE
is able to be set.
Expose VMX feature (CPUID01.01H:ECX[5]) to L1 guests whose GUEST_FLAG_NVMX_ENABLED
is set.
Assuming guest hypervisor (L1) is KVM, and KVM uses EPT for L2 guests.
Constraints on ACRN VM.
- LAPIC passthrough should be enabled.
- use SCHED_NOOP scheduler.
Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
moving invvpid and invept helper code from mmu.c to mmu.h, so that they
can be accessed by the nested virtualization code.
No logical changes.
Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
NVMX_ENABLED: ACRN is built to support nested virtualization if set.
GUEST_FLAG_NVMX_ENABLED: indicates that the VMX capability can be present
in this guest to run nested VMs.
Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
TPAUSE, UMONITOR or UMWAIT instructions execution in guest VM cause
a #UD if "enable user wait and pause" (bit 26) of VMX_PROCBASED_CTLS2
is not set. To fix this issue, set the bit 26 of VMX_PROCBASED_CTLS2.
Besides, these WAITPKG instructions uses MSR_IA32_UMWAIT_CONTROL. So
load corresponding vMSR value during context switch in of a vCPU.
Please note, the TPAUSE or UMWAIT instruction causes a VM exit if the
"RDTSC exiting" and "enable user wait and pause" are both 1. In ACRN
hypervisor, "RDTSC exiting" is always 0. So TPAUSE or UMWAIT doesn't
cause a VM exit.
Performance impact:
MSR_IA32_UMWAIT_CONTROL read costs ~19 cycles;
MSR_IA32_UMWAIT_CONTROL write costs ~63 cycles.
Tracked-On: #6006
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
The current permission-checking and dispatching mechanism of hypercalls is
not unified because:
1. Some hypercalls require the exact vCPU initiating the call, while the
others only need to know the VM.
2. Different hypercalls have different permission requirements: the
trusty-related ones are enabled by a guest flag, while the others
require the initiating VM to be the Service OS.
Without a unified logic it could be hard to scale when more kinds of
hypercalls are added later.
The objectives of this patch are as follows.
1. All hypercalls have the same prototype and are dispatched by a unified
logic.
2. Permissions are checked by a unified logic without consulting the
hypercall ID.
To achieve the first objective, this patch modifies the type of the first
parameter of hcall_* functions (which are the callbacks implementing the
hypercalls) from `struct acrn_vm *` to `struct acrn_vcpu *`. The
doxygen-style documentations are updated accordingly.
To achieve the second objective, this patch adds to `struct hc_dispatch` a
`permission_flags` field which specifies the guest flags that must ALL be
set for a VM to be able to invoke the hypercall. The default value (which
is 0UL) indicates that this hypercall is for SOS only. Currently only the
`permission_flag` of trusty-related hypercalls have the non-zero value
GUEST_FLAG_SECURE_WORLD_ENABLED.
With `permission_flag`, the permission checking logic of hypercalls is
unified as follows.
1. General checks
i. If the VM is neither SOS nor having any guest flag that allows
certain hypercalls, it gets #UD upon executing the `vmcall`
instruction.
ii. If the VM is allowed to execute the `vmcall` instruction, but
attempts to execute it in ring 1, 2 or 3, the VM gets #GP(0).
2. Hypercall-specific checks
i. If the hypercall is for SOS (i.e. `permission_flag` is 0), the
initiating VM must be SOS and the specified target VM cannot be a
pre-launched VM. Otherwise the hypercall returns -EINVAL without
further actions.
ii. If the hypercall requires certain guest flags, the initiating VM
must have all the required flags. Otherwise the hypercall returns
-EINVAL without further actions.
iii. A hypercall with an unknown hypercall ID makes the hypercall
returns -EINVAL without further actions.
The logic above is different from the current implementation in the
following aspects.
1. A pre-launched VM now gets #UD (rather than #GP(0)) when it attempts
to execute `vmcall` in ring 1, 2 or 3.
2. A pre-launched VM now gets #UD (rather than the return value -EPERM)
when it attempts to execute a trusty hypercall in ring 0.
3. The SOS now gets the return value -EINVAL (rather than -EPERM) when it
attempts to invoke a trusty hypercall.
4. A post-launched VM with trusty support now gets the return value
-EINVAL (rather than #UD) when it attempts to invoke a non-trusty
hypercall or an invalid hypercall.
v1 -> v2:
- Update documentation that describe hypercall behavior.
- Fix Doxygen warnings
Tracked-On: #5924
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Fix a couple of typos in text displayed by a helper script
used when building ACRN. No functional change made to the
script itself.
Signed-off-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>
Instead of "#include <x86/foo.h>", use "#include <asm/foo.h>".
In other words, we are adopting the same practice in Linux kernel.
Tracked-On: #5920
Signed-off-by: Liang Yi <yi.liang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Add "transform" to generate following files with xsltproc in genconf.sh:
- ivshmem_cfg.h
- misc_cfg.h
- pt_intx.c
- vm_configurations.c
- vm_configurations.h
Add code formatter using clang-format. It formats the gernerated code
with customized condfiguration if clang-format package and configuraion
file ".clang-format" exist.
Add sed in genconf.sh "transform" to replace the copyright "YEAR" of generated files.
Tracked-On: #5980
Signed-off-by: Yang,Yu-chu <yu-chu.yang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
We should only map [low32_max_ram, 4G) MMIO region as UC attribute,
not map [low32_max_ram, low32_max_ram + 4G) region as UC attribute.
Otherwise, the HV will complain [4G, low32_max_ram + 4G) region has
already mapped.
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
This patch fixes the 'uart=bdf@XXX' mechanism for the PCI serial
port devices which bar0 is not MMIO.
Tracked-On: #5968
Signed-off-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>
Signed-off-by: Li Fei <fei1.li@intel.com>
Both Windows guest and Linux guest use the MSR MSR_IA32_CSTAR, while
Linux uses it rarely. Now vcpu context switch doesn't save/restore it.
Windows detects the change of the MSR and rises a exception.
Do the save/resotre MSR_IA32_CSTAR during context switch.
Tracked-On: #5899
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
TLFS spec defines that when a VM is created, the value of
HV_X64_MSR_TIME_REF_COUNT is set to zero. Now tsc_offset is not
supported properly, so guest get a drifted reference time.
This patch implements tsc_offset. tsc_scale and tsc_offset
are calculated when a VM is launched and are saved in
struct acrn_hyperv of struct acrn_vm.
Tracked-On: #5956
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
TLFS spec defines that HV_X64_MSR_VP_INDEX and HV_X64_MSR_TIME_REF_COUNT
are read-only MSRs. Any attempt to write to them results in a #GP fault.
Fix the issue by returning error in handler hyperv_wrmsr() of MSRs
HV_X64_MSR_VP_INDEX/HV_X64_MSR_TIME_REF_COUNT emulation.
Tracked-On: #5956
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
TLFS spec defines different hypercall ABIs for X86 and x64. Currently
x64 hypercall interface is not supported well.
Setup the hypercall interface page according to the vcpu mode.
Tracked-On: #5956
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
These two MACROs shall be wrapped as a single
value respectively, hence brackets should be used.
Tracked-On: #5951
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
In order to support platform (such as Ander Lake) which physical address width
bits is 46, the current code need to reserve 2^16 PD page ((2^46) / (2^30)).
This is a complete waste of memory.
This patch would reserve PD page by three parts:
1. DRAM - may take PD_PAGE_NUM(CONFIG_PLATFORM_RAM_SIZE) PD pages at most;
2. low MMIO - may take PD_PAGE_NUM(MEM_1G << 2U) PD pages at most;
3. high MMIO - may takes (CONFIG_MAX_PCI_DEV_NUM * 6U) PD pages (may plus
PDPT entries if its size is larger than 1GB ) at most for:
(a) MMIO BAR size must be a power of 2 from 16 bytes;
(b) MMIO BAR base address must be power of two in size and are aligned with
its size.
Tracked-On: #5929
Signed-off-by: Li Fei1 <fei1.li@intel.com>
The platform which physical-address width over 39 bits must support
1GB large page (Both MMU and VMX sides ). This could save lots of
page table pages for EPT MMIO mapping.
Tracked-On: #5929
Signed-off-by: Li Fei1 <fei1.li@intel.com>
No one uses get_mem_range_info to get the top/bottom/size of the physical memory.
We could get these informations by e820 table easily.
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: eddie Dong <eddie.dong@intel.com>
We used get_mem_range_info to get the top memory address and then use this address
as the high 64 bits max memory address of SOS. This assumes the platform must have
high memory space.
This patch removes the assumption. It will set high 64 bits max memory address of
SOS to 4G by default (Which means there's no 64 bits high memory), then update
the high 64 bits max memory address if the SOS really has high memory space.
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: eddie Dong <eddie.dong@intel.com>
SOS's memory size could be calculated by its vE820 Tables easily.
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: eddie Dong <eddie.dong@intel.com>
We used get_mem_range_info to get the top memory address and then use this address
as the high 64 bits max memory address. This assumes the platform must have high
memory space.
This patch calculates the high 64 bits max memory address according the e820 tables
and removes the assumption "The platform must have high memory space" by map the
low RAM region and high RAM region separately.
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: eddie Dong <eddie.dong@intel.com>
Now BSP may launch VMs before APs have not done its initilization,
for example, sched_control for per-cpu. However, when we initilize
the vcpu thread data, it will access the object (scheduler) of the
sched_control of APs. As a result, it will trigger the PF.
This patch would waits each physical has done its initilization before
to continue to execute.
Tracked-On: #5929
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Using the MFENCE to make sure trampoline code
has been updated (clflush) into memory beforing start APs.
Tracked-On: #5929
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Use MFENCE to strengthen the fast string operations execute order to ensure
all trampoline code was updated before flush it into the memory.
Tracked-On: #5929
Signed-off-by: Li Fei1 <fei1.li@intel.com>
For platform with HLAT (Hypervisor-managed Linear Address Translation)
capability, the hypervisor shall hide this feature to its guest.
This patch adds MSR_IA32_VMX_PROCBASED_CTLS3 MSR to unsupported MSR
list.
The presence of this MSR is determined by 1-setting of bit 49 of MSR
MSR_IA32_VMX_PROCBASED_CTLS. which is already in unsupported MSR list. [2]
Related documentations:
[1] Intel Architecture Instruction Set Extensions, version Feb 16, 2021,
Ch 6.12
[2] Intel KeyLocker Specification, Sept 2020, Ch 7.2
Tracked-On: #5895
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This patch adds the following dependencies among recipes:
- Building of any C file depends on $(HV_CONFIG_TIMESTAMP) which indicates
the presence of generated configuration files.
- Source files listed in $(VM_CFG_C_SRCS), which are the generated
configuration files, depends on $(HV_CONFIG_TIMESTAMP)
With the dependencies above, the build system can now safely be executed in
parallel, e.g. `make -j4`.
Tracked-On: #5874
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
sanitize_pte is used to set page table entry to map to an sanitized page to
mitigate l1tf. It should belongs to pgtable module. So move it to pagetable.c
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
lookup_address is used to lookup a pagetable entry by an address. So rename it
to pgtable_lookup_entry to indicate this clearly.
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
alloc_page/free_page should been called in pagetable module. In order to do this,
we add pgtable_create_root and pgtable_create_trusty_root to create PML4 page table
page for normal world and secure world.
After this done, no one uses alloc_ept_page. So remove it.
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Add pgtable_create_trusty_root to allocate a page for trusty PML4 page table page.
This function also copy PDPT entries from Normal world to Secure world.
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Add pgtable_create_root to allocate a page for PMl4 page table page.
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Rename mmu_add to pgtable_add_map;
Rename mmu_modify_or_del to pgtable_modify_or_del_map.
And move these functions declaration into pgtable.h
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
In VT-d scenario, if MSI interrupt has been enabled,
vCPU writes the content in MSI registers,
and all bits of the content are read-only.
In this case, hypervisor code will call
enable_disable_msi(vdev, false), which will disable MSI.
And there's no chance to call remap_vmsi.
This is wrong behavior, which will result in the disable of MSI.
Tracked-On: #5847
Reviewed-by: Li Fei1 <fei1.li@intel.com>
Signed-off-by: liujunming <junming.liu@intel.com>
Requires explicit arch path name in the include directive.
The config scripts was also updated to reflect this change.
Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Each .c file includes the arch specific irq header file (with full
path) by itself if required.
Tracked-On: #5825
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
A new x86/guest/virq.h head file now contains all guest
related interrupt handling API.
Tracked-On: #5825
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Each of them now resides in a separate .c file.
Tracked-On: #5825
Signed-off-by: Yang, Yu-chu <yu-chu.yang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Move exception stack layout struct and exception/NMI handling
declarations from x86/irq.h into x86/cpu.h.
Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The common irq file is responsible for managing the central
irq_desc data structure and provides the following APIs for
host interrupt handling.
- init_interrupt()
- reserve_irq_num()
- request_irq()
- free_irq()
- set_irq_trigger_mode()
- do_irq()
API prototypes, constant and data structures belonging to common
interrupt handling are all moved into include/common/irq.h.
Conversely, the following arch specific APIs are added which are
called from the common code at various points:
- init_irq_descs_arch()
- setup_irqs_arch()
- init_interrupt_arch()
- free_irq_arch()
- request_irq_arch()
- pre_irq_arch()
- post_irq_arch()
Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
This is done be adding irq_rsvd_bitmap as an auxiliary bitmap
besides irq_alloc_bitmap.
Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The common IRQ handling routine calls arch specific functions
pre_irq_arch() and post_irq_arch() before and after calling the
registered action function respectively.
Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The common part initializes the global irq_desc data structure while the
arch specific part initialize the HW and its own irq data.
This is one of the preparation steps for spliting IRQ handling into common
and architecture specific parts.
Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Arch specific IRQ data is now an opaque pointer in irq_desc.
This is a preparation step for spliting IRQ handling into common
and architecture specific parts.
Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The 'uart=' parameter for the hypervisor takes multiple forms. One
is to specify the BDF (Bus, Device, Function) value of the serial
port PCI device. The description in the documentation used the
previous format (e.g. '0:18.1') but a 16-bit WORD in HEX needs
to be passed nowadays. E.g.: '0:18.1' is specified by 'uart=0xc1'
Tracked-On: #5842
Signed-off-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>
Signed-off-by: Benjamin Fitch <benjamin.fitch@intel.com>
This patch moves pgtable definition to pgtable.h and include the proper
header file for page module.
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Move the EPT page table related APIs to ept.c. page module only provides APIs to
allocate/free page for page table page. pagetabl module only provides APIs to
add/modify/delete/lookup page table entry. The page pool and the page table
related APIs for EPT should defined in EPT module.
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Move the MMU page table related APIs to mmu.c. page module only provides APIs to
allocate/free page for page table page. pagetabl module only provides APIs to
add/modify/delete/lookup page table entry. The page pool and the page table
related APIs for MMU should defined in MMU module.
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
We would move the MMU page table related APIs to mmu.c and move the EPT related
APIs to EPT.c. The page table module only provides APIs to add/modify/delete/lookup
page table entry.
This patch separates common APIs and adds separate APIs of page table module
for MMU/EPT.
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
post_uos_sworld_memory are used for post-launched VM which support trusty.
It's more VM related. So move it definition into vm.c
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Per-core software SRAM L2 cache may be flushed by 'mwait'
extension instruction, which guest VM may execute to enter
core deep sleep. Such kind of flushing is not expected when
software SRAM is enabled for RTVM.
Hypervisor disables MONITOR-WAIT support on both hypervisor
and VMs sides to protect above software SRAM from being flushed.
This patch disable ACRN guest MONITOR-WAIT support if software
SRAM is configured.
Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Per-core software SRAM L2 cache may be flushed by 'mwait'
extension instruction, which guest VM may execute to enter
core deep sleep. Such kind of flushing is not expected when
software SRAM is enabled for RTVM.
Hypervisor disables MONITOR-WAIT support on both hypervisor
and VMs sides to protect above software SRAM from being flushed.
This patch disable hypervisor(host) MONITOR-WAIT support and refine
software sram initializaion flow.
Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Below boolean function are defined in this patch:
- is_software_sram_enabled() to check if SW SRAM
feature is enabled or not.
- set global variable 'is_sw_sram_initialized'
to file static.
Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The fields and APIs in old 'struct memory_ops' are used to add/modify/delete
page table (page or entry). So rename 'struct memory_ops' to 'struct pgtable'.
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Use default_access_right field to replace get_default_access_right API.
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
RTVM is enforced to use 4KB pages to mitigate CVE-2018-12207 and performance jitter,
which may be introduced by splitting large page into 4KB pages on demand. It works
fine in previous hardware platform where the size of address space for the RTVM is
relatively small. However, this is a problem when the platforms support 64 bits
high MMIO space, which could be super large and therefore consumes large # of
EPT page table pages.
This patch optimize it by using large page for purely data pages, such as MMIO spaces,
even for the RTVM.
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Tracked-On: #5788
To mitigate the page size change MCE vulnerability (CVE-2018-12207), ACRN would
clear the execution permission in the EPT paging-structure entries for large pages
and then intercept an EPT execution-permission violation caused by an attempt to
execution an instruction in the guest.
However, the current code would clear the execution permission in the EPT paging-
structure entries for small pages too when we clearing the the execution permission
for large pages. This would trigger extra EPT violation VM exits.
This patch fix this issue.
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Tracked-On: #5788
The top-level Makefile should not define any default value as the
hypervisor may have its own configurations set by previous builds.
This patch also changes the hypervisor default RELEASE to `n`.
Tracked-On: #5772
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
This patch resolves the following bugs that break the targets `diffconfig`
and `applydiffconfig`:
- Comments after variable definitions cause the varaible to contain
unintended trailing whitespaces.
- HV_CONFIG_XML is no longer defined; it is now HV_SCENARIO_XML.
- '*.asl' files are also generated and should be involved when comparing
the generated configuration files.
- Strings between diacritic marks (`) are intepreted as shell commands
even they are part of informative messages.
- HV_DIFFCONFIG_LIST should not contain duplicated lines.
Tracked-On: #5772
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
For clarity, we now prefer y|n over 0|1 as the values of boolean options on
make command lines. This patch applies this preference to the Makefile of
the device model and tools, while RELEASE=0|1 is still supported for
backward compatibility.
Tracked-On: #5772
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
SOS_RAM_SIZE/UOS_RAM_SIZE Kconfig are only used to calculate how many pages we
should reserve for the VM EPT mapping.
Now we reserve pages for each VM EPT pagetable mapping by the PLATFORM_RAM_SIZE
not the VM RAM SIZE. This could simplify the reserve logic for us: not need to
take care variable corner cases. We could make assume we reserve enough pages
base on the VM could not use the resources beyond the platform hardware resources.
So remove these two unused VM ram size kconfig.
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Tracked-On: #5788
Add free_page to free page when unmap pagetable.
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Tracked-On: #5788
For FuSa's case, we remove all dynamic memory allocation use in ACRN HV. Instead,
we use static memory allocation or embedded data structure. For pagetable page,
we prefer to use an index (hva for MMU, gpa for EPT) to get a page from a special
page pool. The special page pool should be big enougn for each possible index.
This is not a big problem when we don't support 64 bits MMIO. Without 64 bits MMIO
support, we could use the index to search addrss not larger than DRAM_SIZE + 4G.
However, if ACRN plan to support 64 bits MMIO in SOS, we could not use the static
memory alocation any more. This is because there's a very huge hole between the
top DRAM address and the bottom 64 bits MMIO address. We could not reserve such
many pages for pagetable mapping as the CPU physical address bits may very large.
This patch will use dynamic page allocation for pagetable mapping. We also need
reserve a big enough page pool at first. For HV MMU, we don't use 4K granularity
page table mapping, we need reserve PML4, PDPT and PD pages according the maximum
physical address space (PPT va and pa are identical mapping); For each VM EPT,
we reserve PML4, PDPT and PD pages according to the maximum physical address space
too, (the EPT address sapce can't beyond the physical address space), and we reserve
PT pages by real use cases of DRAM, low MMIO and high MMIO.
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Tracked-On: #5788
memory_ops structure will be changed to store page table related fields.
However, secure world memory base address is not one of them, it's VM
related. So save sworld_memory_base_hva in vm_arch structure directly.
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Tracked-On: #5788
Current memory allocation algorithm is to find the available address from
the highest possible address below max_address. If the function returns 0,
means all memory is used up and we have to put the resource at address 0,
this is dangerous for a running hypervisor.
Also returns 0 would make code logic very complicated, since memcpy_s()
doesn't support address 0 copy.
Tracked-On: #5626
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
In previous code, the rsdp initialization is done in get_rsdp() api implicitly.
The function is called multiple times in following acpi table parsing functions
and the condition (rsdp == NULL) need to be added in each parsing function.
This is not needed since the panic would occur if rsdp is NULL when do acpi
initialization.
Tracked-On: #5626
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
In this way, all multiboot standard data structure could be found in
multiboot_std.h. The multiboot_priv.h stores all private definitions
and multiboot.h is the only public API header file.
Tracked-On: #5661
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Accessing to software SRAM region is not allowed when
software SRAM is pass-thru to prelaunch RTVM.
This patch removes software SRAM region from service VM
EPT if it is enabled for prelaunch RTVM.
Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Fixing an incorrect struct definition for ir_bits in ioapic_rte. Since bits after
the delivery status in the lower 32 bits are not touched by code,
this has never showed up as an issue. And the higher 32 bits in the RTE
are aligned by the compiler.
Tracked-On: #5773
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Currently the VM bootargs load address is hard-coded at 8KB right before
kernel load address, this should work for Linux kernel only since Linux
kernel is guaranteed to be loadered high than GPA 8K so its load address
would never be overflowed, other OS like Zephyr has no such assumption.
Tracked-On: #5689
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
ivshmem spec says that the size of BAR0 is 256 bytes. Windows
ivshmem driver will check the size of BAR0. It will refuse to
load the ivshmem driver if BAR0 size is not 256.
For post-launched VM hv land ivshmem BARs are allocated by
device model. For pre-launched VM hv land ivshmem BARs are
allocated by acrn-config tool. Both device model and acrn-config
tool should make sure that the BAR base addr are aligned to 4K
at least.
Tracked-On: #5717
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This patch denies Service VM the access permission to device resources
owned by hypervisor.
HV may own these devices: (1) debug uart pci device for debug version
(2) type 1 pci device if have pre-launched VMs.
Current implementation exposes the mmio/pio resource of HV owned devices
to SOS, should remove them from SOS.
Tracked-On: #5615
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
This patch denies Service VM the access permission to device
resources owned by pre-launched VMs.
Rationale:
* Pre-launched VMs in ACRN are independent of service VM,
and should be immune to attacks from service VM. However,
current implementation exposes the bar resource of passthru
devices to service VM for some reason. This makes it possible
for service VM to crash or attack pre-launched VMs.
* It is same for hypervisor owned devices.
NOTE:
* The MMIO spaces pre-allocated to VFs are still presented to
Service VM. The SR-IOV capable devices assigned to pre-launched
VMs doesn't have the SR-IOV capability. So the MMIO address spaces
pre-allocated by BIOS for VFs are not decoded by hardware and
couldn't be enabled by guest. SOS may live with seeing the address
space or not. We will revisit later.
Tracked-On: #5615
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The logical processor scoped IWKey can be copied to or from a
platform-scope storage copy called IWKeyBackup. Copying IWKey to
IWKeyBackup is called ‘backing up IWKey’ and copying from IWKeyBackup to
IWKey is called ‘restoring IWKey’.
IWKeyBackup and the path between it and IWKey are protected against
software and simple hardware attacks. This means that IWKeyBackup can be
used to distribute an IWKey within the logical processors in a platform
in a protected manner.
Linux keylocker implementation uses this feature, so they are
introduced by this patch.
Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Different vCPU may have different IWKeys. Hypervisor need do the iwkey
context switch.
This patch introduce a load_iwkey() function to do that. Switches the
host iwkey when the switch_in vCPU satisfies:
1) keylocker feature enabled
2) Different from the current loaded one.
Two opportunities to do the load_iwkey():
1) Guest enables CR4.KL bit.
2) vCPU thread context switch.
load_iwkey() costs ~600 cycles when do the load IWKey action.
Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
KeyLocker is a new security feature available in new Intel CPUs that
protects data-encryption keys for the Advanced Encryption Standard (AES)
algorithm. These keys are more valuable than what they guard. If stolen
once, the key can be repeatedly used even on another system and even
after vulnerability closed.
It also introduces a CPU-internal wrapping key (IWKey), which is a key-
encryption key to wrap AES keys into handles. While the IWKey is
inaccessible to software, randomizing the value during the boot-time
helps its value unpredictable.
Keylocker usage:
- New “ENCODEKEY” instructions take original key input and returns HANDLE
crypted by an internal wrap key (IWKey, init by “LOADIWKEY” instruction)
- Software can then delete the original key from memory
- Early in boot/software, less likely to have vulnerability that allows
stealing original key
- Later encrypt/decrypt can use the HANDLE through new AES KeyLocker
instructions
- Note:
* Software can use original key without knowing it (use HANDLE)
* HANDLE cannot be used on other systems or after warm/cold reset
* IWKey cannot be read from CPU after it's loaded (this is the
nature of this feature) and only 1 copy of IWKey inside CPU.
The virtualization implementation of Key Locker on ACRN is:
- Each vCPU has a 'struct iwkey' to store its IWKey in struct
acrn_vcpu_arch.
- At initilization, every vCPU is created with a random IWKey.
- Hypervisor traps the execution of LOADIWKEY (by 'LOADIWKEY exiting'
VM-exectuion control) of vCPU to capture and save the IWKey if guest
set a new IWKey. Don't support randomization (emulate CPUID to
disable) of the LOADIWKEY as hypervisor cannot capture and save the
random IWKey. From keylocker spec:
"Note that a VMM may wish to enumerate no support for HW random IWKeys
to the guest (i.e. enumerate CPUID.19H:ECX[1] as 0) as such IWKeys
cannot be easily context switched. A guest ENCODEKEY will return the
type of IWKey used (IWKey.KeySource) and thus will notice if a VMM
virtualized a HW random IWKey with a SW specified IWKey."
- In context_switch_in() of each vCPU, hypervisor loads that vCPU's
IWKey into pCPU by LOADIWKEY instruction.
- There is an assumption that ACRN hypervisor will never use the
KeyLocker feature itself.
This patch implements the vCPU's IWKey management and the next patch
implements host context save/restore IWKey logic.
Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In order for a VMM to capture the IWKey values of guests, processors
that support Key Locker also support a new "LOADIWKEY exiting"
VM-execution control in bit 0 of the tertiary processor-based
VM-execution controls.
This patch enables the tertiary VM-execution controls.
Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
KeyLocker is a new security feature available in new Intel CPUs that
protects data-encryption keys for the Advanced Encryption Standard (AES)
algorithm.
This patch emulates Keylocker CPUID leaf 19H to support Keylocker
feature for guest VM.
To make the hypervisor being able to manage the IWKey correctly, this
patch doesn't expose hardware random IWKey capability
(CPUID.0x19.ECX[1]) to guest VM.
Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
Bit19 (CR4_KL) of CR4 is CPU KeyLocker feature enable bit. Hypervisor
traps the bit's writing to track the keylocker feature on/off of guest.
While the bit is set by guest,
- set cr4_kl_enabled to indicate the vcpu's keylocker feature enabled status
- load vcpu's IWKey in host (will add in later patch)
While the bit is clear by guest,
- clear cr4_kl_enabled
This patch trap and passthru the CR4_KL bit to guest for operation.
Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Current implementation, SOS may allocate the memory region belonging to
hypervisor/pre-launched VM to a post-launched VM. Because it only verifies
the start address rather than the entire memory region.
This patch verifies the validity of the entire memory region before
allocating to a post-launched VM so that the specified memory can only
be allocated to a post-launched VM if the entire memory region is mapped
in SOS’s EPT.
Tracked-On: #5555
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Yonghua Huang <yonghua.huang@intel.com>
Currently, we hardcode the GPA base of Software SRAM
to an address that is derived from TGL platform,
as this GPA is identical with HPA for Pre-launch VM,
This hardcoded address may not work on other platforms
if the HPA bases of Software SRAM are different.
Now, Offline tool configures above GPA based on the
detection of Software SRAM on specific platform.
This patch removes the hardcoding GPA of Software SRAM,
and also renames MACRO 'SOFTWARE_SRAM_BASE_GPA' to
'PRE_RTVM_SW_SRAM_BASE_GPA' to avoid confusing, as it
is for Prelaunch VM only.
Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
- RTCM is initialized in hypervisor only
if RTCM binaries are detected.
- Remove address space of RTCM binary from
Software SRAM region.
- Refine parse_rtct() function, validity of
ACPI RTCT table shall be checked by caller.
Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Physical address to SW SRAM region maybe different
on different platforms, this hardcoded address may
result in address mismatch for SW SRAM operations.
This patch removes above hardcoded address and uses
the physical address parsed from native RTCT.
Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
'ptcm' and 'ptct' are legacy name according
to the latest TCC spec, hence rename below files
to avoid confusing:
ptcm.c -> rtcm.c
ptcm.h -> rtcm.h
ptct.h -> rtct.h
Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
This patch fixes the following issues that break the build system:
1. The tag of the root nodes of board/scenario XML files are still acrn-config,
not config_tools. This patch reverts the XPATH that refers to these nodes.
2. HV_PREDEFINED_BOARD_DIR now also relies on BOARD which may not be
available at the time the variable is defined. As both board and
scenario XML files are placed under the same directory, this patch
refines the path calculation logic to get rid of mixing variables of
the different flavors.
Tracked-On: #5644
Fixes: 97c9b24030 ("acrn-config: Reorg config tool folder")
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
Simplify multiboot API by removing the global variable efiloader_sig.
Replaced by constant at the use site.
Tracked-On: #5661
Signed-off-by: Yi Liang <yi.liang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Remove include/boot.h since it contains only assembly variables that
should only be accessed in arch/x86/init.c.
Tracked-On: #5661
Signed-off-by: Yi Liang <yi.liang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Split off definition of "struct efi_info" into a separate header
file lib/efi.h.
Tracked-On: #5661
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
This is done by adding the MAX_MMAP_ENTRIES macro in multiboot.h.
This macro has to be sync-ed with E820_MAX_ENTRIES manually though.
Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The init_multiboot_info() and sanitize_multiboot_ifno() APIs now
require parameters instead of implicitly relying on global boot
variables.
Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Calling sanitize_multiboot() from init.c instead of cpu.c.
Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
This way, we void exposing acrn_mbi as a global variable.
Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
This function is a derivative of get_multiboot_info().
Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Move "struct multboot_info" from multiboot.h into multboot.c.
Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Move multiboot specific declarations from boot.h to multiboot.h.
Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Create multiboot_pri.h and move the relevant declarations into this
file.
Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Create a multiboot module under the boot directory and move multiboot
files as part of this.
Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Remove vm_configs folder and move all the XML files and generic code example into config_tools/data
Tracked-On: #5644
Signed-off-by: Xie, nanlin <nanlin.xie@intel.com>
In order to enable changing the generated C configuration files manually,
this patch introduces the target `diffconfig` to the build system.
After generating the configuration files, a developer can manually modify
these sources (which are placed under build/configs) and invoke `make
diffconfig` to generate a patch that shows the made differences. Such
patches can be registered to a build by invoking the `applydiffconfig`
target. The build system will always apply them whenever the configuration
files are regenerated.
A typical workflow to create a patch is as follows.
# The pre_build target relies on generated configuration files
hypervisor$ make BOARD=xxx SCENARIO=yyy pre_build
(manually edit files under build/configs/boards and
build/configs/scenarios)
hypervisor$ make diffconfig # Patch generated to build/config.patch
hypervisor$ cp build/config.patch /path/to/patch
The following steps apply apply the patch to another build.
hypervisor$ make BOARD=xxx SCENARIO=yyy defconfig
hypervisor$ make applydiffconfig PATCH=/path/to/patch-file-or-directory
hypervisor$ make
After any patch is registered for a build, the configuration files will be
automatically regenerated the next time `make` is invoked.
To show a list of registered patches for generated configuration files,
invoke `make applydiffconfig` without specifying `PATCH`.
v2:
* Add target `applydiffconfig` which accepts a PATCH variable to register
an arbitrary patch file or a directory containing patch file(s) for a
build. `.config_patches` is no longer used.
Tracked-On: #5644
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
This patch makes the build system of the hypervisor to cache the board and
scenario XML files in the build directory and generate C configuration
files from them at build time. The C configuration files that are cached in
the git repo is no longer used or updated. Paths to these generated files
in the prebuild Makefile is updated accordingly.
The following targets are introduced or modified.
* defconfig: Copy default configuration XMLs to the build directory and
generate C configuration files.
* oldconfig: No action.
* menuconfig: Print a message to redirect users to use the config app
and exit.
* showconfig: Print the BOARD, SCENARIO and RELEASE configured for the
current build.
* update_config: No action.
* (default): Build the hypervisor with defined configurations.
The following variables can be set on the command line to specify the
default configuration to be used.
* BOARD: Either a name of the target board or a path to a customized
board XML. When a board name is specified, the board XML file
is expected to be available under
misc/acrn-config/xmls/board-xmls.
* SCENARIO: Either a name of the scenario of a path to a customized
scenario XML. When a scenario name is specified, the
scenario XML file is expected to be available under
misc/acrn-config/xmls/config-xmls/$(BOARD).
* BOARD_FILE: Path to the board XML file to be used. This is now
obsoleted as BOARD provides the same functionality.
* SCENARIO_FILE: Path to the scenario XML file to be used. This is now
obsoleted as BOARD provides the same functionality.
BOARD/SCENARIO or BOARD_FILE/SCENARIO_FILE shall be used in pair, and
BOARD_FILE/SCENARIO_FILE shall point to valid files when specified. Any
violation to those constraints will stop the build with error
messages. When BOARD/SCENARIO and BOARD_FILE/SCENARIO_FILE are both defined
on the command line, the former takes precedence as the latter are to be
obsoleted.
Additionally, users can define the RELEASE variable to specify a debug or
release build. In case a previous build exists but is configured for a
different build type, the build system will automatically update the
scenario XML and rebuild the sources.
This patch also includes the following tweaks:
1. Do not use `realpath` to process search paths for generated
headers. `realpath` only accepts paths of existing files, while the
directories for generated headers may not be created at the time the
search paths are calculated.
2. Always expect `pci_dev.c` to be in place.
3. HV_CONFIG_* series now encodes absolute paths.
v3:
* Do not validate BOARD_FILE/SCENARIO_FILE if BOARD/SCENARIO are given.
v2:
* `defconfig` now also generates the C configuration files.
* BOARD/SCENARIO now accept either board/scenario names or XML file paths.
* Adapt to the new allocation.xml & unified.xml.
* Cleanup names of internal variables in config.mk for brevity.
Tracked-On: #5644
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
In order to remove Kconfig from the build process, acrn-config shall
transform XML configuration files to config.h and config.mk by itself. This
patch adds XSLT scripts that do the trick.
Unfortunately, the scenario XML file along is not sufficient to generate
config.h and config.mk, though. In addition to resource
allocation (i.e. allocating physical RAM for the hypervisor), the
transformation also need to do the following:
1. Translate UART info in board XML into several configuration entries
depending on the UART selected in the scenario XML.
2. Use the MAX_MSIX_TABLE_NUM value in the board XML if the scenario
XML does not specify it.
In order to use XSLT to transform both XMLs in one shot, a template is
provided to create another XML that includes (using XInclude) both board
and scenario XMLs as sub-nodes. It will be instantiated once the
transformations are integrated in the following patch.
v2:
* Add `allocation.xml` to `unified.xml` to include the results from static
allocation.
* Use HV_RAM_START and HV_RAM_SIZE in allocation results if they are not
explicitly specified in the scenario XML.
Tracked-On: #5644
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
In order for a unified interface for generating configuration sources from
board and scenario XMLs, this patch introduces a script named genconf.sh
which takes XML files as inputs and generate sources under the specified
directory. Once used in Makefiles, this script helps to minimize the
impacts on the Makefiles when we refine the configuration source generation
process in the future.
This patch also adds a non-zero return value to board_cfg_gen.py and
scenario_cfg_gen.py so that we do not need to inspect the logs to determine
if the generation succeeds.
Tracked-On: #5644
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
The commit 'Fix: HV: VM OS failed to assign new address to pci-vuart
BARs' need more reshuffle.
Tracked-On: #5491
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Signed-off-by: Eddie Dong <eddie.dong@intel.com>
When "signal_event" is called, "wait_event" will actually not block.
So it is ok to remove this line.
Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
Now, we use hash table to maintain intx irq mapping by using
the key generated from sid. So once the entry is added,we can
not update source ide any more. Otherwise, we can't locate the
entry with the key generated from new source ide.
For source id change, remove_remapping/add_remapping is used
instead of update source id directly if entry was added already.
Tracked-On: #5640
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This patch move the split-lock logic into dedicated file
to reduce LOC. This may make the logic more clear.
Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
This patch adds a cache register for VMX_PROC_VM_EXEC_CONTROLS
to avoid the frequent VMCS access.
Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
The TF is visible to guest which may be modified by
the guest, so it is not a safe method to emulate the
split-lock. While MTF is specifically designed for
single-stepping in x86/Intel hardware virtualization
VT-x technology which is invisible to the guest. Use MTF
to single step the VCPU during the emulation of split lock.
Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
For a SMP guest, split-lock check may happen on
multiple vCPUs simultaneously. In this case, one
vCPU at most can be allowed running in the
split-lock emulation window. And if the vCPU is
doing the emulation, it should never be blocked
in the hypervisor, it should go back to the guest
to execute the lock instruction immediately and
trap back to the hypervisor with #DB to complete the
split-lock emulation.
Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
When wrong BAR address is set for pci-vuart, OS may assign a
new BAR address to it. Pci-vuart BAR can't be reprogrammed,
for its wrong fixed value. That can may because pci_vbar.fixed and
pci_vbar.type has overlap in abstraction, pci_vbar.fixed
has a confusing name, pci_vbar.type has PCIBAR_MEM64HI which is not
really a type of pci BARs.
So replace pci_vbar.type with pci_vbar.is_mem64hi, and change
pci_vbar.fixed to an union type with new name pci_vbar.bar_type.
Tracked-On: #5491
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
We have trapped the #DB for split-lock emulation.
Only fault exception need RIP being retained.
Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
xchg may also cause the #AC for split-lock check.
This patch adds this emulation.
1. Kick other vcpus of the guest to stop execution
if the guest has more than one vcpu.
2. Emulate the xchg instruction.
3. Notify other vcpus (if any) to restart execution.
Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This patch adds the split-lock emulation.
If a #AC is caused by instruction with LOCK prefix then
emulate it, otherwise, inject it back as it used to be.
1. Kick other vcpus of the guest to stop execution
and set the TF flag to have #DB if the guest has more
than one vcpu.
2. Skip over the LOCK prefix and resume the current
vcpu back to guest for execution.
3. Notify other vcpus to restart exception at the end
of handling the #DB since we have completed
the LOCK prefix instruction emulation.
Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Check hardware support for all features in CR4,
and hide bits from guest by vcpuid if they're not supported
for guests OS.
Tracked-On: #5586
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
- The current code to virtualize CR0/CR4 is not
well designed, and hard to read.
This patch reshuffle the logic to make it clear
and classify those bits into PASSTHRU,
TRAP_AND_PASSTHRU, TRAP_AND_EMULATE & reserved bits.
Tracked-On: #5586
Signed-off-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
While following two styles are both correct, the 2nd one is simpler.
bool is_level_triggered;
1. if (is_level_triggered == true) {...}
2. if (is_level_triggered) {...}
This patch cleans up the style in hypervisor.
Tracked-On: #861
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
From SDM Vol.2C - XSETBV instruction description,
If CR4.OSXSAVE[bit 18] = 0,
execute "XSETBV" instruction will generate #UD exception.
From SDM Vol.3C 25.1.1,#UD exception has priority over VM exits,
So if vCPU execute "XSETBV" instruction when CR4.OSXSAVE[bit 18] = 0,
VM exits won't happen.
While hv inject #GP if vCPU execute "XSETBV" instruction
when CR4.OSXSAVE[bit 18] = 0.
It's a wrong behavior, this patch will fix the bug.
Tracked-On: #4020
Signed-off-by: Junming Liu <junming.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Memory BAR of ivshmem device is 64-bit, 2 BAR registers
are used, counting in one 32-bit MMIO bar and and one
32-bit vMSIX table bar, number of bars "nr_bars" shall
be 4 instead of 3.
Tracked-On: #5490
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
- fix bug in 'hcall_destroy_vdev()', the availability of
vpci device shall be checked on 'target_vm".
- refine 'vpci_update_one_vbar()' to avoid potential NULL
pointer access.
Tracked-On: #5490
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
It is possible for more than one vCPUs to trigger shutdown on an RTVM.
We need to avoid entering VM_READY_TO_POWEROFF state again after the
RTVM has been paused or shut down.
Also, make sure an RTVM enters VM_READY_TO_POWEROFF state before it can
be paused.
v1 -> v2:
- rename to poweroff_if_rt_vm for better clarity
Tracked-On: #5411
Signed-off-by: Peter Fang <peter.fang@intel.com>
Hypercall handlers for post-launched VMs automatically grab the vm_lock
in dispatch_sos_hypercall(). Remove the use of vm_lock inside the
handler.
Tracked-On: #5411
Signed-off-by: Peter Fang <peter.fang@intel.com>
Currently, ACRN only support shutdown when triple fault happens, because ACRN
doesn't present/emulate a virtual HW, i.e. port IO, to support shutdown. This
patch emulate a virtual shutdown component, and the vACPI method for guest OS
to use.
Pre-launched VM uses ACPI reduced HW mode, intercept the virtual sleep control/status
registers for pre-launched VMs shutdown
Tracked-On: #5411
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Like post-launched VMs, for pre-launched VMs, the ACPI reset register
is also fixed at 0xcf9 and the reset value is 0xE, so pre-launched VMs
now also use ACPI reset register for rebooting.
Tracked-On: #5411
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
A VM may transition to VM_PAUSED state while its console is being used.
Jump back to the HV shell if this happens so the console does not appear
stuck.
Tracked-On: #5411
Signed-off-by: Peter Fang <peter.fang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
More than one VM may request shutdown on the same pCPU before
shutdown_vm_from_idle() is called in the idle thread when pCPUs are
shared among VMs.
Use a per-pCPU bitmap to store all the VMIDs requesting shutdown.
v1 -> v2:
- use vm_lock to avoid a race on shutdown
Tracked-On: #5411
Signed-off-by: Peter Fang <peter.fang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add two Kconfig pSRAM config:
one for whether to enable the pSRAM on the platfrom or not;
another for if the pSRAM is enabled on the platform whether to enable
the pSRAM in the pre-launched RTVM.
If we enable the pSRAM on the platform, we should remove the pSRAM EPT
mapping from the SOS to prevent it could flush the pSRAM cache.
Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
1.Modified the virtual e820 table for pre-launched VM. We added a
segment for pSRAM, and thus lowmem RAM is split into two parts.
Logics are added to deal with the split.
2.Added EPT mapping of pSRAM segment for pre-launched RTVM if it
uses pSRAM.
Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
pSRAM memory should be cachable. However, it's not a RAM or a normal MMIO,
so we can't use the an exist API to do the EPT mapping and set the EPT cache
attribute to WB for it. Now we assume that SOS must assign the PSRAM area as
a whole and as a separate memory region whose base address is PSRAM_BASE_HPA.
If the hpa of the EPT mapping region is equal to PSRAM_BASE_HPA, we think this
EPT mapping is for pSRAM, we change the EPT mapping cache attribute to WB.
And fix a minor bug when SOS trap out to emulate wbinvd when pSRAM is enabled.
Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Use ept_flush_leaf_page to emulate guest WBINVD when PTCM is enabled and skip
the pSRAM in ept_flush_leaf_page.
TODO: do we need to emulate WBINVD in HV side.
Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Rename hv_access_memory_region_update to ppt_clear_user_bit to
verb + object style.
Tracked-On: #5330
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Temporarily remove NX bit of PTCM binary in pagetable during pSRAM
initialization:
1.added a function ppt_set_nx_bit to temporarily remove/restore the NX bit of
a given area in pagetable.
2.Temporarily remove NX bit of PTCM binary during pSRAM initialization to make
PTCM codes executable.
3. TODO: We may use SMP call to flush TLB and do pSRAM initilization on APs.
Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The added parse_ptct function will parse native ACPI PTCT table to
acquire information like pSRAM location/size/level and PTCM location,
and save them.
Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
1.We added a function init_psram to initialize pSRAM as well as some definitions.
Both AP and BSP shall call init_psram to make sure pSRAM is initialized, which is
required by PTCM.
BSP:
To parse PTCT and find the entry of PTCM command function, then call PTCM ABI.
AP:
Wait until BSP has done the parsing work, then call the PTCM ABI.
Synchronization of AP and BSP is ensured, both inside and outside PTCM.
2. Added calls of init_psram in init_pcpu_post to initialize pSRAM in HV booting phase
Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
According 11.5.1 Cache Control Registers and Bits, Intel SDM Vol 3,
change CR0.CD will not flush cache to insure memory coherency. So
it's not needed to call wbinvd to flush cache in ACRN Hypervisor.
That's what the guest should do.
Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
cleanup vpci structure when shutdown_vm to avoid use uninitialized data
after reboot.
Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add cteate method for vmcs9900 vdev in hypercalls.
The destroy method of ivshmem is also suitable for other emulated vdev,
move it into hcall_destroy_vdev() for all emulated vdevs
Tracked-On: #5394
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
if vuart type is pci-vuart, then use MSI interrupt
split vuart_toggle_intr() control flow into vuart_trigger_level_intr() &
trigger_vmcs9900_msix(), because MSI is edge triggered, no deassertion
operation. Only trigger MSI for pci-vuart when assert interrupt.
Tracked-On: #5394
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
support pci-vuart type, and refine:
1.Rename init_vuart() to init_legacy_vuarts(), only init PIO type.
2.Rename deinit_vuart() to deinit_legacy_vuarts(), only deinit PIO type.
3.Move io handler code out of setup_vuart(), into init_legacy_vuarts()
4.add init_pci_vuart(), deinit_pci_vuart, for one pci vuart vdev.
and some change from requirement:
1.Increase MAX_VUART_NUM_PER_VM to 8.
Tracked-On: #5394
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
The vuart_read()/vuart_write() are coupled with PIO vuart type. Move
the non-type related code into vuart_read_reg()/vuart_write_reg(), so
that we can re-use them to handle MMIO request of pci-vuart type.
Tracked-On: #5394
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
- Refactor pci_dev_c.py to insert devices information per VMs
- Add function to get unused vbdf form bus:dev.func 00:00.0 to 00:1F.7
Add pci devices variables to vm_configurations.c
- To pass the pci vuart information form tool, add pci_dev_num and
pci_devs initialization by tool
- Change CONFIG_SOS_VM in hypervisor/include/arch/x86/vm_config.h to
compromise vm_configurations.c
Tracked-On: #5426
Signed-off-by: Yang, Yu-chu <yu-chu.yang@intel.com>
The new (1.8.17) release of doxygen is complaining about errors in the
doxygen comments that were's reported by our current 1.8.13 release.
Let's fix these now. In a separate PR we'll also update some
configuration settings that will be obsolete, in preparation for moving
to this newer version.
[External_System_ID]ACRN-6774
Tracked-On: #5385
Signed-off-by: David B. Kinder <david.b.kinder@intel.com>
In pre-launched VM the GPA of vmsix BAR which is used for vmsix
over msi is calculated/allocated by acrn-config tool. The GPA
needs to be assigned to vdev when vdev is initialized. The
assignment is only needed for pre-launched VM. For SOS kernel
will reprogram the Bar base when startup. For post-launched VM
the Bar GPA will be assigned by device model via hypercall.
Tracked-On: #5316
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
When init_vmsix_on_msi is called during the initialization of a pt
device, the vmsix bar used for vmsix over msi is just created. No
mapping/unmapping is done and pci_vdev_write_vbar should be called
instead of vdev_pt_write_vbar at the time. Currently the Bar mapping
is delayed till OS sizing the Bar. Backup vbar base_gpa to mmio_gpa
is not required here becuase it will be done later when Bar mapping.
Tracked-On: #5316
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
- Since de-privilege boot is removed, we no longer need to save boot
context in boot time.
- cpu_primary_start_64 is not an entry for ACRN hypervisor any more,
and can be removed.
Tracked-On: #5197
Signed-off-by: Zide Chen <zide.chen@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
This patch enables doorbell feature for hv-land
ivshmem device to support interrupt notification
between VMs that use inter-VM(ivshmem) devices.
Tracked-On: #5407
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Li, Fei <fei1.li@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This function can be used by other modules instead of hypercall
handling only, hence move it to vlapic.c
Tracked-On: #5407
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Li, Fei <fei1.li@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
- write_vmsix_cap_reg(): emulates vmsix cap writes.
write_pt_vmsix_cap_reg(): emulates msix cap write
for PT devices.
- rw_vmsix_table(): emulates vmsix table bar space access.
- vmsix_handle_table_mmio_access(): emulates the vmsix
bar space access only.
- pt_vmsix_handle_table_mmio_access(): emulates the vmsix
bar space access and remap msi entry for PT device if
write operation is executed.
- rename 'init_vmsix()' and 'deinit_vmsix()' to
'init_vmsix_pt()' and 'deinit_vmsix_pt()' respectively,
they're for PT devices only.
- remove below 2 functions,call
'pci_vdev_read_vcfg()' directly in cases they're used.
- 'read_vmsi_cap_reg()'
- 'read_vmsix_cap_reg()'
Tracked-On: #5407
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Li, Fei <fei1.li@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Eddie Done <eddie.dong@intel.com>
vmsix.c originally covers ptdev case but ACRN hypervisor
need to support pure virtual PCI mediator, such as ivshmem
device in this patch set.
For better understanding the code changes from patch
perspective, split the changes to several small patches.
This patch moves most original vmsix code to pci_pt.c
as they're mixed with ptdev specific operations.
The subsequent patches will start the detail abstraction change.
Tracked-On: #5407
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Li Fei <fei1.li@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Now ACRN supports direct boot mode, which could be SBL/ABL, or GRUB boot.
Thus the vboot wrapper layer can be removed and the direct boot functions
don't need to be wrapped in direct_boot.c:
- remove call to init_vboot(), and call e820_alloc_memory() directly at the
time when the trampoline buffer is actually needed.
- Similarly, call CPU_IRQ_ENABLE() instead of the wrapper init_vboot_irq().
- remove get_ap_trampoline_buf(), since the existing function
get_trampoline_start16_paddr() returns the exact same value.
- merge init_general_vm_boot_info() into init_vm_boot_info().
- remove vm_sw_loader pointer, and call direct_boot_sw_loader() directly.
- move get_rsdp_ptr() from vboot_wrapper.c to multiboot.c, and remove the
wrapper over two boot modes.
Tracked-On: #5197
Signed-off-by: Zide Chen <zide.chen@intel.com>
Since now we support direct boot only, we don't have to use FIRMWARE variable
to differentiate between sbl/GRUB and UEFI boot.
After this change:
- "FIRMWARE=sbl/uefi" should be removed from make commands.
- the firmware name is removed from the installed ACRN image. For example,
acrn.apl-up2.sbl.sdc.32.out will be changed to acrn.apl-up2.sdc.32.out.
Tracked-On: #5197
Signed-off-by: Zide Chen <zide.chen@intel.com>