IC_ADD_HV_VDEV -> ACRN_IOCTL_CREATE_VDEV
IC_REMOVE_HV_VDEV -> ACRN_IOCTL_DESTROY_VDEV
struct acrn_emul_dev -> struct acrn_vdev
Also, move struct acrn_vdev to acrn_common.h as this structure is used
by both DM and HV.
Tracked-On: #6282
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
IC_ASSIGN_MMIODEV -> ACRN_IOCTL_ASSIGN_MMIODEV
IC_DEASSIGN_MMIODEV -> ACRN_IOCTL_DEASSIGN_MMIODEV
struct acrn_mmiodev has slight change. Move struct acrn_mmiodev into
acrn_common.h because it is used by both DM and HV.
Tracked-On: #6282
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
IC_ASSIGN_PCIDEV -> ACRN_IOCTL_ASSIGN_PCIDEV
IC_DEASSIGN_PCIDEV -> ACRN_IOCTL_DEASSIGN_PCIDEV
QUIRK_PTDEV -> ACRN_PTDEV_QUIRK_ASSIGN
struct acrn_assign_pcidev -> struct acrn_pcidev
Move struct acrn_pcidev into acrn_common.h because it is used by both
DM and HV.
Tracked-On: #6282
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
struct hc_platform_info -> struct acrn_platform_info
MAX_PLATFORM_LAPIC_IDS -> ACRN_PLATFORM_LAPIC_IDS_MAX
A layout change to the struct hc_platform_info is that move
max_kata_containers to back of vm_config_size,
uint16_t max_vcpus_per_vm;
uint16_t max_vms;
uint32_t vm_config_size;
uint64_t max_kata_containers;
Then, they are nature 64-bits aligned.
Tracked-On: #6282
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Guest may not use INVEPT instruction after enabling any of bits 2:0 from
0 to 1 of a present EPT entry, then the shadow EPT entry has no chance
to sync guest EPT entry. According to the SDM,
"""
Software may use the INVEPT instruction after modifying a present EPT
paging-structure entry (see Section 28.2.2) to change any of the
privilege bits 2:0 from 0 to 1.1 Failure to do so may cause an EPT
violation that would not otherwise occur. Because an EPT violation
invalidates any mappings that would be used by the access that caused
the EPT violation (see Section 28.3.3.1), an EPT violation will not
recur if the original access is performed again, even if the INVEPT
instruction is not executed.
"""
Sync the afterthought of privilege bits from guest EPT entry to shadow
EPT entry to cover above case.
Tracked-On: #5923
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
MSR_IA32_VMX_EPT_VPID_CAP is 64 bits. Using 32 bits MACROs with it may
cause the bit expression wrong.
Unify the MSR_IA32_VMX_EPT_VPID_CAP operation with 64 bits definition.
Tracked-On: #5923
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
It seems important that passthru device's max payload settings match
the settings on the native device otherwise passthru device may not work.
So we have to set vrp's max payload capacity as native root port
otherwise we may accidentally change passthru device's max payload
since during guest OS's pci device enumeration, pass-thru device will
renegotiate its max payload's setting with vrp.
Tracked-On: #5915
Signed-off-by: Rong Liu <rong.l.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
For ramdisk, need to double check the limit of ramdisk GPA when locate
ramdisk load addr;
For SOS kernel load addr, need not to consider position of hypervisor
start and end address since the range has been set to e820 RESERVED.
Tracked-On: #5879
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
When hypervisor boots, the multiboot modules have been loaded to host space
by bootloader already. The space range of pre-launched VM modules is also
exposed to SOS VM, so SOS VM kernel might pick this range to extract kernel
when KASLR enabled. This would corrupt pre-launched VM modules and result in
pre-launched VM boot fail.
This patch will try to fix this issue. The SOS VM will not be loaded to guest
space until all pre-launched VMs are loaded successfully.
Tracked-On: #5879
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
We should not hardcode the VM ramdisk load address right after kernel
load address because of two reasons:
1. Per Linux kernel boot protocol, the Kernel need a size of
contiguous memory(i.e. init_size field in zeropage) from
its load address to boot, then the address would overlap
with ramdisk;
2. The hardcoded address could not be ensured as a valid address
in guest e820 table, especially with a huge ramdisk;
Also we should not hardcode the VM kernel load address to its pref_address
which work for non-relocatable kernel only. For a relocatable kernel,
it could run from any valid address where bootloader load to.
The patch will set the VM kernel and ramdisk load address by scanning
guest e820 table with find_space_from_ve820() api:
1. For SOS VM, the ramdisk has been loaded by multiboot bootloader
already so set the load address as module source address,
the relocatable kernel would be relocated to a appropriate address
out space of hypervisor and boot modules to avoid guest memory
copy corruption;
2. For pre-launched VM, the kernel would be loaded to pref_address
first, then ramdisk will be put to a appropriate address out space
of kernel according to guest memory layout and maximum ramdisk
address limit under 4GB;
Tracked-On: #5879
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The SOS VM should not use host efi memmap directly, since there are some
memory ranges which reserved by hypersior and pre-launched VM should not
be exposed to SOS VM. These memory ranges should be filtered from SOS VM
efi memmap, otherwise it would caused unexpected issues. For example, The
SOS kernel kaslr will try to find the random address for extracted kernel
image in EFI table first. So it's possible that these reserved memory is
picked for extracted kernel image. This will make SOS kernel boot fail.
The patch would create efi memmory map for SOS VM and pass the memory map
info to zeropage for loading SOS VM kernel. The boot service related region
in host efi memmap is also kept for SOS VM so that SOS VM could have full
capability of EFI services as host.
Tracked-On: #5626
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The bootargs module represents a string buffer and there is a NULL char at
the end so its size should not be calculated by strnlen_s(), otherwise the
NULL char will be ignored in gpa copy and result in kernel boot fail;
Tracked-On: #6162
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Previously the load GPA of LaaG boot params like zeropage/cmdline and
initgdt are all hard-coded, this would bring potential LaaG boot issues.
The patch will try to fix this issue by finding a 32KB load_params memory
block for LaaG to store these guest boot params.
For other guest with raw image, in general only vgdt need to be cared of so
the load_params will be put at 0x800 since it is a common place that most
guests won't touch for entering protected mode.
Tracked-On: #5626
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The API would search ve820 table and return a valid GPA when the requested
size of memory is available in the specified memory range, or return
INVALID_GPA if the requested memory slot is not available;
Tracked-On: #5626
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The memory range of [0xA0000, 0xFFFFF] is a known reserved area for BIOS,
actually Linux kernel would enforce this area to be reserved during its
boot stage. Set this area to usable would cause potential compatibility
issues.
The patch set the range to reserved type to make it consistent with the
real world.
BTW, There should be a EBDA(Entended BIOS DATA Area) with reserved type
exist right before 0xA0000 in real world for non-EFI boot. But given ACRN
has no legacy BIOS emulation, we simply skipped the EBDA in vE820.
Tracked-On: #5626
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Hypervisor use e820_alloc_memory() api to allocate memory for trampoline code
and ept pages, whereas the usable ram in hv_e820 might include efi boot service
region if system boot from uefi environment, this would result in some uefi
service broken in SOS. These boot service region should be filtered from
hv_e820.
This patch will parse the efi memory descriptor entries info from efi memory
map pointer when system boot from uefi environment, and then initialize hv_e820
accordingly, that all efi boot service region would be kept as reserved in
hv_e820.
Please note the original efi memory map could be above 4GB address space,
so the efi memory parsing process must be done after enable_paging().
Tracked-On: #5626
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
When hypervisor boot from efi environment, the efi memory layout should be
considered as main memory map reference for hypervisor use. This patch add
function that parses the efi memory descriptor entries info from efi memory
map pointer and stores the info into a static hv_memdesc[] array.
Tracked-On: #5626
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
With this patch, the hv_e820 will be initialized after enable paging. This
is because the hv_e820 will be initialized from efi mmap when system boot
from uefi, which the efi mmap could be above 4G space.
Tracked-On: #5626
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The simply rename mi_acpi_rsdp_va in acrn_boot_info struct to acpi_rsdp_va;
Tracked-On: #5661
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
This patch has below changes:
1. rename mi_efi_info to uefi_info in struct acrn_boot_info;
2. remove redundant "efi_" prefix for efi_info struct members;
3. The efi_info structure in acrn_boot_info struct is defined as
same as Linux kernel so the native efi info from boot loader
is passed to SOS zeropage with memcpy() api directly. Now replace
memcpy() with detailed struct member assignment;
4. add boot_from_uefi() api;
Tracked-On: #5661
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Use more generic abi_mmap struct to replace multiboot_mmap struct in
acrn_boot_info;
Tracked-On: #5661
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Use more generic abi_module struct to replace multiboot_module struct in
acrn_boot_info;
Tracked-On: #5661
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The patch has below changes:
1. rename mi_loader_name in acrn_boot_info struct to loader_name;
2. change loader_name type from pointer to array to avoid accessing
original multiboot info region;
3. remove mi_drivers_length and mi_drivers_addr which are never used;
Tracked-On: #5661
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The name of mi_cmdline in acrn_boot_info structure would cause confusion with
mi_cmdline in multiboot_info structure, rename it to cmdline. At the same time,
the data type is changed from pointer to array to avoid accessing the original
multiboot info region which might be used by other software modules.
Tracked-On: #5661
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Add a wrapper API init_acrn_boot_info() so that it could be used to boot
ACRN with any boot protocol;
Another change is change term of multiboot1 to multiboot because there is
no such term officially;
Tracked-On: #5661
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Given the structure in multiboot.h could be used for any boot protocol,
use a more generic name "boot.h" instead;
Tracked-On: #5661
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The mi_flags is not needed any more so remove it from acrn_boot_info struct;
Tracked-On: #5661
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The acrn_multiboot_info structure stores acrn specific boot info and should
not be limited to support multiboot protocol related structure only.
This patch only do below changes:
1. change name of acrn_multiboot_info to acrn_boot_info;
2. change name of mbi to abi because of the change in 1, also the
naming might bring confusion with native multiboot info;
Tracked-On: #5661
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
ACRN used to support deprivileged boot mode which do not need multiboot
modules, while direct boot mode need multiboot modules at lease for
service VM bzImage, so ACRN postponed the multiboot modules sanity check
in init_vm_boot_info.
Now deprivileged boot mode was totally removed, so we can do multiboot
module check in sanitize_acrn_multiboot_info().
Tracked-On: #5661
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Replace rdstc() and get_tsc_khz() with their architectural agnostic
counterparts cpu_ticks() and cpu_tickrate().
Tracked-On: #5920
Signed-off-by: Yi Liang <yi.liang@intel.com>
e820_alloc_memory() splits one E820 entry into two entries. With vEPT
enabled, e820_alloc_memory() is called one more. On some platforms, the
e820 entries might exceed 32.
Enlarge E820_MAX_ENTRIES to 64. Please note, it must be less than 128
due to constrain of zeropage. Linux kernel defines it as 128.
Tracked-On: #6168
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
vmptrld_vmexit_handler() has a same code snippet with
vmclear_vmexit_handler(). Wrap the same code snippet as a static
function clear_vmcs02().
There is only a small logic change that add
nested->current_vmcs12_ptr = INVALID_GPA
in vmptrld_vmexit_handler() for the old VMCS. That's reasonable.
Tracked-On: #5923
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
get_ept_entry() actually returns the EPTP of a VM. So rename it to
get_eptp() for readability.
Tracked-On: #5923
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
We need to deny accesses from SOS to the HV owned UART device, otherwise
SOS could have direct access to this physical device and mess up the HV
console.
If ACRN debug UART is configured as PIO based, For example,
CONFIG_SERIAL_PIO_BASE is generated from acrn-config tool, or the UART
config is overwritten by hypervisor parameter "uart=port@<port address>",
it could run into problem if ACRN doesn't emulate this UART PIO port
to SOS. For example:
- none of the ACRN emulated vUART devices has same PIO port with the
port of the debug UART device.
- ACRN emulates PCI vUART for SOS (configure "console_vuart" with
PCI_VUART in the scenario configuration)
This patch fixes the above issue by masking PIO accesses from SOS.
deny_hv_owned_devices() is moved after setup_io_bitmap() where
vm->arch_vm.io_bitmap is initialized.
Commit 50d852561 ("HV: deny HV owned PCI bar access from SOS") handles
the case that ACRN debug UART is configured as a PCI device. e.g.,
hypervisor parameter "uart=bdf@<BDF value>" is appended.
If the hypervisor debug UART is MMIO based, need to configured it as
a PCI type device, so that it can be hidden from SOS.
Tracked-On: #5923
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Malicious input 'index' may trigger buffer
overflow on array 'irte_alloc_bitmap[]'.
This patch validate that 'index' shall be
less than 'CONFIG_MAX_IR_ENTRIES' and also
remove unnecessary check on 'index' in
'ptirq_free_irte()' function with this fix.
Tracked-On: #6132
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
vlapic_write handle 'offset' that is valid and ignore
all other invalid 'offset'. so ASSERT on this 'offset'
input is unnecessary.
This patch removes above ASSERT to avoid potential
hypervisor crash by guest malicious input when debug
build is used.
Tracked-On: #6131
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
For a pci BAR, its size aligned bits have fixed to 0(except the memory
type bits, they have another fixed value), they are read-only.
When write ~0U to BAR for sizing, (type_bits | size_mask) is written
into BAR.
So do not need to distinguish between sizing vBAR and programming vBAR.
When write a value to vBAR, always store (value & size_mask | type_bit)
to vfcg.
pci_vdev_read_vbar() is unnecessary, because it is only need to read
vcfg.
Tracked-On: #6011
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Li Fei <fei1.li@intel.com>
When guest doing BAR re-programming, we should check whether
the base address of the BAR is valid.This patch does this check by:
1. whether the gpa is located in the responding MMIO window
2. whether the gpa is aligned with the BAR size
Tracked-On: #6011
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Li Fei <fei1.li@intel.com>
Now we use pci_vdev_update_vbar_base to update vBAR base address when
guest re-programming BAR. For a IO BAR, we would calculate the 32 bits
base address then mask the high 16 bits. However, the mask code would
never be called since the first if condition statement is always true.
This patch fix it by move the unamsk code into the first if condition
statement.
Tracked-On: #6011
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Li Fei <fei1.li@intel.com>
generate_shadow_ept_entry() didn't verify the correctness of the requested
guest EPT mapping. That might leak host memory access to L2 VM.
To simplify the implementation of the guest EPT audit, hide capabilities
'map 2-Mbyte page' and 'map 1-Gbyte page' from L1 VM. In addition,
minimize the attribute bits of EPT entry when create a shadow EPT entry.
Also, for invalid requested mapping address, reflect the EPT_VIOLATION to
L1 VM.
Here, we have some TODOs:
1) Enable large page support in generate_shadow_ept_entry()
2) Evaluate if need to emulate the invalid GPA access of L2 in HV directly.
3) Minimize EPT entry attributes.
Tracked-On: #5923
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
L1 VM changes the guest EPT and do INVEPT to invalidate the previous
TLB cache of EPT entries. The shadow EPT replies on INVEPT instruction
to do the update.
The target shadow EPTs can be found according to the 'type' of INVEPT.
Here are two types and their target shadow EPT,
1) Single-context invalidation
Get the EPTP from the INVEPT descriptor. Then find the target
shadow EPT.
2) Global invalidation
All shadow EPTs of the L1 VM.
The INVEPT emulation handler invalidate all the EPT entries of the
target shadow EPTs.
Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
When a shadow EPT is not used anymore, its resources need to be
released.
free_sept_table() is introduced to walk the whole shadow EPT table and
free the pagetable pages.
Please note, the PML4E page of shadow EPT is not freed by
free_sept_table() as it still be used to present a shadow EPT pointer.
Tracked-On: #5923
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
With shadow EPT, the hypervisor walks through guest EPT table:
* If the entry is not present in guest EPT, ACRN injects EPT_VIOLATION
to L1 VM and resumes to L1 VM.
* If the entry is present in guest EPT, do the EPT_MISCONFIG check.
Inject EPT_MISCONFIG to L1 VM if the check failed.
* If the entry is present in guest EPT, do permission check.
Reflect EPT_VIOLATION to L1 VM if the check failed.
* If the entry is present in guest EPT but shadow EPT entry is not
present, create the shadow entry and resumes to L2 VM.
* If the entry is present in guest EPT but the GPA in the entry is
invalid, injects EPT_VIOLATION to L1 VM and resumes L1 VM.
Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
* Hide 5 level EPT capability, let L1 guest stick to 4 level EPT.
* Access/Dirty bits are not support currently, hide corresponding EPT
capability bits.
* "Mode-based execute control for EPT" is also not support well
currently, hide its capability bit from MSR_IA32_VMX_PROCBASED_CTLS2.
Tracked-On: #5923
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>