For platform with HLAT (Hypervisor-managed Linear Address Translation)
capability, the hypervisor shall hide this feature to its guest.
This patch adds MSR_IA32_VMX_PROCBASED_CTLS3 MSR to unsupported MSR
list.
The presence of this MSR is determined by 1-setting of bit 49 of MSR
MSR_IA32_VMX_PROCBASED_CTLS. which is already in unsupported MSR list. [2]
Related documentations:
[1] Intel Architecture Instruction Set Extensions, version Feb 16, 2021,
Ch 6.12
[2] Intel KeyLocker Specification, Sept 2020, Ch 7.2
Tracked-On: #5895
Signed-off-by: Yifan Liu <yifan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
sanitize_pte is used to set page table entry to map to an sanitized page to
mitigate l1tf. It should belongs to pgtable module. So move it to pagetable.c
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
lookup_address is used to lookup a pagetable entry by an address. So rename it
to pgtable_lookup_entry to indicate this clearly.
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
alloc_page/free_page should been called in pagetable module. In order to do this,
we add pgtable_create_root and pgtable_create_trusty_root to create PML4 page table
page for normal world and secure world.
After this done, no one uses alloc_ept_page. So remove it.
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Add pgtable_create_trusty_root to allocate a page for trusty PML4 page table page.
This function also copy PDPT entries from Normal world to Secure world.
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Add pgtable_create_root to allocate a page for PMl4 page table page.
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Rename mmu_add to pgtable_add_map;
Rename mmu_modify_or_del to pgtable_modify_or_del_map.
And move these functions declaration into pgtable.h
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Requires explicit arch path name in the include directive.
The config scripts was also updated to reflect this change.
Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Each .c file includes the arch specific irq header file (with full
path) by itself if required.
Tracked-On: #5825
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
A new x86/guest/virq.h head file now contains all guest
related interrupt handling API.
Tracked-On: #5825
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Each of them now resides in a separate .c file.
Tracked-On: #5825
Signed-off-by: Yang, Yu-chu <yu-chu.yang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The common irq file is responsible for managing the central
irq_desc data structure and provides the following APIs for
host interrupt handling.
- init_interrupt()
- reserve_irq_num()
- request_irq()
- free_irq()
- set_irq_trigger_mode()
- do_irq()
API prototypes, constant and data structures belonging to common
interrupt handling are all moved into include/common/irq.h.
Conversely, the following arch specific APIs are added which are
called from the common code at various points:
- init_irq_descs_arch()
- setup_irqs_arch()
- init_interrupt_arch()
- free_irq_arch()
- request_irq_arch()
- pre_irq_arch()
- post_irq_arch()
Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
This is done be adding irq_rsvd_bitmap as an auxiliary bitmap
besides irq_alloc_bitmap.
Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The common IRQ handling routine calls arch specific functions
pre_irq_arch() and post_irq_arch() before and after calling the
registered action function respectively.
Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The common part initializes the global irq_desc data structure while the
arch specific part initialize the HW and its own irq data.
This is one of the preparation steps for spliting IRQ handling into common
and architecture specific parts.
Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Arch specific IRQ data is now an opaque pointer in irq_desc.
This is a preparation step for spliting IRQ handling into common
and architecture specific parts.
Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
This patch moves pgtable definition to pgtable.h and include the proper
header file for page module.
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Move the EPT page table related APIs to ept.c. page module only provides APIs to
allocate/free page for page table page. pagetabl module only provides APIs to
add/modify/delete/lookup page table entry. The page pool and the page table
related APIs for EPT should defined in EPT module.
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Move the MMU page table related APIs to mmu.c. page module only provides APIs to
allocate/free page for page table page. pagetabl module only provides APIs to
add/modify/delete/lookup page table entry. The page pool and the page table
related APIs for MMU should defined in MMU module.
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
We would move the MMU page table related APIs to mmu.c and move the EPT related
APIs to EPT.c. The page table module only provides APIs to add/modify/delete/lookup
page table entry.
This patch separates common APIs and adds separate APIs of page table module
for MMU/EPT.
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
post_uos_sworld_memory are used for post-launched VM which support trusty.
It's more VM related. So move it definition into vm.c
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Per-core software SRAM L2 cache may be flushed by 'mwait'
extension instruction, which guest VM may execute to enter
core deep sleep. Such kind of flushing is not expected when
software SRAM is enabled for RTVM.
Hypervisor disables MONITOR-WAIT support on both hypervisor
and VMs sides to protect above software SRAM from being flushed.
This patch disable ACRN guest MONITOR-WAIT support if software
SRAM is configured.
Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Per-core software SRAM L2 cache may be flushed by 'mwait'
extension instruction, which guest VM may execute to enter
core deep sleep. Such kind of flushing is not expected when
software SRAM is enabled for RTVM.
Hypervisor disables MONITOR-WAIT support on both hypervisor
and VMs sides to protect above software SRAM from being flushed.
This patch disable hypervisor(host) MONITOR-WAIT support and refine
software sram initializaion flow.
Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Below boolean function are defined in this patch:
- is_software_sram_enabled() to check if SW SRAM
feature is enabled or not.
- set global variable 'is_sw_sram_initialized'
to file static.
Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The fields and APIs in old 'struct memory_ops' are used to add/modify/delete
page table (page or entry). So rename 'struct memory_ops' to 'struct pgtable'.
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Use default_access_right field to replace get_default_access_right API.
Tracked-On: #5830
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
RTVM is enforced to use 4KB pages to mitigate CVE-2018-12207 and performance jitter,
which may be introduced by splitting large page into 4KB pages on demand. It works
fine in previous hardware platform where the size of address space for the RTVM is
relatively small. However, this is a problem when the platforms support 64 bits
high MMIO space, which could be super large and therefore consumes large # of
EPT page table pages.
This patch optimize it by using large page for purely data pages, such as MMIO spaces,
even for the RTVM.
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Tracked-On: #5788
To mitigate the page size change MCE vulnerability (CVE-2018-12207), ACRN would
clear the execution permission in the EPT paging-structure entries for large pages
and then intercept an EPT execution-permission violation caused by an attempt to
execution an instruction in the guest.
However, the current code would clear the execution permission in the EPT paging-
structure entries for small pages too when we clearing the the execution permission
for large pages. This would trigger extra EPT violation VM exits.
This patch fix this issue.
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Tracked-On: #5788
SOS_RAM_SIZE/UOS_RAM_SIZE Kconfig are only used to calculate how many pages we
should reserve for the VM EPT mapping.
Now we reserve pages for each VM EPT pagetable mapping by the PLATFORM_RAM_SIZE
not the VM RAM SIZE. This could simplify the reserve logic for us: not need to
take care variable corner cases. We could make assume we reserve enough pages
base on the VM could not use the resources beyond the platform hardware resources.
So remove these two unused VM ram size kconfig.
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Tracked-On: #5788
Add free_page to free page when unmap pagetable.
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Tracked-On: #5788
For FuSa's case, we remove all dynamic memory allocation use in ACRN HV. Instead,
we use static memory allocation or embedded data structure. For pagetable page,
we prefer to use an index (hva for MMU, gpa for EPT) to get a page from a special
page pool. The special page pool should be big enougn for each possible index.
This is not a big problem when we don't support 64 bits MMIO. Without 64 bits MMIO
support, we could use the index to search addrss not larger than DRAM_SIZE + 4G.
However, if ACRN plan to support 64 bits MMIO in SOS, we could not use the static
memory alocation any more. This is because there's a very huge hole between the
top DRAM address and the bottom 64 bits MMIO address. We could not reserve such
many pages for pagetable mapping as the CPU physical address bits may very large.
This patch will use dynamic page allocation for pagetable mapping. We also need
reserve a big enough page pool at first. For HV MMU, we don't use 4K granularity
page table mapping, we need reserve PML4, PDPT and PD pages according the maximum
physical address space (PPT va and pa are identical mapping); For each VM EPT,
we reserve PML4, PDPT and PD pages according to the maximum physical address space
too, (the EPT address sapce can't beyond the physical address space), and we reserve
PT pages by real use cases of DRAM, low MMIO and high MMIO.
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Tracked-On: #5788
memory_ops structure will be changed to store page table related fields.
However, secure world memory base address is not one of them, it's VM
related. So save sworld_memory_base_hva in vm_arch structure directly.
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Tracked-On: #5788
Current memory allocation algorithm is to find the available address from
the highest possible address below max_address. If the function returns 0,
means all memory is used up and we have to put the resource at address 0,
this is dangerous for a running hypervisor.
Also returns 0 would make code logic very complicated, since memcpy_s()
doesn't support address 0 copy.
Tracked-On: #5626
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
In previous code, the rsdp initialization is done in get_rsdp() api implicitly.
The function is called multiple times in following acpi table parsing functions
and the condition (rsdp == NULL) need to be added in each parsing function.
This is not needed since the panic would occur if rsdp is NULL when do acpi
initialization.
Tracked-On: #5626
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Accessing to software SRAM region is not allowed when
software SRAM is pass-thru to prelaunch RTVM.
This patch removes software SRAM region from service VM
EPT if it is enabled for prelaunch RTVM.
Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
This patch denies Service VM the access permission to device resources
owned by hypervisor.
HV may own these devices: (1) debug uart pci device for debug version
(2) type 1 pci device if have pre-launched VMs.
Current implementation exposes the mmio/pio resource of HV owned devices
to SOS, should remove them from SOS.
Tracked-On: #5615
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
This patch denies Service VM the access permission to device
resources owned by pre-launched VMs.
Rationale:
* Pre-launched VMs in ACRN are independent of service VM,
and should be immune to attacks from service VM. However,
current implementation exposes the bar resource of passthru
devices to service VM for some reason. This makes it possible
for service VM to crash or attack pre-launched VMs.
* It is same for hypervisor owned devices.
NOTE:
* The MMIO spaces pre-allocated to VFs are still presented to
Service VM. The SR-IOV capable devices assigned to pre-launched
VMs doesn't have the SR-IOV capability. So the MMIO address spaces
pre-allocated by BIOS for VFs are not decoded by hardware and
couldn't be enabled by guest. SOS may live with seeing the address
space or not. We will revisit later.
Tracked-On: #5615
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The logical processor scoped IWKey can be copied to or from a
platform-scope storage copy called IWKeyBackup. Copying IWKey to
IWKeyBackup is called ‘backing up IWKey’ and copying from IWKeyBackup to
IWKey is called ‘restoring IWKey’.
IWKeyBackup and the path between it and IWKey are protected against
software and simple hardware attacks. This means that IWKeyBackup can be
used to distribute an IWKey within the logical processors in a platform
in a protected manner.
Linux keylocker implementation uses this feature, so they are
introduced by this patch.
Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Different vCPU may have different IWKeys. Hypervisor need do the iwkey
context switch.
This patch introduce a load_iwkey() function to do that. Switches the
host iwkey when the switch_in vCPU satisfies:
1) keylocker feature enabled
2) Different from the current loaded one.
Two opportunities to do the load_iwkey():
1) Guest enables CR4.KL bit.
2) vCPU thread context switch.
load_iwkey() costs ~600 cycles when do the load IWKey action.
Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
KeyLocker is a new security feature available in new Intel CPUs that
protects data-encryption keys for the Advanced Encryption Standard (AES)
algorithm. These keys are more valuable than what they guard. If stolen
once, the key can be repeatedly used even on another system and even
after vulnerability closed.
It also introduces a CPU-internal wrapping key (IWKey), which is a key-
encryption key to wrap AES keys into handles. While the IWKey is
inaccessible to software, randomizing the value during the boot-time
helps its value unpredictable.
Keylocker usage:
- New “ENCODEKEY” instructions take original key input and returns HANDLE
crypted by an internal wrap key (IWKey, init by “LOADIWKEY” instruction)
- Software can then delete the original key from memory
- Early in boot/software, less likely to have vulnerability that allows
stealing original key
- Later encrypt/decrypt can use the HANDLE through new AES KeyLocker
instructions
- Note:
* Software can use original key without knowing it (use HANDLE)
* HANDLE cannot be used on other systems or after warm/cold reset
* IWKey cannot be read from CPU after it's loaded (this is the
nature of this feature) and only 1 copy of IWKey inside CPU.
The virtualization implementation of Key Locker on ACRN is:
- Each vCPU has a 'struct iwkey' to store its IWKey in struct
acrn_vcpu_arch.
- At initilization, every vCPU is created with a random IWKey.
- Hypervisor traps the execution of LOADIWKEY (by 'LOADIWKEY exiting'
VM-exectuion control) of vCPU to capture and save the IWKey if guest
set a new IWKey. Don't support randomization (emulate CPUID to
disable) of the LOADIWKEY as hypervisor cannot capture and save the
random IWKey. From keylocker spec:
"Note that a VMM may wish to enumerate no support for HW random IWKeys
to the guest (i.e. enumerate CPUID.19H:ECX[1] as 0) as such IWKeys
cannot be easily context switched. A guest ENCODEKEY will return the
type of IWKey used (IWKey.KeySource) and thus will notice if a VMM
virtualized a HW random IWKey with a SW specified IWKey."
- In context_switch_in() of each vCPU, hypervisor loads that vCPU's
IWKey into pCPU by LOADIWKEY instruction.
- There is an assumption that ACRN hypervisor will never use the
KeyLocker feature itself.
This patch implements the vCPU's IWKey management and the next patch
implements host context save/restore IWKey logic.
Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In order for a VMM to capture the IWKey values of guests, processors
that support Key Locker also support a new "LOADIWKEY exiting"
VM-execution control in bit 0 of the tertiary processor-based
VM-execution controls.
This patch enables the tertiary VM-execution controls.
Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
KeyLocker is a new security feature available in new Intel CPUs that
protects data-encryption keys for the Advanced Encryption Standard (AES)
algorithm.
This patch emulates Keylocker CPUID leaf 19H to support Keylocker
feature for guest VM.
To make the hypervisor being able to manage the IWKey correctly, this
patch doesn't expose hardware random IWKey capability
(CPUID.0x19.ECX[1]) to guest VM.
Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
Bit19 (CR4_KL) of CR4 is CPU KeyLocker feature enable bit. Hypervisor
traps the bit's writing to track the keylocker feature on/off of guest.
While the bit is set by guest,
- set cr4_kl_enabled to indicate the vcpu's keylocker feature enabled status
- load vcpu's IWKey in host (will add in later patch)
While the bit is clear by guest,
- clear cr4_kl_enabled
This patch trap and passthru the CR4_KL bit to guest for operation.
Tracked-On: #5695
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Current implementation, SOS may allocate the memory region belonging to
hypervisor/pre-launched VM to a post-launched VM. Because it only verifies
the start address rather than the entire memory region.
This patch verifies the validity of the entire memory region before
allocating to a post-launched VM so that the specified memory can only
be allocated to a post-launched VM if the entire memory region is mapped
in SOS’s EPT.
Tracked-On: #5555
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Yonghua Huang <yonghua.huang@intel.com>
Currently, we hardcode the GPA base of Software SRAM
to an address that is derived from TGL platform,
as this GPA is identical with HPA for Pre-launch VM,
This hardcoded address may not work on other platforms
if the HPA bases of Software SRAM are different.
Now, Offline tool configures above GPA based on the
detection of Software SRAM on specific platform.
This patch removes the hardcoding GPA of Software SRAM,
and also renames MACRO 'SOFTWARE_SRAM_BASE_GPA' to
'PRE_RTVM_SW_SRAM_BASE_GPA' to avoid confusing, as it
is for Prelaunch VM only.
Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
- RTCM is initialized in hypervisor only
if RTCM binaries are detected.
- Remove address space of RTCM binary from
Software SRAM region.
- Refine parse_rtct() function, validity of
ACPI RTCT table shall be checked by caller.
Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Physical address to SW SRAM region maybe different
on different platforms, this hardcoded address may
result in address mismatch for SW SRAM operations.
This patch removes above hardcoded address and uses
the physical address parsed from native RTCT.
Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
'ptcm' and 'ptct' are legacy name according
to the latest TCC spec, hence rename below files
to avoid confusing:
ptcm.c -> rtcm.c
ptcm.h -> rtcm.h
ptct.h -> rtct.h
Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Simplify multiboot API by removing the global variable efiloader_sig.
Replaced by constant at the use site.
Tracked-On: #5661
Signed-off-by: Yi Liang <yi.liang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Remove include/boot.h since it contains only assembly variables that
should only be accessed in arch/x86/init.c.
Tracked-On: #5661
Signed-off-by: Yi Liang <yi.liang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Split off definition of "struct efi_info" into a separate header
file lib/efi.h.
Tracked-On: #5661
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
The init_multiboot_info() and sanitize_multiboot_ifno() APIs now
require parameters instead of implicitly relying on global boot
variables.
Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Calling sanitize_multiboot() from init.c instead of cpu.c.
Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
This way, we void exposing acrn_mbi as a global variable.
Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Move multiboot specific declarations from boot.h to multiboot.h.
Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
When "signal_event" is called, "wait_event" will actually not block.
So it is ok to remove this line.
Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
Now, we use hash table to maintain intx irq mapping by using
the key generated from sid. So once the entry is added,we can
not update source ide any more. Otherwise, we can't locate the
entry with the key generated from new source ide.
For source id change, remove_remapping/add_remapping is used
instead of update source id directly if entry was added already.
Tracked-On: #5640
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This patch move the split-lock logic into dedicated file
to reduce LOC. This may make the logic more clear.
Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
This patch adds a cache register for VMX_PROC_VM_EXEC_CONTROLS
to avoid the frequent VMCS access.
Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
The TF is visible to guest which may be modified by
the guest, so it is not a safe method to emulate the
split-lock. While MTF is specifically designed for
single-stepping in x86/Intel hardware virtualization
VT-x technology which is invisible to the guest. Use MTF
to single step the VCPU during the emulation of split lock.
Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
For a SMP guest, split-lock check may happen on
multiple vCPUs simultaneously. In this case, one
vCPU at most can be allowed running in the
split-lock emulation window. And if the vCPU is
doing the emulation, it should never be blocked
in the hypervisor, it should go back to the guest
to execute the lock instruction immediately and
trap back to the hypervisor with #DB to complete the
split-lock emulation.
Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
We have trapped the #DB for split-lock emulation.
Only fault exception need RIP being retained.
Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
xchg may also cause the #AC for split-lock check.
This patch adds this emulation.
1. Kick other vcpus of the guest to stop execution
if the guest has more than one vcpu.
2. Emulate the xchg instruction.
3. Notify other vcpus (if any) to restart execution.
Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This patch adds the split-lock emulation.
If a #AC is caused by instruction with LOCK prefix then
emulate it, otherwise, inject it back as it used to be.
1. Kick other vcpus of the guest to stop execution
and set the TF flag to have #DB if the guest has more
than one vcpu.
2. Skip over the LOCK prefix and resume the current
vcpu back to guest for execution.
3. Notify other vcpus to restart exception at the end
of handling the #DB since we have completed
the LOCK prefix instruction emulation.
Tracked-On: #5605
Signed-off-by: Jie Deng <jie.deng@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Check hardware support for all features in CR4,
and hide bits from guest by vcpuid if they're not supported
for guests OS.
Tracked-On: #5586
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
- The current code to virtualize CR0/CR4 is not
well designed, and hard to read.
This patch reshuffle the logic to make it clear
and classify those bits into PASSTHRU,
TRAP_AND_PASSTHRU, TRAP_AND_EMULATE & reserved bits.
Tracked-On: #5586
Signed-off-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
While following two styles are both correct, the 2nd one is simpler.
bool is_level_triggered;
1. if (is_level_triggered == true) {...}
2. if (is_level_triggered) {...}
This patch cleans up the style in hypervisor.
Tracked-On: #861
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
From SDM Vol.2C - XSETBV instruction description,
If CR4.OSXSAVE[bit 18] = 0,
execute "XSETBV" instruction will generate #UD exception.
From SDM Vol.3C 25.1.1,#UD exception has priority over VM exits,
So if vCPU execute "XSETBV" instruction when CR4.OSXSAVE[bit 18] = 0,
VM exits won't happen.
While hv inject #GP if vCPU execute "XSETBV" instruction
when CR4.OSXSAVE[bit 18] = 0.
It's a wrong behavior, this patch will fix the bug.
Tracked-On: #4020
Signed-off-by: Junming Liu <junming.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
It is possible for more than one vCPUs to trigger shutdown on an RTVM.
We need to avoid entering VM_READY_TO_POWEROFF state again after the
RTVM has been paused or shut down.
Also, make sure an RTVM enters VM_READY_TO_POWEROFF state before it can
be paused.
v1 -> v2:
- rename to poweroff_if_rt_vm for better clarity
Tracked-On: #5411
Signed-off-by: Peter Fang <peter.fang@intel.com>
Currently, ACRN only support shutdown when triple fault happens, because ACRN
doesn't present/emulate a virtual HW, i.e. port IO, to support shutdown. This
patch emulate a virtual shutdown component, and the vACPI method for guest OS
to use.
Pre-launched VM uses ACPI reduced HW mode, intercept the virtual sleep control/status
registers for pre-launched VMs shutdown
Tracked-On: #5411
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Like post-launched VMs, for pre-launched VMs, the ACPI reset register
is also fixed at 0xcf9 and the reset value is 0xE, so pre-launched VMs
now also use ACPI reset register for rebooting.
Tracked-On: #5411
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
More than one VM may request shutdown on the same pCPU before
shutdown_vm_from_idle() is called in the idle thread when pCPUs are
shared among VMs.
Use a per-pCPU bitmap to store all the VMIDs requesting shutdown.
v1 -> v2:
- use vm_lock to avoid a race on shutdown
Tracked-On: #5411
Signed-off-by: Peter Fang <peter.fang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add two Kconfig pSRAM config:
one for whether to enable the pSRAM on the platfrom or not;
another for if the pSRAM is enabled on the platform whether to enable
the pSRAM in the pre-launched RTVM.
If we enable the pSRAM on the platform, we should remove the pSRAM EPT
mapping from the SOS to prevent it could flush the pSRAM cache.
Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
1.Modified the virtual e820 table for pre-launched VM. We added a
segment for pSRAM, and thus lowmem RAM is split into two parts.
Logics are added to deal with the split.
2.Added EPT mapping of pSRAM segment for pre-launched RTVM if it
uses pSRAM.
Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
pSRAM memory should be cachable. However, it's not a RAM or a normal MMIO,
so we can't use the an exist API to do the EPT mapping and set the EPT cache
attribute to WB for it. Now we assume that SOS must assign the PSRAM area as
a whole and as a separate memory region whose base address is PSRAM_BASE_HPA.
If the hpa of the EPT mapping region is equal to PSRAM_BASE_HPA, we think this
EPT mapping is for pSRAM, we change the EPT mapping cache attribute to WB.
And fix a minor bug when SOS trap out to emulate wbinvd when pSRAM is enabled.
Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Use ept_flush_leaf_page to emulate guest WBINVD when PTCM is enabled and skip
the pSRAM in ept_flush_leaf_page.
TODO: do we need to emulate WBINVD in HV side.
Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Rename hv_access_memory_region_update to ppt_clear_user_bit to
verb + object style.
Tracked-On: #5330
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Temporarily remove NX bit of PTCM binary in pagetable during pSRAM
initialization:
1.added a function ppt_set_nx_bit to temporarily remove/restore the NX bit of
a given area in pagetable.
2.Temporarily remove NX bit of PTCM binary during pSRAM initialization to make
PTCM codes executable.
3. TODO: We may use SMP call to flush TLB and do pSRAM initilization on APs.
Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The added parse_ptct function will parse native ACPI PTCT table to
acquire information like pSRAM location/size/level and PTCM location,
and save them.
Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
1.We added a function init_psram to initialize pSRAM as well as some definitions.
Both AP and BSP shall call init_psram to make sure pSRAM is initialized, which is
required by PTCM.
BSP:
To parse PTCT and find the entry of PTCM command function, then call PTCM ABI.
AP:
Wait until BSP has done the parsing work, then call the PTCM ABI.
Synchronization of AP and BSP is ensured, both inside and outside PTCM.
2. Added calls of init_psram in init_pcpu_post to initialize pSRAM in HV booting phase
Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
According 11.5.1 Cache Control Registers and Bits, Intel SDM Vol 3,
change CR0.CD will not flush cache to insure memory coherency. So
it's not needed to call wbinvd to flush cache in ACRN Hypervisor.
That's what the guest should do.
Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add cteate method for vmcs9900 vdev in hypercalls.
The destroy method of ivshmem is also suitable for other emulated vdev,
move it into hcall_destroy_vdev() for all emulated vdevs
Tracked-On: #5394
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
support pci-vuart type, and refine:
1.Rename init_vuart() to init_legacy_vuarts(), only init PIO type.
2.Rename deinit_vuart() to deinit_legacy_vuarts(), only deinit PIO type.
3.Move io handler code out of setup_vuart(), into init_legacy_vuarts()
4.add init_pci_vuart(), deinit_pci_vuart, for one pci vuart vdev.
and some change from requirement:
1.Increase MAX_VUART_NUM_PER_VM to 8.
Tracked-On: #5394
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
- Refactor pci_dev_c.py to insert devices information per VMs
- Add function to get unused vbdf form bus:dev.func 00:00.0 to 00:1F.7
Add pci devices variables to vm_configurations.c
- To pass the pci vuart information form tool, add pci_dev_num and
pci_devs initialization by tool
- Change CONFIG_SOS_VM in hypervisor/include/arch/x86/vm_config.h to
compromise vm_configurations.c
Tracked-On: #5426
Signed-off-by: Yang, Yu-chu <yu-chu.yang@intel.com>
- Since de-privilege boot is removed, we no longer need to save boot
context in boot time.
- cpu_primary_start_64 is not an entry for ACRN hypervisor any more,
and can be removed.
Tracked-On: #5197
Signed-off-by: Zide Chen <zide.chen@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
This function can be used by other modules instead of hypercall
handling only, hence move it to vlapic.c
Tracked-On: #5407
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Li, Fei <fei1.li@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Now ACRN supports direct boot mode, which could be SBL/ABL, or GRUB boot.
Thus the vboot wrapper layer can be removed and the direct boot functions
don't need to be wrapped in direct_boot.c:
- remove call to init_vboot(), and call e820_alloc_memory() directly at the
time when the trampoline buffer is actually needed.
- Similarly, call CPU_IRQ_ENABLE() instead of the wrapper init_vboot_irq().
- remove get_ap_trampoline_buf(), since the existing function
get_trampoline_start16_paddr() returns the exact same value.
- merge init_general_vm_boot_info() into init_vm_boot_info().
- remove vm_sw_loader pointer, and call direct_boot_sw_loader() directly.
- move get_rsdp_ptr() from vboot_wrapper.c to multiboot.c, and remove the
wrapper over two boot modes.
Tracked-On: #5197
Signed-off-by: Zide Chen <zide.chen@intel.com>
update the help message of config SCENARIO to set 2 standard
post-launched VMs for default hybrid_rt scenario in Kconfig.
Tracked-On: #5390
Signed-off-by: Shuang Zheng <shuang.zheng@intel.com>
Acked-by: Victor Sun <victor.sun@intel.com>
The commit of da81a0041d
"HV: add e820 ACPI entry for pre-launched VM" introduced a issue that the
base_hpa and remaining_hpa_size are also calculated on the entry of 32bit
PCI hole which from 0x80000000 to 0xffffffff, which is incorrect;
Tracked-On: #5266
Signed-off-by: Victor Sun <victor.sun@intel.com>
Per PCI Firmware Specification Revision 3.0, 4.1.2. MCFG Table Description:
Memory Mapped Enhanced Configuration Space Base Address Allocation Structure
assign the Start Bus Number and the End Bus Number which could decoded by the
Host Bridge. We should not access the PCI device which bus number outside of
the range of [Start Bus Number, End Bus Number).
For ACRN, we should:
1. Don't detect PCI device which bus number outside the range of
[Start Bus Number, End Bus Number) of MCFG ACPI Table.
2. Only trap the ECAM MMIO size: [MMCFG_BASE_ADDRESS, MMCFG_BASE_ADDRESS +
(End Bus Number - Start Bus Number + 1) * 0x100000) for SOS.
Tracked-On: #5233
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The old method of build pre-launched VM vacpi by HV source code is deprecated,
so remove related source code;
Tracked-On: #5266
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Previously we use a pre-defined structure as vACPI table for pre-launched
VM, the structure is initialized by HV code. Now change the method to use a
pre-loaded multiboot module instead. The module file will be generated by
acrn-config tool and loaded to GPA 0x7ff00000, a hardcoded RSDP table at
GPA 0x000f2400 will point to the XSDT table which at GPA 0x7ff00080;
Tracked-On: #5266
Signed-off-by: Victor Sun <victor.sun@intel.com>
Signed-off-by: Shuang Zheng <shuang.zheng@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Previously the ACPI table was stored in F segment which might not be big
enough for a customized ACPI table, hence reserve 1MB space in pre-launched
VM e820 table to store the ACPI related data:
0x7ff00000 ~ 0x7ffeffff : ACPI Reclaim memory
0x7fff0000 ~ 0x7fffffff : ACPI NVS memory
Tracked-On: #5266
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Previously the min load_addr for HV image is hard coded to 0x10000000 when
CONFIG_RELOC is enabled, now use CONFIG_HV_RAM_START as its prefer minimum
address like setting of CONFIG_PHYSICAL_START do in Linux kernel.
With this patch, we can offload the CONFIG_HV_RAM_START algorithm to
acrn-config or manually set it in scenario XML on some special boards.
Tracked-On: #5275
Signed-off-by: Victor Sun <victor.sun@intel.com>
When HV pass through the P2SB MMIO device to pre-launched VM, vgpio
device model traps MMIO access to the GPIO registers within P2SB so
that it can expose virtual IOAPIC pins to the VM in accordance with
the programmed mappings between gsi and vgsi.
Tracked-On: #5246
Signed-off-by: Toshiki Nishioka <toshiki.nishioka@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add the capability of forwarding specified physical IOAPIC interrupt
lines to pre-launched VMs as virtual IOAPIC interrupts. This is for the
sake of the certain MMIO pass-thru devices on EHL CRB which can support
only INTx interrupts.
Tracked-On: #5245
Signed-off-by: Toshiki Nishioka <toshiki.nishioka@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
BDF string can be parsed by the configuration tool. A 16bit WORD value with
format (B:8, D:5, F:3) can be passed from configuration to the
hypervisor directly to save some BDF string parse code.
Tracked-On: #4937
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
When trying to passthru a DHRD-ignored PCI device,
iommu_attach_device shall report success. Otherwise,
the assign_vdev_pt_iommu_domain will result in HV panic.
Same for iommu_detach_device case.
Tracked-On: #5240
Signed-off-by: Stanley Chang <stanley.chang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add HC_CREATE_VDEV and HC_DESTROY_VDEV two hypercalls that are used to
create and destroy an emulated device(PCI device or legacy device) in hypervisor
v3: 1) change HC_CREATE_DEVICE and HC_DESTROY_DEVICE to HC_CREATE_VDEV
and HC_DESTROY_VDEV
2) refine code style
v4: 1) remove unnecessary parameter
2) add VM state check for HC_CREATE_VDEV and HC_DESTROY hypercalls
Tracked-On: #4853
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
1.Modify clos_mask and mba_delay as a member of the union type.
2.Move HV_SUPPORTED_MAX_CLOS ,MAX_CACHE_CLOS_NUM_ENTRIES and
MAX_MBA_CLOS_NUM_ENTRIES to misc_cfg.h file.
Tracked-On: #5229
Signed-off-by: Wei Liu <weix.w.liu@intel.com>
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
HV_SUPPORTED_MAX_CLOS:
This value represents the maximum CLOS that is allowed by ACRN hypervisor.
This value is set to be least common Max CLOS (CPUID.(EAX=0x10,ECX=ResID):EDX[15:0])
among all supported RDT resources in the platform. In other words, it is
min(maximum CLOS of L2, L3 and MBA). This is done in order to have consistent
CLOS allocations between all the RDT resources.
Tracked-On: #5229
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
New board, EHL CRB, does not have legacy port IO UART. Even the PCI UART
are not work due to BIOS's bug workaround(the BARs on LPSS PCI are reset
after BIOS hand over control to OS). For ACRN console usage, expose the
debug UART via ACPI PnP device (access by MMIO) and add support in
hypervisor debug code.
Another special thing is that register width of UART of EHL CRB is
1byte. Introduce reg_width for each struct console_uart.
Tracked-On: #4937
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
-- use an array to fast locate the hypercall handler
to replace switch case.
-- uniform hypercall handler as below:
int32_t (*handler)(sos_vm, target_vm, param1, param2)
Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Enhance the help text that accompanies the CONFIG_SCENARIO symbol in Kconfig
Tracked-On: #5203
Signed-off-by: Geoffroy Van Cutsem <geoffroy.vancutsem@intel.com>
The ivshmem memory regions use the memory of the hypervisor and
they are continuous and page aligned.
this patch is used to initialize each memory region hpa.
v2: 1) if CONFIG_IVSHMEM_SHARED_MEMORY_ENABLED is not defined, the
entire code of ivshmem will not be compiled.
2) change ivshmem shared memory unit from byte to page to avoid
misconfiguration.
3) add ivshmem configuration and vm configuration references
v3: 1) change CONFIG_IVSHMEM_SHARED_MEMORY_ENABLED to CONFIG_IVSHMEM_ENABLED
2) remove the ivshmem configuration sample, offline tool provides default
ivshmem configuration.
3) refine code style.
v4: 1) make ivshmem_base 2M aligned.
Tracked-On: #4853
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Fix the bug for "is_apl_platform" func.
"monitor_cap_buggy" is identical to "is_apl_platform", so remove it.
On apl platform:
1) ACRN doesn't use monitor/mwait instructions
2) ACRN disable GPU IOMMU
Tracked-On:#3675
Signed-off-by: Junming Liu <junming.liu@intel.com>
v3 -> v4:
Refine commit message and code stype
1.
SDM Vol. 2A 3-211 states DisplayFamily = Extended_Family_ID + Family_ID
when Family_ID == 0FH.
So it should be family += ((eax >> 20U) & 0xffU) when Family_ID == 0FH.
2.
IF (Family_ID = 06H or Family_ID = 0FH)
THEN DisplayModel = (Extended_Model_ID « 4) + Model_ID;
While previous code this logic:
IF (DisplayFamily = 06H or DisplayFamily = 0FH)
Fix the bug about calculation of display family and
display model according to SDM definition.
3. use variable name to distinguish Family ID/Display Family/Model ID/Display Model,
then the code is more clear to avoid some mistake
Tracked-On:#3675
Signed-off-by: liujunming <junming.liu@intel.com>
Reviewed-by: Wu Xiangyang <xiangyang.wu@linux.intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This patch will move the VM configuration check to pre-build stage,
a test program will do the check for pre-defined VM configuration
data before making hypervisor binary. If test failed, the make
process will be aborted. So once the hypervisor binary is built
successfully or start to run, it means the VM configuration has
been sanitized.
The patch did not add any new VM configuration check function,
it just port the original sanitize_vm_config() function from cpu.c
to static_checks.c with below change:
1. remove runtime rdt detection for clos check;
2. replace pr_err() from logmsg.h with printf() from stdio.h;
3. replace runtime call get_pcpu_nums() in ALL_CPUS_MASK macro
with static defined MAX_PCPU_NUM;
4. remove cpu_affinity check since pre-launched VM might share
pcpu with SOS VM;
The BOARD/SCENARIO parameter check and configuration folder check is
also moved to prebuild Makefile.
Tracked-On: #5077
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Remove function of sanitize_vm_config() since the processing of sanitizing
will be moved to pre-build process.
When hypervisor has booted, we assume all VM configurations is sanitized;
Tracked-On: #5077
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
-- move vm_state_lock to other place in vm structure
to avoid the memory waste because of the page-aligned.
-- remove the memset from create_vm
-- explicitly set max_emul_mmio_regions and vcpuid_entry_nr to 0
inside create_vm to avoid use without initialization.
-- rename max_emul_mmio_regions to nr_emul_mmio_regions
v1->v2:
add deinit_emul_io in shutdown_vm
Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Grandhi, Sainath <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Previously the CPU affinity of SOS VM is initialized at runtime during
sanitize_vm_config() stage, follow the policy that all physical CPUs
except ocuppied by Pre-launched VMs are all belong to SOS_VM. Now change
the process that SOS CPU affinity should be initialized at build time
and has the assumption that its validity is guarenteed before runtime.
Tracked-On: #5077
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Previously we have complicated check mechanism on platform_acpi_info.h which
is supposed to be generated by acrn-config tool, but given the reality that
all configurations should be generated by acrn-config before build acrn
hypervisor, this check is not needed anymore.
Tracked-On: #5077
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The SDC scenario configurations will not be validated so remove it from
build makefile;
Tracked-On: #5077
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The old layout configuration source which located in:
hypervisor/arch/x86/configs/ is abandoned, remove it;
Tracked-On: #5077
Signed-off-by: Victor Sun <victor.sun@intel.com>
The make command is same as old configs layout:
under acrn-hypervisor folder:
make hypervisor BOARD=xxx SCENARIO=xxx [TARGET_DIR]=xxx [RELEASE=x]
under hypervisor folder:
make BOARD=xxx SCENARIO=xxx [TARGET_DIR]=xxx [RELEASE=x]
if BOARD/SCENARIO parameter is not specified, the default will be:
BOARD=nuc7i7dnb SCENARIO=industry
Tracked-On: #5077
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
There are 3 kinds of configurations in ACRN hypervisor source code: hypervisor
overall setting, per-board setting and scenario specific per-VM setting.
Currently Kconfig act as hypervisor overall setting and its souce is located at
"hypervisor/arch/x86/configs/$(BOARD).config"; Per-board configs are located at
"hypervisor/arch/x86/configs/$(BOARD)" folder; scenario specific per-VM configs
are located at "hypervisor/scenarios/$(SCENARIO)" folder.
This layout brings issues that board configs and VM configs are coupled tightly.
The board specific Kconfig file and misc_cfg.h are shared by all scenarios, and
scenario specific pci_dev.c is shared by all boards. So the user have no way to
build hypervisor binary for different scenario on different board with one
source code repo.
The patch will setup a new VM configurations layout as below:
misc/vm_configs
├── boards --> folder of supported boards
│ ├── <board_1> --> scenario-irrelevant board configs
│ │ ├── board.c --> C file of board configs
│ │ ├── board_info.h --> H file of board info
│ │ ├── pci_devices.h --> pBDF of PCI devices
│ │ └── platform_acpi_info.h --> native ACPI info
│ ├── <board_2>
│ ├── <board_3>
│ └── <board...>
└── scenarios --> folder of supported scenarios
├── <scenario_1> --> scenario specific VM configs
│ ├── <board_1> --> board specific VM configs for <scenario_1>
│ │ ├── <board_1>.config --> Kconfig for specific scenario on specific board
│ │ ├── misc_cfg.h --> H file of board specific VM configs
│ │ ├── pci_dev.c --> board specific VM pci devices list
│ │ └── vbar_base.h --> vBAR base info of VM PT pci devices
│ ├── <board_2>
│ ├── <board_3>
│ ├── <board...>
│ ├── vm_configurations.c --> C file of scenario specific VM configs
│ └── vm_configurations.h --> H file of scenario specific VM configs
├── <scenario_2>
├── <scenario_3>
└── <scenario...>
The new layout would decouple board configs and VM configs completely:
The boards folder stores kinds of supported boards info, each board folder
stores scenario-irrelevant board configs only, which could be totally got from
a physical platform and works for all scenarios;
The scenarios folder stores VM configs of kinds of working scenario. In each
scenario folder, besides the generic scenario specific VM configs, the board
specific VM configs would be put in a embedded board folder.
In new layout, all configs files will be removed out of hypervisor folder and
moved to a separate folder. This would make hypervisor LoC calculation more
precisely with below fomula:
typical LoC = Loc(hypervisor) + Loc(one vm_configs)
which
Loc(one vm_configs) = Loc(misc/vm_configs/boards/<board>)
+ LoC(misc/vm_configs/scenarios/<scenario>/<board>)
+ Loc(misc/vm_configs/scenarios/<scenario>/vm_configurations.c
+ Loc(misc/vm_configs/scenarios/<scenario>/vm_configurations.h
Tracked-On: #5077
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
To hide CET feature from guest VM completely, the MSR IA32_MSR_XSS also
need to be intercepted because it comprises CET_U and CET_S feature bits
of xsave/xstors operations. Mask these two bits in IA32_MSR_XSS writing.
With IA32_MSR_XSS interception, member 'xss' of 'struct ext_context' can
be removed because it is duplicated with the MSR store array
'vcpu->arch.guest_msrs[]'.
Tracked-On: #5074
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Return-oriented programming (ROP), and similarly CALL/JMP-oriented
programming (COP/JOP), have been the prevalent attack methodologies for
stealth exploit writers targeting vulnerabilities in programs.
CET (Control-flow Enforcement Technology) provides the following
capabilities to defend against ROP/COP/JOP style control-flow subversion
attacks:
* Shadow stack: Return address protection to defend against ROP.
* Indirect branch tracking: Free branch protection to defend against
COP/JOP
The full support of CET for Linux kernel has not been merged yet. As the
first stage, hide CET from guest VM.
Tracked-On: #5074
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
On WHL platform, we need to pass through TPM to Secure pre-launched VM. In order
to do this, we need to add TPM2 ACPI Table and add TPM DSDT ACPI table to include
the _CRS.
Now we only support the TPM 2.0 device (TPM 1.2 device is not support). Besides,
the TPM must use Start Method 7 (Uses the Command Response Buffer Interface)
to notify the TPM 2.0 device that a command is available for processing.
Tracked-On: #5053
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Using ACPI_TABLE_HEADER MACRO to initial the ACPI Table Header.
Tracked-On: #5053
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
Add mmio device pass through support for pre-launched VM.
When we pass through a MMIO device to pre-launched VM, we would remove its
resource from the SOS. Now these resources only include the MMIO regions.
Tracked-On: #5053
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Add two hypercalls to support MMIO device pass through for post-launched VM.
And when we support MMIO pass through for pre-launched VM, we could re-use
the code in mmio_dev.c
Tracked-On: #5053
Signed-off-by: Li Fei1 <fei1.li@intel.com>
During context switch in hypervisor, xsave/xrstore are used to
save/resotre the XSAVE area according to the XCR0 and XSS. The legacy
region in XSAVE area include FPU and SSE, we should make sure the
legacy region be saved during contex switch. FPU in XCR0 is always
enabled according to SDM.
For SSE, we enable it in XCR0 during context switch.
Tracked-On: #5062
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
kick_thread function is only used by kick_vcpu to kick vcpu out of
non-root mode, the implementation in it is sending IPI to target CPU if
target obj is running and target PCPU is not current one; while for
runnable obj, it will just make reschedule request. So the kick_thread
is not actually belong to scheduler module, we can drop it and just do
the cpu notification in kick_vcpu.
Tracked-On: #5057
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Reviewed-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
vcpu->running is duplicated with THREAD_STS_RUNNING status of thread
object. Introduce an API sleep_thread_sync(), which can utilize the
inner status of thread object, to do the sync sleep for zombie_vcpu().
Tracked-On: #5057
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Reviewed-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
-- replace global hypercall lock with per-vm lock
-- add spinlock protection for vm & vcpu state change
v1-->v2:
change get_vm_lock/put_vm_lock parameter from vm_id to vm
move lock obtain before vm state check
move all lock from vmcall.c to hypercall.c
Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
There are some devices (like Samsung NVMe SSD SM981/PM981 which has 33 MSIX tables)
which have more than 16 MSIX tables. Extend the default value to 64 to handle them.
Tracked-On: #4994
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Some OSes assume the platform must have the IOAPIC. For example:
Linux Kernel allocates IRQ force from GSI (0 if there's no PIC and IOAPIC) on x86.
And it thinks IRQ 0 is an architecture special IRQ, not for device driver. As a
result, the device driver may goes wrong if the allocated IRQ is 0 for RTVM.
This patch expose vIOAPIC to RTVM with LAPIC passthru even though the RTVM can't
use IOAPIC, it servers as a place holder to fullfil the guest assumption.
After vIOAPIC has exposed to guest unconditionally, the 'ready' field could be
removed since we do vIOAPIC initialization for each guest.
Tracked-On: #4691
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
will follow this convention for spin lock initialization:
-- for simple global variable locks, use this style:
static spinlock_t xxx_spinlock = {.head = 0U, .tail = 0U,}
-- for the locks inside a data structure, need to call
spinlock_init to initialize.
Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
According to SDM 10.12.11, we can know this register is dedicated to the
purpose of sending self-IPIs with the intent of enabling a highly
optimized path for sending self-IPIs. Also sending the IPI via the Self
Interrupt Register ensures that interrupt is delivered to the processor
core. Specifically completion of the WRMSR instruction to the SELF IPI
register implies that the interrupt has been logged into the IRR.
Tracked-On: #4937
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Currently, not all platforms support posted interrupt processing of both
VT-x and VT-d. On EHL, VT-d doesn't support posted interrupt processing.
So in such scenario, is_pi_capable() in vcpu_handle_pi_notification()
will bypass the PIR pending bits check which might cause a self-NV-IPI
lost.
With commit "bf1ff8c98 (hv: Offload syncing PIR to vIRR to processor
hardware)", the syncing PIR to vIRR is postponed and it is handled by a
self-NV-IPI in the following VMEnter. The process looks like,
a) vcpu A accepts a virtual interrupt ->
1) ACRN_REQUEST_EVENT is set
2) corresponding bit in PIR is set
3) Posted Interrupt ON bit is set
b) vcpu A does virtual interrupt injection on resume path due to
the pending ACRN_REQUEST_EVENT ->
1) hypervisor disables host interrupt
2) ACRN_REQUEST_EVENT is cleared
3) a self-NV-IPI is sent via ICR of LAPIC.
4) IRR bit of the self-NV-IPI is set
c) (VM-ENTRY) vcpu A returns into non-root mode
1) host interrupt enable(by HW)
2) posted interrupt processing clears the ON bit, sync PIR to vIRR
3) deliver the virtual interrupt if guest rflags.IF=1
d) (VM-EXIT) vcpu A traps due to a instruction execution (e.g. HLT)
1) host interrupt disable(by HW)
2) hypervisor enable host interrupt
Above illustrates a normal process of the virtual interrupt injection
with cpu PI support. However, a failing case is observed. The failing
case is that the self-NV-IPI from b-3 is not accepted by the core until
a timing between d-1 and d-2. b-4 happening between d-1 and d-2 is
observed by debug trace. So the self-NV-IPI will be handled in root-mode
which cannot do the syncing PIR to vIRR processing. Due to the bug
described in the first paragraph, vcpu_handle_pi_notification() cannot
succeed the virtual interrupt injection request. This patch fix it by
removing the wrong check in vcpu_handle_pi_notification() because
vcpu_handle_pi_notification() only happens on platform with cpu PI
support.
Here are some cost data for sending IPI via LAPIC ICR regsiter.
Normally, the cycles between ICR write and IRR got set is 140~260,
which is not accurate due to the MSR read overhead.
And from b-3 to c is about 560 cycles. So b-4 happens during this
period. But in bad case, b-4 doesn't happen even c is triggered.
The worse case i captured is that ICR write and IRR got set costs more
than 1900 cycles. Now, the best GUESS of the huge cost of IPI via ICR is
the ACPI bus arbitration(refer to SDM 10.6.3, 10.7 and Figure 10-17).
Tracked-On: #4937
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Wrap a function to do guest ept flush. This function doesn't do real EPT flush.
It just make the EPT flush request and do the real flush just before vcpu vmenter.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
remove spin lock for micro code update since the guest
operating system will take lock
Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
The commit 'HV: Config Splitlock Detection to be disable' allows
using CONFIG_ENFORCE_TURNOFF_AC to turn off splitlock #AC. If
CONFIG_ENFORCE_TURNOFF_AC is not set, splitlock #AC should be turn on
Tracked-On: #4962
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Check bit 48 in IA32_VMX_BASIC MSR, if it is 1, return error, as we only
support Intel 64 architecture.
SDM:
Appendix A.1 BASIC VMX INFORMATION
Bit 48 indicates the width of the physical addresses that may be used for the
VMXON region, each VMCS, anddata structures referenced by pointers in a
VMCS (I/O bitmaps, virtual-APIC page, MSR areas for VMX transitions). If
the bit is 0, these addresses are limited to the processor’s
physical-address width.2 If the bit is 1, these addresses are limited to
32 bits. This bit is always 0 for processors that support Intel 64
architecture.
Tracked-On: #4956
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
We always assume the physical platform has XSAVE, and we always enable
XSAVE at the beginning, so, no need to check the OXSAVE in host.
Tracked-On: #4956
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
As build variants for different board and different scenario growing, users
might make mistake on HV binary distributions. Checking board/scenario info
from log would be the fastest way to know whether the binary matches. Also
it would be of benifit to developers for confirming the correct binary they
are debugging.
Tracked-On: #4946
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
dmar_reserve_irte is added to reserve N coutinuous IRTEs.
N could be 1, 2, 4, 8, 16, or 32.
The reserved IRTEs will not be freed.
Tracked-On:#4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
For a ptirq_remapping_info entry, when build IRTE:
- If the caller provides a valid IRTE, use the IRET
- If the caller doesn't provide a valid IRTE, allocate a IRET when the
entry doesn't have a valid IRTE, in this case, the IRET will be freed
when free the entry.
Tracked-On:#4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
idx_in:
- If the caller of dmar_assign_irte passes a valid IRTE index, it will
be resued;
- If the caller of dmar_assign_irte passes INVALID_IRTE_ID as IRTE index,
the function will allocate a new IRTE.
idx_out:
This paramter return the actual index of IRTE used. The caller need to
check whether the return value is valid or not.
Also this patch adds an internal function alloc_irte.
The function takes count as input paramter to allocate continuous IRTEs.
The count can only be 1, 2, 4, 8, 16 or 32.
This is prepared for multiple MSI vector support.
Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Script only append 'U' for the config of int with a range.
Add a range to MAX_IR_ENTRIES.
Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
There're some platforms still doesn't support 1GB large page on CPU side.
Such as lakefield, TNT and EHL platforms on which have some silicon bug and
this case CPU don't support 1GB large page.
This patch tries to release this constrain to support more hardware platform.
Note this patch doesn't release the constrain on IOMMU side.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
This patch tries to release hardware platform 1GB large page support constrain
on CPU side.
There're some silicon bug on lakefield, TNT and EHL platforms which cause CPU
couldn't support 1GB large page. As a result, the pre-assumption The platform
which ACRN supports must support 1GB large page on both CPU side and VTD side
is not true any more.
This reverts commit f01aad7e to let trampoline execution use 2MB pages.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
The information needed to enable MSI-x emulation.
Only enable MSI-x emuation for the devices in msix_emul_devs array.
Currently, only EHL has the need to enable MSI-x emulation for TSN
devices.
Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Previously append_seed_arg() just do fill in seed arg to dest cmd buffer,
so rename the api name to fill_seed_arg().
Since fill_seed_arg() will be called in SOS VM path only, the param of
bool vm_is_sos is not needed and will be replaced by dest buffer size.
The seed_args[] which used by fill_seed_arg() is pre-defined as all-zero,
so memset() is not needed in fill_seed_arg(), buffer pointer check
and strncpy_s() are not needed also.
Tracked-On: #4885
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Per C11 standard (ISO/IEC 9899:2011): K.3.7.1.1
1. Copying shall not take place between objects that overlap;
2. If there is a runtime-constraint violation, the memcpy_s function stores
zeros in the first s1max characters of the object;
3. The memcpy_s function returns zero if there was no runtime-constraint
violation. Otherwise, a nonzero value is returned.
Tracked-On: #4885
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Previously sanitize_multiboot_info() was called after init_debug_pre() because
the debug message can only print after uart is initialized. On the other hand,
multiboot cmdline need to be parsed before init_debug_pre() because the cmdline
could override uart settings and make sure debug message printed successfully.
This cause multiboot info was parsed in two stages.
The patch revise the multiboot parse logic that split sanitize_multiboot_info()
api and use init_acrn_multiboot_info() api for the early stage. The most of
multiboot info will be initialized during this stage and no debug message need
to be printed. After uart is initialized, the sanitize_multiboot_info() would
do sanitize multiboot info and print needed debug messages.
Tracked-On: #4885
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Update efi bootloader image file path for Yocto rootfs in Kconfig.
Tracked-On: #4868
Signed-off-by: Wei Liu <weix.w.liu@intel.com>
Reviewed-by: Victor Sun <victor.sun@intel.com>
Now Host Bridge and PCI Bridge could only be added to SOS's acrn_vm_pci_dev_config.
So For UOS, we always emualte Host Bridge and PCI Bridge for it and assign PCI device
to it; for SOS, if it's the highest severity VM, we will assign Host Bridge and PCI
Bridge to it directly, otherwise, we will emulate them same as UOS.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
According PCI Code and ID Assignment Specification Revision 1.11, a PCI device
whose Base Class is 06h and Sub-Class is 00h is a Host bridge.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
We should check whether a PCI device is host bridge or not by Base Class (06h)
and Sub-Class (00h).
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
As per the BWG a delay should be provided between the
INIT IPI and Startup IPI. Without the delay observe hangs
on certain platforms during MP Init sequence. So Setting
a delay of 10us between assert INIT IPI and Startup IPI.
Also, as per SDM section 10.7 the the de-assert INIT IPI is
only used for Pentium and P6 processors. This is not applicable
for Pentium4 and Xeon processors so removing this sequence.
Tracked-On: #4835
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
in shutdown_vm, it uses guest flags when handling the phyiscal
CPUs whose LAPIC is pass-through. So if it is cleared first,
the related vCPUs and pCPUs can not be switched to correct state.
so move the clear action after the flags used.
Tracked-On: #4848
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
1. context_entry couldn't be NULL in iommu_attach_device since bus
number is checked before the call.
2. root_entry couldn't be NULL in iommu_detach_device since bus number
is checked before the call.
Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add a function dmar_unit_valid to check if the input dmar uint is valid
or not.
A valid dmar_unit should not be NULL, or ignore flag should not be set.
Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Snoop control will not be turned on by hypervisor, delete snoop control
related code.
Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Initialize root_table_addr/ir_table_addr of dmar uint when register the dmar uint.
So no need to check if they are initialzed or not later.
Tracked-On: #4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Invalidate cache by scanning and flushing the whole guest memory is
inefficient which might cause long execution time for WBINVD emulation.
A long execution in hypervisor might cause a vCPU stuck phenomenon what
impact Windows Guest booting.
This patch introduce a workaround method that pausing all other vCPUs in
the same VM when do wbinvd emulation.
Tracked-On: #4703
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Refine find_ptirq_entry by hashing instead of walk each of the PTIRQ entries one by one.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong<eddie.dong@Intel.com>
RTVM (with lapic PT) boots hang when maxcpus is
assigned a value less than the CPU number configured
in hypervisor.
In this case, vlapic_state(per VM) is left in TRANSITION
state after BSP boot, which blocks interupts to be injected
to this UOS.
Tracked-On: #4803
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Li, Fei <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
There's no need to look up MSI ptirq entry by virtual SID any more since the MSI
ptirq entry would be removed before the device is assigned to a VM.
Now the logic of MSI interrupt remap could simplify as:
1. Add the MSI interrupt remap first;
2. If step is already done, just do the remap part.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong<eddie.dong@Intel.com>
Reviewed-by: Grandhi, Sainath <sainath.grandhi@intel.com>
If HV relocation is enabled, either ACRN efi-stub or GRUB relocates
hypervisor image above HPA 256MB, thus we put hvlog and ramoops buffer
under 256MB to avoid conflict with hypervisor owned address.
This patch hardcodes these addresses:
0xa00000 - 0xdfffff: 4MiB for ramoops buffer
0xe00000 - 0xffffff: 2MiB for hvlog buffer
However, user can customize them to other addresses as long as it's under
256MB, available in host e820, and SOS bootarg "nokaslr" is not specified.
If HV relocation is disabled, need to make sure that these buffer
addresses are not between HV_RAM_START and HV_RAM_START + HV_RAM_SIZE.
Tracked-On: #4760
Signed-off-by: Zide Chen <zide.chen@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
For post-launched VMs, the configured CPU affinity could be different
from the actual running CPU affinity. This new field acrn_vm->cpu_affinity
recognizes this difference so that it's possible that CREATE_VM
hypercall won't overwrite the configured CPU afifnity.
Change name cpu_affinity_bitmap in acrn_vm_config to cpu_affinity.
This is read-only in run time, never overwritten by acrn-dm.
Remove vm_config->vcpu_num, which means the number of vCPUs of the
configured CPU affinity. This is not to be confused with the actual
running vCPU number: vm->hw.created_vcpus.
Changed get_vm_bsp_pcpu_id() to get_configured_bsp_pcpu_id() for less
confusion.
Tracked-On: #4616
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
ACRN syncs PIR to vIRR in the software in cases the Posted
Interrupt notification happens while the pCPU is in root mode.
Sync can be achieved by processor hardware by sending a
posted interrupt notiification vector.
This patch sends a self-IPI, if there are interrupts pending in PIR,
which is serviced by the logical processor at the next
VMEnter
Tracked-On: #4777
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
CDP is an extension of CAT. It enables isolation and separate prioritization of
code and data fetches to the L2 or L3 cache in a software configurable manner,
depending on hardware support.
This commit adds a Kconfig switch "CDP_ENABLED" which depends on "RDT_ENABLED".
CDP will be enabled if the capability available and "CDP_ENABLED" is selected.
Tracked-On: #4604
Signed-off-by: Yan, Like <like.yan@intel.com>
Reviewed-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This commit makes some RDT code cleanup, mainling including:
- remove the clos_mask and mba_delay validation check in setup_res_clos_msr(), the check will be done in pre-build;
- rename platform_clos_num to valid_clos_num, which is set as the minimal clos_mas of all enabled RDT resouces;
- init the platform_clos_array in the res_cap_info[] definition;
- remove the unnecessary return values and return value check.
Tracked-On: #4604
Signed-off-by: Yan, Like <like.yan@intel.com>
A RDT resource could be CAT or MBA, so only one of struct rdt_cache and struct rdt_membw
would be used at a time. They should be a union.
This commit merge struct rdt_cache and struct rdt_membw in to a union res.
Tracked-On: #4604
Signed-off-by: Yan, Like <like.yan@intel.com>
Reviewed-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com
#AC should be normally enabled for slpitlock detection, however,
community developers may want to run ACRN on buggy system.
In this case, CONFIG_ENFORCE_TURNOFF_AC can be used to turn off the
#AC, to let the guest run without #AC.
Tracked-On: #4765
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The virtual MSI information could be included in ptirq_remapping_info structrue,
there's no need to pass another input paramater for this puepose. So we could
remove the ptirq_msi_info input.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
We look up PTIRQ entru only by SID. So _by_sid could removed.
And refine function name to verb-obj style.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
For return value of local_gpa2hpa, either INVALID_HPA or NULL
means the EPT walking failure. Current code only take care of
NULL return and leave INVALID_HPA as correct case.
In some cases (if guest page table is filled with invalid memory
address), it could crash ACRN from guest.
Add INVALID_HPA return check as well.
Also add @pre assumptions for some gpa2hpa usages.
Tracked-On: #4730
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
When boot ACRN hypervisor from grub multiboot, HV will be loaded at
CONFIG_HV_RAM_START since relocation is not supported in grub
multiboot1. The CONFIG_HV_RAM_SIZE in industry scenario will take
~330MB(0x14000000), unfortunately the efi memmap on NUC7i7DNB is
truncated at 0x6dba2000 although it is still usable from 0x6dba2000. So
from grub point of view, it could not find a continuous memory from
0x6000000 to load industry scenario. Per efi memmap, there is a big
memory area available from 0x40400000, so put CONFIG_HV_RAM_START to
0x41000000 is much safe for NUC7i7DNB.
Tracked-On: #4641
Signed-off-by: Victor Sun <victor.sun@intel.com>
The original stack used in CPU booting is: ld_bss_end + 4KB;
which could be out of the RAM size limit defined in link_ram file.
So add a specific stack space in link_ram file, and used in
CPU booting.
Tracked-On: #4738
Signed-off-by: Minggui Cao <minggui.cao@intel.com>
Reviewed by: Yin Fengwei <fengwei.yin@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
According to SDM Vol 3, Chap 10.4.7.2 Local APIC State After It Has Been Software Disabled,
The mask bits for all the LVT entries are set when the local APIC has been software disabled.
So there's no need to mask all the LVT entries one by one.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Currently the vcpu_affinity[] array fixes the vCPU to pCPU mapping.
While the new cpu_affinity_bitmap doesn't explicitly sepcify this
mapping, instead, it implicitly assumes that vCPU0 maps to the pCPU
with lowest pCPU ID, vCPU1 maps to the second lowest pCPU ID, and
so on.
This makes it possible for post-launched VM to run vCPUs on a subset of
these pCPUs only, and not all of them.
acrn-dm may launch post-launched VMs with the current approach: indicate
VM UUID and hypervisor launches all VCPUs from the PCPUs that are masked
in cpu_affinity_bitmap.
Also acrn-dm can choose to launch the VM on a subset of PCPUs that is
defined in cpu_affinity_bitmap. In this way, acrn-dm must specify the
subset of PCPUs in the CREATE_VM hypercall.
Additionally, with this change, a guest's vcpu_num can be easily calculated
from cpu_affinity_bitmap, so don't assign vcpu_num in vm_configuration.c.
Tracked-On: #4616
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add FADT table support to support guest S5 setting.
According to ACPI 6.3 Spec, OSPM must ignored the DSDT and FACS fields if them're zero.
However, Linux kernel seems not to abide by the protocol, it will check DSDT still.
So add an empty DSDT to meet it.
Tracked-On: #4623
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Remove sdc2 scenario since the VM launch requirement under this scenario
could be satisfied by industry scenario now;
Tracked-On: #4661
Signed-off-by: Victor Sun <victor.sun@intel.com>
In industry scenario, hypervisor will support 1 post-launched RT VM
and 1 post-launched kata VM and up to 5 post-launched standard VMs;
Tracked-On: #4661
Signed-off-by: Victor Sun <victor.sun@intel.com>
The patch enables CPU sharing feature by default, the default scheduler is
set to SCHED_BVT;
Tracked-On: #4661
Signed-off-by: Victor Sun <victor.sun@intel.com>
The current logic puts hpa2 above GPA 4G always, which is incorrect. Need
to set gpa start of hpa2 right after hpa1 when hpa1 size is less then 2G;
Tracked-On: #4458
Signed-off-by: Victor Sun <victor.sun@intel.com>
On most board the MCFG base is set to 0xe0000000, so modify this value in
platform_acpi_info.h for generic boards;
The description of ACPI_PARSE_ENABLED is modified also to match its usage.
Tracked-On: #4157
Signed-off-by: Victor Sun <victor.sun@intel.com>
CONFIG_MAX_KATA_VM_NUM is a scenario specific configuration, so it is better
to put the MACRO in scenario folder directly, to instead the Kconfig item in
Kconfig file which should work for all scenarios;
Tracked-On: #4616
Signed-off-by: Victor Sun <victor.sun@intel.com>
Basicly ACRN scenario is a configuration name for specific usage. By giving
scenario name ACRN will load corresponding VM configurations to build the
hypervisor. But customer might have their own scenario name, change the
scenario type from choice to string is friendly to them since Kconfig source
file change will not be needed.
With this change, CONFIG_$(SCENARIO) will not exist in kconfig file and will
be instead of CONFIG_SCENARIO, so the Makefile need to be changed accordingly;
Tracked-On: #4616
Signed-off-by: Victor Sun <victor.sun@intel.com>
Currently the vm uuid and severity is initilized separately in
vm_config struct, developer need to take care both items carefuly
otherwise hypervisor would have trouble with the configurations.
Given the vm loader_order/uuid and severity are binded tightly, the
patch merged these tree settings in one macro so that developer will
have a simple interface to configure in vm_config struct.
Tracked-On: #4616
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
If CPU has MSR_TEST_CTL, show an emulaued one to VCPU
Tracked-On: #4496
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Yan, Like <like.yan@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
If CPU support rise #AC for Splitlock Access, then enable this
feature at each CPU.
Tracked-On: #4496
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Yan, Like <like.yan@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
When the destination of an atomic memory operation located in 2
cache lines, it is called a Splitlock Access. LOCK# bus signal is
asserted for splitlock access which may lead to long latency. #AC
for Splitlock Access is a CPU feature, it allows rise alignment
check exception #AC(0) instead of asserting LOCK#, that is helpful
to detect Splitlock Access.
This feature is enumerated by MSR(0xcf) IA32_CORE_CAPABILITIES[bit5]
Add helper function:
bool has_core_cap(uint32_t bitmask)
Tracked-On: #4496
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Yan, Like <like.yan@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
remove unnecessary state check and
add pre-condition for vcpu APIs.
Tracked-On: #4320
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
check the vm state in hypercall api,
add pre-condition for vm api.
Tracked-On: #4320
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
now it will call pause_vm in shutdown_vm,
move it out from shutdown_vm to reduce coupling.
Tracked-On: #4320
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This patch wrapps a common function to initialize physical
CPU for the second phase to reduce redundant code.
Tracked-On: #861
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
ACRN disables Snoop Control in VT-d DMAR engines for simplifing the
implementation. Also, since the snoop behavior of PCIE transactions
can be controlled by guest drivers, some devices may take the advantage
of the NO_SNOOP_ATTRIBUTE of PCIE transactions for better performance
when snoop is not needed. No matter ACRN enables or disables Snoop
Control, the DMA operations of passthrough devices behave correctly
from guests' point of view.
This patch is used to clean all the snoop related code.
Tracked-On: #4509
Signed-off-by: Xiaoguang Wu <xiaoguang.wu@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
Due to the fact that i915 iommu doesn't support snoop, hence it can't
access memory when the SNOOP bit of Secondary Level page PTE (SL-PTE)
is set, this will cause many undefined issues such as invisible cursor
in WaaG etc.
Current hv design uses EPT as Scondary Leval Page for iommu, and this
patch removes the codes of setting SNOOP bit in both EPT-PTE and SL-PTE
to avoid errors.
And according to SDM 28.2.2, the SNOOP bit (11th bit) will be ignored
by EPT, so it will not affect the CPU address translation.
Tracked-On: #4509
Signed-off-by: Xiaoguang Wu <xiaoguang.wu@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
Waag will send NMIs to all its cores during reboot. But currently,
NMI cannot be injected to vcpu which is in HLT state.
To fix the problem, need to wakeup target vcpu, and inject NMI through
interrupt-window.
Tracked-On: #4620
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Align the implementation to SDM Vol.3 4.1.1.
Also this patch fixed a bug that doesn't check paging status first
in some cpu mode.
Tracked-On: #4628
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Currently vlapic_build_id() uses vcpu_id to retrieve the lapic_id
per_cpu variable:
vlapic_id = per_cpu(lapic_id, vcpu->vcpu_id);
SOS vcpu_id may not equal to pcpu_id, and in that case it runs into
problems. For example, if any pre-launched VMs are launched on PCPUs
whose IDs are smaller than any PCPU IDs that are used by SOS.
This patch fixes the issue and simplify the code to create or get
vapic_id by:
- assign vapic_id in create_vlapic(), which now takes pcpu_id as input
argument, and save it in the new field: vlapic->vapic_id, which will
never be changed.
- simplify vlapic_get_apicid() by returning te saved vapid_id directly.
- remove vlapic_build_id().
- vlapic_init() is only called once, merge it into vlapic_create().
Tracked-On: #4268
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Maintain a per-pCPU array of vCPUs (struct acrn_vcpu *vcpu_array[CONFIG_MAX_VM_NUM]),
one VM cannot have multiple vCPUs share one pcpu, so we can utilize this property
and use the containing VM's vm_id as the index to the vCPU array:
In create_vcpu(), we simply do:
per_cpu(vcpu_array, pcpu_id)[vm->vm_id] = vcpu;
In offline_vcpu():
per_cpu(vcpu_array, pcpuid_from_vcpu(vcpu))[vcpu->vm->vm_id] = NULL;
so basically we use the containing VM's vm_id as the index to the vCPU array,
as well as the index of posted interrupt IRQ/vector pair that are assigned
to this vCPU:
0: first vCPU and first posted interrupt IRQs/vector pair
(POSTED_INTR_IRQ/POSTED_INTR_VECTOR)
...
CONFIG_MAX_VM_NUM-1: last vCPU and last posted interrupt IRQs/vector pair
((POSTED_INTR_IRQ + CONFIG_MAX_VM_NUM - 1U)/(POSTED_INTR_VECTOR + CONFIG_MAX_VM_NUM - 1U)
In the posted interrupt handler, it will do the following:
Translate the IRQ into a zero based index of where the vCPU
is located in the vCPU list for current pCPU. Once the
vCPU is found, we wake up the waiting thread and record
this request as ACRN_REQUEST_EVENT
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
This is a preparation patch for adding support for VT-d PI
related vCPU scheduling.
ACRN does not support vCPU migration, one vCPU always runs on
the same pCPU, so PI's ndst is never changed after startup.
VCPUs of a VM won’t share same pCPU. So the maximum possible number
of VCPUs that can run on a pCPU is CONFIG_MAX_VM_NUM.
Allocate unique Activation Notification Vectors (ANV) for each vCPU
that belongs to the same pCPU, the ANVs need only be unique within each
pCPU, not across all vCPUs. This reduces # of pre-allocated ANVs for
posted interrupts to CONFIG_MAX_VM_NUM, and enables ACRN to avoid
switching between active and wake-up vector values in the posted
interrupt descriptor on vCPU scheduling state changes.
A total of CONFIG_MAX_VM_NUM consecutive IRQs/vectors are reserved
for posted interrupts use.
The code first initializes vcpu->arch.pid.control.bits.nv dynamically
(will be added in subsequent patch), the other code shall use
vcpu->arch.pid.control.bits.nv instead of the hard-coded notification vectors.
Rename some functions:
apicv_post_intr --> apicv_trigger_pi_anv
posted_intr_notification --> handle_pi_notification
setup_posted_intr_notification --> setup_pi_notification
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
Fill in posted interrupt fields (vector, pda, etc) and set mode to 1 to
enable VT-d PI (posted mode) for this ptdev.
If intr_src->pi_vcpu is 0, fall back to use the remapped mode.
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
Given the vcpumask, check if the IRQ is single destination
and return the destination vCPU if so, the address of associated PI
descriptor for this vCPU can then be passed to dmar_assign_irte() to
set up the posted interrupt IRTE for this device.
For fixed mode interrupt delivery, all vCPUs listed in vcpumask should
service the interrupt requested. But VT-d PI cannot support multicast/broadcast
IRQs, it only supports single CPU destination. So the number of vCPUs
shall be 1 in order to handle IRQ in posted mode for this device.
Add pid_paddr to struct intr_source. If platform_caps.pi is true and
the IRQ is single-destination, pass the physical address of the destination
vCPU's PID to ptirq_build_physical_msi and dmar_assign_irte
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
Add platform_caps.c to maintain platform related information
Set platform_caps.pi to true if all iommus are posted interrupt capable, false
otherwise
If lapic passthru is not configured and platform_caps.pi is true, the vm
may be able to use posted interrupt for a ptdev, if the ptdev's IRQ is
single-destination
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
PCI devices with 64-bit MMIO BARs and requiring large MMIO space
can be assigned with physical address range at the very high end of
platform supported physical address space.
This patch uses the board info for 64-bit MMIO window as programmed
by BIOS and constructs 1G page tables for the same.
As ACRN uses identity mapping from Linear to Physical address space
physical addresses upto 48 bit or 256TB can be supported.
Tracked-On: #4586
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add 64-bit MMIO window related MACROs to the supported board files
in the hypervisor source code.
Tracked-On: #4586
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
EPT table can be changed concurrently by more than one vcpus.
This patch add a lock to protect the add/modify/delete operations
from different vcpus concurrently.
Tracked-On: #4253
Signed-off-by: Jian Jun Chen <jian.jun.chen@intel.com>
Reviewed-by: Li, Fei1 <fei1.li@intel.com>
The VMCS field is an embedded array for a vCPU. So there's no need to check for
NULL before use.
Tracked-On: #3813
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Conceptually, the devices unregistration sequence of the shutdown process should be
opposite to create.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
This commit allows hypervisor to allocate cache to vcpu by assigning different clos
to vcpus of a same VM.
For example, we could allocate different cache to housekeeping core and real-time core
of an RTVM in order to isolate the interference of housekeeping core via cache hierarchy.
Tracked-On: #4566
Signed-off-by: Yan, Like <like.yan@intel.com>
Reviewed-by: Chen, Zide <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In dmar_issue_qi_request, currently use a global var qi_status, which could
cause potential issue when concurrent call to dmar_issue_qi_request for different
DMAR units.
Use local var instead.
Tracked-On: #4535
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
As ACRN prepares to support servers with large amounts of memory
current logic to allocate space for 4K pages of EPT at compile time
will increase the size of .bss section of ACRN binary.
Bootloaders could run into a situation where they cannot
find enough contiguous space to load ACRN binary under 4GB,
which is typically heavily fragmented with E820 types Reserved,
ACPI data, 32-bit PCI hole etc.
This patch does the following
1) Works only for "direct" mode of vboot
2) reserves space for 4K pages of EPT, after boot by parsing
platform E820 table, for all types of VMs.
Size comparison:
w/o patch
Size of DRAM Size of .bss
48 GB 0xe1bbc98 (~226 MB)
128 GB 0x222abc98 (~548 MB)
w/ patch
Size of DRAM Size of .bss
48 GB 0x1991c98 (~26 MB)
128 GB 0x1a81c98 (~28 MB)
Tracked-On: #4563
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
hv: vtd: renamed some static functions from dmar_verb to verb_dmar
Tracked-On: #4535
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
hv: vtd: corrected the return type of get_qi_queue and get_ir_table to
void *
Tracked-On: #4535
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
hv: vtd: removed is_host (always false) and is_tt_ept (always true) member
variables of struct iommu_domain and related codes since the values are
always determined.
Tracked-On: #4535
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
We could use container_of to get vcpu structure pointer from vmtrr. So vcpu
structure pointer is no need in vmtrr structure.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
We could use container_of to get vcpu/vm structure pointer from vlapic. So vcpu/vm
structure pointer is no need in vlapic structure.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
This function casts a member of a structure out to the containing structure.
So rename to container_of is more readable.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Exend union dmar_ir_entry to support VT-d posted interrupts.
Rename some fields of union dmar_ir_entry:
entry --> value
sw_bits --> avail
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
Pass intr_src and dmar_ir_entry irte as pointers to dmar_assign_irte(),
which fixes the "Attempt to change parameter passed by value" MISRA C violation.
A few coding style fixes
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
For CPU side posted interrupts, it only uses bit 0 (ON) of the PI's 64-bit control
, other bits are don't care. This is not the case for VT-d posted
interrupts, define more bit fields for the PI's 64-bit control.
Use bitmap functions to manipulate the bit fields atomically.
Some MISRA-C violation and coding style fixes
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
The posted interrupt descriptor is more of a vmx/vmcs concept than a vlapic
concept. struct acrn_vcpu_arch stores the vmx/vmcs info, so put struct pi_desc
in struct acrn_vcpu_arch.
Remove the function apicv_get_pir_desc_paddr()
A few coding style/typo fixes
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
Rename struct vlapic_pir_desc to pi_desc
Rename struct member and local variable pir_desc to pid
pir=posted interrupt request, pi=posted interrupt
pid=posted interrupt descriptor
pir is part of pi descriptor, so it is better to use pi instead of pir
struct pi_desc will be moved to vmx.h in subsequent commit.
Tracked-On: #4506
Signed-off-by: dongshen <dongsheng.x.zhang@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@Intel.com>
The cupid() can be replaced with cupid_subleaf, which is more clear.
Having both APIs makes reading difficult.
Tracked-On: #4526
Signed-off-by: Li Fei1 <fei1.li@intel.com>
To support server platforms with more than 8 IO-APICs
Tracked-On: #4151
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>