Create virtual root port through add_vdev hypercall. add_vdev
identifies the virtual device to add by its vendor id and device id, then
call the corresponding function to create virtual device.
-create_vrp(): Find the right virtual root port to create
by its secondary bus number, then initialize the virtual root port.
And finally initialize PTM related configurations.
-destroy_vrp(): nothing to destroy
Tracked-On: #5915
Signed-off-by: Rong Liu <rong.l.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Acked-by: Jason Chen <jason.cj.chen@intel.com>
Acked-by: Yu Wang <yu1.wang@intel.com>
1. do not allow external modules to touch internal field of a timer.
2. make timer mode internal, period_in_ticks will decide the mode.
API wise:
1. the "mode" parameter was taken out of initialize_timer().
2. a new function update_timer() was added to update the timeout and
period fields.
3. the timer_expired() function was extended with an output parameter
to return the remaining cycles before expiration.
Also, the "fire_tsc" field name of hv_timer was renamed to "timeout".
With the new API, however, this change should not concern user code.
Tracked-On: #5920
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
x86/timer.[ch] was moved to the common directory largely unchanged.
x86 specific code now resides in x86/tsc_deadline_timer.c and its
interface was defined in hw/hw_timer.h. The interface defines two
functions: init_hw_timer() and set_hw_timeout() that provides HW
specific initialization and timer interrupt source.
Other than these two functions, the timer module is largely arch
agnostic.
Tracked-On: #5920
Signed-off-by: Rong Liu <rong2.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Modules that use udelay() should include "delay.h" explicitly.
Tracked-On: #5920
Signed-off-by: Rong Liu <rong2.liu@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Generalize and split basic cpu cycle/tick routines from x86/timer:
- Instead of rdstc(), use cpu_ticks() in generic code.
- Instead of get_tsc_khz(), use cpu_tickrate() in generic code.
- Include "common/ticks.h" instead of "x86/timer.h" in generic code.
- CYCLES_PER_MS is renamed to TICKS_PER_MS.
The x86 specific API rdstc() and get_tsc_khz(), as well as TSC_PER_MS
are still available in arch/x86/tsc.h but only for x86 specific usage.
Tracked-On: #5920
Signed-off-by: Rong Liu <rong2.liu@intel.com>
Signed-off-by: Yi Liang <yi.liang@intel.com>
The current permission-checking and dispatching mechanism of hypercalls is
not unified because:
1. Some hypercalls require the exact vCPU initiating the call, while the
others only need to know the VM.
2. Different hypercalls have different permission requirements: the
trusty-related ones are enabled by a guest flag, while the others
require the initiating VM to be the Service OS.
Without a unified logic it could be hard to scale when more kinds of
hypercalls are added later.
The objectives of this patch are as follows.
1. All hypercalls have the same prototype and are dispatched by a unified
logic.
2. Permissions are checked by a unified logic without consulting the
hypercall ID.
To achieve the first objective, this patch modifies the type of the first
parameter of hcall_* functions (which are the callbacks implementing the
hypercalls) from `struct acrn_vm *` to `struct acrn_vcpu *`. The
doxygen-style documentations are updated accordingly.
To achieve the second objective, this patch adds to `struct hc_dispatch` a
`permission_flags` field which specifies the guest flags that must ALL be
set for a VM to be able to invoke the hypercall. The default value (which
is 0UL) indicates that this hypercall is for SOS only. Currently only the
`permission_flag` of trusty-related hypercalls have the non-zero value
GUEST_FLAG_SECURE_WORLD_ENABLED.
With `permission_flag`, the permission checking logic of hypercalls is
unified as follows.
1. General checks
i. If the VM is neither SOS nor having any guest flag that allows
certain hypercalls, it gets #UD upon executing the `vmcall`
instruction.
ii. If the VM is allowed to execute the `vmcall` instruction, but
attempts to execute it in ring 1, 2 or 3, the VM gets #GP(0).
2. Hypercall-specific checks
i. If the hypercall is for SOS (i.e. `permission_flag` is 0), the
initiating VM must be SOS and the specified target VM cannot be a
pre-launched VM. Otherwise the hypercall returns -EINVAL without
further actions.
ii. If the hypercall requires certain guest flags, the initiating VM
must have all the required flags. Otherwise the hypercall returns
-EINVAL without further actions.
iii. A hypercall with an unknown hypercall ID makes the hypercall
returns -EINVAL without further actions.
The logic above is different from the current implementation in the
following aspects.
1. A pre-launched VM now gets #UD (rather than #GP(0)) when it attempts
to execute `vmcall` in ring 1, 2 or 3.
2. A pre-launched VM now gets #UD (rather than the return value -EPERM)
when it attempts to execute a trusty hypercall in ring 0.
3. The SOS now gets the return value -EINVAL (rather than -EPERM) when it
attempts to invoke a trusty hypercall.
4. A post-launched VM with trusty support now gets the return value
-EINVAL (rather than #UD) when it attempts to invoke a non-trusty
hypercall or an invalid hypercall.
v1 -> v2:
- Update documentation that describe hypercall behavior.
- Fix Doxygen warnings
Tracked-On: #5924
Signed-off-by: Junjie Mao <junjie.mao@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Instead of "#include <x86/foo.h>", use "#include <asm/foo.h>".
In other words, we are adopting the same practice in Linux kernel.
Tracked-On: #5920
Signed-off-by: Liang Yi <yi.liang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Requires explicit arch path name in the include directive.
The config scripts was also updated to reflect this change.
Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Each .c file includes the arch specific irq header file (with full
path) by itself if required.
Tracked-On: #5825
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
A new x86/guest/virq.h head file now contains all guest
related interrupt handling API.
Tracked-On: #5825
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
The common irq file is responsible for managing the central
irq_desc data structure and provides the following APIs for
host interrupt handling.
- init_interrupt()
- reserve_irq_num()
- request_irq()
- free_irq()
- set_irq_trigger_mode()
- do_irq()
API prototypes, constant and data structures belonging to common
interrupt handling are all moved into include/common/irq.h.
Conversely, the following arch specific APIs are added which are
called from the common code at various points:
- init_irq_descs_arch()
- setup_irqs_arch()
- init_interrupt_arch()
- free_irq_arch()
- request_irq_arch()
- pre_irq_arch()
- post_irq_arch()
Tracked-On: #5825
Signed-off-by: Peter Fang <peter.fang@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Below boolean function are defined in this patch:
- is_software_sram_enabled() to check if SW SRAM
feature is enabled or not.
- set global variable 'is_sw_sram_initialized'
to file static.
Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
For FuSa's case, we remove all dynamic memory allocation use in ACRN HV. Instead,
we use static memory allocation or embedded data structure. For pagetable page,
we prefer to use an index (hva for MMU, gpa for EPT) to get a page from a special
page pool. The special page pool should be big enougn for each possible index.
This is not a big problem when we don't support 64 bits MMIO. Without 64 bits MMIO
support, we could use the index to search addrss not larger than DRAM_SIZE + 4G.
However, if ACRN plan to support 64 bits MMIO in SOS, we could not use the static
memory alocation any more. This is because there's a very huge hole between the
top DRAM address and the bottom 64 bits MMIO address. We could not reserve such
many pages for pagetable mapping as the CPU physical address bits may very large.
This patch will use dynamic page allocation for pagetable mapping. We also need
reserve a big enough page pool at first. For HV MMU, we don't use 4K granularity
page table mapping, we need reserve PML4, PDPT and PD pages according the maximum
physical address space (PPT va and pa are identical mapping); For each VM EPT,
we reserve PML4, PDPT and PD pages according to the maximum physical address space
too, (the EPT address sapce can't beyond the physical address space), and we reserve
PT pages by real use cases of DRAM, low MMIO and high MMIO.
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Tracked-On: #5788
Currently the VM bootargs load address is hard-coded at 8KB right before
kernel load address, this should work for Linux kernel only since Linux
kernel is guaranteed to be loadered high than GPA 8K so its load address
would never be overflowed, other OS like Zephyr has no such assumption.
Tracked-On: #5689
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Current implementation, SOS may allocate the memory region belonging to
hypervisor/pre-launched VM to a post-launched VM. Because it only verifies
the start address rather than the entire memory region.
This patch verifies the validity of the entire memory region before
allocating to a post-launched VM so that the specified memory can only
be allocated to a post-launched VM if the entire memory region is mapped
in SOS’s EPT.
Tracked-On: #5555
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Reviewed-by: Yonghua Huang <yonghua.huang@intel.com>
Physical address to SW SRAM region maybe different
on different platforms, this hardcoded address may
result in address mismatch for SW SRAM operations.
This patch removes above hardcoded address and uses
the physical address parsed from native RTCT.
Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
'ptcm' and 'ptct' are legacy name according
to the latest TCC spec, hence rename below files
to avoid confusing:
ptcm.c -> rtcm.c
ptcm.h -> rtcm.h
ptct.h -> rtct.h
Tracked-On: #5649
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
The init_multiboot_info() and sanitize_multiboot_ifno() APIs now
require parameters instead of implicitly relying on global boot
variables.
Tracked-On: #5661
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
- The current code to virtualize CR0/CR4 is not
well designed, and hard to read.
This patch reshuffle the logic to make it clear
and classify those bits into PASSTHRU,
TRAP_AND_PASSTHRU, TRAP_AND_EMULATE & reserved bits.
Tracked-On: #5586
Signed-off-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
While following two styles are both correct, the 2nd one is simpler.
bool is_level_triggered;
1. if (is_level_triggered == true) {...}
2. if (is_level_triggered) {...}
This patch cleans up the style in hypervisor.
Tracked-On: #861
Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
- fix bug in 'hcall_destroy_vdev()', the availability of
vpci device shall be checked on 'target_vm".
- refine 'vpci_update_one_vbar()' to avoid potential NULL
pointer access.
Tracked-On: #5490
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Hypercall handlers for post-launched VMs automatically grab the vm_lock
in dispatch_sos_hypercall(). Remove the use of vm_lock inside the
handler.
Tracked-On: #5411
Signed-off-by: Peter Fang <peter.fang@intel.com>
More than one VM may request shutdown on the same pCPU before
shutdown_vm_from_idle() is called in the idle thread when pCPUs are
shared among VMs.
Use a per-pCPU bitmap to store all the VMIDs requesting shutdown.
v1 -> v2:
- use vm_lock to avoid a race on shutdown
Tracked-On: #5411
Signed-off-by: Peter Fang <peter.fang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
pSRAM memory should be cachable. However, it's not a RAM or a normal MMIO,
so we can't use the an exist API to do the EPT mapping and set the EPT cache
attribute to WB for it. Now we assume that SOS must assign the PSRAM area as
a whole and as a separate memory region whose base address is PSRAM_BASE_HPA.
If the hpa of the EPT mapping region is equal to PSRAM_BASE_HPA, we think this
EPT mapping is for pSRAM, we change the EPT mapping cache attribute to WB.
And fix a minor bug when SOS trap out to emulate wbinvd when pSRAM is enabled.
Tracked-On: #5330
Signed-off-by: Qian Wang <qian1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add cteate method for vmcs9900 vdev in hypercalls.
The destroy method of ivshmem is also suitable for other emulated vdev,
move it into hcall_destroy_vdev() for all emulated vdevs
Tracked-On: #5394
Signed-off-by: Tao Yuhong <yuhong.tao@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
This function can be used by other modules instead of hypercall
handling only, hence move it to vlapic.c
Tracked-On: #5407
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Li, Fei <fei1.li@intel.com>
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Now ACRN supports direct boot mode, which could be SBL/ABL, or GRUB boot.
Thus the vboot wrapper layer can be removed and the direct boot functions
don't need to be wrapped in direct_boot.c:
- remove call to init_vboot(), and call e820_alloc_memory() directly at the
time when the trampoline buffer is actually needed.
- Similarly, call CPU_IRQ_ENABLE() instead of the wrapper init_vboot_irq().
- remove get_ap_trampoline_buf(), since the existing function
get_trampoline_start16_paddr() returns the exact same value.
- merge init_general_vm_boot_info() into init_vm_boot_info().
- remove vm_sw_loader pointer, and call direct_boot_sw_loader() directly.
- move get_rsdp_ptr() from vboot_wrapper.c to multiboot.c, and remove the
wrapper over two boot modes.
Tracked-On: #5197
Signed-off-by: Zide Chen <zide.chen@intel.com>
Previously we use a pre-defined structure as vACPI table for pre-launched
VM, the structure is initialized by HV code. Now change the method to use a
pre-loaded multiboot module instead. The module file will be generated by
acrn-config tool and loaded to GPA 0x7ff00000, a hardcoded RSDP table at
GPA 0x000f2400 will point to the XSDT table which at GPA 0x7ff00080;
Tracked-On: #5266
Signed-off-by: Victor Sun <victor.sun@intel.com>
Signed-off-by: Shuang Zheng <shuang.zheng@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Some hypercalls to a target VM are only acceptable in some certain
states, else it impacts target VM. Add some restrictive status checks to
avoid that.
Tracked-On: #5208
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Virtual interrupts injection and memory mapping operations can impact
target VM. By design, these type of operations from lower severity VM
to higher severity VM should be blocked by the hypervisor.
While the hypercalls are the interface between SOS VM and the
hypervisor, severity checks can be implemented at the beginning of
hypercalls needed.
Added severity checks in below hypercalls:
* hcall_set_vm_memory_regions()
* hcall_notify_ioreq_finish()
* hcall_set_irqline()
* hcall_inject_msi()
* hcall_write_protect_page()
Tracked-On: #5208
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
For ivshmem vdev creation, the vdev vBDF, vBARs, shared memory region
name and size are set by device model. The shared memory name and size
must be same as the corresponding device configuration which is configured
by offline tool.
v3: add a comment to the vbar_base member of the acrn_vm_pci_dev_config
structure that vbar_base is power-on default value
Tracked-On: #4853
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add HC_CREATE_VDEV and HC_DESTROY_VDEV two hypercalls that are used to
create and destroy an emulated device(PCI device or legacy device) in hypervisor
v3: 1) change HC_CREATE_DEVICE and HC_DESTROY_DEVICE to HC_CREATE_VDEV
and HC_DESTROY_VDEV
2) refine code style
v4: 1) remove unnecessary parameter
2) add VM state check for HC_CREATE_VDEV and HC_DESTROY hypercalls
Tracked-On: #4853
Reviewed-by: Wang, Yu1 <yu1.wang@intel.com>
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
-- use an array to fast locate the hypercall handler
to replace switch case.
-- uniform hypercall handler as below:
int32_t (*handler)(sos_vm, target_vm, param1, param2)
Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
2abbb99f6a ("hv: make thread status more accurate") introduced a
transition stage, marked as var be_blocking, between RUNNING->BLOCKED
of thread status. wake_thread() does not work in this transition stage
because it only checks thread->status.
Need to check thread->be_blocking as well in wake_thread(). When
wake_thread() happens in the transition stage, the previous sleep
operation rolled back.
Tracked-On: #5190
Fixes: 2abbb99f6a ("hv: make thread status more accurate")
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
As we only set BLOCKED status in context switch_out, which means, only
running thread can be changed to BLOCKED, but runnable thread can not.
This lead to the deadloop in sleep_thread_sync.
To solve the problem, in sleep_thread, we set the status to BLOCKED
directly when the original thread status is RUNNABLE.
Tracked-On: #5115
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Add two hypercalls to support MMIO device pass through for post-launched VM.
And when we support MMIO pass through for pre-launched VM, we could re-use
the code in mmio_dev.c
Tracked-On: #5053
Signed-off-by: Li Fei1 <fei1.li@intel.com>
kick_thread function is only used by kick_vcpu to kick vcpu out of
non-root mode, the implementation in it is sending IPI to target CPU if
target obj is running and target PCPU is not current one; while for
runnable obj, it will just make reschedule request. So the kick_thread
is not actually belong to scheduler module, we can drop it and just do
the cpu notification in kick_vcpu.
Tracked-On: #5057
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Reviewed-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
vcpu->running is duplicated with THREAD_STS_RUNNING status of thread
object. Introduce an API sleep_thread_sync(), which can utilize the
inner status of thread object, to do the sync sleep for zombie_vcpu().
Tracked-On: #5057
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Reviewed-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
1. Update thread status after switch_in/switch_out.
2. Add 'be_blocking' to represent the intermediate state during
sleep_thread and switch_out. After switch_out, the thread status
update to THREAD_STS_BLOCKED.
Tracked-On: #5057
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Reviewed-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
-- replace global hypercall lock with per-vm lock
-- add spinlock protection for vm & vcpu state change
v1-->v2:
change get_vm_lock/put_vm_lock parameter from vm_id to vm
move lock obtain before vm state check
move all lock from vmcall.c to hypercall.c
Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
will follow this convention for spin lock initialization:
-- for simple global variable locks, use this style:
static spinlock_t xxx_spinlock = {.head = 0U, .tail = 0U,}
-- for the locks inside a data structure, need to call
spinlock_init to initialize.
Tracked-On: #4958
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
hv: hypercall: restrict the condition to assign/deassign a pci device to
a post-launched VM for safety
For the safety of post-launched VMs, pci devices assignments should
occur only when VM is being created (at VM_CREATED STATUS), and pci
devices de-assignment should occur only when VM is being created or
shutdown/reset (at VM_CREATED or VM_PAUSED status)
Tracked-On: #4995
Acked-by: Eddie Done <eddie.dong@intel.com>
Reviewed-by: Li Fei <Fei1.Li@intel.com>
Signed-off-by: Wang Qian <qian1.wang@intel.com>
For a ptirq_remapping_info entry, when build IRTE:
- If the caller provides a valid IRTE, use the IRET
- If the caller doesn't provide a valid IRTE, allocate a IRET when the
entry doesn't have a valid IRTE, in this case, the IRET will be freed
when free the entry.
Tracked-On:#4831
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
The mask valuei 0x3F was added to prevent out of range in array access.
However, it should not be hardcoded.
Since in ptirq_alloc_entry_id, the valid allocated id is no greater
than CONFIG_MAX_PT_IRQ_ENTRIES, it will not cause out of range array
access without mask.
So this patch removes the mask.
Also, use bitmap_clear_lock instead of bitmap_clear_nolock becuase there
could be the chance that more than 1 core to access a same 64bit var.
Tracked-On: #4828
Signed-off-by: Binbin Wu <binbin.wu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Refine find_ptirq_entry by hashing instead of walk each of the PTIRQ entries one by one.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong<eddie.dong@Intel.com>
For post-launched VMs, the configured CPU affinity could be different
from the actual running CPU affinity. This new field acrn_vm->cpu_affinity
recognizes this difference so that it's possible that CREATE_VM
hypercall won't overwrite the configured CPU afifnity.
Change name cpu_affinity_bitmap in acrn_vm_config to cpu_affinity.
This is read-only in run time, never overwritten by acrn-dm.
Remove vm_config->vcpu_num, which means the number of vCPUs of the
configured CPU affinity. This is not to be confused with the actual
running vCPU number: vm->hw.created_vcpus.
Changed get_vm_bsp_pcpu_id() to get_configured_bsp_pcpu_id() for less
confusion.
Tracked-On: #4616
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
For return value of local_gpa2hpa, either INVALID_HPA or NULL
means the EPT walking failure. Current code only take care of
NULL return and leave INVALID_HPA as correct case.
In some cases (if guest page table is filled with invalid memory
address), it could crash ACRN from guest.
Add INVALID_HPA return check as well.
Also add @pre assumptions for some gpa2hpa usages.
Tracked-On: #4730
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
- add a new member cpu_affinity to struct acrn_create_vm, so that acrn-dm
is able to assign CPU affinity through HC_CREATE_VM hypercall.
- if vm_create.cpu_affinity is zero, hypervisor launches the VM with the
statically configured CPU affinity.
Tracked-On: #4616
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
check the vm state in hypercall api,
add pre-condition for vm api.
Tracked-On: #4320
Signed-off-by: Mingqiang Chi <mingqiang.chi@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Hypervisor reports VM configuration information to SOS which can be used to
dynamically allocate VCPU affinity.
Servise OS can get the vm_configs in this order:
1. call platform_info HC (set vm_configs_addr with 0) to get max_vms and
vm_config_entry_size.
2. allocate memory for acrn_vm_config array based on the number of VMs
and entry size that just got in step 1.
3. call platform_info HC again to collect VM configurations.
Tracked-On: #4616
Signed-off-by: Zide Chen <zide.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This function casts a member of a structure out to the containing structure.
So rename to container_of is more readable.
Tracked-On: #4550
Signed-off-by: Li Fei1 <fei1.li@intel.com>
- since now we don't need to print error messages if copy_to/from_gpa()
fails, then in many cases we can simplify the function return handling.
In the following example, my fix could change the 'ret' value from
the original '-1' to the actual errno returned from copy_to_gpa(). But
this is valid. Ideally we may replace all '-1' with the actual errno.
- if (copy_to_gpa() < 0) {
- pr_err("error messages");
- ret = -1;
- } else {
- ret = 0;
- }
+ ret = copy_to_gpa();
- in most cases, 'ret' is declared with a default value 0 or -1, then the
redundant assignment statements can be removed.
- replace white spaces with tabs.
Tracked-On: #3854
Signed-off-by: Zide Chen <zide.chen@intel.com>
For SOS VM, when the target platform has multiple IO-APICs, there
should be equal number of virtual IO-APICs.
This patch adds support for emulating multiple vIOAPICs per VM.
Tracked-On: #4151
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
MADT is used to specify the GSI base for each IO-APIC and the number of
interrupt pins per IO-APIC is programmed into Max. Redir. Entry register of
that IO-APIC.
On platforms with multiple IO-APICs, there can be holes in the GSI space.
For example, on a platform with 2 IO-APICs, the following configuration has
a hole (from 24 to 31) in the GSI space.
IO-APIC 1: GSI base - 0, number of pins - 24
IO-APIC 2: GSI base - 32, number of pins - 8
This patch also adjusts the size for variables used to represent the total
number of IO-APICs on the system from uint16_t to uint8_t as the ACPI MADT
uses only 8-bits to indicate the unique IO-APIC IDs.
Tracked-On: #4151
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Acked-by: Eddie Dong <eddie.dong@Intel.com>
Initialize efi info of acrn mbi when boot from multiboot2 protocol, with
this patch hypervisor could get host efi info and pass it to Linux zeropage,
then make guest Linux possible to boot with efi environment;
Tracked-On: #4419
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Count down number will be decreased at each tick, when it comes to zero,
it will trigger reschedule.
Tracked-On: #4410
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
pick_next function will update the virtual time parameters, and return
the vcpu thread with earlest evt. Calculate the count down number for
the picked vcpu thread, it means how many mcu a thread can run before
the next reschedule occur.
Tracked-On: #4410
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In the wakeup handler, the vcpu_thread object will be inserted into the
runqueue, and in the sleep handler, it will be removed from the queue.
vcpu_thread object is ordered by EVT (effective virtual time).
Tracked-On: #4410
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add init function for bvt scheduler, creating a runqueue and a period
timer, the timer interval is default as 1ms. The interval is the minimum
charging unit.
Tracked-On: #4410
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
BVT (Borrowed virtual time) scheduler is used to schedule vCPUs on pCPU.
It has the concept of virtual time, vCPU with earliset virtual time is
dispatched first.
Main concepts:
tick timer:
a period tick is used to measure the physcial time in units of MCU
(minimum charing unit).
runqueue:
thread in the runqueue is ordered by virtual time.
weight:
each thread receives a share of the pCPU in proportion to its
weight.
context switch allowance:
the physcial time by which the current thread is allowed to advance
beyond the next runnable thread.
warp:
a thread with warp enabled will have a change to minus a value (Wi)
from virtual time to achieve higher priority.
virtual time:
AVT: actual virtual time, advance in proportional to weight.
EVT: effective virtual time.
EVT <- AVT - ( warp ? Wi : 0 )
SVT: scheduler virtual time, the minimum AVT in the runqueue.
Tracked-On: #4410
Signed-off-by: Conghui Chen <conghui.chen@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
1. Rename BOOT_CPU_ID to BSP_CPU_ID
2. Repace hardcoded value with BSP_CPU_ID when
ID of BSP is referenced.
Tracked-On: #4420
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Now we split passthrough PCI device from DM to HV, we could remove all the passthrough
PCI device unused code.
Tracked-On: #4371
Signed-off-by: Li Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add assign/deassign PCI device hypercall APIs to assign a PCI device from SOS to
post-launched VM or deassign a PCI device from post-launched VM to SOS. This patch
is prepared for spliting passthrough PCI device from DM to HV.
The old assign/deassign ptdev APIs will be discarded.
Tracked-On: #4371
Signed-off-by: Li Fei1 <fei1.li@intel.com>
On UEFI UP2 board, APs might execute HLT before SOS kernel INIT them.
After SOS kernel take over and will re-init the APs directly. The flows
from HV perspective is like:
HLT trap:
wait_event(VCPU_EVENT_VIRTUAL_INTERRUPT) -> sleep_thread
SOS kernel INIT, SIPI APs:
pause_vcpu(ZOMBIE) -> sleep_thread
-> reset_vcpu
-> launch_vcpu -> wake_vcpu
However, the last wake_vcpu will fail because the cpu event
VCPU_EVENT_VIRTUAL_INTERRUPT had not got signaled.
This patch will reset all vcpu events in reset_vcpu. If the thread was
previously waiting for a event, its waiting status will be cleared and
launch_vcpu will wake it to running.
Tracked-On: #4402
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
1. Align the coding style for these MACROs
2. Align the values of fixed VECTORs
Tracked-On: #4348
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
is_polling_ioreq is more straightforward. Rename it.
Tracked-On: #4329
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
'param' is BDF value instead of GPA when VHM driver
issues below 2 hypercalls:
- HC_ASSIGN_PTEDEV
- HC_DEASSIGN_PTDEV
This patch is to remove related code in hc_assign/deassign()
functions.
Tracked-On: #4334
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
HLT emulation is import to CPU resource maximum utilization. vcpu
doing HLT means it is idle and can give up CPU proactively. Thus, we
pause the vcpu thread in HLT emulation and resume it while event happens.
When vcpu enter HLT, its vcpu thread will sleep, but the vcpu state is
still 'Running'.
VM ID PCPU ID VCPU ID VCPU ROLE VCPU STATE
===== ======= ======= ========= ==========
0 0 0 PRIMARY Running
0 1 1 SECONDARY Running
Tracked-On: #4329
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Introduce two kinds of events for each vcpu,
VCPU_EVENT_IOREQ: for vcpu waiting for IO request completion
VCPU_EVENT_VIRTUAL_INTERRUPT: for vcpu waiting for virtual interrupts events
vcpu can wait for such events, and resume to run when the
event get signalled.
This patch also change IO request waiting/notifying to this way.
Tracked-On: #4329
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This simple event implemention can only support exclusive waiting
at same time. It mainly used by thread who want to wait for special event
happens.
Thread A who want to wait for some events calls
wait_event(struct sched_event *);
Thread B who can give the event signal calls
signal_event(struct sched_event *);
Tracked-On: #4329
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Before we assign a PT device to post-launched VM, we should reset the PCI device
first. However, ACRN hypervisor doesn't plan to support PCIe hot-plug and doesn't
support PCIe bridge Secondary Bus Reset. So the PT device must support FLR or PM
reset. This patch do this check when assigning a PT device to post-launched VM.
Tracked-On: #3465
Signed-off-by: Li Fei1 <fei1.li@intel.com>
The patch abstract a vcpu_reset_internal() api for internal usage, the
function would not touch any vcpu state transition and just do vcpu reset
processing. It will be called by create_vcpu() and reset_vcpu().
The reset_vcpu() will act as a public api and should be called
only when vcpu receive INIT or vm reset/resume from S3. It should not be
called when do shutdown_vm() or hcall_sos_offline_cpu(), so the patch remove
reset_vcpu() in shutdown_vm() and hcall_sos_offline_cpu().
The patch also introduced reset_mode enum so that vcpu and vlapic could do
different context operation according to different reset mode;
Tracked-On: #4267
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
We don't use INIT signal notification method now. This patch
removes them.
Tracked-On: #3886
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
We have implemented a new notification method using NMI.
So replace the INIT notification method with the NMI one.
Then we can remove INIT notification related code later.
Tracked-On: #3886
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
ACRN hypervisor needs to kick vCPU off VMX non-root mode to do some
operations in hypervisor, such as interrupt/exception injection, EPT
flush etc. For non lapic-pt vCPUs, we can use IPI to do so. But, it
doesn't work for lapic-pt vCPUs as the IPI will be injected to VMs
directly without vmexit.
Without the way to kick the vCPU off VMX non-root mode to handle pending
request on time, there may be fatal errors triggered.
1). Certain operation may not be carried out on time which may further
lead to fatal errors. Taking the EPT flush request as an example, once we
don't flush the EPT on time and the guest access the out-of-date EPT,
fatal error happens.
2). ACRN now will send an IPI with vector 0xF0 to target vCPU to kick the vCPU
off VMX non-root mode if it wants to do some operations on target vCPU.
However, this way doesn't work for lapic-pt vCPUs. The IPI will be delivered
to the guest directly without vmexit and the guest will receive a unexpected
interrupt. Consequently, if the guest can't handle this interrupt properly,
fatal error may happen.
The NMI can be used as the notification signal to kick the vCPU off VMX
non-root mode for lapic-pt vCPUs. So, this patch uses NMI as notification signal
to address the above issues for lapic-pt vCPUs.
Tracked-On: #3886
Acked-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Kaige Fu <kaige.fu@intel.com>
Input 'vcpu_id' and the state of target vCPU should be validated
properly:
- 'vcpu_id' shall be less than 'vm->hw.created_vcpus' instead
of 'MAX_VCPUS_PER_VM'.
- The state of target vCPU should be "VCPU_PAUSED", and reject
all other states.
Tracked-On: #4245
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
This patch fixes potential hypervisor crash when
calling hcall_write_protect_page() with a crafted
GPA in 'struct wp_data' instance, e.g. an invalid
GPA that is not in the scope of the target VM's
EPT address space.
To check the validity for this GPA before updating
the 'write protect' page.
Tracked-On: #4240
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Fei Li <fei1.li@intel.com>
In current architecutre, the maximum vCPUs number per VM could not
exceed the pCPUs number. Given the MAX_PCPU_NUM macro is provided
in board configurations, so remove the MAX_VCPUS_PER_VM from Kconfig
and add a macro of MAX_VCPUS_PER_VM to reference MAX_PCPU_NUM directly.
Tracked-On: #4230
Signed-off-by: Victor Sun <victor.sun@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
For now, we set NOOP scheduler as default. User can choose IORR scheduler as needed.
Tracked-On: #4178
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Add yield support for schedule, which can give up pcpu proactively.
Tracked-On: #4178
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Implement .sleep/.wake/.pick_next of sched_iorr.
In .pick_next, we count current object's timeslice and pick the next
avaiable one. The policy is
1) get the first item in runqueue firstly
2) if object picked has no time_cycles, replenish it pick this one
3) At least take one idle sched object if we have no runnable object
after step 1) and 2)
In .wake, we start the tick if we have more than one active
thread_object in runqueue. In .sleep, stop the tick timer if necessary.
Tracked-On: #4178
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
sched_control is per-pcpu, each sched_control has a tick timer running
periodically. Every period called a tick. In tick handler, we do
1) compute left timeslice of current thread_object if it's not the idle
2) make a schedule request if current thread_object run out of timeslice
For runqueue maintaining, we will keep objects which has timeslice in
the front of runqueue and the ones get new replenished in tail.
Tracked-On: #4178
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
We set timeslice to 10ms as default, and set tick interval to 1ms.
When init sched_iorr scheduler, we init a periodic timer as the tick and
init the runqueue to maintain objects in the sched_control. Destroy
the timer in deinit.
Tracked-On: #4178
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
IO sensitive Round-robin scheduler aim to schedule threads with
round-robin policy. Meanwhile, we also enhance it with some fairness
configuration, such as thread will be scheduled out without properly
timeslice. IO request on thread will be handled in high priority.
This patch only add a skeleton for the sched_iorr scheduler.
Tracked-On: #4178
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
After changing init_vmcs to smp call approach and do it before
launch_vcpu, it could work with noop scheduler. On real sharing
scheudler, it has problem.
pcpu0 pcpu1 pcpu1
vmBvcpu0 vmAvcpu1 vmBvcpu1
vmentry
init_vmcs(vmBvcpu1) vmexit->do_init_vmcs
corrupt current vmcs
vmentry fail
launch_vcpu(vmBvcpu1)
This patch mark a event flag when request vmcs init for specific vcpu. When
it is running and checking pending events, will do init_vmcs firstly.
Tracked-On: #4178
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
In theory, guest could re-program PCI BAR address to any address. However, ACRN
hypervisor only support [0, top_address_space) EPT memory mapping. So we need to
check whether the PCI BAR re-program address is within this scope.
Tracked-On: #3475
Signed-off-by: Li, Fei1 <fei1.li@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
kick means to notify one thread_object. If the target thread object is
running, send a IPI to notify it; if the target thread object is
runnable, make reschedule on it.
Also add kick_vcpu API in vcpu layer to notify vcpu.
Tracked-On: #3813
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Yu Wang <yu1.wang@intel.com>
Signed-off-by: Shuo A Liu <shuo.a.liu@intel.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>