mirror of
https://github.com/projectacrn/acrn-hypervisor.git
synced 2025-07-31 23:38:24 +00:00
doc: add the doc for 'Error Detection and Handling'
This patch adds the doc for 'Error Detection and Handling'. Signed-off-by: Shiqing Gao <shiqing.gao@intel.com>
This commit is contained in:
parent
72fbc7e79f
commit
18b619da4d
Binary file not shown.
After Width: | Height: | Size: 32 KiB |
@ -32,3 +32,4 @@ project.
|
||||
coding_guidelines
|
||||
doc_guidelines
|
||||
graphviz
|
||||
sw_design_guidelines
|
||||
|
489
doc/developer-guides/sw_design_guidelines.rst
Normal file
489
doc/developer-guides/sw_design_guidelines.rst
Normal file
@ -0,0 +1,489 @@
|
||||
.. _sw_design_guidelines:
|
||||
|
||||
Software Design Guidelines
|
||||
##########################
|
||||
|
||||
Error Detection and Error Handling
|
||||
**********************************
|
||||
|
||||
Workflow
|
||||
========
|
||||
|
||||
Error detection and error handling workflow in the ACRN hypervisor is shown in
|
||||
:numref:`work_flow_of_error_detection_and_error_handling`.
|
||||
|
||||
.. figure:: images/work_flow_of_error_detection_and_error_handling.png
|
||||
:align: center
|
||||
:name: work_flow_of_error_detection_and_error_handling
|
||||
|
||||
Error Detection and Error Handling Workflow
|
||||
|
||||
|
||||
Design Assumption
|
||||
=================
|
||||
|
||||
There are three types of design assumptions in the ACRN hypervisor, as shown
|
||||
below:
|
||||
|
||||
**Pre-condition**
|
||||
Pre-conditions shall be defined right before the definition/declaration of
|
||||
the corresponding function in the C source file or header file.
|
||||
All pre-conditions shall be guaranteed by the caller of the function.
|
||||
Error checking of the pre-conditions are not needed in release version of the
|
||||
function. Developers could use ASSERT to catch design errors in a debug
|
||||
version for some cases. Verification of the hypervisor shall check whether
|
||||
each caller guarantees all pre-conditions of the callee (or not).
|
||||
|
||||
This design assumption applies to the following cases:
|
||||
|
||||
- Input parameters of the function.
|
||||
- Global state, such as hypervisor operation mode.
|
||||
|
||||
**Post-condition**
|
||||
Post-conditions shall be defined right before the definition/declaration of
|
||||
the corresponding function in the C source file or header file.
|
||||
All post-conditions shall be guaranteed by the function. All callers of the
|
||||
function should trust these post-conditions are met.
|
||||
Error checking of the post-conditions are not needed in release version of
|
||||
each caller. Developers could use ASSERT to catch design errors in a debug
|
||||
version for some cases. Verification of the hypervisor shall check whether the
|
||||
function guarantees all post-conditions (or not).
|
||||
|
||||
This design assumption applies to the following case:
|
||||
|
||||
- Return value of the function
|
||||
|
||||
It is used to guarantee that the return value is valid, such as the return
|
||||
pointer is not NULL, the return value is within a valid range, or the
|
||||
members of the return structure are valid.
|
||||
|
||||
|
||||
**Application Constraints**
|
||||
Application constraints of the hypervisor shall be defined in design document
|
||||
and safety manual.
|
||||
All application constraints shall be guaranteed by external safety
|
||||
applications, such as Board Support Package, firmware, safety VM, or Hardware.
|
||||
The verification of application integration shall check whether the safety
|
||||
application meets all application constraints.
|
||||
|
||||
This design assumption applies to the following cases:
|
||||
|
||||
- Configuration data defined by external safety application, such as physical
|
||||
PCI device information specific for each board design.
|
||||
|
||||
- Input data which is only specified by external safety application.
|
||||
|
||||
|
||||
Architecture Level
|
||||
==================
|
||||
|
||||
Fulfillment to FuSa Standards
|
||||
-----------------------------
|
||||
|
||||
The fulfillment matrix of error detection and error handling on architecture
|
||||
level to FuSa Standards [IEC_61508-7]_ in ACRN hypervisor is shown in
|
||||
:numref:`fulfillment_matrix_arch_level` below.
|
||||
|
||||
.. table:: Fulfillment to FuSa Standards on Architecture Level
|
||||
:align: center
|
||||
:widths: auto
|
||||
:name: fulfillment_matrix_arch_level
|
||||
|
||||
+-----------------------+---------------+----------+--------------------------------------+
|
||||
| Technique/Measure | Ref in | SIL3 | Company Solution |
|
||||
| | IEC 61508-7 | | |
|
||||
+=======================+===============+==========+======================================+
|
||||
| Failure assertion | C.3.3 | R | This technique will be used. |
|
||||
| programming | | | Plan to do range check in hypercalls |
|
||||
| | | | and HW capability checks. |
|
||||
+-----------------------+---------------+----------+--------------------------------------+
|
||||
|
||||
|
||||
Error Handling Methods
|
||||
----------------------
|
||||
|
||||
The error handling methods used in the ACRN hypervisor on an architecture level
|
||||
are shown below.
|
||||
|
||||
**Invoke default fatal error handler**
|
||||
The hypervisor shall invoke the default fatal error handler when the below
|
||||
cases occur. Customers can define platform-specific handlers, allowing them to
|
||||
implement additional error reporting (mostly to hardware) if required. The
|
||||
default fatal error handler will invoke platform-specific handlers defined by
|
||||
users at first, then it will panic the system.
|
||||
|
||||
This method applies to the following cases:
|
||||
|
||||
- Related hardware resources are unavailable.
|
||||
- Boot information is invalid during platform initialization.
|
||||
- Unexpected exception occurs in root mode due to hardware failures.
|
||||
- Failures occur in the VM dedicated for error handling.
|
||||
|
||||
**Return error code**
|
||||
The hypervisor shall return an error code to the VM when the below cases
|
||||
occur. The error code shall indicate the error type detected (e.g. invalid
|
||||
parameter, device not found, device busy, resource unavailable, etc).
|
||||
|
||||
This method applies to the following case:
|
||||
|
||||
- The hypercall parameter from the VM is invalid.
|
||||
|
||||
**Inform the safety VM through specific register or memory area**
|
||||
The hypervisor shall inform the safety VM through a specific register or
|
||||
memory area when the below cases occur. The VM will decide how to handle the
|
||||
related error. This shall only be done after the VM (Safety OS or Service OS)
|
||||
dedicated to error handling has started.
|
||||
|
||||
This method applies to the following cases:
|
||||
|
||||
- Machine check errors occur due to hardware failures.
|
||||
|
||||
- Unexpected VM entry failures occur, where the VM is not the one dedicated
|
||||
for error handling.
|
||||
|
||||
**Panic the system via ASSERT**
|
||||
The hypervisor can panic the system when the below cases occur. It shall
|
||||
only be used for debug and used to check pre-conditions and post-conditions
|
||||
to catch design errors.
|
||||
|
||||
This method applies to the following case:
|
||||
|
||||
- Software design errors occur.
|
||||
|
||||
|
||||
Rules of Error Detection and Error Handling
|
||||
-------------------------------------------
|
||||
|
||||
The rules of error detection and error handling on an architecture level are
|
||||
shown in :numref:`rules_arch_level` below.
|
||||
|
||||
.. table:: Rules of Error Detection and Error Handling on Architecture Level
|
||||
:align: center
|
||||
:widths: auto
|
||||
:name: rules_arch_level
|
||||
|
||||
+--------------------+-------------------------+--------------+---------------------------+-------------------------+
|
||||
| Resource Class | Failure Mode | Error | Error Handling Policy | Example |
|
||||
| | | Detection | | |
|
||||
| | | via | | |
|
||||
| | | Hypervisor | | |
|
||||
+====================+=========================+==============+===========================+=========================+
|
||||
| External resource | Invalid register/memory | Yes | Follow SDM strictly, or | Unsupported MSR |
|
||||
| provided by VM | state on VM exit | | state any deviation to the| or invalid CPU ID |
|
||||
| | | | document explicitly | |
|
||||
| +-------------------------+--------------+---------------------------+-------------------------+
|
||||
| | Invalid hypercall | Yes | The hypervisor shall | Invalid hypercall |
|
||||
| | parameter | | return related error code | parameter provided by |
|
||||
| | | | to the VM | any VM |
|
||||
| +-------------------------+--------------+---------------------------+-------------------------+
|
||||
| | Invalid data in the | Yes | Case by case depending | Invalid data in memory |
|
||||
| | sharing memory area | | on the data | shared with all VMs, |
|
||||
| | | | | such as IO request |
|
||||
| | | | | buffers and sbuf for |
|
||||
| | | | | debug |
|
||||
+--------------------+-------------------------+--------------+---------------------------+-------------------------+
|
||||
| External resource | Invalid E820 table or | Yes | The hypervisor shall | Invalid E820 table or |
|
||||
| provided by | invalid boot information| | panic during platform | invalid boot information|
|
||||
| bootloader | | | initialization | |
|
||||
| (UEFI or SBL) | | | | |
|
||||
+--------------------+-------------------------+--------------+---------------------------+-------------------------+
|
||||
| Physical resource | 1GB page is not | Yes | The hypervisor shall | 1GB page is not |
|
||||
| used by the | available on the | | panic during platform | available on the |
|
||||
| hypervisor | platform or invalid | | initialization | platform or invalid |
|
||||
| | physical CPU ID | | | physical CPU ID |
|
||||
+--------------------+-------------------------+--------------+---------------------------+-------------------------+
|
||||
|
||||
|
||||
Examples
|
||||
--------
|
||||
|
||||
Here is an example to illustrate when error handling codes are required on
|
||||
an architecture level.
|
||||
|
||||
There are two pre-condition statements of ``vcpu_from_vid``. It indicates that
|
||||
it's the caller's responsibility to guarantee these pre-conditions.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
/**
|
||||
* @pre vcpu_id < CONFIG_MAX_VCPUS_PER_VM
|
||||
* @pre &(vm->hw.vcpu_array[vcpu_id])->state != VCPU_OFFLINE
|
||||
*/
|
||||
static inline struct acrn_vcpu *vcpu_from_vid(struct acrn_vm *vm, uint16_t vcpu_id)
|
||||
{
|
||||
return &(vm->hw.vcpu_array[vcpu_id]);
|
||||
}
|
||||
|
||||
``vcpu_from_vid`` is called by ``hcall_set_vcpu_regs``, which is a hypercall.
|
||||
``hcall_set_vcpu_regs`` is an external interface and ``vcpu_id`` is provided by
|
||||
VM. In this case, we shall add the error checking codes before calling
|
||||
``vcpu_from_vid`` to make sure that the passed parameters are valid and the
|
||||
pre-conditions are guaranteed.
|
||||
|
||||
Here is the sample codes for error checking before calling ``vcpu_from_vid``:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
status = 0;
|
||||
|
||||
if (vcpu_id >= CONFIG_MAX_VCPUS_PER_VM) {
|
||||
pr_err("vcpu id is out of range \r\n");
|
||||
status = -EINVAL;
|
||||
} else if ((&(vm->hw.vcpu_array[vcpu_id]))->state == VCPU_OFFLINE) {
|
||||
pr_err("vcpu is offline \r\n");
|
||||
status = -EINVAL;
|
||||
}
|
||||
|
||||
if (status == 0) {
|
||||
vcpu = vcpu_from_vid(vm, vcpu_id);
|
||||
...
|
||||
}
|
||||
|
||||
|
||||
Module Level
|
||||
============
|
||||
|
||||
Fulfillment to FuSa Standards
|
||||
-----------------------------
|
||||
|
||||
The fulfillment matrix of error detection and error handling on a module level
|
||||
to FuSa Standards [IEC_61508-7]_ in ACRN hypervisor is shown in
|
||||
:numref:`fulfillment_matrix_module_level` below.
|
||||
|
||||
.. table:: Fulfillment to FuSa Standards on Module Level
|
||||
:align: center
|
||||
:widths: auto
|
||||
:name: fulfillment_matrix_module_level
|
||||
|
||||
+-----------------------+---------------+----------+---------------------------------------------------+
|
||||
| Technique/Measure | Ref in | SIL3 | Company Solution |
|
||||
| | IEC 61508-7 | | |
|
||||
+=======================+===============+==========+===================================================+
|
||||
| Design and coding | C.2.6 | HR | This technique will be used for internal functions|
|
||||
| standards | Table B.1 | | of the hypervisor. Plan to use data verification, |
|
||||
| | | | with explicit specification of pre-conditions and |
|
||||
| | | | post-conditions for functions. |
|
||||
+-----------------------+---------------+----------+---------------------------------------------------+
|
||||
|
||||
|
||||
Error Handling Methods
|
||||
----------------------
|
||||
|
||||
The error handling methods used in the ACRN hypervisor on a module level are
|
||||
shown below.
|
||||
|
||||
**Panic the system via ASSERT**
|
||||
The hypervisor can panic the system when the below cases occur. It shall
|
||||
only be used for debugging, used to check pre-conditions and post-conditions
|
||||
to catch design errors.
|
||||
|
||||
This method applies to the following case:
|
||||
|
||||
- Software design errors occur.
|
||||
|
||||
|
||||
Rules of Error Detection and Error Handling
|
||||
-------------------------------------------
|
||||
|
||||
The rules of error detection and error handling on a module level are shown in
|
||||
:numref:`rules_module_level` below.
|
||||
|
||||
.. table:: Rules of Error Detection and Error Handling on Module Level
|
||||
:align: center
|
||||
:widths: auto
|
||||
:name: rules_module_level
|
||||
|
||||
+--------------------+-----------+----------------------------+---------------------------+-------------------------+
|
||||
| Resource Class | Failure | Error Detection via | Error Handling Policy | Example |
|
||||
| | Mode | Hypervisor | | |
|
||||
+====================+===========+============================+===========================+=========================+
|
||||
| Internal data of | N/A | Partial. | The hypervisor shall use | virtual PCI device |
|
||||
| the hypervisor | | The related pre-conditions | the internal resource/data| information, defined |
|
||||
| | | are required. | directly. | with array 'pci_vdevs[]'|
|
||||
| | | The design will guarantee | | through static |
|
||||
| | | the correctness and the | | allocation. |
|
||||
| | | test cases will verify the | | |
|
||||
| | | related pre-conditions. | | |
|
||||
| | | If the design can not | | |
|
||||
| | | guarantee the correctness, | | |
|
||||
| | | the related error handling | | |
|
||||
| | | codes need to be added. | | |
|
||||
| | | Note: Some examples of | | |
|
||||
| | | pre-conditions are listed, | | |
|
||||
| | | like non-empty array, valid| | |
|
||||
| | | array size and non-null | | |
|
||||
| | | pointer. | | |
|
||||
+--------------------+-----------+----------------------------+---------------------------+-------------------------+
|
||||
| Configuration data | Corrupted | No. | The bootloader initializes| 'vm_config->pci_ptdevs' |
|
||||
| of the VM | VM config | The related pre-conditions | hypervisor (including | is configured |
|
||||
| | | are required. | code, data, and bss) and | statically. |
|
||||
| | | Note: VM configuration data| verifies the integrity of | |
|
||||
| | | are auto generated based on| hypervisor image in which | |
|
||||
| | | different board configs, | VM configurations are. | |
|
||||
| | | they are defined | Thus hypervisor does not | |
|
||||
| | | as static structure. | need any additional | |
|
||||
| | | | mechanism. | |
|
||||
+--------------------+-----------+----------------------------+---------------------------+-------------------------+
|
||||
| Configuration data | N/A | No. | The hypervisor shall use | The maximum number of |
|
||||
| of the hypervisor | | The related pre-conditions | the internal resource/data| PCI devices in the VM, |
|
||||
| | | are required. | directly. | defined with |
|
||||
| | | The design will guarantee | | CONFIG_MAX_PCI_DEV_NUM |
|
||||
| | | the correctness and this | | through configuration. |
|
||||
| | | shall be verified manually.| | |
|
||||
+--------------------+-----------+----------------------------+---------------------------+-------------------------+
|
||||
|
||||
|
||||
Examples
|
||||
--------
|
||||
|
||||
Here are some examples to illustrate when error handling codes are required on
|
||||
a module level.
|
||||
|
||||
**Example_1: Analyze the function 'partition_mode_vpci_init'**
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
/**
|
||||
* @pre vm != NULL
|
||||
* @pre vm->vpci->pci_vdev_cnt <= CONFIG_MAX_PCI_DEV_NUM
|
||||
*/
|
||||
static int32_t partition_mode_vpci_init(const struct acrn_vm *vm)
|
||||
{
|
||||
struct acrn_vpci *vpci = (struct acrn_vpci *)&(vm->vpci);
|
||||
struct pci_vdev *vdev;
|
||||
struct acrn_vm_config *vm_config = get_vm_config(vm->vm_id);
|
||||
struct acrn_vm_pci_ptdev_config *ptdev_config;
|
||||
uint32_t i;
|
||||
|
||||
vpci->pci_vdev_cnt = vm_config->pci_ptdev_num;
|
||||
|
||||
for (i = 0U; i < vpci->pci_vdev_cnt; i++) {
|
||||
vdev = &vpci->pci_vdevs[i];
|
||||
vdev->vpci = vpci;
|
||||
ptdev_config = &vm_config->pci_ptdevs[i];
|
||||
vdev->vbdf.value = ptdev_config->vbdf.value;
|
||||
|
||||
if (vdev->vbdf.value != 0U) {
|
||||
partition_mode_pdev_init(vdev, ptdev_config->pbdf);
|
||||
vdev->ops = &pci_ops_vdev_pt;
|
||||
} else {
|
||||
vdev->ops = &pci_ops_vdev_hostbridge;
|
||||
}
|
||||
|
||||
if (vdev->ops->init != NULL) {
|
||||
if (vdev->ops->init(vdev) != 0) {
|
||||
pr_err("%s() failed at PCI device (vbdf %x)!",
|
||||
__func__, vdev->vbdf);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
``get_vm_config`` is called by ``partition_mode_vpci_init``.
|
||||
There are one pre-condition and two post-conditions of ``get_vm_config``.
|
||||
It indicates that the caller of ``get_vm_config`` shall guarantee these
|
||||
pre-conditions and ``get_vm_config`` itself shall guarantee the post-condition.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
/**
|
||||
* @pre vm_id < CONFIG_MAX_VM_NUM
|
||||
* @post retval != NULL
|
||||
* @post retval->pci_ptdev_num <= MAX_PCI_DEV_NUM
|
||||
*/
|
||||
struct acrn_vm_config *get_vm_config(uint16_t vm_id)
|
||||
{
|
||||
return &vm_configs[vm_id];
|
||||
}
|
||||
|
||||
**Question_1: Is error checking required for 'vm_config'?**
|
||||
|
||||
No. Because 'vm_config' is getting data from ``get_vm_config`` and the
|
||||
post-condition of ``get_vm_config`` guarantees that the return value is not NULL.
|
||||
|
||||
|
||||
**Question_2: Is error checking required for 'vdev'?**
|
||||
|
||||
No. Here are the reasons:
|
||||
|
||||
a) The pre-condition of ``partition_mode_vpci_init`` guarantees that 'vm' is not
|
||||
NULL. It indicates that 'vpci' is not NULL. Since 'vdev' is getting data from
|
||||
the array 'pci_vdevs[]' via indexing, 'vdev' is not NULL as long as the index
|
||||
is valid.
|
||||
|
||||
b) The post-condition of ``get_vm_config`` guarantees that 'vpci->pci_vdev_cnt'
|
||||
is less than or equal to 'CONFIG_MAX_PCI_DEV_NUM', which is the array size of
|
||||
'pci_vdevs[]'. It indicates that the index used to get 'vdev' is always
|
||||
valid.
|
||||
|
||||
Given the two reasons above, 'vdev' is always not NULL. So, the error checking
|
||||
codes are not required for 'vdev'.
|
||||
|
||||
|
||||
**Question_3: Is error checking required for 'ptdev_config'?**
|
||||
|
||||
No. 'ptdev_config' is getting data from the array 'pci_vdevs[]', which is the
|
||||
physical PCI device information coming from Board Support Package and firmware.
|
||||
For physical PCI device information, the related application constraints
|
||||
shall be defined in the design document or safety manual. For debug purpose,
|
||||
developers could use ASSERT here to catch the Board Support Package or firmware
|
||||
failures, which does not guarantee these application constraints.
|
||||
|
||||
|
||||
**Question_4: Is error checking required for 'vdev->ops->init'?**
|
||||
|
||||
No. Here are the reasons:
|
||||
|
||||
a) Question_2 proves that 'vdev' is always not NULL.
|
||||
|
||||
b) 'vdev->ops' is fully initialized before 'vdev->ops->init' is called.
|
||||
|
||||
Given the two reasons above, 'vdev->ops->init' is always not NULL. So, the error
|
||||
checking codes are not required for 'vdev->ops->init'.
|
||||
|
||||
|
||||
**Question_5: How to handle the case when 'vdev->ops->init(vdev)' returns non-zero?**
|
||||
|
||||
This case indicates that the initialization of specific virtual device fails.
|
||||
Investigation has to be done to figure out the root-cause. Default fatal error
|
||||
handler shall be invoked here if it is caused by a hardware failure or invalid
|
||||
boot information.
|
||||
|
||||
|
||||
**Example_2: Analyze the function 'partition_mode_vpci_deinit'**
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
/**
|
||||
* @pre vdev != NULL
|
||||
* @pre vm->vpci->pci_vdev_cnt <= CONFIG_MAX_PCI_DEV_NUM
|
||||
*/
|
||||
static void partition_mode_vpci_deinit(const struct acrn_vm *vm)
|
||||
{
|
||||
struct pci_vdev *vdev;
|
||||
uint32_t i;
|
||||
|
||||
for (i = 0U; i < vm->vpci.pci_vdev_cnt; i++) {
|
||||
vdev = (struct pci_vdev *) &(vm->vpci.pci_vdevs[i]);
|
||||
if ((vdev->ops != NULL) && (vdev->ops->deinit != NULL)) {
|
||||
if (vdev->ops->deinit(vdev) != 0) {
|
||||
pr_err("vdev->ops->deinit failed!");
|
||||
}
|
||||
}
|
||||
/* TODO: implement the deinit of 'vdev->ops' */
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
**Question_6: Is error checking required for 'vdev->ops' and 'vdev->ops->init'?**
|
||||
|
||||
Yes. Because 'vdev->ops' and 'vdev->ops->init' can not be guaranteed to be
|
||||
not NULL. If the VM called ``partition_mode_vpci_deinit`` twice, it may be NULL.
|
||||
|
||||
References
|
||||
**********
|
||||
|
||||
.. [IEC_61508-7] IEC 61508-7:2010, Functional safety of electrical/electronic/programmable electronic safety-related systems - Part 7: Overview of techniques and measures
|
||||
|
Loading…
Reference in New Issue
Block a user