Chinese tech giant Huawei has proposed a new “SandBox Mode” for the Linux kernel to improve memory safety. The ultimate goal of SandBox Mode is to execute native kernel code in an environment that permits memory access only to predefined addresses, so that potential vulnerabilities either cannot be exploited or have no impact on the rest of the kernel. The patch series adds the SandBox Mode API and its arch-independent infrastructure to the kernel. It runs the target function on a vmalloc()’ed copy of all input and output data, which on its own prevents some out-of-bounds accesses thanks to guard pages. The SandBox Mode API allows each component to run inside an isolated execution environment.
In particular, memory areas used as input and/or output are isolated from the rest of the kernel and surrounded by guard pages. Without arch hooks, this common base provides only weak isolation. On architectures that implement the necessary arch hooks, SandBox Mode (SBM) leverages hardware paging facilities and CPU privilege levels to enforce the use of only these predefined memory areas. With arch support, SBM can also recover from protection violations: it forcibly terminates the sandbox and returns an error code (e.g. -EFAULT) to the caller, so execution can continue. Such an implementation provides strong isolation.
Posts to the Linux kernel mailing list show that Petr Tesarik of Huawei Cloud sent out a “request for comments” (RFC) patch series for the new SandBox Mode. Tesarik described it as follows:
The ultimate goal of SandBox Mode is to execute native kernel code in an environment that only allows memory access to predefined addresses, so potential vulnerabilities cannot be exploited or have no impact on the rest of the kernel.
This patch series adds the SandBox Mode API and architecture-independent infrastructure to the kernel. It runs the target function on a vmalloc()’ed copy of all input and output data. This alone prevents some out-of-bounds accesses thanks to guard pages.
SandBox Mode Proposal
The SandBox Mode documentation included in the patch series adds the following description:
The main goal of SandBox Mode (SBM) is to reduce the impact of potential memory safety bugs in kernel code by decomposing the kernel. The SBM API allows each component to be run in an isolated execution environment. In particular, memory regions used as inputs and/or outputs are isolated from the rest of the kernel and surrounded by guard pages.
On architectures that implement the necessary arch hooks, SandBox Mode leverages hardware paging facilities and CPU privilege levels to enforce the use of only these predefined memory regions. With arch support, SBM can also recover from protection violations. This means that SBM forcibly terminates the sandbox and returns an error code (such as -EFAULT) to the caller so that execution can continue. Such an implementation provides strong isolation.
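The flow described above (copy the inputs and outputs, run the target function only on the copies, and hand an error code back to the caller if the function strays outside them) can be modelled in a few lines of user-space Python. This is only a conceptual sketch, not the kernel API from the patch series: the names GuardedBuffer, sandbox_call, double_all and buggy_copy are hypothetical, and a bounds check that raises MemoryError stands in for the page fault a real guard page would trigger. The model is also stricter than real guard pages, which catch only accesses that actually land on a guard page.

```python
import errno


class GuardedBuffer:
    """Private copy of caller data; any out-of-range index is a 'violation'.

    The bounds check is a stand-in (assumption) for guard pages placed
    around a vmalloc()'ed copy in the real proposal.
    """

    def __init__(self, data):
        self._cells = list(data)  # copy, so the caller's data is never touched

    def _check(self, index):
        if not 0 <= index < len(self._cells):
            raise MemoryError(f"out-of-bounds access at index {index}")
        return index

    def __getitem__(self, index):
        return self._cells[self._check(index)]

    def __setitem__(self, index, value):
        self._cells[self._check(index)] = value

    def __len__(self):
        return len(self._cells)

    def contents(self):
        return list(self._cells)


def sandbox_call(func, inputs, output):
    """Run func on copies; return 0 on success or -EFAULT on a violation."""
    in_copy = GuardedBuffer(inputs)
    out_copy = GuardedBuffer(output)
    try:
        func(in_copy, out_copy)
    except MemoryError:
        # Sandbox "terminated": nothing is copied back, the caller continues.
        return -errno.EFAULT
    output[:] = out_copy.contents()  # copy results back only on success
    return 0


def double_all(src, dst):
    """Well-behaved component: stays inside its buffers."""
    for i in range(len(src)):
        dst[i] = 2 * src[i]


def buggy_copy(src, dst):
    """Component with an off-by-one bug: reads one element past the end."""
    for i in range(len(src) + 1):
        dst[i] = src[i]
```

Calling sandbox_call(double_all, [1, 2, 3], out) fills out with the doubled values and returns 0, while sandbox_call(buggy_copy, [1, 2, 3], out) returns -errno.EFAULT and leaves out unmodified, mirroring how the caller of a terminated sandbox can keep running.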
GMEM Proposal
Separately, Huawei engineer Weixi Zhu on Tuesday announced the company’s work on GMEM (Generalized Memory Management), which aims to eliminate memory management code that is duplicated across device drivers. The GMEM proposal sums up the current challenge rather well: a throughput-oriented accelerator will not tolerate executing heavy memory access workload with a host MMU/IOMMU via a remote [...]. Therefore, devices will still have their own MMU and pick a simpler page table format for lower address translation overhead, requiring external MM subsystems.
With the proposed GMEM code, the Linux memory management (MM) subsystem is extended to share its machine-independent code while providing only a high-[...]. In turn, GMEM should allow more reuse by drivers without reinventing the wheel. GMEM has been tested with Huawei’s neural processing unit (NPU) device driver, and switching to GMEM allowed that driver alone to shed 26k lines of code. The proposal lays out other benefits as well. It can be found in full on the dri-devel mailing list, where it awaits review and feedback from other Linux device driver developers.
Conclusion
In conclusion, Huawei’s proposed SandBox Mode for the Linux kernel aims to improve memory safety by executing native kernel code in an environment that permits memory access only to predefined addresses. The SandBox Mode API allows each component to run inside an isolated execution environment. On architectures that implement the necessary arch hooks, SandBox Mode leverages hardware paging facilities and CPU privilege levels to enforce the use of only these predefined memory areas. With arch support, SBM can also recover from protection violations: it forcibly terminates the sandbox and returns an error code (e.g. -EFAULT) to the caller, so execution can continue. Such an implementation provides strong isolation.