Virtual Machines and Hypervisors

September 4, 2016

For reasons of efficiency or convenience, it is often desirable to run an entire operating system as a guest of another operating system. The host operating system creates a software entity that models the hardware of a physical machine, called the virtual machine (VM), and presents this to the guest operating system. One of the earliest examples was the IBM CP/CMS operating system, created in 1968 and then adopted in the IBM VM/370 product in 1972.

There are many ways to implement the hardware model that is a VM. In full emulation, the machine is entirely simulated by software. Each instruction that is to be run is examined by a software emulator for the effects it would have on the hardware platform, and those effects are recorded in data structures that stand proxy for all of the hardware assets being emulated. This includes CPU registers, IO devices, and the contents, byte by byte, of the emulated physical memory. This method is very slow, as a single machine instruction of the guest might involve hundreds if not thousands of instructions on the actual hardware.
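As a minimal sketch of full emulation, the fragment below interprets a toy machine one instruction at a time. The instruction set (LOADI, ADD, STORE, HALT), the register names, and the EmulatedMachine class are all invented for illustration and do not correspond to any real emulator. The point is only that every register and every byte of memory lives in an ordinary data structure, and that each guest instruction costs many host operations to decode and apply.

```python
from dataclasses import dataclass, field


@dataclass
class EmulatedMachine:
    """Data structures standing proxy for the emulated hardware (hypothetical)."""
    registers: dict = field(default_factory=lambda: {"r0": 0, "r1": 0})
    memory: bytearray = field(default_factory=lambda: bytearray(256))
    pc: int = 0           # emulated program counter
    halted: bool = False


def step(machine, program):
    """Decode one guest instruction and record its effects in the model."""
    op, *args = program[machine.pc]
    machine.pc += 1
    if op == "LOADI":                      # LOADI reg, value
        reg, value = args
        machine.registers[reg] = value & 0xFF
    elif op == "ADD":                      # ADD dst, src
        dst, src = args
        total = machine.registers[dst] + machine.registers[src]
        machine.registers[dst] = total & 0xFF
    elif op == "STORE":                    # STORE reg, address
        reg, address = args
        machine.memory[address] = machine.registers[reg]
    elif op == "HALT":
        machine.halted = True
    else:
        raise ValueError(f"unknown opcode {op!r}")


def run(machine, program):
    """Emulate until the guest program halts."""
    while not machine.halted:
        step(machine, program)
    return machine


# A toy guest program: compute 2 + 3 and store the result at address 16.
PROGRAM = [
    ("LOADI", "r0", 2),
    ("LOADI", "r1", 3),
    ("ADD", "r0", "r1"),
    ("STORE", "r0", 16),
    ("HALT",),
]
```

Calling run(EmulatedMachine(), PROGRAM) leaves the sum in register r0 and in the emulated memory, without the guest program ever touching real hardware.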

It would be faster to allow the guest operating system to run natively and directly on the hardware, subject to the constraint that no instruction's effect conflicts with the correct working of the host operating system. Those instructions that would cause a conflict are redirected to the host and handled by software emulation. If this can be done without any modification of the guest operating system, the method is called full virtualization. If instead the guest operating system needs to be modified, it is called paravirtualization. Xen is an example of a very successful hypervisor that uses paravirtualization.

Both para- and full virtualization require that the guest operating system not run with full privilege. Instructions that the guest operating system cannot run directly cause a trap into a new software entity called the hypervisor or virtual machine monitor (VMM), which switches context into the host operating system and presents the request that the trapped instruction be emulated. Especially for full virtualization, this requires (or at least is aided by) new hardware capabilities in the CPU. The Linux KVM module leverages Intel’s VT-x and AMD’s AMD-V instruction set extensions and hardware capabilities to carry out full virtualization.
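The trap-and-emulate flow can be sketched roughly as follows. This is a model only: the opcodes, the PRIVILEGED set, and the ToyGuest and ToyHypervisor classes are hypothetical, and a Python exception stands in for the hardware trap. Unprivileged instructions run "directly", while privileged ones are caught by the hypervisor and applied to the guest's virtual state rather than to the real machine.

```python
class PrivilegedInstructionTrap(Exception):
    """Stands in for the hardware trap raised by a deprivileged guest."""

    def __init__(self, instruction):
        super().__init__(instruction)
        self.instruction = instruction


# Hypothetical opcodes that a deprivileged guest may not run directly.
PRIVILEGED = {"OUT", "DISABLE_INTERRUPTS"}


class ToyGuest:
    def __init__(self):
        self.accumulator = 0
        self.virtual_interrupts_enabled = True   # the guest's private view
        self.console = []                         # emulated output device

    def execute(self, instruction):
        """Model of the CPU running one guest instruction without privilege."""
        op, *args = instruction
        if op in PRIVILEGED:
            raise PrivilegedInstructionTrap(instruction)
        if op == "ADD":
            self.accumulator += args[0]           # harmless, runs directly
        else:
            raise ValueError(f"unknown opcode {op!r}")


class ToyHypervisor:
    def __init__(self):
        self.trap_count = 0

    def run(self, guest, program):
        for instruction in program:
            try:
                guest.execute(instruction)
            except PrivilegedInstructionTrap as trap:
                self.trap_count += 1
                self.emulate(guest, trap.instruction)

    def emulate(self, guest, instruction):
        """Apply the trapped instruction to the guest's virtual state only."""
        op, *args = instruction
        if op == "DISABLE_INTERRUPTS":
            guest.virtual_interrupts_enabled = False
        elif op == "OUT":
            guest.console.append(args[0])
```

In this sketch the guest believes it disabled interrupts and wrote to a device, but only its own virtual flag and virtual console were changed; the host's real state is never touched.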

Consider the isolation between user and kernel modes in, for example, the Intel rings of protection model. In full emulation, the emulator simulates the presence of the rings of protection. While an unmodified operating system runs in ring 3 for user and ring 0 for kernel, a paravirtualized operating system will run in ring 3 for user and ring 1 for kernel. The hypervisor and the host operating system will continue to run in ring 0. The paravirtualized operating system can mostly run unimpeded, but operations that would interfere with the host or other guests will be reserved to ring 0.

In a fully virtualized solution, all guests and the host will run as normal in rings 3 and 0. However, the hypervisor must still be invoked in some manner. Current CPUs have extended their architecture to understand root and non-root modes, and the host and VMM run in root mode. Troublesome operations are not allowed in non-root mode, and their use by a guest causes a trap to root mode, passing through the VMM, so that the host can emulate the operation that is being requested.

Another area where full virtualization demands new hardware capabilities is in the management of virtual memory. So that multiple processes do not interfere with one another's use of memory addresses, the logical addresses found in the instruction stream need not equal the physical addresses at which the CPU loads and stores the corresponding data in physical memory. This mapping uses page tables, and the page tables are maintained by the memory management subsystem of the operating system.
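To make the mapping concrete, here is a small sketch of a single-level page table lookup. The 4096-byte page size, the dictionary representation, and the frame numbers are assumptions chosen for illustration; real page tables are multi-level structures walked by the MMU.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class PageFault(Exception):
    """Raised when a logical page has no mapping."""


def translate(page_table, logical_address):
    """Map a logical address to a physical address via a per-process table."""
    page_number, offset = divmod(logical_address, PAGE_SIZE)
    if page_number not in page_table:
        raise PageFault(f"no mapping for page {page_number}")
    frame_number = page_table[page_number]
    return frame_number * PAGE_SIZE + offset


# Two processes use the same logical pages but are backed by different
# physical frames, so they cannot interfere with each other.
process_a_table = {0: 7, 1: 3}
process_b_table = {0: 2, 1: 9}
```

With these tables, logical address 0x10 translates to an address inside frame 7 for the first process and inside frame 2 for the second, even though both processes issued the same address.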

With full virtualization, what the guest operating system believes to be a physical address cannot be the true address in physical memory. Guest operating systems are both unaware of each other and uninterested in mutual cooperation. Virtualization therefore introduces a third address space, machine addresses, and these are the actual addresses where the data is stored in physical memory. The hardware performing a fetch ignores the guest's usual page tables, which map from virtual addresses to physical addresses, and uses shadow page tables, which map from virtual addresses to machine addresses. When the guest manipulates its page tables, it causes a trap to the hypervisor. The hypervisor or the host then prepares a virtual-to-machine mapping corresponding to the newly updated virtual-to-physical mapping, loads it into the shadow page tables, and returns to the guest.
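The shadow page table update can be sketched as the composition of two mappings. Everything named here (ShadowingHypervisor, on_guest_page_table_write, the page numbers) is hypothetical, and plain dictionaries stand in for the hardware page table format. The guest records a virtual-to-physical entry; the trap handler looks up where that guest-physical page really lives and installs the virtual-to-machine entry that the hardware would actually walk.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class ShadowingHypervisor:
    def __init__(self, physical_to_machine):
        # Hypervisor-private map: guest "physical" page -> real machine page.
        self.physical_to_machine = physical_to_machine
        # Shadow table: guest virtual page -> real machine page.
        self.shadow = {}

    def on_guest_page_table_write(self, guest_page_table, vpage, ppage):
        """Trap handler, run when the guest edits its own page table."""
        guest_page_table[vpage] = ppage                       # guest's view
        self.shadow[vpage] = self.physical_to_machine[ppage]  # real mapping


def machine_address(shadow, virtual_address):
    """What a fetch resolves to when the shadow table is the one in use."""
    vpage, offset = divmod(virtual_address, PAGE_SIZE)
    return shadow[vpage] * PAGE_SIZE + offset
```

For example, if the hypervisor backs guest-physical page 1 with machine page 41, then a guest write mapping virtual page 5 to physical page 1 results in a shadow entry mapping virtual page 5 to machine page 41, and the guest never sees the number 41.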

Paravirtualization of the virtual memory system would require a rewrite of the guest memory management system. When it needs a physical page, the guest must always call the host, and the host will allocate and provide a physical page exclusive to that guest. The guest kernel cannot expect a request for a page at a specified physical address to be satisfied, as that page might already be allocated to another guest. The inability to satisfy requests for specific physical memory locations might prove to be a nearly insurmountable problem in paravirtualizing certain operating systems.
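A rough sketch of this allocation protocol follows. The names PagingHost, hypercall_alloc_page, and ParavirtualGuest are invented for illustration and are not the interface of Xen or any other real hypervisor. Note that the call takes no address argument: the host alone decides which page the guest receives.

```python
class OutOfMemory(Exception):
    """Raised when the host has no free pages left."""


class PagingHost:
    def __init__(self, total_pages):
        self.free_pages = list(range(total_pages))
        self.owner = {}                      # page number -> owning guest id

    def hypercall_alloc_page(self, guest_id):
        """Hand the caller an exclusive page; the host picks which one."""
        if not self.free_pages:
            raise OutOfMemory("no free pages")
        page = self.free_pages.pop(0)
        self.owner[page] = guest_id
        return page


class ParavirtualGuest:
    """A guest whose memory manager was rewritten to ask the host for pages."""

    def __init__(self, guest_id, host):
        self.guest_id = guest_id
        self.host = host
        self.pages = []

    def need_page(self):
        # The guest cannot name a physical address; it takes what it is given.
        page = self.host.hypercall_alloc_page(self.guest_id)
        self.pages.append(page)
        return page
```

Two guests sharing one host each receive pages that the other can never be given, but neither can predict or choose the page numbers it ends up with.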

posted in CSC521, Uncategorized by admin

 