1759 'Virtual Memory and OS/Hardware Interaction' OCT-14

This is a 'Paper Reading' post for Course ECE1759. The topic is 'Virtual Memory and OS/Hardware Interaction'. This paper list is here:

  • H. M. Levy and P. Lipman, "Virtual Memory Management in VAX/VMS", IEEE Computer, Vol. 15, No. 3, March 1982, pp.35-41.
  • Michael J. Feeley, William E. Morgan, Frederic H. Pighin, Anna R. Karlin, and Henry M. Levy, Implementing Global Memory Management in a Workstation Cluster, Proceedings of the 15th ACM Symposium on Operating Systems Principles, Dec. 1995, 29(5): 201-212.
  • Thomas Anderson, Henry Levy, Brian Bershad, and Edward Lazowska. "The Interaction of Architecture and Operating System Design," ACM Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, April 1991.

Virtual Memory Management in VAX/VMS

Introduction

Three mechanisms to enhance OS performance

  • Caching of pages.
  • Clustering.
  • Process-local replacement, along with swapping.

Core Ideas

Vax Architecture

  • Addresses are 32 bits.
    • 2 bits of segment (called spaces)
    • 21 bits of page number
    • 9 bits of offset (512 byte pages)
  • The four segments are allocated as follows:
    • 00: P0 (for process)
    • 01: p1 (for process)
    • 10: system
    • 11: unused
  • low half of the address space (last bit = 0) is known as the system space, shared by all processes in the system
    • only half system space is used in current arch
    • other half is simply reserved
  • the upper address of the first half address space is process-specific control region, the lower portion is program-specific.
    • program region (p0) grows upwards toward control region
    • control region (p1) grows downwards toward program region.
  • each region defined by page table entries (PTEs)
    • PTE contains
      • valid bit (31)
      • protection field (30:27) indicating required privilege
      • modify bit (26) indicating write has occurred to the page
      • field use by OS (25:21)
      • physical page frame number (20:0)
  • P0 and P1 page tables for every process are located in system space section of the address space.
    • base registers for p0 and p1 can be paged because they are in virtual memory
  • System page tables are allocated in contiguous physical memory.
  • Hardware provides a translation buffer for caching virtual to physical translations.
    • two sections
      • system translations
      • process translations
    • only process translations need to be flushed.
    • The P0 and P1 page tables are process specific, the corresponding base and length registers are changed by a process context switch.

Use of address space by VAX/VMS

  • The first few pages of system space, called the vector region, contain pointers to executive service routines in system space.
  • Users call service routines as they would call any user-written procedure.
  • Prevent access to 1st page of P0 to catch program errors executing on null pointers.

Memory Management Implementation

  • Concerns:
    • effect one a heavily paging programs in the system
    • high cost of program startup and restart time in VM environments
    • increased disk workload due to paging
    • processor time consumed by searching page lists in the OS
  • Management system is divided into pager and swapper.
    • pager is an OS procedure that runs as a result of a page fault
    • swapper is responsible for loading/storing pages
  • page replacement
    • global placement
      • bad for other processes
    • process local page replacement
      • limit on physical memory of a process
      • FIFO replacement for selecting page from resident set.
      • set reference bit whenever referencing a page (May introduce high cost when the program is large)
  • Pages when being removed will look at modify bit to determine whether page can be added to the free list or modified list (written later)
  • Free list also helps reduce fault rates
  • Delays modified page writes (modified list)
    • modified page list acts as a cache of recently removed pages
    • if a page on the list is referenced, it can be returned at low cost
    • when a write request is performed, many pages written at once
    • pages can be arranged on paging file so that they are clustered
    • because writes are delayed, many page writes are entirely avoided because either the pages are faulted and/or program terminates.
  • Demand Zeroing & Copy-On-Write
  • Paging in system space:
    • in system space, paged and non-paged code and data.
    • once the system is full, system page faults cause pages to be removed from system RSS and placed on a free or modified page list.
      • exception is the process page tables
      • process page table is added to process private resident set and not eligible for removal from resident set as long as it contains valid PTEs
  • Swapper
    • keep highest priority processes resident
    • avoid typically high page rates that occur when resuming a process.
  • When a process id removed from memory, the swapper writes all resident-set pages to a contiguous section of the swap file, including the pages with I/O in progress.
  • Process not in memory becomes able to execute, read in pages from the swapper
  • Will not load a process to execute unless there is sufficient physical memory for the entire resident set.

Program Control of Memory

  • VAX/VMS provide a number of executive services:
    • Expand (or contract) its P0 or P1 region.
    • Increase (or decrease) the resident set size.
    • Lock (or unlock) pages in the resident set so that they are always resident with the process.
    • Lock (or unlock) the process in memory (the balance set) so that it is never swapped.
    • Create and/or map a global or private section into the process address space.
    • Produce a record of the page-fault activity of the process.

Lectures

  • VAX was implemented in MIPS, is another series of DEC's products besides PDP
  • First to assume 32-bit address
  • Why the kernel space of each process is shared
    • kernel space and user space has different page table.
    • better performance of system call.

Reference

Implementing Global Memory Management in a Workstation Cluster

Introduction

Describes the design and implementation of global memory management in a workstation cluster.

The objective is to use a single, unified, but distributed memory management algorithm at the lowest level of the operating system.

By inserting a global memory management algorithm at the lowest OS level, the system integrates all cluster memory for use by all higher-level functions, including VM paging, mapped files, and file system buffering.

Comparison

  1. Low level inplementation.

  2. Unlike other systems, GMS make global management that it attempting to make good choices both for the faulting node and the cluster as whole.

  3. Handle addition and deletion of the nodes without user intervention.

  4. Have an implementation that is well integrated into a production operating system:

Core Ideas

Algorithm

The Basic Algorithm

  • Assuming that nodes trust each other but may crash at any time.

  • All nodes run the same algorithm and attempt to make choices that are good in a global cluster sense, as well as for the local node.

  • pages:

    • local pages&global pages
      • global pages are pages stored in P’s memory on behalf of other nodes.
    • private pages&shared pages
      • shared pages occur because two or more nodes might access a common file exported by a file server.
    • A shared page may be found in the active local memories of multiple nodes; however, a page in global memory is always private.
  • 4 possible cases of fault:

    • The faulted page is in the global memory of another node, Q.
    • The faulted page is in the global memory of node Q, but P’s memory contains only local pages.
    • The page is on disk.
    • The faulted page is a shared page in the local memory of another node Q.
  • Overall:

    • active nodes: fill their memories with local pages and will begin using remote memory in the cluster.
    • idle nodes: fill their memories with global pages.
  • Swap:

    • on a fault requiring a disk read, the (active) faulting node grows its local memory, while the cluster node with the oldest page (an “idle” node) loses a page to disk.
  • Goal

    • Minimize the total cost of all memory references within the cluster.
  • Performance

    • A local hit is over three orders of magnitude faster than a global memory or disk access.
    • A global memory hit is two to ten times faster than a disk access.

Managing Global Age Information

  • Each node has only approximate information about global pages
    • Target: provide a reasonable tradeoff between the accuracy of information that is available to nodes and the efficiency of distributing that information.
  • Divides time into epochs
    • Each epoch has a maximum duration, and a maximum number of cluster replacements, M, that will be allowed in that epoch.
    • A new epoch is triggered when (normally 5-10s)
      • The duration of the epoch, T, has elapsed
      • M global pages have been replaced
      • The age information is detected to be inaccurate
  • Age information:
    • On every node: both local and global pages.
    • A designated initiator node is used to collect information and send computed result to nodes.
  • When evict a page:
    • check it is older than MinAge.
      • Yes, discard(swap to disk)
      • No, send to another selected node and become its global page
  • The more old pages there are in the network, the longer T should be (and the larger M and A4inAge are);
  • If the expected discard rate is low, T can be larger.

Node Failures and Coherency

  • Node failures do not cause data loss.
    • all pages sent to global memory are clean: A dirty page moves from local to global memory only when it is being written to disk.
    • If a node housing a requested remote page is down, the requesting node simply fetches the data from disk

Discussion

  • Needs:
    • choose those nodes most likely to have idle memory to house global pages.
    • avoid burdening nodes that are actively using their memory.
    • ultimately maintain in cluster-wide primary memory the pages most likely to be globally reused.
    • maintain those pages in the right places.

Welcome to my other publishing channels