1759 'Bugs' NOV-25

This is a 'Paper Reading' post for Course ECE1759. The topic is 'Bugs'. This paper list is here:

  • Dawson Engler, Benjamin Chelf, Andy Chou, and Seth Hallem, Checking System Rules Using System-Specific, Programmer-Written Compiler Extensions, In Proceedings of the 4th Symposium on Operating System Design & Implementation, October 2000, San Diego, California, pp. 1-16.
  • Stefan Savage, Michael Burrows, Greg Nelson, Patrick Sobalvarro, Thomas Anderson, Eraser: a dynamic data race detector for multithreaded programs, ACM Transactions on Computer Systems (TOCS), Nov. 1997, Pages 391-411
  • Zhenmin Li, Shan Lu, Suvda Myagmar, Yuanyuan Zhou, CP-Miner: a tool for finding copy-paste and related bugs in operating system code, Proceedings of the 6th conference on Symposium on Opearting Systems Design and Implementation (OSDI), 2004

Checking System Rules Using System-Specific, Programmer-Written Compiler Extensions

Introduction

  • Systems have rules like write before locked. The paper introduces MC(meta-level compilation) extensions to check these rules.
  • Previous, several ways:
    • veri fication: build an abstract speci cation of the code
      • hard to construct
    • testing:
      • hard to cover
      • running the test cause practical problems like need many devices.
    • detect rule violations is manual inspection:
      • hard to cover
      • reliability problems
  • Assumption of compiler extensions:
    • many rules have a straightforward mapping to program source
  • Implementers need to write extensions in a high-level state-machine language, metal
  • extensions are dynamically linked into extensible compiler, xg++
    • extensions use language-based patterns to recognize operations that they care about. Then, when the input code matches these patterns, they detect rule violations by transitioning between states that allow or disallow other operations.
  • Contribution
    • High-level compilation
    • Systems for finding software errors.
      • more generally effective though with weaker analysis
    • extensions are 2-4 times smaller, have less bugs, and handle more cases
      • Mainly limited to syntax-based tree traversal or transformation and do not have data flow information.

Core Ideas

Meta-level Compilation

  • compiler extensions are written in a high-level, state-machine language, metal.
    • state machine is syntactically similar to a "yacc" speci cation.
    • SMs use patterns to search for interesting source code features. when matched, cause transitions between states.
      • Patterns are written in an extended version of the base language (C++)
    • For a given state, metal checks pattern rules in lexical order. If any code matches the specified patterns, metal processes this matching code, sets the state to the new state (the token after the ==> operator), and executes the action.

Practical issues

  • use cache to prune redundant code paths.
  • For each node in the input flow-graph, it records the set of states in which it has been visited. If an SM arrives at a node in the same state as a previous instance, the system prunes it.

Caveats

  • MC extensions are checkers rather than verifiers
  • produce a number of false positives
  • incompatibility with C language.

Eraser: a dynamic data race detector for multithreaded programs

Introduction

  • Target: detects data races in multithreaded programs.
  • Features:
    • dynamic race detection
  • Before:
    • Tools based on happens-before
      • difficult to implement efficiently
      • dependent on the interleaving produced by the scheduler(if x do not using lock, the data racing will not consider x)

Core Ideas

THE LOCKSET ALGORITHM

  • For each memory shared check lockset. When access join the being used locks with lockset. Finally, if the lockset is empty, return warning.
  • Problems, the practical scenarios are complicated.

Improvement

  • Initialization: Shared variables are frequently initialized without holding a lock.
  • Read-Shared Data: Some shared variables are written during initialization only and are read-only thereafter. These can be safely accessed without locks.
  • Read-Write Locks: Read-write locks allow multiple readers to access a shared variable, but allow only a single writer to do so.

Reference

CP-Miner: a tool for finding copy-paste and related bugs in operating system code

Introduction

  • A major reason why copy-paste introduces bugs is that programmers forget to modify identifiers
  • Previous problems
    • Efficiency
    • Tolerance to modifications
    • Bug detection
  • Main contribution:
    • A scalable copy-paste detection tool for large software
    • Detection of bugs associated with copy-paste
    • Statistical study of copy-pasted code distribution in operating system code

Core Ideas

  • Previous techniques for copy-paste detection
    • string-based
      • does not exploit any lexical information
    • parse-tree-based
      • introduce false positives
    • token-based
  • CP-Miner is token-based

Lectures

  • phase of OS research
OS tech Post OS
Time before 1990s after 1990s
Content Unix, FFS(maybe last one) Virtualization
Soft update Big data
Bugs
static dynamic
Memory Bugs serious MC extensions
Data Race rare Eraser
Semantic common CP-Miner
full coverage
easy to use
can’t reproduce
limited to input
hard to use
always reproduce

Welcome to my other publishing channels