This is a 'Paper Reading' post for Course ECE1759. The topic is 'Bugs'. This paper list is here:
- Dawson Engler, Benjamin Chelf, Andy Chou, and Seth Hallem, Checking System Rules Using System-Specific, Programmer-Written Compiler Extensions, In Proceedings of the 4th Symposium on Operating System Design & Implementation, October 2000, San Diego, California, pp. 1-16.
- Stefan Savage, Michael Burrows, Greg Nelson, Patrick Sobalvarro, Thomas Anderson, Eraser: a dynamic data race detector for multithreaded programs, ACM Transactions on Computer Systems (TOCS), Nov. 1997, Pages 391-411
- Zhenmin Li, Shan Lu, Suvda Myagmar, Yuanyuan Zhou, CP-Miner: a tool for finding copy-paste and related bugs in operating system code, Proceedings of the 6th conference on Symposium on Opearting Systems Design and Implementation (OSDI), 2004
Checking System Rules Using System-Specific, Programmer-Written Compiler Extensions
Introduction
- Systems have rules like write before locked. The paper introduces MC(meta-level compilation) extensions to check these rules.
- Previous, several ways:
- verification: build an abstract specication of the code
- hard to construct
- testing:
- hard to cover
- running the test cause practical problems like need many devices.
- detect rule violations is manual inspection:
- hard to cover
- reliability problems
- verification: build an abstract specication of the code
- Assumption of compiler extensions:
- many rules have a straightforward mapping to program source
- Implementers need to write extensions in a high-level state-machine language, metal
- extensions are dynamically linked into extensible compiler, xg++
- extensions use language-based patterns to recognize operations that they care about. Then, when the input code matches these patterns, they detect rule violations by transitioning between states that allow or disallow other operations.
- Contribution
- High-level compilation
- Systems for finding software errors.
- more generally effective though with weaker analysis
- extensions are 2-4 times smaller, have less bugs, and handle more cases
- Mainly limited to syntax-based tree traversal or transformation and do not have data flow information.
Core Ideas
Meta-level Compilation
- compiler extensions are written in a high-level, state-machine language, metal.
- state machine is syntactically similar to a "yacc" specication.
- SMs use patterns to search for interesting source code features. when matched, cause transitions between states.
- Patterns are written in an extended version of the base language (C++)
- For a given state, metal checks pattern rules in lexical order. If any code matches the specified patterns, metal processes this matching code, sets the state to the new state (the token after the ==> operator), and executes the action.
Practical issues
- use cache to prune redundant code paths.
- For each node in the input flow-graph, it records the set of states in which it has been visited. If an SM arrives at a node in the same state as a previous instance, the system prunes it.
Caveats
- MC extensions are checkers rather than verifiers
- produce a number of false positives
- incompatibility with C language.
Eraser: a dynamic data race detector for multithreaded programs
Introduction
- Target: detects data races in multithreaded programs.
- Features:
- dynamic race detection
- Before:
- Tools based on happens-before
- difficult to implement efficiently
- dependent on the interleaving produced by the scheduler(if x do not using lock, the data racing will not consider x)
- Tools based on happens-before
Core Ideas
THE LOCKSET ALGORITHM
- For each memory shared check lockset. When access join the being used locks with lockset. Finally, if the lockset is empty, return warning.
- Problems, the practical scenarios are complicated.
Improvement
- Initialization: Shared variables are frequently initialized without holding a lock.
- Read-Shared Data: Some shared variables are written during initialization only and are read-only thereafter. These can be safely accessed without locks.
- Read-Write Locks: Read-write locks allow multiple readers to access a shared variable, but allow only a single writer to do so.
Reference
CP-Miner: a tool for finding copy-paste and related bugs in operating system code
Introduction
- A major reason why copy-paste introduces bugs is that programmers forget to modify identifiers
- Previous problems
- Efficiency
- Tolerance to modifications
- Bug detection
- Main contribution:
- A scalable copy-paste detection tool for large software
- Detection of bugs associated with copy-paste
- Statistical study of copy-pasted code distribution in operating system code
Core Ideas
- Previous techniques for copy-paste detection
- string-based
- does not exploit any lexical information
- parse-tree-based
- introduce false positives
- token-based
- string-based
- CP-Miner is token-based
Lectures
- phase of OS research
OS tech | Post OS | |
---|---|---|
Time | before 1990s | after 1990s |
Content | Unix, FFS(maybe last one) | Virtualization |
Soft update | Big data | |
Bugs |
static | dynamic | ||
---|---|---|---|
Memory Bugs | serious | MC extensions | |
Data Race | rare | Eraser | |
Semantic | common | CP-Miner | |
full coverage easy to use can’t reproduce |
limited to input hard to use always reproduce |