Notes
Slide Show
Outline
1
Dealing with Software Complexity
  • The Discovery of Structure in Programs
2
Software Development
  • Designing new code
  • Understanding old code
    • Written by others
    • Written by current developer some time ago
  • Maintenance starts from day two
  • Code understanding—the most important and least automated part of development
3
Software Understanding
  • The trivial structure
    • Lexical
    • Grammatical
    • Alphabetical list of classes and functions
    • Diagrams (Booch, etc.)
  • High-level structure
    • Relationships between classes (next slide)
    • Design ideas, patterns
    • Emerging structure
4
Class Dependencies
  • Class A uses class B
    • Strongly: it requires #include B.h
    • Weakly: it requires forward declaration of B
  • Dependency graph analysis
  • Levels of abstraction
    • Tree-like structure (next slide)
    • Cycles and how to break them
5
Hierarchy of Classes
6
Neurology
  • Brain can only work on tree-like structures
    • There are 7 +/- 2 “registers”
    • There is a cache that must be pre-loaded
    • There is long-term, slow-search, memory
  • Reality is infinite, brain is finite. Something must be discarded.
  • Abstracting—creating an item that can fit in a register
7
Programmer’s Brain
  • Context switching—re-loading the brain cache (multitasking is expensive)
  • Small program, easy to restart the brain
  • Maintenance requires code understanding—can take hours or days to fill the cache + a lot is left to debugging and testing
  • Few tools to speed up cache loading
  • Needed: ready-made abstractions that fit the registers
8
Abstractions
  • Subtracting “irrelevant” features
    • What are features?
    • Which features are irrelevant?
  • Abstracting and Categorizing
  • Biology and evolution (next slide)
9
The Origin of Abstraction
  • Primitive organism has access to a featureless input stream—”reality stream”
    • Some things in the stream influence metabolism—selection for primitive detectors
    • Data from detectors are “features”—evolution decides which features are relevant
  • First abstractions: food and danger.
    • Particular combinations of features
10
Imitating Life
  • Cellular automata
    • Image processing, discovering features (lines, squares, faces)
    • Genetic algorithms, training
  • Automata feeding on source code
    • Lexing automata
    • Parsing automata—require infinite number of states
11
Scope Discovery
  • Cellular automata that can count
  • 2-d state space
    • Vertical counts parentheses
    • Horizontal counts braces
12
Bubble Diagrams
  • Glue together matching parentheses
  • Glue together matching braces
13
Document Processing
  • Pre-defined features: documents and words (word breakers)
  • Statistics: distribution of words among documents
  • Relevant words: words that occur more often in a given document than in the rest of the corpus
14
Clustering
  • Documents with similar relevant word profiles form clusters
  • Categorization of documents based of statistical features
  • Categorization—automatic generation of abstractions
15
Clustering
  • Pick a representative set of N relevant words/phrases from the whole corpus
  • Each document is a point in N-dim space
  • Distance between documents in N dim
  • Add gravitational attraction (potential)
  • Documents will start clustering just like galaxies in the early universe
16
Example of Clustering
17
Statistics & Abstractions
  • Nature gathers statistics in a very slow process
  • Science is based on statistics
    • Physics
    • Math
  • Statistical methods (clustering) in program analysis
18
Programs as Documents
  • High-level structure of program
    • Reflects programmers’ ideas
    • Information encoded in vocabulary: names, comments
    • Strong influence of problem domain
    • Depends on personal style
  • Can be processed like document corpus
    • Relevant words, clusters
19
Statistical Discovery
  • Intended structures, for instance Model, View, Controller
  • Hidden structures, for instance separation of UI from the “model”
  • Horizontal structures, aspects
  • Vertical structures, exceptions, exception specification
  • Copy and paste programming, code reuse
20
Integration with Tools
  • Overnight automatic program analysis
    • Creating a map of abstractions
    • Creating various views (bubble diagrams, trees, layers)
    • Discovering hidden structures
  • The ecosphere of a program
    • Evolving automata, feeding on code and on statistical abstractions
  • Maintenance by programmer
    • Encapsulating hidden structures
    • Improving existing abstractions
    • Adding new abstractions