|
1
|
- The Discovery of Structure in Programs
|
|
2
|
- Designing new code
- Understanding old code
- Written by others
- Written by current developer some time ago
- Maintenance starts from day two
- Code understanding—the most important and least automated part of
development
|
|
3
|
- The trivial structure
- Lexical
- Grammatical
- Alphabetical list of classes and functions
- Diagrams (Booch, etc.)
- High-level structure
- Relationships between classes (next slide)
- Design ideas, patterns
- Emerging structure
|
|
4
|
- Class A uses class B
- Strongly: it requires #include B.h
- Weakly: it requires forward declaration of B
- Dependency graph analysis
- Levels of abstraction
- Tree-like structure (next slide)
- Cycles and how to break them
|
|
5
|
|
|
6
|
- Brain can only work on tree-like structures
- There are 7 +/- 2 “registers”
- There is a cache that must be pre-loaded
- There is long-term, slow-search, memory
- Reality is infinite, brain is finite. Something must be discarded.
- Abstracting—creating an item that can fit in a register
|
|
7
|
- Context switching—re-loading the brain cache (multitasking is expensive)
- Small program, easy to restart the brain
- Maintenance requires code understanding—can take hours or days to fill
the cache + a lot is left to debugging and testing
- Few tools to speed up cache loading
- Needed: ready-made abstractions that fit the registers
|
|
8
|
- Subtracting “irrelevant” features
- What are features?
- Which features are irrelevant?
- Abstracting and Categorizing
- Biology and evolution (next slide)
|
|
9
|
- Primitive organism has access to a featureless input stream—”reality
stream”
- Some things in the stream influence metabolism—selection for primitive
detectors
- Data from detectors are “features”—evolution decides which features are
relevant
- First abstractions: food and danger.
- Particular combinations of features
|
|
10
|
- Cellular automata
- Image processing, discovering features (lines, squares, faces)
- Genetic algorithms, training
- Automata feeding on source code
- Lexing automata
- Parsing automata—require infinite number of states
|
|
11
|
- Cellular automata that can count
- 2-d state space
- Vertical counts parentheses
- Horizontal counts braces
|
|
12
|
- Glue together matching parentheses
- Glue together matching braces
|
|
13
|
- Pre-defined features: documents and words (word breakers)
- Statistics: distribution of words among documents
- Relevant words: words that occur more often in a given document than in
the rest of the corpus
|
|
14
|
- Documents with similar relevant word profiles form clusters
- Categorization of documents based of statistical features
- Categorization—automatic generation of abstractions
|
|
15
|
- Pick a representative set of N relevant words/phrases from the whole
corpus
- Each document is a point in N-dim space
- Distance between documents in N dim
- Add gravitational attraction (potential)
- Documents will start clustering just like galaxies in the early universe
|
|
16
|
|
|
17
|
- Nature gathers statistics in a very slow process
- Science is based on statistics
- Statistical methods (clustering) in program analysis
|
|
18
|
- High-level structure of program
- Reflects programmers’ ideas
- Information encoded in vocabulary: names, comments
- Strong influence of problem domain
- Depends on personal style
- Can be processed like document corpus
|
|
19
|
- Intended structures, for instance Model, View, Controller
- Hidden structures, for instance separation of UI from the “model”
- Horizontal structures, aspects
- Vertical structures, exceptions, exception specification
- Copy and paste programming, code reuse
|
|
20
|
- Overnight automatic program analysis
- Creating a map of abstractions
- Creating various views (bubble diagrams, trees, layers)
- Discovering hidden structures
- The ecosphere of a program
- Evolving automata, feeding on code and on statistical abstractions
- Maintenance by programmer
- Encapsulating hidden structures
- Improving existing abstractions
- Adding new abstractions
|