- Contents
- Introduction
- 1.1 Reverse Engineering
- 1.2 The eLib Program
- 1.3 Class Diagram
- 1.4 Object Diagram
- 1.5 Interaction Diagrams
- 1.6 State Diagrams
- 1.7 Organization of the Book
- The Object Flow Graph
- 2.1 Abstract Language
- 2.1.1 Declarations
- 2.1.2 Statements
- 2.2 Object Flow Graph
- 2.3 Containers
- 2.4 Flow Propagation Algorithm
- 2.5 Object sensitivity
- 2.6 The eLib Program
- 2.7 Related Work
- Class Diagram
- 3.1 Class Diagram Recovery
- 3.2 Declared vs. actual types
- 3.2.2 Visualization
- 3.3 Containers
- 3.4 The eLib Program
- 3.5 Related Work
- 3.5.1 Object identification in procedural code
- Object Diagram
- 4.1 The Object Diagram
- 4.2 Object Diagram Recovery
- 4.3 Object Sensitivity
- 4.4 Dynamic Analysis
- 4.4.1 Discussion
- 4.5 The eLib Program
- 4.5.1 OFG Construction
- 4.5.2 Object Diagram Recovery
- 4.5.3 Discussion
- 4.5.4 Dynamic analysis
- 4.6 Related Work
- Interaction Diagrams
- 5.1 Interaction Diagrams
- 5.2 Interaction Diagram Recovery
- 5.2.1 Incomplete Systems
- 5.2.2 Focusing
- 5.3 Dynamic Analysis
- 5.3.1 Discussion
- 5.4 The eLib Program
- 5.5 Related Work
- State Diagrams
- 6.1 State Diagrams
- 6.2 Abstract Interpretation
- 6.3 State Diagram Recovery
- 6.4 The eLib Program
- 6.5 Related Work
- Package Diagram
- 7.1 Package Diagram Recovery
- 7.2 Clustering
- 7.2.1 Feature Vectors
- 7.2.2 Modularity Optimization
- 7.3 Concept Analysis
- 7.4 The eLib Program
- 7.5 Related Work
- Conclusions
- 8.1 Tool Architecture
- 8.1.1 Language Model
- 8.2 The eLib Program
- 8.2.1 Change Location
- 8.2.2 Impact of the Change
- 8.3 Perspectives
- 8.4 Related Work
- 8.4.1 Code Analysis at CERN
- Index
152 7 Package Diagram
Although no concept partition emerges, it is possible to partition the classes based on two of the concepts, by considering all classes in the extent of the first as one group, and all classes in the extent of the second but not in the extent of the first as a second group. The associated class partition is reported in the last line of Table 7.2.
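The grouping rule just described is a set difference over concept extents. The following minimal sketch illustrates it; the function name and the two extents are hypothetical and are not the values reported in Table 7.2.

```python
def partition_from_two_concepts(extent_a, extent_b):
    """Group 1 is the whole extent of the first concept; group 2 holds the
    classes of the second concept's extent that are not already in group 1."""
    group1 = set(extent_a)
    group2 = set(extent_b) - group1
    return group1, group2


# Hypothetical extents, chosen only for illustration (not from Table 7.2).
extent_a = {"User", "Document", "Loan"}
extent_b = {"Document", "Loan", "Library", "Book"}

group1, group2 = partition_from_two_concepts(extent_a, extent_b)
print(sorted(group1), sorted(group2))
```

By construction the two groups are disjoint, but they cover only the classes that appear in one of the two extents; any class outside both extents would still have to be assigned by hand.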
Different techniques and different properties have been exploited to recover a package diagram from the source code of the eLib program. Nonetheless, the results produced in the various settings are very similar to one another (see Table 7.2): they differ at most in the position of one or two classes. A strong cohesion among the classes User, Document, Loan was revealed by all of the considered techniques. Indeed, these three classes are tied to the core functionality of the application, which is loan management. Whichever point of view is adopted (the relationships among classes, the declared types, etc.), this grouping emerges. The eLib program is a small program that does not need to be organized into multiple packages. However, if a package structure is to be superimposed, the package diagram recovery methods considered above indicate that a package for loan management, containing the classes User, Document, Loan, could be introduced. The class diagram of the eLib program (taken from Fig. 1.1) with such a package structure superimposed is depicted in Fig. 7.10.
7.5 Related Work
The problem of gathering cohesive groups of entities from a software system has been extensively studied in the context of the identification of abstract data types (objects), program understanding, and module restructuring, with reference to procedural code. Some of these works [13, 51, 102] have already been discussed in Chapter 3. Others [4, 52, 54, 91, 99] are based on variants of the clustering method described above.
Fig. 7.10. Package diagram for the eLib program.
Atomic components can be detected and organized into a hierarchy of modules by following the method described in [26]. Three kinds of atomic components are considered: abstract state encapsulations, which group global variables with the procedures that access them; abstract data types, which group user-defined types with the procedures that have such types in their signature; and strongly connected components of mutually recursive procedures. Dominance analysis is used to hierarchically organize the retrieved components into subsystems.
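The third kind of atomic component, groups of mutually recursive procedures, corresponds to the strongly connected components of the call graph. The sketch below uses Tarjan's algorithm, a standard choice that is not necessarily the one adopted in [26]; the call graph and procedure names are hypothetical.

```python
def strongly_connected_components(call_graph):
    """Tarjan's algorithm. call_graph maps each procedure to its callees."""
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for callee in call_graph.get(node, ()):
            if callee not in index:
                visit(callee)
                lowlink[node] = min(lowlink[node], lowlink[callee])
            elif callee in on_stack:
                lowlink[node] = min(lowlink[node], index[callee])
        if lowlink[node] == index[node]:
            # node is the root of a component: pop its members off the stack.
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            components.append(component)

    for node in call_graph:
        if node not in index:
            visit(node)
    return components


# Hypothetical call graph of a small recursive-descent parser.
call_graph = {
    "parse_expr": ["parse_term"],
    "parse_term": ["parse_factor"],
    "parse_factor": ["parse_expr", "read_token"],
    "read_token": [],
}
components = strongly_connected_components(call_graph)
# Keep only groups with more than one member; a procedure that merely calls
# itself forms a singleton component and is filtered out here.
recursive_groups = [c for c in components if len(c) > 1]
print(recursive_groups)
```

The recursive formulation is adequate for small call graphs; a large system would call for an iterative version to avoid exhausting the interpreter's recursion limit.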
Some of the approaches to the extraction of software components with high internal cohesion and low external coupling exploit the computation of software metrics. The ARCH tool [73] is one of the first examples embedding the principle of information hiding, turned into a measure of similarity between procedures, within a semi-automatic clustering framework. Such a method incorporates a weight tuning algorithm to learn from the design decisions in disagreement with the proposed modularization. In [11, 22] the purpose of retrieving modular objects is reuse, while in [61] metrics are used to refine the decomposition resulting from the application of formal and heuristic modularization principles. A different application is presented in [46], where cohesion and coupling measures are used to determine clusters of processes. The problem of optimizing a modularity quality measure, based on cohesion and coupling, is approached in [54] by means of genetic algorithms, which are able to determine a hierarchical clustering of the input modules. Such a technique is improved in [55] with the ability to detect and properly assign omnipresent modules, to exploit user-provided clusters, and to adopt orphan modules. In [53] a complementary clustering mechanism is applied to the interconnections, resulting in the definition of tube edges between subsystems. The use of genetic algorithms in software modularization is also investigated in [32], where a new representation of the assignment of components to modules and a new crossover operator are proposed.
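To make the idea of a cohesion-and-coupling score concrete, the sketch below computes a simplified measure: the mean intra-cluster edge density minus the mean inter-cluster edge density. This is an illustrative formulation, assumed here for the example and not necessarily the exact measure optimized in [54]; the dependency edges and clusterings are hypothetical. A search procedure such as a genetic algorithm would use a score of this kind as its fitness function.

```python
from itertools import combinations


def modularity_quality(edges, clusters):
    """Mean intra-cluster density minus mean inter-cluster density.

    edges: directed dependencies (src, dst) between modules.
    clusters: list of non-empty, disjoint sets covering all modules.
    """
    k = len(clusters)
    cluster_of = {m: i for i, members in enumerate(clusters) for m in members}
    intra = [0] * k
    inter = {}
    for src, dst in edges:
        a, b = cluster_of[src], cluster_of[dst]
        if a == b:
            intra[a] += 1
        else:
            key = (min(a, b), max(a, b))
            inter[key] = inter.get(key, 0) + 1
    cohesion = sum(intra[i] / len(clusters[i]) ** 2 for i in range(k)) / k
    if k == 1:
        return cohesion
    pairs = list(combinations(range(k), 2))
    coupling = sum(
        inter.get((i, j), 0) / (2 * len(clusters[i]) * len(clusters[j]))
        for i, j in pairs
    ) / len(pairs)
    return cohesion - coupling


# Hypothetical dependency graph over four modules.
edges = [("A", "B"), ("B", "A"), ("C", "D"), ("D", "C"), ("B", "C")]
good = modularity_quality(edges, [{"A", "B"}, {"C", "D"}])
bad = modularity_quality(edges, [{"A", "C"}, {"B", "D"}])
print(good, bad)
```

The clustering that keeps mutually dependent modules together scores higher than the one that separates them, which is the ordering an optimizer relies on.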
Other relevant works deal with the application of concept analysis to the modularization problem. In [24, 45, 77] concept analysis is applied to the extraction of code configurations. Modules associated with specific preprocessor directive patterns are extracted and interferences are detected. In [50, 71, 75, 84, 94], module recovery and restructuring are driven by the concept lattice computed on a context that relates procedures to various attributes, such as global variables, signature types, and dynamic memory access.
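As an illustration of what computing the concepts of such a context involves, the sketch below enumerates all (extent, intent) pairs of a small context by closing the attribute sets of the procedures under intersection. It is a brute-force approach suitable only for tiny contexts, not the algorithm of any of the cited works; the procedure and attribute names are hypothetical.

```python
def concepts(context):
    """Return all (extent, intent) pairs of a small formal context.

    context maps each object (here, a procedure) to its set of attributes.
    """
    all_attributes = frozenset().union(*context.values())
    # Every concept intent is an intersection of object attribute sets;
    # the full attribute set corresponds to the empty intersection.
    intents = {all_attributes}
    for attrs in context.values():
        attrs = frozenset(attrs)
        intents |= {attrs & intent for intent in intents}
    result = []
    for intent in intents:
        extent = frozenset(o for o, attrs in context.items() if intent <= attrs)
        result.append((extent, intent))
    return result


# Hypothetical context: procedures related to the global variables and
# signature types they use.
context = {
    "push": {"stack_var", "Stack"},
    "pop": {"stack_var", "Stack"},
    "enqueue": {"queue_var", "Queue"},
    "log": {"stack_var", "queue_var"},
}
lattice = concepts(context)
for extent, intent in sorted(lattice, key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Each concept pairs a candidate module (the extent) with the attributes all of its members share (the intent), which is what lets concept-based methods explain why the members belong together.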
The main difference between module restructuring based on clustering and module restructuring based on concepts is that the latter gives a characterization of the modules in terms of shared attributes. By contrast, modules recovered by means of clustering have to be inspected to trace similarity values back to their commonalities.
Module restructuring methods based on concepts suffer from the difficulty of determining partitions, i.e., non-overlapping and complete groupings of program entities. In fact, concept analysis does not guarantee that the candidate modules (concepts) it determines are disjoint and cover the whole entity set. In the approach proposed in [88], such a problem is overcome by using concept subpartitions, instead of concept partitions, and by providing extension rules to obtain a coverage of all of the entities to be modularized.
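The two requirements for a partition, disjointness and coverage, can be checked directly on a set of candidate extents. The sketch below assumes that a subpartition is a collection of disjoint groups that need not cover every entity; the function names and the example entities are hypothetical, and the extension rules of [88] are not reproduced here.

```python
from itertools import combinations


def is_disjoint(groups):
    """True if no entity belongs to more than one group."""
    return all(a.isdisjoint(b) for a, b in combinations(groups, 2))


def uncovered(groups, entities):
    """Entities that belong to no group."""
    return set(entities) - set().union(*groups)


def is_partition(groups, entities):
    """True if the groups are disjoint and cover every entity."""
    return is_disjoint(groups) and not uncovered(groups, entities)


# Hypothetical entity set and candidate modules (concept extents).
entities = {"f", "g", "h", "k"}
subpartition = [{"f", "g"}, {"h"}]
overlapping = [{"f", "g"}, {"g", "h", "k"}]

print(is_disjoint(subpartition), uncovered(subpartition, entities))
print(is_disjoint(overlapping), is_partition(overlapping, entities))
```

In the first case the groups are disjoint but leave one entity unassigned, which is exactly the situation that extension rules are meant to resolve.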