Springer Science - 2005 - Reverse Engineering of Object Orie.pdf

8 Conclusions

8.1 Tool Architecture

Implementation of the algorithms described in the previous chapters is affected by practical concerns, such as the target programming language, the available libraries, the graphical format of the resulting diagrams, etc. However, it is possible to devise a general architecture to be instantiated in each specific case. In this architecture, functionalities are assigned to different modules, so as to achieve a decomposition of the main task into manageable, well-defined sub-tasks. In turn, each module requires a specialization that depends on the specific setting in which the actual implementation is being built.

Fig. 8.1. General architecture of a reverse engineering tool.

Fig. 8.1 shows the main processing steps performed by the modules composing a reverse engineering tool. The first module, Parser, is responsible for handling the syntax of the source programming language. It contains the grammar that defines the language under analysis. It parses the source code and builds the derivation tree associated with the grammar productions. A higher-level view of the derivation tree is preferable, in order to decouple successive modules from the specific choices made in the definition of the grammar for the target language. Specifically, the intermediate non-terminals used in each grammar production are quite variable, being strongly dependent on the way the parser handles ambiguity (e.g., bottom-up and top-down parsers require very different organizations of the non-terminals). For this reason, it is convenient to transform the derivation tree into a more abstract tree representation of the program, called the Abstract Syntax Tree (AST). In this program representation, chains of intermediate non-terminals are collapsed, and only the main syntactic categories of the language are represented [2].
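The collapsing of intermediate non-terminals can be illustrated with a minimal sketch. It is not the implementation used by any particular tool: the names `TreeNode` and `AstBuilder`, the toy expression grammar, and the rule "drop any single-child node that is not a main category" are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical node type used here for both the derivation tree and the AST.
class TreeNode {
    final String label;                       // non-terminal name or token text
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String label, TreeNode... kids) {
        this.label = label;
        children.addAll(Arrays.asList(kids));
    }

    @Override
    public String toString() {
        if (children.isEmpty()) return label;
        StringBuilder sb = new StringBuilder(label).append('(');
        for (int i = 0; i < children.size(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(children.get(i));
        }
        return sb.append(')').toString();
    }
}

class AstBuilder {
    // Assumed set of main syntactic categories that are always kept in the AST.
    private final Set<String> mainCategories;

    AstBuilder(String... categories) {
        mainCategories = new HashSet<>(Arrays.asList(categories));
    }

    // A node with exactly one child that is not a main category is treated as an
    // intermediate non-terminal and replaced by its (recursively collapsed) child.
    TreeNode collapse(TreeNode node) {
        if (node.children.size() == 1 && !mainCategories.contains(node.label)) {
            return collapse(node.children.get(0));
        }
        TreeNode copy = new TreeNode(node.label);
        for (TreeNode child : node.children) {
            copy.children.add(collapse(child));
        }
        return copy;
    }
}
```

For the expression `x + 1`, a derivation tree such as `Expr(AddExpr(AddExpr(MulExpr(Primary(x))) + MulExpr(Primary(1))))` collapses to `AddExpr(x + 1)` when no category is protected. Modules that consume the AST therefore do not depend on how many precedence levels the grammar happens to use.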

The AST is a program representation that reflects the syntactic structure of the code. However, reverse engineering tools are based on a somewhat different view of the source code. In the remainder of this chapter, this view is referred to as the language model assumed by a reverse engineering tool. In a language model, several syntactic details can be safely ignored. For example, the tokens delimiting blocks of statements (curly braces, begin, end, etc.) are irrelevant, while the information of interest is the actual presence of a sequence of statements. Thus, in the language model, tokens such as delimiters of statement blocks and parameters, separators in parameter lists and statement sequences, etc., are absent. On the other hand, information not explicitly represented in the AST is made directly available in the language model. For example, each variable involved in an expression is linked to its declaration. Each method call is resolved in terms of all the type-compatible definitions of the invoked method. Each class is associated with its superclass, as well as the interfaces it implements. Such cross-references are not obtained by means of plain identifiers, as in the AST, but are links toward the referenced elements in the language model. For example, if class A extends class B, the AST for class A contains just a child node for the extends clause, leading to the identifier B, while in the language model an association exists between the model element for class A and the model element for class B. An example of a (simplified) language model for the Java language is described in detail below. The module responsible for building the language model out of the AST of an input program is the Model Extractor (see Fig. 8.1).
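The difference between an identifier in the AST and a link in the language model can be sketched as follows for the extends clause. This is only an illustration under simplifying assumptions: `ClassDeclNode`, `ClassElement`, and this two-pass `ModelExtractor` are hypothetical names, and packages, scoping, and interfaces are ignored.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// AST-level view: the extends clause only carries an identifier.
class ClassDeclNode {
    final String name;
    final String extendsIdentifier;   // null when there is no extends clause

    ClassDeclNode(String name, String extendsIdentifier) {
        this.name = name;
        this.extendsIdentifier = extendsIdentifier;
    }
}

// Language-model view: the superclass is a link to another model element.
class ClassElement {
    final String name;
    ClassElement superclass;          // null when absent or not part of the analyzed code

    ClassElement(String name) {
        this.name = name;
    }
}

class ModelExtractor {
    // Pass 1 creates one model element per class declaration;
    // pass 2 turns each extends identifier into a link between model elements.
    Map<String, ClassElement> extract(List<ClassDeclNode> ast) {
        Map<String, ClassElement> model = new LinkedHashMap<>();
        for (ClassDeclNode decl : ast) {
            model.put(decl.name, new ClassElement(decl.name));
        }
        for (ClassDeclNode decl : ast) {
            if (decl.extendsIdentifier != null) {
                // The link stays null if the superclass is not among the parsed classes.
                model.get(decl.name).superclass = model.get(decl.extendsIdentifier);
            }
        }
        return model;
    }
}
```

Two passes are used so that a class can extend another class declared later in the input. After extraction, an algorithm can navigate from A to B through `superclass` without looking up any name.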

Based upon the language model of the input program, reverse engineering algorithms can be executed to recover alternative design views. The output is a set of diagrams to be displayed to the user. In some cases, the language model that the reverse engineering algorithms take as input must be abstracted further. For example, most (but not all) of the techniques described in the previous chapters require that the data flows in the target object-oriented program be abstracted into a data structure called the Object Flow Graph (OFG). This data structure is built inside the Reverse Engineering module and is shared by all the algorithms that depend on it. Propagating the appropriate flow information through the OFG leads to the recovery of the design views of interest. These are then converted into a graphical format of choice, so that the final user can visualize them.
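Flow propagation over a graph of this kind can be sketched as a generic worklist computation. The sketch assumes that nodes are program locations identified by strings, that an edge means "data flows from one location to another", and that the propagated information is a set of labels (for instance, allocated types). The class `ObjectFlowGraph` and its methods are hypothetical; the precise OFG definition is the one given in the previous chapters.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ObjectFlowGraph {
    private final Map<String, Set<String>> successors = new HashMap<>();
    private final Map<String, Set<String>> generated = new HashMap<>();

    // Records a data flow from one program location to another.
    void addFlow(String from, String to) {
        successors.computeIfAbsent(from, k -> new HashSet<>()).add(to);
        successors.computeIfAbsent(to, k -> new HashSet<>());
    }

    // Records information produced at a location (e.g., an allocation).
    void generate(String node, String info) {
        successors.computeIfAbsent(node, k -> new HashSet<>());
        generated.computeIfAbsent(node, k -> new HashSet<>()).add(info);
    }

    // Forward propagation up to a fixed point: each node ends up with what it
    // generates plus everything that reaches it along flow edges.
    Map<String, Set<String>> propagate() {
        Map<String, Set<String>> out = new HashMap<>();
        Deque<String> worklist = new ArrayDeque<>();
        for (String node : successors.keySet()) {
            Set<String> gen = generated.getOrDefault(node, Collections.<String>emptySet());
            out.put(node, new HashSet<>(gen));
            worklist.add(node);
        }
        while (!worklist.isEmpty()) {
            String node = worklist.poll();
            for (String succ : successors.get(node)) {
                // Re-examine a successor only if it received something new.
                if (out.get(succ).addAll(out.get(node))) {
                    worklist.add(succ);
                }
            }
        }
        return out;
    }
}
```

The loop terminates because the sets only grow and the number of labels is finite. Which labels are generated, and at which nodes, depends on the design view being recovered.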

8.1.1 Language Model

Since reverse engineering techniques span a wide spectrum, depending on the kind of high-level information being recovered, it is quite important to design a general language model that supports all of the alternative algorithms. In turn, each algorithm may have an internal representation of the source code, different from the language model itself. However, the main requirement on the language model is that all the information the reverse engineering algorithms need in order to work and (possibly) build their own internal data structures must be available in it. Thus, the language model plays a critical, central role in the architecture described above and should be designed very carefully. An example of such a model is given in Fig. 8.2 for the Java language. Only the most important entities are shown (for space reasons), with no indication of their properties.

Fig. 8.2. Simplified Java language model. Containment and inheritance relationships are shown.

A Java source file contains the definition of classes within a name space called package. In turn, packages can be nested. Thus, the topmost entity in the language model for Java (see Fig. 8.2, left) is the package, and a self-containment relationship on the package entity represents nesting. Ultimately, packages contain classes (containment from package to class in Fig. 8.2). The main property of the entity package (not shown in Fig. 8.2) is its name, which uniquely identifies it.

The properties of the entity class include its name and visibility, as well as its superclass, implemented interfaces, etc. The entities contained, in turn, inside classes are the class members. Thus, the entity class is connected to the entity attribute and to the entity method. Moreover, classes can be nested inside other classes. This is the reason for the self-containment relationship on the entity class.

The entity attribute has properties such as name, type, visibility, initializer, etc. Similarly, the entity method has properties such as name, formal parameters, return type, visibility, etc. The body of each method is represented as a sequence of statements in the language model (containment from method to statement labeled body in Fig. 8.2).
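The containment relationships among package, class, attribute, and method could be encoded along the following lines. This is a sketch, not the model of Fig. 8.2 itself: the class names, the use of plain strings for types and visibility, the `StatementEntity` placeholder, and the `countClasses` helper are assumptions introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Placeholder for the statement entity; its subtypes are omitted in this sketch.
class StatementEntity {
    final String text;
    StatementEntity(String text) { this.text = text; }
}

class AttributeEntity {
    final String name, type, visibility;
    final String initializer;          // source text of the initializer, or null if absent

    AttributeEntity(String name, String type, String visibility, String initializer) {
        this.name = name;
        this.type = type;
        this.visibility = visibility;
        this.initializer = initializer;
    }
}

class MethodEntity {
    final String name, returnType, visibility;
    final List<String> formalParameters = new ArrayList<>();
    final List<StatementEntity> body = new ArrayList<>();   // containment labeled "body"

    MethodEntity(String name, String returnType, String visibility) {
        this.name = name;
        this.returnType = returnType;
        this.visibility = visibility;
    }
}

class ClassEntity {
    final String name, visibility;
    ClassEntity superclass;                                  // link, not an identifier
    final List<ClassEntity> interfaces = new ArrayList<>();
    final List<ClassEntity> nestedClasses = new ArrayList<>();   // self-containment
    final List<AttributeEntity> attributes = new ArrayList<>();
    final List<MethodEntity> methods = new ArrayList<>();

    ClassEntity(String name, String visibility) {
        this.name = name;
        this.visibility = visibility;
    }

    // Counts this class plus all classes nested inside it.
    int countClasses() {
        int n = 1;
        for (ClassEntity c : nestedClasses) n += c.countClasses();
        return n;
    }
}

class PackageEntity {
    final String name;                                       // uniquely identifies the package
    final List<PackageEntity> subpackages = new ArrayList<>();   // self-containment
    final List<ClassEntity> classes = new ArrayList<>();

    PackageEntity(String name) { this.name = name; }

    // Counts all classes reachable through package and class containment.
    int countClasses() {
        int n = 0;
        for (ClassEntity c : classes) n += c.countClasses();
        for (PackageEntity p : subpackages) n += p.countClasses();
        return n;
    }
}
```

The two self-containment lists (`subpackages` and `nestedClasses`) make the model recursive, so any traversal, such as `countClasses`, has to recurse along both.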

Statements can be of different types. Some of them are enumerated in Fig. 8.2, connected to their abstraction, the entity statement, by an inheritance relationship. Conditional statements are used for constructs such as if and switch. Among their properties, they hold a reference to the expression entity used in the tested condition (not shown in Fig. 8.2). The if conditional statement has a then-part and an else-part, which are in turn sequences of statements (similarly to the body of a method). The switch statement is associated with a sequence of cases, each containing the respective statements to execute.

Loop statements include while, for and do-while loops. Their main properties are the tested condition (an expression entity, not shown in Fig. 8.2) and the loop body (a sequence of statements). For loops also have an initializer and an increment part.

Assignment statements have two main components, the left-hand side and the right-hand side. While the latter is a generic expression, the former must eventually reference a location. This is achieved by constraining it to a unary expression, instead of a generic expression.
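The statement hierarchy, including the constraint on the left-hand side of assignments, might be expressed as in the sketch below. All class names and fields are hypothetical. While and do-while loops are both represented by `LoopStatement` for brevity, and `UnaryExpression` and `Literal` are simplified stand-ins for the corresponding expression entities.

```java
import java.util.ArrayList;
import java.util.List;

abstract class Expression { }

// Assumed stand-in for the unary expression entity: it denotes a location.
class UnaryExpression extends Expression {
    final String location;
    UnaryExpression(String location) { this.location = location; }
}

class Literal extends Expression {
    final String value;
    Literal(String value) { this.value = value; }
}

class BinaryExpression extends Expression {
    final String operator;
    final Expression left, right;
    BinaryExpression(String operator, Expression left, Expression right) {
        this.operator = operator;
        this.left = left;
        this.right = right;
    }
}

abstract class Statement { }

abstract class ConditionalStatement extends Statement {
    final Expression condition;
    ConditionalStatement(Expression condition) { this.condition = condition; }
}

class IfStatement extends ConditionalStatement {
    final List<Statement> thenPart = new ArrayList<>();
    final List<Statement> elsePart = new ArrayList<>();
    IfStatement(Expression condition) { super(condition); }
}

class SwitchCase {
    final String label;
    final List<Statement> statements = new ArrayList<>();
    SwitchCase(String label) { this.label = label; }
}

class SwitchStatement extends ConditionalStatement {
    final List<SwitchCase> cases = new ArrayList<>();
    SwitchStatement(Expression condition) { super(condition); }
}

// Used directly for while and do-while loops in this sketch.
class LoopStatement extends Statement {
    final Expression condition;
    final List<Statement> body = new ArrayList<>();
    LoopStatement(Expression condition) { this.condition = condition; }
}

class ForStatement extends LoopStatement {
    final Statement initializer;
    final Statement increment;
    ForStatement(Statement initializer, Expression condition, Statement increment) {
        super(condition);
        this.initializer = initializer;
        this.increment = increment;
    }
}

class AssignmentStatement extends Statement {
    final UnaryExpression leftHandSide;   // constrained type: must denote a location
    final Expression rightHandSide;       // any expression
    AssignmentStatement(UnaryExpression leftHandSide, Expression rightHandSide) {
        this.leftHandSide = leftHandSide;
        this.rightHandSide = rightHandSide;
    }
}
```

Because `leftHandSide` is typed as `UnaryExpression`, an attempt to build an assignment whose target is, say, a `BinaryExpression` is rejected by the compiler instead of having to be checked by each algorithm.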