Generating Requirement Dependency Graph Based on Class Dependency

Abstract⎯A set of software requirements is an important element in software development. Engineers realize that requirements are interrelated, and the interconnections between requirements indicate interdependencies among them. This interdependence is crucial in the decision-making processes of requirements engineering, such as requirements change management, release planning, and requirements quality control. Researchers have focused on visualizing dependencies between requirements, analyzing the impact of software changes through changes to UML class diagrams, and predicting bug occurrences based on dependencies between requirements. These studies assumed that the requirements dependency information had been pre-built by a requirements engineer during the earlier development process. This paper introduces a method that builds a requirements dependency model. The model is built from realization associations between requirements and classes in the system design, together with the dependencies between classes. The modeling process uses the semantic similarity between requirements and classes: a class is said to have a realization association with a requirement if and only if their semantic similarity is higher than a certain threshold. The output of the method was compared with the output produced by annotators, and the method's reliability was measured by the level of agreement between the method and the annotators using a kappa-type statistic. The preliminary result shows that the method reached a fair level of agreement (0.37) with the annotators when generating a requirements dependency graph.
Keywords⎯ class dependencies, requirements, requirements dependency graph, semantic similarity, threshold.


I. INTRODUCTION
Software requirements engineering is a series of activities that includes eliciting, specifying, validating, and managing software requirements. These activities produce a requirements specification document. Requirements engineering is an iterative and evolutionary process that continues throughout development, so requirements may change during the development process, and a change to one requirement statement may inevitably affect other requirement statements. Knowing the dependencies between requirements is therefore useful for several reasons [1]. First, requirements dependencies can be used to anticipate the impact of a change to a requirement. Second, by knowing how a change in one requirement affects the other requirements, a project manager can estimate the total cost caused by that single change. Lastly, in a requirements recommendation system, the developer can look for other dependent requirements given a predefined requirement. Knowledge of requirements interdependence thus shows how dependencies affect software engineering activities and how that knowledge can facilitate software development.
This paper introduces a methodology to model the impact of requirement changes in a software project. The modelling process produces a requirements dependency graph, which is built from class dependency information extracted from class diagrams. The model can be generated after each iteration of a software development cycle. Classes in the class diagram, as realizations of previously defined software requirements, are mapped to the set of requirements of the respective software project. This mapping is based on the semantic similarity between classes and requirements and on the dependencies between classes.
1 Hernawati Samosir, Informatics Department, Institut Teknologi Del, Toba Samosir, Indonesia; 2 Daniel O. Siahaan, Informatics Department, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia; e-mail: 1 hernawati@del.ac.id, 2 daniel@if.its.ac.id
There are a number of studies related to modeling dependencies as graphs [2]-[6]. Widiastuti and Siahaan (2008b) introduced the visualization of requirements dependencies in a Labeled Transition System for Requirement Change (LTS-RC). LTS-RC is a state transition system of requirement changes that helps visualize requirements dependencies in terms of transitions between changes in requirements. The labels represent a predefined weight of the change dependency between requirements. The visualization lets stakeholders observe the flow of requirements changes and their impact, which can support the preparation of an optimal change strategy [7].
Furthermore, Muller and Rumpe analysed the impact of software changes by using changes in UML class diagrams [8]. Their study models the impact of changes using dependency information between classes: when a change occurs, the model is expected to identify the changed object and its impact. However, the study does not relate the change impact to the requirements level. In addition, Wang and Wang investigated how requirements dependencies correlate with software integration bugs and used them to predict such bugs [9]. Their study provides an early estimate of software quality and facilitates decision making early in the software lifecycle.

II. METHODOLOGY
This section explains the method for modelling requirement dependencies. The proposed methodology consists of the following steps:
1. Prepare the requirement data and class diagrams.
2. Map the requirements to the classes.
3. Generate the dependencies based on class dependencies.
4. Generate the requirement dependency model.
Figure 1 shows that the SRS documents and class diagrams are the inputs to the dataset. Two data elements are used: the requirement statements and the class information, such as class names, attributes, and methods. Both inputs are then pre-processed, producing two types of data: requirement text and class text. The similarity between these texts is calculated, and the requirements are mapped to the classes to generate the requirement dependency graph, which is the output of the process.
The steps are detailed as follows.
a) Prepare the requirements data and class diagrams.
The Software Requirements Specification (SRS) is used to define the requirement data, which consist of the requirement statements and the class diagram. For the sake of illustration, a library system is used as an example. Table 1 lists the requirements of the library system. The first column is the requirement identifier; the letter 'F' as its first character indicates that the requirement is a functional requirement statement. Figure 2 shows the classes related to the requirement list. The library system has 10 classes and 2 interface classes: Book, Author, Book Item, Account, Library, Catalog, Patron, Librarian, Account, and Library. The interface classes are Search and Manage.
b) Map the requirements and classes.
The next step is mapping each functionality to a class in the class diagram. The mapping of each class in the class diagram is shown in Table 2. Each requirement statement and class is first pre-processed. Pre-processing converts the text of the requirement statements and of the class diagram information into a format suitable for further analysis. It includes a cleaning process that removes noise [10]: unnecessary elements such as symbols, punctuation, spaces, conjunctions, and affixes are omitted so that the text is easier to process and analyse in the following steps.
The pre-processing phase is shown in Figure 3. The first step splits the text into a set of words; this step is also known as tokenization. Next, all letters are converted to lowercase. Numbers, symbols, and punctuation are then removed. The last step is stemming, which removes conjunctions and affixes so that only the meaningful words remain. There are two types of input text: the requirement statements and the class diagram information, which includes the code, class name, attributes, and methods. The requirement statements are stored in a .txt file, shown in Figure 4.
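The four pre-processing steps can be sketched as follows. This is a minimal illustration under stated assumptions: the stop-word list `STOP_WORDS` and the suffix-stripping `stem()` function are hypothetical stand-ins, because the word lists used in this work are not given; a real pipeline would typically use an established stemmer such as Porter's.

```python
import re

# Hypothetical stop list (conjunctions and auxiliaries); not the list used in this work.
STOP_WORDS = {"a", "an", "and", "or", "but", "the", "of", "to", "can", "is", "are", "by", "for"}
# Hypothetical suffix list; a toy stand-in for a real stemmer such as Porter's.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    tokens = text.split()                                # 1. tokenization
    tokens = [t.lower() for t in tokens]                 # 2. lowercase
    tokens = [re.sub(r"[^a-z]", "", t) for t in tokens]  # 3. drop numbers, symbols, punctuation
    tokens = [t for t in tokens if t and t not in STOP_WORDS]  # 4a. drop empties and conjunctions
    return [stem(t) for t in tokens]                     # 4b. strip affixes


print(preprocess("Patron or Library can manage account"))
# → ['patron', 'library', 'manage', 'account']
```

The class diagram text (code, class name, attributes, methods) would go through the same `preprocess()` call, so both sides of each requirement-class pair share one vocabulary.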
The class diagram text is also stored in a .txt file. From the list of classes provided previously, the text is separated into the code, class name, attributes, and methods, as shown in Figure 5. The following illustrates how the pre-processing was carried out on requirement F01, "Patron or Library can manage account".
To map the requirements to the classes, an m×n matrix is first created, where m denotes the number of requirements and n the number of classes. All information of a class, such as its ID, name, attributes, and methods, is compared against the existing functionalities of the library system: each text of the class diagram information is paired with each text of the requirement list and given a similarity value. Table 4 illustrates how the mapping between a class (C01) and a requirement statement (F01) is done.
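A Table 4-style grid of word-pair scores, its greedy reduction to a single requirement-class score, and the threshold that turns scores into realization pairs can be sketched as below. Several parts are assumptions: the sketch scores every word pair with normalized Levenshtein similarity (1 minus the edit distance divided by the length of the longer word), whereas this work uses Wu-Palmer similarity over WordNet for same-part-of-speech pairs (available, for example, through NLTK's `Synset.wup_similarity`); the Dice-style aggregation 2·Σ/(|R|+|C|) is a hypothetical stand-in for Equation 1; and the token lists in the example are illustrative. The greedy step repeatedly takes the highest remaining cell and discards the rest of its row and column, and the 0.40 threshold with a strict "higher than" comparison follows the text.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two words, using one rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def word_similarity(a: str, b: str) -> float:
    """Normalized Levenshtein similarity in [0, 1] (stand-in for Wu-Palmer)."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def greedy_similarity(req_tokens: list[str], class_tokens: list[str]) -> float:
    """Greedy one-to-one word matching over the word-pair grid."""
    cells = sorted(((word_similarity(r, c), i, j)
                    for i, r in enumerate(req_tokens)
                    for j, c in enumerate(class_tokens)), reverse=True)
    used_rows, used_cols, total = set(), set(), 0.0
    for score, i, j in cells:              # highest remaining cell first
        if i in used_rows or j in used_cols:
            continue                       # row or column already consumed
        used_rows.add(i)
        used_cols.add(j)
        total += score
    size = len(req_tokens) + len(class_tokens)
    # Hypothetical Dice-style aggregation, standing in for Equation 1.
    return 2 * total / size if size else 0.0


def realizations(sim: dict[str, dict[str, float]],
                 threshold: float = 0.40) -> dict[str, set[str]]:
    """Requirement -> classes whose similarity is higher than the threshold."""
    return {req: {cls for cls, s in row.items() if s > threshold}
            for req, row in sim.items()}


print(round(greedy_similarity(["patron", "library", "manage", "account"],
                              ["account", "patron"]), 3))  # → 0.667
```

Sorting all cells once and skipping consumed rows and columns is equivalent to re-scanning for the maximum after every selection, but avoids rebuilding the grid on each pass.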
The word similarity for each column (requirement text) and row (class text) was obtained using Wu-Palmer's word similarity method. Because this method relies on the WordNet thesaurus, it only returns valid values for word pairs of the same word type (part of speech); for word pairs of different word types, Levenshtein distance was used as the word similarity method instead. The similarity between the requirement statement (F01) and the class (C01) was then obtained using a greedy algorithm [11]. The algorithm starts by selecting the cell with the highest value, i.e., the cell of the 'publication'-'library' pair, and removes the remaining cells in the same row and column. If cells remain, the process is repeated; when no cell is left to select, the process stops. In Table 4, the grayed cells are the set of cells with the highest possible values according to the algorithm. The resulting similarity is given by Equation 1.
The similarity values of all requirement-class pairs are stored in a matrix, as shown in Table 5. The next step is determining which pairs are correct, i.e., in which pairs the class realizes the requirement. To determine the correct pairs, the method uses a threshold: any pair whose similarity value is higher than the threshold is considered a correct pair. In this experiment, the threshold was set by expert judgement to 0.40. In Table 5, the cells marked in bold are the correct pairs; for instance, C01 is considered to realize requirements F03 and F04, and the same interpretation applies to the other bold cells.
From Table 5, the method produces Table 6, which lists every requirement with its implementing classes. A check mark (√) denotes that a requirement is implemented by a specific class. One requirement statement may be realized by one or more classes, and one class may realize one or more requirements. F01 is implemented by C03, C04, C05, C06, C07, and C10; F02 by C08 and C09; F03 by C01, C05, C06, and C07; and F04 by C01, C05, and C07. In the experiment, a class may have no correct pair with any requirement, and a requirement may have no correct pair with any class. This may happen in two situations. First, the designer missed a requirements statement.
Second, the requirements engineer failed to identify a necessary feature during the requirements specification process.
c) Generate the dependencies based on class dependencies.
The next step is mapping each source class to its destination classes. The relation between a source class and a destination class is taken from the class diagram. The resulting class-to-class mapping, shown in Table 7, is then mapped again to the functionalities available in the system. Table 7 shows the dependencies in the class diagram. There are several dependency types in the class diagram, i.e., s, c, h, i, u, and d: s stands for specializes, h for has (strong aggregation), c for contains (weak aggregation), u for uses, i for implements, and d for dependency. For example, the relation between C02 and C01 is specialization, the relation between C03 and C01 is weak aggregation, the relation between C05 and C08 is strong aggregation, and the relation between C07 and C09 is dependency.
d) Generate the requirement dependency model.
After obtaining the class relations from the class diagram, the destination classes are mapped to the requirement statements based on the class dependencies. Table 8 maps the dependencies between each functionality and the other functionalities. For instance, F01 has a strong aggregation with F02 and a weak aggregation with F03 and F04. F03 and F04 have the same relations to F01, namely weak aggregation and uses, and the same relations to F02, namely strong aggregation and uses. For example, Table 8 shows that the relation from F01 to F02 is "h" (strong aggregation). This strong aggregation relationship is derived as follows:
1. From Table 6, F01 is implemented by C03, C04, C05, C06, C07, and C10, i.e., F01 = {C03, C04, C05, C06, C07, C10}.
2. One of the classes implementing F01 is C05 (see step 1). Table 7 shows that C05 has a "c" (weak aggregation) relation to C02, C04, and C08.
3. From Table 6, C02 does not implement any functionality, C04 implements functionality 1 (F01), and C08 implements functionality 2 (F02). Hence, F01 has an "h" (strong aggregation) relation to F02.
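The derivation can be expressed as a small rule: whenever a class relation src → dst exists, every requirement realized by src receives the same relation type toward every requirement realized by dst. The sketch below applies that rule, which is one reading of the walkthrough rather than a published algorithm. `REALIZED_BY` copies the realization sets listed for Table 6, `CLASS_RELATIONS` holds only the four example relations quoted for Table 7 (not the full table), and skipping self-loops (a requirement related to itself) is an assumption.

```python
from collections import defaultdict

# Realization sets as listed for Table 6 (requirement -> implementing classes).
REALIZED_BY = {
    "F01": {"C03", "C04", "C05", "C06", "C07", "C10"},
    "F02": {"C08", "C09"},
    "F03": {"C01", "C05", "C06", "C07"},
    "F04": {"C01", "C05", "C07"},
}

# Only the four example class relations quoted in the text, not all of Table 7.
# s = specializes, c = contains (weak aggregation), h = has (strong aggregation), d = dependency.
CLASS_RELATIONS = [
    ("C02", "s", "C01"),
    ("C03", "c", "C01"),
    ("C05", "h", "C08"),
    ("C07", "d", "C09"),
]


def requirement_dependencies(realized_by, class_relations):
    """Lift each class relation src->dst to every requirement pair (Ri, Rj)
    such that Ri is realized by src and Rj by dst; self-loops are skipped."""
    by_class = defaultdict(set)              # class -> requirements it realizes
    for req, classes in realized_by.items():
        for cls in classes:
            by_class[cls].add(req)
    deps = set()
    for src, kind, dst in class_relations:
        for ri in by_class[src]:
            for rj in by_class[dst]:
                if ri != rj:
                    deps.add((ri, kind, rj))
    return deps


for dep in sorted(requirement_dependencies(REALIZED_BY, CLASS_RELATIONS)):
    print(*dep)
```

With these inputs the result contains F01 h F02 (strong aggregation), together with F01 c F03 and F01 c F04, which agrees with the weak aggregation of F01 with F03 and F04 stated for Table 8. The specialization C02 s C01 yields nothing because C02 realizes no requirement.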
A detailed description of Table 8 is presented in Table 9. This table lists the dependencies between requirements obtained from the inter-class dependencies in the class diagram. The weak aggregation relationship is not included in Table 9 because no corresponding requirement dependency pair had been defined for that relation.
Furthermore, the dependency types used in this research were adopted from Dahlstedt (2001), which describes several types of dependency between requirements. Some of these dependencies are described in Table 10.
After analyzing those dependencies between requirements [12] and the class diagram dependencies, a number of requirement dependency types were considered relevant to this case, i.e., to class diagram dependencies. The relevant types are and, requires, and temporal. The detailed pairing of requirement dependencies and class diagram dependencies is described in Table 11.
Given the results in Tables 6 and 7, the requirements dependencies can be derived using the predefined mapping shown in Table 11. The resulting requirements dependencies based on class diagram dependencies are shown in Table 12.
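Once Table 12 is available, its rows can be held as a labeled directed graph and queried, for example to list every requirement reachable from a given one as a starting point for the change-impact analysis motivated in the introduction. In this sketch the edge F04 → F02 labeled "requires, temporal" is taken from the Figure 6 label, the edge F01 → F04 is hypothetical, and following edges in their stored direction to estimate impact is an assumption about how the edges should be read.

```python
from collections import defaultdict, deque


def build_graph(rows):
    """rows: (source, [dependency types], destination) triples, as in Table 12."""
    graph = defaultdict(list)
    for src, kinds, dst in rows:
        graph[src].append((dst, tuple(kinds)))
    return graph


def reachable(graph, start):
    """Requirements reachable from `start` along edge direction (breadth-first)."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for dst, _kinds in graph.get(node, []):
            if dst not in seen:
                seen.add(dst)
                queue.append(dst)
    seen.discard(start)
    return seen


rows = [
    ("F04", ["requires", "temporal"], "F02"),  # edge label from Figure 6
    ("F01", ["requires"], "F04"),              # hypothetical edge for illustration
]
graph = build_graph(rows)
print(sorted(reachable(graph, "F01")))  # → ['F02', 'F04']
```

Keeping the dependency types on each edge lets a caller filter the traversal, for instance following only "requires" edges, without rebuilding the graph.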

III. RESULTS AND DISCUSSION
The resulting requirements dependencies can be represented as a requirement dependency graph. Figure 6 shows the requirements dependency graph of the library system case study. Each node represents a requirement statement, and each directed edge connects a source requirement to a destination requirement and is labeled with the dependency types between them, e.g., the edge from F04 to F02 labeled "requires, temporal".
Figure 6. Graph of the requirements dependency model
The purpose of this small-scale experiment was to determine whether the proposed method is as reliable as an expert in creating a requirements dependency graph from a set of project artifacts, i.e., requirement statements and a class diagram. A questionnaire was distributed to three experts, who served as annotators. They annotated every pair of a requirement and a class that they considered an implementation class of that requirement statement. In addition, the annotators annotated interrelated pairs of requirements with their dependency types. Each expert has working experience in software requirements engineering or experience teaching software engineering courses.
Table 13 shows that the proposed method has the lowest agreement value of all annotators, 0.37. The highest value, 0.82, belongs to the third expert.
The reliability of the proposed method is measured by the level of agreement between the method and the experts, calculated with Gwet's AC1, a kappa-type agreement statistic. The method was treated as one of the experts, and its answers were compared with those of the human experts. The result shows that the method has a fair level of agreement with the three human experts. This is because the experts identified more dependencies between requirements, possibly because they have implicit knowledge of the problem domain that is unavailable to the method.
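For two raters, Gwet's AC1 corrects the observed agreement p_a with a chance term p_e = (1/(Q-1)) Σ_q π_q(1 - π_q), where Q is the number of categories and π_q is the average share of items the two raters placed in category q; then AC1 = (p_a - p_e)/(1 - p_e). The sketch below computes it for one pair of raters (e.g., the method and one expert labelling each requirement pair as dependent or not) and maps the value to the Landis and Koch verbal bands, under which 0.37 is "fair" and 0.82 "almost perfect". How the method was combined with the three experts, and which interpretation scale was used, are not stated here, so both the pairwise setup and the Landis-Koch bands are assumptions; the ten labels in the example are illustrative.

```python
from collections import Counter


def gwet_ac1(rater_a, rater_b):
    """Gwet's AC1 for two raters who each label the same n items."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("raters must label the same non-empty set of items")
    categories = set(rater_a) | set(rater_b)
    if len(categories) == 1:
        return 1.0  # convention: both raters used a single identical label throughout
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n      # observed agreement
    counts = Counter(rater_a) + Counter(rater_b)
    pi = [counts[k] / (2 * n) for k in categories]               # mean category shares
    p_e = sum(p * (1 - p) for p in pi) / (len(categories) - 1)   # chance agreement
    return (p_a - p_e) / (1 - p_e)


def landis_koch(value):
    """Verbal label for an agreement coefficient (Landis and Koch bands, assumed)."""
    if value < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if value <= upper:
            return label
    return "almost perfect"


method = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]  # illustrative: 1 = pair judged dependent
expert = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
ac1 = gwet_ac1(method, expert)
print(round(ac1, 3), landis_koch(ac1))  # → 0.615 substantial
```

Unlike Cohen's kappa, AC1 keeps p_e at or below 1/Q, so it stays stable when most pairs fall into one category, as is typical when few requirement pairs are actually dependent.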

IV. CONCLUSION
The proposed method can identify several types of dependency between requirements, although its agreement with the human experts is only fair (Gwet's AC1 = 0.37). This is because the method only uses explicit knowledge of the project, i.e., the requirement statements and the class diagram. Future work will involve more artifacts of the software project, such as use case diagrams, sequence diagrams, and component diagrams, which may provide additional dependency information that the method can use to identify other types of requirements dependency.